An Overview of the Private Data Collection Feature in Hyperledger Fabric


This article is for open source technology enthusiasts with prior experience in Hyperledger Fabric.

When a group of organisations on a channel needs to keep data private from the other organisations on that channel, one option is to create a new channel comprising only the organisations that need access to the data. However, creating a separate channel in each such case adds administrative overhead, such as maintaining chaincode versions, policies and MSPs. It also cannot serve use cases in which all channel participants must see a transaction while a portion of its data stays private. To address this last requirement, Hyperledger Fabric v1.2 offers private data collections. These give a defined subset of organisations on a channel the ability to endorse, commit or query private data without creating a separate channel.

Private data collection

A collection is a combination of two elements:

  • The actual private data, sent peer-to-peer over the gossip protocol only to the organisation(s) authorised to see it. This data is stored in a private database on the peer. The ordering service is not involved and never sees the private data. Note that gossip requires anchor peers to be set up in order to bootstrap cross-organisation communication.
  • A hash of that data, which is endorsed, ordered, and written to the ledgers of every peer on the channel. The hash serves as evidence of the transaction and is used for state validation, and can be used for audit purposes.

Private data collection definition

A collection definition contains one or more collections. Each collection has a policy definition listing the organisations in the collection, as well as properties that control the dissemination of private data at endorsement time and, optionally, whether the data will be purged.

Collection definitions are composed of six properties.

name: The name of the collection.

policy: The private data collection distribution policy defines which organisations' peers are allowed to persist the collection data. It is expressed using the signature policy syntax, with each member included in an OR signature policy list. To support read/write transactions, the private data distribution policy must define a broader set of organisations than the chaincode endorsement policy, because peers must have the private data in order to endorse proposed transactions.

requiredPeerCount: This is the minimum number of peers (across authorised organisations) that each endorsing peer must successfully disseminate private data to, before the peer signs the endorsement and returns the proposal response back to the client. Requiring dissemination as a condition of endorsement will ensure that private data is available in the network even if the endorsing peer(s) become unavailable. When the requiredPeerCount is 0, it means that no distribution is required, but there may be some distribution if maxPeerCount is greater than zero.

maxPeerCount: For data redundancy purposes, this is the maximum number of other peers (across authorised organisations) to which each endorsing peer will attempt to distribute the private data. Suppose an endorsing peer becomes unavailable between endorsement time and commit time. Other collection member peers that did not receive the private data at endorsement time can then pull it from the peers it was disseminated to. If this value is set to 0, the private data is not disseminated at endorsement time. All authorised peers are then forced to pull the private data from the endorsing peers at commit time.

blockToLive: This represents how long the data should live in the private database, in terms of blocks. The data lives for the specified number of blocks and is then purged from the private database, so that it can no longer be queried from chaincode or supplied to other peers. To keep private data indefinitely, that is, never to purge it, set the blockToLive property to 0.

memberOnlyRead: A value of true indicates that peers automatically enforce that only clients belonging to one of the collection member organisations are allowed read access to private data. If a client from a non-member organisation attempts to execute a chaincode function that reads private data, the chaincode invocation is terminated with an error. Set this value to false if you would like to encode more granular access control within individual chaincode functions.

Here is a sample collection definition JSON file, containing an array of two collection definitions:

[
  {
    "name": "collection1",
    "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000,
    "memberOnlyRead": true
  },
  {
    "name": "collectionPrivate",
    "policy": "OR('Org1MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 3,
    "memberOnlyRead": true
  }
]
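The following minimal Go sketch makes the six properties concrete. It decodes a definition like the one above into a struct and applies a few sanity checks before the file is used. The CollectionConfig type, the loadCollections helper and the choice of checks are our own illustration, not Fabric's API. Our understanding is that Fabric also rejects a definition whose maxPeerCount is lower than its requiredPeerCount, but the documentation for your version is the authority on the full rule set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectionConfig mirrors the six properties of the sample JSON file.
type CollectionConfig struct {
	Name              string `json:"name"`
	Policy            string `json:"policy"`
	RequiredPeerCount int32  `json:"requiredPeerCount"`
	MaxPeerCount      int32  `json:"maxPeerCount"`
	BlockToLive       uint64 `json:"blockToLive"`
	MemberOnlyRead    bool   `json:"memberOnlyRead"`
}

// loadCollections parses a collection definition file and applies a few
// illustrative checks (not Fabric's complete validation).
func loadCollections(data []byte) ([]CollectionConfig, error) {
	var configs []CollectionConfig
	if err := json.Unmarshal(data, &configs); err != nil {
		return nil, fmt.Errorf("parsing collection definition: %w", err)
	}
	seen := make(map[string]bool)
	for _, c := range configs {
		switch {
		case c.Name == "":
			return nil, fmt.Errorf("collection name must not be empty")
		case seen[c.Name]:
			return nil, fmt.Errorf("duplicate collection name %q", c.Name)
		case c.RequiredPeerCount < 0:
			return nil, fmt.Errorf("%s: requiredPeerCount must not be negative", c.Name)
		case c.MaxPeerCount < c.RequiredPeerCount:
			return nil, fmt.Errorf("%s: maxPeerCount (%d) is less than requiredPeerCount (%d)",
				c.Name, c.MaxPeerCount, c.RequiredPeerCount)
		}
		seen[c.Name] = true
	}
	return configs, nil
}

// sample is the definition shown above, condensed.
const sample = `[
  {"name": "collection1", "policy": "OR('Org1MSP.member', 'Org2MSP.member')",
   "requiredPeerCount": 0, "maxPeerCount": 3, "blockToLive": 1000000, "memberOnlyRead": true},
  {"name": "collectionPrivate", "policy": "OR('Org1MSP.member')",
   "requiredPeerCount": 0, "maxPeerCount": 3, "blockToLive": 3, "memberOnlyRead": true}
]`

func main() {
	configs, err := loadCollections([]byte(sample))
	if err != nil {
		fmt.Println("invalid definition:", err)
		return
	}
	for _, c := range configs {
		fmt.Printf("%s: policy=%s blockToLive=%d\n", c.Name, c.Policy, c.BlockToLive)
	}
}
```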

Collection members may decide to share the private data with other parties if they get into a dispute or if they want to transfer the asset to a third party. The third party can then compute the hash of the private data and see if it matches the state on the channel ledger, proving that the state existed between the collection members at a certain point in time.
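The hash check a third party would perform can be sketched in a few lines of Go. This assumes the value recorded on the channel ledger is a plain SHA-256 digest of the private value's bytes, which is our understanding of Fabric's default. The helper names are hypothetical, and reading the hash from the ledger is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashValue returns the hex-encoded SHA-256 digest of a private data value.
func hashValue(value []byte) string {
	sum := sha256.Sum256(value)
	return hex.EncodeToString(sum[:])
}

// matchesOnChainHash reports whether data shared off-chain corresponds to the
// hash recorded on the channel ledger.
func matchesOnChainHash(shared []byte, onChainHex string) bool {
	return hashValue(shared) == onChainHex
}

func main() {
	private := []byte(`{"asset":"asset1","price":500}`)
	onChain := hashValue(private) // stands in for the hash read from the ledger

	fmt.Println(matchesOnChainHash(private, onChain))                                   // true
	fmt.Println(matchesOnChainHash([]byte(`{"asset":"asset1","price":400}`), onChain)) // false
}
```

Because the comparison is byte-exact, the shared data must be the exact bytes that were written. Re-serialising the same JSON with different spacing or field order would produce a different hash.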

Using a collection within a channel versus a separate channel

  • Use channels when entire transactions (and ledgers) must be kept confidential within a set of organisations that are members of the channel.
  • Use collections when transactions (and ledgers) must be shared among a set of organisations, but when only a subset of those organisations should have access to some (or all) of the data within a transaction.

Additionally, since private data is disseminated peer-to-peer rather than via blocks, use private data collections when transaction data must be kept confidential from ordering service nodes.

Endorsement

Since private data is not included in the transactions that get submitted to the ordering service, and therefore is not included in the blocks that get distributed to all peers in a channel, the endorsing peer plays an important role in disseminating private data to other peers of authorised organisations. This ensures the availability of private data in the channel’s collection, even if endorsing peers become unavailable after their endorsement. To assist with this dissemination, the maxPeerCount and requiredPeerCount properties in the collection definition control the degree of dissemination at endorsement time.

If the endorsing peer cannot successfully disseminate the private data to at least requiredPeerCount peers, it returns an error to the client. The endorsing peer attempts to disseminate the private data to peers of different organisations, in an effort to ensure that each authorised organisation has a copy. Transactions are not committed at chaincode execution time, so the endorsing peer and recipient peers store a copy of the private data in a local transient store alongside their blockchain until the transaction is committed.

Committing private data

At commit time, an authorised peer may lack a copy of the private data in its transient data store, either because it was not an endorsing peer or because it did not receive the data via dissemination at endorsement time. It will then attempt to pull the private data from another authorised peer for a configurable amount of time, set by the peer.gossip.pvtData.pullRetryThreshold property in the peer's core.yaml configuration file. It is therefore important to set requiredPeerCount and maxPeerCount large enough to ensure the availability of private data in your channel.

Referencing collections from chaincode

A set of shim APIs is available for setting and retrieving private data. The same chaincode data operations can be applied to both channel state data and private data. For private data, however, a collection name is specified along with the data in the chaincode APIs, for example PutPrivateData(collection, key, value) and GetPrivateData(collection, key).
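The following Go sketch shows the shape of chaincode that uses these two calls. To keep it self-contained, it declares a small interface containing only PutPrivateData and GetPrivateData, whose signatures follow our understanding of the Go shim's ChaincodeStubInterface. The interface is backed by an in-memory test double rather than a real peer. setPrice and getPrice are hypothetical chaincode functions.

```go
package main

import (
	"errors"
	"fmt"
)

// PrivateDataStub is the subset of the shim's stub interface used here.
type PrivateDataStub interface {
	PutPrivateData(collection, key string, value []byte) error
	GetPrivateData(collection, key string) ([]byte, error)
}

// memStub is an in-memory test double keyed by collection, then by key.
type memStub struct {
	data map[string]map[string][]byte
}

func newMemStub() *memStub { return &memStub{data: make(map[string]map[string][]byte)} }

func (m *memStub) PutPrivateData(collection, key string, value []byte) error {
	if collection == "" {
		return errors.New("collection name is required")
	}
	if m.data[collection] == nil {
		m.data[collection] = make(map[string][]byte)
	}
	m.data[collection][key] = value
	return nil
}

// GetPrivateData returns a nil value for a missing key.
func (m *memStub) GetPrivateData(collection, key string) ([]byte, error) {
	return m.data[collection][key], nil
}

// setPrice stores a price in the Org1-only collection from the sample definition.
func setPrice(stub PrivateDataStub, assetID, price string) error {
	return stub.PutPrivateData("collectionPrivate", assetID, []byte(price))
}

// getPrice reads the price back and treats a nil value as "not found".
func getPrice(stub PrivateDataStub, assetID string) (string, error) {
	v, err := stub.GetPrivateData("collectionPrivate", assetID)
	if err != nil {
		return "", err
	}
	if v == nil {
		return "", fmt.Errorf("no private price for %s", assetID)
	}
	return string(v), nil
}

func main() {
	stub := newMemStub()
	if err := setPrice(stub, "asset1", "500"); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	price, err := getPrice(stub, "asset1")
	fmt.Println(price, err) // 500 <nil>
}
```

In real chaincode, the stub passed to Invoke would take the place of memStub, since it provides both methods. Writing chaincode logic against a narrow interface like this also makes it easy to unit test.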

Passing private data in a chaincode proposal

Since the chaincode proposal gets stored on the blockchain, it is also important not to include private data in the main part of the chaincode proposal. A special field in the chaincode proposal, called the transient field, can be used to pass private data from the client (or data that the chaincode will use to generate private data) to the chaincode invocation on the peer. The chaincode can retrieve the transient field by calling the GetTransient() API. The transient field is excluded from the channel transaction.
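Here is a minimal Go sketch of the transient-field pattern. The client places a JSON payload in the transient map, and the chaincode reads it with GetTransient() and writes it to a collection, so the payload never appears in the proposal's ordinary arguments. The interface mirrors the shim signatures as we understand them. The key "asset_properties", the assetPrivateDetails type and the fake stub are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// TransientStub is the subset of the shim's stub interface used here.
type TransientStub interface {
	GetTransient() (map[string][]byte, error)
	PutPrivateData(collection, key string, value []byte) error
}

// assetPrivateDetails is a hypothetical shape for the private payload.
type assetPrivateDetails struct {
	ID    string `json:"id"`
	Price int    `json:"price"`
}

// createPrivateAsset reads the payload from the transient map and stores it
// in a collection under the asset's ID.
func createPrivateAsset(stub TransientStub) error {
	transient, err := stub.GetTransient()
	if err != nil {
		return fmt.Errorf("reading transient map: %w", err)
	}
	raw, ok := transient["asset_properties"]
	if !ok {
		return errors.New(`transient map must contain "asset_properties"`)
	}
	var details assetPrivateDetails
	if err := json.Unmarshal(raw, &details); err != nil {
		return fmt.Errorf("decoding asset_properties: %w", err)
	}
	if details.ID == "" {
		return errors.New("asset id is required")
	}
	return stub.PutPrivateData("collection1", details.ID, raw)
}

// fakeStub is an in-memory test double.
type fakeStub struct {
	transient map[string][]byte
	written   map[string][]byte // "collection/key" -> value
}

func (f *fakeStub) GetTransient() (map[string][]byte, error) { return f.transient, nil }

func (f *fakeStub) PutPrivateData(collection, key string, value []byte) error {
	f.written[collection+"/"+key] = value
	return nil
}

func main() {
	stub := &fakeStub{
		transient: map[string][]byte{"asset_properties": []byte(`{"id":"asset1","price":500}`)},
		written:   map[string][]byte{},
	}
	err := createPrivateAsset(stub)
	fmt.Println(err, string(stub.written["collection1/asset1"]))
}
```

On the client side, the transient map is populated through the Fabric SDKs or, with the peer CLI, the --transient flag of peer chaincode invoke.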

Access control for private data

Until version 1.3, access control to private data based on collection membership was enforced for peers only. Access control based on the organisation of the chaincode proposal submitter was required to be encoded in chaincode logic.
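Before memberOnlyRead existed, a membership check like the following had to live in chaincode. The same kind of check is needed today when memberOnlyRead is false and finer-grained rules are wanted. This minimal Go sketch hard-codes the member organisations from the sample definition, which is a hypothetical simplification. In real chaincode, the submitter's MSP ID would typically come from the client identity library (for example cid.GetMSPID).

```go
package main

import "fmt"

// collectionMembers is a hypothetical, hard-coded copy of each collection's
// member organisations, matching the sample collection definition.
var collectionMembers = map[string][]string{
	"collection1":       {"Org1MSP", "Org2MSP"},
	"collectionPrivate": {"Org1MSP"},
}

// authorizeRead returns an error unless clientMSP is a member of collection.
func authorizeRead(collection, clientMSP string) error {
	members, ok := collectionMembers[collection]
	if !ok {
		return fmt.Errorf("unknown collection %q", collection)
	}
	for _, m := range members {
		if m == clientMSP {
			return nil
		}
	}
	return fmt.Errorf("client from %s may not read collection %q", clientMSP, collection)
}

func main() {
	fmt.Println(authorizeRead("collection1", "Org2MSP"))       // <nil>
	fmt.Println(authorizeRead("collectionPrivate", "Org2MSP")) // client from Org2MSP may not read collection "collectionPrivate"
}
```

Duplicating the membership list in chaincode means it can drift from the collection definition, which is one practical argument for letting memberOnlyRead do this check where it suffices.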

Querying private data

Private collection data can be queried just like normal channel data, using shim APIs:

GetPrivateDataByRange(collection, startKey, endKey string)

GetPrivateDataByPartialCompositeKey(collection, objectType string, keys []string)

And for the CouchDB state database, JSON content queries can be passed using the shim API:

GetPrivateDataQueryResult(collection, query string)
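Range bounds are a common source of surprises. This Go sketch reproduces, over an in-memory map, the semantics we understand GetPrivateDataByRange to have: startKey is inclusive, endKey is exclusive, and an empty string means unbounded on that side. It also builds a CouchDB selector string of the kind that could be passed to GetPrivateDataQueryResult. The helper names and the owner field are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// rangeKeys returns, in key order, the keys k with startKey <= k < endKey;
// an empty bound is treated as unbounded.
func rangeKeys(store map[string][]byte, startKey, endKey string) []string {
	var keys []string
	for k := range store {
		if startKey != "" && k < startKey {
			continue
		}
		if endKey != "" && k >= endKey {
			continue
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ownerQuery builds a CouchDB selector query string for a hypothetical
// "owner" field.
func ownerQuery(owner string) (string, error) {
	q := map[string]interface{}{
		"selector": map[string]interface{}{"owner": owner},
	}
	b, err := json.Marshal(q)
	return string(b), err
}

func main() {
	store := map[string][]byte{"asset1": nil, "asset2": nil, "asset3": nil, "other": nil}
	fmt.Println(rangeKeys(store, "asset1", "asset3")) // [asset1 asset2]

	q, err := ownerQuery("Tom")
	fmt.Println(q, err) // {"selector":{"owner":"Tom"}} <nil>
}
```

Building the selector with encoding/json rather than string concatenation keeps user-supplied values correctly escaped.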

Upgrading a collection definition

If a collection is referenced by a chaincode, the chaincode will use the prior collection definition unless a new collection definition is specified at upgrade time. If a collection configuration is specified during the upgrade, a definition for each of the existing collections must be included, and you can add new collection definitions.

Collection updates become effective when a peer commits the block that contains the chaincode upgrade transaction.
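The rule that an upgrade must restate every existing collection can be checked before submitting the upgrade. The following Go sketch is a hypothetical pre-flight check of our own. As we understand it, Fabric enforces this rule (among others) itself when it processes the upgrade, so the sketch only catches the mistake earlier.

```go
package main

import "fmt"

// checkUpgrade verifies that a collection configuration supplied at upgrade
// time still defines every existing collection; new names are allowed.
func checkUpgrade(existing, proposed []string) error {
	present := make(map[string]bool, len(proposed))
	for _, name := range proposed {
		present[name] = true
	}
	for _, name := range existing {
		if !present[name] {
			return fmt.Errorf("collection %q is missing from the upgrade definition", name)
		}
	}
	return nil
}

func main() {
	existing := []string{"collection1", "collectionPrivate"}
	// Adding a collection while keeping the existing ones is fine.
	fmt.Println(checkUpgrade(existing, []string{"collection1", "collectionPrivate", "collectionNew"}))
	// Dropping collectionPrivate is rejected.
	fmt.Println(checkUpgrade(existing, []string{"collection1"}))
}
```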

Updates related to private data collections in the Hyperledger Fabric v1.4 release

Querying updates: A collection configuration option memberOnlyRead can automatically enforce access control based on the organisation of the chaincode proposal submitter.

Reconciliation updates: A background process allows peers who are part of a collection to receive data they were entitled to receive but did not yet receive — because of a network failure, for example — by keeping track of private data that was ‘missing’ at the time of block commit. The peer will periodically attempt to fetch the private data from other collection member peers that are expected to have it.
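The bookkeeping behind reconciliation can be sketched in Go. This is a hypothetical model, not Fabric's implementation. Entries that could not be obtained at commit time are recorded as missing, and a periodic pass tries to fetch each one from other collection members, removing those it recovers. The missingKey fields and the fetch callback are assumptions.

```go
package main

import "fmt"

// missingKey identifies private data a peer was entitled to but did not get.
type missingKey struct {
	Block      uint64
	TxNum      int
	Collection string
}

// reconciler tracks missing private data; fetch is a hypothetical stand-in
// for requesting it from other collection member peers.
type reconciler struct {
	missing map[missingKey]bool
	store   map[missingKey][]byte
	fetch   func(k missingKey) ([]byte, bool)
}

func newReconciler(fetch func(missingKey) ([]byte, bool)) *reconciler {
	return &reconciler{missing: map[missingKey]bool{}, store: map[missingKey][]byte{}, fetch: fetch}
}

// markMissing records data that could not be obtained at block commit time.
func (r *reconciler) markMissing(k missingKey) { r.missing[k] = true }

// reconcileOnce makes one pass over the missing entries and returns how many
// were recovered; a peer would repeat this periodically in the background.
func (r *reconciler) reconcileOnce() int {
	recovered := 0
	for k := range r.missing {
		if data, ok := r.fetch(k); ok {
			r.store[k] = data
			delete(r.missing, k) // deleting during range is safe in Go
			recovered++
		}
	}
	return recovered
}

func main() {
	k1 := missingKey{Block: 5, TxNum: 0, Collection: "collection1"}
	k2 := missingKey{Block: 6, TxNum: 1, Collection: "collectionPrivate"}
	r := newReconciler(func(k missingKey) ([]byte, bool) {
		if k == k1 {
			return []byte("recovered"), true
		}
		return nil, false // no member peer has this one yet
	})
	r.markMissing(k1)
	r.markMissing(k2)
	fmt.Println(r.reconcileOnce(), len(r.missing)) // 1 1
}
```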
