Transaction tear-off - Corda

I am using Corda and in one of our use-cases, we need to limit the transaction information shared among the nodes (e.g. 4 parties) in the network.
The transaction state will contain sensitive data belonging to other nodes, but we need to limit this data so that it can be accessed by authorised parties only; e.g. Party A should not see Party B's data in the transaction state.
I looked at the Corda documentation and stumbled upon transaction tear-offs, but I could not find any concrete implementation of this.
I would be really grateful if someone could give me an example of a transaction tear-off implementation, or point me in the right direction if there is a better approach to limiting transaction state sharing among parties than using transaction tear-offs.
Thanks in advance

Transaction tear-offs are typically used when you're using an oracle to attest to a fact within your transaction.
The idea is that you want to strictly limit the oracle to viewing only what it requires in order to attest, so you filter out all other parts of the transaction.
Take a look at the following tutorial: https://docs.corda.net/tutorial-tear-offs.html. It contains the steps required to create a filtered transaction.
You can also take a look at the samples on https://www.corda.net/samples that make use of an oracle. In particular, if you look at the Options sample, the OptionIssueFlow creates a filtered transaction for the oracle to sign.
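As a minimal sketch of what that looks like, along the lines of the tear-offs tutorial (the `Fix` command type and the `oracle` party are the tutorial's stand-ins, not names from your CorDapp):

```kotlin
import java.util.function.Predicate
import net.corda.core.contracts.Command
import net.corda.core.identity.Party
import net.corda.core.transactions.FilteredTransaction
import net.corda.core.transactions.SignedTransaction
import net.corda.finance.contracts.Fix

// Build a FilteredTransaction that reveals only the oracle's own command;
// every other component is torn off and replaced by Merkle-tree hashes.
fun filterForOracle(stx: SignedTransaction, oracle: Party): FilteredTransaction =
    stx.buildFilteredTransaction(Predicate { elem ->
        elem is Command<*> && oracle.owningKey in elem.signers && elem.value is Fix
    })
```

The oracle can then verify and sign this filtered transaction without ever seeing the states that were filtered out.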
One other approach: would it matter if Party A could see the data of Party B if it didn't know it was Party B? If not, you could look at using confidential identities: https://docs.corda.net/api-identity.html#confidential-identities

Related

Request StateStatus from a Notary

Is there any way a CorDapp can ask a Notary if a state has been consumed prior to using it in a Transaction?
Background:
I am testing FungibleTokens that point to EvolvableTokenTypes. Eventually the EvolvableTokenType changes, and holders of the tokens who are not Participants of the EvolvableTokenType end up with states in their vault that have been unknowingly consumed. When they try to execute a transaction involving these states, the Notary will refuse to sign because it knows the states have been consumed.
I have written flows that will contact a Participant and request the missing state(s). However, it would be more efficient if I could first ask the Notary whether I need to do that (i.e. if the state hasn't been consumed, I don't need to ask a participant for an update).
You could do this a couple of ways.
For example, in your vault query you could simply make sure to filter on StateStatus = UNCONSUMED. That's one way to ensure this works the way you expect and that you never get a state that doesn't fit your criteria.
Check out this section of the docs on the vault query API: https://docs.corda.net/docs/corda-os/4.7/api-vault-query.html#querycriteria-interface
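As a sketch, such a query might look like this inside a flow (`FungibleToken` here is the Tokens SDK state from the question):

```kotlin
import com.r3.corda.lib.tokens.contracts.states.FungibleToken
import net.corda.core.node.services.Vault
import net.corda.core.node.services.queryBy
import net.corda.core.node.services.vault.QueryCriteria

// Ask the vault only for states it still records as unconsumed
val criteria = QueryCriteria.VaultQueryCriteria(status = Vault.StateStatus.UNCONSUMED)
val unconsumedTokens = serviceHub.vaultService.queryBy<FungibleToken>(criteria)
```

Note that this reflects your own vault's view of the states; it doesn't make the Notary itself answer the consumed/unconsumed question.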
The other approach that might also work is to include this check in your contract verification for the transaction, but doing it at the flow level catches the problem sooner rather than later.

Debezium initial data snapshot and related entities order

After the first launch, Debezium will take an initial snapshot of the already existing data.
Let's say I have two tables, A and B, and table B has a NOT NULL FK constraint on A. By default, Debezium will create two separate Kafka topics for the data from tables A and B.
In my understanding, there is a very big chance that I'll try to create a record in the new table B while the corresponding record is not yet present in the new table A. This way I'll run into a constraint violation error.
Do I need to use some third-party buffer and organise the proper insert order into the sink database myself, or is there some standard mechanism in Debezium to handle such situations?
For example, can I use Debezium topic routing (https://debezium.io/documentation/reference/configuration/topic-routing.html) to fix this issue? I could potentially configure topic routing to send all dependent events (from tables A and B in my example above) to the same topic. With a Kafka topic with a single partition, all events must be ordered correctly. Will this work, and will I this way have the correct related-entity order for the initial snapshot data load?
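For reference, the routing being described would look something like this in the connector configuration (the server and table names are illustrative only):

```json
{
  "transforms": "Reroute",
  "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
  "transforms.Reroute.topic.regex": "dbserver1\\.public\\.(a|b)",
  "transforms.Reroute.topic.replacement": "dbserver1.public.a_and_b"
}
```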
The IBM IDR (Data Replication) product solved this with a solution that allows for exactly-once semantics and re-creates both the ordering of operations within a transaction and the ordering of transactions.
Kafka's built-in exactly-once features have some limitations beyond performance: you don't inherently get the transaction re-ordered by operation, which is important for things like applying changes under referential integrity constraints.
So in our product we have a proper way and a poor man's way to solve the problem. The poor man's way is to send all the data for all the tables to a single topic. Obviously this is sub-optimal, but our product will produce data in operation order from a single producer if you do this. You'd probably want idempotence to avoid batches showing up out of order.
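Idempotence here means the standard Kafka producer setting, not anything IDR-specific; roughly:

```properties
# Idempotent producer: retried batches cannot be reordered or duplicated
enable.idempotence=true
acks=all
max.in.flight.requests.per.connection=5
```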
Now the pro-level way to solve this is a feature called the TCC (Transactionally Consistent Consumer).
I'm not sure if you need an enterprise-level solution performance- and feature-wise.
If this is a non-critical project, you might find the following discussion useful for how we approach delivering the features you're looking for.
https://www.confluent.io/kafka-summit-sf18/a-solution-for-leveraging-kafka-to-provide-end-to-end-acid-transactions/
And here's our docs on the feature for reference.
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/concepts/kafkatcc.html
Hopefully that gives some background as to why this problem is hard to solve and what goes into a solution.

How does Corda verification work for transactions involving multiple states?

I'm currently trying to make a CorDapp that will be used for DvP (delivery versus payment), but I'm having trouble understanding some key concepts. For instance, I understand that a contract applies to one type of state in particular. What I don't really get is whether the contract's validation logic should apply to only that state object or to all the states in the given transaction.
The typical example would be the issuance of a sell order:
The input of the transaction is the state of the issuer's stock account, and the outputs are the sell order and the modified stock account.
Basically, my question is: where do I do checks like "I don't sell more than I own", "the sum of the number of stocks in the sell order and what is left in the account is equal to what was initially in the account", and so on?
I have followed the Corda tutorials but I'm still not clear on that logic.
It boils down to the orchestration layer (flows or APIs: what the user intends to do) vs. the ledger layer (what the user can do, i.e. guaranteed shared logic).
Contract code absolutely must be adhered to, so in your case not being able to sell more than you own would be part of the explicit contract.
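As a rough sketch of where such checks live (the state classes and their fields here are hypothetical stand-ins for your DvP states, not a prescribed design):

```kotlin
import net.corda.core.contracts.Contract
import net.corda.core.contracts.ContractState
import net.corda.core.contracts.requireThat
import net.corda.core.identity.AbstractParty
import net.corda.core.transactions.LedgerTransaction

// Hypothetical states for illustration only
data class StockAccountState(val quantity: Long, override val participants: List<AbstractParty>) : ContractState
data class SellOrderState(val quantity: Long, override val participants: List<AbstractParty>) : ContractState

class SellOrderContract : Contract {
    override fun verify(tx: LedgerTransaction) {
        // verify() sees the whole transaction, so it can relate several states to each other
        val accountIn = tx.inputsOfType<StockAccountState>().single()
        val accountOut = tx.outputsOfType<StockAccountState>().single()
        val order = tx.outputsOfType<SellOrderState>().single()
        requireThat {
            "Cannot sell more than is held" using (order.quantity <= accountIn.quantity)
            "Stock is conserved" using (accountIn.quantity == order.quantity + accountOut.quantity)
        }
    }
}
```

The key point is that `verify` receives the entire LedgerTransaction, so cross-state checks like conservation of stock belong there rather than in per-state logic.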
The Corda Modelling Notation (CMN) guide helped me conceptualise this.
Flows are better described as business logic, so anything can be achieved within a flow as long as it adheres to the contract.
A security consideration: anybody may create a flow, and they are equally able to use any asset (and thus its state) within their third-party flow.
It is the relevant contract that ensures that your asset is used for the purpose you imagine, and that it isn't used malevolently.

How to handle large Vault Size in Corda?

The data in our vault is manageable today, but eventually we would accumulate a large volume. It is not practical to retain such large data for everyday transactions. We would want to periodically archive or warehouse the data so that query performance is maintained.
May I know whether you have thought about handling large-scale datasets, and what your advice would be?
From the corda-dev mailing list:
Yep, we should do some design work around this. As you note it’s not a pressing issue right now but may become one in future.
Our current implementation is actually designed to keep data around even when it’s no longer ‘current’ on the ledger. The ORM mapped vault tables prefer to mark a row as obsolete rather than actually delete the data from the underlying database. Also, the transaction store has no concept of garbage collection or pruning so it never deletes data either. This has clear benefits from the perspective of understanding the history of the ledger and how it got into its current state, but it poses operational issues as well.
I think people will have different preferences here depending on their resources and jurisdiction. Let’s tackle the two data stores separately:
Making the relationally mapped tables delete data is easy, it’s just a policy change. Instead of marking a row as gone, we actually issue a SQL DELETE call.
The transaction store is trickier. Corda benefits from its blockless design here; in theory we can garbage collect old transactions. The devil is in the details however because for nodes that use SGX the tx store will be encrypted. Thus not only do we need to develop a parallel GC for the tx graph, but also, run it entirely inside the enclaves. A fun systems engineering problem.
If the concern is just query performance, one obvious move is to shift the tx store into a scalable K/V store like Cassandra, hosted BigTable etc. There’s no deep reason the tx store must be in the same RDBMS as the rest of the data, it’s just convenient to have a single database to backup. Scalable K/V stores don’t really lose query performance as the dataset grows, so, this is also a nice solution.
W.R.T. things like the GDPR, being able to delete data might help or it might be irrelevant. As with all things GDPR related nobody knows because the EU didn’t bother to define any answers - auditing a distributed ledger might count as a “legitimate need” for data, or it might not, depending on who the judge is on the day of the case.
It is at any rate only an issue when personal data is stored on ledger, which is not most use cases today.

Corda dynamic data handling at the time of creating a contract

I am looking for a solution to handle dynamic data/lists for all nodes or specified nodes at the time of creating a contract in Corda. I don't think an oracle is a good approach to use in my case, for the following reasons:
The data can be a list of, for example, legal entity names; the values are not from the outside world, and they are not a single value;
The list depends on the particular field(s) selected, and will therefore perhaps need a centralised place to maintain the data relationships.
I'd appreciate it if anyone can help on this. Thanks.
Kwan
This question is a little difficult to answer without further details on your use-case. However, on the surface, an Oracle doesn't sound like a bad solution:
The data provided by an oracle can be a list
The term "outside world" simply refers to any information not included in the transaction itself. This term should not be taken too literally.
Ultimately, you can think of an Oracle as a provider of "official" data. You request a command including the data from the oracle, include it in the transaction, and the oracle will sign over the transaction if and only if it agrees that the data in the command is true. As long as the Oracle is trusted by all parties involved, this allows data from outside the transaction to be included in the transaction in a reliable way.
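A rough sketch of that request/attest pattern as a Corda oracle service (the `EntityListCommand` type and the lookup logic are hypothetical; the tear-off tutorial linked in the first answer shows the full pattern):

```kotlin
import net.corda.core.contracts.Command
import net.corda.core.contracts.CommandData
import net.corda.core.crypto.TransactionSignature
import net.corda.core.node.AppServiceHub
import net.corda.core.node.services.CordaService
import net.corda.core.serialization.SingletonSerializeAsToken
import net.corda.core.transactions.FilteredTransaction

// Hypothetical command payload: the selected field and the list being attested to
data class EntityListCommand(val field: String, val entities: List<String>) : CommandData

@CordaService
class LegalEntityOracle(private val services: AppServiceHub) : SingletonSerializeAsToken() {
    // Hypothetical centralised source of the maintained lists
    private fun officialList(field: String): List<String> = TODO("look up the maintained list")

    // Requesting flows call this to obtain the data they embed in the command
    fun query(field: String): List<String> = officialList(field)

    // Sign only if every command visible in the tear-off matches our own data
    fun sign(ftx: FilteredTransaction): TransactionSignature {
        fun isValid(elem: Any): Boolean =
            elem is Command<*> && (elem.value as? EntityListCommand)?.let { cmd ->
                cmd.entities == officialList(cmd.field)
            } == true
        require(ftx.checkWithFun(::isValid)) { "Oracle data mismatch" }
        return services.createSignature(ftx, services.myInfo.legalIdentities.first().owningKey)
    }
}
```

Combined with a filtered transaction as in the tear-off answer above, the oracle attests to the list without seeing the rest of the transaction.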
