Constraints on number of subflows in Corda?

(Asking on behalf of a client.) In Corda, is there any issue with a flow initiating many (even hundreds of) subflows, for example to update many participants from a single flow? If not, what is considered best practice when one main transaction needs to trigger many other transactions?

Initiating many subflows within a flow is not an issue and will not affect performance.
Deciding whether to use many subflows or keep everything in a single flow is a matter of personal preference and readability.
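
For illustration, a minimal sketch of that pattern; the flow names are hypothetical and the subflow body is elided:

    import co.paralleluniverse.fibers.Suspendable
    import net.corda.core.flows.FlowLogic
    import net.corda.core.flows.StartableByRPC
    import net.corda.core.identity.Party

    // Hypothetical subflow; in a real CorDapp this would build, sign and
    // finalise a transaction with the given counterparty.
    class UpdateParticipantFlow(private val counterparty: Party) : FlowLogic<Unit>() {
        @Suspendable
        override fun call() {
            // ... build and finalise a transaction with `counterparty` ...
        }
    }

    // Fans out one subflow call per participant; hundreds of iterations are fine.
    @StartableByRPC
    class NotifyAllParticipantsFlow(private val participants: List<Party>) : FlowLogic<Unit>() {
        @Suspendable
        override fun call() {
            for (participant in participants) {
                // Each subFlow call runs to completion before the next begins.
                subFlow(UpdateParticipantFlow(participant))
            }
        }
    }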

Request StateStatus from a Notary

Is there any way a CorDapp can ask a Notary if a state has been consumed prior to using it in a Transaction?
Background:
I am testing FungibleTokens that point to EvolvableTokenTypes. Eventually the EvolvableTokenType changes, and holders of the tokens who are not Participants of the EvolvableTokenType end up with states in their vault that have been consumed without their knowledge. When they try to execute a transaction involving these states, the Notary will refuse to sign because it knows the states have been consumed.
I have written flows that will contact a Participant and request the missing state(s). However, it would be more efficient if I could first ask the Notary whether I need to do that (i.e. if the state hasn't been consumed, I don't need to ask a participant for an update).
You could do this in a couple of ways.
For example, in your vault query you could filter on StateStatus = UNCONSUMED. That way you never get back a state that doesn't fit your criteria.
Check out this section of the docs on the vault query API: https://docs.corda.net/docs/corda-os/4.7/api-vault-query.html#querycriteria-interface
The other approach, which might also work, is to include this check in your contract verification for the transaction, but doing it at the flow level catches the problem sooner rather than later.
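
A minimal sketch of the first suggestion, assuming the Tokens SDK's FungibleToken is on the classpath; the flow name is made up:

    import co.paralleluniverse.fibers.Suspendable
    import com.r3.corda.lib.tokens.contracts.states.FungibleToken
    import net.corda.core.flows.FlowLogic
    import net.corda.core.node.services.Vault
    import net.corda.core.node.services.queryBy
    import net.corda.core.node.services.vault.QueryCriteria

    // Hypothetical flow that only ever looks at unconsumed token states.
    class UnconsumedTokensFlow : FlowLogic<List<FungibleToken>>() {
        @Suspendable
        override fun call(): List<FungibleToken> {
            // Restrict the query to UNCONSUMED states so already-spent states
            // never make it into a transaction proposal in the first place.
            val criteria = QueryCriteria.VaultQueryCriteria(status = Vault.StateStatus.UNCONSUMED)
            return serviceHub.vaultService.queryBy<FungibleToken>(criteria).states.map { it.state.data }
        }
    }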

DDD and uniqueness constraint

How would one validate a uniqueness constraint using DDD? Let's say that an Entity has a property name that must be unique across the system, and there is a specific EntityRepository method nameExists(name): bool... This is what I found people suggest doing, because the repository is the abstraction of the collection of all the Entities and should be able to perform this check.
So before creating/adding the new Entity, the command or domain service could check the repository for the existence of the newName, but I think this will not always work because of concurrency.
In a concurrent scenario where two transactions start simultaneously, the EntityRepository's nameExists method might return false for both of them, and as a result two entries with the same name would be incorrectly inserted.
I am sure that I am missing something basic, but the answers I found all point to the repository's exists method. Others say that a UNIQUE constraint should be put on the DB to catch the concurrency case, but what if one uses Event Sourcing or a persistence layer that does not have unique constraints?
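
A minimal sketch of the check-then-act sequence described above, with illustrative types; two concurrent callers can both pass the nameExists check before either insert becomes visible:

    // Illustrative-only types for the scenario described above.
    data class Entity(val id: String, val name: String)

    interface EntityRepository {
        fun nameExists(name: String): Boolean
        fun add(entity: Entity)
    }

    class RegisterEntityHandler(private val repository: EntityRepository) {
        fun handle(id: String, newName: String) {
            // Check-then-act: two concurrent callers can both observe
            // nameExists == false here and both reach add(), creating a duplicate.
            if (repository.nameExists(newName)) {
                throw IllegalStateException("Name '$newName' is already taken")
            }
            repository.add(Entity(id, newName))
        }
    }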
Follow-up question:
What if the uniqueness constraint is to be applied at different levels of a hierarchy?
A Container's name must be unique in the system, and then Child names must be unique inside a Container.
Let's say that a transactional DB takes care of the uniqueness at the lowest possible level; what about the domain?
Should I still express the uniqueness logic at the domain level, e.g. with a Domain Service for the system-level uniqueness and embedding Child entities inside the Container entity and having a business rule (and therefore making Container the aggregate root)?
Or should I not bother with "replicating" the uniqueness in the domain and (given there are no other rules to apply between the two) split Container and Child? Will the domain lack expressiveness then?
I am sure that I am missing something basic
Not something basic.
The term we normally use for enforcing a constraint, like uniqueness, across a set of entities is set validation. Greg Young calls your attention to a specific question:
What is the business impact of having a failure?
Most set constraints fall into one of two categories:
1. Constraints that need to be true when the system reaches steady state, but that may not hold while work is in progress. In business processes, these are often handled by detecting conflicts in the stored data and then invoking mitigation processes to resolve the conflict.
2. Constraints that need to be true at all times.
The first category includes things like double booking a seat on an airplane; it's not necessarily a problem unless both people show up, and even then you can handle it by bumping someone to another seat, or another flight.
In these cases, you make a best effort - you look at a recent copy of the set, make sure there are no conflicts there, then hope for the best (accepting that some percentage of the time, you'll have missed a change).
See Memories, Guesses and Apologies (Pat Helland, 2007).
The second category is the hard one: to ensure the invariant holds, you have to lock the entire set so that races don't allow two different writers to insert conflicting information.
Relational databases tend to be really good at set validation - putting the entire set into a single database is going to be the right answer (note the assumption that the set is small enough to fit into a single database -- trying to lock two databases at the same time is hard).
Another possibility is to ensure that only one writer can update the set at any given time -- you don't have to worry about losing a race when you are the only one running in it.
Sometimes you can lock a smaller set -- imagine, for example, having a collection of locks with numbers, and the hash code for the name tells you which lock you have to grab.
The simplest version of this is when you can use the name as the aggregate identifier itself.
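
A minimal in-memory sketch of that striped-lock idea, using a set as a stand-in for the repository (the class name and stripe count are illustrative); equal names always hash to the same lock, so the existence check and the insert cannot interleave for a conflicting name:

    import java.util.concurrent.ConcurrentHashMap
    import java.util.concurrent.locks.ReentrantLock
    import kotlin.concurrent.withLock

    // Illustrative striped-lock guard around a check-then-insert.
    class NameRegistry(stripeCount: Int = 64) {
        private val locks = Array(stripeCount) { ReentrantLock() }
        private val names: MutableSet<String> = ConcurrentHashMap.newKeySet()

        private fun lockFor(name: String) = locks[Math.floorMod(name.hashCode(), locks.size)]

        // Returns false (instead of inserting a duplicate) when the name is already taken.
        fun register(name: String): Boolean = lockFor(name).withLock {
            if (name in names) false else names.add(name)
        }
    }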
if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Sometimes, you introduce a persistent store dedicated to the set, just to ensure that you can maintain the invariant. See "microservices".
But if you can't change the database, and you can't use a database with the locking guarantees that you need, and the business absolutely has to have the set valid at all times... then you single-thread that part of the work.
Everybody who wants to change a name puts a request onto a queue, and the one thread responsible for managing the invariant certifies each and every change.
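
A minimal sketch of that single-writer arrangement, with an in-memory set standing in for the real store (the class and method names are made up):

    import java.util.concurrent.CompletableFuture
    import java.util.concurrent.Executors

    // Illustrative single writer: every name change goes through one thread,
    // so the uniqueness check and the insert can never race.
    class NameChangeCoordinator : AutoCloseable {
        private val worker = Executors.newSingleThreadExecutor()
        private val takenNames = mutableSetOf<String>()  // touched only by the worker thread

        // The caller gets a future that completes with true if the name was granted.
        fun requestName(name: String): CompletableFuture<Boolean> =
            CompletableFuture.supplyAsync({
                if (name in takenNames) false else takenNames.add(name)
            }, worker)

        override fun close() = worker.shutdown()
    }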
There's no magic; just hard work and trade-offs.

How to handle large Vault Size in Corda?

The data in our vault is currently manageable, but eventually we will accumulate a large volume. It is not practical to retain such a large dataset for everyday transactions. We would want to periodically archive or warehouse the data so that query performance is maintained.
May I know if you have thought about handling large-scale datasets, and what your advice would be?
From the corda-dev mailing list:
Yep, we should do some design work around this. As you note it’s not a pressing issue right now but may become one in future.
Our current implementation is actually designed to keep data around even when it’s no longer ‘current’ on the ledger. The ORM mapped vault tables prefer to mark a row as obsolete rather than actually delete the data from the underlying database. Also, the transaction store has no concept of garbage collection or pruning so it never deletes data either. This has clear benefits from the perspective of understanding the history of the ledger and how it got into its current state, but it poses operational issues as well.
I think people will have different preferences here depending on their resources and jurisdiction. Let’s tackle the two data stores separately:
Making the relationally mapped tables delete data is easy, it’s just a policy change. Instead of marking a row as gone, we actually issue a SQL DELETE call.
The transaction store is trickier. Corda benefits from its blockless design here; in theory we can garbage collect old transactions. The devil is in the details however because for nodes that use SGX the tx store will be encrypted. Thus not only do we need to develop a parallel GC for the tx graph, but also, run it entirely inside the enclaves. A fun systems engineering problem.
If the concern is just query performance, one obvious move is to shift the tx store into a scalable K/V store like Cassandra, hosted BigTable etc. There’s no deep reason the tx store must be in the same RDBMS as the rest of the data, it’s just convenient to have a single database to backup. Scalable K/V stores don’t really lose query performance as the dataset grows, so, this is also a nice solution.
W.R.T. things like the GDPR, being able to delete data might help or it might be irrelevant. As with all things GDPR related nobody knows because the EU didn’t bother to define any answers - auditing a distributed ledger might count as a “legitimate need” for data, or it might not, depending on who the judge is on the day of the case.
It is at any rate only an issue when personal data is stored on ledger, which is not most use cases today.

Corda: Creating contracts dynamically

In our use case, we need to define certain rules at run-time, based on which a node will transact with other nodes in the network. For example, we want to define a rate at the front end and check that transactions for that particular node only happen at this rate. In other words, can we define the terms and conditions at run-time, and would this still be called a smart contract, or does a smart contract always need to be hard-coded? Is there an alternative way to look at this?
The contract itself is hard-coded. This is because every node needs to agree that a given transaction is valid according to the contract rules, forever. If the rules varied from node to node, some nodes would consider a transaction valid while others would consider it invalid, leading to inconsistencies in their ledgers.
Instead, you'd have to impose this logic in the flow. Let's say you have a TradeOffer flow that proposes a trade. Each node could install its own responder flow that is initiated by the TradeOffer flow, and each node's responder flow could impose different conditions. For example, one node might sign any transaction, while another would check that the proposed rate is within specified bounds.
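
For illustration, a hedged Corda 4 sketch of such a responder; TradeOfferFlow, TradeState and the rate bounds are all hypothetical stand-ins for whatever the CorDapp actually defines:

    import co.paralleluniverse.fibers.Suspendable
    import java.math.BigDecimal
    import net.corda.core.contracts.ContractState
    import net.corda.core.contracts.requireThat
    import net.corda.core.flows.FlowLogic
    import net.corda.core.flows.FlowSession
    import net.corda.core.flows.InitiatedBy
    import net.corda.core.flows.InitiatingFlow
    import net.corda.core.flows.ReceiveFinalityFlow
    import net.corda.core.flows.SignTransactionFlow
    import net.corda.core.identity.AbstractParty
    import net.corda.core.identity.Party
    import net.corda.core.transactions.SignedTransaction

    // Hypothetical state proposed by the trade offer.
    data class TradeState(
        val rate: BigDecimal,
        override val participants: List<AbstractParty>
    ) : ContractState

    // Hypothetical initiator; the real flow would build the transaction, sign it,
    // call CollectSignaturesFlow and then FinalityFlow.
    @InitiatingFlow
    class TradeOfferFlow(private val counterparty: Party) : FlowLogic<Unit>() {
        @Suspendable
        override fun call() {
            val session = initiateFlow(counterparty)
            // ... build, sign, CollectSignaturesFlow(session), FinalityFlow ...
        }
    }

    // Each node registers its own responder with its own acceptance rules.
    @InitiatedBy(TradeOfferFlow::class)
    class BoundedRateResponder(private val otherSession: FlowSession) : FlowLogic<SignedTransaction>() {
        @Suspendable
        override fun call(): SignedTransaction {
            val signFlow = object : SignTransactionFlow(otherSession) {
                override fun checkTransaction(stx: SignedTransaction) = requireThat {
                    val trade = stx.tx.outputsOfType<TradeState>().single()
                    // This node's locally configured bounds; another node's responder
                    // might sign anything, or apply entirely different checks.
                    "Proposed rate must be between 0.01 and 0.05." using
                            (trade.rate >= BigDecimal("0.01") && trade.rate <= BigDecimal("0.05"))
                }
            }
            val expectedTxId = subFlow(signFlow).id
            return subFlow(ReceiveFinalityFlow(otherSession, expectedTxId = expectedTxId))
        }
    }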
To extend Joel's comment, the contract is indeed hard-coded, but there's nothing wrong with putting meta-logic in there, as long as the code runs the same way every time (i.e. it's deterministic).
What do I mean by this? You can put a String in your state containing an expression that is then evaluated (if you refer to https://relayto.com/r3/FIjS0Jfy/VB8epyay73 you can see a very basic maths expression used in a smart contract). There's nothing wrong with making this String as complex as you like, but be aware that potential users of your application will start raising eyebrows if you dumb down the coded verification logic and move it all into a String, because that removes a lot of the validation protection Corda offers.
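
As a sketch of that idea (every class and field name here is made up, and the "expression language" is deliberately tiny), a contract could deterministically evaluate a rule string carried in the state:

    import java.math.BigDecimal
    import net.corda.core.contracts.BelongsToContract
    import net.corda.core.contracts.Contract
    import net.corda.core.contracts.ContractState
    import net.corda.core.contracts.requireThat
    import net.corda.core.identity.AbstractParty
    import net.corda.core.transactions.LedgerTransaction

    // Hypothetical state: the rule travels with the data, e.g. "rate <= 0.05".
    @BelongsToContract(RuleCheckedContract::class)
    data class RuleCheckedState(
        val rate: BigDecimal,
        val rateRule: String,
        override val participants: List<AbstractParty>
    ) : ContractState

    class RuleCheckedContract : Contract {
        override fun verify(tx: LedgerTransaction) = requireThat {
            val output = tx.outputsOfType<RuleCheckedState>().single()
            // Evaluation is pure and deterministic: same inputs, same verdict on every node.
            "The rate rule carried in the state must hold." using evaluate(output.rateRule, output.rate)
        }

        // Deliberately tiny "expression language": only "rate <op> <number>" is supported.
        private fun evaluate(rule: String, rate: BigDecimal): Boolean {
            val parts = rule.trim().split(Regex("\\s+"))
            require(parts.size == 3 && parts[0] == "rate") { "Unsupported rule: $rule" }
            val threshold = BigDecimal(parts[2])
            return when (parts[1]) {
                "<=" -> rate <= threshold
                ">=" -> rate >= threshold
                "<" -> rate < threshold
                ">" -> rate > threshold
                "==" -> rate.compareTo(threshold) == 0
                else -> throw IllegalArgumentException("Unsupported operator: ${parts[1]}")
            }
        }
    }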

Doing complex reports with microservices

I'm starting a new project and am interested in architecting it as microservices. I'm trying to wrap my head around it:
Say that I have an order service and a product service. Now I want to make a report service that gives me all orders that contain a product from a certain product category.
Since orders don't know about products, that means I would need to fetch all orders, loop over them, fetch the products for each order, and then return those that match.
Is this assumption correct or is there any more efficient way of doing this with microservices?
In a microservices architecture, the procedure is to distill the use cases and the service boundaries of the application. In the question above, there are at least two service boundaries, namely one for transactions and another for reporting.
When you have two different service boundaries, the typical approach is to duplicate some data elements between them, e.g. whenever you make a sale, the data should be sent to both the reporting and transactional services. One way of broadcasting the data to the different boundaries is a message queue. Duplicating the data allows the services to evolve and operate independently and become self-sufficient, which is one of the goals of microservices.
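
A minimal in-process sketch of that duplication pattern; the event type, bus and handlers are illustrative, and in production the bus would be a real message broker (e.g. RabbitMQ or Kafka):

    // Illustrative domain event published whenever an order is placed.
    data class OrderPlaced(
        val orderId: String,
        val productId: String,
        val productCategory: String,
        val amount: Double
    )

    // Stand-in for a message broker topic; each subscriber gets its own copy of the event.
    class EventBus {
        private val subscribers = mutableListOf<(OrderPlaced) -> Unit>()
        fun subscribe(handler: (OrderPlaced) -> Unit) = subscribers.add(handler)
        fun publish(event: OrderPlaced) = subscribers.forEach { it(event) }
    }

    // The reporting service keeps its own denormalised copy, so
    // "orders containing a product from category X" becomes a local query.
    class ReportingService {
        private val ordersByCategory = mutableMapOf<String, MutableList<String>>()
        fun on(event: OrderPlaced) {
            ordersByCategory.getOrPut(event.productCategory) { mutableListOf() }.add(event.orderId)
        }
        fun ordersForCategory(category: String): List<String> = ordersByCategory[category].orEmpty()
    }

    fun main() {
        val bus = EventBus()
        val reporting = ReportingService()
        bus.subscribe(reporting::on)
        // The order (transactional) service publishes the event after it records the sale.
        bus.publish(OrderPlaced("order-1", "prod-42", "books", 19.99))
        println(reporting.ordersForCategory("books"))  // prints [order-1]
    }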
A personal word of advice though: you might want to start with a monolith before going the microservices route. Microservices are generally more operationally heavy, and it will be difficult to reason about their advantages during the initial application stages. The approach tends to work better after you have developed the monolithic application, since it is then easier to see what didn't work and what could be improved by a microservices-style split.

Resources