How are parent-child aggregate relationships managed in Axon Framework?

The Axon documentation describes how to create a child aggregate from a parent, but not how to retrieve it or delete it (e.g., for cascading deletes).
Does the parent aggregate keep a list of references to its child aggregates, either explicitly by convention or automatically behind the scenes?
Would such references be a collection of aggregate IDs, or, to be more object oriented, a collection of actual instance references to the child aggregates?
Another way to pose this question: What is different about child aggregates vs entities in multi-entity aggregates, and what is different about child aggregates vs totally independent aggregates?
I want a cascading delete (containment) model between parent and child, but I want separate concurrent access to the child objects in a very large collection, hence aggregate member entities are not suitable.
Also note a similar question in the forum: the OP, Jakob, describes a model at the end that includes his own table managing references for cascading... do I need that?

If you require the Entities to be separate Aggregates, then you will be required to maintain a reference table from parent to child.
The support Axon provides to create child Aggregates from a parent Aggregate is to ensure the framework uses a single transaction to publish multiple events. By no means does Axon Framework automatically store the relationships for you.
Instead, all of this should be known within the event stream of the Aggregates. With that in mind, combined with Event Sourcing, you can source any form of data within the Aggregates.
To circle back to your cascading delete scenario: I've actually had direct contact with Jakob about the matter. In his case (and potentially yours) we ended up with an `aggregateId-to-childAggregateIds` model dedicated to keeping the references. Upon a delete from a parent Aggregate (on any level), this model is referred to, ensuring the right set of children is deleted too. Note that all this is custom code.
Furthermore, this aggregateId-to-childAggregateIds model can be regarded as part of your Command Model (granted that you're aiming to apply CQRS). As such, it's purely used to drive decision-making. Where the decision-making, in this case, is deciding on the right children to send delete commands to.
So, to summarize:
Axon does not keep parent-child relations for you, other than in the contents of the events you publish.
I'd recommend that the aggregateId-to-childAggregateIds model never store the entire Aggregate instance. You simply don't need all that data to decide whom to delete. The child's Aggregate identifier should suffice.
Axon's child Aggregate creation support is purely there to use a single transaction towards the event store to publish the parent's change and the creation of a child, while still benefitting from separate instances for increased concurrency. Axon's Aggregate Member support would mark the children as entities under the parent Aggregate Root instead of their own Aggregate instances.
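For illustration, here is a rough sketch of what such a custom reference model could look like in Java with Axon: an event-handling component that records parent-to-child identifiers and dispatches delete commands when a parent is deleted. The event and command types (ChildAggregateCreatedEvent, ParentAggregateDeletedEvent, DeleteChildCommand) are placeholders, and the in-memory map stands in for what would be a persisted lookup table in a real application.

```java
// Sketch of a custom parent-to-child reference model (not an Axon-provided feature).
// Event and command types are placeholders; the map would be a persisted table in practice.
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.eventhandling.EventHandler;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

record ChildAggregateCreatedEvent(String parentId, String childId) {}
record ParentAggregateDeletedEvent(String parentId) {}
record DeleteChildCommand(String childId) {}

public class ParentChildReferenceModel {

    private final CommandGateway commandGateway;
    // parentId -> identifiers of its child Aggregates (only the ids, never the instances)
    private final Map<String, Set<String>> childrenByParent = new ConcurrentHashMap<>();

    public ParentChildReferenceModel(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @EventHandler
    public void on(ChildAggregateCreatedEvent event) {
        childrenByParent
                .computeIfAbsent(event.parentId(), id -> ConcurrentHashMap.newKeySet())
                .add(event.childId());
    }

    @EventHandler
    public void on(ParentAggregateDeletedEvent event) {
        // Cascade by dispatching an explicit delete command for every known child.
        Set<String> children = childrenByParent.remove(event.parentId());
        if (children != null) {
            children.forEach(childId -> commandGateway.send(new DeleteChildCommand(childId)));
        }
    }
}
```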


DDD and uniqueness constraint

How would one validate a unique constraint using DDD? Let's say that an Entity has a property name that must be unique across the system, and there is a specific EntityRepository method nameExists(name): bool... This is what I found people suggest doing, because the repository is the abstraction of the collection of all the Entities and should be able to perform this check.
So before creating/adding the new Entity the command / domain service could check for the existence of a newName against the repository, but I think that this will not always work because of concurrency.
In a concurrent scenario where two transactions are started simultaneously, the EntityRepository's nameExists method might return false for both transactions, and as a result of this two entries with the same name will be incorrectly inserted.
I am sure that I am missing something basic, but the answers I found all point to the repository exists method - TBH others say that a UNIQUE constraint should be put on the DB to catch the concurrency case, but what if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Follow-up question:
What if the uniqueness constraint is to be applied in different levels of a hierarchy?
A Container's name must be unique in the system and then Child names must be unique inside a Container.
Let's say that a transactional DB takes care of the uniqueness at the lowest possible level, what about the domain?
Should I still express the uniqueness logic at the domain level, e.g. with a Domain Service for the system-level uniqueness and embedding Child entities inside the Container entity and having a business rule (and therefore making Container the aggregate root)?
Or should I not bother with "replicating" the uniqueness in the domain and (given there are no other rules to apply between the two) split Container and Child? Will the domain lack expressiveness then?
I am sure that I am missing something basic
Not something basic.
The term we normally use for enforcing a constraint, like uniqueness, across a set of entities is set validation. Greg Young calls your attention to a specific question:
What is the business impact of having a failure?
Most set constraints fall into one of two categories:
constraints that need to be true when the system reaches steady state, but may not hold while work is in progress. In business processes, these are often handled by detecting conflicts in the stored data, and then invoking various mitigation processes to resolve the conflict.
constraints that need to be true always.
The first category includes things like double booking a seat on an airplane; it's not necessarily a problem unless both people show up, and even then you can handle it by bumping someone to another seat, or another flight.
In these cases, you make a best effort - you look at a recent copy of the set, make sure there are no conflicts there, then hope for the best (accepting that some percentage of the time, you'll have missed a change).
See Memories, Guesses and Apologies (Pat Helland, 2007).
The second category is the hard one; to ensure the invariant holds, you have to lock the entire set so that races don't allow two different writers to insert conflicting information.
Relational databases tend to be really good at set validation - putting the entire set into a single database is going to be the right answer (note the assumption that the set is small enough to fit into a single database -- trying to lock two databases at the same time is hard).
Another possibility is to ensure that only one writer can update the set at any given time -- you don't have to worry about losing a race when you are the only one running in it.
Sometimes you can lock a smaller set -- imagine, for example, having a collection of locks with numbers, and the hash code for the name tells you which lock you have to grab.
The simplest version of this is when you can use the name as the aggregate identifier itself.
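For illustration, a minimal in-process sketch (plain Java) of the "collection of numbered locks" idea: the name's hash code picks the lock, so two concurrent registrations of the same name always contend on the same lock. The nameExists/createEntity calls inside the critical section are assumed to exist elsewhere, and this only guards a single process.

```java
// Minimal sketch of the "numbered locks" idea; nameExists/createEntity stand for
// whatever repository calls you already have. Single-process only.
import java.util.concurrent.locks.ReentrantLock;

public class NameLocks {

    private final ReentrantLock[] locks = new ReentrantLock[64];

    public NameLocks() {
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    public void withNameLock(String name, Runnable criticalSection) {
        // The name's hash code decides which lock guards it, so two concurrent
        // registrations of the same name always contend on the same lock.
        ReentrantLock lock = locks[Math.floorMod(name.hashCode(), locks.length)];
        lock.lock();
        try {
            criticalSection.run();  // e.g. if (!nameExists(name)) createEntity(name);
        } finally {
            lock.unlock();
        }
    }
}
```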
if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Sometimes, you introduce a persistent store dedicated to the set, just to ensure that you can maintain the invariant. See "microservices".
But if you can't change the database, and you can't use a database with the locking guarantees that you need, and the business absolutely has to have the set valid at all times... then you single thread that part of the work.
Everybody that wants to change a name puts a request into a queue, and the one thread responsible for managing the invariant certifies each and every change.
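For illustration, a rough single-writer sketch in Java: every rename request goes through a queue, and only one thread touches the set of reserved names, so the check-then-act cannot race. The RenameRequest shape and the in-memory set are placeholders for your own command type and persistent store.

```java
// Rough single-writer sketch: only this thread reads or writes the set of taken
// names, so the check-then-act cannot race.
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class NameInvariantGuardian implements Runnable {

    record RenameRequest(String newName, CompletableFuture<Boolean> accepted) {}

    private final BlockingQueue<RenameRequest> requests = new LinkedBlockingQueue<>();
    private final Set<String> takenNames = new HashSet<>();  // touched only by the guardian thread

    public CompletableFuture<Boolean> submit(String newName) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        requests.add(new RenameRequest(newName, result));
        return result;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                RenameRequest request = requests.take();
                // add() returns false if the name was already taken -> the change is rejected
                request.accepted().complete(takenNames.add(request.newName()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```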
There's no magic; just hard work and trade-offs.

Ancestor index or global index?

I have an entity that represents a relationship between two entity groups, but the entity belongs to one of the groups. However, my queries for this data are mostly going to be in terms of the other entity group. To support the queries I see two choices: a) create a global index that has the other entity group's key as prefix, or b) move the entity into the other entity group and create an ancestor index.
I saw a presentation which mentioned that ancestor indexes map internally to a separate table per entity group, while there is a single table for a global index. That makes me feel that ancestors are better than global indexes which include the ancestor keys as a prefix, for this specific use case where I will always be querying in the context of some ancestor key.
Looking for guidance on this in terms of performance, storage characteristics, transaction latency and any other architectural considerations to make the final call.
From what I was able to find, I would say it depends on the type of work you'll be doing. I looked at the docs, and they suggest you avoid writing to an entity group more than once per second. Also, indexing a property could result in increased latency. They also state that if you need strong consistency for your queries, you should use an ancestor query. The docs contain plenty of advice on how to avoid latency and other issues; they should help you make the call.
I ended up using a 3rd option which is to have another entity denormalized into the other entity group and have ancestor queries on it. This allows me to efficiently query data for either of the entity groups. Since I was using transactions already, denormalizing wouldn't cause any inconsistencies and everything seems to work well.
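For illustration, a rough sketch of an ancestor query with the Cloud Datastore Java client; the kind names ("Parent", "Child") and the key value are made up for this example.

```java
// Rough sketch of the ancestor-query approach with the Cloud Datastore Java client.
// Kind names and the key value are illustrative placeholders.
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

public class AncestorQueryExample {
    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

        // Key of the entity group we always query within.
        Key parentKey = datastore.newKeyFactory().setKind("Parent").newKey("some-parent-id");

        // Ancestor queries are strongly consistent and scoped to one entity group.
        Query<Entity> query = Query.newEntityQueryBuilder()
                .setKind("Child")
                .setFilter(PropertyFilter.hasAncestor(parentKey))
                .build();

        QueryResults<Entity> results = datastore.run(query);
        results.forEachRemaining(child -> System.out.println(child.getKey()));
    }
}
```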

Corda State Management Best Practice

A strategic question… When a state is going to have one-to-many data, should we always create a collection under the parent state object, or create a separate state object for the child with a reference to the parent? Example: (Employer 1:M Employee) or (Employer 1:M Location)… How do we decide which strategy to use? I've listed some PROS & CONS for each, but I'm not sure when to use which strategy. Looking for some feedback.
Adding child as collection
PROS
=====
- Easier to manage from coding standpoint
- Easy access to child data as it will always be available when querying parent from vault
CONS
=====
- As each collection object is going to be represented as a separate table in the database, the child data is replicated each time a new state is created, even if there was no update to the child, which causes the database to grow unnecessarily
- If we have too many such collection objects, the serialized transaction size could be huge, so performance could suffer
Adding child as Separate State Object
PROS
=====
- Child data is not replicated each time the parent state is updated
- When there is an update to any of the child data, only that state needs to be communicated to the other participants
CONS
=====
- More coding needed in order to manage child state object separately
- Child data won't be available when querying parent from vault
- Each state needs to have its own contract so child objects can't be validated on the same parent contract
Since states are linked to persisting to a single table on the backend, it is difficult to manage child collections. At present, I think you would need to leave the collection property unbound (i.e. not mapped to a database column and marked as transient so that the class can still be serializable) and then do a separate filtered query for the child records that can be assigned to the collection property of the state. Then, when any change is persisted, it will not try to persist the child records. Changes to child records should be done individually through their own state transactions. It would be nice if Corda had a feature that supported the JPA approach of doing joins between tables, such as @OneToMany. This would facilitate queries, but persisting state changes would still need to be handled separately. There may be a way of doing this that I am unaware of.
It's an old question, but there does not seem to be an accepted answer, so I'll have a go.
Firstly, the Corda node is not just a back-end for your application; it's a node in a decentralised transaction processing system. The latter must be a key requirement for you, otherwise you wouldn't use Corda.
Secondly, Corda implements the UTXO (Unspent Transaction Output) paradigm for evolving distributed state through a series of transactions, whereby a collection of objects representing input states are 'spent' (consumed, become unavailable) and replaced by another collection of objects representing output states. The state objects themselves may have complex structures, but when they evolve they are meant to be swapped as a whole. That is unlike, say, Ethereum or Hyperledger, where the global state is basically a large collection of unrelated key-value pairs that can change arbitrarily. The UTXO model makes it easy to implement features that are very hard to achieve with the global state model, such as transaction privacy. The important point here is that Corda can be made to emulate the global state model, but it will be inefficient at it and lose most of its benefits.
Because of this, the way states are modelled must be based on the intended evolution of the distributed state of a CorDapp. The questions to ask yourself therefore are probably the following:
- Will the child states 'live their own life', i.e. evolve independently from the parent states? An example of a 'yes' would be the Corda Token SDK, where token types and the tokens themselves are separate states despite the obvious parent-child relationship. Network participants can trade the child states without controlling the parent state.
- Will there be a need to withhold the information in a parent state but not the child, or vice versa, from a party participating in a transaction? An example of a 'yes' would be the use of an Oracle to sign off a child state output without being shown the parent. The Corda IRS example does something similar with transaction tear-offs when the Oracle fixes the interest rate.
- Will there be a situation where a network participant may need to know (i.e. keep in the vault) the child state, but not the parent state? An example of a 'yes' would be an off-ledger settlement workflow similar to the Corda Settler example, whereby paying agents can settle obligations without necessarily knowing the details of the agreements that caused the obligations to arise.
If the answer to any of the above questions is 'yes', then you have to represent the children as separate states, otherwise it is better to leave them embedded into the parent state.
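For illustration, a rough Java sketch of the "separate child state" option: the child is its own LinearState that references the parent by UniqueIdentifier instead of being embedded in it. The class and field names (EmployeeState, employerId) are made up, and in Corda 4 you would also annotate the class with @BelongsToContract pointing at the child's own contract.

```java
// Rough sketch of a child modelled as its own LinearState, referencing the parent by id.
// In Corda 4, also add @BelongsToContract(YourChildContract.class) on this class.
import net.corda.core.contracts.LinearState;
import net.corda.core.contracts.UniqueIdentifier;
import net.corda.core.identity.AbstractParty;

import java.util.Collections;
import java.util.List;

public class EmployeeState implements LinearState {

    private final UniqueIdentifier linearId;
    private final UniqueIdentifier employerId;  // reference to the parent state's linearId, not the object
    private final String name;
    private final AbstractParty employer;

    public EmployeeState(UniqueIdentifier linearId, UniqueIdentifier employerId,
                         String name, AbstractParty employer) {
        this.linearId = linearId;
        this.employerId = employerId;
        this.name = name;
        this.employer = employer;
    }

    @Override
    public UniqueIdentifier getLinearId() { return linearId; }

    @Override
    public List<AbstractParty> getParticipants() { return Collections.singletonList(employer); }

    public UniqueIdentifier getEmployerId() { return employerId; }

    public String getName() { return name; }
}
```

With this shape, an update to a single Employee re-issues only that child state, while the parent state can hold at most a list of child linearIds if it needs to know about them at all.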

Redux: is state normalization necessary for composition relationship?

As we know, when saving data in a redux store, it's supposed to be transformed into a normalized state. So embedded objects should be replaced by their ids and saved within a dedicated collection in the store.
I am wondering, if that also should be done if the relationship is a composition? That means, the embedded data isn't of any use outside of the parent object.
In my case the embedded objects are registrations, and the parent object is a (real life) event. Normalizing this data structure to me feels like a lot of boilerplate without any benefit.
State normalization is more than just how you access the data by traversing the object tree. It also has to do with how you observe the data.
Part of the reason for normalization is to avoid unnecessary change notifications. Objects are treated as immutable so when they change a new object is created so that a quick reference check can indicate if something in the object changed. If you nest objects and a child object changes then you should change the parent. If some code is observing the parent then it will get change notifications every time a child changes even though it might not care. So depending on your scenario you may end up with a bunch of unnecessary change notifications.
This is also partly why you see lists of entities broken out into an array of identifiers and a map of objects. In relation to change detection, this allows you to observe the list (whether items have been added or removed) without caring about changes to the entities themselves.
So it depends on your usage. Just be aware of the cost of observing and the impact your state shape has on that.
I don't agree that data is "supposed to be [normalized]". Normalizing is a useful structure for accessing the data, but you're the architect to make that decision.
In many cases, the data stored will be an application singleton and a descriptive key is more useful than forcing some kind of id.
In your case I wouldn't bother unless there is excessive data duplication, especially because you would then have to denormalize for the object to function properly.

How to realize persistence of a complex graph with an Object Database?

I have several graphs. The breadth and depth of each graph can vary and will undergo changes and alterations during runtime. See example graph.
There is a root node to get a hold of the whole graph (i.e. tree). A node can have several children and each child serves a special purpose. Furthermore, a node can access all its direct children in order to retrieve certain information. On the other hand, a child node may not be aware of its own parent node, nor of other siblings. Nothing spectacular so far.
Storing each graph and updating it with an object database (in this case DB4O) looks pretty straightforward. I could have used a relational database to accomplish data persistence (including database triggers, etc.) but I wanted to realize it with an object database instead.
There is one peculiar thing with my graphs. See another example graph.
To properly perform calculations, some nodes require information from other nodes. These other nodes may be siblings, children/grandchildren, or related in some other way. In this case a specific node knows the other relevant nodes as well (and thus can get the required information directly from them). For the sake of simplicity the first image didn't show all potential connections.
If one node has a change of state (e.g. triggered by an internal timer or by some other node), it will inform other nodes (interested observers, see also the observer pattern) about the change. Each informed node will then take appropriate action to update its own state (and in turn inform other observers as needed). A root node will not know about every change that occurs, since only the involved nodes will know that something has changed. If such a chain of events is triggered by the root node then of course it's not much of an issue.
The aim is to assure data persistence with an object database. Data in memory should be in sync with data stored within the database. What adds to the complexity is the fact that the graphs don't consist of simple (and stupid) data nodes, but that lots of functionality is integrated in each node (i.e. events that trigger state changes throughout a graph).
I have several rough ideas on how to cope with the presented issue (e.g. (1) stronger separation of data and functionality or (2) stronger integration of the database or (3) set an arbitrary time interval to update data and accept that data may be out of synch for a period of time). I'm looking for some more input and options concerning such a key issue (which will definitely leave significant footprints on a concrete implementation).
Edit: There is another aspect I forgot to mention. A graph should not reside in memory all the time. Graphs that are not needed will only be present in the database and thus put into a state of suspension. This is another issue which needs consideration: while in suspension, the update mechanisms will probably be put to sleep as well, and this is not intended.
In the case of db4o check out "transparent activation" to automatically load objects on demand as you traverse the graph (this way the graph doesn't have to be all in memory) and check out "transparent persistence" to allow each node to persist itself after a state change.
http://www.gamlor.info/wordpress/2009/12/db4o-transparent-persistence/
Moreover you can use db4o "callbacks" to trigger custom behavior during db4o operations.
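For illustration, a rough sketch (from memory of the classic db4o 8 API) of switching on transparent persistence and implementing Activatable on a node, so it is loaded on demand when traversed and flushed back after a state change; the class names are illustrative.

```java
// Rough sketch of db4o transparent activation/persistence; names are illustrative.
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.activation.ActivationPurpose;
import com.db4o.activation.Activator;
import com.db4o.config.EmbeddedConfiguration;
import com.db4o.ta.Activatable;
import com.db4o.ta.TransparentPersistenceSupport;

import java.util.ArrayList;
import java.util.List;

public class GraphStore {

    public static ObjectContainer open(String file) {
        EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
        // Also covers transparent activation: referenced nodes are pulled in lazily.
        config.common().add(new TransparentPersistenceSupport());
        return Db4oEmbedded.openFile(config, file);
    }

    public static class Node implements Activatable {
        private transient Activator activator;
        private String name;
        private final List<Node> children = new ArrayList<>();

        @Override
        public void bind(Activator activator) {
            if (this.activator == activator) return;
            if (activator != null && this.activator != null) {
                throw new IllegalStateException("Node is already bound to an activator");
            }
            this.activator = activator;
        }

        @Override
        public void activate(ActivationPurpose purpose) {
            if (activator != null) activator.activate(purpose);
        }

        public List<Node> getChildren() {
            activate(ActivationPurpose.READ);   // load this node's children on demand
            return children;
        }

        public void rename(String newName) {
            activate(ActivationPurpose.WRITE);  // marks the node dirty; persisted on commit
            this.name = newName;
        }
    }
}
```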
HTH
German
What's the exact question? Here are a few comments:
As @German already mentioned: for complex object graphs you probably want to use transparent persistence.
Also, as @German mentioned: callbacks can help you do additional things when objects are read/written etc. on the database.
On the Observer pattern: are you on .NET or Java? Usually you don't want to store the observers in the database, since the observers are usually parts of your business logic, GUI, etc. On .NET, events are automatically not stored. On Java, make sure that you mark the field holding the observer references as transient.
In case you actually want to store observers, for example because they are just other elements in your object graph: on .NET you cannot store delegates / closures, so you need to introduce an interface for calling the observer. On Java we often use anonymous inner classes as listeners; while db4o can store those, I would NOT recommend it, because an anonymous inner class gets a generated name which can change, and db4o will then not find that class later if you've changed your code.
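For illustration, a small plain-Java sketch of keeping the observers out of the stored graph: the listener list is transient, so db4o skips it, and observers re-register after a node has been loaded. The NodeListener interface is just a placeholder.

```java
// Small sketch: transient listener list is not persisted; observers re-attach after loading.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ObservableNode {

    public interface NodeListener {
        void stateChanged(ObservableNode node);
    }

    private String state;

    // transient: not stored by db4o; the field comes back uninitialized after loading,
    // so interested parties must re-register when the graph is re-activated.
    private transient List<NodeListener> listeners = new CopyOnWriteArrayList<>();

    public void addListener(NodeListener listener) {
        if (listeners == null) {                 // field is not initialized on load
            listeners = new CopyOnWriteArrayList<>();
        }
        listeners.add(listener);
    }

    public void changeState(String newState) {
        this.state = newState;
        if (listeners != null) {
            listeners.forEach(l -> l.stateChanged(this));
        }
    }
}
```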
That's it. Ask more detailed questions if you want to know more.
