DDD and uniqueness constraint

How would one validate a uniqueness constraint using DDD? Let's say that an Entity has a name property that must be unique across the system, and there is a specific EntityRepository method nameExists(name): bool... This is what I found people suggesting, because the repository is the abstraction of the collection of all Entities and should be able to perform this check.
So before creating/adding the new Entity, the command / domain service could check for the existence of newName against the repository, but I think that this will not always work because of concurrency.
In a concurrent scenario where two transactions are started simultaneously, the EntityRepository's nameExists method might return false for both transactions, and as a result two entries with the same name will be incorrectly inserted.
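A minimal Java sketch of that racy check-then-act sequence (EntityRepository is the hypothetical port from the question):

```java
public class EntityService {

    // Hypothetical repository port from the question, in Java form.
    public interface EntityRepository {
        boolean nameExists(String name);
        void add(String name);
    }

    private final EntityRepository repository;

    public EntityService(EntityRepository repository) {
        this.repository = repository;
    }

    public void createEntity(String name) {
        // Check-then-act: two concurrent transactions can both pass this
        // check before either insert commits...
        if (repository.nameExists(name)) {
            throw new IllegalArgumentException("Name already taken: " + name);
        }
        // ...so both inserts succeed and the duplicate slips through.
        repository.add(name);
    }
}
```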
I am sure that I am missing something basic, but the answers I found all point to the repository's exists method - to be honest, others say that a UNIQUE constraint should be put on the DB to catch the concurrency case, but what if one uses Event Sourcing or a persistence layer that does not support unique constraints?
Follow-up question
What if the uniqueness constraint has to be applied at different levels of a hierarchy?
A Container's name must be unique in the system, and Child names must be unique inside their Container.
Let's say that a transactional DB takes care of the uniqueness at the lowest possible level; what about the domain?
Should I still express the uniqueness logic at the domain level, e.g. with a Domain Service for the system-level uniqueness, and by embedding Child entities inside the Container entity and enforcing a business rule (thereby making Container the aggregate root)?
Or should I not bother "replicating" the uniqueness in the domain and (given there are no other rules to apply between the two) split Container and Child? Would the domain then lack expressiveness?

I am sure that I am missing something basic
Not something basic.
The term we normally use for enforcing a constraint, like uniqueness, across a set of entities is set validation. Greg Young calls your attention to a specific question:
What is the business impact of having a failure?
Most set constraints fall into one of two categories:
constraints that need to be true when the system reaches steady state, but may not hold while work is in progress. In business processes, these are often handled by detecting conflicts in the stored data, and then invoking various mitigation processes to resolve the conflict.
constraints that need to be true always.
The first category includes things like double booking a seat on an airplane; it's not necessarily a problem unless both people show up, and even then you can handle it by bumping someone to another seat, or another flight.
In these cases, you make a best effort - you look at a recent copy of the set, make sure there are no conflicts there, then hope for the best (accepting that some percentage of the time, you'll have missed a change).
See Memories, Guesses and Apologies (Pat Helland, 2007).
The second category is the hard one; to ensure the invariant holds, you have to lock the entire set so that races don't allow two different writers to insert conflicting information.
Relational databases tend to be really good at set validation - putting the entire set into a single database is going to be the right answer (note the assumption that the set is small enough to fit into a single database -- trying to lock two databases at the same time is hard).
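As a concrete illustration of leaning on the database, here is a minimal JDBC sketch (the table name and the domain exception are hypothetical; note that some drivers report unique violations as a plain SQLException with SQLState 23505 rather than the subclass caught here):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class JdbcEntityRepository {

    public static class DuplicateNameException extends RuntimeException {
        DuplicateNameException(String name) {
            super("Name already taken: " + name);
        }
    }

    private final Connection connection;

    public JdbcEntityRepository(Connection connection) {
        this.connection = connection;
    }

    // No pre-check: the UNIQUE index on entities(name) is the arbiter,
    // and the violation is translated into a domain-level exception.
    public void add(String id, String name) {
        try (PreparedStatement stmt = connection.prepareStatement(
                "INSERT INTO entities (id, name) VALUES (?, ?)")) {
            stmt.setString(1, id);
            stmt.setString(2, name);
            stmt.executeUpdate();
        } catch (SQLIntegrityConstraintViolationException e) {
            throw new DuplicateNameException(name);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}
```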
Another possibility is to ensure that only one writer can update the set at any given time -- you don't have to worry about losing a race when you are the only one running in it.
Sometimes you can lock a smaller set -- imagine, for example, having a collection of locks with numbers, and the hash code for the name tells you which lock you have to grab.
The simplest version of this is when you can use the name as the aggregate identifier itself.
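A minimal sketch of that striping idea, assuming all writers live in one JVM (across processes you would need the database or a distributed lock to back it):

```java
import java.util.concurrent.locks.ReentrantLock;

public class NameLockStripes {

    // A fixed pool of numbered locks; the hash of the name picks the lock.
    private final ReentrantLock[] locks = new ReentrantLock[64];

    public NameLockStripes() {
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    public void withLockFor(String name, Runnable action) {
        // floorMod keeps the index non-negative for negative hash codes.
        ReentrantLock lock = locks[Math.floorMod(name.hashCode(), locks.length)];
        lock.lock();
        try {
            // e.g. the nameExists check plus the insert; no other thread
            // can race us on any name that hashes to the same stripe.
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```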
if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Sometimes, you introduce a persistent store dedicated to the set, just to ensure that you can maintain the invariant. See "microservices".
But if you can't change the database, and you can't use a database with the locking guarantees that you need, and the business absolutely has to have the set valid at all times... then you single thread that part of the work.
Everybody that wants to change a name puts a request into a queue, and the one thread responsible for managing the invariant certifies each and every change.
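A rough single-writer sketch of that queue (names and the in-memory set are illustrative; a real version would persist each accepted name before acknowledging it):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class NameRegistrar implements Runnable {

    record Request(String name, CompletableFuture<Boolean> accepted) {}

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private final Set<String> takenNames = new HashSet<>(); // loaded from the store at startup

    // Any thread may submit a request; only the certifying thread decides.
    public CompletableFuture<Boolean> register(String name) throws InterruptedException {
        Request request = new Request(name, new CompletableFuture<>());
        queue.put(request);
        return request.accepted();
    }

    @Override
    public void run() {
        // The one thread responsible for the invariant: no race is possible
        // because nothing else ever reads or writes takenNames.
        try {
            while (true) {
                Request request = queue.take();
                boolean unique = takenNames.add(request.name());
                // A real implementation would persist the name here first.
                request.accepted().complete(unique);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```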
There's no magic; just hard work and trade offs.

Related

Does DynamoDB expose an API to query or detect when there is a conflict in merging item data?

DynamoDB is an AP system based on the original Dynamo paper.
Is there any API to detect when a merge conflict has happened or been resolved?
Is there any API to provide a strategy to resolve a conflict if it happens?
Your question is based on a wrong premise. Although DynamoDB shares the name, and some goals and implementation details, with the original "Dynamo" paper, it is not very close, and the data model in particular is completely different.
Whereas in the Dynamo paper multiple clients could store multiple different values for an item concurrently - and later readers need to resolve the conflict - DynamoDB does things very differently:
If two clients replace an item, DynamoDB offers "last write wins" - one of these writes will win; you don't know or care which.
If two clients modify different attributes in the same item concurrently, both changes will be merged. I never found this explicitly promised, but it appears to work this way.
You also have a powerful conditional update feature, which can do a modification to a single item based on some condition on the old value of this item. These conditional updates are guaranteed to be isolated, so they can be used to ensure safe concurrent modification. For example, a conditional update can be used to implement so-called optimistic locking: An item has a version attribute among other attributes, a client reads the old item, decides what to change it to, and then does the write - with the condition that the version still hasn't changed. If the condition fails (because some other client raced us), the write fails and the client tries the whole process again (read again, apply a change, and write back).
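Here is a minimal optimistic-locking sketch using the AWS SDK for Java v2 (the table and attribute names are made up; on a failed condition the caller re-reads the item and retries):

```java
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

public class OptimisticUpdate {

    // Returns true if the write won; false means another client raced us
    // and the caller should re-read the item and try the whole cycle again.
    public static boolean tryUpdate(DynamoDbClient dynamo, String id,
                                    long versionSeenOnRead, String newPayload) {
        UpdateItemRequest request = UpdateItemRequest.builder()
                .tableName("Items") // hypothetical table
                .key(Map.of("id", AttributeValue.builder().s(id).build()))
                .updateExpression("SET payload = :p, version = version + :one")
                .conditionExpression("version = :expected") // isolated check-and-set
                .expressionAttributeValues(Map.of(
                        ":p", AttributeValue.builder().s(newPayload).build(),
                        ":one", AttributeValue.builder().n("1").build(),
                        ":expected", AttributeValue.builder()
                                .n(Long.toString(versionSeenOnRead)).build()))
                .build();
        try {
            dynamo.updateItem(request);
            return true;
        } catch (ConditionalCheckFailedException e) {
            return false;
        }
    }
}
```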
DynamoDB also has a new feature of full (multi-item) transactions. This feature did not exist in Dynamo at all.

Corda: Creating contracts dynamically

In our use case, we need to define certain rules at run-time based on which a node will transact with other nodes in the network. For example, we want to define a rate at the front end and check that the transaction is happening at this rate only for that particular node. In other words, can we define the terms and conditions at run-time, and would this still be called a smart contract, or does a smart contract always need to be hard-coded? Is there an alternate way to look at this?
The contract itself is hard-coded. This is because every node needs to agree that a given transaction is valid according to the contract rules, forever. If they varied based on the node, some nodes would consider a transaction valid while another would consider the transaction invalid, leading to inconsistencies in their ledgers.
Instead, you'd have to impose this logic in the flow. Let's say you have a TradeOffer flow that proposes a trade. Each node could install its own response flow that is initiated by the TradeOffer flow. Each node's response flow could impose different conditions. For example, one node might sign any transaction, while another would check that the proposed rate is within specified bounds.
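Sketched in Java against Corda's flow API, heavily abbreviated (TradeOfferFlow, TradeState and its getRate() are hypothetical application types, and finality handling is omitted):

```java
import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.FlowSession;
import net.corda.core.flows.InitiatedBy;
import net.corda.core.flows.SignTransactionFlow;
import net.corda.core.transactions.SignedTransaction;

// This node's own responder to the (hypothetical) TradeOfferFlow; another
// node could install a different responder with different conditions.
@InitiatedBy(TradeOfferFlow.class)
public class TradeOfferResponder extends FlowLogic<SignedTransaction> {

    private static final double MAX_RATE = 0.05; // this node's local policy

    private final FlowSession counterpartySession;

    public TradeOfferResponder(FlowSession counterpartySession) {
        this.counterpartySession = counterpartySession;
    }

    @Suspendable
    @Override
    public SignedTransaction call() throws FlowException {
        return subFlow(new SignTransactionFlow(counterpartySession) {
            @Override
            protected void checkTransaction(SignedTransaction stx) throws FlowException {
                // Refuse to sign any proposal whose rate breaks our bound.
                TradeState offer = (TradeState) stx.getTx().getOutputStates().get(0);
                if (offer.getRate() > MAX_RATE) {
                    throw new FlowException("Proposed rate exceeds this node's limit");
                }
            }
        });
    }
}
```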
To extend Joel's comment: the contract is indeed hard-coded, but there's nothing wrong with putting meta logic in there as long as the code runs the same way every time (i.e. it's deterministic).
What do I mean by this? Well, you can put a String in your state which contains an expression that can then be evaluated (see https://relayto.com/r3/FIjS0Jfy/VB8epyay73 for the inclusion of a very basic maths expression in a smart contract). There's nothing wrong with making this String as complex as you like, but be aware that potential users of your application will start raising eyebrows if you dumb down the coded verification logic and move it all into a String, since doing so removes a lot of the validation protection that Corda offers.

Doctrine 2 optimistic locking only on changed fields

Is anyone aware of any bundle, or of any plans for Doctrine to implement a new optimistic locking strategy like the one in Telerik framework?
http://docs.telerik.com/data-access/developers-guide/crud-operations/concurrency-control/data-access-tasks-define-model-concurrency-optimistic#checking_for_any_changes
This strategy is great because it doesn't use version numbers or timestamps; it compares the old values of the inputs with the new values, and only for the changed inputs.
So if I have two different forms updating the same entity, using the current strategies (timestamp or versioning), two users will get into conflict even if they do not update the same data.
I know this is a rather old question on the interwebs, so I'm just going to sketch how this can be done in theory.
A per-column lock implies there is a mechanism for detecting per-column changes - therefore either the row (or the columns we are concerned about) is hashed and that hash comparison is used to detect/track changes, OR we track each column individually.
This tracking would then have to be applied on a per-request basis to determine what changed, and when. This is where the normal @Version decorator logic would apply.
Telerik is able to handle per-column changes because the information is compared (array comparisons, most likely) in the browser. This is an isolated copy of the data that is not in sync with other browsers or the database until the update.
The easiest option here is to not let the two forms collide (in terms of data) and remove the lock - OR - decouple the forms' data from the primary table and allow them to be updated separately (and merged with a SQL join).
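Outside of Doctrine, the underlying idea can be sketched with a plain conditional UPDATE over JDBC (table and column names are made up): only the column being edited is compared against the value the form originally read, so concurrent edits to other columns never conflict.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PerColumnOptimisticUpdate {

    // Compares the old value of only the changed column; a concurrent edit
    // to a *different* column of the same row sails through untouched.
    public boolean updateEmail(Connection connection, long userId,
                               String oldEmail, String newEmail) throws SQLException {
        String sql = "UPDATE users SET email = ? WHERE id = ? AND email = ?";
        try (PreparedStatement stmt = connection.prepareStatement(sql)) {
            stmt.setString(1, newEmail);
            stmt.setLong(2, userId);
            stmt.setString(3, oldEmail);
            // 0 rows updated: someone else changed email since we read it.
            return stmt.executeUpdate() == 1;
        }
    }
}
```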

How to realize persistence of a complex graph with an Object Database?

I have several graphs. The breadth and depth of each graph can vary and will undergo changes and alterations during runtime. See example graph.
There is a root node to get hold of the whole graph (i.e. tree). A node can have several children, and each child serves a special purpose. Furthermore, a node can access all its direct children in order to retrieve certain information. On the other hand, a child node may not be aware of its own parent node, nor of other siblings. Nothing spectacular so far.
Storing each graph and updating it with an object database (in this case DB4O) looks pretty straightforward. I could have used a relational database to accomplish data persistence (including database triggers, etc.) but I wanted to realize it with an object database instead.
There is one peculiar thing with my graphs. See another example graph.
To properly perform calculations, some nodes require information from other nodes. These other nodes may be siblings, children/grandchildren or related in some other way. In this case a specific node knows the other relevant nodes as well (and thus can get the required information directly from them). For the sake of simplicity the first image didn't show all potential connections.
If one node has a change of state (e.g. triggered by an internal timer or by some other node), it will inform other nodes (interested observers, see also the observer pattern) about the change. Each informed node will then take appropriate actions to update its own state (and in turn inform other observers as needed). The root node will not know about every change that occurs, since only the involved nodes will know that something has changed. If such a chain of events is triggered by the root node, then of course it's not much of an issue.
The aim is to ensure data persistence with an object database. Data in memory should be in sync with data stored in the database. What adds to the complexity is the fact that the graphs don't consist of simple (and stupid) data nodes; lots of functionality is integrated into each node (i.e. events that trigger state changes throughout a graph).
I have several rough ideas on how to cope with the presented issue (e.g. (1) stronger separation of data and functionality, (2) stronger integration of the database, or (3) setting an arbitrary time interval to update data and accepting that data may be out of sync for a period of time). I'm looking for some more input and options concerning such a key issue (which will definitely leave significant footprints on a concrete implementation).
Edit:
There is another aspect I forgot to mention. A graph should not reside in memory all the time. Graphs that are not needed will be present only in the database, and thus put into a state of suspension. This is another issue which needs consideration. While a graph is suspended, its update mechanisms will probably be put to sleep as well, and this is not intended.
In the case of db4o, check out "transparent activation" to automatically load objects on demand as you traverse the graph (this way the graph doesn't have to be all in memory), and check out "transparent persistence" to allow each node to persist itself after a state change.
http://www.gamlor.info/wordpress/2009/12/db4o-transparent-persistence/
Moreover you can use db4o "callbacks" to trigger custom behavior during db4o operations.
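A minimal db4o configuration sketch (the file name is arbitrary; node classes still need to be activatable, either by implementing Activatable or via db4o's bytecode enhancement):

```java
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.config.EmbeddedConfiguration;
import com.db4o.ta.TransparentPersistenceSupport;

public class GraphStore {

    public static ObjectContainer open(String fileName) {
        EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
        // Activates nodes lazily as the graph is traversed, and flushes
        // modified activatable objects back to the database on commit.
        config.common().add(new TransparentPersistenceSupport());
        return Db4oEmbedded.openFile(config, fileName);
    }
}
```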
HTH
German
What's the exact question? Here are a few comments:
As @German already mentioned: for complex object graphs you probably want to use transparent persistence.
Also as @German mentioned: callbacks can help you do additional things when objects are read/written etc. on the database.
On the observer pattern: are you on .NET or Java? Usually you don't want to store the observers in the database, since the observers are typically part of your business logic, GUI etc. On .NET, events are automatically not stored. On Java, make sure that you mark the field holding the observer references as transient.
In case you actually want to store observers, for example because they are just other elements in your object graph: on .NET, you cannot store delegates / closures, so you need to introduce an interface for calling the observer. On Java, anonymous inner classes are often used as listeners; while db4o can store those, I would NOT recommend it, because an anonymous inner class gets a generated name which can change, and db4o will then not find that class later if you've changed your code.
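A small Java sketch of both points, with hypothetical names: the observer list is transient so db4o skips it, and the listener is a named interface rather than an anonymous inner class:

```java
import java.util.ArrayList;
import java.util.List;

public class GraphNode {

    // A named interface keeps a stable class name for any stored listeners.
    public interface NodeObserver {
        void stateChanged(GraphNode node);
    }

    private String state; // persisted by db4o

    // transient: db4o skips this field, so observers (GUI, business logic)
    // are never stored and must be re-attached after loading the node.
    private transient List<NodeObserver> observers = new ArrayList<>();

    public void addObserver(NodeObserver observer) {
        if (observers == null) {
            observers = new ArrayList<>(); // field is null on a freshly loaded node
        }
        observers.add(observer);
    }

    public void changeState(String newState) {
        this.state = newState;
        if (observers != null) {
            for (NodeObserver observer : observers) {
                observer.stateChanged(this);
            }
        }
    }
}
```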
That's it. Ask more detailed questions if you want to know more.

How to implement gapless, user-friendly IDs in NHibernate?

I'm designing an application where my Order objects need to have a sequential and user-friendly Id field. I'm avoiding the HiLo algorithm because of the rather large gaps it produces (see here). Naturally, Guid values would make my corporate users go bananas. I'm also avoiding Oracle sequences because of their major disadvantages:
(From: NHibernate POID Generators revealed)
Post insert generators, as the name suggests, assign the id after the entity is stored in the database. A select statement is executed against the database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as the NH Team.
Some of the drawbacks are the following:
Unit Of Work is broken with the use of those strategies. It doesn't matter if you're using FlushMode.Commit, each Save results in an insert statement against the DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn't do).
Those strategies nullify the batcher; you can't take advantage of sending multiple queries at once (as it must go to the database at the time of Save).
Any ideas/experience on implementing user-friendly IDs without major gaps between them?
Edit:
User-friendly Id fields are ones my corporate users can memorize, discuss, and even have phone conversations about, referring to a particular Order by its code, e.g. "I'm calling to know why order #1625 was denied."
The Id doesn't need to be strictly gapless, but I am worried that my users would get confused when they see gaps like 100, 201, 305. For my older projects, I currently implement NHibernate using Oracle sequences, which occasionally lose a few values when exceptions are thrown but still keep a rather tidy order. The downside is that they break the Unit of Work, which results in additional hits to the database for every Save command, with or without Session.Flush.
One option would be to keep a key-table that simply stores an incrementing value. This can introduce a few problems, namely possible locking issues as well as additional hits to the database.
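A rough JDBC sketch of that key-table approach (the table name is made up; auto-commit must be off so the counter bump commits or rolls back together with the order insert, which is exactly what keeps the sequence gapless):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderIdAllocator {

    // SELECT ... FOR UPDATE serializes concurrent allocators on the single
    // counter row; a rolled-back order rolls the counter back too.
    public long nextOrderId(Connection connection) throws SQLException {
        long next;
        try (PreparedStatement select = connection.prepareStatement(
                "SELECT next_id FROM order_key_table FOR UPDATE");
             ResultSet rs = select.executeQuery()) {
            rs.next();
            next = rs.getLong(1);
        }
        try (PreparedStatement update = connection.prepareStatement(
                "UPDATE order_key_table SET next_id = ?")) {
            update.setLong(1, next + 1);
            update.executeUpdate();
        }
        return next; // insert the Order in the SAME transaction, then commit
    }
}
```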
Another option might be to refine what you mean by "User-friendly Id". This could consist of a combination of a Date/Time and a customer-specific sequence (or including the customer id as well). Also, your order id does not necessarily have to be the actual key on the table. There is nothing to say that you can't use a surrogate key with a separate "calculated" column which represents the order id.
The bottom-line is that it sounds like you want to use a surrogate key, but have the benefits of a natural key. It can be very difficult to have it both ways and a lot comes down to how you actually plan on using the data, how users interpret the data, and personal preference.
