I understand the CommitId is used internally by Jonathan Oliver's event store.
So far I've always provided a Guid.NewGuid() for the CommitId.
When would you ever want to do anything different?
I don't understand why it is exposed within his common domain Repository.
Can anyone shed some light on this?
In general, CommonDomain doesn't try to force a one-size-fits-all opinionated structure.
One way to leverage it is to have writers use the unique Id of the incoming Command as the CommitId. This means that competing writers (or retries competing with runs that have yet to time out) get rejected with a specific exception, without needing to enter into, consider, or manage Conflict Resolution logic. This fulfils a key tenet of Idempotent Commands.
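A minimal sketch of that idea, assuming a hypothetical in-memory store. `InMemoryEventStore`, `DuplicateCommitError`, and `handle_command` are illustrative names only and are not the API of the actual event store or CommonDomain:

```python
import uuid


class DuplicateCommitError(Exception):
    """Raised when a commit id has already been stored for a stream."""


class InMemoryEventStore:
    """Hypothetical stand-in for an event store; not the real library API."""

    def __init__(self):
        self._commits = {}  # (stream_id, commit_id) -> list of events

    def commit(self, stream_id, commit_id, events):
        key = (stream_id, commit_id)
        if key in self._commits:
            raise DuplicateCommitError(f"commit {commit_id} already stored")
        self._commits[key] = list(events)


def handle_command(store, stream_id, command_id, events):
    """Use the incoming command id as the commit id, so a retry is rejected."""
    try:
        store.commit(stream_id, command_id, events)
        return "committed"
    except DuplicateCommitError:
        return "duplicate ignored"


store = InMemoryEventStore()
command_id = uuid.uuid4()
first = handle_command(store, "order-1", command_id, ["OrderPlaced"])
retry = handle_command(store, "order-1", command_id, ["OrderPlaced"])
print(first, "/", retry)
```

With a fresh `Guid.NewGuid()` per attempt, the retry would instead be stored as a second commit, which is why reusing the command's id matters here.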
My girlfriend was asked the below question in an interview:
We trigger 5 independent APIs simultaneously. Once they have all completed, we want to trigger a function. How will you design a system to do this?
My girlfriend replied she will use a flag variable, but the interviewer was evidently not happy with it.
So, is there a good way in which this could be handled (in a distributed context)? Note that each of the 5 API calls is made by a different server and the function to be triggered is on a 6th server.
The other answers suggesting Promises seem to assume all these requests necessarily come from the same client. If the context here is distributed systems, as you said it is, then I don't think those are valid answers. If they were, then the interview question would have nothing to do with distributed systems, except to test your girlfriend's ability to recognize something that isn't really a distributed systems problem.
And the question does have the shape of some classic problems in distributed systems. It sounds a lot like YouTube view counting: How do you achieve qualities like atomicity and consistency in a multi-threaded, multi-process, or multi-client environment? Failing to recognize this, thinking the answer could be as simple as "a flag", betrayed a lack of experience in distributed systems.
Another thing about that answer is that it leaves many ambiguities. Where does the flag live? As a variable in another (Java?) API? In a database? In a file? Even in a non-distributed context, these are important questions. And if she had gone on to address these questions, even being innocent of all the distributed systems complications, she might have happily fallen into a discussion of the kinds of D.S. problems that occur when you use, say, a file; and how using an ACID-compliant database might solve those problems, and what the tradeoffs might be there... And she might have corrected herself and said "counter" instead of "flag"!
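As a rough illustration of the "counter in an ACID-compliant database" idea, here is a sketch using an in-memory SQLite database as a stand-in for a store shared by all six servers. The table names, `job_id`, `call_id`, and `report_completion` are assumptions made up for this example:

```python
import sqlite3

EXPECTED_CALLS = 5  # from the question: five independent API calls


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE completions ("
        "job_id TEXT, call_id TEXT, PRIMARY KEY (job_id, call_id))"
    )
    db.execute("CREATE TABLE triggered (job_id TEXT PRIMARY KEY)")
    return db


def report_completion(db, job_id, call_id):
    """Record one finished call; return True only when this report completes the set."""
    with db:  # one transaction: commit on success, roll back on error
        # The primary key makes a retried report harmless.
        db.execute(
            "INSERT OR IGNORE INTO completions VALUES (?, ?)", (job_id, call_id)
        )
        (done,) = db.execute(
            "SELECT COUNT(*) FROM completions WHERE job_id = ?", (job_id,)
        ).fetchone()
        if done < EXPECTED_CALLS:
            return False
        already = db.execute(
            "SELECT 1 FROM triggered WHERE job_id = ?", (job_id,)
        ).fetchone()
        if already:
            return False
        db.execute("INSERT INTO triggered VALUES (?)", (job_id,))
        return True  # the caller should now invoke the function on the 6th server


db = make_db()
results = [report_completion(db, "job-1", f"api-{i}") for i in range(1, 6)]
duplicate = report_completion(db, "job-1", "api-5")
print(results, duplicate)
```

Recording one row per call rather than incrementing a bare number means a duplicate report cannot push the count past five. A real multi-server deployment would also need to pick an isolation level that keeps the count-then-insert step atomic, which this single-connection sketch does not exercise.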
If I were asked this, my first thought would be to use promises/futures. The idea behind them is that you can execute time-consuming operations asynchronously and they will somehow notify you when they've completed, either successfully or unsuccessfully, typically by calling a callback function. So the first step is to spawn five asynchronous tasks and get five promises.
Then I would join the five promises together, creating a unified promise that represents the five separate tasks. In JavaScript I might call Promise.all(); in Java I would use CompletableFuture.allOf().
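For illustration, the same join can be sketched in Python, where `asyncio.gather` plays a role roughly comparable to `Promise.all()`. The `call_api` and `on_all_done` functions are hypothetical stand-ins for the real API calls and the function to trigger:

```python
import asyncio


async def call_api(name, delay):
    """Hypothetical stand-in for one remote API call."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def on_all_done(results):
    """Hypothetical function to run once every call has completed."""
    return f"triggered after {len(results)} calls"


async def main():
    calls = [call_api(f"api-{i}", 0.01 * i) for i in range(1, 6)]
    # gather() finishes only when all five awaitables have finished.
    results = await asyncio.gather(*calls)
    return await on_all_done(results)


outcome = asyncio.run(main())
print(outcome)
```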
I would want to make sure to handle both success and failure. The combined promise should succeed if all of the API calls succeed and fail if any of them fail. If any fail there should be appropriate error handling/reporting. What happens if multiple calls fail? How would a mix of successes and failures be reported? These would be design points to mention, though not necessarily solve during the interview.
Promises and futures typically have a modular layering system that would allow edge cases like timeouts to be handled by chaining handlers together. If done right, timeouts could become just another error condition that would be naturally handled by the error handling already in place.
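A small sketch of that layering in Python, assuming hypothetical `fast_api` and `slow_api` calls: each call is wrapped with `asyncio.wait_for`, and `gather(..., return_exceptions=True)` lets a timeout surface through the same path as any other failure.

```python
import asyncio


async def fast_api():
    """Hypothetical call that finishes well within its timeout."""
    await asyncio.sleep(0.01)
    return "ok"


async def slow_api():
    """Hypothetical call that takes longer than its timeout allows."""
    await asyncio.sleep(1.0)
    return "ok"


async def main():
    calls = [
        asyncio.wait_for(fast_api(), timeout=0.2),
        asyncio.wait_for(slow_api(), timeout=0.05),  # will time out
    ]
    # return_exceptions=True collects errors instead of raising the first one,
    # so a mix of successes and failures can be reported together.
    results = await asyncio.gather(*calls, return_exceptions=True)
    failures = [r for r in results if isinstance(r, Exception)]
    return results, failures


results, failures = asyncio.run(main())
print(results[0], type(failures[0]).__name__)
```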
This solution would not require any state to be shared across threads, so I would not have to worry about mutexes or deadlocks or other thread synchronization problems.
She said she would use a flag variable to keep track of the number of API calls that have returned.
One thing that makes great interviewees stand out is their ability to anticipate follow-up questions and explain details before they are asked. The best answers are fully fleshed out. They demonstrate that one has thought through one's answer in detail, and they have minimal handwaving.
When I read the above I have a slew of follow-up questions:
How will she know when each API call has returned? Is she waiting for a function call to return, a callback to be called, an event to be fired, or a promise to complete?
How is she causing all of the API calls to be executed concurrently? Is there multithreading, a fork-join pool, multiprocessing, or asynchronous execution?
Flag variables are booleans. Is she really using a flag, or does she mean a counter?
What is the variable tracking and what code is updating it?
What is monitoring the variable, what condition is it checking, and what's it doing when the condition is reached?
If using multithreading, how is she handling synchronization?
How will she handle edge cases such as API calls failing or timing out?
A flag variable might lead to a workable solution or it might lead nowhere. The only way an interviewer will know which it is is if she thinks about and proactively discusses these various questions. Otherwise, the interviewer will have to pepper her with follow-up questions, and will likely lower their evaluation of her.
When I interview people, my mental grades are something like:
S — Solution works and they addressed all issues without prompting.
A — Solution works, follow-up questions answered satisfactorily.
B — Solution works, explained well, but there's a better solution that more experienced devs would find.
C — What they said is okay, but their depth of knowledge is lacking.
F — Their answer is flat out incorrect, or getting them to explain their answer was like pulling teeth.
After watching a few videos regarding DynamoDB and its best practices, I decided to give it a try; however, I cannot help but feel what I'm doing may be an anti-pattern. As I understand it, the best practice is to leverage as few tables as possible while also taking advantage of GSIs to do some 'heavy' lifting. Unfortunately, I'm working with a use case that doesn't actually have strictly defined access patterns yet since we're still in early development.
Some early access patterns that we may see are:
Retrieve the number of wins for a particular game: rock paper scissors, boxing, etc. [1 quick lookup]
Retrieve the amount of coins a user has. [1 quick lookup]
Retrieve all the items that someone has purchased (don't care about date). [Not sure?]
Possibly retrieve all the attributes associated with a user (rps wins, box wins, coins, etc). [I genuinely don't know.]
Additionally, there may be 2 operations we will need to complete. For example, if the user wins a particular game they may receive "coins". Effectively, we'll need to add coins to the user "coins" attribute & update their number of wins for the game.
Do you think I should revisit this strategy? Additionally, we'll probably start creating 'logs' associated with various games and each individual play.
Designing a DynamoDB data model without fully understanding your application's access patterns is the anti-pattern.
Take the time to define your entities (Users, Games, Orders, etc), their relationship to one another and your application's key access patterns. This can be hard work when you are just getting started, but it's absolutely critical to do this when working with DynamoDB. How else can we (or you, or anybody) evaluate whether or not you're using DDB correctly?
When I first worked with DDB, I approached the process in a way similar to what you are describing. I was used to working with SQL databases, where I could define a few tables and rely on the magic of SQL to support my access patterns as my understanding of the application access patterns evolved. I quickly realized this was not going to work if I wanted to use DynamoDB!
Instead, I started from the front-end of my application. I sketched out the different pages in my app and nailed down the most important concepts in my application. Granted, I may not have covered all the access patterns in my application, but the exercise certainly nailed down the minimal access patterns I'd need to have a usable app.
If you need to rapidly prototype your application to get a better understanding of your access patterns, consider using the skills you and your team already have. If you already understand data modeling with SQL databases, go with that for now. You can always revisit DynamoDB once you have a better understanding of your access patterns and determine that your application can benefit from using a NoSQL database.
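For what it's worth, once an access pattern is pinned down it tends to map onto a concrete request. As a sketch, the "add coins and bump the win count" operation from the question could be a single UpdateItem request, assuming both numbers live on the same user item. The `pk` key, the `USER#` prefix, and the `coins` and `<game>_wins` attribute names are assumptions for illustration, and the code only builds the request parameters without contacting AWS:

```python
def build_win_update(user_id, game, coins_won):
    """Build keyword arguments for a single DynamoDB UpdateItem call.

    Assumes one item per user keyed by a hypothetical `pk` attribute, with
    numeric attributes `coins` and `<game>_wins` stored on that item.
    """
    return {
        "Key": {"pk": f"USER#{user_id}"},
        # ADD increments a numeric attribute, creating it if it is absent.
        "UpdateExpression": "ADD coins :c, #wins :one",
        "ExpressionAttributeNames": {"#wins": f"{game}_wins"},
        "ExpressionAttributeValues": {":c": coins_won, ":one": 1},
    }


params = build_win_update("42", "rps", 10)
print(params["UpdateExpression"])
# With boto3 this would be passed as table.update_item(**params),
# where `table` is a DynamoDB Table resource (not created in this sketch).
```

Because both increments are in one request against one item, they are applied together. If coins and wins ended up on different items, a different approach (such as a transaction) would be needed.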
The verbs are pretty straightforward for CRUD actions.
What would be the right HTTP verb for only performing an action, something like an upvote?
Maybe this speaks more to data modeling? Is an upvote a resource or just an attribute? I'm unsure about that. Let's say it does modify the resource directly by calling #upvote on the model.
For example, if I upvote a question here on SO, what verb should be ideally used for that action? I am modifying the resource in a partial manner (PATCH?), but at the same time, I don't want to specify the new value as I could encounter concurrency issues, so this would best be managed by the database. In other words, we want to ask the server to perform an incremental action on a resource. Is that covered by PATCH?
I've seen a similar question asked there, but their case pointed to the creation of a new resource by viewing the job request as an object to be created. Are we in the same case here?
If the PATCH method really would be appropriate, what would it contain?
Maybe this speaks more to data modeling? Is an upvote a resource or just an attribute?
Modelling impacts Implementation
We are usually modelling something from the real world and our choice of representation will seriously affect the capabilities of the developed system. We could implement our vote in two ways: as an attribute on the thing being voted on or as an entity in its own right. The choice will affect how easily we can implement desired features.
Two possible implementations ...
1. Votes as entities
I would model this with a resource which modelled the relationship between the voter and the thing being voted on. Why?
The vote has state:
what was being voted on
who voted,
when did they vote.
was it an up vote or a down vote (you mentioned SO as an example so I include that possibility here)
It is a resource in its own right with interesting behaviour around the votes
maintain a correct count of the votes
prevent multiple up votes / down votes
It can be modelled easily with REST.
I can POST/PUT a new vote, DELETE a previous vote, check my votes with a qualified GET.
The system can ensure that I only vote once - something which would not be easy to do if a simple counter was being maintained.
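A minimal in-memory sketch of votes as entities, keyed by voter and target. The `VoteStore` class and its method names are hypothetical. In a REST API each vote might live at a URL such as `/questions/{id}/votes/{voter}`, which is also an assumption rather than anything prescribed above:

```python
class VoteStore:
    """In-memory sketch of votes as resources keyed by (voter, target)."""

    def __init__(self):
        self._votes = {}  # (voter_id, target_id) -> +1 or -1

    def put_vote(self, voter_id, target_id, value):
        """PUT-like: creating or replacing the same vote is idempotent."""
        if value not in (1, -1):
            raise ValueError("value must be +1 or -1")
        self._votes[(voter_id, target_id)] = value

    def delete_vote(self, voter_id, target_id):
        """DELETE-like: removing a missing vote is a no-op."""
        self._votes.pop((voter_id, target_id), None)

    def score(self, target_id):
        """Net count of votes for one target."""
        return sum(v for (_, t), v in self._votes.items() if t == target_id)


store = VoteStore()
store.put_vote("alice", "q1", 1)
store.put_vote("alice", "q1", 1)   # repeated upvote does not double count
after_repeat = store.score("q1")
store.put_vote("bob", "q1", -1)
after_downvote = store.score("q1")
store.delete_vote("alice", "q1")
after_delete = store.score("q1")
print(after_repeat, after_downvote, after_delete)
```

Because the key includes the voter, "only vote once" falls out of the data structure instead of needing extra bookkeeping.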
2. Votes as an attribute
In this implementation, we model the vote as a counter. In this case we have to
Get the entire state of the thing being voted on - maximising the interface between client and server
Update the counter
Put back the updated state - oops, someone already updated the resource in the meantime!
The server now has no easy way to handle multiple votes from the same person without managing some state 'on the side'. We also have that 'lost update' problem.
Things quickly get complicated.
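The 'lost update' problem can be shown with a small simulation. The `Resource` class is a hypothetical stand-in for the server-side state. The version check is one common mitigation (in HTTP terms, roughly an ETag sent back with `If-Match`), offered here as an assumption rather than something the design above requires:

```python
class StaleWriteError(Exception):
    """Raised when a write is based on an outdated read."""


class Resource:
    """Hypothetical server-side resource holding a vote counter."""

    def __init__(self):
        self.version = 0
        self.votes = 0

    def get(self):
        return {"version": self.version, "votes": self.votes}

    def put(self, state, expected_version=None):
        if expected_version is not None and expected_version != self.version:
            raise StaleWriteError("resource changed since it was read")
        self.votes = state["votes"]
        self.version += 1


# Without a version check: both clients read 0 and both write 1.
naive = Resource()
a, b = naive.get(), naive.get()
naive.put({"votes": a["votes"] + 1})
naive.put({"votes": b["votes"] + 1})
lost = naive.votes  # two votes were cast, but only one is recorded

# With a version check: the second, stale write is rejected.
checked = Resource()
a, b = checked.get(), checked.get()
checked.put({"votes": a["votes"] + 1}, expected_version=a["version"])
try:
    checked.put({"votes": b["votes"] + 1}, expected_version=b["version"])
    rejected = False
except StaleWriteError:
    rejected = True
print(lost, rejected)
```

The rejected client then has to re-read and retry, which is part of the complexity this option brings.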
Final advice
The decision on how you model something should be driven by what you need the system to do.
There is often no correct decision, just the best compromise between effort and value.
Choose a design which most easily implements the most common Use Cases. Common things should be quick and simple to do, uncommon things need only be possible.
Chris
Let me define the problem first and why a message queue has been chosen. I have a data layer that will be transactional and EXTREMELY insert heavy, and rather than attempt to deal with these issues when they occur, I am hoping to implement my application from the ground up with this in mind.
I have decided to tackle this problem by using the Microsoft Message Queue and performing inserts asynchronously as time permits. However, I quickly ran into a problem. Certain inserts that I perform may need to be recalled (i.e., retrieved) immediately (imagine this is for a POS system and what happens if you need to recall the last transaction - one that still hasn’t been inserted).
The way I decided to tackle this problem is by abstracting the MessageQueue and combining it in my data access layer, thereby creating the illusion of a single set of data being returned to the user of the data layer (I have considered the other issues that occur in such a scenario (i.e., essentially dirty reads and such) and have concluded that for my purposes I can control these issues).
However, this is where things get a little nasty... I’ve worked out how to get the messages back and such (a trivial enough problem), but where I am stuck is: how do I create a generic (or at least somewhat generic) way of querying my message queue? One where I can minimize the duplication between the SQL queries and MessageQueue queries. I have considered using LINQ (but have very limited understanding of the technology) and have also attempted an implementation with Predicates, which so far is pretty smelly.
Are there any patterns for such a problem that I can utilize? Am I going about this the wrong way? Does anyone have any of their own ideas about how I can tackle this problem? Does anyone even understand what I am talking about? :-)
Any and ALL input would be highly appreciated and seriously considered…
Thanks again.
For anyone interested: I decided in the end to simply cache the transaction in another location and use the MSMQ as intended and described below.
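One way to picture that arrangement, as a rough sketch only: pending transactions are kept in a side cache until the queue consumer has inserted them, so a recall never has to search the queue. `BufferedWriter` is a hypothetical name, `queue.Queue` stands in for MSMQ, and a dictionary stands in for the SQL table:

```python
import queue


class BufferedWriter:
    """Sketch: enqueue inserts and keep a side cache of not-yet-persisted items."""

    def __init__(self):
        self._queue = queue.Queue()  # stand-in for MSMQ (strict FIFO use)
        self._pending = {}           # tx_id -> transaction awaiting insert
        self._database = {}          # stand-in for the SQL table

    def submit(self, tx_id, tx):
        """Cache the transaction, then queue it for asynchronous insert."""
        self._pending[tx_id] = tx
        self._queue.put((tx_id, tx))

    def drain_one(self):
        """Consumer side: take the oldest message and persist it."""
        tx_id, tx = self._queue.get_nowait()
        self._database[tx_id] = tx
        self._pending.pop(tx_id, None)

    def recall(self, tx_id):
        """Look in the pending cache first, then in the database."""
        if tx_id in self._pending:
            return self._pending[tx_id]
        return self._database.get(tx_id)


writer = BufferedWriter()
writer.submit("tx-1", {"total": 9.99})
before = writer.recall("tx-1")   # served from the cache, not the queue
writer.drain_one()
after = writer.recall("tx-1")    # now served from the database stand-in
print(before == after)
```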
If the queue has a large-ish number of messages on it, then enumerating those messages will become a serious bottleneck. MSMQ was designed for first-in-first-out kind of access and anything that doesn't follow that pattern can cause you lots of grief in terms of performance.
The answer depends greatly on the sort of queries you're going to be executing, but the answer may be some kind of NoSQL database (CouchDB, BerkeleyDB, etc.).
I wanted to use [StudyCompletionDate (0032,1050)] but since it is retired I would like to know the attribute to be used to determine whether a study is complete or is ready for Archival.
We are writing an archival solution for a PACS server. I would like to query the PACS server for the DICOM images that are marked for archival. I want to know if there is any flag that indicates that a DICOM image is marked for archival.
The preferred method for this type of operation would be through the use of the DICOM Instance Availability service, which is defined in DICOM Supplement 93. The beginning of the supplement describes several use cases similar to what you're discussing.
As far as just performing a DICOM C-FIND and determining the study status, there's no real method to find out what you're looking for. The Instance Availability tag only tells you whether the study is Online, Nearline, or Offline. To see if it's complete, you could monitor the Number of Study Related Instances tag to see if the number of instances is increasing. If it's been stable for a configured amount of time, you could assume the study is complete.
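The stability heuristic could be sketched like this, assuming the instance counts have already been collected by polling the PACS at intervals. The `is_study_stable` function and its inputs are hypothetical, and no DICOM library is used here:

```python
def is_study_stable(observations, quiet_seconds):
    """Decide whether the instance count has stopped changing.

    observations: list of (timestamp_seconds, instance_count), oldest first,
    as gathered from repeated queries (the polling itself is not shown).
    Returns True when the latest count has been unchanged for at least
    quiet_seconds.
    """
    if not observations:
        return False
    last_time, last_count = observations[-1]
    stable_since = last_time
    # Walk backwards until the count differs from the latest value.
    for timestamp, count in reversed(observations):
        if count != last_count:
            break
        stable_since = timestamp
    return last_time - stable_since >= quiet_seconds


polls = [(0, 10), (60, 12), (120, 12), (300, 12)]
stable_3_min = is_study_stable(polls, quiet_seconds=180)
stable_5_min = is_study_stable(polls, quiet_seconds=300)
print(stable_3_min, stable_5_min)
```

This only yields an assumption of completeness: a modality can still send more instances after any quiet period.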
I'm afraid Steve is correct. There really is no way to tell that a study has been completed externally using only DICOM data. The real solution is to build your system to not expect a 'done' state. DICOM assumes a study is never static. This is probably because a study is not a concrete thing. A study is inferred via the actual instances. If you make the same assumption (that studies are not static), you should be fine. Good luck on your data model! :)
Also, you could use a performed procedure step to know that the study has moved forward in workflow. The problem with that is that now you're talking HL7, and you're not talking to a PACS, you're talking to the RIS, which talks to the PACS. Maybe too indirect for a well-engineered solution. However, this does not tell you that a study is complete, it only tells you that it has moved. The problem with number of related instances is that it tells you how many images were shot. It tells you nothing about how many images actually exist (read: whether the tech deleted bad images or not).
Oh, and use Storage Commit for verifying archival.