Chicken or egg in functions/actors processing WorldState - functional-programming

I have read the series "Purely Functional Retrogames"
http://prog21.dadgum.com/23.html
It discusses some interesting techniques to build a (semi-)pure game world update loop.
However, I have the following side remark, that I cannot seem to get my head around:
Suppose you have a system where every enemy and every player is a separate actor, or a separate pure function.
Suppose they all get a "WorldState" as input and output a new WorldState (or, if you're thinking in actor terms, send the new WorldState to the next actor, ending with, for instance, a "Game Render" actor).
Then, there's two ways to go with this:
Either you start with one actor (e.g. the player) and feed it the current world.
Then you feed the new world to the next enemy, and so on, until all actors have transformed the world. The last world is the new world you can feed to the render loop (or, if you followed the article above, you end up with a list of events that have occurred in the world, which can then be processed).
The second way is to give all actors the current WorldState at the same time. They generate changes, which may conflict (for instance, two enemies and the player could take the same coin in the same animation frame), and it is up to the game system to resolve these conflicts by processing the events. By processing all events, the Game actor creates the new world to be used in the next update frame.
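Here is a minimal sketch of the two styles as I understand them, assuming each actor is a pure function (the names WorldState, resolve and the actor signatures are placeholders, not from the article):

    # Option 1: thread the world through each actor in turn.
    from functools import reduce

    def update_sequential(world, actors):
        # Each actor is a pure function: WorldState -> WorldState.
        return reduce(lambda w, actor: actor(w), actors, world)

    # Option 2: every actor sees the same frame and emits events;
    # a single resolve step decides conflicts and builds the next world.
    def update_with_events(world, actors, resolve):
        # Each actor is a pure function: WorldState -> list of events,
        # so the actors can run in parallel over the same immutable world.
        events = [e for actor in actors for e in actor(world)]
        return resolve(world, events)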
I have a feeling I'm just confronted with the exact same "race condition" problem I wished to avoid by using pure functions with immutable data.
Any advice here?

I didn't read the article, but with the coin example you are creating a kind of global variable: you give a copy of the world state to all actors, and each actor evaluates the game, makes a decision and expects its action to succeed, regardless of the final phase, which is the conflict resolution. I would not call this a race condition but rather a "blind condition", and yes, this will not work.
I imagine you came to this solution in order to allow parallelism, which is not available in solution 1. In my opinion, the problem is about responsibility.
The coin must belong to an actor (a server acting as a resource manager), like anything else in the application. This actor is the only one responsible for deciding what happens to the coin.
All requests (is there something to grab, grab it, drop something...) should be sent to this actor (one actor per cell, or per map, level or any split that makes sense for the game).
How you manage it is then up to you: serve all requests in receive order, or buffer them until a synchronization message arrives and make a random or priority-based decision... In any case, the server will be able to reply to all actors with success or failure without any risk of a race condition, since the server process runs (at least in Erlang) on a single core and processes one message at a time.
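A rough sketch of that idea (not Erlang, just an illustration; the message shapes and reply callbacks are invented): a coin-owner process drains its mailbox one message at a time, so the first grab request wins and every later one is refused.

    import queue, threading

    def coin_server(mailbox):
        taken_by = None
        while True:
            msg = mailbox.get()                 # one message at a time, in receive order
            if msg is None:                     # shutdown signal
                return
            kind, actor_id, reply = msg
            if kind == "grab":
                if taken_by is None:
                    taken_by = actor_id
                    reply("success")            # the first request wins the coin
                else:
                    reply("failure")            # later requests are told the coin is gone

    mailbox = queue.Queue()
    server = threading.Thread(target=coin_server, args=(mailbox,))
    server.start()

    # Two actors race for the same coin; the owner decides, with no shared mutable state.
    mailbox.put(("grab", "player", lambda r: print("player:", r)))
    mailbox.put(("grab", "enemy-1", lambda r: print("enemy-1:", r)))
    mailbox.put(None)
    server.join()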

In addition to Pascal's answer, you can also parallelize by splitting the (I assume huge) map into smaller chunks, each of which depends on the last state (or part of it, such as an edge) of its neighbours. This allows you to distribute the game across many nodes.

Related

Functional approach to the 'Account Balance' problem?

I know very little about functional programming but can see its appeal; I wonder, though, whether in practice you really want to "build the world from scratch" every time you need an account balance.
I understand that the way to model an account balance is to store each transaction (deposits and withdrawals) immutably, and then implement a balance function = sum(deposits) - sum(withdrawals). But I'm wondering if this is the way you would model it in practice, say on a website that displays a customer's bank balance.
Would it perhaps make sense to save a checkpoint (e.g. the balance at every nth transaction), and then when somebody wants a balance, model the function as balance = checkpointBalance[checkpointBalance.length-1] + sum(deposits).where(index>checkpointIndex) - sum(withdrawals).where(index>checkpointIndex)? (I guess the idea would be to set n to a number that appropriately balances the storage cost of checkpoints against the cost/speed of summing n transactions.) Or something along those lines? Or does that violate the spirit of functional programming?
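Something like this, as a sketch of what I mean (the data layout is made up: transactions are signed amounts, and checkpoints are (last index included, balance up to that index) pairs):

    def balance(transactions, checkpoints):
        # transactions: signed amounts (deposits > 0, withdrawals < 0), oldest first.
        # checkpoints: list of (last_index_included, balance_up_to_that_index).
        last_index, base = checkpoints[-1] if checkpoints else (-1, 0)
        return base + sum(transactions[last_index + 1:])

    def add_checkpoint(transactions, checkpoints):
        # Append a checkpoint covering everything seen so far, e.g. every nth transaction.
        checkpoints.append((len(transactions) - 1, balance(transactions, checkpoints)))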
There's more than one way to do that. In a traditional bank account, the aggregated value (the balance) is recalculated for every transaction.
In other systems, the history may be sufficiently shallow that you can easily recalculate it from scratch whenever you need it.
If you have an aggregate with a particularly long and convoluted history, you can do as you suggest. In Event Sourcing literature, this is typically called a Snapshot.
Find the most recent Snapshot and only replay the events that occurred after the Snapshot.
The beautiful thing about an immutable append-only history is that you can always add the Snapshots as an afterthought. You don't have to design that optimization in from the start. You can even have a background worker that asynchronously creates Snapshots if required.
Now, granted, saving a Snapshot somewhere is a side effect, but so is saving each of the events as they occur. Even in FP, you can't avoid some side effects. The trick is to minimise them.
You can consider Snapshots as essentially persisted memoisation. And again, you might argue that memoisation is impure (because it involves the ostensible side effect of mutating some internal map of inputs to values), but even if impure, a memoised function would still be referentially transparent because it would always return the same result for the same input with no observable side effects.

How do I guarantee task order processing for a queue with multiple consumers in RabbitMQ?

Say I want to start friendship between A and B.
Say I want to end friendship between A and B.
Those are two tasks I want to send to a queue having multiple consumers (workers).
I want to guarantee processing order, so how do I prevent the second task from being performed before the first?
My solution: make tasks sticky (tasks about A are always sent to the same consumer).
Implementation: use RabbitMQ's exchanges and map tasks to the available consumers.
How do I map A to its consumer? I'm thinking about nginx's ip_hash. I think I need something similar.
I don't know if it is relevant but A and B are uuid.v4() UUIDs.
Can you point me out to the algorithm I need to accomplish mapping, please?
Well, there are two options:
Make one exchange/queue for all events and guarantee that they are inserted in the proper order, and create a single worker for it. This costs more when inserting data (and doesn't give you the option of scaling out).
Prepare your app for such situations: e.g. when you get a destroyFriendship message and the friendship does not exist, save a message to the db recording the future ending of the friendship. Then you can have multiple workers making and destroying friendships and do not have to care about the order. Simply do your job, make the friendship, and if there's a row in the db about ending that friendship, destroy it (or simply don't create it). Of course, you need to compare the creation and destruction timestamps and check that the destruction happened after the creation!
Of course you could compute some hash of A/B, but IMO that would be more costly than preparing the app. Scaling the app using exchanges/queues is not really good: you keep creating more and more queues and end up with too many queues/exchanges in RabbitMQ.
If you have to use the solution you specified, you can, for example, compute a crc32 from A and B and use its value to calculate which queue the task should be sent to. But having multiple consumers can still go wrong here: what if one of the consumers is somehow blocked and another receives the message destroying the friendship? With this solution, I'd say it's dangerous to have more than one worker per group of A/B.
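A sketch of that crc32 routing (the queue-naming scheme is made up; the point is that the same A/B pair always maps to the same queue, and therefore the same consumer):

    import uuid, zlib

    def queue_for(a, b, num_queues):
        # Sort the pair so (A, B) and (B, A) hash to the same queue.
        key = "|".join(sorted((a, b))).encode()
        return "friendship-tasks-{}".format(zlib.crc32(key) % num_queues)

    a, b = str(uuid.uuid4()), str(uuid.uuid4())
    print(queue_for(a, b, 8))   # e.g. friendship-tasks-3
    print(queue_for(b, a, 8))   # same queue, regardless of argument order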

Blackheath's "Functional reactive programming" book, 2.6.3 section clarification

The section is about the merge operation in FRP stream processing (the Sodium library is used). The book shows a diagram of combining streams and says that when an event enters the FRP logic through a stream, it causes a cascade of state changes that happen in a transactional context, so all changes are atomic.
The event streams sDeselect and sSelect (note the two events, "+" and "-") originate from UI controls; since they happen within the same FRP transaction, the events they carry are considered simultaneous. Then the book says:
The merge implementation has to store the events in temporary storage
until a time when it knows it won’t receive any more input. Then it
outputs an event: if it received more than one, it uses the supplied
function to combine them; otherwise, it outputs the one event it
received.
Question: when is the moment at which "no more input will come"? How does the merge function know this moment? Is it simply when it gets a value from the second incoming stream in the given diagram, or am I missing something? Can you illustrate it with a better stream example?
The way Sodium does this is to assign rank numbers to the structure of the directed graph of FRP logic held in memory, in such a way that if B depends on A, then B's rank will be higher than A's. (Cycles are broken in the graph traversal that assigns these ranks.) These numbers are then used as the priorities in a priority queue with low rank values processed first.
During event processing, when the priority queue contains nothing lower than the rank of the merge, it is known that there can be no more input data for the merge, and it outputs a value.
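An illustration of that mechanism (this is not Sodium's actual code; the Merge class and run_transaction are invented to show the idea): the merge buffers whatever it receives and schedules itself at its own, higher rank, so by the time it is popped from the priority queue, nothing lower-ranked remains and it can safely combine and output.

    import heapq
    from itertools import count

    class Merge:
        def __init__(self, rank, combine):
            self.rank, self.combine = rank, combine
            self.buffer, self.scheduled = [], False

        def receive(self, value):
            self.buffer.append(value)          # hold inputs in temporary storage

        def fire(self):
            out = self.buffer[0] if len(self.buffer) == 1 else self.combine(*self.buffer)
            self.buffer, self.scheduled = [], False
            return out

    def run_transaction(inputs, merge):
        # inputs: (rank, value) pairs from upstream streams such as sSelect/sDeselect;
        # the merge's rank is higher than both, because it depends on them.
        seq = count()                          # tie-breaker so equal ranks never compare payloads
        heap = [(rank, next(seq), value) for rank, value in inputs]
        heapq.heapify(heap)
        while heap:
            rank, _, item = heapq.heappop(heap)
            if item is merge:                  # nothing lower-ranked is left: safe to output
                return merge.fire()
            merge.receive(item)
            if not merge.scheduled:            # schedule the merge once, at its own rank
                merge.scheduled = True
                heapq.heappush(heap, (merge.rank, next(seq), merge))

    m = Merge(rank=2, combine=lambda a, b: a + b)
    print(run_transaction([(0, "+"), (1, "-")], m))   # prints "+-": both inputs were combined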

How the chances of getting "read-your-writes" consistency are increased in Dynamo?

In Section 5 of Dynamo paper, there is the following content:
In particular, since each write usually follows a read operation, the
coordinator for a write is chosen to be the node that replied fastest to the
previous read operation which is stored in the context information of the
request. This optimization enables us to pick the node that has the data that
was read by the preceding read operation thereby increasing the chances of
getting "read-your-writes" consistency.
How are the chances of getting "read-your-writes" consistency increased?
"read-your-writes" means that a read following a write gets the value set by the
write. The read and the write are performed by two different clients for this
context. The reason is that the choice of the write coordinator does not impact
on the chances of getting "read-your-writes" by the same client.
But the text above is talking about a write following a read. Here is my guess. The read coordinator will try to do syntactic reconciliation if possible. If syntactic reconciliation is impossible because of divergent versions, the client needs to do semantic reconciliation before writing. Either way, the versions on all the nodes involved in the read operation are ancestors of the reconciled version, so the following write can be sent to any of them to get applied. The earliest time at which a write can be seen by a read is after the following steps are finished:
The client contacts the write coordinator.
The write coordinator generates the version clock for the new version.
The write coordinator writes the new version locally.
The shorter the time needed to perform the above steps, the more likely a following read will see the new version. Since it is very likely that the node which replied fastest to the previous read can also perform these steps in a shorter time, such a node is chosen as the write coordinator.
Section 2.3 talks about performing the reconciliation at read time rather than write time.
Data versioning - "One can determine whether two versions of an
object are on parallel branches or have a causal ordering, by
examine their vector clocks."
This paragraph is from section 4 [emphasis mine]:
In Dynamo, when a client wishes to update an object, it must specify
which version it is updating. This is done by passing the context it
obtained from an earlier read operation, which contains the vector
clock information. Upon processing a read request, if Dynamo has
access to multiple branches that cannot be syntactically reconciled,
it will return all the objects at the leaves, with the corresponding
version information in the context. An update using this context is
considered to have reconciled the divergent versions and the
branches are collapsed into a single new version
So by performing the read first, you're effectively reconciling all divergent versions prior to writing. By writing to that same node, the version you've updated is marked with the context and vector clock of the most up-to-date version, and all divergent branches can be collapsed. This is sent to the top N nodes (as you've stated) as fast as possible. But by removing the divergent branches, you reduce the chance that multiple values could be returned. You only need one of the nodes read in the next quorum of R reads to hold the reconciled write, i.e. that node says "I am the reconciled version, and all others must bow to me". (And if it has already been distributed to another of the R nodes, then there's an even greater chance of getting the reconciled version in the quorum.)
But if you wrote to a different node, one that you hadn't read from, the vector clock being updated may not be a reconciled version of the object. Therefore, you could still have divergent branches. The following read will try to reconcile them, but it's more probable that you end up with multiple divergent values and no reconciliation.
If you've made it this far, I think the most interesting part is that per Section 6, client applications can dictate the values of N, R and W - ie - number of nodes that constitute the pool to draw from, and the number of nodes that must agree on a read or write for it to be successful.
Geez - my head hurts now.
I re-read the Dynamo paper and have a new understanding of "read-your-writes" consistency. "read-your-writes" involves only one client. Imagine the following requests performed by one client on the same key:
read-1
write-1
read-2
"read-your-writes" means that read-2 sees write-1. The write coordinator has the best chance to have write-1. To ensure "read-your-writes", it is desired that the write coordinator replies fastest to read-2. It is highly possible that the node replies fastest to read-1 also reply fastest to read-2. So choose the node replies fastest to read-1 as the write coordinator.
And what is "the node that replied fastest to the previous read operation"? Such a node only makes sense if client-driven coordination is used. With server-side coordination, the coordinator node replies to the client and the other involved nodes reply to the coordinator node, so "replied fastest" is meaningless in that case.
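A sketch of the client-driven version of that idea (node.get/node.put are hypothetical stand-ins for the real client library; version reconciliation is omitted and the timing bookkeeping is simplified):

    import time

    def read_with_timing(key, replicas):
        # read-1: ask every replica, remember who answered quickest.
        replies = []
        for node in replicas:
            start = time.monotonic()
            value, context = node.get(key)             # hypothetical replica API
            replies.append((time.monotonic() - start, node, value, context))
        elapsed, fastest, value, context = min(replies, key=lambda r: r[0])
        return value, context, fastest

    def write(key, new_value, context, fastest):
        # write-1: route the write to the node that answered read-1 quickest.
        # That node now holds the new version, and it is also the node most
        # likely to answer read-2 quickest, which is what raises the odds of
        # "read-your-writes".
        fastest.put(key, new_value, context)           # hypothetical replica API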

Starting multiple orchestrations from parent orchestration and passing messages to them

I have a situation where a main orchestration is responsible for processing a convoy of messages. These messages belong to a set of customers; the orchestration reads the messages as they come in, and for each new customer id it finds, it spins up a new orchestration that is responsible for processing the messages of that particular customer. I have to preserve the order of messages as they come in, so each newly created orchestration should process the message it has and then wait for additional messages from the main orchestration.
I have tried different ways to tackle this but was not able to successfully implement it.
I would like to hear your opinions on how this could be done.
Thanks.
It sounds like what you want is a set of nested convoys. While it might be possible to get that working, it's going to... well, hurt. In particular, my first worry would be maintenance: any changes to the process would be a pain in the neck to make, and, much worse, deployment would really, really suck.
Personally, I would really try to find an alternative way to implement this and avoid the convoys if possible, but that would depend a lot on your specific scenario.
A few questions, if you don't mind:
What are your ordering requirements? For example, do you only need ordered processing for each customer on a single incoming batch, or across batches? If the latter, could you make do without the master orchestration and just force a single convoy'd instance per customer? Still not great, but would likely simplify things a lot.
What are your failure requirements with respect to ordering? Should it completely stop processing? Save the message and keep going? What about retries?
Is ordering based purely on the arrival time of the message? Is there anything in the message that you could use to force ordering internally instead of relying purely on the arrival time?
What does the processing of the individual messages do? Is the ordering requirement only there to ensure that certain preconditions are met when a specific message is processed (for example, the messages represent some tree structure that requires parents to be processed before children)?
I don't think you need a master orchestration to start up the sub-orchestrations. I am assuming you are not talking about the master orchestration implementing a convoy pattern. So, if that's the case, here's what I might do.
There is a brief example here on how to implement a singleton orchestration. This example shows you how to set up an orchestration that will only ever exist once. All the messages going to it will be lined up in order of receipt and processed one at a time. Your example differs in that you want this done per customer ID. This is pretty simple: promote the customer ID in the inbound message and add it to the correlation type. Now there will only ever be one instance of the orchestration per customer.
The problem with singletons is this: you have to kill them at some point or they will live forever as dehydrated orchestrations. So you need to have them end. You can do this if there is a way for the last message for a given customer to signal the orchestration that it's time to die, through an attribute or such. If this is not possible, then you need to set a timer: if no messages are received in x seconds, terminate the orchestration. This is all easy to do, but it can introduce Zombies. Zombies occur when an orchestration is in the process of being shut down just as another message for that customer comes in. This can usually be solved by tweaking the time to wait. Regardless, it will cause the occasional Zombie.
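Outside of BizTalk terms, the per-customer singleton idea boils down to something like this sketch (the dispatcher, worker and shutdown signal are invented for illustration): one serial worker per promoted customer ID, so messages for the same customer are handled one at a time, in arrival order, and the worker ends when its termination signal arrives.

    import queue, threading

    workers = {}

    def customer_worker(customer_id, mailbox):
        while True:
            msg = mailbox.get()
            if msg is None:                     # the "last message" / timeout signal
                return                          # end the singleton so it doesn't live forever
            print("processing", msg, "for customer", customer_id)

    def dispatch(customer_id, msg):
        if customer_id not in workers:          # correlation: first message starts the instance
            mailbox = queue.Queue()
            worker = threading.Thread(target=customer_worker, args=(customer_id, mailbox))
            worker.start()
            workers[customer_id] = (mailbox, worker)
        workers[customer_id][0].put(msg)

    dispatch("C1", "update-1"); dispatch("C2", "update-1"); dispatch("C1", "update-2")
    for mailbox, worker in workers.values():
        mailbox.put(None)
        worker.join()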
A note from the field: we've done this, and it's really not a great long-term solution. We were receiving customer info updates and had to ensure ordered processing. We took this singleton approach, and it has been problematic because of the Zombie issue and the exception issue. If the singleton orchestration throws an exception, it will block the processing of all future messages for that customer. So: handle every single possible exception. The real solution would have been to have the far-end system check the timestamps on the update messages and discard any older than the last update. We wanted to go this way, but the receiving system didn't want to do the extra work.
