How to handle DynamoDB Global streams - amazon-dynamodb

I am looking to create a DynamoDB global table for storing customer information. The problem is that my current pattern is to listen for changes on this table and send email updates using Lambda triggers,
e.g. "Your profile information was changed. If this was not you..."
Do I now need to have that Lambda in each region, and will data replication mean that it is triggered in each region?

I think you might have misunderstood how streams work here.
Global Tables need streams enabled on the table in order to replicate between regions. You can check the requirements and how it works:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
If you have a trigger, you can have it in just one region. Whichever region has the Lambda associated with the trigger will be notified of the updates.
The benefit you get from the global table is that if any region updates the data, the Lambda in the region you configured will be triggered. Only one trigger will be sent to the Lambda.
If you create triggers in multiple regions, you need to make your Lambda idempotent, i.e. if the same data is delivered any number of times, it performs the operation only once.
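As a rough illustration of the idempotency point, here is a minimal Python sketch of a stream handler that skips changes it has already handled. The names `_processed`, `change_fingerprint` and `send_email` are hypothetical, and the in-memory set only stands in for a durable store shared by all regions (for example a table written with a conditional put). The sketch also assumes the replicated item image is identical in every region, so it fingerprints the item key and new image rather than relying on a per-stream event ID.

```python
import hashlib
import json

# Hypothetical stand-in for a durable record of already-handled changes.
# A real deployment would need a store shared across regions (assumption).
_processed = set()


def change_fingerprint(record):
    """Build a region-independent key from the item key and its new image."""
    body = record["dynamodb"]
    payload = json.dumps(
        {"keys": body.get("Keys"), "new": body.get("NewImage")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def send_email(record):
    """Placeholder for the real notification call (hypothetical)."""
    return "notified:" + json.dumps(record["dynamodb"]["Keys"], sort_keys=True)


def handler(event, context=None):
    """Process stream records, skipping any change that was already handled."""
    sent = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue
        key = change_fingerprint(record)
        if key in _processed:
            continue  # same change delivered again, e.g. via another region
        _processed.add(key)
        sent.append(send_email(record))
    return sent
```

Calling `handler` twice with the same event sends one notification the first time and none the second.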
Hope it helps.

Related

Firestore: Maintaining the count of a collection. Trigger function vs transaction

Let's say I have a collection called persons and another collection called cities with a field population. When a Person is created in a City, I would like to increment the population field in the corresponding city.
I have two options.
Create an onCreate trigger function. Find the city document and increment using FieldValue.increment(1).
Create an HTTPS callable cloud function to create the person. The cloud function executes a transaction in which the person is created and the population is incremented.
The first one is simpler and I am using it right now. But, I am wondering if there could be cases where the onCreate is not called due to some glitch...
I am thinking of moving to the second option. I am wondering if there are any disadvantages. Does HTTPS callable function cost more?
The only problem I see with HTTPS callables is that if something fails you need to handle it on the client side. That would be (at least for me) a little too much logic for the client.
What I can recommend after almost 4 years of experience with exactly this problem is a solution with a virtual queue. I had a long discussion on that topic here, and even with the Firebase people at the last in-person Google I/O and Firebase Summit.
Our problem was that those glitches did occur, and even when they happened only sometimes, the changes and transactions failed due to too many requests. After trying every official recommendation, such as sharded counters, we ended up creating a virtual queue: each onCreate adds an entry to a Firestore or Realtime Database list/collection, and another function runs either on a cron schedule or from another trigger (it doesn't matter which). That Cloud Function handles the entries in the queue one by one and starts again for each of them, to avoid timeouts and memory limits. We made sure a single function invocation is enough to handle one entry's calculation.
This method was the only bulletproof one that could handle thousands of new entries per second without issues. The only downside is that it takes more time than a regular trigger, because the entries are processed one by one. If your calculations are small, you can do them in batches (that is how we started, too).
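The queue idea described above can be sketched in plain Python. This is only an in-memory simulation, not Firebase code: `queue` and `cities` stand in for the queue collection and the cities collection, and `on_person_created`, `process_next` and `drain` are hypothetical names for the trigger, the single-entry worker and the scheduler that keeps re-invoking it.

```python
from collections import deque

# In-memory stand-ins for the queue collection and the cities collection (assumption).
queue = deque()
cities = {"berlin": {"population": 0}}


def on_person_created(person):
    """Trigger side: record the work to be done instead of doing it."""
    queue.append({"city": person["city"], "delta": 1})


def process_next():
    """Worker side: handle exactly one queue entry per invocation."""
    if not queue:
        return False
    entry = queue.popleft()
    cities[entry["city"]]["population"] += entry["delta"]
    return True


def drain():
    """Simulate the scheduler re-invoking the worker until the queue is empty."""
    handled = 0
    while process_next():
        handled += 1
    return handled
```

Because only the worker touches the counter, a burst of creations turns into a burst of cheap queue appends rather than contended writes to one city document.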

How to use DynamoDB streams to maintain duplicated data consistency?

From what I understand, one of the use cases of DynamoDB Streams is to maintain/update duplicated data.
Let's say I have a User object, and its name attribute is replicated in many Invoice objects.
When a User updates their name, a Lambda using DynamoDB Streams will update all Invoices related to this user with the new name.
There could be thousands of Invoices related to this user, so this update could take a while, especially because I want to do a rate-limited batch_write so that the operation doesn't throttle my table.
The question is: how can my (web) application know that the Lambda has finished updating? For example, I want to show a loading screen to the client until the duplicated data has been updated, so that they don't see any outdated information in their browser.
Or are there other ways of quickly updating thousands of duplicated items?
Why aren't you capturing the output of the Lambda? You can make the Lambda return a successful status once all the updates have been persisted to DDB.
Alternatively, the Invoice can keep a reference to the User object instead of storing the name itself, and fetch the name when generating/printing.
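To illustrate the reference-based suggestion, here is a small Python sketch with in-memory data. The `User` and `Invoice` classes, the `users` dictionary and `render_invoice` are hypothetical names; in a real table the lookup would be a read by the user's key.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    name: str


@dataclass
class Invoice:
    invoice_id: str
    user_id: str  # reference only; the name is not copied onto the invoice
    total: float


users = {"u1": User("u1", "Alice")}
invoices = [Invoice("inv-1", "u1", 40.0), Invoice("inv-2", "u1", 12.5)]


def render_invoice(invoice):
    """Resolve the user's current name at display time."""
    user = users[invoice.user_id]
    return f"{invoice.invoice_id}: {user.name} owes {invoice.total:.2f}"
```

With this layout a rename is a single write to the user item, so there is nothing to wait for; the cost is one extra read whenever an invoice is rendered.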

What's the best way to generate ledger change Events that include the Transaction Command?

The goal is to generate events on every participating node when a state is changed that includes the business action that caused the change. In our case, Business Action maps to the Transaction command and provides the business intent or what the user is doing in business terms. So in our case, where we are modelling the lifecycle of a loan, an action might be to "Close" the loan.
We model Event at a state level as follows: Each Event encapsulates a Transaction Command and is uniquely identified by a (TxnHash, OutputIndex) and a created/consumed status.
We would prefer a polling mechanism to generate events on demand, but an async approach that generates events on ledger changes would be acceptable. Either way, our challenge is in getting the Command from the Transaction.
We considered querying the states using the Vault Query API: vaultQueryBy() for the polling solution, or vaultTrackBy() for the async Observable stream solution. We were able to create a flow that gets the transaction for a state. This had to be done in a flow, as Corda deprecated the function that would have allowed us to do this in our Spring Boot client. In the client we use vaultQueryBy() to get a list of states. Then we call a flow that iterates over the states, gets the txHash from each StateRef, and calls serviceHub.validatedTransactions.getTransaction(txHash) to get the SignedTransaction, from which we can ultimately retrieve the Command. Is this the best or recommended approach?
Alternatively, we have also thought of generating events of the Transaction by querying for transactions and then building the Event for each input and output state in the transaction. If we go this route what's the best way to query transactions from the vault? Is there an Observable Stream-based option?
I assume this mapping of states to command is a common requirement for observers of the ledger because it is standard to drive contract logic off the transaction command and quite natural to have the command map to the user intent.
What is the best way to generate events that encapsulate the transaction command for each state created or consumed on the ledger?
If I understand correctly, you're attempting to get notified when certain types of ledger updates occur (open, approved, closed, etc.).
First: Asynchronous notifications are best practice in Corda. Polling should be avoided because constant querying puts extra load on the node and introduces delays. Corda provides several Observable-based mechanisms which you can use: https://docs.corda.net/api/kotlin/corda/net.corda.core.messaging/-corda-r-p-c-ops/vault-track-by.html
Second: Avoid querying transactions from the database as these are intended to be internal to the node. See this answer for background on why to avoid transaction querying. In general only tables that begin with "VAULT_*" are intended to be queried.
One way to solve your use case would be a "status" field which reflects the command that was used to produce the current state. For example: if a "Close" command was used to produce the state, its status field could be "closed". This way you could use the above vaultTrackBy to look at each state's status field and infer the action that occurred.
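The status-field idea can be shown with a language-neutral sketch in plain Python. It does not use any Corda API: `LoanState`, `COMMAND_TO_STATUS`, `apply_command` and `event_for` are all hypothetical names, and in a real CorDapp the status would be set where the transaction is built, with `event_for` called from the vault observer.

```python
from dataclasses import dataclass

# Hypothetical mapping from the command that produced a state to its status.
COMMAND_TO_STATUS = {"Open": "open", "Approve": "approved", "Close": "closed"}
STATUS_TO_COMMAND = {status: cmd for cmd, status in COMMAND_TO_STATUS.items()}


@dataclass(frozen=True)
class LoanState:
    loan_id: str
    status: str


def apply_command(state, command):
    """Produce the next state, stamping it with the status for this command."""
    return LoanState(state.loan_id, COMMAND_TO_STATUS[command])


def event_for(produced):
    """Infer the business action from a newly produced state."""
    return {"loan_id": produced.loan_id, "action": STATUS_TO_COMMAND[produced.status]}
```

This only works while each command maps to exactly one status; if two commands can produce the same status, the state would have to carry the command name itself.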
Just to finish up on my comment: while the approach met the requirements, the problem with this solution is that we have to add and maintain our own code across all relevant states to capture transaction-level information that is already tracked by the platform. I would think a better solution would be for the platform to give consumers access to transaction-level information (selectively, perhaps), just as it does for states. After all, the transaction is, in part, a business/functional construct that is meaningful at the client application level. For example, if I am "transferring" a loan, that may be a complex business transaction that involves many input and output states and may be an important construct for the client application to manage.

Fetch new entities only

I thought Datastore's key was ordered by insertion date, but apparently I was wrong. I need to periodically look for new entities in the Datastore, fetch them and process them.
Until now, I would simply store the last fetched key and wrongly query for anything greater than it.
Is there a way of doing so?
Thanks in advance.
Datastore's automatically generated keys are uniformly distributed in order to keep access performant. You will not be able to tell which entities were added last from their keys.
Instead, you can try a couple of different approaches.
Use Pub/Sub and architect your app so that a background task consumes the newly added entities. Whenever you add an entity to the DB, publish an event to Pub/Sub with the key ID. Your event listener (a separate routine) will receive it.
Use key names and generate your own custom names. But since you want sequentially increasing names, this will cause a performance hit even on fairly small ranges of data. You can find more about this in the Datastore best practices.
https://cloud.google.com/datastore/docs/best-practices#keys
Add a creation time property to each entity and keep using automatic key generation.
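The creation-time approach amounts to keeping a timestamp cursor instead of a key cursor. Below is a minimal in-memory Python sketch of that polling loop; `entities`, `add_entity` and `fetch_new` are hypothetical names, and a real implementation would run the equivalent filter and ordering as a Datastore query on the creation time property.

```python
from datetime import datetime, timedelta, timezone

# In-memory stand-in for stored entities (assumption); keys are deliberately unordered.
entities = []


def add_entity(key, payload, now):
    """Store an entity together with its creation time."""
    entities.append({"key": key, "payload": payload, "created": now})


def fetch_new(since):
    """Return entities created strictly after `since`, oldest first, plus the new cursor."""
    fresh = sorted(
        (e for e in entities if e["created"] > since),
        key=lambda e: e["created"],
    )
    cursor = fresh[-1]["created"] if fresh else since
    return fresh, cursor


# Usage: poll once from a starting point, then keep the returned cursor.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
add_entity("zz9", "first", start + timedelta(seconds=1))
add_entity("aa1", "second", start + timedelta(seconds=2))
batch, cursor = fetch_new(start)
```

Note that a strict "greater than" comparison can miss an entity whose timestamp equals the cursor, so a real poller may want to overlap slightly and skip entities it has already processed.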

Use transaction to update value at two different nodes

I have two different nodes in the database:
all posts
users
As per the fan-out model, when a user adds a post, it is written to both all posts and users/uid/posts.
Each post has a like button which displays the number of likes.
When a user clicks on it, the like count should increase by 1.
According to the docs, we use a transaction for this kind of operation.
But the problem with a transaction is that it updates only one node, as far as I know.
So how do I apply this update to both of the nodes mentioned above?
Should I use the update method?
Is there a way to use a transaction that updates both nodes?
You can push all your logic for updating the database onto the server side with Cloud Functions for Firebase. You can use a database trigger to respond to data being written in the database, then execute some JavaScript to make sure the fan-out finishes correctly. It has the advantage of making sure all the changes happen without depending on the client.
Transactions can't modify data at two different locations at once, but you will still probably want to use them in your client and Cloud Functions to make sure concurrent writes will not have problems.
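The split described above (a transaction on one location, a server-side trigger copying the result to the other) can be simulated in plain Python. This is not Firebase code: `db` is an in-memory stand-in for the database tree, and the path names (`all-posts/...`, `users/.../posts/...`) and the functions `like_transaction`, `on_likes_written` and `like` are hypothetical.

```python
# In-memory stand-in for the database tree, keyed by path (assumption).
db = {
    "all-posts/p1/likes": 0,
    "users/u1/posts/p1/likes": 0,
}


def like_transaction(post_id):
    """Client side: increment the canonical counter (stands in for a transaction)."""
    path = f"all-posts/{post_id}/likes"
    db[path] = db.get(path, 0) + 1
    return db[path]


def on_likes_written(post_id, author_uid, likes):
    """Server side: copy the committed count to the per-user copy of the post."""
    db[f"users/{author_uid}/posts/{post_id}/likes"] = likes


def like(post_id, author_uid):
    """Simulate the full flow: the transaction commits, then the trigger fires."""
    on_likes_written(post_id, author_uid, like_transaction(post_id))
```

Only the canonical counter is ever incremented; the copy is overwritten with the committed value, so running the trigger again with the same value leaves the data unchanged.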
