How to implement publish/subscribe with DynamoDB? - amazon-dynamodb

I need to implement publish/subscribe with DynamoDB. Each of my cloud nodes should send events to all other nodes of my application that are connected to the same DynamoDB database. A typical use case is clearing caches when data in the database has changed.
Could DynamoDB Streams be a solution for this? It looks to me like a stream can be consumed only once, and not by every other node. Is this right?
Is there something like tailable cursor support in DynamoDB?
Are there any other features that I can use?

DynamoDB doesn't support Pub/Sub as a feature like you might see in Redis.
If you need cache functionality, you can look at DynamoDB Accelerator (DAX). In particular, check the DAX consistency best practices documentation.
You can also use a dedicated Pub/Sub service such as Simple Notification Service (SNS).
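For illustration, here is a minimal sketch of that approach with the AWS SDK for JavaScript v3. The topic ARN environment variable and the message shape are assumptions; each node would subscribe to the same topic (for example through its own SQS queue or HTTPS endpoint) and clear its cache when the event arrives.

import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({});

// Publish a cache-invalidation event to a shared SNS topic (hypothetical ARN).
export async function publishCacheInvalidation(tableName: string, key: string): Promise<void> {
  await sns.send(new PublishCommand({
    TopicArn: process.env.CACHE_EVENTS_TOPIC_ARN, // assumed to be configured for the deployment
    Message: JSON.stringify({ type: "CACHE_INVALIDATE", tableName, key }),
  }));
}

SNS fan-out means every subscribed node receives the event, which is exactly the "notify all other nodes" behavior DynamoDB itself does not provide.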

Related

Is a transaction really required in a distributed counter?

According to the Firestore documentation:
a transaction is a set of read and write operations on one or more documents.
Also:
Transactions will fail when the client is offline.
Now the limitation in Firestore is that:
In Cloud Firestore, you can only update a single document about once per second, which might be too low for some high-traffic applications.
So using Cloud Functions and running transactions to increment/decrement counters will fail when traffic is high.
So they suggest using the approach of distributed counters.
According to the distributed counter algorithm (sketched below):
create shards
choose a shard randomly
run a transaction to increment/decrement the counter
get all the shards and aggregate the result to show the value of the counter
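A minimal sketch of that algorithm with the Firestore web SDK (the collection names and shard count here are just examples) could look like this:

import { initializeApp } from "firebase/app";
import { getFirestore, doc, collection, getDocs, runTransaction } from "firebase/firestore";

const db = getFirestore(initializeApp({ /* your Firebase config */ }));
const NUM_SHARDS = 10;

// Increment one randomly chosen shard inside a transaction.
async function incrementCounter(counterId: string, delta = 1): Promise<void> {
  const shardId = Math.floor(Math.random() * NUM_SHARDS).toString();
  const shardRef = doc(db, "counters", counterId, "shards", shardId);
  await runTransaction(db, async (tx) => {
    const shard = await tx.get(shardRef);
    const current = shard.exists() ? (shard.data().count ?? 0) : 0;
    tx.set(shardRef, { count: current + delta });
  });
}

// Read all shards and aggregate them to get the counter's value.
async function getCount(counterId: string): Promise<number> {
  const shards = await getDocs(collection(db, "counters", counterId, "shards"));
  return shards.docs.reduce((sum, d) => sum + (d.data().count as number), 0);
}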
Scenario:
Consider you have a counter that is updated when a document is added, and that counter is displayed in the UI. Now, for good UX, I cannot block the UI when the network is offline. So I must allow creating/updating documents even when the client is offline and sync these changes once the client is online, so that everyone else listening to these changes receives the correct value of the counter.
Now transactions fail when the client is offline.
So my question, for the best user experience (even when offline), is:
Do you really require a transaction to increment a counter? I know transactions ensure that writes are atomic, either succeed or fail as a whole, and prevent partial writes. But what's the point when they fail offline? I was thinking maybe write the changes to the local cache and sync them once the network is back online.
Should this be done via client SDKs or via Cloud Functions?
Do you really require a transaction to increment a counter?
Definitely yes! Because we are creating apps that can be used in a multi-user environment, transactions are mandatory so that we can provide consistent data.
But what's the point when they fail offline?
When there is a loss of network connectivity (no network connection on the user's device), transactions are not supported for offline use. This is because a transaction absolutely requires round-trip communication with the server in order to ensure that the code inside the transaction completes successfully. So transactions can only execute while you are online.
Should this be done via client SDKs or via Cloud Functions?
Please note that the Firestore SDK for Android has a local cache that's enabled by default. According to the official documentation regarding Firestore offline persistence:
For Android and iOS, offline persistence is enabled by default. To disable persistence, set the PersistenceEnabled option to false.
So all read operations will come from the cache if there are no updates on the server. In other words, Firestore provides this feature to handle offline data.
You can also write a Cloud Function that increments the counter when a new document is added and decrements the counter when a document is deleted, as sketched below.
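A minimal sketch of such functions with the Node.js firebase-functions SDK (the "posts" collection and the "counters/posts" document are hypothetical names) could look like this:

import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();
const counterRef = admin.firestore().doc("counters/posts"); // hypothetical counter document

// Increment the counter whenever a new document is added to the collection.
export const onPostCreated = functions.firestore
  .document("posts/{postId}")
  .onCreate(() => counterRef.update({ count: admin.firestore.FieldValue.increment(1) }));

// Decrement the counter whenever a document is deleted from the collection.
export const onPostDeleted = functions.firestore
  .document("posts/{postId}")
  .onDelete(() => counterRef.update({ count: admin.firestore.FieldValue.increment(-1) }));

Because FieldValue.increment is applied server side, the counter document does not need to be read inside a transaction here.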
I also recommend you take a look at:
How to count the number of documents under a collection in Firestore?
You may also consider using the Firebase Realtime Database for that. Cloud Firestore and the Firebase Realtime Database work very well together.
Edit:
It allows one to upvote the answer even when the device is offline. After the network is back online, it syncs to the server and the counter is updated. Is there a way I can do this in Firestore when the device is offline?
This also happens by default. So if the user tries to add/delete documents while offline, every operation is added to a queue. Once the user regains the connection, every change that was made while offline will be updated on the Firebase servers. In other words, all pending writes will be committed on the server.
Cloud Functions are triggered only when the change is received, and that can only happen when the device is online.
Yes, that's correct. Once the device regains the network connection, the document is added to/deleted from the database, at which moment the function fires and increments/decrements the counter.
Edit2:
Suppose I have made around 100 operations offline, will that not put a load on the cloud functions when the device comes online? What's your thought on this?
When offline, pending writes that have not yet been synced to the server are held in a queue. If you do too many write operations without going online to sync them, that queue will grow fast, and it will slow down not only your write operations but also your read operations. So I suggest using this database mainly for its online capabilities.
Regarding Cloud Functions for those 100 offline operations, there will be no issues. Firebase servers work very well with concurrent operations.

AWS DynamoDB to Kinesis without DynamoDB Stream

We have a requirement to stream data from DynamoDB tables to a Kinesis stream for event monitoring. We started looking into DynamoDB Streams, but the issue is that DynamoDB Streams are not encrypted, and we can't have any unencrypted data anywhere in our solution.
What is another serverless approach to stream data from DynamoDB to Kinesis? I don't want to stand up a server to use the DynamoDB adapter.
Thanks
As of now (Sept. 2019), encryption at rest is supported for DynamoDB Streams.
DynamoDB encryption at rest provides an additional layer of data protection by securing your data in the encrypted table, including its primary key, local and global secondary indexes, streams, global tables, backups, and DynamoDB Accelerator (DAX) clusters whenever the data is stored in durable media.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html
If you wanted to use DynamoDB Streams, this is how you could do it:
Enable DynamoDB Streams, but set the view type to "Keys only". This mode includes only the key attributes of the modified item. Then set up a Lambda to trigger off this DynamoDB stream; it will receive batches of keys. You then code the Lambda to look up each key in your DynamoDB table and push the item into Kinesis.
It's not a perfect solution, because the data may have been updated again before the Lambda's get operation, but it's pretty good depending on the situation.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
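For illustration, a minimal sketch of the Lambda described above, in Node.js with the AWS SDK for JavaScript v3 (the table name, stream name, and choice of partition key are assumptions):

import { DynamoDBStreamEvent } from "aws-lambda";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";

const ddb = new DynamoDBClient({});
const kinesis = new KinesisClient({});

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const keys = record.dynamodb?.Keys;
    if (!keys) continue;

    // The "Keys only" stream record carries just the keys, so read the current item from the table.
    const result = await ddb.send(new GetItemCommand({
      TableName: "my-table",   // hypothetical table name
      Key: keys as any,        // the Lambda event and SDK AttributeValue types differ only nominally
    }));
    if (!result.Item) continue; // the item may have been deleted in the meantime

    await kinesis.send(new PutRecordCommand({
      StreamName: "my-event-stream", // hypothetical Kinesis stream name
      PartitionKey: JSON.stringify(keys),
      Data: Buffer.from(JSON.stringify(result.Item)),
    }));
  }
};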
If you didn't want to use DynamoDB Streams, you would either have to have your client application also push the data to Kinesis, or, if you can't change the client application, not let anyone talk directly to DynamoDB and instead have them call a Lambda synchronously, where that Lambda makes the DynamoDB and Kinesis calls for you.
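A rough sketch of that synchronous Lambda (names and the event shape are again assumptions): it performs the write into DynamoDB and then pushes the same payload to Kinesis, so clients never talk to DynamoDB directly.

import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";
import { marshall } from "@aws-sdk/util-dynamodb";

const ddb = new DynamoDBClient({});
const kinesis = new KinesisClient({});

// Hypothetical synchronous entry point: clients invoke this Lambda instead of writing to DynamoDB.
export const handler = async (event: { id: string; payload: Record<string, unknown> }) => {
  const item = { id: event.id, ...event.payload };

  // Write the item to DynamoDB...
  await ddb.send(new PutItemCommand({
    TableName: "my-table",          // hypothetical table name
    Item: marshall(item),
  }));

  // ...and push the same data to Kinesis for event monitoring.
  await kinesis.send(new PutRecordCommand({
    StreamName: "my-event-stream",  // hypothetical stream name
    PartitionKey: event.id,
    Data: Buffer.from(JSON.stringify(item)),
  }));

  return { ok: true };
};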

How to consume changes to Google Cloud Datastore as a stream?

The Cloud Dataflow page implies that this would be possible, but I haven't found a way of observing change events in the Google Cloud Datastore docs. How is it done?
As far as I am aware, the integration of Cloud Datastore with Dataflow is through DatastoreIO (now based on DatastoreV1), which can only be used as a bounded source for batch jobs.
I have been trying to find an alternative solution that would allow you to use Datastore (directly or indirectly) as an unbounded source (for instance creating a Pub/Sub topic where Datastore changes are published and can be consumed from Dataflow), but I do not think that would be a viable solution given that, as you said, there is no easy way to detect changes (addition of entities, modification of entities, etc.) in Datastore.
For now, I have filed an internal request to improve the documentation to either modify the image so that it does not imply that Cloud Datastore can be used with a Streaming Pipeline, or clarify this use case.

Does the firestore sdk use a connected socket to make its requests or individual http requests?

I'm using the React Native Firebase SDK and am wondering how the underlying network calls are implemented. When making Firestore get queries, does the SDK just keep a socket open from initialization and make requests over that open socket, or does it make individual HTTP requests to an endpoint?
Specifically, I'm looking for an efficient way to get a batch of documents (profile thumbnail properties given a batch of profile ids), and I saw an answer saying that Firebase calls are pipelined, so issuing the gets in parallel is efficient. However, I'm not sure whether that applies to Firestore as well.
The Firestore SDK uses gRPC to communicate with the server. This is the same layer that many of Google's other Cloud products use under the hood. It is quite different from the Web Sockets communication layer that the Firebase Realtime Database relied on.
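As for fetching a batch of documents: a minimal sketch with React Native Firebase (the "profiles" collection name is hypothetical) chunks the ids, since 'in' queries accept at most 10 values, and runs the chunked queries in parallel over that single underlying gRPC channel:

import firestore from "@react-native-firebase/firestore";

// Fetch a batch of profile documents by id, 10 ids per query (the 'in' operator limit).
export async function getProfiles(ids: string[]) {
  const chunks: string[][] = [];
  for (let i = 0; i < ids.length; i += 10) chunks.push(ids.slice(i, i + 10));

  const snapshots = await Promise.all(
    chunks.map((chunk) =>
      firestore()
        .collection("profiles")
        .where(firestore.FieldPath.documentId(), "in", chunk)
        .get(),
    ),
  );
  return snapshots.flatMap((snap) => snap.docs.map((d) => ({ id: d.id, ...d.data() })));
}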
Digging into the code, it seems like the "real time" part of Firebase uses WebSockets. The database module also has a dependency on faye-websocket.

Consume DynamoDB streams in Apache Flink

Has anyone tried to consume DynamoDB streams in Apache Flink?
Flink has a Kinesis consumer, but I am looking for how I can consume a DynamoDB stream directly.
DataStream<String> kinesis = env.addSource(new FlinkKinesisConsumer<>(
    "kinesis_stream_name", new SimpleStringSchema(), consumerConfig));
I tried searching a lot but did not find anything. However, I found an open request pending on the Flink JIRA board, so I guess this option is not available yet? What alternatives do I have?
Allow FlinkKinesisConsumer to adapt for AWS DynamoDB Streams
UPDATED ANSWER - 2019
The FlinkKinesisConsumer connector can now process a DynamoDB stream, now that this JIRA ticket has been implemented.
UPDATED ANSWER
It seems that Apache Flink does not use the DynamoDB Streams connector adapter, so it can read data from Kinesis, but it can't read data from DynamoDB Streams directly.
I think one option could be to implement an app that writes data from DynamoDB Streams to Kinesis, and then read the data from Kinesis in Apache Flink and process it.
Another option would be to implement a custom DynamoDB Streams connector for Apache Flink. You could use the existing Kinesis connector as a starting point.
You could also take a look at the Apache Spark Kinesis connector, but it seems to have the same issue.
ORIGINAL ANSWER
DynamoDB has a Kinesis adapter that allows you to consume a stream of DynamoDB updates using the Kinesis Client Library. Using the Kinesis adapter is the recommended way (according to AWS) of consuming updates from DynamoDB. This will give you the same data as using the DynamoDB stream directly (also called the DynamoDB low-level API).