Has anyone tried to consume DynamoDB streams in Apache Flink?
Flink has a Kinesis consumer, but I am looking for how I can consume a DynamoDB stream directly.
Properties consumerConfig = new Properties();
consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
DataStream<String> kinesis = env.addSource(new FlinkKinesisConsumer<>(
    "kinesis_stream_name", new SimpleStringSchema(), consumerConfig));
I tried searching a lot but did not find anything. However, I found an open request pending on the Flink JIRA board, so I guess this option is not available yet? What alternatives do I have?
Allow FlinkKinesisConsumer to adapt for AWS DynamoDB Streams
UPDATED ANSWER - 2019
The FlinkKinesisConsumer connector can now process a DynamoDB stream, since this JIRA ticket has been implemented.
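If I read the change right, the support landed as a separate FlinkDynamoDBStreamsConsumer class in the flink-connector-kinesis module. A minimal sketch of how it would be wired up, assuming Flink 1.8+; the region and stream ARN are placeholders:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

public class DynamoDbStreamsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties consumerConfig = new Properties();
        consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1"); // placeholder region

        // The consumer takes the DynamoDB stream ARN where FlinkKinesisConsumer
        // would take a Kinesis stream name.
        DataStream<String> dynamoStream = env.addSource(new FlinkDynamoDBStreamsConsumer<>(
                "arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/label", // placeholder
                new SimpleStringSchema(),
                consumerConfig));

        dynamoStream.print();
        env.execute("dynamodb-streams-job");
    }
}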
UPDATED ANSWER
It seems that Apache Flink does not use the DynamoDB Streams Kinesis Adapter, so it can read data from Kinesis but not from DynamoDB.
I think one option could be to implement an app that writes data from DynamoDB Streams to Kinesis, and then to read and process that data from Kinesis in Apache Flink.
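A rough sketch of such a relay as a Java Lambda handler, assuming the aws-lambda-java-events library and the AWS SDK v2 Kinesis client; the class name and stream name are placeholders:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class DynamoToKinesisRelay implements RequestHandler<DynamodbEvent, Void> {
    private final KinesisClient kinesis = KinesisClient.create();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            // Forward each change record to Kinesis so Flink can consume it;
            // "flink-input-stream" is a placeholder name.
            kinesis.putRecord(PutRecordRequest.builder()
                    .streamName("flink-input-stream")
                    .partitionKey(record.getEventID())
                    .data(SdkBytes.fromUtf8String(record.getDynamodb().toString()))
                    .build());
        }
        return null;
    }
}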
Another option would be to implement a custom DynamoDB connector for Apache Flink. You can use an existing connector as a starting point.
You can also take a look at the Apache Spark Kinesis connector, but it seems to have the same issue.
ORIGINAL ANSWER
DynamoDB has a Kinesis adapter that allows you to consume a stream of DynamoDB updates using the Kinesis Client Library (KCL). Using the Kinesis adapter is the recommended way (according to AWS) of consuming updates from DynamoDB. This will give you the same data as using DynamoDB Streams directly (also called the DynamoDB low-level API).
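For reference, a sketch of that setup following the shape of the AWS StreamsAdapterDemo sample, assuming the dynamodb-streams-kinesis-adapter library and KCL v1 on the classpath; the application name, stream ARN, and the trivial record processor are all placeholders:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorkerFactory;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;

public class StreamsAdapterDemo {
    public static void main(String[] args) {
        // The adapter makes the DynamoDB stream look like a Kinesis stream to the KCL.
        AmazonDynamoDBStreamsAdapterClient adapterClient =
                new AmazonDynamoDBStreamsAdapterClient(
                        AmazonDynamoDBStreamsClientBuilder.defaultClient());

        // The DynamoDB stream ARN goes where a Kinesis stream name normally would.
        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                "streams-adapter-demo",                                            // placeholder app name
                "arn:aws:dynamodb:us-east-1:123456789012:table/my-table/stream/x", // placeholder ARN
                new DefaultAWSCredentialsProviderChain(),
                "worker-1")
                .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);

        // Trivial processor that just prints each update record.
        IRecordProcessorFactory factory = () -> new IRecordProcessor() {
            @Override public void initialize(InitializationInput input) { }
            @Override public void processRecords(ProcessRecordsInput input) {
                input.getRecords().forEach(System.out::println);
            }
            @Override public void shutdown(ShutdownInput input) { }
        };

        Worker worker = StreamsWorkerFactory.createDynamoDbStreamsWorker(
                factory, config, adapterClient,
                AmazonDynamoDBClientBuilder.defaultClient(),
                AmazonCloudWatchClientBuilder.defaultClient());
        worker.run();
    }
}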
Related
I need to implement publish/subscribe with DynamoDB. Each of my cloud nodes should send events to all other nodes of my application that are connected to the same DynamoDB database. A typical use case is clearing caches when data in the database has changed.
Can DynamoDB Streams be a solution for this? It looks to me as if the stream can be consumed only once, not from every node. Is this right?
Is there something like tailable-cursor support in DynamoDB?
Are there any other features that I can use?
DynamoDB doesn't support Pub/Sub as a feature like you might see in Redis.
If you need cache functionality, you can check out DynamoDB Accelerator (DAX). See specifically the consistency best-practices documentation.
You can also use a dedicated pub/sub service such as Amazon Simple Notification Service (SNS).
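For the cache-clearing use case, each node could subscribe to one SNS topic and the writer could publish an invalidation message after each change. A sketch assuming the AWS SDK v2; the topic ARN and message format are placeholders:

import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

public class CacheInvalidationPublisher {
    private final SnsClient sns = SnsClient.create();

    // Call this after each write; every node subscribes to the same topic
    // (e.g., via HTTPS or SQS) and clears its local cache on receipt.
    public void notifyDataChanged(String tableName, String key) {
        sns.publish(PublishRequest.builder()
                .topicArn("arn:aws:sns:us-east-1:123456789012:cache-invalidation") // placeholder
                .message("invalidate:" + tableName + ":" + key)
                .build());
    }
}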
I am trying to process DynamoDB changes using Kinesis Data Streams locally. I am using LocalStack to spin up both DynamoDB and Kinesis, which works as expected, but sometimes the LocalStack DynamoDB slows down. This is a known issue, as per this discussion on the LocalStack issue tracker:
https://github.com/localstack/localstack/issues/1205
So I am planning to use DynamoDB Local, which is an in-memory database. But when I try to enable the Kinesis Data Stream running on LocalStack, the in-memory database cannot communicate with the LocalStack Kinesis service and throws a resource-not-found exception.
Can somebody help me with this?
We have a requirement to stream data from DynamoDB tables to a Kinesis stream for event monitoring. We started looking into DynamoDB Streams, but the issue is that DynamoDB Streams are not encrypted, and we can't have any unencrypted data anywhere in our solution.
What other serverless approach is there to stream data from DynamoDB to Kinesis? I don't want to stand up a server to use the DynamoDB adapter.
Thanks
As of now (Sept. 2019), encryption at rest is supported for DynamoDB Streams:
DynamoDB encryption at rest provides an additional layer of data protection by securing your data in the encrypted table, including its primary key, local and global secondary indexes, streams, global tables, backups, and DynamoDB Accelerator (DAX) clusters whenever the data is stored in durable media.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html
If you wanted to use DynamoDB Streams, this is how you could do it:
Enable DynamoDB Streams, but set the view type to "Keys only". This mode gives you only the key attributes of the modified item. Then set up a Lambda to trigger off this DynamoDB stream; it will send batches of keys to your Lambda. You then code the Lambda to look up each key in your DynamoDB table and push the item into Kinesis (a sketch follows below).
It's not a perfect solution, because the data may have been updated again before the Lambda's get operation, but it's pretty good depending on the situation.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
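A hedged sketch of such a Lambda in Java, assuming the aws-lambda-java-events library and the AWS SDK v2; the table name, key attribute, and stream name are placeholders:

import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.GetItemResponse;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

public class KeysOnlyToKinesis implements RequestHandler<DynamodbEvent, Void> {
    private final DynamoDbClient dynamo = DynamoDbClient.create();
    private final KinesisClient kinesis = KinesisClient.create();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            // With a "Keys only" stream, the record carries just the key attributes.
            String id = record.getDynamodb().getKeys().get("id").getS(); // "id" is a placeholder key

            // Look the item up; it may have changed again (or been deleted)
            // since the stream record was written.
            GetItemResponse item = dynamo.getItem(GetItemRequest.builder()
                    .tableName("my-table") // placeholder
                    .key(Map.of("id", AttributeValue.builder().s(id).build()))
                    .build());

            kinesis.putRecord(PutRecordRequest.builder()
                    .streamName("monitoring-stream") // placeholder
                    .partitionKey(id)
                    .data(SdkBytes.fromUtf8String(item.item().toString()))
                    .build());
        }
        return null;
    }
}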
If you didn't want to use DynamoDB Streams, you would either have to have your client application also push the data to Kinesis, or, if you can't change the client application, not let anyone talk directly to DynamoDB and instead have them call a Lambda synchronously, where that Lambda does the DynamoDB and Kinesis calls for you.
Is it possible to use Datastore as an input on a streaming basis? I.e., any time an entity is saved to Datastore, it streams that to a Dataflow project?
Currently we do not stream out of Datastore automatically, but I've made a note of your interest in it. One approach you can consider is to monitor any relevant source from App Engine and publish its contents to Pub/Sub.
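For illustration, publishing an entity payload to Cloud Pub/Sub from the application side could look like this; a sketch assuming the google-cloud-pubsub client, with placeholder project, topic, and payload:

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class EntityChangePublisher {
    public static void main(String[] args) throws Exception {
        // "my-project" and "entity-changes" are placeholders.
        Publisher publisher = Publisher.newBuilder(
                TopicName.of("my-project", "entity-changes")).build();
        try {
            // Publish a serialized entity each time it is saved to Datastore;
            // the Dataflow pipeline then reads from this topic.
            publisher.publish(PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("{\"kind\":\"Task\",\"id\":42}")) // placeholder payload
                    .build()).get(); // block until the publish completes
        } finally {
            publisher.shutdown();
        }
    }
}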
I have a simple use case.
I am using S3 and DynamoDB, and now I want to send a message to a device using SNS.
This message is triggered by a DynamoDB update.
How do I do that?
Are there any examples available for this kind of problem?
We use a DynamoDB stream to trigger a Lambda function (written in JavaScript), which then uses SNS to send a message.
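Our function is in JavaScript, but the same shape in Java (as a sketch, assuming the aws-lambda-java-events library and the AWS SDK v2 SNS client, with a placeholder topic ARN) would be:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.PublishRequest;

public class DynamoUpdateNotifier implements RequestHandler<DynamodbEvent, Void> {
    private final SnsClient sns = SnsClient.create();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            // One notification per change record; the topic ARN is a placeholder
            // (a device can subscribe via platform endpoint, SMS, etc.).
            sns.publish(PublishRequest.builder()
                    .topicArn("arn:aws:sns:us-east-1:123456789012:device-updates")
                    .message(record.getEventName() + ": " + record.getDynamodb())
                    .build());
        }
        return null;
    }
}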
There is no direct notification (SNS) that can be generated by a DynamoDB update.
Refer to DynamoDB Streams. They are still in preview mode but will soon be available to all.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html