AWS DynamoDB to Kinesis without DynamoDB Stream

We have a requirement to stream data from DynamoDB tables to a Kinesis stream for event monitoring. We started looking into DynamoDB Streams, but the issue is that DynamoDB Streams are not encrypted, and we can't have any unencrypted data anywhere in our solution.
What other serverless approach can we use to stream data from DynamoDB to Kinesis? I don't want to stand up a server to use the DynamoDB adapter.
Thanks

As of now (Sept. 2019), encryption at rest is supported for DynamoDB streams:
DynamoDB encryption at rest provides an additional layer of data protection by securing your data in the encrypted table, including its primary key, local and global secondary indexes, streams, global tables, backups, and DynamoDB Accelerator (DAX) clusters whenever the data is stored in durable media.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EncryptionAtRest.html

If you wanted to use DynamoDB Streams, this is how you could do it:
Enable DynamoDB Streams, but set the view type to "Keys only". This mode emits only the key attributes of the modified item. Then set up a Lambda to trigger off this DynamoDB stream; it will send batches of keys to your Lambda. You then code the Lambda to look up each key in your DynamoDB table and push the item into Kinesis.
It's not a perfect solution, because the data may have been updated again before the Lambda's get operation, but it's pretty good depending on the situation.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
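The keys-only approach above can be sketched as a short Lambda handler. This is a minimal sketch, not a definitive implementation: the table and stream names are placeholders, and the AWS clients are injectable so the lookup-and-forward logic can be exercised without AWS.

```python
import json

# Hypothetical names -- replace with your own table and stream.
TABLE_NAME = "my-table"
STREAM_NAME = "my-stream"

def handler(event, context, dynamodb=None, kinesis=None):
    """Triggered by a keys-only DynamoDB stream: look each key up, push the item to Kinesis."""
    if dynamodb is None or kinesis is None:
        import boto3  # only needed when running inside Lambda
        dynamodb = dynamodb or boto3.client("dynamodb")
        kinesis = kinesis or boto3.client("kinesis")
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]  # keys-only view: no item attributes here
        item = dynamodb.get_item(TableName=TABLE_NAME, Key=keys).get("Item")
        if item is None:
            continue  # item was deleted between the stream event and this lookup
        kinesis.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(item).encode(),
            PartitionKey=json.dumps(keys, sort_keys=True),
        )
```

Note the `if item is None` branch: it is exactly the race the answer mentions, where the item changes (or disappears) between the stream event and the Lambda's read.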
If you didn't want to use DynamoDB Streams, you would either have to have your client application also push the data to Kinesis, or, if you can't change the client application, stop clients from talking directly to DynamoDB and instead have them call a Lambda synchronously, where that Lambda makes both the DynamoDB and Kinesis calls for you.
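The synchronous front-door Lambda could look something like the sketch below. The table name, stream name, and the `pk` partition-key attribute are all assumptions for illustration, and note the dual write is not atomic: if the Kinesis call fails, the item is already in DynamoDB.

```python
import json

def handler(event, context, dynamodb=None, kinesis=None):
    """Synchronous front door: one call writes the item to both DynamoDB and Kinesis."""
    if dynamodb is None or kinesis is None:
        import boto3
        dynamodb = dynamodb or boto3.client("dynamodb")
        kinesis = kinesis or boto3.client("kinesis")
    item = event["item"]  # assumed to arrive already in DynamoDB attribute-value format
    dynamodb.put_item(TableName="my-table", Item=item)
    kinesis.put_record(
        StreamName="my-stream",
        Data=json.dumps(item).encode(),
        PartitionKey=item["pk"]["S"],  # assumes a string partition key named "pk"
    )
    return {"status": "ok"}
```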

Related

How to implement publish/subscribe with DynamoDB?

I need to implement publish/subscribe with DynamoDB. Each of my cloud nodes should send events to all other nodes of my application that are connected to the same DynamoDB database. A typical use case is clearing caches when data in the database has changed.
Can DynamoDB Streams be a solution for this? It looks to me like the stream can be consumed only once, not by every node. Is this right?
Is there something like tailable-cursor support in DynamoDB?
Are there any other features that I can use?
DynamoDB doesn't support Pub/Sub as a feature like you might see in Redis.
If you need cache functionality, you can check DynamoDB Accelerator (DAX). You can check specifically the consistency best-practices documentation.
You can also use a dedicated Pub/Sub service such as Simple Notification Service (SNS).
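The SNS route might look like the sketch below: each node subscribes to a shared topic (for example via its own SQS queue) and publishes invalidation events to it. The topic ARN and message shape are illustrative assumptions, not an established protocol.

```python
import json

# Placeholder topic ARN -- substitute your own.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cache-events"

def publish_cache_invalidation(table, keys, sns=None):
    """Publish a cache-invalidation event so every subscribed node can clear its cache."""
    if sns is None:
        import boto3
        sns = boto3.client("sns")
    message = {"event": "invalidate", "table": table, "keys": keys}
    return sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(message))
```

Unlike a single DynamoDB stream consumer, SNS fans the message out to every subscription, which matches the "notify all nodes" requirement.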

How to enable Localstack Kinesis Data Streams to capture DynamoDB changes from in-memory DynamoDB

I am trying to process DynamoDB changes using Kinesis Data Streams locally. I am using Localstack to spin up both DynamoDB and Kinesis, which works as expected, but sometimes the Localstack DynamoDB slows down. This is a known issue, per this discussion in the Localstack issue tracker:
https://github.com/localstack/localstack/issues/1205
So I am planning to make use of DynamoDB Local, which is an in-memory database. But when I try to enable the Kinesis Data Stream running on Localstack, the in-memory database is not able to communicate with the Localstack Kinesis service and throws a resource-not-found exception.
Can somebody help me with this?

How do I subscribe directly to my AWS AppSync data source?

I have a DynamoDB table connected to Step Functions, and I am building a UI to display changes. I connected the table to an AppSync instance and have tried using subscriptions through AppSync, but it seems they only observe mutations made through that AppSync API.
How can I subscribe to the data source changes directly?
You are correct. Currently, AppSync subscriptions are only triggered by GraphQL mutations. If changes are made to the DynamoDB table from a source other than AppSync, subscriptions will not trigger.
If you want to track all changes being made to a DynamoDB table and publish them using AppSync, you can do the following:
1) Set up a DynamoDB stream to capture changes and feed them to AWS Lambda
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.html
2) Set up an AppSync mutation with a local (no data source) resolver. You can use this to publish messages to subscribers without writing to a data source.
https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-local-resolvers.html
3) Make the DynamoDB stream Lambda function (set up in step 1) call the AWS AppSync mutation (set up in step 2).
This will publish ALL changes made to a DynamoDB table to AppSync subscribers, regardless of where the change came from.
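Step 3 can be sketched as a small Lambda. The endpoint URL, the `publish` mutation, and API-key auth are all hypothetical placeholders; your AppSync schema and auth mode will differ. The HTTP call is injectable so the record-to-mutation mapping can be tested on its own.

```python
import json

# Placeholder endpoint and mutation -- your AppSync API and schema will differ.
APPSYNC_URL = "https://example.appsync-api.us-east-1.amazonaws.com/graphql"
MUTATION = """
mutation Publish($key: String!, $data: AWSJSON!) {
  publish(key: $key, data: $data) { key data }
}
"""

def build_request(record):
    """Turn one DynamoDB stream record into a GraphQL request body for the local resolver."""
    image = record["dynamodb"].get("NewImage", {})
    key = json.dumps(record["dynamodb"]["Keys"], sort_keys=True)
    return {"query": MUTATION, "variables": {"key": key, "data": json.dumps(image)}}

def handler(event, context, post=None):
    if post is None:  # default: POST to AppSync with an API key (IAM auth also works)
        import urllib.request
        def post(body):
            req = urllib.request.Request(
                APPSYNC_URL,
                data=json.dumps(body).encode(),
                headers={"Content-Type": "application/json", "x-api-key": "YOUR-API-KEY"},
            )
            urllib.request.urlopen(req)
    for record in event["Records"]:
        post(build_request(record))
```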

Does the encryption in Kinesis Streams cause the outgoing records to be encrypted?

I am working with a CloudFormation stack, and I have a pipeline Kinesis Stream ==> Kinesis Firehose ==> S3 bucket. The Kinesis stream template lists a StreamEncryption field and says:
StreamEncryption — Enables or updates server-side encryption using an AWS KMS key for a specified stream.
Also, from here,
When you send data from your data producers to your Kinesis stream, the Kinesis Data Streams service encrypts your data using an AWS KMS key before storing it at rest. When your Kinesis Data Firehose delivery stream reads the data from your Kinesis stream, the Kinesis Data Streams service first decrypts the data and then sends it to Kinesis Data Firehose. Kinesis Data Firehose buffers the data in memory based on the buffering hints that you specify and then delivers it to your destinations without storing the unencrypted data at rest.
I want the records that ultimately end up in S3 to be encrypted with my KMS key, such that a person with full access to my S3 bucket, but not to KMS, would not be able to read the data.
Will this do that? If not, how do I get the data to reside "doubly" encrypted in S3?
A hack that is not a good solution, but works for now: have Firehose run a Lambda transform on the data, and encrypt the data yourself inside that Lambda before it is written out.
Not ideal, but functional.
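That hack could be sketched as a Firehose transformation Lambda like the one below. The key alias is a placeholder, and the KMS client is injectable for testing. One real caveat: the KMS Encrypt API caps plaintext at 4 KB, so larger records would need envelope encryption (e.g. a data key from `generate_data_key`) instead of encrypting the payload directly.

```python
import base64

def handler(event, context, kms=None, key_id="alias/my-key"):
    """Firehose transformation Lambda: KMS-encrypt each record before delivery to S3."""
    if kms is None:
        import boto3
        kms = boto3.client("kms")
    out = []
    for rec in event["records"]:
        plaintext = base64.b64decode(rec["data"])  # Firehose hands records in base64
        ciphertext = kms.encrypt(KeyId=key_id, Plaintext=plaintext)["CiphertextBlob"]
        out.append({
            "recordId": rec["recordId"],  # must echo the incoming recordId back
            "result": "Ok",
            "data": base64.b64encode(ciphertext).decode(),
        })
    return {"records": out}
```

With this in place, whatever lands in S3 is KMS ciphertext, so S3 access alone is not enough to read it.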

Consume DynamoDB streams in Apache Flink

Has anyone tried to consume DynamoDB streams in Apache Flink?
Flink has a Kinesis consumer, but I am looking for how I can consume the DynamoDB stream directly:
DataStream<String> kinesis = env.addSource(new FlinkKinesisConsumer<>(
"kinesis_stream_name", new SimpleStringSchema(), consumerConfig));
I tried searching a lot but did not find anything. However, I found an open request pending on the Flink JIRA board, so I guess this option is not available yet? What alternatives do I have?
Allow FlinkKinesisConsumer to adapt for AWS DynamoDB Streams
UPDATED ANSWER - 2019
The FlinkKinesisConsumer connector can now process a DynamoDB stream, since the JIRA ticket above has been implemented.
UPDATED ANSWER
It seems that Apache Flink does not use the DynamoDB Streams Kinesis adapter, so it can read data from Kinesis but not from DynamoDB streams.
I think one option would be to implement an app that writes data from DynamoDB streams to Kinesis, and then to read the data from Kinesis in Apache Flink and process it.
Another option would be to implement a custom DynamoDB connector for Apache Flink. You can use the existing connector as a starting point.
You can also take a look at the Apache Spark Kinesis connector, but it seems to have the same issue.
ORIGINAL ANSWER
DynamoDB has a Kinesis adapter that allows you to consume a stream of DynamoDB updates using the Kinesis Client Library. Using the Kinesis adapter is the recommended way (according to AWS) of consuming updates from DynamoDB. This will give you the same data as using the DynamoDB stream directly (also called the DynamoDB low-level API).