Reprocessing all records - DynamoDB Stream - amazon-dynamodb

I am using a DynamoDB stream with an AWS Lambda function and Firehose to sync my data with Redshift. I would like to know if it's possible to push all existing DynamoDB records onto the stream for reprocessing purposes. If not, what's the right approach?

For new data, you can do this.
For historical data, it's better not to: dump your table first, then import the dump.

For reprocessing old data, a parallelized full table scan is the way to go. There is still the matter of deciding how to handle the transition from "old data" to "new data", but that can be achieved either with a timestamp attribute, if one is available, or by stopping writes to the table, if that is possible.
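As a rough illustration, here is a minimal parallel-scan sketch with boto3 (Python). The table name, segment count, and process_item callback are placeholders, and in a real job each segment would typically run in its own worker or Lambda invocation.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # placeholder table name

TOTAL_SEGMENTS = 4  # tune to table size and number of workers


def process_item(item):
    # Placeholder: push the item to Firehose/Redshift, same as the stream path.
    print(item)


def scan_segment(segment):
    """Scan one segment of the table, following pagination."""
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        response = table.scan(**kwargs)
        for item in response["Items"]:
            process_item(item)
        if "LastEvaluatedKey" not in response:
            break
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


if __name__ == "__main__":
    for segment in range(TOTAL_SEGMENTS):
        scan_segment(segment)  # run segments in parallel in a real job
```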

Related

Query DynamoDB table attributes using an AND and WHERE like statement

I have a flat table with around 30 attributes in DynamoDB. I would like to expose an API for my end users/applications to query on an arbitrary combination of those attributes.
This is trivial to do in a typical RDBMS.
How can we do this in DynamoDB? What kind of modelling techniques and/or key condition expressions can we use to achieve this?
Multi-faceted search like the one you describe can be challenging in DynamoDB. It can certainly be done, but you may be fighting the tool depending on your specific access patterns. Search in DynamoDB is supported through query (fast and cheap) and scan (slower and more expensive) operations. You may want to take some time to read the docs to understand how each works, and why it's critical to structure your data to support your access patterns.
One option is to use Elasticsearch. DynamoDB Streams can be used to keep the Elasticsearch index updated whenever an item changes in DynamoDB. There are even AWS docs on this particular setup.
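As a loose sketch of that setup (not from the original answer): a Lambda function subscribed to the table's stream deserializes each record and indexes it into Elasticsearch. The index name, endpoint, and client wiring below are assumptions, using a 7.x-style elasticsearch Python client; it also assumes the stream view type includes new images.

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer
from elasticsearch import Elasticsearch  # assumed 7.x-style client

es = Elasticsearch(["https://my-es-endpoint:9200"])  # placeholder endpoint
deserializer = TypeDeserializer()

INDEX = "my-table-index"  # placeholder index name


def handler(event, context):
    """Lambda handler for DynamoDB stream events."""
    for record in event["Records"]:
        keys = {k: deserializer.deserialize(v)
                for k, v in record["dynamodb"]["Keys"].items()}
        doc_id = "|".join(str(v) for v in keys.values())

        if record["eventName"] == "REMOVE":
            es.delete(index=INDEX, id=doc_id, ignore=[404])
        else:
            # Requires a stream view type of NEW_IMAGE or NEW_AND_OLD_IMAGES.
            item = {k: deserializer.deserialize(v)
                    for k, v in record["dynamodb"]["NewImage"].items()}
            es.index(index=INDEX, id=doc_id, body=item)
```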

Process every item in a Cosmos DB using Azure Data Factory

I am hoping this is an appropriate use case for Azure Data Factory.
I have a Cosmos DB that has ~200k records, and I would like to iterate over the entire database, passing each record into a Logic App. Is there an easy way to foreach over every record? I thought that Azure Data Factory would have this capability, but the "Lookup + Foreach" combo doesn't like the number of records I have. My attempts at creating a while loop with the "Lookup + Foreach" pipeline also feel slightly clunky.
I don't feel that 200k records is a large dataset. Am I missing something? Is there a better way?
I believe the ideal way is to use the change feed mechanism. This would be a perfect use case for it.
https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-functions
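As a loose illustration of that approach (not part of the original answer): an Azure Function with a Cosmos DB trigger receives batches of documents from the change feed and can forward each one to the Logic App's HTTP endpoint. The database, container, connection, and Logic App URL below are placeholders, and the binding settings are a sketch of the documented cosmosDBTrigger configuration rather than a tested setup.

```python
# __init__.py for an Azure Function using the Cosmos DB change feed trigger.
# Assumed function.json binding (placeholder names):
# {
#   "bindings": [{
#     "type": "cosmosDBTrigger",
#     "name": "documents",
#     "direction": "in",
#     "connectionStringSetting": "CosmosConnection",
#     "databaseName": "mydb",
#     "collectionName": "mycontainer",
#     "leaseCollectionName": "leases",
#     "createLeaseCollectionIfNotExists": true,
#     "startFromBeginning": true
#   }]
# }
import logging

import azure.functions as func
import requests  # used to call the Logic App's HTTP trigger

LOGIC_APP_URL = "https://example.logic.azure.com/..."  # placeholder URL


def main(documents: func.DocumentList) -> None:
    """Forward each changed document to the Logic App."""
    for doc in documents:
        payload = dict(doc)
        logging.info("Forwarding document id=%s", payload.get("id"))
        requests.post(LOGIC_APP_URL, json=payload, timeout=10)
```

With startFromBeginning enabled, the trigger also replays documents that already exist, which covers the "iterate over the entire database" requirement rather than only new changes.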

How to read and decrypt dynamodb stream update event?

I store our data in a DynamoDB table and, on every update, a listener Lambda (in Java) receives an update from the DynamoDB stream. I was parsing the DynamoDB update event using JacksonConverter.
However, I would like to encrypt the DynamoDB content in the tables, so I can't use the JacksonConverter directly.
I would like to know if anyone has done decryption of data from the DynamoDB stream, and did you use any libraries?
I use DynamoDBMapper's AttributeTransformer to encrypt the data. Can I use the same for decrypting the output from this stream too?
One possible approach is to call DynamoDB again using the unencrypted (key) attributes, if the use case allows it, so that the item is re-read through your normal, decrypting read path.
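The original setup is Java with DynamoDBMapper, but the pattern is the same in any language: take the key attributes from the stream record (which are not encrypted) and re-read the item through whatever read path performs decryption. A minimal boto3 sketch of the key extraction and re-read, with the decryption step left as a hypothetical placeholder:

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # placeholder table name
deserializer = TypeDeserializer()


def decrypt_item(item):
    # Hypothetical placeholder: in the question's Java setup this step is
    # handled by DynamoDBMapper's AttributeTransformer during the read.
    return item


def process(item):
    print(item)  # placeholder downstream processing


def handler(event, context):
    for record in event["Records"]:
        # Key attributes on the stream record are not encrypted, so they can
        # be used as-is to look the item up again.
        keys = {k: deserializer.deserialize(v)
                for k, v in record["dynamodb"]["Keys"].items()}

        response = table.get_item(Key=keys)
        item = response.get("Item")
        if item is None:
            continue  # item was deleted after the stream record was written

        process(decrypt_item(item))
```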

Purging technique for Dynamodb

I am a newbie in the Amazon DynamoDB world with a strong background in the relational database world :-p
I am writing a service using AWS Lambda that migrates data from DynamoDB to Redshift for analytics purposes. My aim is to keep only the active data of, say, one month in DynamoDB and then purge the rest periodically.
I have researched a lot but could not find a precise purging technique for Amazon DynamoDB that avoids a full table scan.
Also, I want to perform the delete based on the range key attribute, which is a timestamp.
Can somebody help me out here?
Thanks
From my experience, the easiest and most cost-effective way to handle this job is to create a new table each month and drop the old tables entirely once time has passed and you are done crunching them.
If you can structure your use case around a TABLE-MMYYYY naming scheme, it will help you a lot.
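A rough sketch of that rotation with boto3; the base table name, key schema, and retention window are placeholders, not part of the original answer. The point is that purging becomes a single DeleteTable call instead of a scan-and-delete of individual items.

```python
from datetime import date

import boto3

client = boto3.client("dynamodb")
PREFIX = "TABLE"  # placeholder base name, giving TABLE-MMYYYY


def table_name(d):
    return f"{PREFIX}-{d:%m%Y}"


def create_current_month_table():
    """Create this month's table; the key schema here is a placeholder."""
    client.create_table(
        TableName=table_name(date.today()),
        AttributeDefinitions=[
            {"AttributeName": "pk", "AttributeType": "S"},
            {"AttributeName": "created_at", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "pk", "KeyType": "HASH"},
            {"AttributeName": "created_at", "KeyType": "RANGE"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )


def drop_old_table(months_to_keep=1):
    """Delete the monthly table that has aged out of the retention window."""
    today = date.today()
    month = today.month - (months_to_keep + 1)
    year = today.year
    while month <= 0:
        month += 12
        year -= 1
    client.delete_table(TableName=table_name(date(year, month, 1)))
```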

Is it okay to filter using code instead of the NoSQL database?

We are using DynamoDB and have some complex queries that would be very easily handled using code instead of trying to write a complicated DynamoDB scan operation. Is it better to write a scan operation or just pull the minimal amount of data using a query operation (query on the hash key or a secondary index) and perform further filtering and reduction in the calling code itself? Is this considered bad practice or something that is okay to do in NoSQL?
Unfortunately, it depends.
If you have even a modestly large table, a table scan is not practical.
If you have complicated query needs, the best way to tackle them in DynamoDB is with Global Secondary Indexes (GSIs) that act as projections on the fields you want. You can use techniques such as sparse indexes (creating a GSI on fields that exist on only a subset of the items) and composite attribute keys (concatenating two or more attributes into a new attribute and creating a GSI on it).
However, to directly address the question "Is it okay to filter using code instead of the NoSQL database?": yes, that is an acceptable approach. The reason for filtering in DynamoDB is not to reduce the cost of the query, which is actually the same either way, but to cut down unnecessary data transfer over the network.
The ideal solution is to use a GSI to reduce the scope of what is returned to as close to what you want as possible; if some additional filtering is still needed, it is fine to eliminate the remaining records either with a filter expression in DynamoDB or in your own code.
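As a rough boto3 sketch of that split (the index name, key names, and the in-code predicate are placeholders): query a GSI to narrow the result set cheaply, then finish the more complex filtering in application code.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # placeholder table name


def find_orders(customer_id, min_total):
    """Query a (placeholder) GSI for the cheap part, filter the rest in code."""
    response = table.query(
        IndexName="customer_id-index",  # placeholder GSI
        KeyConditionExpression=Key("customer_id").eq(customer_id),
    )
    items = response["Items"]

    # Remaining, more complex predicate is applied in application code.
    return [item for item in items if item.get("total", 0) >= min_total]
```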
