I store our data in a DynamoDB table, and on every update a listener Lambda (in Java) receives an event from the DynamoDB stream. I was parsing the update event using JacksonConverter.
However, I would like to encrypt the DynamoDB content in the tables, so I can't use the JacksonConverter directly.
I would like to know if anyone has decrypted data coming from a DynamoDB stream, and whether you used any libraries.
I use DynamoDBMapper's AttributeTransformer to encrypt the data. Can I use the same transformer to decrypt the output from the stream as well?
One possible approach, if the use case allows it, is to call DynamoDB again using the un-encrypted attributes from the stream record.
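A minimal sketch of that approach (not a definitive implementation), assuming the key attributes are stored un-encrypted as suggested above: the handler re-reads the item through a DynamoDBMapper configured with the same AttributeTransformer used for writes, so decryption happens exactly as it does on a normal read. The table name, the "id" key, MyItem, and createTransformer() are placeholders for your own wiring.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.datamodeling.AttributeTransformer;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapperConfig;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;

public class DecryptingStreamHandler implements RequestHandler<DynamodbEvent, Void> {

    private final AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
    // the same transformer you already plug into DynamoDBMapper for writes
    private final AttributeTransformer transformer = createTransformer();
    private final DynamoDBMapper mapper =
            new DynamoDBMapper(client, DynamoDBMapperConfig.DEFAULT, transformer);

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            // assumes key attributes are stored un-encrypted, so they can be
            // read straight off the stream record
            String id = record.getDynamodb().getKeys().get("id").getS();
            MyItem item = mapper.load(MyItem.class, id); // transformer decrypts here
            // ... process the decrypted item
        }
        return null;
    }

    private AttributeTransformer createTransformer() {
        // placeholder: return the AttributeTransformer you already use for writes
        throw new UnsupportedOperationException("plug in your existing transformer");
    }

    @DynamoDBTable(tableName = "my-table") // placeholder table/entity
    public static class MyItem {
        private String id;

        @DynamoDBHashKey(attributeName = "id")
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        // ... plus the encrypted attributes you actually store
    }
}
```

One caveat: the stream is asynchronous, so the item read back may already reflect a later write than the record being processed.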
I would like to take a single DynamoDB table which contains a data field with JSON data. This data has a schema that is dependent on the user associated with the table entry. Let's assume that the schema is part of the entry for simplicity.
I would like to stream this data into S3 as Parquet with an embedded schema, with a transformation (i.e. sending just the data field), and with custom file naming based on the user ID.
I am using CDK v2.
What I found so far:
I can go from DynamoDB to a Kinesis stream to Firehose, but Glue is required; I don't need it, and I'm not sure how I would provide it with these various "dynamic" schemas.
CDK asks for the S3 file name. I see there may be the possibility of a dynamic field in the name, but I'm not sure how I would use that (I've seen the date, for example; I would need it to be something coming from the transform Lambda).
I think that using a Kinesis stream directly in the DynamoDB configuration may not be what I want, and that I should just use regular DynamoDB streams. But then: would I transform the data and pass it to a Firehose? Where does file naming etc. come in?
I've read so many docs, but they all seem to deal with a standard table-to-file setup and Athena.
Summary: how can I append streaming DynamoDB data to various Parquet files and transform the data / determine the file name from a Lambda in the middle? I think I have to go from a DynamoDB stream Lambda handler and write directly to S3, but I'm not finding much in the way of examples, and I'm concerned about buffering etc.
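For reference, here is one possible CDK v2 (Java) wiring for the "DynamoDB stream → Lambda → S3" variant, as a hedged sketch only: the transformation, Parquet writing, buffering, and per-user object naming would all live inside the handler (not shown). Construct IDs, the jar path, and the handler class name are placeholders.

```java
import software.amazon.awscdk.Duration;
import software.amazon.awscdk.Stack;
import software.amazon.awscdk.services.dynamodb.Attribute;
import software.amazon.awscdk.services.dynamodb.AttributeType;
import software.amazon.awscdk.services.dynamodb.StreamViewType;
import software.amazon.awscdk.services.dynamodb.Table;
import software.amazon.awscdk.services.lambda.Code;
import software.amazon.awscdk.services.lambda.Function;
import software.amazon.awscdk.services.lambda.Runtime;
import software.amazon.awscdk.services.lambda.StartingPosition;
import software.amazon.awscdk.services.lambda.eventsources.DynamoEventSource;
import software.amazon.awscdk.services.s3.Bucket;
import software.constructs.Construct;

public class StreamToParquetStack extends Stack {

    public StreamToParquetStack(Construct scope, String id) {
        super(scope, id);

        // table with a DynamoDB stream enabled (NEW_IMAGE is enough for a transform)
        Table table = Table.Builder.create(this, "DataTable")
                .partitionKey(Attribute.builder().name("pk").type(AttributeType.STRING).build())
                .stream(StreamViewType.NEW_IMAGE)
                .build();

        Bucket bucket = Bucket.Builder.create(this, "ParquetBucket").build();

        // transform Lambda: parses the data field, buffers, and writes Parquet
        // objects under a per-user prefix it chooses itself
        Function handler = Function.Builder.create(this, "TransformHandler")
                .runtime(Runtime.JAVA_11)
                .handler("com.example.StreamToParquetHandler::handleRequest") // placeholder
                .code(Code.fromAsset("lambda/build/libs/handler-all.jar"))    // placeholder
                .memorySize(1024)
                .timeout(Duration.minutes(5))
                .build();
        handler.addEnvironment("BUCKET_NAME", bucket.getBucketName());
        bucket.grantWrite(handler);

        // wire the stream to the Lambda in batches to get some buffering for free
        handler.addEventSource(DynamoEventSource.Builder.create(table)
                .startingPosition(StartingPosition.TRIM_HORIZON)
                .batchSize(100)
                .maxBatchingWindow(Duration.seconds(60))
                .build());
    }
}
```

The batch size and batching window give the handler reasonably sized chunks to turn into Parquet objects, but the handler still has to decide when a per-user file is "full enough" to flush to S3.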
I have a list attribute in a table which acts as a moving window. I wish to create a trigger in DynamoDB such that whenever something is appended to the list, it shifts by one, dropping the earliest value. If I were using SQL, CREATE TRIGGER would have been my go-to, but what about DynamoDB?
AWS refers to it as a trigger in this document. Basically, you write a Lambda function to do what you want. However, in your example you would have to be careful not to create an infinite loop where DynamoDB is updated, Lambda is called and updates DynamoDB, and then your Lambda is called again. This post actually calls this design pattern a database trigger.
DynamoDB doesn't have anything like SQL's BEFORE UPDATE trigger.
DDB's stream functionality, while often referred to and used like an AFTER UPDATE trigger, isn't really at all like a real RDBMS SQL trigger.
If you really must use DDB, then you're stuck with fronting DDB with your own API that implements the logic you require.
I suppose, as suggested by another answer, you might carefully implement a DDB "trigger" Lambda. But realize you're going to pay for two writes for every update instead of just one. In addition, let's say you want your list to hold the most recent 10 items: your apps would have to be prepared to sometimes see 11, or 12, or 13, since the "trigger" is asynchronous with respect to the actual DB writes.
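As a hedged illustration of that pattern, and of how to keep it from looping, here is a sketch of a stream-"trigger" Lambda that trims a list attribute back to a fixed window. The table name, the "id" key, and the "events" list attribute are assumptions; the size check means the trim write re-triggers the function at most until the list is back at the window size, at which point it no-ops.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

import java.util.Map;

public class ListWindowTrigger implements RequestHandler<DynamodbEvent, Void> {

    private static final int MAX_ITEMS = 10;          // window size (assumption)
    private static final String TABLE = "my-table";   // placeholder table name

    private final DynamoDbClient ddb = DynamoDbClient.create();

    @Override
    public Void handleRequest(DynamodbEvent event, Context context) {
        for (DynamodbEvent.DynamodbStreamRecord record : event.getRecords()) {
            var newImage = record.getDynamodb().getNewImage();
            if (newImage == null || newImage.get("events") == null) {
                continue; // e.g. deletes, or items without the list attribute
            }
            var list = newImage.get("events").getL();
            if (list == null || list.size() <= MAX_ITEMS) {
                continue; // already within the window: this is what stops the loop
            }
            String id = newImage.get("id").getS();
            try {
                ddb.updateItem(UpdateItemRequest.builder()
                        .tableName(TABLE)
                        .key(Map.of("id", AttributeValue.builder().s(id).build()))
                        .updateExpression("REMOVE #lst[0]")         // drop the oldest element
                        .conditionExpression("size(#lst) > :max")   // guard against over-trimming
                        .expressionAttributeNames(Map.of("#lst", "events"))
                        .expressionAttributeValues(Map.of(
                                ":max", AttributeValue.builder().n(String.valueOf(MAX_ITEMS)).build()))
                        .build());
            } catch (ConditionalCheckFailedException e) {
                // another invocation already trimmed the list; nothing to do
            }
        }
        return null;
    }
}
```

As the answer above notes, readers still need to tolerate briefly seeing 11 or 12 items, and each append now costs an extra write.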
When data is sent from my thing to AWS, it will be a string with several values separated by spaces. One of these values is the ID of the device, which I want to use to query the database and insert the data into that row. How can I do this?
This might be tricky using just a single topic rule, since topic rules mainly work with JSON data. You can handle non-JSON data, but your options are limited. You can always trigger a Lambda function from the topic rule; the Lambda function could then easily parse the message and write to DynamoDB.
See https://docs.aws.amazon.com/iot/latest/developerguide/binary-payloads.html for how you can work with non-JSON data in IoT topic rules.
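A hedged sketch of such a Lambda, assuming the topic rule forwards the raw payload base64-encoded (e.g. SELECT encode(*, 'base64') AS data FROM '...', per the linked doc) and that the payload looks like "<deviceId> <temperature> <humidity>". The table name, key name, and field layout are all assumptions for illustration.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;

public class IotIngestHandler implements RequestHandler<Map<String, String>, Void> {

    private final DynamoDbClient ddb = DynamoDbClient.create();

    @Override
    public Void handleRequest(Map<String, String> input, Context context) {
        // decode the base64 payload passed by the topic rule
        String payload = new String(
                Base64.getDecoder().decode(input.get("data")), StandardCharsets.UTF_8);

        // space-separated values: first token is the device ID
        String[] parts = payload.trim().split("\\s+");
        String deviceId = parts[0];

        // update the item keyed by the device ID (attribute names are assumptions)
        ddb.updateItem(UpdateItemRequest.builder()
                .tableName("devices")
                .key(Map.of("deviceId", AttributeValue.builder().s(deviceId).build()))
                .updateExpression("SET temperature = :t, humidity = :h")
                .expressionAttributeValues(Map.of(
                        ":t", AttributeValue.builder().n(parts[1]).build(),
                        ":h", AttributeValue.builder().n(parts[2]).build()))
                .build());
        return null;
    }
}
```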
Well, I have been using Relay for a while now, and I am realising that I am still not sure about some fundamental aspects of using Relay, GraphQL, React and DynamoDB.
I'm using DynamoDB as the database, and DynamoDB discourages the use of UUIDs as identifiers. At the same time, Relay's nodeDefinitions function expects to output an object with a type and an id field, so my question is: can you reconcile the best practices for Relay and for DynamoDB?
Or is it just me who does not understand?
Relay doesn't have any requirements of your database. You can use whatever kind of primary key in Dynamo that you want.
You'll end up with two functions:
f(relayId) -> dynamoEntity
g(dynamoEntity) -> relayId
Those functions can be implemented however you like. E.g. maybe you can base64-encode the Dynamo primary key to produce a relay id.
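A minimal sketch of those two functions in Java, assuming an opaque Relay ID of the form base64("<TypeName>:<dynamoPrimaryKey>"); the type-plus-key layout is just one possible convention.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class RelayIds {

    // g(dynamoEntity) -> relayId
    public static String toRelayId(String typeName, String dynamoKey) {
        String raw = typeName + ":" + dynamoKey;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // f(relayId) -> (type name, Dynamo primary key), used to load the entity
    public static String[] fromRelayId(String relayId) {
        String raw = new String(Base64.getUrlDecoder().decode(relayId), StandardCharsets.UTF_8);
        return raw.split(":", 2); // [0] = type name, [1] = Dynamo primary key
    }
}
```

The node resolver would then decode the ID, branch on the type name, and fetch the item from the corresponding table by its primary key.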
I am using a DynamoDB stream with an AWS Lambda function + Firehose to sync my data with Redshift. I would like to know if it's possible to add all existing DynamoDB records to the stream for reprocessing purposes. If not, what's the right approach?
For new data, you can do this.
For historical data, you'd better not do this. You can dump your table first, then import it.
For reprocessing old data, a parallelized full table scan is the way to go. There is the matter of deciding how to handle the transition from "old data" to "new data", but that could be achieved using either a timestamp attribute, if one is available, or by stopping writes to the table, if that is possible.
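A hedged sketch of such a parallelized (segmented) scan with the AWS SDK for Java v2; the table name, the segment count, and processRecord() are assumptions, and in practice each segment could run in its own worker or Lambda rather than a local thread.

```java
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
import software.amazon.awssdk.services.dynamodb.model.ScanResponse;

import java.util.Map;

public class ReprocessTable {

    private static final int TOTAL_SEGMENTS = 8;       // degree of parallelism (assumption)
    private static final String TABLE = "my-table";    // placeholder table name

    public static void main(String[] args) throws InterruptedException {
        DynamoDbClient ddb = DynamoDbClient.create();
        Thread[] workers = new Thread[TOTAL_SEGMENTS];
        for (int segment = 0; segment < TOTAL_SEGMENTS; segment++) {
            final int seg = segment;
            workers[seg] = new Thread(() -> scanSegment(ddb, seg));
            workers[seg].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    // each worker scans only its own segment of the table, paging until done
    private static void scanSegment(DynamoDbClient ddb, int segment) {
        Map<String, AttributeValue> startKey = null;
        do {
            ScanRequest.Builder request = ScanRequest.builder()
                    .tableName(TABLE)
                    .totalSegments(TOTAL_SEGMENTS)
                    .segment(segment);
            if (startKey != null) {
                request.exclusiveStartKey(startKey);
            }
            ScanResponse response = ddb.scan(request.build());
            response.items().forEach(ReprocessTable::processRecord);
            startKey = response.hasLastEvaluatedKey() ? response.lastEvaluatedKey() : null;
        } while (startKey != null);
    }

    private static void processRecord(Map<String, AttributeValue> item) {
        // placeholder: push the item down the same Firehose/Redshift path used for new data
    }
}
```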