SQL Triggers alternative in DynamoDB - amazon-dynamodb

I have a list attribute in a table which is a moving window. I wish to create a trigger in DynamoDB such that whenever something is appended to the list, it shifts by one, dropping the earliest value. If I were using SQL, CREATE TRIGGER would've been my go-to, but what about DynamoDB?

AWS refers to it as a trigger in this document. Basically, you write a Lambda function to do what you want. However, in your example you would have to be careful not to create an infinite loop where DynamoDB is updated, Lambda is called and updates DynamoDB, and then your Lambda is called again. But this post actually calls this design pattern a database trigger.
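A minimal sketch of such a Lambda, assuming a table named events with hash key pk and a list attribute window capped at 10 items (these names are assumptions, not from the question). The ConditionExpression is what prevents the loop: the trim produces another stream record, but the next invocation sees the list at or under the cap and writes nothing.

    # Hypothetical DynamoDB Streams handler that trims the "window" list
    # whenever an append pushes it past MAX_ITEMS. Table and attribute
    # names are assumptions.
    import boto3
    from botocore.exceptions import ClientError

    MAX_ITEMS = 10
    table = boto3.resource("dynamodb").Table("events")

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] not in ("INSERT", "MODIFY"):
                continue
            new_image = record["dynamodb"].get("NewImage", {})
            window = new_image.get("window", {}).get("L", [])
            if len(window) <= MAX_ITEMS:
                continue  # nothing to trim, so no write and no further stream activity
            pk = new_image["pk"]["S"]
            try:
                table.update_item(
                    Key={"pk": pk},
                    UpdateExpression="REMOVE #w[0]",          # drop the oldest element
                    ConditionExpression="size(#w) > :max",    # only while over the cap
                    ExpressionAttributeNames={"#w": "window"},
                    ExpressionAttributeValues={":max": MAX_ITEMS},
                )
            except ClientError as e:
                if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                    raise  # a failed condition just means it was already trimmed

If several appends land between invocations, the handler removes one element per stream record and converges over a few passes rather than in a single write.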

DynamoDB doesn't have anything like SQL's "BEFORE UPDATE" trigger.
DDB's Streams functionality, while often referred to and used like an "AFTER UPDATE" trigger, isn't really at all like a real RDBMS SQL trigger.
If you really must use DDB, then you're stuck with fronting DDB with your own API that implements the logic you require.
I suppose, as suggested by another answer, you might carefully implement a DDB "trigger" Lambda. But realize you're going to pay for 2 writes for every update instead of just 1. In addition, let's say you want your list to have the most recent 10 items. Your apps would have to be prepared to see 11, 12, or 13 items sometimes, since the "trigger" is async from the actual DB writes.
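If you do front the table with your own write path, the window can be maintained entirely in application code before anything hits a stream. One rough way to do it (names are placeholders) is a read-modify-write guarded by a version attribute, since, as far as I know, you can't combine list_append and REMOVE on the same list in a single UpdateExpression because the document paths overlap:

    # Hypothetical application-side write path: read the item, trim the
    # window locally, and write it back with an optimistic-lock condition.
    import boto3
    from botocore.exceptions import ClientError

    MAX_ITEMS = 10
    table = boto3.resource("dynamodb").Table("events")  # assumed table name

    def append_to_window(pk, value):
        while True:
            item = table.get_item(Key={"pk": pk}).get("Item") or {}
            new_window = (item.get("window", []) + [value])[-MAX_ITEMS:]  # keep the newest N
            version = item.get("version", 0)
            try:
                table.put_item(
                    Item={"pk": pk, "window": new_window, "version": version + 1},
                    # succeed only if nobody else wrote since our read
                    ConditionExpression="attribute_not_exists(pk) OR #ver = :v",
                    ExpressionAttributeNames={"#ver": "version"},
                    ExpressionAttributeValues={":v": version},
                )
                return
            except ClientError as e:
                if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                    raise  # a failed condition means we lost a race; loop and retry

This keeps it to one read plus one write per append, and readers never see more than 10 items.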

Related

What is the best way to schedule tasks in a serverless stack?

I am using NextJS and Firebase for an application. The users are able to rent products for a certain period. After that period, a serverless function should be triggered which updates the database, etc. Since NextJS is event-driven, I cannot seem to figure out how to schedule a task which executes when the rental period ends and updates the database.
Perhaps cron jobs handled elsewhere (Easy Cron etc) are a solution. Or maybe an EC2 instance just for scheduling these tasks.
Since this is tagged with AWS EC2, I've assumed it's OK to suggest a solution with AWS services in mind.
What you could do is leverage DynamoDB's speed and sort capabilities. If you define a table with both a partition key and a range (sort) key, items within a partition are automatically sorted by the range key in UTF-8 byte order. This means ISO timestamp values can be used to sort data chronologically.
With this in mind, you could design your table to have a partition key with a global, constant value across all users (to group them all) and a sort key of isoDate#userId, while also creating a GSI (Global Secondary Index) with the userId as the partition key and the isoDate as the range key.
With your data sorted, you can use a BETWEEN query to extract the entries that fit into your time window.
Schedule one Lambda to run every minute (or so) and extract the entries that are about to expire, so you can notify the users about it.
Important note: this sorting method works when ALL range keys have the same length, due to how UTF-8 sorting works. You can easily accomplish this if your application uses UUIDs as IDs. If not, you can simply generate a random UUID to append to the isoTimestamp, as you only need it to avoid the rare exact-time duplicate.
Example: let's say you want to extract all entries expiring near the 2022-10-10T12:00:00.000Z hour:
your query would be BETWEEN 2022-10-10T11:59:00.000Z#00000000-0000-0000-0000-000000000000 AND 2022-10-10T12:00:59.999Z#zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz
Timestamps could be a little off, but you get the idea: 00... is the lowest UTF-8 start of a UUID, and zz... (or ff...) is the highest end.
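A rough boto3 sketch of that query, assuming a table named rentals with partition key pk (the constant value) and sort key sk = isoDate#userId; every name here is illustrative:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("rentals")  # assumed table name

    def expiring_between(start_iso, end_iso):
        # pk is the single constant value shared by all items, sk is "isoDate#userId"
        resp = table.query(
            KeyConditionExpression=Key("pk").eq("RENTAL")
            & Key("sk").between(
                start_iso + "#00000000-0000-0000-0000-000000000000",
                end_iso + "#ffffffff-ffff-ffff-ffff-ffffffffffff",
            )
        )
        return resp["Items"]

    # e.g. everything expiring in the window around 2022-10-10T12:00:00.000Z
    items = expiring_between("2022-10-10T11:59:00.000Z", "2022-10-10T12:00:59.999Z")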
In AWS creating periodic triggers to Lambda using AWS Console is quite simple and straight-forward.
Login to console and navigate to CloudWatch.
Under Events, select Rules & click “Create Rule”
You can either select a fixed rate or a cron expression for more control.
Cron expressions in CloudWatch start at the minutes field, not seconds; this is important to remember if you are copying a cron expression from somewhere else.
Click “Add Target”, select “Lambda Function” from drop down & then select appropriate Lambda function.
If you want to pass some data to the target function when triggered, you can do so by expanding “Configure Input”
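The same setup can be scripted; here is a rough boto3 sketch (rule name, function ARN, and input payload are placeholders). Note the six-field cron syntax that starts at minutes:

    # Rough boto3 equivalent of the console steps above. Rule name,
    # Lambda ARN, and schedule are placeholders.
    import boto3

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:expiry-check"

    # six fields: minutes hours day-of-month month day-of-week year
    # (here: every minute; "rate(1 minute)" would also work)
    rule = events.put_rule(Name="every-minute", ScheduleExpression="cron(* * * * ? *)")

    # allow CloudWatch Events / EventBridge to invoke the function
    lambda_client.add_permission(
        FunctionName=FUNCTION_ARN,
        StatementId="allow-eventbridge",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # attach the Lambda as the rule's target, with optional static input
    events.put_targets(
        Rule="every-minute",
        Targets=[{"Id": "1", "Arn": FUNCTION_ARN, "Input": '{"window_minutes": 1}'}],
    )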

How to handle DynamoDB Global streams

Looking to create a DynamoDB global table for storing customer information. The problem I have is my current pattern is to listen to changes on this table and send email updates using Lambda triggers.
i.e. Your profile information was changed. If this was not you..
Do I now need to have that Lambda in each region and will data replication mean that it is triggered for each region?
I think you might have misunderstood how streams work.
Global Tables need streams enabled on the table to replicate between regions. You can check the requirements and how it works here:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html
If you have a trigger, you can have it in only one region. Whichever region has the Lambda associated with the trigger will get notified of the updates.
The benefit you get from the global table is that if any region updates the data, the Lambda in the region you have configured will get triggered. Only one trigger will be sent to the Lambda.
Enabling streams is one of the requirements for DynamoDB Global Tables.
If you create triggers in multiple regions, you need to implement your Lambda with idempotency, i.e., even if the same data is delivered any number of times, the operation is performed only once.
Hope it helps.
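One way to get that idempotency, sketched under the assumption that your items carry customerId and updatedAt attributes and that you keep a small processed_changes table (all of these are made-up names): derive a key from the change itself and claim it with a conditional write before sending the email.

    # Hypothetical idempotent stream handler: a conditional put on a
    # "processed_changes" table ensures each logical change is handled once,
    # even if the trigger fires in more than one region.
    import boto3
    from botocore.exceptions import ClientError

    dedupe = boto3.resource("dynamodb").Table("processed_changes")  # assumed table

    def send_profile_change_email(new_image):
        ...  # placeholder for the existing notification logic

    def handler(event, context):
        for record in event["Records"]:
            new_image = record["dynamodb"].get("NewImage")
            if not new_image:
                continue  # e.g. REMOVE events carry no new image
            # key the dedupe record on the change itself, not the stream record,
            # so replicas of the same update collapse to one notification
            change_id = new_image["customerId"]["S"] + "#" + new_image["updatedAt"]["S"]
            try:
                dedupe.put_item(
                    Item={"change_id": change_id},
                    ConditionExpression="attribute_not_exists(change_id)",
                )
            except ClientError as e:
                if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                    continue  # already notified for this change
                raise
            send_profile_change_email(new_image)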

Fetch new entities only

I thought Datastore's key was ordered by insertion date, but apparently I was wrong. I need to periodically look for new entities in the Datastore, fetch them and process them.
Until now, I would simply store the last fetched key and (wrongly) query for anything greater than it.
Is there a way of doing so?
Thanks in advance.
Datastore's automatically generated keys are distributed uniformly in order to keep operations performant. You will not be able to tell which entities were added last from the keys alone.
Instead, you can try a couple of different approaches.
Use Pub/Sub and architect your app so another background task consumes the last-added entities. When an entity is added to the DB, you publish a new event to Pub/Sub with its key ID. Your event listener (a separate routine) will receive it.
Use key names and generate your own custom names. But since you would want sequentially increasing names, this will cause a performance hit even on modest ranges of data. You can find more about this in the Google Datastore best practices:
https://cloud.google.com/datastore/docs/best-practices#keys
You can add an additional creation-time property and still use automatic key generation.
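A small sketch of that third option with the google-cloud-datastore client (kind and property names are made up): write a created timestamp alongside the auto-generated key and query on it.

    # Hypothetical "fetch what's new since last time" using a created timestamp.
    import datetime
    from google.cloud import datastore

    client = datastore.Client()

    def save(data):
        entity = datastore.Entity(key=client.key("Record"))  # key is still auto-generated
        entity.update(data)
        entity["created"] = datetime.datetime.now(datetime.timezone.utc)
        client.put(entity)

    def fetch_new(last_seen):
        query = client.query(kind="Record")
        query.add_filter("created", ">", last_seen)  # only entities newer than the last run
        query.order = ["created"]
        return list(query.fetch())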

Clear DocumentDB Collection before inserting documents

I need to know how to clear a DocumentDB collection before inserting new documents. I am using a Data Factory pipeline activity to fetch data from an on-prem SQL Server and insert it into a DocumentDB collection. The frequency is set to every 2 hours. So when the next cycle runs, I want to first clear the existing data in the DocumentDB collection. How do I do that?
The easiest way is to programmatically delete the collection and recreate it with the same name. Our test scripts do this automatically. There is the potential for this to fail due to a subtle race condition, but we've found that adding a half second delay between the delete and recreate avoids this.
Alternatively, it would be possible to fetch every document id and then delete them one at a time. This would be most efficiently done from a stored procedure (sproc) so you don't have to send it all over the wire, but it would still consume RUs and take time.
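For the delete-and-recreate route, a rough sketch with the current azure-cosmos Python SDK (DocumentDB has since been renamed Azure Cosmos DB; the endpoint, key, and names are placeholders), including the small delay mentioned above:

    # Hypothetical "clear by recreating" step, run before each load cycle.
    import time
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
    db = client.get_database_client("mydb")

    db.delete_container("customers")
    time.sleep(0.5)  # brief pause to dodge the delete/recreate race mentioned above
    db.create_container(id="customers", partition_key=PartitionKey(path="/id"))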

Mongodb automatically write into capped collection

I need to manage the acquisition of many records per hour, about 1,000,000 records. And every second I need to get the last inserted value for every primary key. It works quite well with sharding. I was thinking of trying capped collections to keep only the last record for every primary key. In order to do this, I make two separate inserts. Is there a way in MongoDB to make some kind of trigger that propagates an insert into one collection to another collection?
MongoDB does not have any support for triggers or similar behavior.
The only way to do this is to make it happen in your code. So the code that writes the first entry should also write the second.
People have definitely requested triggers. If they are necessary for your solution, please cast a vote on the feature request.
I disagree that triggers are needed. MongoDB was created to be very fast and to provide only the most basic functionality; that is the power of this solution.
I think the best thing here is to create the "triggers" inside your application, as part of the data access layer.
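A minimal pymongo sketch of that data-access-layer "trigger" (collection names and the capped size are arbitrary): every write goes to the main collection and to a capped collection, which ages out old documents on its own.

    # Dual write in the data access layer: full history plus a capped copy.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client.telemetry

    # create the capped collection once (skipped if it already exists)
    if "latest" not in db.list_collection_names():
        db.create_collection("latest", capped=True, size=50 * 1024 * 1024)

    def insert_reading(doc):
        db.readings.insert_one(doc)      # full history collection
        db.latest.insert_one(dict(doc))  # capped copy; oldest documents age out automatically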
