What is the best way to schedule tasks in a serverless stack? - firebase

I am using NextJS and Firebase for an application. Users are able to rent products for a certain period. After that period, a serverless function should be triggered which updates the database, etc. Since NextJS is event-driven, I cannot seem to figure out how to schedule a task that executes when the rental period ends and updates the database.
Perhaps cron jobs handled elsewhere (EasyCron, etc.) are a solution. Or maybe an EC2 instance just for scheduling these tasks.

Since this is tagged with AWS EC2, I've assumed it's OK to suggest a solution with AWS services in mind.
What you could do is leverage DynamoDB's speed and sort capabilities. If you define a table with both a partition key and a sort (range) key, items within a partition are automatically sorted by the sort key in UTF-8 byte order. This means ISO 8601 timestamp values can be used to sort data chronologically.
With this in mind, you could design your table to have a partition key with a single constant value across all users (to group them all) and a sort key of isoDate#userId, while also creating a GSI (Global Secondary Index) with the userId as the partition key and the isoDate as the range key.
With your data sorted, you can use the BETWEEN query to extract the entries that fit to your time window.
Schedule one Lambda to run every minute (or so) and extract the entries that are about to expire, so the affected users can be notified.
Important note: this sorting method works when ALL range keys have the same length, due to how UTF-8 sorting works. You can easily accomplish this if your application uses UUIDs as IDs. If not, you can simply generate a random UUID to append to the ISO timestamp, as you only need it to avoid the rare case of two entries with the exact same timestamp.
Example: let's say you want to extract all entries expiring near the 2022-10-10T12:00:00.000Z hour:
your query would be BETWEEN 2022-10-10T11:59:00.000Z#00000000-0000-0000-0000-000000000000 and 2022-10-10T12:00:59.999Z#zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz
The timestamps could be a little off, but you get the idea: 00… is the lowest value a UUID can take in UTF-8 order, and zz… (or ff…) is the highest.
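For illustration, here is a minimal sketch of that query using the AWS SDK for JavaScript v3; the table name, key names and the constant partition value are assumptions made up for this example, not something from the question:

    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    // Hypothetical table: partition key "pk" (a constant grouping value),
    // sort key "sk" = `${isoTimestamp}#${userId}`.
    export async function findExpiringRentals(windowStart: string, windowEnd: string) {
      const result = await ddb.send(new QueryCommand({
        TableName: "Rentals",
        KeyConditionExpression: "pk = :pk AND sk BETWEEN :lo AND :hi",
        ExpressionAttributeValues: {
          ":pk": "RENTAL",
          // Pad the timestamps with the lowest / highest possible UUID suffix.
          ":lo": `${windowStart}#00000000-0000-0000-0000-000000000000`,
          ":hi": `${windowEnd}#ffffffff-ffff-ffff-ffff-ffffffffffff`,
        },
      }));
      return result.Items ?? [];
    }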

In AWS, creating periodic triggers for Lambda using the AWS Console is quite simple and straightforward.
Login to console and navigate to CloudWatch.
Under Events, select Rules & click “Create Rule”
You can either select fixed rate or select Cron Expression for more control
Cron expressions in CloudWatch start with minutes, not seconds; this is important to remember if you are copying a cron expression from somewhere else.
Click “Add Target”, select “Lambda Function” from the drop-down and then select the appropriate Lambda function.
If you want to pass some data to the target function when triggered, you can do so by expanding “Configure Input”
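If you prefer to create the rule programmatically instead of through the console, a rough sketch with the AWS SDK v3 might look like the following. CloudWatch Events is surfaced as Amazon EventBridge these days; the rule name, function name and ARN below are placeholders:

    import { EventBridgeClient, PutRuleCommand, PutTargetsCommand } from "@aws-sdk/client-eventbridge";
    import { LambdaClient, AddPermissionCommand } from "@aws-sdk/client-lambda";

    const events = new EventBridgeClient({});
    const lambda = new LambdaClient({});

    async function scheduleLambda(): Promise<void> {
      // Remember: the cron expression starts with minutes, not seconds.
      const rule = await events.send(new PutRuleCommand({
        Name: "expire-rentals-every-minute",
        ScheduleExpression: "cron(* * * * ? *)", // or "rate(1 minute)"
        State: "ENABLED",
      }));

      // Allow EventBridge to invoke the function.
      await lambda.send(new AddPermissionCommand({
        FunctionName: "expireRentals", // placeholder function name
        StatementId: "allow-eventbridge-invoke",
        Action: "lambda:InvokeFunction",
        Principal: "events.amazonaws.com",
        SourceArn: rule.RuleArn,
      }));

      // Attach the Lambda function as the rule's target, optionally with static input.
      await events.send(new PutTargetsCommand({
        Rule: "expire-rentals-every-minute",
        Targets: [{
          Id: "expireRentalsTarget",
          Arn: "arn:aws:lambda:us-east-1:123456789012:function:expireRentals", // placeholder ARN
          Input: JSON.stringify({ source: "scheduled" }),
        }],
      }));
    }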

Related

Constraints (Meta Information) in Kusto / Azure Data Explorer

For some of our logs we have the following schema:
A "master" event table that collects events. Each event comes with a unique id (guid).
For each event we collect additional IoT data (sensor data) which also contains the guid as a link to the event table.
Now, we often see the schema that someone starts with the IoT data and then wants to query the master event table. The join or query criteria is the guid. Now, as we have a lot of data the unconditioned query, of course, does not return within a short time frame, if at all.
Now what our analysts do is use the time range as a factor. Typically, the sensor data refers to events that happened on the same day or +/- a few hours, minutes or seconds (depending on the events). This query typically returns, but not always as fast as it could. Given that the guid is unique, queries that explicitly state this knowledge are typically much faster than those that don't, e.g.
Event_Table | where ... | take 1
Unfortunately, everyone needs to remember those properties of the data.
After this long intro: is there a way in Kusto to speed up those queries without explicitly writing "take 1"? As in, telling the Kusto engine that this column holds unique keys? I am not talking about enforcing that (as a DB unique key would do), but just about giving Kusto hints on how to improve the query. Can this be done somehow?
It sounds like you could benefit from introducing a server-side function:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/schema-entities/stored-functions
Using this method - you can define a function on a server - and users will provide a parameter to the function.

Schedule function in firebase

The problem
I have a Firebase application in combination with Ionic. I want the user to create a group and define a time when the group is to be deleted automatically. My first idea was to create a setTimeout(), save it and override it whenever the user changes the time. But as I have read, setTimeout() is a bad solution when used for long durations (because of the Firebase billing service). Later I heard about cron, but as far as I have seen, cron only allows calling functions at a specific time, not relative to a given time (e.g. 1 hour from now). Ideally, the user can define any given time with a datetime picker.
My idea
So my idea is as following:
User defines the date via native datepicker and the hour via some spinner
The client writes the time into a separate Firebase database with a reference of the following form: /scheduledJobs/{date}/{hour}/{groupId}
Every hour, the Cron task will check all the groups at the given location and delete them
If a user plans to change the time, he will just delete the old value in scheduledJobs and create a new one
My question
What is the best way to schedule the automatic deletion of the group? I am not sure if my approach suits well, since querying by date may create a very flat and long list in my database. Also, my approach is limited in that only full hours can be used as the time of deletion, not any given time. Additionally, I will need two inputs (date + hour) from the user instead of just using a datetime (which would also give me the minutes).
I believe what you're looking for is node-schedule. Basically, it allows you to run server-side cron jobs, and it has the ability to take date-time objects and schedule a job at that time. Since I'm assuming you're running a server for this, this would allow you to schedule the deletion at whatever time you wish based on the user input.
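A small node-schedule sketch for illustration; the group-deletion callback is a placeholder for whatever cleanup your app actually needs:

    import * as schedule from "node-schedule";

    // Schedule a one-off job at the exact Date the user picked.
    function scheduleGroupDeletion(groupId: string, deleteAt: Date): schedule.Job {
      return schedule.scheduleJob(`delete-${groupId}`, deleteAt, async () => {
        // Placeholder: delete the group from your database here.
        console.log(`Deleting group ${groupId} at ${new Date().toISOString()}`);
      });
    }

    // If the user changes the time, cancel the old job and schedule a new one.
    function rescheduleGroupDeletion(groupId: string, oldJob: schedule.Job, newDate: Date): schedule.Job {
      oldJob.cancel();
      return scheduleGroupDeletion(groupId, newDate);
    }

Note that, as the next answer points out, this only works as long as the Node process keeps running; the jobs live in memory and do not survive a restart.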
An alternative to TheCog's answer (which relies on running a node server) is to use Cloud Functions for Firebase in combination with a third party server (e.g. cron-jobs.org) to schedule their execution. See this video for more or this blog post for an alternative trigger.
In either of these approaches I recommend keeping only upcoming triggers in your database. So delete the jobs after you've processed them. That way you know it won't grow forever, but rather will have some sort of fixed size. In fact, you can query it quite efficiently because you know that you only need to read jobs that are scheduled before the next trigger time.
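A rough sketch of such an externally-triggered Cloud Function follows, assuming the /scheduledJobs/{date}/{hour}/{groupId} layout from the question and a hypothetical /groups/{groupId} node for the groups themselves (both of those paths are assumptions):

    import * as functions from "firebase-functions";
    import * as admin from "firebase-admin";

    admin.initializeApp();

    // Called once per hour by an external cron service (e.g. cron-jobs.org).
    export const deleteExpiredGroups = functions.https.onRequest(async (req, res) => {
      const now = new Date();
      const date = now.toISOString().slice(0, 10); // e.g. "2018-06-01"
      const hour = String(now.getUTCHours());

      const jobsRef = admin.database().ref(`/scheduledJobs/${date}/${hour}`);
      const snapshot = await jobsRef.once("value");

      const deletions: Promise<void>[] = [];
      snapshot.forEach((child) => {
        const groupId = child.key;
        if (groupId) {
          // Assumed location of the groups themselves.
          deletions.push(admin.database().ref(`/groups/${groupId}`).remove());
        }
      });
      await Promise.all(deletions);

      // Remove the processed trigger entries so the schedule tree stays small.
      await jobsRef.remove();

      res.status(200).send(`Processed ${deletions.length} scheduled deletions`);
    });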
If you're having problems implementing your approach, I recommend sharing the minimum code that reproduces where you're stuck as it will be easier to give concrete help that way.

how to write DynamoDB table1 to table2 daily?

how to write DynamoDB table1 to table2?
Table1: id,name,mobile,address,createdDate.
Table2: id,name,mobile,address,createdDate.
Condition: only records added yesterday should be written to "table2", and that yesterday's data should also be removed from "table1".
How can this process be run on a daily basis?
You would need to write your own code to do this, but you could implement it as an AWS Lambda function, triggered by an Amazon CloudWatch Events schedule.
See: Tutorial: Schedule Lambda Functions Using CloudWatch Events
Alternatively, rather than copying the data once per day, you could create a DynamoDB Stream that triggers an AWS Lambda function that can insert the data immediately. If your goal is to have one table per day, you could populate today's table during the day and simply switch over to it at the end of the day, without having to copy any data -- because it is already there!
See: DynamoDB Streams and AWS Lambda Triggers
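If you go the scheduled-Lambda route, a rough sketch of the handler could look like this; it assumes createdDate is stored as an ISO 8601 string and that table1 is small enough to scan (table names taken from the question, everything else is illustrative):

    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient, ScanCommand, PutCommand, DeleteCommand } from "@aws-sdk/lib-dynamodb";

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    export const handler = async (): Promise<void> => {
      // Yesterday's date as "YYYY-MM-DD".
      const yesterday = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString().slice(0, 10);

      // Find all items created yesterday. A scan (without pagination handling here)
      // is fine for small tables; a GSI on createdDate would be better for large ones.
      const scan = await ddb.send(new ScanCommand({
        TableName: "table1",
        FilterExpression: "begins_with(createdDate, :d)",
        ExpressionAttributeValues: { ":d": yesterday },
      }));

      for (const item of scan.Items ?? []) {
        // Copy to table2, then delete from table1.
        await ddb.send(new PutCommand({ TableName: "table2", Item: item }));
        await ddb.send(new DeleteCommand({ TableName: "table1", Key: { id: item.id } }));
      }
    };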
Here is one possible approach:
1) Enable TTL on table 1 and ensure that items are deleted the next day (you can put a next-day date into the TTL date field on creation).
2) Enable a DynamoDB stream on table 1 and ensure it includes the old image.
3) Use an AWS Lambda function that processes the TTL events (check the DynamoDB TTL documentation to see how a TTL event can be identified) and writes the old image data into table 2.
This solution should be fault tolerant and scalable, as Lambdas will retry on any failed operation. One downside of this solution is that TTL does not guarantee immediate execution. In some rare cases it can take up to 48 hours for an item to be deleted, but usually it is much faster.
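A sketch of the Lambda stream handler for that approach; TTL deletions show up in the stream as REMOVE records whose userIdentity principal is dynamodb.amazonaws.com (the destination table name below is from the question, the rest is illustrative):

    import { DynamoDBStreamEvent } from "aws-lambda";
    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
    import { unmarshall } from "@aws-sdk/util-dynamodb";

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
      for (const record of event.Records) {
        // Only react to expirations performed by the TTL service, not user deletes.
        const isTtlDelete =
          record.eventName === "REMOVE" &&
          record.userIdentity?.principalId === "dynamodb.amazonaws.com";

        if (!isTtlDelete || !record.dynamodb?.OldImage) continue;

        // Convert the stream image back into a plain object and copy it to table 2.
        const item = unmarshall(record.dynamodb.OldImage as any);
        await ddb.send(new PutCommand({ TableName: "table2", Item: item }));
      }
    };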

DynamoDB table structure

We are looking to use AWS DynamoDB for storing application logs. Logs from multiple components in our system would be stored here. We are expecting a lot of writes and only minimal number of reads.
The client that we use for writing into DynamoDB generates a UUID for the partition key, but using this makes it difficult to actually search.
Most prominent search cases are,
Search based on Component / Date / Date time
Search based on JobId / File name
Search based on Log Level
From what I have read so far, using a UUID for the partition key is not suitable for our case. I am currently thinking about using one of the fields above for our partition key and an ISO 8601 timestamp as our sort key. Does this sound like a reasonable / widely used setup for such a use case?
If not kindly suggest alternatives that can be used.
Using a UUID as the partition key will distribute the data efficiently across internal partitions, so you will be able to utilize all of the provisioned capacity.
Using a sortable (ISO 8601) timestamp as the range/sort key will store the data in order, so it will be possible to retrieve it in order.
However, for retrieving logs by anything other than the timestamp, you may have to create indexes (GSIs), which are charged separately.
Hope your logs are precious enough to store in DynamoDB instead of CloudWatch ;)
In general DynamoDB seems like a bad solution for storing logs:
It is more expensive than CloudWatch
It has poor querying capabilities, unless you start utilising global secondary indexes which will double or triple your expenses
Unless you use a random UUID for the hash key, you risk creating hot partitions/keys in your DB (for example, using a component ID as a primary or global secondary key might result in throttling if some component writes much more often than others)
But assuming you already know these drawbacks and you still want to use DynamoDB, here is what I would recommend:
Use JobId or Component name as hash key (one as primary, one as GSI)
Use timestamp as a sort key
If you need to search by log level often, then you can create another local sort key, or you can combine level and timestamp into a single sort key. If you only care about searching for ERROR-level logs most of the time, then it might be better to create a sparse GSI for that.
Create a new table each day (let's call it the "hot table"), and only store that day's logs in that table. This table will have high write throughput. Once the day finishes, significantly reduce its write throughput (maybe to 0) and only leave some read capacity. This way you will reduce the risk of running into the 10 GB limit per hash key that DynamoDB has.
This approach also has an advantage in terms of log retention. It is very easy and cheap to remove logs older than X days this way. By keeping old table capacity very low you will also avoid very high costs. For more complicated ad-hoc analysis, use EMR.
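A sketch of what such a daily table definition could look like with the AWS SDK v3; the table name, index name, attribute names and throughput values are all illustrative assumptions:

    import { DynamoDBClient, CreateTableCommand } from "@aws-sdk/client-dynamodb";

    const client = new DynamoDBClient({});

    // One "hot" table per day (e.g. logs-2019-01-15): JobId as the hash key,
    // an ISO 8601 timestamp as the sort key, and a GSI for per-component queries.
    async function createDailyLogTable(tableName: string): Promise<void> {
      await client.send(new CreateTableCommand({
        TableName: tableName,
        AttributeDefinitions: [
          { AttributeName: "JobId", AttributeType: "S" },
          { AttributeName: "Timestamp", AttributeType: "S" },
          { AttributeName: "Component", AttributeType: "S" },
        ],
        KeySchema: [
          { AttributeName: "JobId", KeyType: "HASH" },
          { AttributeName: "Timestamp", KeyType: "RANGE" },
        ],
        GlobalSecondaryIndexes: [{
          IndexName: "ComponentIndex",
          KeySchema: [
            { AttributeName: "Component", KeyType: "HASH" },
            { AttributeName: "Timestamp", KeyType: "RANGE" },
          ],
          Projection: { ProjectionType: "ALL" },
          ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 100 },
        }],
        // High write capacity while the table is "hot"; dial both down the next day.
        ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 100 },
      }));
    }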

How can I have events in aws lambda triggered regularly?

SHORT VERSION: How can I trigger events regularly in AWS lambda?
LONG VERSION: My situation is such that I have events in a database that expire within a certain time. I want to run a function (send push notifications, delete rows, etc.) whenever I figure out that an event has expired. I know that setting up a timer for every single event created would be impractical, but is there something that would scan my database every minute or something and look for expired events to run my code on? If not, is there some alternative for my solution?
You could store your events in a DynamoDB table keyed on a UUID, and have a hash-range schema GSI on this table where the hash key is an expiry-time bucket, like the hour an event expires, 20150701T04Z, and the range key of the GSI is the exact timestamp (Unix time). That way, for a given hour-expiry bucket, you can use a range Query on the hour you are expiring events for, and take advantage of key conditions to limit your read to the time range you are interested in. GSIs do not enforce uniqueness, so you are still OK even if there are multiple events at the same Unix time. By projecting ALL attributes instead of KEYS_ONLY or INCLUDE, you can drive your event expiry off the GSI, without a second read to the base table. By adjusting the size of your expiry buckets (hours, minutes or days are all good candidates), you can greatly reduce the chances that your writes to the base table and queries on the GSI get throttled, as the expiry buckets, having different hash keys, will be evenly distributed throughout the hash key space.
Regarding event processing and the use of Lambda, first, you could have an EC2 instance perform the queries and delete items from the event table as they expire (or tombstone them by marking them as expired). Deleting the event items will keep the size of your table manageable and help you avoid IOPS dilution in the partitions of your table. If the number of items grows without bound, then your table's partitions will keep splitting resulting in smaller and smaller amounts of provisioned throughput on each partition, unless you up-provision your table. Next in the pipeline, you could enable a DynamoDB stream on the event table with the stream view type that includes old and new images. Then, you could attach a Lambda function to your Stream that does the event-driven processing (push notifications, etc). You can have your Lambda function fire notifications when old is populated and new is null, or when the difference between old and new image indicates that an event was tombstoned.
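A sketch of the query against such a GSI; the table, index and attribute names here are made up for illustration:

    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

    const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    // GSI hash key: "expiryBucket" (e.g. "20150701T04Z"); GSI range key: "expiryTime" (Unix time).
    async function eventsExpiringBetween(bucket: string, from: number, to: number) {
      const result = await ddb.send(new QueryCommand({
        TableName: "Events",
        IndexName: "ExpiryIndex",
        KeyConditionExpression: "expiryBucket = :b AND expiryTime BETWEEN :from AND :to",
        ExpressionAttributeValues: { ":b": bucket, ":from": from, ":to": to },
      }));
      return result.Items ?? [];
    }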
There's support now for scheduled Lambda jobs I believe, but I haven't tried it yet. https://aws.amazon.com/about-aws/whats-new/2015/10/aws-lambda-supports-python-versioning-scheduled-jobs-and-5-minute-functions/
