DynamoDB TTL actual time of removal

I have set up TTL on a DynamoDB table and enabled a stream. According to the AWS docs, it can take up to 48 hours before an item is removed. In my experiments I am seeing a 10-minute delay. I can live with this, but has anyone else seen longer delays?

Yes, there are instances where item removal takes more than 10 minutes! In fact, the SLA from DynamoDB is 48 hours, and the time needed for the actual removal depends on the activity level of the table.

A more pointed rephrasing of Allan's answer:
Even if no one has seen that delay (and finding it anecdotally through a Q&A site seems like a bad statistical test), Amazon says to expect the possibility of that much delay. TTL is meant for resource cleanup only, and a breach of the 48-hour SLA would most likely only entitle you to a refund of storage costs.
Do not depend on the absence of a given item to trigger logic within your application (e.g., user session timeout).
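Because an expired item can still be returned until the background TTL process actually removes it, application logic should compare the TTL attribute with the current time itself. A minimal sketch, assuming the TTL attribute is named `expires_at` (a hypothetical name) and holds epoch seconds, which is the format DynamoDB TTL expects:

```python
from __future__ import annotations

import time
from typing import Any, Iterable

# Hypothetical attribute name: use whichever attribute your table's TTL setting points at.
TTL_ATTRIBUTE = "expires_at"


def is_expired(item: dict[str, Any], now: float | None = None) -> bool:
    """True once the item's TTL (epoch seconds) has passed, whether or not DynamoDB deleted it yet."""
    now = time.time() if now is None else now
    expires_at = item.get(TTL_ATTRIBUTE)
    # Items without the TTL attribute are never expired by TTL.
    return expires_at is not None and float(expires_at) <= now


def live_items(items: Iterable[dict[str, Any]], now: float | None = None) -> list[dict[str, Any]]:
    """Drop items that have logically expired but may still come back from a query or scan."""
    return [item for item in items if not is_expired(item, now)]


# Example: a session lookup that treats an expired-but-undeleted session as gone.
session = {"session_id": "abc", TTL_ATTRIBUTE: 1_700_000_000}
print(is_expired(session, now=1_700_000_600))  # True
```

The same check can be pushed into a query as a FilterExpression such as `#exp > :now`, although items removed by a filter still consume read capacity.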

Related

How to handle offline aggregation using Firestore?

I have been scouring the internet for days for a solution to this problem.
That is, how do I handle aggregation when there is no network connection? I have a task management app that aggregates metadata about user tasks. For example, a task can contain tags, and these can be aggregated into a daily dashboard for the user. This would be easy if the user were always online, since I could use a transaction or a Cloud Function to aggregate, but when the user is offline the aggregation will appear incorrect until the network connection is restored.
Aggregation queries are explained here:
https://firebase.google.com/docs/firestore/solutions/aggregation
The page states this limitation:
Offline support - Client-side transactions will fail when the user's device is offline, which means you need to handle this case in your app and retry at the appropriate time.
However, I have not found any example or documentation on how to 'handle this case'. How would I go about addressing this problem?
Some thoughts:
I could cache the item if a transaction fails and aggregate it on top of the stored aggregation. However, going down this route means I can't take advantage of Firestore's "offline mode", because I would be using my own cache for every write made while offline anyway.
I could aggregate on demand, that is, never store the aggregation. This would be very read-heavy, depending on how many tasks a user has. Furthermore, if the aggregation needs to be shared as insights with other users, this option won't work, because other users do not have access to the tasks.
I'm at a loss and any help would be appreciated, thanks!
After a lot of research and trial and error I found a solution that can address this problem gracefully.
FieldValue.increment to the rescue.
FieldValue.increment avoids the need for a transaction while respecting Firestore's default offline cache behaviour. It has to be applied with set or update directly on the field. The drawback is that you cannot use withConverter on the collection for type safety. I'm willing to live with that, considering how useful FieldValue.increment is.
I've done multiple tests and can confirm that the values can be incremented/decremented multiple times locally while offline. This offline value is reflected in a get or snapshot call to the cache. When the network connection is restored, the values are updated on the server.
The cache does not store a new absolute value; it stores the "difference" in the FieldValue sentinel until it is time to apply it on the server.
This method only works for incrementing and decrementing values. Storing an average directly is not possible, because the true total number of items is not known when the average is calculated offline.
Instead, store the total number of items alongside the total value, and calculate the average when needed. That way the average is always accurate from the local perspective while offline, and it is also accurate online once the total value and count have been synced.
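The total-plus-count pattern can be modelled without the SDK. This is a simulation of the idea, not the Firestore API: queued deltas stand in for what `firebase.firestore.FieldValue.increment(n)` (or `increment(n)` with `updateDoc` in the modular web SDK) records while offline, and the field names `total` and `count` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OfflineAggregate:
    """Local model of an aggregate document that is only ever changed by increments."""

    # Last values known to be on the server.
    server: dict[str, float] = field(default_factory=lambda: {"total": 0.0, "count": 0.0})
    # Deltas queued while offline, like pending FieldValue.increment writes.
    pending: list[dict[str, float]] = field(default_factory=list)

    def record(self, value: float) -> None:
        # Queue deltas instead of doing read-modify-write, so no transaction is needed.
        self.pending.append({"total": value, "count": 1})

    def remove(self, value: float) -> None:
        self.pending.append({"total": -value, "count": -1})

    def local_view(self) -> dict[str, float]:
        # What a cache read shows: server values plus every pending delta.
        view = dict(self.server)
        for delta in self.pending:
            for key, amount in delta.items():
                view[key] = view.get(key, 0) + amount
        return view

    def sync(self) -> None:
        # Reconnect: apply the queued deltas to the server values and clear the queue.
        self.server = self.local_view()
        self.pending.clear()

    def average(self) -> float | None:
        # The average is derived on read from total and count, never stored.
        view = self.local_view()
        return view["total"] / view["count"] if view["count"] else None
```

Because each delta is an addition, deltas from several offline devices can be applied in any order and still produce the same total and count, which is why the average is computed rather than stored.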

Cloud Firestore throttling high-volume update syncing

(Note: sorry if I am using the relational DB terms here.)
Let's say I have ten clients connected to a database that has a sustained throughput of about 1k updates per second. Sending 1k updates per second (say, 1 MB of data changes per second) to a web browser is obviously not going to be a good experience for the end user. Does Firebase have any controls for how much data a client can 'accept' before it starts throttling? I understand it may batch requests, but my point is that Google can accept data and updates faster than a browser (potentially on a phone with a weak internet connection) can receive them, so what controls or techniques exist to manage this experience for the end user?
The only items I see from the docs are:
You should not update a single document more than once per second. If you update a document too quickly, then your application will experience contention, including higher latency, timeouts, and other errors.
https://firebase.google.com/docs/firestore/best-practices#updates_to_a_single_document
This topic is covered in another answer; setting aside the programming language used there, the code linked in that answer may help.
In general, if your client application is configured to listen for Firestore updates, it will receive every update event for that listener (just as you describe).
You could consider polling Firestore for changes instead. Polling can even be built into the client: the code tracks how often updates arrive and has a maximum number of updates per second; when that limit is reached, the client detaches its listener and switches to periodic polls for the data.
The listener could then be re-established after a period to continue the normal workflow when there are fewer updates again.
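The listen-then-poll switch described above reduces to a small rate tracker. A minimal sketch, where the class name, thresholds and the "listen"/"poll" modes are all hypothetical; the actual attach and detach calls depend on your Firestore client SDK:

```python
from __future__ import annotations

from collections import deque


class ListenerGovernor:
    """Choose between a realtime listener and periodic polling based on update rate."""

    def __init__(self, max_per_second: float, window_s: float = 5.0, cooldown_s: float = 60.0):
        self.max_per_second = max_per_second
        self.window_s = window_s      # sliding window used to measure the rate
        self.cooldown_s = cooldown_s  # how long to poll before trying the listener again
        self.events: deque[float] = deque()
        self.mode = "listen"
        self.polling_since: float | None = None

    def on_update(self, now: float) -> str:
        """Call for every snapshot event; returns the mode the client should be in."""
        self.events.append(now)
        # Keep only events inside the sliding window.
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        rate = len(self.events) / self.window_s
        if self.mode == "listen" and rate > self.max_per_second:
            # Too many updates: detach the listener and start a timer-based poll.
            self.mode = "poll"
            self.polling_since = now
            self.events.clear()
        return self.mode

    def on_tick(self, now: float) -> str:
        """Call on each poll; re-attach the listener once the cooldown has passed."""
        if self.mode == "poll" and self.polling_since is not None:
            if now - self.polling_since >= self.cooldown_s:
                self.mode = "listen"
                self.polling_since = None
        return self.mode
```

As the answer notes, this treats the symptom; it only caps what one client sees.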
That said, this is not optimal: it treats the symptom rather than the cause. If a listener is returning too many updates, consider the structure of the data and isolate the updates so they only reach the listeners that need them.
Similarly, large updates can be mitigated by keeping the changing fields in smaller records, so each update carries less data.
A generalized example is where two fields of data are updated, but the record is 150 fields in size. Rather than returning the full 150 fields, shard the fields into different data sets, so the two fields are in their own record with an additional reference field used to correlate with a second data set of the remaining 148 fields (plus the reference field).
When the smaller record is updated, the client application receives the small update, determines if the update is applicable to itself, and if so, fetches the corresponding larger record.
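The split into a small record and a large record can be sketched as plain dictionaries. The reference field name `ref` and the use of the record's `id` as the shared reference are assumptions made for illustration:

```python
from __future__ import annotations

from typing import Any, Callable


def split_record(
    record: dict[str, Any], hot_fields: set[str], ref_field: str = "ref"
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one wide record into a small, frequently updated part and a large part.

    Both halves carry the same reference value so the client can correlate them.
    """
    ref = record["id"]
    hot = {k: v for k, v in record.items() if k in hot_fields}
    cold = {k: v for k, v in record.items() if k not in hot_fields and k != "id"}
    hot[ref_field] = ref
    cold[ref_field] = ref
    return hot, cold


def on_small_update(
    hot: dict[str, Any],
    is_relevant: Callable[[dict[str, Any]], bool],
    fetch_large: Callable[[Any], dict[str, Any] | None],
    ref_field: str = "ref",
) -> dict[str, Any] | None:
    """Client side: only fetch the large record when the small update matters to this client."""
    if not is_relevant(hot):
        return None
    return fetch_large(hot[ref_field])
```

With the 150-field example, the listener only ever receives the two hot fields plus the reference.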
To prevent high volumes of writes from overwhelming the client's snapshot listeners, you could periodically duplicate the writes to a proxy collection that the client watches instead.
Documents would need a field to record the time of the last duplicate write to the proxy collection, and the process performing the writes should avoid making writes to the proxy collection until after the frequency duration has elapsed.
A small number of unnecessary writes may still occur due to any concurrent processes you have, but these might be insignificant in practice (with a reasonably long duplication frequency).
If the data belongs to a user, rather than being global data, then you could conceivably adjust the frequency of writes per user to suit their connection, either dynamically or based on user configuration.
In this way, your processes get to control the frequency of writes seen by clients, without needing to throttle or otherwise reject ingress writes (which would presumably be bad news for the upstream processes).
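A minimal sketch of the writer side of the proxy-collection idea, assuming a hypothetical `lastProxyWrite` field and caller-supplied write functions (stand-ins for your Firestore writes). Every write goes to the main document, but the proxy copy is written at most once per interval, and that interval can be chosen per user:

```python
from __future__ import annotations

from typing import Any, Callable


def write_with_proxy(
    doc: dict[str, Any],
    update: dict[str, Any],
    now: float,
    min_interval_s: float,
    write_main: Callable[[dict[str, Any]], None],
    write_proxy: Callable[[dict[str, Any]], None],
) -> bool:
    """Apply the update to the main document; mirror it to the proxy only if the interval has elapsed.

    Returns True when the proxy copy was written.
    """
    doc.update(update)
    last = doc.get("lastProxyWrite")
    due = last is None or now - last >= min_interval_s
    if due:
        # Record the time of this duplicate write so later writes can skip the proxy.
        doc["lastProxyWrite"] = now
    write_main(dict(doc))
    if due:
        write_proxy(dict(doc))
    return due
```

One consequence of this design: the last writes inside an interval only reach the proxy with the next due write, so a periodic flush may be needed if the proxy must eventually match the main document.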
Relevant part of the documentation below.
https://firebase.google.com/docs/firestore/best-practices#realtime_updates
Limit the collection write rate (1,000 operations/second): Keep the rate of write operations for an individual collection under 1,000 operations/second.
Limit the individual client push rate (1 document/second): Keep the rate of documents the database pushes to an individual client under 1 document/second.

Delete data in dynamoDB without bringing site down

I have a multi-tenant product and use DynamoDB, so all our web requests are served from DynamoDB. I have a use case where I want to move a tenant's data from one region to another as a background process.
How do I ensure the background process does not hog the database? Otherwise it will degrade the user experience and may bring the website down.
Is there a way to provision dedicated read and write capacity for the background process?
You cannot dedicate read and write capacity units to specific processes, but you could temporarily change the table's capacity mode to on-demand for the move, and then switch it back to provisioned mode later when the move is complete. You can make this capacity mode switch once every 24 hours. By changing to on-demand capacity mode, you are less likely to be throttled in this specific situation.
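The capacity-mode switch is a single UpdateTable call. This sketch only builds the keyword arguments for boto3's `update_table`, so it runs without AWS credentials; the table name and capacity numbers in the usage line are hypothetical:

```python
from __future__ import annotations

from typing import Any


def billing_mode_request(table_name: str, on_demand: bool, rcu: int = 0, wcu: int = 0) -> dict[str, Any]:
    """Keyword arguments for DynamoDB UpdateTable (boto3 client `update_table`)."""
    if on_demand:
        return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}
    if rcu <= 0 or wcu <= 0:
        raise ValueError("provisioned mode needs positive read and write capacity")
    # Switching back to provisioned mode requires explicit capacity.
    return {
        "TableName": table_name,
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }
```

Usage would look like `boto3.client("dynamodb").update_table(**billing_mode_request("tenants", on_demand=True))` before the move, and the provisioned variant afterwards. Tables with global secondary indexes also need per-index throughput when returning to provisioned mode, which this sketch does not cover.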
That said, without knowing your current capacity mode and capacity settings on those tables, it is difficult to make concrete recommendations.
Sorry, but Kirk's answer is not a good idea if you want to save money. DynamoDB has a TTL feature: to delete an item, you set its TTL attribute to a time that has already passed, and the item expires.
But it is not DELETED yet! It is scheduled for deletion later by a background process. TTL deletions do not consume write capacity units, which can save you money compared with deleting items one by one, and this is what the feature is for. Until the deletion actually happens, though, expired items can still be returned by reads, queries, and scans, so filter them out on the TTL attribute.
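Marking an existing item as expired is an UpdateItem that sets the TTL attribute to the current epoch second. A sketch that only builds low-level client arguments; the attribute name `expires_at` and the key shape are hypothetical:

```python
from __future__ import annotations

import time
from typing import Any


def expire_item_request(
    table_name: str, key: dict[str, Any], ttl_attribute: str = "expires_at", now: int | None = None
) -> dict[str, Any]:
    """Low-level UpdateItem kwargs that make the item eligible for background TTL deletion."""
    expires = int(time.time()) if now is None else now
    return {
        "TableName": table_name,
        "Key": key,  # low-level format, e.g. {"pk": {"S": "tenant#42"}}
        # An attribute-name placeholder avoids clashes with DynamoDB reserved words.
        "UpdateExpression": "SET #ttl = :expires",
        "ExpressionAttributeNames": {"#ttl": ttl_attribute},
        "ExpressionAttributeValues": {":expires": {"N": str(expires)}},
    }
```

Note that this UpdateItem is itself a write and consumes write capacity like any other update; only the later TTL deletion is free. The savings are therefore largest when the TTL attribute is set at the time the item is first written.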

DynamoDB atomic counter for account balance

In DynamoDB, an Atomic Counter is a number that avoids race conditions:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters
What makes a number atomic, and can I add/subtract from a float in non-unit values?
Currently I am doing: "SET balance = balance + :change"
(long version) I'm trying to use DynamoDB for user balances, so accuracy is paramount. The balance can be updated from multiple sources simultaneously. There is no need to pre-fetch the balance, we will never deny a transaction, I just care that when all the operations are finished we are left with the right balance. The operations can also be applied in any order, as long as the final result is correct.
From what I understand, this should be fine, but I haven't seen any atomic increment examples that change the value by anything other than 1.
My hesitation arises because questions like "Amazon DynamoDB Conditional Writes and Atomic Counters" suggest using conditional writes for a similar situation, which sounds like a terrible idea: if I fetch the balance, change it, and do a conditional write, the write could fail if the value has changed in the meantime. However, balance is the definition of business-critical, and I'm always nervous when ignoring documentation.
-Additional Info-
All writes will originate from a Lambda function, and I expect close to a 100% write success rate. However, I also maintain a history of all changes, and if the balance is ever in an "unknown" state (e.g., after a network timeout), I could lock the table and recalculate the correct balance from the history.
I think this gives the best "normal" operation: 99.999% of the time, every update works with a single write. A failure could be very costly, because we would need to scan a client's entire history to recreate the balance, but in terms of trade-off that seems a pretty safe bet.
The documentation for atomic counters is pretty clear, and in my opinion they are not safe for your use case: an atomic increment is not idempotent, so if a request times out and you retry it, the change may be applied twice.
The problem you are solving is pretty common; AWS recommends optimistic locking in such scenarios.
Please refer to the following AWS documentation:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBMapper.OptimisticLocking.html
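Optimistic locking means: read the item with its version, compute the new balance, and write only if the version is still the one you read, retrying on conflict. In DynamoDB this is a write with a ConditionExpression such as `version = :expected`. The sketch below simulates that behaviour in memory; `FakeTable` and `ConditionalCheckFailed` are hypothetical stand-ins for the table and DynamoDB's ConditionalCheckFailedException:

```python
from __future__ import annotations

from decimal import Decimal


class ConditionalCheckFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""


class FakeTable:
    """In-memory stand-in for a table supporting a version-conditioned write."""

    def __init__(self) -> None:
        self.items: dict[str, dict] = {}

    def get(self, key: str) -> dict:
        return dict(self.items.get(key, {"balance": Decimal("0"), "version": 0}))

    def put_if_version(self, key: str, item: dict, expected_version: int) -> None:
        # Equivalent of ConditionExpression "version = :expected".
        current = self.items.get(key, {"version": 0})["version"]
        if current != expected_version:
            raise ConditionalCheckFailed(key)
        self.items[key] = item


def apply_change(table: FakeTable, key: str, change: Decimal, max_attempts: int = 5) -> Decimal:
    """Read, compute, write only if the version is unchanged; re-read and retry on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)
        new_item = {"balance": item["balance"] + change, "version": item["version"] + 1}
        try:
            table.put_if_version(key, new_item, expected_version=item["version"])
            return new_item["balance"]
        except ConditionalCheckFailed:
            continue  # another writer got there first
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

In Java, DynamoDBMapper's version attribute generates this condition for you.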
It appears this concept is workable, according to a reply from AWS staff:
Often application writers will use a combination of both approaches, where you can have an atomic counter for real-time counting, and an audit table for perfect accounting later on.
https://forums.aws.amazon.com/thread.jspa?messageID=470243&#470243
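The audit-table half of that combination is a sum over the change history compared with the counter. A minimal sketch using `Decimal` so cent values add exactly; the function names are hypothetical:

```python
from __future__ import annotations

from decimal import Decimal
from typing import Iterable


def balance_from_history(changes: Iterable[Decimal], opening: Decimal = Decimal("0")) -> Decimal:
    """Recompute a balance from the audit trail; addition order does not matter."""
    return opening + sum(changes, Decimal("0"))


def reconcile(counter_value: Decimal, changes: Iterable[Decimal]) -> Decimal:
    """Correction to apply to the counter (history minus counter); zero means they agree."""
    return balance_from_history(changes) - counter_value
```

For example, if a retried increment of 10.10 was applied twice, `reconcile` returns -10.10, which can then be applied as one more atomic update.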
There is also confirmation that the update will be atomic and any update operation will be consistent:
All non batch requests you send to DynamoDB gets processed atomically - there is no interleaving involved of any sort between requests. Write requests are also consistent, so any write request will update the latest version of the item at the time the request is received.
https://forums.aws.amazon.com/thread.jspa?messageID=621994&#621994
In fact, every write to a given item is strongly consistent: in DynamoDB, all operations against a given item are serialized.
https://forums.aws.amazon.com/thread.jspa?messageID=324353&#324353
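Regarding non-unit changes: the update expression adds whatever number you pass, and DynamoDB stores numbers as decimals with up to 38 digits of precision, so a change such as -12.34 is added exactly. A sketch building kwargs for boto3's resource-level `Table.update_item`; the key name `account_id` is hypothetical, and `if_not_exists` (a documented update-expression function) is used so the first update creates the attribute instead of failing:

```python
from __future__ import annotations

from decimal import Decimal
from typing import Any


def balance_update_kwargs(account_id: str, change: Decimal | int) -> dict[str, Any]:
    """Keyword arguments for Table.update_item: one atomic, server-side addition."""
    if isinstance(change, float):
        # boto3's resource layer rejects Python floats; Decimal keeps cents exact.
        raise TypeError("use Decimal, not float, for DynamoDB numbers")
    return {
        "Key": {"account_id": account_id},
        "UpdateExpression": "SET balance = if_not_exists(balance, :zero) + :change",
        "ExpressionAttributeValues": {":zero": Decimal("0"), ":change": Decimal(change)},
        "ReturnValues": "UPDATED_NEW",
    }
```

Usage would be `boto3.resource("dynamodb").Table("balances").update_item(**balance_update_kwargs("acct-1", Decimal("-12.34")))`, where `balances` is a hypothetical table name. The request is still not idempotent, so a retry after a timeout should be checked against the history table.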

Graphite Render URL API to Splunk - Track received events?

I'd like to set up a scripted input in Splunk that runs curl against the Graphite render URL API. I imagine I could configure this input to run every minute and retrieve the last minute's worth of events.
My concern with this is that some events might be missed, or duplicated.
Has anybody done something similar to this? How could I keep track of the events from Graphite that I have already read?
If you write a modular input you can use data checkpoints. See the docs for more info: http://docs.splunk.com/Documentation/Splunk/6.2.1/AdvancedDev/ModInputsCheckpoint
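A checkpoint is simply the newest timestamp already ingested, persisted between runs. A sketch assuming a hypothetical JSON file kept in the checkpoint directory that Splunk provides to modular inputs, with datapoints as `(value, timestamp)` pairs:

```python
from __future__ import annotations

import json
import os
import tempfile


def load_checkpoint(path: str, default: int) -> int:
    """Return the last epoch second already ingested, or `default` on the first run."""
    try:
        with open(path, encoding="utf-8") as fh:
            return int(json.load(fh)["last_ts"])
    except (FileNotFoundError, KeyError, ValueError):
        return default


def save_checkpoint(path: str, last_ts: int) -> None:
    """Write to a temp file and rename, so a crash mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_ts": last_ts}, fh)
    os.replace(tmp, path)


def new_points(points: list[tuple[float | None, int]], last_ts: int) -> list[tuple[float, int]]:
    """Keep non-null datapoints newer than the checkpoint (no duplicates, no half-flushed NULLs)."""
    return [(v, ts) for v, ts in points if v is not None and ts > last_ts]
```

Advancing the checkpoint only to the newest non-null timestamp you actually ingested means trailing NULLs are fetched again on the next run, which addresses both the "missed" and the "duplicated" concern.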
My concern with this is that some events might be missed, or duplicated.
Yes, it may go missing, in two cases:
If you're pushing your Graphite server to its limits, there is a lag between the moment a datapoint is received and the moment it is flushed to disk. With large queues, I have seen this lag reach 20 minutes (I/O is the constraint here).
For example, with a 20-minute lag and data stored at 1-minute granularity, the latest 20 datapoints will be NULL against their timestamps. Of course, they will fill in with the next flush.
Note that this lag is indeterminate, so only go for this approach if you have a zero-lag deployment.
Even when nothing is throttling, the latest datapoint may or may not be NULL at any given moment, because of the way Graphite flushes. You can use something like &from=-21m&to=-1m to make sure you never encounter this. Note: your monitoring now lags by a minute. :)
All said, Graphite is a great monitoring tool if your requirements aren't real-time.
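A window that ends one minute in the past can be built and parsed as follows. This sketch uses the render API's documented `from` and `until` parameters with the `min` unit and `format=json`, whose response is a list of series with `[value, timestamp]` datapoints; the host and metric name are hypothetical:

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def render_url(base: str, target: str, window_min: int = 20, lag_min: int = 1) -> str:
    """Render API URL for a window ending `lag_min` minutes ago, skipping the unflushed latest point."""
    params = {
        "target": target,
        "from": f"-{window_min + lag_min}min",
        "until": f"-{lag_min}min",
        "format": "json",
    }
    return f"{base.rstrip('/')}/render?{urlencode(params)}"


def complete_points(body: str) -> dict[str, list[tuple[float, int]]]:
    """Parse a format=json response and drop datapoints whose value is still null."""
    return {
        series["target"]: [(v, ts) for v, ts in series["datapoints"] if v is not None]
        for series in json.loads(body)
    }
```

For example, `render_url("http://graphite.example/", "servers.web1.cpu")` yields a URL ending in `from=-21min&until=-1min&format=json`, matching the 21-to-1-minute window above.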

Resources