Using timestamp as an Attribute in DynamoDB - amazon-dynamodb

I'm quite new to DynamoDB, but have some experience in Cassandra. I'm trying to adapt a pattern I followed in Cassandra, where each column represented a timestamped event, and wondering if it will carry over gracefully into DynamoDB or if I need to change my approach.
My goal is to query a set of documents within a date range by using the milliseconds-since-epoch timestamp as an Attribute name. I'm successfully storing the following as each report is generated with each new report being added under its own column:
{ PartitionKey:customerId,
SortKey:reportName_yyyymm,
'#millis_1#':{'report':doc_1},
'#millis_2#':{'report':doc_2},
. . .
'#millis_n#':{'report':doc_n}
}
My question is, given a millisecond-based date range, and the accompanying Partition and Sort keys, is it possible to query the set of Attributes that fall within that range or must I retrieve all columns for the matching keys and filter them at the client?

Welcome to the most powerful NoSQL database ;)
To kick off with the bad news: there is no way to query for specific attributes within an item. You can project certain attributes in a query, but you would have to write your own logic to determine which attributes (columns) to include in the projection. To get close to your solution, you could use a map attribute inside the item, with the milliseconds as keys. But there is another thing you have to be aware of before starting down this path.
There is a maximum size of 400 KB for each item in DynamoDB, including key and attribute names (Limits in DynamoDB Items). This means you can only store so many attributes in an item, especially if you intend to put the actual report inside the attribute. I would advise against that, also because you will burn read capacity units for the whole item every time you fetch a single attribute from it. You would be better off putting this data in a separate table, with the keys in the map. But truthfully, in DynamoDB I would split this whole thing up: add the milliseconds to the sort key and make every document its own item. That way you can query these items directly and use a BETWEEN condition on the sort key to select specific date-time ranges. Please let me know if you meant something else.
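As a rough illustration of the "one item per document" layout, the sketch below builds the parameters for a range query on the sort key. It only constructs the request object in the low-level attribute-value format and does not call AWS. The table name `Reports`, the key attribute names `customerId` and `reportKey`, and the `reportName#millis` sort-key layout are assumptions made for this example, not something taken from the question.

```javascript
// Sketch only: builds the parameters for a DynamoDB Query; it does not call AWS.
// "Reports", "customerId" and "reportKey" are hypothetical names.
const MILLIS_WIDTH = 13; // epoch milliseconds have at most 13 digits until the year 2286

function reportSortKey(reportName, millis) {
  // Zero-pad so that string ordering of the key matches numeric ordering.
  return `${reportName}#${String(millis).padStart(MILLIS_WIDTH, "0")}`;
}

function buildReportRangeQuery(customerId, reportName, fromMillis, toMillis) {
  return {
    TableName: "Reports",
    KeyConditionExpression: "#pk = :pk AND #sk BETWEEN :from AND :to",
    ExpressionAttributeNames: { "#pk": "customerId", "#sk": "reportKey" },
    ExpressionAttributeValues: {
      ":pk": { S: customerId },
      ":from": { S: reportSortKey(reportName, fromMillis) },
      ":to": { S: reportSortKey(reportName, toMillis) },
    },
  };
}
```

The zero-padding matters because a string sort key is compared character by character, so timestamps of different lengths would otherwise sort incorrectly.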

Related

Querying and paginating by a user-defined date in DynamoDB

This seems like a relatively common use-case but I can't find any slam-dunk answer. I would like the ability to paginate my results, sorted by a date that is both user-defined and can be modified by the user at any time.
I understand I can add the date to the sort key and delete/re-add the document should the user update the date, or use a secondary index, but neither of these options seems great.
Are there any other options?
Why do neither of those options seem great? Can you elaborate some more?
You want to sort based on the timestamp, and DynamoDB sorts the items within a partition by their sort key. So, to fit your use case, you would need to use the timestamp as a sort key. If your base table already uses something else as its sort key, you can create an index whose sort key is the timestamp.
When the user modifies the timestamp, which is a key attribute of the index, DynamoDB handles the underlying delete and write in the index for you. All you need to think about is the update of the item itself.
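For the pagination part, a minimal sketch of the request for one page against such an index could look like the following. It only builds the Query parameters and does not call AWS. The names `Documents`, `byUserDate`, `userId`, and the idea of partitioning the index by user are assumptions for this example; `Limit`, `ScanIndexForward`, and `ExclusiveStartKey` are standard Query parameters.

```javascript
// Sketch only: builds Query parameters for one page; it does not call AWS.
// "Documents", "byUserDate" and "userId" are hypothetical names.
function buildPageQuery(userId, pageSize, lastEvaluatedKey) {
  const params = {
    TableName: "Documents",
    IndexName: "byUserDate", // index whose sort key is the user-defined date
    KeyConditionExpression: "#uid = :uid",
    ExpressionAttributeNames: { "#uid": "userId" },
    ExpressionAttributeValues: { ":uid": { S: userId } },
    ScanIndexForward: false, // descending by sort key, i.e. newest date first
    Limit: pageSize,
  };
  // For every page after the first, pass the LastEvaluatedKey returned
  // with the previous page to continue where that page stopped.
  if (lastEvaluatedKey) {
    params.ExclusiveStartKey = lastEvaluatedKey;
  }
  return params;
}
```

When a response contains no `LastEvaluatedKey`, there are no further pages.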

How to introduce a new column in dynamo DB running in production?

I have a use case where DynamoDB is running in production and I need to add a new column IDUpdatedAt which will also be serving as a sort key for one of the GSIs.
I tried a thing in test where my application adds the new rows with IDUpdatedAt, it's working fine but what about the existing rows? How to add the values for those?
Also the new rows will not be added without IDUpdatedAt, but how will the search be impacted for older rows?
PS: IDUpdatedAt is being used as a filter in the application, i.e., user can search for specific ID and can get results sorted by date. That's why IDUpdatedAt is also a part of GSI (sort key).
Please help.
You've got the right idea by adding the field to new items. After all, DynamoDB does not enforce a particular schema outside of the primary key.
This also happens to be a very useful feature, especially when defining a GSI on that attribute: if the attribute exists on the item, the item ends up in the index! For example, imagine modeling an email inbox in DDB where each item represents an email. You could include an attribute 'is_read' and define a GSI using that attribute.
If the 'is_read' attribute exists on the item, it's in the index. Otherwise, it's not. A cool way to use GSIs to implement filtering.
Pretty neat stuff!
However, there is no way to retroactively update all items with a new attribute other than manually updating each item (or in batches). The equivalent in SQL databases is defining a new column. Unfortunately, an analogous operation in DDB does not exist.
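A backfill of the existing items could be sketched along these lines: for each key found by scanning the table, issue an update that sets the attribute only if it is still missing. The function below only builds the UpdateItem parameters and does not call AWS. The table name `Items` and the choice of a string value are assumptions for this example; `attribute_not_exists` is a standard condition function.

```javascript
// Sketch only: builds UpdateItem parameters for one item; it does not call AWS.
// "Items" is a hypothetical table name; `key` is the item's primary key in
// attribute-value format, e.g. as returned by a Scan.
function buildBackfillUpdate(key, updatedAt) {
  return {
    TableName: "Items",
    Key: key,
    UpdateExpression: "SET #u = :u",
    // Only write when the attribute is missing, so items that the
    // application has already stamped are left untouched.
    ConditionExpression: "attribute_not_exists(#u)",
    ExpressionAttributeNames: { "#u": "IDUpdatedAt" },
    ExpressionAttributeValues: { ":u": { S: updatedAt } },
  };
}
```

An update whose condition fails is rejected with a conditional check failure, which the backfill job can safely ignore.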

How can I get a document at a specific index after orderBy

I have some code like this:
...
const snapshot = firestore().collection("orders").orderBy("deliveryDate")
...
I want to access only the 100th order in the returned documents. So far, the only way I have achieved this is to do firestore().collection("orders").orderBy("deliveryDate").limit(100), which returns the first 100 documents so that I can access the last one. But I end up fetching 99 unwanted documents, and this could become much slower if I want the 200th document or higher.
So, I basically want to know if there's a possible way of getting just the index I want after sorting.
As far as I know, startAt() and startAfter() only accept a doc reference or field values, not an index/offset
Firestore does not offer any way to offset by some numeric amount to web and mobile clients (and doing so would end up having the exact same cost as what you're doing now).
If you need to impose some sort of offset into your collection, you will need to maintain that in the document itself for querying, or use some other type of storage that gives you fast cheap access by index.

Voting on items - how to design database/aws-lambda to minimize AWS costs

I'm working on a website that mostly displays items created by registered users. So I'd say 95% of API calls are to read a single item and 5% are to store a single item. System is designed with AWS API Gateway that calls AWS Lambda function which manipulates data in DynamoDB.
My next step is to implement a voting system (upvote/downvote) with basic features:
Each registered user can vote only once per item, and later is only allowed to change that vote.
number of votes needs to be displayed to all users next to every item.
items have only single-item views, and are (almost) never displayed in a list view.
only list view I need is "top 100 items by votes" but it is ok to calculate this once per day and serve cached version
My goal is to design a database/lambda to minimize costs of AWS. It's easy to make the logic work but I'm not sure if my solution is the optimal one:
My items table currently has hashkey slug and sortkey version
I created items-votes table with hashkey slug and sortkey user and also voted field (containing -1 or 1)
I added field votes to items table
API call to upvote/downvote inserts into the items-votes table, but first checks the constraint that the user has not already voted that way. Then a second query updates the items table with the new votes count. (so 1 API call and 2 db queries)
old API call to show an item stays the same but grabs new votes count too (1 API call and 1 db query)
I was wondering if this can be done even better by avoiding the new items-votes table and storing user votes inside the items table? It looks like it is possible to save one query that way and halve the Lambda execution time, but I'm worried it might make that table too big/complex. Each user field is a 10-character user ID, so if an item gets thousands of votes I'm not sure how Lambda/DynamoDB will behave compared to the original solution.
I don't expect thousands of votes any time soon, but it is not impossible to happen to a few items and I'd like to avoid situation where I need to migrate to different solution in the near future.
I would suggest using a DynamoDB set attribute (i.e. SS, a string set) to maintain the users who have voted on the item. Something like below:
upvotes : ['user1', 'user2']
downvotes : ['user1', 'user2']
When you update the votes using UpdateExpression, you can use the ADD action, which adds a user to the set only if it doesn't already exist there.
ADD - Adds the specified value to the item, if the attribute does not
already exist. If the attribute does exist, then the behavior of ADD
depends on the data type of the attribute:
If the existing data type is a set and if Value is also a set, then
Value is added to the existing set. For example, if the attribute
value is the set [1,2], and the ADD action specified [3], then the
final attribute value is [1,2,3]. An error occurs if an ADD action is
specified for a set attribute and the attribute type specified does
not match the existing set type. Both sets must have the same
primitive data type. For example, if the existing data type is a set
of strings, the Value must also be a set of strings.
This way you don't need to check whether the user has already upvoted or downvoted the item.
The only thing you may need to ensure is that the same user isn't present in both the upvotes and downvotes sets. You can probably use the DELETE action (which removes elements from a set) or a ConditionExpression to achieve this.
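One way to combine both ideas in a single update is sketched below: add the user to one set and, in the same expression, remove the user from the opposite set. The function only builds the UpdateItem parameters and does not call AWS. The table name `items` and the helper name `buildVoteUpdate` are assumptions for this example; `ADD` and `DELETE` are the update-expression actions for adding to and removing elements from a set.

```javascript
// Sketch only: builds UpdateItem parameters for one vote; it does not call AWS.
// "items" is an assumed table name; `key` is the item's primary key
// (slug and version) in attribute-value format.
function buildVoteUpdate(key, userId, direction) {
  if (direction !== "up" && direction !== "down") {
    throw new Error("direction must be 'up' or 'down'");
  }
  const addTo = direction === "up" ? "upvotes" : "downvotes";
  const deleteFrom = direction === "up" ? "downvotes" : "upvotes";
  return {
    TableName: "items",
    Key: key,
    // ADD puts the user into one string set; DELETE takes the same
    // user out of the opposite set, so a changed vote never counts twice.
    UpdateExpression: "ADD #add :user DELETE #del :user",
    ExpressionAttributeNames: { "#add": addTo, "#del": deleteFrom },
    ExpressionAttributeValues: { ":user": { SS: [userId] } },
  };
}
```

Keep in mind that every stored user ID counts toward the item's size, so very popular items would still grow with the number of voters.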

Firebase better way of getting total number of records

From the Transactions doc, second paragraph:
The intention here is for the client to increment the total number of
chat messages sent (ignore for a moment that there are better ways of
implementing this).
What are some standard "better ways" of implementing this?
Specifically, I'm looking at trying to do things like retrieve the most recent 50 records. This requires that I start from the end of the list, so I need a way to determine what the last record is.
The options as I see them:
use a transaction to update a counter each time a record is added, use the counter value with setPriority() for ordering
forEach() the parent and read all records, do my own sorting/filtering at client
write server code to analyze Firebase tables and create indexed lists like "mostRecent Messages" and "totalNumberOfMessages"
Am I missing obvious choices?
To view the last 50 records in a list, simply call "limit()" as shown:
var data = new Firebase(...);
data.limit(50).on(...);
Firebase elements are ordered first by priority and, if priorities match (or none is set), lexicographically by name. The push() command automatically creates elements that are ordered chronologically, so if you're using push(), then no additional work is needed to use limit().
To count the elements in a list, I would suggest adding a "value" callback and then iterating through the snapshot (or doing the transaction approach we mention). The note in the documentation actually refers to some upcoming features we haven't released yet which will allow you to count elements without loading them first.
