DynamoDB query/sort based on timestamp - amazon-dynamodb

In DynamoDB, I have a table where each record has two date attributes, create_date and last_modified_date. These dates are in ISO-8601 format e.g. 2016-01-22T16:19:52.464Z.
I need to have a way of querying them based on the create_date and last_modified_date e.g.
get all records where create_date > [some_date]
get all records where last_modified_date < [some_date]
In general, I need to get all records where [date_attr] [comparison_op] [some_date].
One way of doing it is to insert a dummy fixed attribute with each record and create an index with the dummy attribute as the partition key and the create_date as the sort key (likewise for last_modified_date.)
Then I'll be able to query it as such by providing the fixed dummy attribute as partition key, the date attributes as the sort key and use any comparison operators <, >, <=, >=, and so on.
But this doesn't seem good and looks like a hack instead of a proper solution/design. Are there any better solutions?

There are some things that NoSQL databases are not good at, but you can solve this in one of the following ways:
Move the table data to a SQL database for searching: this is effective because you can query exactly as you need, but it can be tedious because you have to keep the data synchronized between two different databases.
Integrate with Amazon CloudSearch: link the table to CloudSearch and query CloudSearch instead of the DynamoDB table.
Integrate with Elasticsearch: Elasticsearch is similar to CloudSearch, and although each has pros and cons the end result is the same: you query Elasticsearch instead of DynamoDB.
Add GSIs, as you mentioned in your question.
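On the GSI option: range comparisons on a date sort key rely on ISO-8601 strings sorting in chronological order. The sketch below checks that assumption in plain Java, with no DynamoDB calls. The class and method names are hypothetical, and the in-memory list only stands in for an index.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class IsoTimestampOrder {
    // True when comparing the raw strings gives the same ordering as comparing the instants.
    static boolean sameOrder(String a, String b) {
        int byString = Integer.signum(a.compareTo(b));
        int byInstant = Integer.signum(Instant.parse(a).compareTo(Instant.parse(b)));
        return byString == byInstant;
    }

    // Emulates "create_date > bound" over a handful of in-memory values (not a real query).
    static List<String> createdAfter(List<String> createDates, String bound) {
        List<String> result = new ArrayList<>();
        for (String date : createDates) {
            if (date.compareTo(bound) > 0) {
                result.add(date);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> dates = List.of(
            "2016-01-22T16:19:52.464Z",
            "2015-12-31T23:59:59.999Z",
            "2016-02-01T00:00:00.000Z");
        System.out.println(createdAfter(dates, "2016-01-01T00:00:00.000Z"));
        // Mixed precision breaks the string ordering, so keep one fixed format.
        System.out.println(sameOrder("2016-01-22T16:19:52Z", "2016-01-22T16:19:52.464Z"));
    }
}
```

The ordering only holds when every value uses the same UTC format and the same number of fractional digits, which is worth enforcing at write time.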

Related

Querying DynamoDb timestamp data in range

I want to migrate my data from DynamoDB to Redshift. I don't want to scan the whole table at once, as this might result in throttling.
My Table is as below:
acountId(hash key), lastUpdatedTime.
I thought I could create a GSI on lastUpdatedTime and then query for, say, the data between day 1 and day 5, and the next day query for the data between day 6 and day 7.
But my understanding is that even with a GSI it will scan the whole table, as I won't have any hash key to provide. I just have a range of timestamps to query.
Creating a GSI is indeed the right solution. However, creating the GSI can be slow and expensive if you set it to project all attributes. I would recommend creating the GSI on lastUpdatedTime and projecting only the table's partition key (and sort key, if you have one) using KEYS_ONLY. Then, when you scan the index, you retrieve only the item keys and fetch each full item from the table at that point, during the migration.
I recommend reading up on GSIs here: https://docs.aws.amazon.com/fr_fr/amazondynamodb/latest/developerguide/GSI.html

Range query on cloudsearch date

I have a DynamoDB table which stores creation_date as an epoch value in string format. This date is neither the hash key nor the sort key. The ultimate goal is to query creation_date for a range, i.e. I need all the ids in the given time range.
The table schema is:
id, version, creation_date, info.
id is hash key and version is sort key.
I was thinking of creating a CloudSearch domain and linking it to the DynamoDB table. Is it possible to run a range query in CloudSearch from Java if the date is in string format? If yes, how?
Here’s how you can accomplish this in DynamoDB using a GSI with a hash key of creation_y_m and a GSI range key of creation_date.
When you’re querying for a range of creation dates, you need to do a bit of date manipulation to find all of the months between your two dates, but then you can query your GSI with a key condition expression like this one:
creation_y_m = 2019-02 AND creation_date BETWEEN 2019-02-05T12:00:00Z AND 2019-02-18T06:00:00Z
Given that most of your queries cover a two-week range, you will usually have to make only one or two queries to get all of the items.
You may need to backfill the creation_y_m field, but it’s fairly straightforward to do that by scanning your table and updating each item to have the new attribute.
There are, of course, many variations on this. You could tweak how granular your hash key is (maybe you want just year, maybe you want year-month-day). You could use epoch time instead of ISO 8601 strings.
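The "date manipulation" step can be sketched as follows: list every year-month value touched by the requested range, then issue one query per value. This is a minimal sketch in plain Java. The class and method names are hypothetical, and it assumes creation_y_m is stored as a UTC "yyyy-MM" string.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

final class MonthBuckets {
    // Returns every "yyyy-MM" value touched by the inclusive range [from, to], in order.
    static List<String> between(Instant from, Instant to) {
        YearMonth current = YearMonth.from(from.atZone(ZoneOffset.UTC));
        YearMonth last = YearMonth.from(to.atZone(ZoneOffset.UTC));
        List<String> buckets = new ArrayList<>();
        while (!current.isAfter(last)) {
            buckets.add(current.toString()); // YearMonth prints as e.g. "2019-02"
            current = current.plusMonths(1);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2019-01-25T00:00:00Z");
        Instant to = Instant.parse("2019-02-08T00:00:00Z");
        // One key condition per bucket; each would be sent as a separate Query.
        for (String bucket : between(from, to)) {
            System.out.println("creation_y_m = " + bucket
                + " AND creation_date BETWEEN " + from + " AND " + to);
        }
    }
}
```

A two-week range that crosses a month boundary produces two buckets and therefore two queries, which matches the "one or two queries" estimate above.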

DynamoDB how to search for a list of values

I have a DynamoDB instance with a partition key and sort key. Let's say that they are organisation (hash key) and employee id (sort key).
I want to retrieve all employees whose ids are in a list. They all work for the same organisation, but they are not all of the employees of that organisation.
In SQL I'd do something like:
select * from table where organisation_id = 'org' and employee_id in [list of ids]
There does not seem to be an equivalent in DynamoDB.
My choices seem to be:
1) Iterate over all employee IDs using a Query OR
2) Use BatchGetItem and provide organisation_id:employee_id for all items
The first seems like it will be slower as it involves multiple requests while the second is a single request but may consume more RCUs.
Which of these is preferred solution to this problem? Or am I missing a better third way?
I would iterate your list using GetItem, adding each employee found to a collection. This approach isn't slow - DynamoDB is designed specifically for getting lots of items fast using their keys.
There is no need to use Query as you have both the partition key and range key. You would only use a Query if say you wanted all employees of one organisation.
If your list is particularly large you could use BatchGetItem, which accepts up to 100 keys per request and may retrieve the items in parallel, reducing latency. You won't see much of a difference, though, unless you have a lot of items to get.
By the way, DynamoDB does have an IN operator, but you can't use it in key conditions.
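If you go the BatchGetItem route with a long list, the keys have to be split into requests of at most 100. The sketch below shows only that splitting step in plain Java. The class and method names are hypothetical, and no SDK call is made.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyBatches {
    // BatchGetItem accepts at most 100 keys per request.
    static final int MAX_KEYS_PER_BATCH = 100;

    // Splits the keys into consecutive batches of at most `size` elements.
    static <T> List<List<T>> chunk(List<T> keys, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += size) {
            int end = Math.min(i + size, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> employeeIds = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            employeeIds.add("emp-" + i); // hypothetical ids
        }
        for (List<String> batch : chunk(employeeIds, MAX_KEYS_PER_BATCH)) {
            System.out.println("one BatchGetItem request with " + batch.size() + " keys");
        }
    }
}
```

Each batch would become one request carrying the same organisation value with a different employee id per key. A response can also return unprocessed keys, so a real caller would retry those.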

Dynamodb query expression

Team,
I have a DynamoDB table with a hash key (userid) and a sort key (ages). Say we want to retrieve, for each hash key (userid), the item with the smallest age: what would the query and filter expression be?
Thanks!
I don't think you can do it in a single query. You would need to do a full table scan. If you have a list of hash keys somewhere, you can run N queries (in parallel) instead.
[Update] Here is another possible approach:
Maintain a second table that has just a hash key (userID). This table holds the record with the smallest age for each user. To achieve that, make sure that every time you update the main table you also update the second one if the new age is less than the current age stored there. You can use a conditional update for that. The update can be done either by the application itself or by an AWS Lambda function listening to the DynamoDB stream. If you then need the smallest age for each user, you still do a full scan of the second table, but that scan reads only relevant records, so it will be optimal.
There are two ways to achieve that:
If you don't need this data in real time, you can export your data into another AWS system, such as EMR or Redshift, and perform complex analytics queries there. That lets you write SQL expressions using joins and GROUP BY.
You can even run EMR Hive queries directly on DynamoDB data, but they perform scans, so it's not very cost-efficient.
Another option is to use DynamoDB Streams. You can maintain a separate table that stores:
Table: MinAges
UserId - primary key
MinAge - regular numeric attribute
On every insert, update, or delete in the original table, you can query the minimum age for the affected user and store it in the MinAges table.
Another option is to write something like this:
storeNewAge(userId, newAge)
def smallestAge = getSmallestAgeFor(userId)
storeSmallestAge(userId, smallestAge)
But since DynamoDB did not have native transaction support when this was written (the TransactWriteItems API came later), it's dangerous to run code like that, because you may end up with inconsistent data. You can use the DynamoDB transactions library, but those transactions are expensive. With streams, by contrast, you get consistent data at a very low price.
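The MinAges bookkeeping can be sketched as a single "keep the smaller value" step applied to each write event. The code below is an in-memory stand-in only: the map plays the role of the MinAges table, the class and method names are hypothetical, and a real stream handler would issue a conditional update instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MinAgeTracker {
    // In-memory stand-in for the MinAges table: userId -> smallest age seen so far.
    private final Map<String, Integer> minAges = new ConcurrentHashMap<>();

    // Mirrors a conditional update: store the age only if none exists or the new one is smaller.
    void onAgeWritten(String userId, int age) {
        minAges.merge(userId, age, Math::min);
    }

    // Returns the smallest age recorded for the user, or null if the user is unknown.
    Integer smallestAge(String userId) {
        return minAges.get(userId);
    }

    public static void main(String[] args) {
        MinAgeTracker tracker = new MinAgeTracker();
        tracker.onAgeWritten("user-1", 30);
        tracker.onAgeWritten("user-1", 25);
        tracker.onAgeWritten("user-1", 40);
        System.out.println(tracker.smallestAge("user-1"));
    }
}
```

This covers inserts and updates only. When the record holding the current minimum is deleted, the minimum has to be recomputed by querying the original table for that user.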
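You can do it using ScanIndexForward:
// Uses DynamoDBMapper from the AWS SDK for Java v1 (package com.amazonaws.services.dynamodbv2.datamodeling).
YourEntity requestEntity = new YourEntity();
requestEntity.setHashKey(hashkey);
DynamoDBQueryExpression<YourEntity> queryExpression = new DynamoDBQueryExpression<YourEntity>()
    .withHashKeyValues(requestEntity)
    .withConsistentRead(false);
queryExpression.setIndexName(IndexName); // only if you are querying an index
// Ascending sort-key order puts the smallest age first; false would return the largest.
queryExpression.setScanIndexForward(true);
queryExpression.setLimit(1); // read one item per page; take the first result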

Whats the best way to query DynamoDB based on date range?

As part of migrating from SQL to DynamoDB I am trying to create a DynamoDB table. The UI allows users to search based on four attributes: start date, end date, name of event, and source of event.
The table has six attributes; the four above are a subset, the others being priority and location. Every query must supply all four of those values. What's the best way to store the information in DynamoDB so that querying by start date and end date is fairly easy?
I thought of creating a GSI with start date as the hash key and end date as the range key, plus GSIs on the other two attributes?
In short:
My table in DynamoDB will have 6 attributes
EventName, Location, StartDate, EndDate, Priority and source.
Query will have 4 mandatory attributes
StartDate, EndDate, Source and Event Name.
Thanks for the help.
You can use greater than/less than comparison operators as part of your query http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
So you could try to build a table with schema:
(EventName (hashKey), "StartDate-EndDate" (sortKey), other attributes)
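In this case the sort key is a combination of the start and end dates, which lets you run range comparisons on it. DynamoDB orders strings by their UTF-8 bytes, so for ASCII text this is plain lexicographic ordering. A key condition accepts only one condition on the sort key, so a lower and an upper bound have to be written as BETWEEN. Assuming your sort key looks like "73644-75223", a condition such as BETWEEN "73000-" AND "74000-" returns that event. Note that the comparison is dominated by the start-date prefix, so the end date still has to be checked separately, for example with a filter expression.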
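Because the ordering is lexicographic, it is worth checking a pair of bounds against a sample key before relying on them. The sketch below does that in plain Java. The class and method names are hypothetical, the day numbers are the illustrative ones from the text, and between() only imitates an inclusive range condition on the sort key.

```java
import java.util.Locale;

final class CompositeSortKey {
    // Builds "start-end"; both parts are zero-padded so string order matches numeric order.
    static String of(int startDay, int endDay) {
        return String.format(Locale.ROOT, "%05d-%05d", startDay, endDay);
    }

    // Imitates "sortKey BETWEEN low AND high" (inclusive, lexicographic).
    static boolean between(String key, String low, String high) {
        return key.compareTo(low) >= 0 && key.compareTo(high) <= 0;
    }

    public static void main(String[] args) {
        String key = of(73644, 75223); // "73644-75223"
        // Bounds on the start-date prefix include the key.
        System.out.println(between(key, "73000-", "74000-"));      // true
        // An upper bound that begins with a smaller start date excludes it.
        System.out.println(between(key, "73000-", "73000-76000")); // false
    }
}
```

The second check shows that the part after the hyphen only matters when the start dates are equal, so a range on the composite key constrains the start date and not the end date.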
Additionally, you could create a GSI on your table for each of the remaining attributes that need to be read via query. You could then project into the index the data that you want the query to return. In contrast to an LSI, a query on a GSI cannot fetch attributes that are not projected into it. Be aware of the additional read/write costs of using GSIs (and LSIs), and of the additional storage required by projected data.
Hope it helps.

Resources