I have a table that stores one article per day.
I set date as the partition key and post_id as the sort key. I want a query that returns only the latest row by date. Is that possible with Query, or do I have to use Scan and filter the results?
First, DynamoDB doesn't have aggregate functions such as MIN and MAX, as an RDBMS does. However, it does have a feature that can return the latest date from a table. To use it, the attribute must be defined as the sort key. Also, the latest date can only be found within a specific partition key, i.e. it does not apply to the whole table, just to the chosen partition key.
ScanIndexForward can be used in a Query to get the latest item for a specific partition key: set it to false and limit the result to one item.
ScanIndexForward (Boolean): Specifies the order for index traversal: if true (default), the traversal is performed in ascending order; if false, the traversal is performed in descending order.
To use this option, the table design has to be changed:
post_id - partition key
date - sort key
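A minimal sketch of the Query request, assuming the redesigned table above (post_id as partition key, date as sort key stored as an ISO 8601 string such as "2022-01-20", which sorts chronologically as a string) and the AWS SDK for JavaScript v2 DocumentClient. The table name "Articles" and the helper name are hypothetical.

```javascript
// Build DocumentClient Query parameters that return only the newest item
// for one post_id. Assumes partition key "post_id" and sort key "date"
// holding ISO 8601 strings such as "2022-01-20".
function buildLatestItemQuery(tableName, postId) {
  return {
    TableName: tableName,
    KeyConditionExpression: "post_id = :pid", // exact partition key value
    ExpressionAttributeValues: { ":pid": postId },
    ScanIndexForward: false, // traverse the sort key (date) in descending order
    Limit: 1,                // stop after the first, i.e. newest, item
  };
}

// Usage with the v2 SDK (needs AWS credentials, so shown as comments):
// const AWS = require("aws-sdk");
// const docClient = new AWS.DynamoDB.DocumentClient();
// const { Items } = await docClient.query(buildLatestItemQuery("Articles", "post-123")).promise();
// const latest = Items[0]; // undefined if this post_id has no items
```

Limit caps how many items DynamoDB evaluates, so this behaves as "return the newest item" only when no FilterExpression is present; with a filter, the single evaluated item may be filtered out and the result comes back empty.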
How to order the results of a DynamoDB Scan operation by an attribute
I need to order the results of a Scan operation by an attribute of the entity. The attribute value is a timestamp.
You cannot choose the order of items during a Scan.
If you want items ordered by timestamp across the table, you can see my previous answer to that question at How to query 100 first items ordered by sort key of a DynamoDB table?
We have a DynamoDB table Events with about 50 million records that look like this:
{
  "id": "1yp3Or0KrPUBIC",
  "event_time": 1632934672534,
  "attr1": 1,
  "attr2": 2,
  "attr3": 3,
  ...
  "attrN": N
}
The partition key is id and there is no sort key. Apart from id (globally unique) and event_time, which are both required, items can have a variable number of attributes.
This setup works fine for fetching by id, but now we'd like to efficiently query against event_time and pull ALL attributes for records that fall within that range (which could be a million or two items). The criterion would be equivalent to something like WHERE event_time BETWEEN 1632934671000 AND 1632934672000, for example.
Without changing any existing data or transforming it through an external process, is it possible to create a Global Secondary Index using event_time and projecting ALL attributes that could allow a range query? From my understanding of DynamoDB this isn't possible, but maybe there's another configuration I'm overlooking.
Thanks in advance.
(Edit: I rewrote the answer because the OP's comment clarified that the requirement is to query event_time ranges ignoring id. OP knows the table design is not ideal and is trying to make the best of a bad situation).
Is it possible to create a Global Secondary Index using event_time and projecting ALL attributes that could allow a range query?
Yes. You can add a Global Secondary Index to an existing table and choose which attributes to project. You cannot add an LSI to an existing table or change the table's primary key.
Without changing any existing data or transforming it through an external process?
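No. You will need to manipulate the attributes: a Query always needs an exact partition key value, and range conditions such as BETWEEN apply only to the sort key, so a GSI keyed on event_time alone cannot serve a range query. Although arbitrary range queries are not its strength, DynamoDB has a time series pattern that can be adapted to your query pattern.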
Let's say you mostly query a limited number of days at a time. You would add a GSI with the date (yyyy-mm-dd) as its PK (partition key). Items are made unique by an SK (sort key) that concatenates the timestamp with the id: event_time#id. PK and SK together form the index's composite primary key.
GSI1PK = yyyy-mm-dd (e.g. 2022-01-20)
GSI1SK = event_time#id (e.g. 1642709874551#1yp3Or0KrPUBIC)
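These GSI attributes have to be written onto every item, both by the normal write path and by a one-off backfill of existing data. A minimal sketch of deriving them, assuming event_time is epoch milliseconds, day boundaries are taken in UTC, and the attribute names GSI1PK/GSI1SK used in the query examples; the helper names are hypothetical.

```javascript
// Derive the GSI key attributes for one Events item.
// Assumes event_time is epoch milliseconds and days are cut at UTC midnight.
function gsiKeysFor(item) {
  // toISOString() is always UTC, e.g. "2022-01-20T20:17:54.551Z".
  const day = new Date(item.event_time).toISOString().slice(0, 10);
  return {
    GSI1PK: day,                             // yyyy-mm-dd
    GSI1SK: `${item.event_time}#${item.id}`, // event_time#id
  };
}

// Copy of the item with the GSI attributes added, i.e. what a (hypothetical)
// backfill job would write back with PutItem or UpdateItem.
function withGsiKeys(item) {
  return { ...item, ...gsiKeysFor(item) };
}
```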
Querying a single day needs one Query operation; a calendar-week range needs seven.
GSI1PK = "2022-01-20" AND GSI1SK > ""
To query a range within a day, add a BETWEEN condition on the SK:
GSI1PK = "2022-01-20" AND GSI1SK BETWEEN "1642709874" AND "16427098745"
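A sketch of turning an arbitrary [start, end] time range into one Query per day, under these assumptions: the index is named "GSI1" (hypothetical), days are UTC, and every event_time has 13 digits (true for millisecond timestamps from roughly 2001 to 2286), so string order matches numeric order. Because every sort key contains "#", using String(end + 1) as the inclusive upper bound keeps all ids at exactly end while excluding end + 1.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// One Query per UTC day covering event_time in [startMs, endMs] (inclusive)
// on the hypothetical index "GSI1" with GSI1PK = "yyyy-mm-dd" and
// GSI1SK = "<event_time>#<id>" (event_time always 13 digits).
function buildRangeQueries(tableName, startMs, endMs) {
  const queries = [];
  // Start at midnight UTC of the first day and step one day at a time.
  for (let dayStart = Math.floor(startMs / DAY_MS) * DAY_MS; dayStart <= endMs; dayStart += DAY_MS) {
    const day = new Date(dayStart).toISOString().slice(0, 10);
    // Clamp the requested range to this day.
    const lo = Math.max(startMs, dayStart);
    const hi = Math.min(endMs, dayStart + DAY_MS - 1);
    queries.push({
      TableName: tableName,
      IndexName: "GSI1",
      KeyConditionExpression: "GSI1PK = :day AND GSI1SK BETWEEN :lo AND :hi",
      ExpressionAttributeValues: {
        ":day": day,
        // "<lo>" sorts before every "<lo>#<id>" ...
        ":lo": String(lo),
        // ... and "<hi + 1>" sorts after every "<hi>#<id>" but before any
        // "<hi + 1>#<id>", so BETWEEN covers exactly event_time in [lo, hi].
        ":hi": String(hi + 1),
      },
    });
  }
  return queries;
}
```

Each Query still returns at most 1 MB per page, so a busy day must be paged with LastEvaluatedKey/ExclusiveStartKey; the per-day queries are independent and can run in parallel.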
You can create a global secondary index at any time, including on an existing table.
The following excerpt is from the Managing Global Secondary Indexes documentation, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.OnlineOps.html:
To add a global secondary index to an existing table, use the UpdateTable operation with the GlobalSecondaryIndexUpdates parameter.
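As a sketch of that UpdateTable call for the time series index discussed above, assuming an on-demand (PAY_PER_REQUEST) table, the hypothetical index name "GSI1", and string-typed GSI1PK/GSI1SK attributes, the request could look like this (AWS SDK for JavaScript v2 parameter shapes):

```javascript
// UpdateTable input that adds the hypothetical "GSI1" index, projecting ALL
// attributes. Assumes on-demand capacity; a provisioned table would also
// need a ProvisionedThroughput block inside Create.
function buildAddGsiParams(tableName) {
  return {
    TableName: tableName,
    // Every attribute used in the new index's key schema must be declared here.
    AttributeDefinitions: [
      { AttributeName: "GSI1PK", AttributeType: "S" },
      { AttributeName: "GSI1SK", AttributeType: "S" },
    ],
    GlobalSecondaryIndexUpdates: [
      {
        Create: {
          IndexName: "GSI1",
          KeySchema: [
            { AttributeName: "GSI1PK", KeyType: "HASH" },  // partition key
            { AttributeName: "GSI1SK", KeyType: "RANGE" }, // sort key
          ],
          Projection: { ProjectionType: "ALL" }, // copy every attribute into the index
        },
      },
    ],
  };
}

// With the v2 low-level client (needs credentials, so shown as comments):
// const AWS = require("aws-sdk");
// await new AWS.DynamoDB().updateTable(buildAddGsiParams("Events")).promise();
```

DynamoDB backfills the new index from existing items in the background, but only items that actually have both GSI1PK and GSI1SK appear in it, which is why the attributes themselves still have to be added to the existing data.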
I want to fetch the last item (row) of my DynamoDB table, but I can't find any resources on how to do it. My primary key is id, which holds incrementing numbers (1, 2, 3, ...), one per row.
This is my function.
async function readMessage(){
  const params = {
    TableName: table,
  };
  return dynamo.getItem(params).promise();
}
I am not sure what I should add to my params.
DynamoDB has two types of primary keys:
Partition key – A simple primary key, composed of one attribute known as the partition key.
Partition key and sort key – Referred to as a composite primary key, this type of key is composed of two attributes. The first attribute is the partition key, and the second attribute is the sort key.
When fetching an item by partition key, you need to specify the exact partition key. You cannot fetch the max/min partition key.
Instead, you may want to create a sort key with a timestamp (or the ID if it's a sequential number) and use the sort key to fetch the last item.
Check out the AWS docs on Choosing the Right Partition Key for more info.
The proper way to design a table in DynamoDB is based on its expected access patterns. If this is something you need, consider using this id as the sort key instead of the partition key (under a partition key value that all these items share), and then query the table in descending order with the number of returned items limited to 1.
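For the readMessage function in the question, a sketch of that approach, assuming a hypothetical redesign where every message carries a constant partition key attribute pk = "MESSAGE" and the incrementing id becomes a Number-typed (N) sort key, so that 10 sorts after 9. It keeps the question's low-level AWS SDK v2 client (the one with getItem), which is why values use the { S: ... } form; the client is passed in as an argument here.

```javascript
// Hypothetical redesign: each message item has a constant partition key
// pk = "MESSAGE" and the incrementing id as a Number (N) sort key.
// `dynamo` is assumed to be an AWS SDK v2 low-level client (new AWS.DynamoDB()).
function buildReadLatestParams(table) {
  return {
    TableName: table,
    KeyConditionExpression: "pk = :pk",
    ExpressionAttributeValues: { ":pk": { S: "MESSAGE" } }, // low-level typed value
    ScanIndexForward: false, // highest id first
    Limit: 1,                // only the last message
  };
}

async function readMessage(dynamo, table) {
  const { Items } = await dynamo.query(buildReadLatestParams(table)).promise();
  return Items[0]; // undefined when there are no messages
}
```

Putting every message under one partition key value concentrates all reads and writes on a single partition, which is fine for modest traffic but becomes a hot spot at high write rates.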
If instead you don't want to change the schema of your items, and you don't mind making at least two operations, you have two options, neither of them optimal:
If none of your items ever gets deleted, first count the items and use that count to know the id of the latest item written (note that a Scan with Select set to COUNT still reads the whole table).
Alternatively, you could keep a "special" record in your DynamoDB table that holds a count, which is incremented every time one of your other items is written. On retrieval, first read the value of this special record, then use it to retrieve the actual item.
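A sketch of the counter-record option, assuming a numeric partition key id, a hypothetical reserved item with id 0 acting as the counter, and a hypothetical attribute name latestId; parameters use the AWS SDK v2 DocumentClient. Updates to a single item are applied atomically, so concurrent writers each receive a distinct new value from the ADD.

```javascript
// Counter-record approach (AWS SDK v2 DocumentClient parameters).
// Assumptions (hypothetical): numeric partition key "id"; the item with
// id 0 is reserved as the counter; its attribute "latestId" holds the id
// of the most recently written item.

// On every write: atomically add 1 to the counter and get the new value
// back in response.Attributes.latestId; use it as the new item's id.
function buildIncrementCounterParams(table) {
  return {
    TableName: table,
    Key: { id: 0 },
    UpdateExpression: "ADD latestId :one", // creates latestId = 1 if missing
    ExpressionAttributeValues: { ":one": 1 },
    ReturnValues: "UPDATED_NEW",
  };
}

// On read: fetch the counter first, then the item it points to.
async function readLatest(docClient, table) {
  const { Item: counter } = await docClient
    .get({ TableName: table, Key: { id: 0 }, ConsistentRead: true })
    .promise();
  if (!counter) return undefined; // nothing has been written yet
  const { Item } = await docClient
    .get({ TableName: table, Key: { id: counter.latestId } })
    .promise();
  return Item; // undefined if that item's write never completed
}
```

The increment and the item write are separate requests, so if the item write fails after the increment, the counter points at an id that does not exist; readers have to tolerate that case.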
The combination of the partition key and sort key makes up the primary key of an item in DynamoDB, so the combination must be unique; otherwise, writing an item with the same key overwrites the existing one.
In almost all my use cases, I choose an object attribute such as a brand, an email, or a class as the partition key, and a timestamp as the sort key. This way you always know the partition key, which you need to retrieve the values, and you can then query DynamoDB with conditions on the sort key. For more extensive examples using Python, see the AWS page https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.04.html, which shows how to query your DynamoDB items.
There are also other ways to define the keys in DynamoDB; for those, I suggest checking https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
I have added an index to my DynamoDB table to order the results, but it doesn't appear to do anything. In the DynamoDB dashboard it shows a size of 0 and an item count of 0.
There are several hundred items in the table, and they all have an id (the primary key) and a created value. I didn't set a range (sort) key when I created the table. The items as shown in the DynamoDB console are in the correct order, but the response via AppSync is not.
I have added the index to the query that returns all the items, but it does not seem to do anything; the order of the items is the same with or without the index:
{
  "version" : "2017-02-28",
  "operation" : "Scan",
  "index" : "id-created-index",
  "limit": $util.defaultIfNull(${ctx.args.limit}, 20),
  "nextToken": $util.toJson($util.defaultIfNullOrBlank($ctx.args.nextToken, null))
}
What am I missing? Has the index not been built or is there something else I need to do to use it in a query?
Update:
The index now shows the correct item count, although it is still not ordering the results.
Your base table has a partition key of id and no sort key. By definition this means each item in your table has a unique id.
Your GSI has a partition key of id and a sort key of created. Data is sorted by the created attribute within each partition key. As each of your ids is unique, the sort key is basically not doing anything.
Scan operations against a table or index return results in no particular order. To get sorted results from DynamoDB, you need to run a Query operation, where the partition (hash) key is fixed and results are sorted according to the sort key. However, since every item in your table and GSI has a unique id, each partition (id) never holds more than one record.
So yes, if you want results ordered by created, you need a fixed attribute on your table set as the partition key for your index. The caveat is that all your records in the index would then belong to a single partition, which would be a bottleneck. There are a few ways around this; one is to see if there's a different access pattern where you can keep a different attribute fixed to query against (i.e. owner_id). If the number of records is low enough, sorting on the client side is probably the best option.
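When the table is small enough to read in full, a sketch of the client-side option: page through a Scan with ExclusiveStartKey/LastEvaluatedKey and sort by created afterwards, since Scan itself offers no ordering. It assumes the AWS SDK v2 DocumentClient and that created holds ISO 8601 strings or epoch numbers, both of which compare correctly with < and >; the function name is hypothetical.

```javascript
// Read all items with a paginated Scan, then sort them by "created".
// Scan has no ordering option, so sorting can only happen client-side.
async function scanAllSortedByCreated(docClient, table) {
  const items = [];
  let startKey; // undefined on the first request
  do {
    const params = { TableName: table };
    if (startKey) params.ExclusiveStartKey = startKey; // resume after the previous page
    const page = await docClient.scan(params).promise();
    items.push(...page.Items);
    startKey = page.LastEvaluatedKey; // absent once the last page is reached
  } while (startKey);
  // Works for ISO 8601 strings or epoch numbers, which compare correctly with < and >.
  return items.sort((a, b) => (a.created < b.created ? -1 : a.created > b.created ? 1 : 0));
}
```

Every call reads (and pays for) the whole table, which is why this only makes sense while the record count stays low.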
I am studying DynamoDB and am confused about how results are ordered.
a. Can I use multiple conditions in the KeyConditions field of the Query command to perform an 'AND' query, i.e. set conditions on all of the following keys:
the hash part of the primary key,
the range part of the primary key,
local secondary index 1?
b. If that works, how would DynamoDB sort the results?
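DynamoDB can use only one index per query, so you can't put key conditions on both the table's range key AND a local secondary index's range key. A key condition covers only the partition (hash) key plus at most one sort key: the table's, or that of the index named in the request.
The results are sorted by the sort key of the index actually used.
Any other conditions act as filters: they remove items from the results after the key condition is applied, and they are not limited to indexed attributes.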
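To make this concrete, a sketch of a Query against a local secondary index, assuming a hypothetical table with partition key user_id, sort key created, and an LSI named "score-index" whose sort key is score. It uses KeyConditionExpression, the newer equivalent of the legacy KeyConditions field: the key condition names the partition key and one sort key (the index's), and the extra condition goes into FilterExpression on a non-key attribute, category (also hypothetical).

```javascript
// Hypothetical table: partition key "user_id", sort key "created",
// LSI "score-index" with sort key "score". Parameters are DocumentClient-style.
function buildLsiQueryWithFilter(table, userId, minScore, category) {
  return {
    TableName: table,
    IndexName: "score-index", // results are ordered by this index's sort key, "score"
    // Key condition: the partition key plus ONE sort key (the index's).
    KeyConditionExpression: "user_id = :u AND #score > :minScore",
    // Filter: applied after the key condition; may use non-key attributes.
    FilterExpression: "#cat = :cat",
    // Aliases avoid clashes with DynamoDB reserved words.
    ExpressionAttributeNames: { "#score": "score", "#cat": "category" },
    ExpressionAttributeValues: { ":u": userId, ":minScore": minScore, ":cat": category },
    ScanIndexForward: true, // ascending by score (the default)
  };
}
```

Results come back ordered by score, the sort key of the index used. A filter reduces what is returned, not the read capacity consumed, because items are filtered after they are read.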