How to order the results of a DynamoDB Scan operation by an attribute

I need to order the results of a Scan operation by an attribute of the entity. The attribute value is a timestamp.

You cannot choose the order of the items returned by a Scan.
If you want items ordered by timestamp across the table, see my previous answer to the question "How to query 100 first items ordered by sort key of a DynamoDB table?"

Related

dynamodb query to select all items that match a set of values

In a DynamoDB table I would like to query by selecting all items where an attribute's value matches one of a set of values. For example, my table has a current_status attribute, so I would like all items that have either a 'NEW' or an 'ASSIGNED' value.
If I apply a GSI to the current_status attribute it looks like I have to do this in two queries? Or instead do a scan?
DynamoDB does not recommend using Scan. Use it only when there is no other option and you have a fairly small amount of data.
You need to use a GSI here. Putting current_status in the PK of the GSI would result in a hot partition issue.
The right solution is to put a random number in the PK of the GSI, ranging from 0 to N-1, where N is the number of partitions (shards) you choose. Put the status in the SK of the GSI, along with a timestamp or some other unique information to keep each PK-SK pair unique. When you want to query based on current_status, execute N queries in parallel, one per PK value from 0 to N-1, each with an SK condition of begins_with current_status. Choose N based on the amount of data you have. If each query reads less than 4 KB in total, the parallel query operation consumes on the order of N read units without a hot partition issue. The links below provide the details:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-indexes-gsi-sharding.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
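The key layout described in this answer can be sketched without any AWS calls. The following is a minimal sketch, not the answerer's code: the class name, the "#" separator, the shard count of 4 and the use of an item id for uniqueness are all assumptions made for illustration. It only builds the GSI key values and the list of per-shard query conditions; running the queries themselves is left to whichever SDK you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: names, separator and shard count are illustrative assumptions.
final class ShardedStatusIndex {
    // N in the text above; tune it to the amount of data you have.
    static final int SHARDS = 4;

    // GSI partition key: a random shard number in 0..SHARDS-1, chosen when the item is written.
    static int randomShard() {
        return ThreadLocalRandom.current().nextInt(SHARDS);
    }

    // GSI sort key: status first, so a begins_with condition on "STATUS#" matches it,
    // followed by a timestamp and an item id to keep the key unique.
    static String sortKey(String status, long epochMillis, String itemId) {
        return status + "#" + epochMillis + "#" + itemId;
    }

    // One {partition key, sort-key prefix} pair per shard: the N queries to run in parallel.
    static List<String[]> fanOut(String status) {
        List<String[]> queries = new ArrayList<>();
        for (int shard = 0; shard < SHARDS; shard++) {
            queries.add(new String[] { Integer.toString(shard), status + "#" });
        }
        return queries;
    }
}
```

Including the separator in the prefix keeps a status such as NEW from also matching a longer status that happens to start with the same letters. To cover both 'NEW' and 'ASSIGNED', the fan-out would be built once per status.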

DynamoDB index shows Item Count = 0, is not being populated, and is not working in an AppSync query

I have added an index to my DynamoDB table in order to sort the results, but it doesn't appear to be doing anything. In the DynamoDB dashboard it shows with 0 size and 0 item count.
There are several hundred items in the table and they all have an id (the primary key) and a created value. I didn't set a range property when I created the table. The items in the picture below are in the correct order but the response via AppSync is not.
I have added the index to the query that returns all the items and it does not seem to do anything; the order of the items is the same with or without the index:
{
    "version" : "2017-02-28",
    "operation" : "Scan",
    "index" : "id-created-index",
    "limit": $util.defaultIfNull(${ctx.args.limit}, 20),
    "nextToken": $util.toJson($util.defaultIfNullOrBlank($ctx.args.nextToken, null))
}
What am I missing? Has the index not been built or is there something else I need to do to use it in a query?
Update:
The index now shows the correct item_count although it is still not ordering the results:
Your base table has a partition key of id and no sort key. By definition this means each item in your table has a unique id.
Your GSI has a partition key of id and a sort key of created. Data is sorted by the created attribute within each partition key. As each of your ids is unique, the sort key is basically not doing anything.
Scan operations against a table or index return results in no guaranteed order. To have results come back sorted from DynamoDB, you'll need to run a Query operation, where the partition/hash key is fixed and results are sorted according to the sort key. However, since your table and GSI always have unique ids, there are no additional records within a single partition (the id).
So yes, if you wanted results ordered by created, you'd need a fixed attribute on your table set as the partition key for your index. The caveat is that all your records in the index would then belong to a single partition, which would be a bottleneck. There are a few ways around this; one is to see whether there's a different access pattern where you can keep a different attribute fixed to query against (e.g. owner_id). If the number of records is low enough, sorting on the client side is probably the best option.
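For the small-table case, the client-side option can be sketched in plain Java. This is a minimal sketch under assumptions: each item is represented as a Map, and created is assumed to be stored as a number (for example epoch milliseconds); the class and method names are hypothetical. If created were an ISO-8601 string, a string comparator would be used instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sorts items already fetched by a Scan, on the client.
final class ClientSideSort {
    // Returns a new list ordered by the numeric "created" attribute, newest first.
    // Assumes every item has a numeric "created" value; the input list is not modified.
    static List<Map<String, Object>> newestFirst(List<Map<String, Object>> items) {
        List<Map<String, Object>> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingLong(
                (Map<String, Object> item) -> ((Number) item.get("created")).longValue())
                .reversed());
        return sorted;
    }
}
```

Note that sorting only makes sense once every page of the Scan has been collected, so this approach does not combine well with a paginated limit/nextToken response.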

DynamoDB get the latest row by date

I have a table that has an article a day and it looks like this.
I set date as the partition key and post_id as the sort key. I want to make a query that gets the single latest row by date. Is it possible to do this with Query, or do I have to use Scan and filter the results?
Firstly, DynamoDB doesn't have aggregate functions such as min and max like an RDBMS. However, it does have a feature for getting the latest date from a table. To use it, the attribute must be defined as the sort key. Also, the latest date can only be found within a specific partition key, i.e. it does not apply to the whole table, just the chosen partition key.
ScanIndexForward can be used to get the latest item for the specific partition key.
ScanIndexForward — (Boolean) Specifies the order for index traversal:
If true (default), the traversal is performed in ascending order; if
false, the traversal is performed in descending order.
To use this option, the table design has to change:
post_id - partition key
date - sort key
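To make the semantics concrete, here is a small in-memory model rather than a DynamoDB call. It treats one partition as a map ordered by sort key and reads it in descending order, taking only the first entry, which is what a Query with ScanIndexForward set to false and Limit set to 1 returns for a single partition key. The class name and the use of ISO-8601 date strings (which sort chronologically) are assumptions for illustration.

```java
import java.util.Map;
import java.util.NavigableMap;

// In-memory model only; it does not talk to DynamoDB.
final class LatestInPartition {
    // The map models the items of ONE partition key, ordered by sort key (the date).
    // Descending traversal plus "take the first entry" mirrors
    // ScanIndexForward=false with Limit=1.
    static String latest(NavigableMap<String, String> partition) {
        Map.Entry<String, String> newest = partition.descendingMap().firstEntry();
        return newest == null ? null : newest.getValue();
    }
}
```

The model also shows the limitation stated above: the "latest" item is only defined within the partition passed in, not across the whole table.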

getting results for a list of primary keys from dynamodb using table

I have a DynamoDB table from which I fetch a single row in the following way:
private Table myTable;
myTable = dynamoDB.getTable(tableName);
myTable.getItem(new PrimaryKey(primaryKey, primaryKeyValue));
Is there a way for me to retrieve items for a list of primary keys? I see that I can use batchGetItem, but for that I would need to use the AmazonDynamoDB interface. Is there an alternative way using the Table class?
To get all items in your table you need to use the Scan operation:
The Scan operation returns one or more items and item attributes by
accessing every item in a table or a secondary index. To have DynamoDB
return fewer items, you can provide a FilterExpression operation.
If the total number of scanned items exceeds the maximum data set size
limit of 1 MB, the scan stops and results are returned to the user as
a LastEvaluatedKey value to continue the scan in a subsequent
operation. The results also include the number of items exceeding the
limit. A scan can result in no table data meeting the filter criteria.
By default it will return all fields, but you can provide a projection expression to get only some fields (ids in your case):
To read data from a table, you use operations such as GetItem, Query,
or Scan. DynamoDB returns all of the item attributes by default. To
get just some, rather than all of the attributes, use a projection
expression.
A projection expression is a string that identifies the attributes you
want. To retrieve a single attribute, specify its name. For multiple
attributes, the names must be comma-separated.
Keep in mind that scans are expensive, since you pay not for items that DynamoDB returns, but for items that DynamoDB reads in the database:
A Scan operation always scans the entire table or secondary index,
then filters out values to provide the desired result, essentially
adding the extra step of removing data from the result set. Avoid
using a Scan operation on a large table or index with a filter that
removes many results, if possible. Also, as a table or index grows,
the Scan operation slows. The Scan operation examines every item for
the requested values, and can use up the provisioned throughput for a
large table or index in a single operation. For faster response times,
design your tables and indexes so that your applications can use Query
instead of Scan. (For tables, you can also consider using the GetItem
and BatchGetItem APIs.).
Reasons for not having batch get item on the Table class:
The Table class is thread safe.
The Table class implements the item operations of a single table, such as DeleteItemApi, GetItemApi, PutItemApi, QueryApi, ScanApi and UpdateItemApi.
Batch get item needs to deal with multiple items.
Most importantly, batch get item can get items from multiple tables.
Example code to get items from multiple tables:
The code below gets items from the Movies and post tables.
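import com.amazonaws.services.dynamodbv2.document.*;
import com.amazonaws.services.dynamodbv2.document.spec.BatchGetItemSpec;
// dynamoDBClient is an already configured AmazonDynamoDB client
DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
// Keys to read from the Movies table (partition key yearkey, sort key title)
TableKeysAndAttributes movieTableKeyAndAttributes = new TableKeysAndAttributes("Movies").withPrimaryKeys(new PrimaryKey("yearkey", 1999, "title", "List test title"));
// Keys to read from the post table (partition key postId)
TableKeysAndAttributes postTableKeyAndAttributes = new TableKeysAndAttributes("post").withPrimaryKeys(new PrimaryKey("postId", "14"));
// A single request that covers both tables
BatchGetItemSpec batchGetItemSpec = new BatchGetItemSpec().withTableKeyAndAttributes(movieTableKeyAndAttributes, postTableKeyAndAttributes);
BatchGetItemOutcome batchGetItemOutcome = dynamoDB.batchGetItem(batchGetItemSpec);
System.out.println(batchGetItemOutcome.getBatchGetItemResult().getResponses());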
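When the list of primary keys is long, keep in mind that a single BatchGetItem request accepts at most 100 keys, so the list has to be split before it is passed to the batch call. The following is a minimal sketch of that splitting step only; the class and method names are hypothetical, and building the TableKeysAndAttributes for each batch is left as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a key list into request-sized batches.
final class KeyBatches {
    // BatchGetItem accepts at most 100 keys per request.
    static final int MAX_KEYS_PER_BATCH = 100;

    // Returns consecutive sublists of at most MAX_KEYS_PER_BATCH keys, preserving order.
    static <K> List<List<K>> split(List<K> keys) {
        List<List<K>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += MAX_KEYS_PER_BATCH) {
            int end = Math.min(start + MAX_KEYS_PER_BATCH, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then be sent as its own batchGetItem call. A response can also report unprocessed keys, which need to be retried; that handling is not shown here.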

Does dynamodb support something like an "in" clause in its queries?

Say I have table of photos and users.
Given I have a list of users I'm following [user1,user2,...] and I want to get a list of photos of people I'm following.
How can I query the table of photos where photo.createdBy in [user1,user2,user3...]
I saw that dynamodb has a batch operation, but that takes a primary key, and in this case we would be querying against a secondary index (createdBy).
Is there a way to do a query like this in dynamodb?
If you are querying purely on photo.createdBy, then you should create a global secondary index:
To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to have any of the key attributes from the table; it doesn't even need to have the same key schema as a table.
A Query against this index matches a single createdBy value at a time, so a list of users needs one Query per user. To narrow the results a Query returns, use a FilterExpression:
With a Query or a Scan operation, you can provide an optional filter expression to refine the results returned to you. A filter expression lets you apply conditions to the data after it is queried or scanned, but before it is returned to you. Only the items that meet your conditions are returned.
This can be applied to a Query or a Scan, but be careful not to consume too many Read Capacity Units when scanning for matching entries, because a filter is applied only after the items have been read.
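Because a Query targets one partition key value, an "in"-style lookup over a list of users can be approximated by running one Query per user against the createdBy index and merging the results. Below is a minimal sketch of the merge step only. The queryByCreator function is a stand-in (an assumption, not an SDK call) for whatever code runs a single Query against the index; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: emulates "createdBy IN (...)" with one lookup per user.
final class InClauseEmulation {
    // queryByCreator stands in for one Query against a GSI keyed on createdBy.
    // Results are concatenated in the order of the users list.
    static <T> List<T> photosFor(List<String> users, Function<String, List<T>> queryByCreator) {
        List<T> merged = new ArrayList<>();
        for (String user : users) {
            merged.addAll(queryByCreator.apply(user));
        }
        return merged;
    }
}
```

The per-user queries are independent, so they could also be issued in parallel. If the combined list must be ordered, for example by creation time, it has to be sorted after merging.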

Resources