In MongoDB we can do something like the following to select the first or last count elements of an array.
The document looks like this:
{
  id: 123,
  aliases: [
    {name: "john"},
    {name: "alpha"},
    {name: "tom"},
    {name: "alpha"}
  ]
}
You can query in mongo and also restrict the number of aliases you want to retrieve from the database as follows:
db.collection.find( { field: value }, { array: {$slice: count } } );
where,
count = 3
Is there any straightforward way to achieve the same result in DynamoDB?
There is no exact equivalent available in DynamoDB. In fact, there is no near equivalent either.
DynamoDB has a Limit parameter that caps the number of items evaluated by a Query or Scan. However, that is not equivalent to limiting the number of items in the result set.
Limit — (Integer)
The maximum number of items to evaluate (not necessarily the number of
matching items). If DynamoDB processes the number of items up to the
limit while processing the results, it stops the operation and returns
the matching values up to that point, and a key in LastEvaluatedKey to
apply in a subsequent operation, so that you can pick up where you
left off. Also, if the processed data set size exceeds 1 MB before
DynamoDB reaches this limit, it stops the operation and returns the
matching values up to the limit, and a key in LastEvaluatedKey to
apply in a subsequent operation to continue the operation. For more
information, see Query and Scan in the Amazon DynamoDB Developer
Guide.
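Given that answer, one fallback is to read the item and trim the list in application code. The following is only a minimal sketch, assuming the item has already been fetched and deserialized into a plain dict; `slice_list` is a hypothetical helper, not a DynamoDB API.

```python
def slice_list(values, count):
    """Mimic MongoDB's $slice: the first `count` elements if positive, the last if negative."""
    if count >= 0:
        return values[:count]
    return values[count:]


# The example document from the question, as a plain Python dict.
item = {
    "id": 123,
    "aliases": [
        {"name": "john"},
        {"name": "alpha"},
        {"name": "tom"},
        {"name": "alpha"},
    ],
}

first_three = slice_list(item["aliases"], 3)
last_two = slice_list(item["aliases"], -2)
```

Note that this trims the list only after the whole item has been read, so it does not reduce the read cost.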
I have a question about Limit on queries/scans in DynamoDB.
My table has 1000 records, and a query over all of them returns 50 values. If I set a Limit of 5, that doesn't mean the query will return the first 5 matching values; it only means that 5 items are evaluated (in any order, so they could be very old items or new ones), so it's possible to get 0 items back. How can I actually get the latest 5 items of a query? I need to set a Limit of 5 (the numbers are examples) because it would be too expensive to query/scan more items than that.
The query has this input:
{
TableName: 'transactionsTable',
IndexName: 'transactionsByUserId',
ProjectionExpression: 'origin, receiver, #valid_status, createdAt, totalAmount',
KeyConditionExpression: 'userId = :userId',
ExpressionAttributeValues: {
':userId': 'user-id',
':payment_gateway': 'payment_gateway'
},
ExpressionAttributeNames: {
'#valid_status': 'status'
},
FilterExpression: '#valid_status = :payment_gateway',
Limit: 5
}
The index of my table is like this:
Should I use a second index or something similar to sort the items by the createdAt field? And if so, how can I be sure that the query will look at all the items?
If I set a Limit of 5, that doesn't mean the query will return the first 5 matching values; it only means that 5 items are evaluated (in any order, so they could be very old items or new ones), so it's possible to get 0 items back. How can I actually get the latest 5 items of a query?
You are correct in your observation, and unfortunately there is no Query option or any other operation that can guarantee 5 items in a single request. To understand why this is the case (it's not just laziness on Amazon's side), consider the following extreme case: you have a huge database with one billion items, but you run a very specific query that has just 5 matching items, and you make the request you wished for: "give me back 5 items". Such a request would need to read the entire database of a billion items before it could return anything, and the client would surely give up by then. So this is not how DynamoDB's Limit works. It limits the amount of work that DynamoDB needs to do before responding. So if Limit = 100, DynamoDB will read 100 items internally, which takes a bounded amount of time. But you are right that you have no idea whether it will respond with 100 items (if all of them matched the filter) or 0 items (if none of them matched the filter).
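Without changing the data model, the usual client-side workaround is to keep requesting pages until enough filtered items have arrived. The following is a minimal sketch, assuming `client` is any object with a boto3-style `query(**kwargs)` method whose response carries `Items` and, when more data remains, `LastEvaluatedKey`; `query_matching` is a hypothetical helper name.

```python
def query_matching(client, params, wanted):
    """Follow LastEvaluatedKey until `wanted` filtered items are collected."""
    items = []
    request = dict(params)  # copy so the caller's dict is not modified
    while len(items) < wanted:
        page = client.query(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # nothing left to evaluate
        request["ExclusiveStartKey"] = last_key
    return items[:wanted]
```

This bounds the work per request, not the total work: in the worst case described above, the loop still ends up reading every item under the key.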
So to do what you want efficiently, you'll need to think of a different way to model your data, i.e., how to organize the partition and sort keys. There are different ways to do it, each with its own benefits and downsides, and you'll need to weigh the options for yourself. Since you asked about a GSI, I'll give you some hints about how to use that option:
The pattern you are looking for is called filtered data retrieval. As you noted, if you create a GSI with createdAt as the sort key, you can retrieve the newest items first. But you still need to apply a filter, and you still don't know how to stop after 5 filtered results (rather than 5 pre-filtering results). The solution is to ask DynamoDB to put into the GSI, in the first place, only items which pass the filter. In your example, it seems you always use the same filter: "status = payment_gateway". DynamoDB doesn't have an option to run a generic filter function when building the GSI, but it has a different trick up its sleeve to achieve the same thing: any time you set "status = payment_gateway", also set another attribute "status_payment_gateway", and when status is set to something else, delete "status_payment_gateway". Now, create the GSI with "status_payment_gateway" as the partition key. DynamoDB will only put items in the GSI if they have this attribute, thereby achieving exactly the filtering you want.
You can also have multiple mutually-exclusive filtering criteria in one GSI by setting the partition key attribute to multiple different values, and you can then do a Query on each of these values separately (using KeyConditionExpression).
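The marker-attribute trick could be sketched as follows. This is an illustration under several assumptions that are not in the answer: the marker attribute stores the user id, the GSI uses the marker as partition key and createdAt as sort key, the index name `paymentGatewayByUser` is made up, and the dicts are meant for a boto3-style document interface (plain Python values).

```python
MARKER = "status_payment_gateway"  # attribute name taken from the answer


def status_update(key, user_id, new_status):
    """Build update_item arguments that keep the marker in sync with status."""
    # "status" is a reserved word, so both names go through placeholders.
    names = {"#s": "status", "#m": MARKER}
    if new_status == "payment_gateway":
        # The item should appear in the sparse GSI: set the marker.
        expression = "SET #s = :s, #m = :m"
        values = {":s": new_status, ":m": user_id}  # assumption: marker holds the user id
    else:
        # The item should drop out of the GSI: remove the marker.
        expression = "SET #s = :s REMOVE #m"
        values = {":s": new_status}
    return {
        "Key": key,
        "UpdateExpression": expression,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


def latest_query(user_id, count):
    """Build query arguments for the newest `count` items in the sparse GSI."""
    return {
        "IndexName": "paymentGatewayByUser",  # hypothetical index name
        "KeyConditionExpression": "#m = :u",
        "ExpressionAttributeNames": {"#m": MARKER},
        "ExpressionAttributeValues": {":u": user_id},
        "ScanIndexForward": False,  # newest createdAt first
        "Limit": count,  # no filter is needed, so Limit now counts results
    }
```

Because the index only contains items that already satisfy the condition, no FilterExpression is involved and Limit behaves the way the question expected.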
I have a DynamoDB table (id (pk), name (sk), email, date, itemId (number))
and a GSI on (itemId (pk), date (sk)).
I'm trying to query for an array of itemIds [1,2,3,4], but I get an error when using the IN operator in the KeyConditionExpression when calling
aws.DocClient.query:
const idsArray = [1, 2, 3, 4, 5];
const query = {
  IndexName: 'accountId-createdAt-index',
  // This is the expression that triggers the error
  KeyConditionExpression: 'itemId IN (:a1,:a2,:a3)',
  ExpressionAttributeValues: {
    ':a1': 1,
    ':a2': 2,
    // .......
  },
  ScanIndexForward: false,
};
Is it possible to query for multiple values on a GSI in DynamoDB?
You're trying to query for multiple different partition keys in a GSI. This can only be done with multiple individual queries (3 in the example). With a GSI it's also possible for multiple items to be returned for a single partition key lookup, so it's better to query each "itemId" partition key individually.
See the following for reference:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-KeyConditionExpression
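Running one query per partition key could look roughly like this. It is a sketch, assuming `client` is any object with a boto3-style `query(**kwargs)` method (for example a Table resource, which accepts plain Python values); `query_each` is a hypothetical helper name.

```python
def query_each(client, base_params, key_name, key_values):
    """Run one Query per partition key value and concatenate the results."""
    items = []
    for value in key_values:
        request = dict(base_params)
        # Note: these three entries replace any of the same name in base_params.
        request["KeyConditionExpression"] = "#pk = :pk"
        request["ExpressionAttributeNames"] = {"#pk": key_name}
        request["ExpressionAttributeValues"] = {":pk": value}
        while True:
            page = client.query(**request)
            items.extend(page.get("Items", []))
            if "LastEvaluatedKey" not in page:
                break  # this partition key is exhausted
            request["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return items
```

The results come back grouped by key in the order the keys were given; any ordering across keys has to be done by the caller.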
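It's not possible to use IN to combine multiple partition key values in a single Query, but it's possible to use BatchGetItem to fetch multiple items in one request, which DynamoDB resolves in parallel. This is close to the IN behaviour you want, with one caveat: BatchGetItem reads from the base table by full primary key and cannot target a GSI.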
The result will be a list of the elements in the table.
There are limits: a single BatchGetItem request can ask for at most 100 items and returns at most 16 MB of data.
Please check this document for details :
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html
Referring to this answer https://stackoverflow.com/a/70494101/7706503, you could try PartiQL to construct a similar statement for querying a GSI with multiple keys:
select * from table."gsi_index_name" where partition_key in [key1,key2]
You can then send the statement with the low-level API in one shot; for example, in .NET the method is called ExecuteStatementAsync.
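A statement like the one above can be built with placeholders instead of inlined values. The following is only a sketch that I have not run against a live table: `partiql_in_statement` is a hypothetical helper, the keys are assumed to be numbers, and the returned dict is shaped for a boto3-style `execute_statement(Statement=..., Parameters=...)` call.

```python
def partiql_in_statement(table, index, key_name, values):
    """Build a parameterized PartiQL SELECT with one placeholder per value."""
    if not values:
        raise ValueError("at least one key value is required")
    placeholders = ", ".join("?" for _ in values)
    statement = (
        f'SELECT * FROM "{table}"."{index}" '
        f'WHERE "{key_name}" IN [{placeholders}]'
    )
    # Assumption: numeric keys, encoded in the low-level AttributeValue format.
    parameters = [{"N": str(v)} for v in values]
    return {"Statement": statement, "Parameters": parameters}
```

For example, `partiql_in_statement("orders", "itemId-date-index", "itemId", [1, 2, 3])` produces a statement with three `?` placeholders; the table and index names here are made up.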
I have a DynamoDB table from which I fetch a single row in the following way:
private Table myTable;
myTable = dynamoDB.getTable(tableName);
myTable.getItem(new PrimaryKey(primaryKey, primaryKeyValue));
Is there a way for me to retrieve items with a list of primary keys? I see that I can use batchGetItem, but for that I would need to use the AmazonDynamoDB interface. Is there an alternative way using the Table class?
To get all items in your table you need to use the Scan operation:
The Scan operation returns one or more items and item attributes by
accessing every item in a table or a secondary index. To have DynamoDB
return fewer items, you can provide a FilterExpression operation.
If the total number of scanned items exceeds the maximum data set size
limit of 1 MB, the scan stops and results are returned to the user as
a LastEvaluatedKey value to continue the scan in a subsequent
operation. The results also include the number of items exceeding the
limit. A scan can result in no table data meeting the filter criteria.
By default it will return all fields, but you can provide a projection expression to get only some fields (ids in your case):
To read data from a table, you use operations such as GetItem, Query,
or Scan. DynamoDB returns all of the item attributes by default. To
get just some, rather than all of the attributes, use a projection
expression.
A projection expression is a string that identifies the attributes you
want. To retrieve a single attribute, specify its name. For multiple
attributes, the names must be comma-separated.
Keep in mind that scans are expensive, since you pay not for items that DynamoDB returns, but for items that DynamoDB reads in the database:
A Scan operation always scans the entire table or secondary index,
then filters out values to provide the desired result, essentially
adding the extra step of removing data from the result set. Avoid
using a Scan operation on a large table or index with a filter that
removes many results, if possible. Also, as a table or index grows,
the Scan operation slows. The Scan operation examines every item for
the requested values, and can use up the provisioned throughput for a
large table or index in a single operation. For faster response times,
design your tables and indexes so that your applications can use Query
instead of Scan. (For tables, you can also consider using the GetItem
and BatchGetItem APIs.).
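Put together, a paginated scan with a projection expression might look like this. It is a sketch, assuming `client` is any object with a boto3-style `scan(**kwargs)` method; `scan_attribute` is a hypothetical helper name.

```python
def scan_attribute(client, table_name, attribute):
    """Scan the whole table, returning only `attribute` from each item."""
    request = {
        "TableName": table_name,
        # A placeholder avoids clashes with DynamoDB reserved words.
        "ProjectionExpression": "#a",
        "ExpressionAttributeNames": {"#a": attribute},
    }
    items = []
    while True:
        page = client.scan(**request)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items  # the last page has been read
        request["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

The projection reduces what is sent back, not what is read, so the cost caveat quoted above still applies.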
Reasons for not having batch get item on the Table class:
The Table class is thread safe
The Table class implements the atomic operations on items, such as DeleteItemApi, GetItemApi, PutItemApi, QueryApi, ScanApi and UpdateItemApi
Batch get item needs to deal with multiple items
The most important point is that batch get item can get items from multiple tables
Example code to get items from multiple tables:
The code below gets items from the Movies and post tables
DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
TableKeysAndAttributes movieTableKeyAndAttributes = new TableKeysAndAttributes("Movies").withPrimaryKeys(new PrimaryKey("yearkey",1999 ,"title", "List test title"));
TableKeysAndAttributes postableKeyAndAttributes = new TableKeysAndAttributes("post").withPrimaryKeys(new PrimaryKey("postId", "14"));
BatchGetItemSpec batchGetItemSpec = new BatchGetItemSpec().withTableKeyAndAttributes(movieTableKeyAndAttributes,postableKeyAndAttributes);
BatchGetItemOutcome batchGetItemOutcome = dynamoDB.batchGetItem(batchGetItemSpec);
System.out.println(batchGetItemOutcome.getBatchGetItemResult().getResponses());
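For a longer list of primary keys, the same operation has to be split into requests of at most 100 keys, and any keys reported back as unprocessed have to be requested again. A rough Python sketch of that loop, assuming `client` is any object with a boto3-style `batch_get_item(RequestItems=...)` method and a single table; `batch_get` is a hypothetical helper name.

```python
def batch_get(client, table_name, keys, chunk_size=100):
    """Fetch items by primary key in chunks, retrying unprocessed keys."""
    items = []
    for start in range(0, len(keys), chunk_size):
        request_items = {table_name: {"Keys": keys[start:start + chunk_size]}}
        while request_items:
            response = client.batch_get_item(RequestItems=request_items)
            items.extend(response.get("Responses", {}).get(table_name, []))
            # Keys that were not processed this round; empty when the chunk is done.
            request_items = response.get("UnprocessedKeys") or {}
    return items
```

A production version would normally wait with a backoff before retrying unprocessed keys; that is left out here to keep the sketch short.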
I have to list all the records from a DynamoDB table, without any filter expression.
I want to limit the number of records, so I am using DynamoDBScanExpression with setLimit.
DynamoDBScanExpression scanExpression = new DynamoDBScanExpression();
....
// Set ExclusiveStartKey
....
scanExpression.setLimit(10);
However, the scan operation always returns more than 10 results.
Is this the expected behaviour, and if so, why?
Python Answer
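A scan does accept a Limit, but it caps the number of items evaluated per request (one page), not the total number of results; a client that automatically follows LastEvaluatedKey to fetch further pages will therefore return more than the limit. If you want the first or last N items in a defined order, it is possible to get that with a query.
A query reads the items that share one partition key, in sort-key order. It starts at the top or bottom of that range and finds items based on the key condition. You must have a partition key and a sort key to do this.
A scan, on the other hand, reads through the ENTIRE table and, as a result, is NOT ordered in any useful way.
Since a query is ordered and a scan is not, only a query with a Limit can give you the first or last N items.
To answer OP's question: the limit is being applied per page, and for an ordered "latest N" result you need a query, not a scan.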
Here is an example of how to use it with the client syntax. (This is the more advanced syntax. Sorry, I don't have a simpler example that uses the resource API; you can search for one.)
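def retrieve_latest_item(self):
    # Key attributes must be String, Number or Binary, so this assumes
    # works_night_shift is stored as the string "True" rather than a BOOL.
    result = self.dynamodb_client.query(
        TableName="cleaning_company_employees",
        KeyConditionExpression="works_night_shift = :value",
        ExpressionAttributeValues={":value": {"S": "True"}},
        ScanIndexForward=False,  # descending sort-key order
        Limit=3,  # evaluate at most 3 items
    )
    return result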
Here is the DynamoDB module docs
I am currently using DynamoDB and having a problem with scanning. I am able to get paged results in forward order by using the ExclusiveStartKey. However, regardless of whether I set ScanIndexForward to true or false, I get results in forward order from my scan operation. How can I get results in reverse order from a Scan in DynamoDB?
ScanIndexForward is the correct way to get items in descending order by the range key of the table or index you are querying. From the AWS API Reference:
A value that specifies ascending (true) or descending (false)
traversal of the index. DynamoDB returns results reflecting the
requested order determined by the range key. If the data type is
Number, the results are returned in numeric order. For type String,
the results are returned in order of ASCII character code values. For
type Binary, DynamoDB treats each byte of the binary data as unsigned
when it compares binary values.
Based on the docs for Scan, I conclude that there is no way to Scan in reverse. However, I would say that you are not using DynamoDB correctly if you need to do that. When designing a schema for a database like DynamoDB, you should plan the schema based on your expected queries to ensure that almost all application queries have a good index. Scans are meant more for sysadmin operations or for feeding into MapReduce or analytics. "A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set." (Query and Scan Performance) That can lead to performance problems and other issues.
Using DynamoDB is fundamentally different from working with a traditional relational database and requires a big change in the way you think about using it. You need to decide whether DynamoDB's advantages of scalability in storage and performance, reliability, and availability are worth accepting its limitations.
As of now, a DynamoDB scan cannot return sorted results.
You need to use a query with a new global secondary index (GSI) with a hashkey and range field. The trick is to use a hashkey which is assigned the same value for all data in your table.
I recommend making a new field for all data and calling it "Status" and set the value to "OK", or something similar.
Then your query to get all the results sorted would look like this:
{
TableName: "YourTable",
IndexName: "Status-YourRange-index",
KeyConditions: {
Status: {
ComparisonOperator: "EQ",
AttributeValueList: [
"OK"
]
}
},
ScanIndexForward: false
}
The docs for how to write GSI queries are found here: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html#GSI.Querying
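The request above uses the older KeyConditions parameter. The same query can be written with KeyConditionExpression; the sketch below builds the arguments for a boto3-style low-level `query` call. The table and index names are the placeholders from the answer, `sorted_all_query` is a hypothetical helper, and "Status" goes through a name placeholder because it is a DynamoDB reserved word.

```python
def sorted_all_query(table_name, index_name, descending=True):
    """Build query arguments that read every item via the constant-key GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#s = :ok",
        "ExpressionAttributeNames": {"#s": "Status"},
        "ExpressionAttributeValues": {":ok": {"S": "OK"}},
        # False walks the range key in descending order.
        "ScanIndexForward": not descending,
    }


params = sorted_all_query("YourTable", "Status-YourRange-index")
```

Results are still paginated, so a caller would keep passing LastEvaluatedKey back as ExclusiveStartKey to read beyond the first page.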