Boto3: querying DynamoDB with multiple sort key values

Is there any way of supplying multiple values for a DynamoDB table's Sort Key whilst doing a query in Boto3?
For a single SK value to search on, I'm doing this:
table.query(
IndexName="my_gsi",
KeyConditionExpression=Key('my_gsi_pk').eq({pk value}) & Key('my_gsi_sk').eq({sk value}),
FilterExpression={filter expression}
)
... which works.
However, my scenario involves searching on one of a couple of potential SK values, so I'd like to, in SQL terms, do something like this:
WHERE my_gsi_pk = {pk value}
AND my_gsi_sk IN ({sk value 1}, {sk value 2})
I've looked in the Boto3 documentation in the .query() section and concentrated upon the KeyConditionExpression syntax but can't identify whether this is possible or not.

The query API does not support the IN operator in the KeyConditionExpression.
Use the execute_statement API instead. This executes a PartiQL statement, which does accept the IN operator in query operations for the Partition and Sort keys:
sk = ["Foo", "Bar"]
res = client.execute_statement(
Statement=f'SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [{",".join(["?" for k in sk])}]',
Parameters= [{"S": "1"}] + [{"S": k} for k in sk]
)
This creates a PartiQL statement like SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [?,?] and substitution parameters like [{"S": "1"}, {"S": "Foo"}, {"S": "Bar"}].
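The statement and parameter construction above can be pulled into a small helper so the placeholder count always matches the number of values. This is only a sketch: the function name build_in_statement is made up for illustration, and it assumes both key attributes are strings (type "S").

```python
def build_in_statement(table_name, index_name, pk_attr, pk_value, sk_attr, sk_values):
    """Return (statement, parameters) for a PartiQL SELECT with an IN list on the sort key.

    Hypothetical helper; assumes string-typed partition and sort keys.
    """
    if not sk_values:
        raise ValueError("sk_values must contain at least one value")
    # One "?" placeholder per sort key value.
    placeholders = ",".join("?" for _ in sk_values)
    statement = (
        f'SELECT * FROM "{table_name}"."{index_name}" '
        f'WHERE "{pk_attr}" = ? AND "{sk_attr}" IN [{placeholders}]'
    )
    # Parameters are positional: the partition key first, then each sort key value.
    parameters = [{"S": pk_value}] + [{"S": value} for value in sk_values]
    return statement, parameters


statement, parameters = build_in_statement(
    "my_table", "my_gsi", "my_gsi_pk", "1", "my_gsi_sk", ["Foo", "Bar"]
)
```

The result would be passed on as client.execute_statement(Statement=statement, Parameters=parameters).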

Please note that the PartiQL statement may consume more read capacity (RCU) than an equivalent Query. You can compare the two by passing ReturnConsumedCapacity="TOTAL" to each call and inspecting ConsumedCapacity in the response.
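If the capacity cost is a concern, one alternative (not from the answers above, just a common workaround) is to stay with the Query API and issue one Query per sort key value, then concatenate the results. A minimal sketch, assuming table is a boto3 Table resource; the function name is hypothetical:

```python
def query_sort_key_values(table, index_name, pk_attr, pk_value, sk_attr, sk_values):
    """Run one Query per sort key value; return (items, consumed_capacity_units).

    `table` is assumed to be a boto3 Table resource (anything with a
    compatible .query(**kwargs) method works).
    """
    items = []
    consumed = 0.0
    for sk_value in sk_values:
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "#pk = :pk AND #sk = :sk",
            "ExpressionAttributeNames": {"#pk": pk_attr, "#sk": sk_attr},
            "ExpressionAttributeValues": {":pk": pk_value, ":sk": sk_value},
            "ReturnConsumedCapacity": "TOTAL",
        }
        while True:
            page = table.query(**kwargs)
            items.extend(page.get("Items", []))
            consumed += page.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
            # A Query returns at most 1 MB per call; follow LastEvaluatedKey to get the rest.
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            kwargs["ExclusiveStartKey"] = last_key
    return items, consumed
```

The returned capacity total makes it easy to compare this approach against the PartiQL call for your own data.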

Related

Dynamodb query with date range issue in python boto3

I have a DynamoDB table, I need to find the records which are between the given date range.
So here is my table structure
{
"Id":"String",
"Name":"String",
"CrawledAt":"String"
}
In this table the primary key is made up of the Id and CrawledAt fields. I have also created a local secondary index on the CrawledAt field, named "CrawledAt-index".
Most examples query using Id together with CrawledAt. In my case I don't know the Id; I only need to retrieve records for a particular date range.
Here is the code I have tried
request = {
"TableName": "sflnd00001-test",
"IndexName": "CrawledAt-index",
"ConsistentRead": False,
"ProjectionExpression": "Name",
"KeyConditionExpression":
"CrawledAt between :v_start and :v_end",
"ExpressionAttributeValues": {
":v_start": {"S": "2020-01-31T00:00:00.000Z"},
":v_end": {"S": "2025-11-31T00:00:00.000Z"} }
}
response = table.query(**request)
It's returning this error
"An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: BETWEEN, operand type: M",
Can someone please tell me how to find the data set within the given date range without providing the primary key?
You cannot use BETWEEN or any other function on a partition key; you must always provide the entire partition key value.
For your use case, create a GSI whose partition key is a single fixed value and whose sort key is CrawledAt.
{
"Id":"String",
"Name":"String",
"CrawledAt":"String",
"GsiPk": "Number"
}
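The key condition then becomes (with :v_pk bound to the fixed value 1 in ExpressionAttributeValues, since expressions do not accept literal values):
"KeyConditionExpression":
"GsiPk = :v_pk AND CrawledAt between :v_start and :v_end"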
This would then allow you to retrieve all the data in the table between two dates. Be aware of the caveat, though: using a single value for the GSI partition key is not scalable, and caps write throughput at approximately 1,000 WCU.
If you need more scale, you can assign a random number in the range 0 to n-1 to GsiPk to spread writes across more partitions; you would then need to make n queries to collect all the data.
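The sharded variant could look roughly like this in boto3. It is a sketch under assumptions: the shard count of 4, the index name passed in by the caller, and the function names are all made up for illustration, and table is assumed to be a boto3 Table resource.

```python
import random

N_SHARDS = 4  # hypothetical shard count; pick it based on your write volume


def pick_shard(n_shards=N_SHARDS):
    """Value to store in GsiPk when writing an item (random write sharding)."""
    return random.randrange(n_shards)


def query_all_shards(table, index_name, start, end, n_shards=N_SHARDS):
    """Query every shard for CrawledAt between start and end, then merge."""
    items = []
    for shard in range(n_shards):
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": "GsiPk = :pk AND CrawledAt BETWEEN :v_start AND :v_end",
            "ExpressionAttributeValues": {":pk": shard, ":v_start": start, ":v_end": end},
        }
        while True:
            page = table.query(**kwargs)
            items.extend(page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            kwargs["ExclusiveStartKey"] = last_key
    # Each shard comes back sorted, but the combined list must be re-sorted.
    # ISO-8601 timestamps in a uniform format sort correctly as plain strings.
    items.sort(key=lambda item: item["CrawledAt"])
    return items
```

The n queries are independent, so they could also be run concurrently if latency matters.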
Alternatively, you can Scan the table and use a FilterExpression, which is also not a scalable solution:
aws dynamodb scan \
--table-name MusicCollection \
--filter-expression "timestamp between :a and :b" \
--expression-attribute-names file://expression-attribute-names.json \
--expression-attribute-values file://expression-attribute-values.json
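For reference, a rough boto3 equivalent of the Scan approach is sketched below. The function name is hypothetical and table is assumed to be a boto3 Table resource. The attribute name goes through a #ts placeholder so that reserved words such as timestamp do not break the expression. Keep in mind that the filter is applied after items are read, so the Scan still consumes capacity for the whole table.

```python
def scan_between(table, attr_name, start, end):
    """Scan the whole table and keep items whose attr_name lies in [start, end]."""
    kwargs = {
        "FilterExpression": "#ts BETWEEN :a AND :b",
        "ExpressionAttributeNames": {"#ts": attr_name},
        "ExpressionAttributeValues": {":a": start, ":b": end},
    }
    items = []
    while True:
        page = table.scan(**kwargs)
        # A page may be empty even when more data follows, because the
        # filter runs after each chunk has been read.
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
    return items
```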

PartiQL BatchExecuteStatementCommand in DynamoDB SDK v3

I'm attempting to send a batch of PartiQL statements in the NodeJS AWS SDK v3. The statement works fine for a single ExecuteStatementCommand, but the Batch command doesn't.
The statement looks like
const statement = `
SELECT *
FROM "my-table"
WHERE "partitionKey" = '1234'
AND "filterKey" = '5678'
`
This code snippet works as expected:
const result = await dynamodbClient.send(new ExecuteStatementCommand(
{ Statement: statement}
))
The batch snippet does not:
const result = await dynamodbClient.send(new BatchExecuteStatementCommand({
Statements: [
{
Statement: statement
}
]
}))
The batch call produces the following error:
"Code": "ValidationError",
"Message": "Select statements within BatchExecuteStatement must specify the primary key in the where clause."
Any insight is greatly appreciated. Thanks for taking the time to read my question!
Seems like what I needed was a rubber duck.
DynamoDB primary keys consist of a partition key plus a sort key. My particular table has a sort key, which is missing from the statement. Batch jobs cannot handle filtering of responses, and each statement must match a single item in the database.
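To illustrate the point in boto3 terms (Python rather than the Node SDK used in the question), each statement in a batch needs equality conditions on both key attributes. The sketch below assumes string keys, a hypothetical sort key attribute named sortKey, and a made-up helper name. It also splits the statements into groups of 25, which is the documented maximum per BatchExecuteStatement call as far as I know.

```python
MAX_BATCH_SIZE = 25  # assumed per-call limit for BatchExecuteStatement


def build_batch_statements(table_name, pk_attr, sk_attr, keys):
    """Build batches of single-item SELECT statements from (pk, sk) pairs.

    Every statement names the full primary key, so it can match at most one item.
    """
    statements = [
        {
            "Statement": f'SELECT * FROM "{table_name}" WHERE "{pk_attr}" = ? AND "{sk_attr}" = ?',
            "Parameters": [{"S": pk}, {"S": sk}],
        }
        for pk, sk in keys
    ]
    # Split into chunks that fit into one call each.
    return [
        statements[i:i + MAX_BATCH_SIZE]
        for i in range(0, len(statements), MAX_BATCH_SIZE)
    ]


batches = build_batch_statements(
    "my-table", "partitionKey", "sortKey", [("1234", "a"), ("1234", "b")]
)
```

Each batch would then be sent with client.batch_execute_statement(Statements=batch) on a low-level boto3 DynamoDB client.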

Query DynamoDB with multiple begins_with clause in AppSync

I'm currently trying to create a dynamic query using AppSync and Apache Velocity Template Language (VTL).
I want to evaluate a series of begins_with conditions combined with "OR"
Such as:
{
"operation": "Query",
"query": {
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1)",
"expressionValues": {
":pk": { "S": "tenant:${context.args.tenantId}",
":sk": {"S": "my-sort-key-${context.args.evidenceId[0]}"},
":sk1": {"S": "my-sort-key-${context.args.evidenceId[1]}"}
}
}
But that isn't working. I've also tried using | instead of or but it hasn't worked either. I get:
Invalid KeyConditionExpression: Syntax error; token: "|", near: ") | begins_with" (Service: AmazonDynamoDBv2;
How can I achieve this using VTL?
Original answer
You're missing a closing parenthesis after begins_with(sk, :sk1). That is, the expression line should be:
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1))"
I just ran the fixed expression and it worked as expected.
Revised
Actually, there are subtleties.
The or operator can be used in a filter expression but not in a key condition expression. For instance, a = :v1 and (b = :v2 or b = :v3) will work as long as a and b are "regular" attributes. If a and b are the table's primary key (partition key, sort key), then DynamoDB will reject the query.
Reading this answer, it seems that this isn't possible, as DynamoDB only accepts a single sort key value and a single operation.
There's also no "OR" condition in the operation:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-KeyConditionExpression
If you also want to provide a condition for the sort key, it must be combined using AND with the condition for the sort key. Following is an example, using the = comparison operator for the sort key:
I am going to be restructuring the access pattern to better match my request.
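If restructuring is not an option, a common workaround is to run one Query per prefix and merge the results on the caller's side. The sketch below shows the idea in Python with a boto3 Table resource rather than VTL, so it would have to live in something like a Lambda function. The function name is hypothetical; the pk and sk attribute names are taken from the question.

```python
def query_by_prefixes(table, pk_value, prefixes):
    """Run one begins_with Query per prefix and merge without duplicates."""
    seen = set()
    items = []
    for prefix in prefixes:
        kwargs = {
            "KeyConditionExpression": "pk = :pk AND begins_with(sk, :sk)",
            "ExpressionAttributeValues": {":pk": pk_value, ":sk": prefix},
        }
        while True:
            page = table.query(**kwargs)
            for item in page.get("Items", []):
                # Overlapping prefixes can return the same item twice.
                key = (item["pk"], item["sk"])
                if key not in seen:
                    seen.add(key)
                    items.append(item)
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            kwargs["ExclusiveStartKey"] = last_key
    return items
```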

What's the equivalent DynamoDB solution for this MySQL Query?

I'm familiar with MySQL and am starting to use Amazon DynamoDB for a new project.
Assume I have a MySQL table like this:
CREATE TABLE foo (
id CHAR(64) NOT NULL,
scheduledDelivery DATETIME NOT NULL,
-- ...other columns...
PRIMARY KEY(id),
INDEX schedIndex (scheduledDelivery)
);
Note the secondary Index schedIndex which is supposed to speed-up the following query (which is executed periodically):
SELECT *
FROM foo
WHERE scheduledDelivery <= NOW()
ORDER BY scheduledDelivery ASC
LIMIT 100;
That is: Take the 100 oldest items that are due to be delivered.
With DynamoDB I can use the id column as primary partition key.
However, I don't understand how I can avoid full-table scans in DynamoDB. When adding a secondary index I must always specify a "partition key". However, (in MySQL words) I see these problems:
the scheduledDelivery column is not unique, so it can't be used as a partition key itself AFAIK
adding id as unique partition key and using scheduledDelivery as "sort key" sounds like a (id, scheduledDelivery) secondary index to me, which makes that index practically useless
I understand that MySQL and DynamoDB require different approaches, so what would be an appropriate solution in this case?
It's not possible to avoid a full table scan with this kind of query.
However, you may be able to disguise it as a Query operation, which would allow you to sort the results (not possible with a Scan).
You must first create a GSI. Let's name it scheduled_delivery-index.
We will specify our index's partition key to be an attribute named fixed_val, and our sort key to be scheduled_delivery.
fixed_val will contain any value you want, but it must always be that value, and you must know it from the client side. For the sake of this example, let's say that fixed_val will always be 1.
GSI keys do not have to be unique, so don't worry if there are two duplicated scheduled_delivery values.
You would query the table like this:
var now = Date.now();
//...
{
TableName: "foo",
IndexName: "scheduled_delivery-index",
ExpressionAttributeNames: {
"#f": "fixed_val",
"#d": "scheduled_delivery"
},
ExpressionAttributeValues: {
":f": 1,
":d": now
},
KeyConditionExpression: "#f = :f and #d <= :d",
ScanIndexForward: true
}
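For comparison, the same request could be written for a boto3 Table resource roughly as follows. This is a sketch: the function name is made up, it assumes scheduled_delivery is stored as a number of milliseconds since the epoch (matching Date.now() above), and it adds Limit to mirror the LIMIT 100 of the MySQL query.

```python
import time


def due_deliveries_request(now_ms=None, limit=100):
    """Build Table.query keyword arguments for the oldest due deliveries."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time in epoch milliseconds
    return {
        "IndexName": "scheduled_delivery-index",
        "KeyConditionExpression": "#f = :f AND #d <= :d",
        "ExpressionAttributeNames": {"#f": "fixed_val", "#d": "scheduled_delivery"},
        "ExpressionAttributeValues": {":f": 1, ":d": now_ms},
        "ScanIndexForward": True,  # ascending by sort key, i.e. oldest first
        "Limit": limit,
    }
```

It would be used as table.query(**due_deliveries_request()).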

Query DynamoDB with a hash key and a range key with Boto3

I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,
aws_secret_access_key=AWS_PASS,
region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table=dynamodb.Table(TABLE_NAME)
request = {
'ExpressionAttributeNames': {
'#n0': 'hash_key',
'#n1': 'range_key'
},
'ExpressionAttributeValues': {
':v0': {'S': MY_HASH_KEY},
':v1': {'N': GT_RANGE_KEY}
},
'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
'TableName': TABLE_NAME
}
response = table.query(**request)
When I run this against a table with the following scheme:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding the type M would be a map or dictionary type and I am using a type N which is a number type and matches my table scheme for the range key. If someone could explain why this error is happening or I am also open to a different way of accomplishing the same query even if you cannot explain why this error exists.
The Boto 3 SDK constructs a condition expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
from boto3.dynamodb.conditions import Key
response = table.query(
KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').eq(range_key_value)
)
Reference: Step 4: Query and Scan the Data
Hope it helps
Adding this solution as the accepted answer did not address why the query used did not work.
TL;DR: Calling query on a boto3 Table resource differs subtly from calling client.query(...) and requires a different syntax.
The syntax in the question is valid for a query on a client, but not on a Table. With a Table, ExpressionAttributeValues takes plain Python values, so you must not wrap them in data type descriptors such as {'S': ...}. The wrapped dict is itself treated as a map value, which is why the error reports operand type M. Also, if you are executing a query on a Table resource, you do not have to specify the TableName again.
Working solution:
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,aws_secret_access_key=AWS_PASS,region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)
request = {
'ExpressionAttributeNames': {
'#n0': 'hash_key',
'#n1': 'range_key'
},
'ExpressionAttributeValues': {
':v0': MY_HASH_KEY,
':v1': GT_RANGE_KEY
},
'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
}
response = table.query(**request)
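To make the difference concrete, here is a small sketch of the wrapping that the low-level client expects and the Table resource does for you. The helper names are hypothetical and only strings, numbers and booleans are handled; boto3 ships its own TypeSerializer in boto3.dynamodb.types for the full set of types. Note that numbers are sent as strings inside the "N" descriptor.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in the typed form used by client.query."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # DynamoDB numbers travel as strings
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


def to_client_values(resource_values):
    """Convert Table-style ExpressionAttributeValues to client-style ones."""
    return {name: to_attribute_value(value) for name, value in resource_values.items()}


client_values = to_client_values({":v0": "my-hash-key", ":v1": 5})
```

Going the other way, passing the already wrapped form to a Table makes the wrapper dict itself the value, which matches the M in the error message.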
I am the author of a package called botoful which might be useful to avoid dealing with these complexities. The code using botoful will be as follows:
import boto3
from botoful import Query
client = boto3.Session(
aws_access_key_id=AWS_KEY,
aws_secret_access_key=AWS_PASS,
region_name=DYNAMODB_REGION
).client('dynamodb')
results = (
Query(TABLE_NAME)
.key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
.execute(client)
)
print(results.items)
