Querying dynamodb with begins_with - amazon-dynamodb

How do I query a DynamoDB table on both dataset_id and image_name? I am using the code below:
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table_name')
response = table.query(
IndexName='dataset_id',
KeyConditionExpression='dataset_id = :value AND begins_with (image_name, :name)',
ExpressionAttributeValues={
':value': str(dataset_id),
':name': {'S', 'a'}
},
Limit=int(results_per_page)
These are my DynamoDB GSIs:
[image: DynamoDB GSIs]
What am I doing wrong here?
I expect the DynamoDB response to contain images whose names start with 'a'.

First of all, I assume your GSI is named dataset_id, with dataset_id as its partition key and image_name as its sort key. If that assumption is false, this query cannot work as written.
Now to the issue I see: you are using the resource interface (boto3.resource), which takes native Python values rather than DynamoDB JSON, so your query should look like this:
response = table.query(
    IndexName='dataset_id',
    KeyConditionExpression='dataset_id = :value AND begins_with (image_name, :name)',
    ExpressionAttributeValues={
        ':value': str(dataset_id),
        ':name': 'a'
    },
    Limit=int(results_per_page))
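As a minimal sketch of the corrected call, assuming the GSI really is keyed on dataset_id (partition) and image_name (sort): the helper below only builds the keyword arguments, so the value format is easy to inspect. `build_prefix_query` is a hypothetical name, not part of boto3. Note that `{'S', 'a'}` in the question is a Python set (comma, not colon); the resource layer would serialize a set of strings as a string set (SS), which is not a valid operand for begins_with.

```python
def build_prefix_query(dataset_id, prefix, page_size):
    """Build kwargs for Table.query on the 'dataset_id' GSI (resource API)."""
    return {
        "IndexName": "dataset_id",
        "KeyConditionExpression": "dataset_id = :value AND begins_with(image_name, :name)",
        # The resource API takes plain Python values, not {'S': ...} wrappers.
        "ExpressionAttributeValues": {":value": str(dataset_id), ":name": prefix},
        "Limit": int(page_size),
    }


kwargs = build_prefix_query(42, "a", "20")
# With a real table (requires boto3 and AWS credentials):
#   response = boto3.resource("dynamodb").Table("table_name").query(**kwargs)
```

Keeping the argument construction separate from the call makes it possible to check the request shape without touching AWS.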

Related

Boto3: querying DynamoDB with multiple sort key values

Is there any way of supplying multiple values for a DynamoDB table's Sort Key whilst doing a query in Boto3?
For a single SK value to search on, I'm doing this:
table.query(
IndexName="my_gsi",
KeyConditionExpression=Key('my_gsi_pk').eq({pk value}) & Key('my_gsi_sk').eq({sk value}),
FilterExpression={filter expression}
)
... which works.
However, my scenario involves searching on one of a couple of potential SK values, so I'd like to, in SQL terms, do something like this:
WHERE my_gsi_pk = {pk value}
AND my_gsi_sk IN ({sk value 1}, {sk value 2})
I've looked in the Boto3 documentation in the .query() section and concentrated upon the KeyConditionExpression syntax but can't identify whether this is possible or not.
The query API does not support the IN operator in the KeyConditionExpression.
Use the execute_statement API instead. This executes a PartiQL statement, which does accept the IN operator in query operations for the Partition and Sort keys:
import boto3

client = boto3.client('dynamodb')
sk = ["Foo", "Bar"]
res = client.execute_statement(
    Statement=f'SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [{", ".join(["?" for k in sk])}]',
    Parameters=[{"S": "1"}] + [{"S": k} for k in sk]
)
This creates a PartiQL Statement like SELECT * FROM "my_table"."my_gsi" WHERE my_gsi_pk = ? AND my_gsi_sk IN [?, ?] and substitution Parameters like [{"S": "1"}, {"S": "Foo"}, {"S": "Bar"}].
Please note that the PartiQL statement can consume much more read capacity (RCUs) than the equivalent Query. You can check this by passing ReturnConsumedCapacity='TOTAL' and inspecting ConsumedCapacity in the response.
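The statement construction above can be wrapped in a small helper. This is a sketch only: `build_in_statement` is a hypothetical name, it assumes every key value is a string (type S), and it also requests the consumed capacity so the cost can be compared with a plain Query.

```python
def build_in_statement(table, index, pk_attr, sk_attr, pk_value, sk_values):
    """Build kwargs for client.execute_statement with an IN list on the sort key."""
    if not sk_values:
        raise ValueError("at least one sort key value is required")
    placeholders = ", ".join("?" for _ in sk_values)
    statement = (
        f'SELECT * FROM "{table}"."{index}" '
        f"WHERE {pk_attr} = ? AND {sk_attr} IN [{placeholders}]"
    )
    # Assumption: all key values are strings, hence the "S" type descriptor.
    parameters = [{"S": pk_value}] + [{"S": value} for value in sk_values]
    return {
        "Statement": statement,
        "Parameters": parameters,
        "ReturnConsumedCapacity": "TOTAL",
    }


request = build_in_statement(
    "my_table", "my_gsi", "my_gsi_pk", "my_gsi_sk", "1", ["Foo", "Bar"]
)
# With a real client (requires boto3 and AWS credentials):
#   res = boto3.client("dynamodb").execute_statement(**request)
#   print(res["ConsumedCapacity"])
```

The values are passed as parameters rather than interpolated into the statement; only the table, index, and attribute names are formatted into the string.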

boto3 resource for querying dynamodb : Query condition missed key schema element

I have a table, AdvgCountries, with two columns:
a. CountryId (String), partition key
b. CountryName (String), sort key
I created the table with only a partition key and later added a Global Secondary Index:
Index name: CountryName-index
Type: GSI
Partition key: CountryId
Sort key: CountryName
I am able to retrieve CountryName based upon CountryId but unable to retrieve CountryId based upon CountryName. Based upon my reading I found that there are options to do this by providing indexname but I get the following error:
botocore.exceptions.ClientError: An error occurred
(ValidationException) when calling the Query operation: Query
condition missed key schema element: CountryId
import boto3
import json
import os
from boto3.dynamodb.conditions import Key, Attr

def query_bycountryname(pCountryname, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb', endpoint_url="https://dynamodb.us-east-1.amazonaws.com")
    table = dynamodb.Table('AdvgCountires')
    print(f"table")
    attributes = table.query(
        IndexName="CountryName-index",
        KeyConditionExpression=Key('CountryName').eq(pCountryname),
    )
    if 'Items' in attributes and len(attributes['Items']) == 1:
        attributes = attributes['Items'][0]
    print(f"before return")
    return attributes

if __name__ == '__main__':
    CountryName = "India"
    print(f"Data for {CountryName}")
    countries = query_bycountryname(CountryName)
    for country in countries:
        print(country['CountryId'], ":", country['CountryName'])
Any help is appreciated.
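You cannot fetch the partition key value from the sort key alone; DynamoDB does not work like this. A Query must always include an equality condition on the partition key of the table or index being queried, and your CountryName-index still uses CountryId as its partition key. To query by CountryName, you would need an index whose partition key is CountryName.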
In Dynamodb, each item’s location is determined by the hash value of
its partition key.
The Query operation in Amazon DynamoDB finds items based on primary
key values.
KeyConditionExpression are used to write conditional statements by
using comparison operators that evaluate against a key and limit the
items returned. In other words, you can use special operators to
include, exclude, and match items by their sort key values.
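A minimal sketch of the lookup by CountryName, assuming a second GSI whose partition key is CountryName. The index name "CountryNameOnly-index" and the helper `find_country_ids` are hypothetical, and `FakeTable` is only a stand-in so the sketch runs without AWS access. The loop follows LastEvaluatedKey because a single Query call may return only part of the result.

```python
def find_country_ids(table, country_name, index_name="CountryNameOnly-index"):
    """Return every CountryId whose CountryName matches, following pagination."""
    kwargs = {
        "IndexName": index_name,  # hypothetical GSI with CountryName as partition key
        "KeyConditionExpression": "CountryName = :name",
        "ExpressionAttributeValues": {":name": country_name},
    }
    ids = []
    while True:
        page = table.query(**kwargs)
        ids.extend(item["CountryId"] for item in page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return ids
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that replays canned pages."""

    def __init__(self, pages):
        self._pages = list(pages)
        self.calls = []

    def query(self, **kwargs):
        self.calls.append(dict(kwargs))
        return self._pages.pop(0)


fake = FakeTable([
    {"Items": [{"CountryId": "IN", "CountryName": "India"}],
     "LastEvaluatedKey": {"CountryId": "IN"}},
    {"Items": []},
])
ids = find_country_ids(fake, "India")
# With a real table: find_country_ids(boto3.resource("dynamodb").Table("AdvgCountries"), "India")
```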

Query DynamoDB with multiple begins_with clause in AppSync

I'm currently trying to create a dynamic query using AppSync and Apache Velocity Template Language (VTL).
I want to evaluate a series of begins_with conditions combined with "OR"
Such as:
{
"operation": "Query",
"query": {
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1)",
"expressionValues": {
":pk": { "S": "tenant:${context.args.tenantId}",
":sk": {"S": "my-sort-key-${context.args.evidenceId[0]}"},
":sk1": {"S": "my-sort-key-${context.args.evidenceId[1]}"}
}
}
But that isn't working. I've also tried using | instead of or but it hasn't worked either. I get:
Invalid KeyConditionExpression: Syntax error; token: "|", near: ") | begins_with" (Service: AmazonDynamoDBv2;
How can I achieve this using VTL?
Original answer
You're missing a closing parenthesis after begins_with(sk, :sk1). That is, the third line should be:
"expression": "pk = :pk and (begins_with(sk,:sk) or begins_with(sk, :sk1))"
I just ran the fixed expression and it worked as expected.
Revised
Actually, there are subtleties.
The or operator can be used in a filter expression but not in a key condition expression. For instance, a = :v1 and (b = :v2 or b = :v3) will work as long as a and b are "regular" (non-key) attributes. If a and b are the table's primary key (partition key, sort key), then DDB will reject the query.
Judging by this answer, this isn't possible: DynamoDB accepts only a single sort key condition with a single operator.
There's also no "OR" condition in the operation:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html#DDB-Query-request-KeyConditionExpression
If you also want to provide a condition for the sort key, it must be combined using AND with the condition for the sort key. Following is an example, using the = comparison operator for the sort key:
I am going to be restructuring the access pattern to better match my request.
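Since a key condition cannot contain OR, one workaround is to issue one Query per prefix and merge the results in the caller, for example from a Lambda data source or a script rather than inside the VTL template. The sketch below assumes the attribute names pk and sk from the question; `query_any_prefix` is a hypothetical helper and `FakeTable` is an in-memory stand-in for a boto3 Table.

```python
def query_any_prefix(table, pk_value, prefixes):
    """Run one Query per sort-key prefix and merge the items without duplicates."""
    seen = {}
    for prefix in prefixes:
        kwargs = {
            "KeyConditionExpression": "pk = :pk AND begins_with(sk, :sk)",
            "ExpressionAttributeValues": {":pk": pk_value, ":sk": prefix},
        }
        while True:
            page = table.query(**kwargs)
            for item in page.get("Items", []):
                # Overlapping prefixes can return the same item twice.
                seen[(item["pk"], item["sk"])] = item
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return list(seen.values())


class FakeTable:
    """In-memory stand-in that mimics pk equality plus begins_with on sk."""

    def __init__(self, items):
        self._items = items

    def query(self, **kwargs):
        values = kwargs["ExpressionAttributeValues"]
        matches = [
            item for item in self._items
            if item["pk"] == values[":pk"] and item["sk"].startswith(values[":sk"])
        ]
        return {"Items": matches}


fake = FakeTable([
    {"pk": "tenant:1", "sk": "my-sort-key-1a"},
    {"pk": "tenant:1", "sk": "my-sort-key-2b"},
    {"pk": "tenant:1", "sk": "other"},
    {"pk": "tenant:2", "sk": "my-sort-key-1a"},
])
items = query_any_prefix(fake, "tenant:1", ["my-sort-key-1", "my-sort-key-2", "my-sort-key-"])
```

Each prefix costs a separate request, so this suits a handful of prefixes; for many prefixes, restructuring the key design is likely the better route.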

Delete large data with same partition key from DynamoDB

I have DynamoDB table structured like this
A   B     C     D
1   id1   foo   hi
1   id2   var   hello
A is the partition key and B is the sort key.
Let's say I only have the partition key and don't know the sort key, and I'd like to delete all entries that have the same partition key.
So I am thinking about loading entries by query in fixed-size pages (e.g. 1000) and deleting them in batches until no entries with that partition key are left in DynamoDB.
Is it possible to delete entries without loading them first?
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteItem.html
DeleteItem
Deletes a single item in a table by primary key.
For the primary key, you must provide all of the attributes. For
example, with a simple primary key, you only need to provide a value
for the partition key. For a composite primary key, you must provide
values for both the partition key and the sort key.
In order to delete an item you must provide the whole primary key (partition + sort key). So in your case you would need to query on the partition key, get all of the primary keys, then use those to delete each item. You can also use BatchWriteItem
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
BatchWriteItem
The BatchWriteItem operation puts or deletes multiple items in one or
more tables. A single call to BatchWriteItem can write up to 16 MB of
data, which can comprise as many as 25 put or delete requests.
Individual items to be written can be as large as 400 KB.
DeleteRequest - Perform a DeleteItem operation on the specified item. The item to be deleted is identified by a Key subelement: Key -
A map of primary key attribute values that uniquely identify the item.
Each entry in this map consists of an attribute name and an attribute
value. For each primary key, you must provide all of the key
attributes. For example, with a simple primary key, you only need to
provide a value for the partition key. For a composite primary key,
you must provide values for both the partition key and the sort key.
No, but you can Query all the items for the partition, and then issue an individual DeleteRequest for each item, which you can batch in multiple BatchWrite calls of up to 25 items.
JS code
async function deleteItems(tableName, partitionId) {
  const queryParams = {
    TableName: tableName,
    KeyConditionExpression: 'partitionId = :partitionId',
    ExpressionAttributeValues: { ':partitionId': partitionId },
  };
  const queryResults = await docClient.query(queryParams).promise()
  if (queryResults.Items && queryResults.Items.length > 0) {
    const batchCalls = chunks(queryResults.Items, 25).map(async (chunk) => {
      const deleteRequests = chunk.map(item => {
        return {
          DeleteRequest: {
            Key: {
              'partitionId': item.partitionId,
              'sortId': item.sortId,
            }
          }
        }
      })
      const batchWriteParams = {
        RequestItems: {
          [tableName]: deleteRequests
        }
      }
      await docClient.batchWrite(batchWriteParams).promise()
    })
    await Promise.all(batchCalls)
  }
}
// https://stackoverflow.com/a/37826698/3221253
function chunks(inputArray, perChunk) {
  return inputArray.reduce((all, one, i) => {
    const ch = Math.floor(i / perChunk);
    all[ch] = [].concat((all[ch] || []), one);
    return all
  }, [])
}
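For comparison, here is a rough Python sketch of the same query-then-delete approach using the boto3 resource API. It assumes the key names A (partition) and B (sort) from the question; `delete_partition` is a hypothetical helper, and the fake classes exist only so the sketch runs without AWS access. With a real Table, batch_writer() groups the deletes into BatchWriteItem calls and resends unprocessed items, and the loop follows LastEvaluatedKey so partitions larger than one Query page are fully covered.

```python
def delete_partition(table, partition_value, pk_name="A", sk_name="B"):
    """Delete every item sharing one partition key; return the number deleted."""
    kwargs = {
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": pk_name, "#sk": sk_name},
        "ExpressionAttributeValues": {":pk": partition_value},
        "ProjectionExpression": "#pk, #sk",  # only the key attributes are needed
    }
    deleted = 0
    with table.batch_writer() as batch:
        while True:
            page = table.query(**kwargs)
            for item in page.get("Items", []):
                batch.delete_item(Key={pk_name: item[pk_name], sk_name: item[sk_name]})
                deleted += 1
            if "LastEvaluatedKey" not in page:
                break
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return deleted


class FakeBatch:
    """Stand-in for the object returned by Table.batch_writer()."""

    def __init__(self, store):
        self._store = store

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    def delete_item(self, Key):
        self._store.pop((Key["A"], Key["B"]), None)


class FakeTable:
    """In-memory stand-in for a boto3 Table keyed on (A, B)."""

    def __init__(self, items):
        self.store = {(item["A"], item["B"]): item for item in items}

    def query(self, **kwargs):
        pk = kwargs["ExpressionAttributeValues"][":pk"]
        return {"Items": [item for item in self.store.values() if item["A"] == pk]}

    def batch_writer(self):
        return FakeBatch(self.store)


fake = FakeTable([
    {"A": "1", "B": "id1", "C": "foo", "D": "hi"},
    {"A": "1", "B": "id2", "C": "var", "D": "hello"},
    {"A": "2", "B": "id3", "C": "baz", "D": "hey"},
])
deleted = delete_partition(fake, "1")
```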
For production databases and critical Amazon DynamoDB tables, the recommendation is to use batch-write-item to purge large amounts of data.
batch-write-item (with DeleteRequest) can be 10 to 15 times faster than delete-item.
aws dynamodb scan --table-name "test_table_name" --projection-expression "primary_key, timestamp" --filter-expression "timestamp < :oldest_date" --expression-attribute-values '{":oldest_date":{"S":"2020-02-01"}}' --max-items 25 --total-segments "$TOTAL_SEGMENT" --segment "$SEGMENT_NUMBER" > $SCAN_OUTPUT_FILE
cat $SCAN_OUTPUT_FILE | jq -r ".Items[] | tojson" | awk '{ print "{\"DeleteRequest\": {\"Key\": " $0 " }}," }' | sed '$ s/.$//' | sed '1 i { "test_table_name": [' | sed '$ a ] }' > $INPUT_FILE
aws dynamodb batch-write-item --request-items file://$INPUT_FILE
For more information, see https://medium.com/analytics-vidhya/how-to-delete-huge-data-from-dynamodb-table-f3be586c011c
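You can use "begins_with" on the range key to narrow the Query that collects the keys to delete.
For example, in pseudocode (DynamoDB cannot delete by key prefix in a single call, so each matching item must still be deleted by its full key):
DELETE WHERE A = '1' AND B BEGINS_WITH 'id'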

Query DynamoDB with a hash key and a range key with Boto3

I am having trouble using AWS Boto3 to query DynamoDB with a hash key and a range key at the same time using the recommended KeyConditionExpression. I have attached an example query:
import boto3
from boto3 import dynamodb
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,
aws_secret_access_key=AWS_PASS,
region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table=dynamodb.Table(TABLE_NAME)
request = {
'ExpressionAttributeNames': {
'#n0': 'hash_key',
'#n1': 'range_key'
},
'ExpressionAttributeValues': {
':v0': {'S': MY_HASH_KEY},
':v1': {'N': GT_RANGE_KEY}
},
'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
'TableName': TABLE_NAME
}
response = table.query(**request)
When I run this against a table with the following schema:
Table Name: TABLE_NAME
Primary Hash Key: hash_key (String)
Primary Range Key: range_key (Number)
I get the following error and I cannot understand why:
ClientError: An error occurred (ValidationException) when calling the Query operation: Invalid KeyConditionExpression: Incorrect operand type for operator or function; operator or function: >, operand type: M
From my understanding, type M would be a map or dictionary type, whereas I am using type N, which is a number type and matches my table schema for the range key. Could someone explain why this error is happening? I am also open to a different way of accomplishing the same query, even if you cannot explain why this error occurs.
The Boto 3 SDK constructs a Condition Expression for you when you use the Key and Attr functions imported from boto3.dynamodb.conditions:
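from boto3.dynamodb.conditions import Key

# .gt() matches the question's '>' condition on the range key
response = table.query(
    KeyConditionExpression=Key('hash_key').eq(hash_value) & Key('range_key').gt(range_key_value)
)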
Reference: Step 4: Query and Scan the Data
Hope it helps
I am adding this answer because the accepted one does not explain why the original query failed.
TL;DR: Calling query on a Table resource in boto3 differs subtly from calling client.query(...) and requires a different syntax.
The question's syntax is valid for a query on a client, but not on a Table. With a Table, ExpressionAttributeValues must not carry data type descriptors: the resource serializes plain Python values itself, so a wrapper such as {'N': ...} is sent as a map, hence "operand type: M" in the error. Also, if you are executing a query on a Table resource, you do not have to specify the TableName again.
Working solution:
from boto3.session import Session
dynamodb_session = Session(aws_access_key_id=AWS_KEY,aws_secret_access_key=AWS_PASS,region_name=DYNAMODB_REGION)
dynamodb = dynamodb_session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)
request = {
    'ExpressionAttributeNames': {
        '#n0': 'hash_key',
        '#n1': 'range_key'
    },
    'ExpressionAttributeValues': {
        ':v0': MY_HASH_KEY,
        ':v1': GT_RANGE_KEY
    },
    'KeyConditionExpression': '(#n0 = :v0) AND (#n1 > :v1)',
}
response = table.query(**request)
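To make the difference concrete, the sketch below builds the arguments for both call styles side by side. The helper names `resource_query_kwargs` and `client_query_kwargs` are hypothetical, and the example values are made up; the attribute names hash_key and range_key come from the question.

```python
def resource_query_kwargs(hash_value, range_gt):
    """Kwargs for Table.query (resource API): plain Python values, no TableName."""
    return {
        "KeyConditionExpression": "#n0 = :v0 AND #n1 > :v1",
        "ExpressionAttributeNames": {"#n0": "hash_key", "#n1": "range_key"},
        "ExpressionAttributeValues": {":v0": hash_value, ":v1": range_gt},
    }


def client_query_kwargs(table_name, hash_value, range_gt):
    """Kwargs for client.query (low-level API): typed DynamoDB JSON plus TableName."""
    kwargs = resource_query_kwargs(hash_value, range_gt)
    kwargs["TableName"] = table_name
    kwargs["ExpressionAttributeValues"] = {
        ":v0": {"S": hash_value},
        ":v1": {"N": str(range_gt)},  # numbers are passed as strings in DynamoDB JSON
    }
    return kwargs


resource_kwargs = resource_query_kwargs("user-1", 100)
client_kwargs = client_query_kwargs("my_table", "user-1", 100)
# With real AWS objects (requires boto3 and credentials):
#   boto3.resource("dynamodb").Table("my_table").query(**resource_kwargs)
#   boto3.client("dynamodb").query(**client_kwargs)
```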
I am the author of a package called botoful which might be useful to avoid dealing with these complexities. The code using botoful will be as follows:
import boto3
from botoful import Query
client = boto3.Session(
aws_access_key_id=AWS_KEY,
aws_secret_access_key=AWS_PASS,
region_name=DYNAMODB_REGION
).client('dynamodb')
results = (
Query(TABLE_NAME)
.key(hash_key=MY_HASH_KEY, range_key__gt=GT_RANGE_KEY)
.execute(client)
)
print(results.items)
