AppSync query resolver: are expressionNames and expressionValues necessary? - amazon-dynamodb

https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-dynamodb.html#aws-appsync-resolver-mapping-template-reference-dynamodb-query
The AppSync documentation says that expressionNames and expressionValues are optional fields, but code generation always populates them. First question: should they always be included as a best practice when working with DynamoDB? If so, why?
AppSync resolver for a query on the partition key:
{
  "version": "2017-02-28",
  "operation": "Query",
  "query": {
    "expression": "#partitionKey = :partitionKey",
    "expressionNames": {
      "#partitionKey": "partitionKey"
    },
    "expressionValues": {
      ":partitionKey": {
        "S": "${ctx.args.partitionKey}"
      }
    }
  }
}
Second question, what exactly is the layman translation of the expression field here in the code above? What exactly is that statement telling DynamoDB to do? What is the use of the # in "expression": "#partitionKey = :partitionKey" and are the expression names and values just formatting safeguards?

Let me answer your second question first:
expressionNames
expressionNames are used for interpolation: each placeholder in the expression is replaced by the attribute name it maps to. After interpolation, this expression object:
"expression": "#partitionKey = :value",
"expressionNames": {
"#partitionKey": "id"
}
will be transformed to:
"expression": "id = :value",
Here #partitionKey acts as a placeholder for your attribute name id. The '#' prefix is the delimiter that marks a name placeholder.
But why?
expressionNames are necessary because DynamoDB reserves certain keywords (name and status, for example), so an attribute whose name is a reserved word can't be written directly inside a DynamoDB expression.
expressionValues
When you compare anything in a DynamoDB expression, you also need to substitute a placeholder for the actual value, because a DynamoDB typed value is a complex object.
In the following example:
"expression": "myKey = :partitionKey",
"expressionValues": {
":partitionKey": {
"S": "123"
}
}
:partitionKey is the placeholder for the complex value
{
"S": "123"
}
':' is a different delimiter; it tells DynamoDB to take the replacement from the expressionValues map.
Why are expressionNames and expressionValues always used by code generation?
It is just simpler for the code generation logic to always use expressionNames and expressionValues because there is no need to have two code paths for reserved/non-reserved DynamoDB words. Using expressionNames will always prevent collisions!
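To make the "always use placeholders" idea concrete, here is a minimal sketch of the same pattern against the low-level DynamoDB Query API, where the equivalent fields are called KeyConditionExpression, ExpressionAttributeNames and ExpressionAttributeValues. The helper name buildKeyQuery, the table name and the sample value are assumptions made up for illustration; the function only builds the request object and does not call AWS.

```javascript
// Hypothetical helper: builds parameters for the low-level DynamoDB Query API.
// The table name and attribute name passed in below are illustrative assumptions.
function buildKeyQuery(tableName, keyName, keyValue) {
  return {
    TableName: tableName,
    // The expression contains only placeholders, never raw names or values.
    KeyConditionExpression: "#pk = :pk",
    // "#pk" is replaced by the real attribute name, reserved word or not.
    ExpressionAttributeNames: { "#pk": keyName },
    // ":pk" is replaced by the typed value ("S" marks a string).
    ExpressionAttributeValues: { ":pk": { S: String(keyValue) } },
  };
}

const queryParams = buildKeyQuery("my-table", "partitionKey", "123");
console.log(JSON.stringify(queryParams, null, 2));
```

Because the expression string never changes, one code path covers reserved and non-reserved attribute names alike, which is the simplification described above.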

Related

ConditionExpression for PutItem not evaluating to false

I am trying to guarantee uniqueness in my DynamoDB table, across the partition key and other attributes (but not the sort key). Something is wrong with my ConditionExpression, because it is evaluating to true and the same values are getting inserted, leading to data duplication.
Here is my table design:
email: partition key (String)
id: sort key (Number)
firstName (String)
lastName (String)
Note: The id (sort key) holds a randomly generated unique number. I know... this looks like a bad design, but that is the use case I have to support.
Here is the NodeJS code with PutItem:
const AWS = require('aws-sdk')

const dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'})
const params = {
  TableName: "<table-name>",
  Item: {
    "email": { "S": "<email>" },
    "id": { "N": "<someUniqueRandomNumber>" },
    "firstName": { "S": "<firstName>" },
    "lastName": { "S": "<lastName>" }
  },
  ConditionExpression: "attribute_not_exists(email) AND attribute_not_exists(firstName) AND attribute_not_exists(lastName)"
}
dynamodb.putItem(params, function(err, data) {
  if (err) {
    console.error("Put failed")
  }
  else {
    console.log("Put succeeded")
  }
})
The documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html says the following:
attribute_not_exists (path)
True if the attribute specified by path does not exist in the item.
Example: Check whether an item has a Manufacturer attribute.
attribute_not_exists (Manufacturer)
It specifically says "item", not "items" or "any item", so I think it really means that it checks only the item being overwritten, that is, the existing item with the same primary key. Because you have a random sort key, every put creates a new item and the condition is always true.
Any implementation that checked an attribute that is not indexed against all records would have to scan every item, and that would not perform well.
Here is an interesting article which covers how to deal with unique attributes in DynamoDB: https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/ - the single table design together with transactions would be a possible solution for you if you can allow the additional partition keys in your table. Any other solution may be challenging under your current schema. DynamoDB has its own way of doing things, and it can be frustrating to push it to do things it was not designed for.
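As a rough sketch of the transaction idea, assuming the table may hold extra "marker" items: the real item and a marker item are written in one TransactWriteItems request, and the marker's key encodes the values that must stay unique. The helper name buildUniquePut, the UNIQUE#... key format and the fixed sort key 0 are all assumptions for illustration, not something taken from the linked article or from your schema.

```javascript
// Sketch only: builds a TransactWriteItems request that writes the real item
// plus a "marker" item whose key encodes the values that must stay unique.
// The marker key format and the fixed sort key 0 are assumptions.
function buildUniquePut(tableName, user) {
  const markerKey = ["UNIQUE", user.email, user.firstName, user.lastName].join("#");
  return {
    TransactItems: [
      {
        // The actual record, keyed by email plus the random id.
        Put: {
          TableName: tableName,
          Item: {
            email: { S: user.email },
            id: { N: String(user.id) },
            firstName: { S: user.firstName },
            lastName: { S: user.lastName },
          },
        },
      },
      {
        // The marker has a deterministic key, so the condition is evaluated
        // against an existing marker with the same key, if there is one.
        Put: {
          TableName: tableName,
          Item: { email: { S: markerKey }, id: { N: "0" } },
          ConditionExpression: "attribute_not_exists(email)",
        },
      },
    ],
  };
}

const request = buildUniquePut("users", {
  email: "a@example.com",
  id: 42,
  firstName: "Ada",
  lastName: "Lovelace",
});
console.log(JSON.stringify(request, null, 2));
```

With the AWS SDK for JavaScript v2 the request would be passed to dynamodb.transactWriteItems(request, callback); if the marker already exists, the whole transaction is rejected and neither item is written.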

filtering Dynamo DB in Step functions JSONPath

I am trying to build a step function that has a Choice state based on a map inside the result of a DynamoDB GetItem request. An example result from my GetItem request would be:
{
  "Item": {
    "organisationId": {
      "S": "Andys-test"
    },
    "id": {
      "S": "Andy2"
    },
    "states": {
      "L": [
        {
          "M": {
            "year": {
              "N": "2021"
            },
            "status": {
              "S": "started"
            }
          }
        },
        {
          "M": {
            "year": {
              "N": "2022"
            },
            "status": {
              "S": "started"
            }
          }
        }
      ]
    }
  }
}
My condition will check the status of the states map entry for the year 2021. I have attempted to use this JSONPath, which from what I can tell is valid, although I am getting nothing in the data flow simulator in the Step Functions console. I have tried various iterations of the below, with quotes, escaped quotes, etc., and can't get anything to parse the correct value out.
I have been doing this in the input selector as I can see that the result path does not support the [?()] notation.
$.Item.states.L..M[?(#.year.N == 2022) ]
Without an example choice state with some rules, it'll be hard to answer definitively but I think I'm following.
JSONPath with expression filters
I run into this sort of problem more often than I'd like when using JSONPath filter expressions and have to add "helper" tasks to get things moving (wasting valuable state transitions 😣 ).
When you specify a Path that includes a filter expression, the result is always going to be a list (and you won't be able to reference index afterwards either).
ChoiceRules don't really have a comparator that deals with arrays/lists (at least I haven't been able to get it to work, so let me know if you do 😄).
The hack I've found easiest to reason about/maintain is creating a simple Pass Task that "pops" the values I need from the filter expression and then pass that along to the Choice Task.
Here's an example from one of my previous answers (the pattern used in the PassDef task would be defined before your choice task in your scenario, instead of after)
I've found it easier to start with https://jsonpath.herokuapp.com and then progress to the data flow simulator. Just remember to always keep in mind whether you're trying to implement a Path or a Reference Path!
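To show why the "pop" step is needed, here is a plain JavaScript illustration of the data shapes involved. It is not Step Functions syntax, just the same filter-then-pop logic applied to a trimmed copy of the GetItem result from the question; the variable names are made up for the example.

```javascript
// Trimmed copy of the GetItem result from the question.
const result = {
  Item: {
    states: {
      L: [
        { M: { year: { N: "2021" }, status: { S: "started" } } },
        { M: { year: { N: "2022" }, status: { S: "started" } } },
      ],
    },
  },
};

// A filter keeps every matching element, so the outcome is a list even when
// only one element matches. DynamoDB JSON stores numbers as strings, so the
// comparison value is the string "2021" here.
const matches = result.Item.states.L.filter((entry) => entry.M.year.N === "2021");

// The "pop" step: take the first match and expose a scalar value that an
// ordinary string comparison can work with.
const status = matches.length > 0 ? matches[0].M.status.S : null;

console.log(matches.length, status); // prints: 1 started
```

In the state machine, the Pass task plays the role of the last assignment: it turns the one-element list into a single value before the Choice task sees it.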

Project by, with optional properties

I believe this question is for Tinkerpop, not specific to the CosmosDB implementation; just some semantics might be baked into my query examples.
I've developed a data layer that creates queries based on some metadata information. Currently, my data layer will only persist non-null data values to the graph vertex; this is causing troubles with my retrieval mechanism.
Provided the following data model, where the field "HomeRoute" may or may not exist on the actual vertex (depending on whether it was populated or not).
{
"ApplicationModule": string
"Title": string
"HomeRoute": string?
}
My initial query structure is as follows, which does not support the optional properties (discussed later).
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by('HomeRoute');
To simulate, we can insert a vertex:
g.addV('ApplicationsTest')
.property('partitionId', '')
.property('ApplicationModule', 'TestApp')
.property('Title', 'Test App')
.property('HomeRoute', 'testapphome');
And we can successfully query it using my base query noted above, which returns it in my desired JSON format.
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
}
]
If we now insert a vertex without the HomeRoute property (since it was null within the application layer), my base query will fail.
g.addV('ApplicationsTest')
.property('partitionId', '')
.property('ApplicationModule', 'TestApp')
.property('Title', 'Test App');
Executing my base query now results in error:
Gremlin Query Execution Error: Project By: Next: The provided
traverser of key "HomeRoute" maps to nothing.
I can apply a coalesce operation against "optional" fields; my current understanding has allowed me to return a constant value in the case of undefined properties. Updating my base query as follows will return "!dbnull" when a property does not exist on the vertex:
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by(values('HomeRoute')
.fold()
.coalesce(unfold(), constant('!dbnull')));
This query when executed returns the values as expected, again in JSON format.
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
},
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "!dbnull"
}
]
My question (still new to Gremlin / Tinkerpop queries) - is there any way that I can get this result with only the properties which are present on the respective vertices?
My desired output from this example is below, which would allow my data layer to only unbundle the values present on the graph vertex and not have to consider string "!dbnull" values.
[
{
"ApplicationModule": "TestApp",
"Title": "Test App",
"HomeRoute": "testapphome"
},
{
"ApplicationModule": "TestApp",
"Title": "Test App"
}
]
I've found a way to achieve what I'm looking for. I would still love input from the community though, if there are optimizations or other considerations.
g.V()
.has('ApplicationsTest', 'partitionId', '')
.project('ApplicationModule','Title','HomeRoute')
.by('ApplicationModule')
.by('Title')
.by(values('HomeRoute')
.fold()
.coalesce(unfold(), constant('!dbnull')))
.local(unfold()
.where(select(values).is(without('!dbnull')))
.group().by(select(keys)).by(select(values)))
If you only need specific keys that already exist on the vertex, you can use valueMap; there is no need to use project:
g.V()
.has('ApplicationsTest', 'partitionId', '')
.valueMap("ApplicationModule", "Title", "HomeRoute").by(unfold())
example: https://gremlify.com/9fua9jsu0dh
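If the project/coalesce query with the '!dbnull' sentinel is kept instead, the missing properties could also be dropped in the data layer after the query returns. This is only a sketch of that alternative in plain JavaScript; stripMissing is a hypothetical helper name and the sample rows are copied from the JSON output shown in the question.

```javascript
// Sentinel returned by the coalesce() fallback in the question's query.
const DB_NULL = "!dbnull";

// Hypothetical data-layer helper: drops every key whose value is the sentinel.
function stripMissing(row) {
  return Object.fromEntries(
    Object.entries(row).filter(([, value]) => value !== DB_NULL)
  );
}

const rows = [
  { ApplicationModule: "TestApp", Title: "Test App", HomeRoute: "testapphome" },
  { ApplicationModule: "TestApp", Title: "Test App", HomeRoute: "!dbnull" },
];

const cleaned = rows.map(stripMissing);
console.log(JSON.stringify(cleaned, null, 2));
```

This keeps the Gremlin query simple at the cost of a sentinel value that must never occur as real data, so the valueMap approach is probably the cleaner choice when it fits.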

Composite index for optional field in Cosmos

I have a collection in Cosmos DB which contains documents of different types (and schemas):
{
"partKey": "...",
"type": "type1",
"data": {
"field1": 123,
"field2": "sdfsdf"
}
}
{
"partKey": "...",
"type": "type2",
"data": {
"field3": ["123", "456", "789"]
}
}
I'm trying to create a composite index [/type, /data/field3/[]/?], but faced an issue:
The indexing path '\\/data\\/field3\\/[]\\/?' could not be accepted, failed near position '15'. Please ensure that the path is a valid path. Common errors include invalid characters or absence of quotes around labels
We don't support wildcards for Composite Indexes in Cosmos DB. Here is a composite index sample as reference.
We will update our docs to make this clearer. I looked them over and we don't currently document this.
Thanks.
In composite indexes, you just need to specify the paths that you want to index, rather than the values, so for your example:
"compositeIndexes":[
[
{
"path":"/type",
"order":"ascending"
},
{
"path":"/data/field3",
"order":"descending"
}
]
]
Just specify the order type you need for your queries (I've just used these ones as an example).
For documents that have different properties underneath your data property, I believe you will have to add a composite index for each use case that you need, since composite indexes don't support wildcards. So you would need to add:
/data/field1
/data/field2
etc.
Hope this helps.
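Since every path has to be listed explicitly, the compositeIndexes array can get repetitive. A small sketch of generating it, assuming each index pairs /type with one data path: buildCompositeIndexes is a hypothetical helper, and the "ascending" default is only a placeholder that should match the ordering your queries actually use.

```javascript
// Hypothetical helper: one composite index per explicit data path, each paired
// with the first path. The "ascending" default order is an assumption.
function buildCompositeIndexes(firstPath, dataPaths, order = "ascending") {
  return dataPaths.map((path) => [
    { path: firstPath, order },
    { path, order },
  ]);
}

// Fragment of an indexing policy; only the compositeIndexes key is shown.
const indexingPolicyFragment = {
  compositeIndexes: buildCompositeIndexes("/type", ["/data/field1", "/data/field2"]),
};

console.log(JSON.stringify(indexingPolicyFragment, null, 2));
```

The output has the same shape as the compositeIndexes sample above and can be pasted into the container's indexing policy.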

Add to list only if string doesn't already exist in DynamoDB table

I'm trying with the following code
{
ExpressionAttributeNames: {
"#items": "items"
},
ExpressionAttributeValues: {
":item": [slug]
},
Key: {
listId: listId,
userId: userData.userId,
},
UpdateExpression: "SET #items = list_append(#items,:item)",
ConditionExpression: "NOT contains (#items, :item)",
TableName: process.env.listsTableName,
}
but the item is still added even if string already exists in the list. What am I doing wrong?
The list structure is like so:
{
Item: {
userId: userData.userId,
listId: crypto.createHash('md5').update(Date.now() + userData.userId).digest('hex'),
listName: 'Wishlist',
items: [],
},
TableName: process.env.listsTableName,
};
Later Edit: I know I should use SS as it does the condition for me but SS doesn't work in my context because SS can't be empty.
As the documentation explains, the contains() function only works on a string value (checking for a substring) or a set value (checking for membership). But in your case, you don't have a set but rather a list, which are different things in DynamoDB.
If all the items which you want to add to this list are strings, and you anyway don't want duplicates in the list, the most efficient way would be to stop using a list, and instead use the set-of-strings (a.k.a. SS) type. To add an item to the set (without duplicates), you would simply use "ADD #items :item" (no need for any additional condition - duplicates will not be added).
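A minimal sketch of the ADD approach, written against the low-level API format with explicit type descriptors. buildAddToSet is a hypothetical helper, and treating both key attributes as strings is an assumption about your table. As far as I know, ADD creates the set when the attribute does not exist yet, so the item can be created without an items attribute instead of with an empty set.

```javascript
// Hypothetical helper: builds UpdateItem parameters that add one string to a
// string set. Both key attributes are assumed to be strings.
function buildAddToSet(tableName, listId, userId, slug) {
  return {
    TableName: tableName,
    Key: { listId: { S: listId }, userId: { S: userId } },
    // ADD on a set ignores values that are already members, so no
    // ConditionExpression is needed to avoid duplicates.
    UpdateExpression: "ADD #items :item",
    ExpressionAttributeNames: { "#items": "items" },
    // "SS" marks a string set; the value to add is a one-element set.
    ExpressionAttributeValues: { ":item": { SS: [slug] } },
  };
}

const updateParams = buildAddToSet("lists", "abc123", "user-1", "my-slug");
console.log(JSON.stringify(updateParams, null, 2));
```

With the DocumentClient used in the question, the value would need to be wrapped with docClient.createSet([slug]) rather than passed as a plain array, because a plain array is stored as a list.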
