How to Filter Nested Array Object in DynamoDB - amazon-dynamodb

I am a beginner with AWS DynamoDB. I want to scan a DynamoDB table using SENDTO.emailAddress = "first@first.com" as the FilterExpression.
The DB Structure looks like this
{
ID
NAME
MESSAGE
SENDTO[
{
name
emailAddress
}
]
}
A Sample Data
{
ID: 1,
NAME: "HELLO",
MESSAGE: "HELLO WORLD!",
SENDTO: [
{
name: "First",
emailAddress: "first@first.com"
},
{
name: "Second",
emailAddress: "second@first.com"
}
]
}
I want to retrieve the documents that match a given emailAddress. I tried a scan with a filter expression; here is my code to retrieve the data. I am using the AWS JavaScript SDK.
let params = {
  TableName: "email",
  FilterExpression: "SENDTO.emailAddress = :emailAddress",
  ExpressionAttributeValues: {
    ":emailAddress": "first@first.com",
  }
}
let result = await ctx.docClient.scan(params).promise();

To find an item by the SENDTO attribute, you need to know both the name and the emailAddress values. DynamoDB cannot match a list element by just one attribute of a map (i.e. by the emailAddress value alone).
The contains function can be used to find an element in a List data type.
CONTAINS is supported for lists: When evaluating "a CONTAINS b", "a"
can be a list; however, "b" cannot be a set, a map, or a list.
Sample code using contains:
var params = {
  TableName: "email",
  FilterExpression: "contains (SENDTO, :sendToVal)",
  ExpressionAttributeValues: {
    ":sendToVal": {
      "name": "First",
      "emailAddress": "first@first.com"
    }
  }
};
If you don't know the value of name and emailAddress attribute, you may need to remodel the data to fulfill your use case.
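One possible remodel, sketched here under assumptions: store a second, flat list of the recipient addresses next to SENDTO, so that contains can match on a plain string. The attribute name SENDTO_EMAILS and both helper functions are hypothetical and not part of the original table; this is an illustration, not a tested migration.

```javascript
// Hypothetical attribute: SENDTO_EMAILS does not exist in the original table.
// It duplicates the addresses from SENDTO as a flat list of strings.
function withEmailIndex(item) {
  // Copy the item and add the flat list of recipient addresses.
  return { ...item, SENDTO_EMAILS: item.SENDTO.map((r) => r.emailAddress) };
}

// Build scan parameters that match on one address only.
// contains() can test a list of strings against a string operand.
function buildScanByEmail(tableName, emailAddress) {
  return {
    TableName: tableName,
    FilterExpression: "contains(SENDTO_EMAILS, :email)",
    ExpressionAttributeValues: { ":email": emailAddress },
  };
}
```

The parameters would be passed to the document client in the same way as in the question, e.g. docClient.scan(buildScanByEmail("email", "first@first.com")). Note that this is still a Scan, so every item in the table is read before the filter is applied.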

I think you should create two tables, one for users and one for messages.
The user table has partition_key: user_id and sort_key: email, plus a field holding an array of that user's message ids.
The message table has partition_key: message_id, plus a field holding an array of user ids.
Once you have the array of user ids, you can use a BATCH GET request to get all users of one message.
Once you have the array of message ids, you can use a BATCH GET request to get all messages of one user.
If you want to get one user by email, you can use the QUERY method.
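As a rough sketch of the BATCH GET step for the message table: BatchGetItem accepts at most 100 keys per request, so a long array of ids has to be split into chunks. The helper name and the table name "messages" are assumptions for illustration; the key name message_id comes from the design above. The user table, as described, has a composite key, so its keys would need both user_id and email.

```javascript
// BatchGetItem accepts at most 100 keys per request.
const MAX_BATCH_KEYS = 100;

// Hypothetical helper: split ids into BatchGet request parameters
// for a table whose primary key is the single attribute keyName.
function buildBatchGetRequests(tableName, keyName, ids) {
  const requests = [];
  for (let i = 0; i < ids.length; i += MAX_BATCH_KEYS) {
    const keys = ids
      .slice(i, i + MAX_BATCH_KEYS)
      .map((id) => ({ [keyName]: id }));
    requests.push({ RequestItems: { [tableName]: { Keys: keys } } });
  }
  return requests;
}
```

Each element of the returned array could then be passed to the document client's batchGet call. A real implementation would also need to retry any UnprocessedKeys returned in the response.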

Related

Does azure cosmosdb have like any operator to find records matching a pattern

I am just wondering how do I do the below using the SQL API on Azure CosmosDB -
SELECT user_id FROM users WHERE user_id LIKE ANY(contacts);
The above statement works on postgres, wondering if there is anything similar in Azure CosmosDB.
The above statement receives a set of contacts in an array format like this ["4160000000","7780000000"] and finds the corresponding records in Postgres db.
UPDATE @Sajeetharan
Below are the documents I have in Cosmos DB-
{
"users": [
{
"partitionKey": "user",
"userPhoneNumber": "14161231234",
"userDisplayName": "Test User 1"
},
{
"partitionKey": "user",
"userPhoneNumber": "18055678978",
"userDisplayName": "Test User 2"
},
{
"partitionKey": "user",
"userPhoneNumber": "17202228799",
"userDisplayName": "Test User 3"
},
{
"partitionKey": "user",
"userPhoneNumber": "17780265987",
"userDisplayName": "Test User 4"
}
]
}
I will be sending in a set of userPhoneNumbers from javascript in an array format like below and then I need the SQL query to return the corresponding records in cosmos db.
var userPhoneNumbers = ["4161231234","7202228799"];
The above array has two values, which when sent to the cosmosdb should return the first and third record.
The userPhoneNumbers sent in will be sometimes missing the country code, so the search should be performed using CONTAINS or ENDSWITH.
Please advise!
As mentioned in the question, you need to perform a LIKE-style match on phone numbers passed in as an array.
Cosmos DB has no built-in function that does this directly. One way to get the expected result is a Cosmos DB UDF. Below is a code snippet for it.
function findUserNameByPhone(users, userPhoneNumbers) {
  var s, i, j;
  let result = [];
  for (j = 0; j < userPhoneNumbers.length; j++) {
    for (i = 0; i < users.length; i++) {
      s = users[i];
      if (s.userPhoneNumber.match(userPhoneNumbers[j]))
        result = result.concat(s);
    }
  }
  return result;
}
Use the UDF in a query:
SELECT udf.findUserNameByPhone(c.users,["4161231234","7202228799"]) FROM c
Edit as per comment
Use the UDF in the SELECT query. Also, as per your latest comment, if you need the result for a specific partition key, you can use a self join as shown in the updated query. Because the partitionKey property is part of the users array, the self join lets you pass the partitionKey value in the WHERE clause.
SELECT DISTINCT udf.findUserNameByPhone(c.users,["4161231234","7202228799"]) FROM c
JOIN u in c.users WHERE u.partitionKey = "user"
Consuming parameterized queries in Cosmos DB using Node.js check here
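Since the question mentions ENDSWITH, here is a hedged variant of the UDF above. It is a sketch, not tested against Cosmos DB, and it assumes the UDF runtime provides String.prototype.endsWith. The name findUsersByPhoneSuffix is made up for illustration. Unlike match, which treats the phone number as a regular expression and matches anywhere in the string, this only matches at the end, and it returns each user at most once.

```javascript
// Hypothetical variant of the UDF above: suffix match instead of regex match.
function findUsersByPhoneSuffix(users, userPhoneNumbers) {
  // Keep a user if its number ends with any of the supplied numbers,
  // so "14161231234" matches "4161231234" (country code missing).
  return users.filter(function (user) {
    return userPhoneNumbers.some(function (phone) {
      return user.userPhoneNumber.endsWith(phone);
    });
  });
}
```

Registered as a UDF, it would be called the same way as the original, with c.users and the array of phone numbers as arguments.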

How to force the DynamoDB query's ExclusiveStartKey to use exact match?

I'm using DynamoDB for my new Serverless Restful API with nodejs.
The Restful API supports query for resources with the limit and lastKey query parameters for key pagination.
Assume there's a table like below:
PK      SK
School  firstSchool
School  secondSchool
School  thirdSchool
PK is partition key, and SK is sort key.
I use SK for key pagination.
If I call the api with http://somewhere/api/school?limit=1&lastKey=secondSchool, ExclusiveStartKey in query will be {"PK" : "School", "SK" : "secondSchool"}, and the returned item will be {"PK" : "School", "SK" : "thirdSchool"}.
It works well in that case, but the problem is that the same result is returned for a URL like http://somewhere/api/school?limit=1&lastKey=seco.
In this case, ExclusiveStartKey in query will be {"PK" : "School", "SK" : "seco"}
It seems DynamoDB doesn't use exact match for a sk value in ExclusiveStartKey.
Is there any way to force DynamoDB to use exact match for ExclusiveStartKey?
I attach my test code below:
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocument } = require("@aws-sdk/lib-dynamodb");
const ddbClient = new DynamoDBClient({
region: AWS_REGION,
endpoint: AWS_DYNAMODB_END_POINT,
credentials: {
accessKeyId: AWS_ACCESSKEY_ID,
secretAccessKey: AWS_SECRET_ACCESS_KEY,
},
});
const ddbDocClient = DynamoDBDocument.from(ddbClient);
(async () => {
try {
const data = await ddbDocClient.query({
TableName: "Table Name",
KeyConditionExpression: "#pk = :pk",
ExpressionAttributeNames: {
"#pk": "PK",
},
ExpressionAttributeValues: {
":pk": "Test",
},
Limit: 1,
ExclusiveStartKey: { PK: "Test", SK: "Seco" },
});
console.log(data);
} catch (err) {
console.log("Error", err);
}
})();
ExclusiveStartKey is used mainly for paging large Scan or Query requests, i.e., retrieving the next page of results after the previous page ended with a LastEvaluatedKey. You are supposed to pass exactly that key (not some subset of it) as the ExclusiveStartKey of the next request.
You are trying to do something different, and you can't use ExclusiveStartKey to achieve it, but you can use something else:
The Query request has a KeyConditionExpression. You can specify SK > :value as part of the key condition expression (don't pass ExclusiveStartKey), and you'll get all the sort keys greater than :value, such as your string "seco". Please note, however, that because your sort key is truncated, the result may include one or more extra items before the first key you want (e.g., a key like "secoaaaa" sorts after "seco" but before "secondSchool"), so you may need to drop them yourself from the results.
The KeyConditionExpression is implemented efficiently: DynamoDB knows how to skip directly to that sort key in the partition and doesn't charge you for reading the entire partition, so in this respect it is just as good as ExclusiveStartKey.
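A minimal sketch of that approach, reusing the attribute names PK and SK and the placeholder table name from the question. The helper name buildPageQuery is an assumption for illustration.

```javascript
// Hypothetical helper: build Query parameters that page by sort key
// with a key condition instead of ExclusiveStartKey.
function buildPageQuery(pk, lastKey, limit) {
  const params = {
    TableName: "Table Name", // placeholder, as in the question
    KeyConditionExpression: "#pk = :pk",
    ExpressionAttributeNames: { "#pk": "PK" },
    ExpressionAttributeValues: { ":pk": pk },
    Limit: limit,
  };
  if (lastKey !== undefined) {
    // Only return items whose sort key is strictly greater than lastKey.
    params.KeyConditionExpression += " AND #sk > :sk";
    params.ExpressionAttributeNames["#sk"] = "SK";
    params.ExpressionAttributeValues[":sk"] = lastKey;
  }
  return params;
}
```

The result can be passed to ddbDocClient.query(...) as in the test code. With lastKey "seco" and Limit 1 on the sample table, the first sort key greater than "seco" is "secondSchool", so that item is returned rather than "thirdSchool".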

ConditionExpression for PutItem not evaluating to false

I am trying to guarantee uniqueness in my DynamoDB table, across the partition key and other attributes (but not the sort key). Something is wrong with my ConditionExpression, because it is evaluating to true and the same values are getting inserted, leading to data duplication.
Here is my table design:
email: partition key (String)
id: sort key (Number)
firstName (String)
lastName (String)
Note: The id (sort key) holds randomly generated unique number. I know... this looks like a bad design, but that is the use case I have to support.
Here is the NodeJS code with PutItem:
const dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'})
const params = {
TableName: <table-name>,
Item: {
"email": { "S": "<email>" },
"id": { "N": "<someUniqueRandomNumber>" },
"firstName": { "S": "<firstName>" },
"lastName": { "S": "<lastName>" }
},
ConditionExpression: "attribute_not_exists(email) AND attribute_not_exists(firstName) AND attribute_not_exists(lastName)"
}
dynamodb.putItem(params, function(err, data) {
if (err) {
console.error("Put failed")
}
else {
console.log("Put succeeded")
}
})
The documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.OperatorsAndFunctions.html says the following:
attribute_not_exists (path)
True if the attribute specified by path does not exist in the item.
Example: Check whether an item has a Manufacturer attribute.
attribute_not_exists (Manufacturer)
It specifically says "item", not "items" or "any item", so I think it really means that it checks only the item being overwritten, i.e. the item with the same partition key and sort key. Because you have a random sort key, the put always creates a new item and the condition is always true.
Any implementation that checked a non-indexed column against all records would have to scan every item, and that would not perform well.
Here is an interesting article that covers how to deal with unique attributes in DynamoDB: https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/. Single-table design together with transactions would be a possible solution for you, if you can allow the additional partition keys in your table. Any other solution may be challenging under your current schema. DynamoDB has its own way of doing things, and it can be frustrating to push it to do things it is not designed for.
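A rough sketch of the transaction idea against the schema from the question (email as partition key, id as sort key). The "UNIQUE#" marker key format, the fixed sort key 0, and the helper name are assumptions made for illustration; they are not taken from the article.

```javascript
// Hypothetical helper: build TransactWriteItems parameters that write the
// user item together with a marker item guarding uniqueness.
function buildUniquePut(tableName, user) {
  // Marker partition key derived from the values that must be unique together.
  const markerKey = `UNIQUE#${user.email}#${user.firstName}#${user.lastName}`;
  return {
    TransactItems: [
      {
        Put: {
          TableName: tableName,
          Item: {
            email: { S: user.email },
            id: { N: String(user.id) },
            firstName: { S: user.firstName },
            lastName: { S: user.lastName },
          },
        },
      },
      {
        Put: {
          TableName: tableName,
          // The marker always uses the same key for the same combination,
          // so the condition below fails if it was written before.
          Item: { email: { S: markerKey }, id: { N: "0" } },
          ConditionExpression: "attribute_not_exists(email)",
        },
      },
    ],
  };
}
```

The parameters would go to dynamodb.transactWriteItems(params, callback). If the marker already exists, the whole transaction is cancelled and the user item is not written either.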

How do I suppress CosmosDB "default" info in resultsets?

I want to suppress the CosmosDB information in the following resultset, how can that be done?
{
"id": null,
"_rid": null,
"_self": null,
"_ts": 0,
"_etag": null,
"topLevelCategory": "Shorts,Skirt"
},
This is an extract, of course, but I don't want to show the id etc., as they serve no purpose in this result. I cannot figure out how to suppress that info.
I expect the following
{
"topLevelCategory": "Shorts,Skirt"
},
Query looks as follows
$"SELECT DISTINCT locales.categories[0] AS topLevelCategory " +
$"FROM c JOIN locales in c.locales " +
$"WHERE locales.country = '{apiInputObject.Locale}' " +
$"AND locales.language = '{apiInputObject.Language}'";
Interestingly, if I cast the result as JObject I don't get the system data; I only get it if I call CreateDocumentQuery with Document. So a workaround would be as follows:
IQueryable<JObject> queryResultSet = client.CreateDocumentQuery<JObject>(UriFactory.CreateDocumentCollectionUri(databaseName, databaseCollection), parsedQueryObject.SqlStatement, queryOptions);
That has other async issues, though. The query above does not show the system-generated IDs, but the one below does:
var query = client.CreateDocumentQuery<Document>(UriFactory.CreateDocumentCollectionUri(databaseName, databaseCollection), parsedQueryObject.SqlStatement, queryOptions).AsDocumentQuery();
var result = await query.ExecuteNextAsync<Document>();
These are system-generated properties of items in Cosmos DB.
You can filter them out in the SQL itself: select only the fields you need, e.g. select c.topLevelCategory from c, and avoid select * from c. Filtering in SQL is the best method, better than post-processing the result set.
Update Answer:
Your situation is that, when executing the exact same query, JObject does not show the system data but Document does.
My explanation is as follows:
The Document class is a self-contained base class in the Document DB .NET package. It carries the system-generated properties shown in your result set (id, _rid, _self, _ts, _etag).
The SDK tries to map the result data, property by property, to the entity class you specify in CreateDocumentQuery<T>.
So you have actually already found the solution. You can define your own POCO to receive the result data, containing only the properties you want, like:
class Pojo
{
    public string id { get; set; }
    public string name { get; set; }
}
That gives you the business fields only, with no redundant system fields. I hope that is clear.

AWS AppSync Query to shape response data (Similar to Group By in SQL)

I have one DynamoDB table with all the data I need for the client, however, I want to shape the data the client receives to reduce client-side manipulation.
My Schema:
type StateCounty {
id: ID!
StateName: String
CountyName: String
FIPSST: Int
FIPSCNTY: Int
Penetration: String
Date: String
}
and to return a custom query I have the type:
type Query {
getStateCountybyState(StateName: String): StateCountyConnection
}
This works - and with a simple query
query getStateCountybyState {
getStateCountybyState (StateName: "Delaware") {
items {
StateName
CountyName
Date
}
}
}
the results are returned as expected:
{
"StateName": "Delaware",
"CountyName": "Kent",
"Date": "02-01-2017"
},
{
"StateName": "Delaware",
"CountyName": "Sussex",
"Date": "02-01-2016"
},
{
"StateName": "Delaware",
"CountyName": "New Castle",
"Date": "02-01-2018"
}
etc.
I would like to return the data in the following format:
{
"StateName": "Delaware" {
{ "CountyName": "Kent",
"Date": "02-01-2017"
},
{
"CountyName": "Sussex",
"Date": "02-01-2016"
},
{
"CountyName": "New Castle",
"Date": "02-01-2018"
}
}
}
I have tried adding GroupCounty: [StateCountyGroup] to the schema:
type StateCounty {
id: ID!
StateName: String
CountyName: String
FIPSST: Int
FIPSCNTY: Int
Penetration: String
Date: String
GroupCounty: [StateCountyGroup]
}
and then a reference to that in the query
query getStateCountybyState {
getStateCountybyState (StateName: "Delaware") {
items {
StateName
CountyName
Date
GroupCounty: [StateCountyGroup]
}
}
}
I think my issue is within the resolver - currently, it is configured to use the StateName as a key, but I am not sure how to pass the StateName from the primary query to the subquery.
Resolver:
{
"version" : "2017-02-28",
"operation" : "Query",
"query" : {
"expression" : "StateName = :StateName",
"expressionValues" : {
":StateName" : { "S" : "${context.arguments.StateName}" },
}
},
"index" : "StateName-index-copy",
"select" : "ALL_ATTRIBUTES",
}
Any guidance appreciated - I have gone through the documentation several times, but cannot find an example.
UPDATE
I tried the suggestion below from Richard - and it is definitely on the right track, however, despite multiple variations on the theme, I either return null or the following error (I eliminated some of the county objects returned in the error for brevity):
"message": "Unable to convert set($myresponse = {\n \"Delaware\":
[{SSA=8000, Eligibles=32295, FIPS=10001, StateName=Delaware, SSACNTY=0,
Date=02-01-2016, CountyName=Kent, Enrolled=3066, Penetration=0.0949,
FIPSCNTY=1, FIPSST=10, SSAST=8, id=6865},
{SSA=8010, Eligibles=91332, FIPS=10003, StateName=Delaware, SSACNTY=10, Date=02-01-2016, CountyName=New Castle, Enrolled=10322, Penetration=0.113, FIPSCNTY=3, FIPSST=10, SSAST=8, id=6866},
{SSA=0, Eligibles=10, FIPS=10, StateName=Delaware, SSACNTY=0, Date=02-01-2018, CountyName=Pending County Designation, Enrolled=0, Penetration=0, FIPSCNTY=0, FIPSST=10, SSAST=0, id=325},
{SSA=8000, Eligibles=33371, FIPS=10001, StateName=Delaware, SSACNTY=0, Date=02-01-2017, CountyName=Kent, Enrolled=3603, Penetration=0.108, FIPSCNTY=1, FIPSST=10, SSAST=8, id=3598},
{SSA=8020, Eligibles=58897, FIPS=10005, StateName=Delaware, SSACNTY=20, Date=02-01-2016, CountyName=Sussex, Enrolled=3760, Penetration=0.0638, FIPSCNTY=5, FIPSST=10, SSAST=8, id=6867}) \nnull\n\n to class java.lang.Object."
}
]
}
From reading the above, it sounds like your original query is returning the correct results that you want but not in the response format that you would prefer, as you would like the "StateName" to be a top-level JSON key with the value being a JSON object of the state which you passed in as an argument. Is that accurate? If so then why not use the same query that already works but with a different response template. Something like:
#set($myresponse = {
"$ctx.args.StateName": $ctx.result.items
})
$util.toJson($myresponse)
Note that $myresponse isn't exactly the same as you had above as your example with "stateName" : "Delaware" { ... } wasn't completely valid JSON so I didn't want to make an assumption on what a good structure would be, but the point remains if you're already getting the proper results from your query I would just try to change the structure of your GraphQL results.
Now if I misread the above and you're NOT getting the proper results from the query, the other way that I could read your statement of "primary query to the subquery" is that you're trying to apply an additional "filter" to your query results. If that is the case then you need something like this:
{
"version" : "2017-02-28",
"operation" : "Query",
"query" : {
"expression" : "StateName = :StateName",
"expressionValues" : {
":StateName" : { "S" : "${context.arguments.StateName}" },
}
},
"index" : "StateName-index-copy",
"select" : "ALL_ATTRIBUTES",
"filter" : {
"expression" : "#population >= :population",
"expressionNames" : {
"#population" : "population"
},
"expressionValues" : {
":population" : $util.dynamodb.toDynamoDBJson($ctx.args.population)
}
}
}
I used an example here where maybe your query also needed to filter by the population size in each county. This may not be representative of what you're looking for but hopefully it helps.
EDITED WITH MORE INFORMATION 4/16/18
I've written up more information on this in a step-by-step manner, to go through the concepts in pieces.
The key here is not just the response template, but also the fields that you're requesting to be returned (as this is the nature of GraphQL). Let's walk through this by way of example. Because your response template converts an array to a single item, you are now returning an individual item with GraphQL, so you'll need to change the expected GraphQL query response type. Suppose you have a GraphQL type in your schema like this:
type State {
id: ID!
population: String!
governor: String!
}
type Query {
allStates: [State]
}
If you just convert the response in the template as above you'll see an error like "type mismatch error, expected type LIST" if you run something like this:
query {
allStates{
id
population
}
}
That's because your response is no longer returning the individual items. Instead, you'll need to change the GraphQL response type from [State] to State, to match what your template conversion is doing, like so:
type State {
StateName: String
}
type Query {
allStates: State
}
Now if your resolver request template is doing something that returns a list of items (like a DynamoDB scan or Query) you can convert the list to a single item in the response template like so:
#set($convert = {"StateName" : $ctx.result.items })
$util.toJson($convert)
Then run the following GraphQL query:
query {
allStates{
StateName
}
}
And you'll get a single object containing an array of your results back:
{
"data": {
"allStates": {
"StateName": "[{id=1, population=10000, governor=John Smith}]"
}
}
}
However while this might be pointing out the errors you are having, this is returning a StateName and from your original question I think you are looking to do a bit more by combining records in the response for some optimization, along with some potential filtering. One way to do this would be to create an array (or you could create a map {}) and populate it based on some conditional. For example modify your query to have a StateName as an argument:
type Query {
allStates(StateName: String!): State
}
Then you can filter on this in the resolver response template, by using a #foreach and an #if() conditional, then calling .add() only if items in the response are for the state which you requested:
#set($convert = {"StateName" : [] })
#foreach($item in $ctx.result.items)
#if($item["StateName"]=="$ctx.args.StateName")
$util.qr($convert.get("StateName").add("$item"))
#end
#end
$util.toJson($convert)
So now you could run something like this:
query {
allStates(StateName:"Texas"){
StateName
}
}
And this will give you back just the results for that specific state which you passed as an argument. But you'll notice the selection set of the query is StateName. You could introduce a bit more flexibility by having the possible states listed in your GraphQL type:
type State {
StateName: String
Seattle: String
Texas: String
}
Now you alter your resolver response template to use the argument for building up the return array since it can specify this in the selection set:
#set($convert = {"$ctx.args.StateName" : [] })
#foreach($item in $ctx.result.items)
#if($item["StateName"]=="$ctx.args.StateName")
$util.qr($convert.get("$ctx.args.StateName").add("$item"))
#end
#end
$util.toJson($convert)
So I can run this query:
query {
allStates(StateName:"Seattle"){
Seattle
}
}
And I get back my result. Note though that passing Seattle as the argument but requesting back Texas:
query {
allStates(StateName:"Seattle"){
Texas
}
}
This will not work as the response object you created in your map was Seattle: [...] but you had Texas as the selection set.
The final thing that you might want to do is have multiple states returned, which you could do by building up one giant map keyed by the state name, or maybe it's done using the arguments or the selection set through adding state names to the return type as demonstrated above. That's up to you so I'm not sure how you'll want that but hopefully this demonstrates how you can manipulate the responses to meet your needs.
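To make the "one giant map keyed by the state name" idea concrete, here is a small sketch in plain JavaScript. It is not a resolver template; it only illustrates the target shape, using the field names from the question's sample results. The function name groupByState is made up for illustration.

```javascript
// Hypothetical illustration: group flat items into a map keyed by StateName.
function groupByState(items) {
  // Use a prototype-less object so any state name is a safe key.
  const grouped = Object.create(null);
  for (const item of items) {
    if (!grouped[item.StateName]) {
      grouped[item.StateName] = [];
    }
    // Keep only the per-county fields under the state key.
    grouped[item.StateName].push({
      CountyName: item.CountyName,
      Date: item.Date,
    });
  }
  return grouped;
}
```

For the three Delaware items from the question, this yields a single "Delaware" key whose value is an array of three { CountyName, Date } objects.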
