Azure Cosmos DB document partition key has duplicates; how to find a duplicate document by a combination of other columns - azure-cosmosdb

I have the document JSON below (a partial sample; the actual JSON is more complex, with embedded objects). Code is the partition key. I am building NoSQL documents by migrating my SQL tables, where Code and Type together make a row unique. As you can see below, Code = 4 appears twice with different Type values. For id I just generated a GUID (I am not sure what the id field is for, so I generated a GUID and assigned it).
The Type field has only two values across the entire data set, RI or NRI. Code can be duplicated, as with Code: 4 in the sample data, but the combination of Type and Code is unique.
Example JSON:
{
  "id": "88725628-2a9a-4fc7-90ed-29c5ffbd45fa",
  "Code": "4",
  "Type": "RI",
  "Description": "MAC/CHEESE "
},
{
  "id": "88725628-9a3b-4fc7-90ed-29c5ffbd34sk",
  "Code": "8",
  "Type": "RI",
  "Description": "Cereals"
},
{
  "id": "88725628-6d9f-4fc7-90ed-29c4ffbd87de",
  "Code": "4",
  "Type": "NRI",
  "Description": "Christmas Deal"
}
In Cosmos DB I couldn't use two columns as the partition key, so only Code is the partition key. When I insert into Cosmos DB, how do I insert only if the document does not already exist? Otherwise I would end up creating duplicate documents.
CreateItemAsync --> I need a way to check whether the document already exists and create it only if it does not.
I have the code below, which checks for the item and creates it if it is not found:
try
{
    // Read the item to see if it exists.
    ItemResponse<Item> itemResponse = await this.container.ReadItemAsync<Item>(itm.Id, new PartitionKey(itm.Code));
}
catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
    // The item was not found, so create it. The partition key value is the item's Code.
    ItemResponse<Item> itemResponse = await this.container.CreateItemAsync<Item>(itm, new PartitionKey(itm.Code));
}
But in the code above, how do I know the id parameter for ReadItemAsync, given that it is a GUID generated randomly on every insert? Is there a better way to set the id property before inserting into Cosmos DB, so that it can be used in ReadItemAsync?
The second parameter is the partition key. If I pass Code as the partition key, it won't work as expected, because Code can legitimately be duplicated with different "Type" values. Code and Type together make a document unique, and we shouldn't allow another document to be inserted if both Code and Type are the same.
How do I do this on insert in Cosmos DB? I have the following questions:
id field --> can I generate a GUID and save the document, or does the id field have a purpose that can be used during reads?
Is it OK to pick a partition key that can have duplicates, like the Code field?
How do I check whether a document exists before insert with the above qualifiers, given that the Code field can be duplicated and only the combination with Type makes it unique?
Any suggestions?

If Code and Type together make a row unique, then use the value of Type as the id rather than generating a GUID, because in Cosmos DB the combination of partition key and id must be unique within a container.
Then when you do an insert, if the data is already there, Cosmos DB rejects the write with a conflict error (HTTP 409), which surfaces as an exception you can catch. For reads, if you know the values of Code and Type, you can use them to perform a point read to get a single document, rather than using a query. This is the most efficient way to fetch data in Cosmos DB.
It is fine to have duplicates for partition key values. You only need to make sure that you stay under 20 GB of data for each partition key value (the logical partition size limit).
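To make the uniqueness rule concrete, here is a minimal sketch in plain JavaScript. It is an in-memory stand-in, not the Cosmos DB SDK: FakeContainer, toDocument and insertIfAbsent are hypothetical names, and the 409 code on the thrown error only imitates the conflict status described above.

```javascript
// In-memory stand-in for a container. This is NOT the Cosmos DB SDK;
// it only models the rule that (partition key, id) must be unique.
class FakeContainer {
  constructor() {
    this.items = new Map();
  }

  // Reject the write when a document with the same partition key (Code)
  // and id already exists, imitating a 409 Conflict.
  create(doc) {
    const key = JSON.stringify([doc.Code, doc.id]);
    if (this.items.has(key)) {
      const err = new Error("Conflict");
      err.code = 409;
      throw err;
    }
    this.items.set(key, doc);
    return doc;
  }

  // Point read: partition key plus id identify at most one document.
  read(id, partitionKey) {
    return this.items.get(JSON.stringify([partitionKey, id]));
  }
}

// Use Type as the id, as suggested above, instead of a random GUID.
function toDocument(row) {
  return { id: row.Type, Code: row.Code, Type: row.Type, Description: row.Description };
}

// Returns true when the document was inserted, false when it already existed.
function insertIfAbsent(container, row) {
  try {
    container.create(toDocument(row));
    return true;
  } catch (err) {
    if (err.code === 409) return false;
    throw err;
  }
}

const container = new FakeContainer();
console.log(insertIfAbsent(container, { Code: "4", Type: "RI", Description: "MAC/CHEESE" })); // true
console.log(insertIfAbsent(container, { Code: "4", Type: "NRI", Description: "Christmas Deal" })); // true
console.log(insertIfAbsent(container, { Code: "4", Type: "RI", Description: "MAC/CHEESE" })); // false
```

With the real SDK the same shape would apply: attempt the create, and treat a conflict as "already exists" instead of reading first.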

Related

query dynamodb map field

I'm unable to filter on a DynamoDB map in the AWS console.
Filtering with mapper contains "A" works. The structure of mapper is:
"mapper": [
"\"A\"",
"\"B\"",
{
"bar": "foo"
}
]
How can I filter on {"bar":"foo"}?
I tried:
contains {"bar":"foo"}
contains '{"bar":"foo"}'
contains {bar:foo}
But none of these work. Please suggest.
If the mapper list is reliably ordered, the console's PartiQL editor can filter records containing {"bar": "foo"} at index 2:
select * from myTable where mapper[2].bar = 'foo'
Note: This is technically a scan operation. You can make it a query by adding WHERE conditions for your primary key.
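If the position of the map inside mapper is not reliable, one fallback is to filter on the client after fetching the items. A minimal sketch in plain JavaScript, assuming the items have already been converted to ordinary objects; hasBar and filterByBar are hypothetical helper names, not part of any AWS SDK:

```javascript
// True when the list holds a map whose `bar` attribute equals `value`,
// regardless of where in the list that map sits.
function hasBar(mapper, value) {
  return mapper.some(
    (entry) =>
      entry !== null &&
      typeof entry === "object" &&
      !Array.isArray(entry) &&
      entry.bar === value
  );
}

// Keep only the items whose `mapper` list contains such a map.
function filterByBar(items, value) {
  return items.filter(
    (item) => Array.isArray(item.mapper) && hasBar(item.mapper, value)
  );
}
```

This still reads every item, so it costs the same as the scan noted above; it only removes the dependency on the list index.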

Can a DynamoDB Condition Expression work on just the Partition Key of a table with a Composite Key

I have a DynamoDB table, with a composite key, which looks like this:
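+--------+--------+------+---------------+-------------------------------+
| PK     | SK     | Type | Email         | Description                   |
+--------+--------+------+---------------+-------------------------------+
| USER#A | USER#A | User | a#example.com |                               |
| USER#A | BUG#1  | Bug  |               | This looks ok                 |
| USER#B | BUG#2  | Bug  |               | My user wasn't created first! |
+--------+--------+------+---------------+-------------------------------+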
I'd like to ensure that a "User" record exists before adding a related "Bug" record - So the 3rd item here is incorrect.
When I put a bug item with the condition attribute_exists(PK), the condition is never true. When I remove the condition, I end up with that third row: a Bug with no corresponding User.
My understanding is that attribute_exists() only looks at items with the combined composite key, and not across the whole table, regardless of which attribute you supply.
Is there a method of ensuring an item with the same Partition Key exists, while ignoring the Sort Key in this scenario?
DynamoDB condition expressions can be confusing, and the docs can compound that problem!
The DynamoDB condition expression works by 1) finding the item, 2) evaluating the condition expression, and finally 3) writing to the database if the condition evaluates to true.
I assume your put operation looks something like this:
ddbClient.put({
  TableName: "YOUR TABLE",
  Item: {
    PK: "USER#B",
    SK: "BUG#2",
    Type: "Bug",
    Description: "My user wasn't created first!"
  },
  ConditionExpression: "attribute_exists(PK)"
})
In this example, DynamoDB first tries to find the item with PK: "USER#B" SK: "BUG#2", which does not exist. As you're experiencing, this item will not be written to DynamoDB because an item with that primary key does not exist.
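The three steps can be modelled with a small in-memory sketch. This is plain JavaScript, not the AWS SDK: conditionalPut, keyOf and attributeExistsPK are hypothetical names, and the Map stands in for the table. It shows why the condition fails even when a user item with the same PK exists.

```javascript
// Full primary key of an item: partition key plus sort key.
function keyOf(item) {
  return JSON.stringify([item.PK, item.SK]);
}

// Model of a conditional put against an in-memory "table" (a Map).
function conditionalPut(table, item, condition) {
  const existing = table.get(keyOf(item)); // 1) find the item by its full primary key
  if (!condition(existing)) {              // 2) evaluate the condition against that item only
    return false;                          //    condition failed, nothing is written
  }
  table.set(keyOf(item), item);            // 3) write
  return true;
}

// Stand-in for attribute_exists(PK): true only if the found item has a PK.
const attributeExistsPK = (existing) => existing !== undefined && "PK" in existing;

const table = new Map();
const user = { PK: "USER#A", SK: "USER#A", Type: "User" };
table.set(keyOf(user), user);

// USER#A exists, but the condition is checked against (USER#A, BUG#1), which does not.
const written = conditionalPut(
  table,
  { PK: "USER#A", SK: "BUG#1", Type: "Bug" },
  attributeExistsPK
);
console.log(written); // false
```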
The problem you are seeing, as you've alluded to in your question, is that a ConditionExpression applies to only a single item. However, you are trying to conditionally put an item in the database by applying the condition to another item. That is a great candidate for a DynamoDB transaction.
Transactions let you group operations together in an all-or-nothing operation. If one of the operations in your transaction fails, the entire transaction will fail and none of the operations will apply.
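You can achieve what you are after by taking this approach:
// transactWrite is the DocumentClient method, assuming ddbClient is a
// DocumentClient as the put() call above implies
ddbClient.transactWrite({
  TransactItems: [
    {
      // Write the bug item
      Put: {
        TableName: "YOUR TABLE",
        Item: {
          PK: "USER#B",
          SK: "BUG#2",
          Type: "Bug"
        }
      }
    },
    {
      // Fail the whole transaction unless the user item exists
      ConditionCheck: {
        TableName: "YOUR TABLE",
        Key: {
          PK: "USER#B",
          SK: "USER#B"
        },
        ConditionExpression: "attribute_exists(PK)"
      }
    }
  ]
})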
In the above transaction, I'm using a ConditionCheck to confirm the existence of a user before entering the bug. If the user does not exist, the transaction will fail and the bug won't be written to DDB.
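If you create bugs in several places, it may help to build the transaction parameters in one function. A minimal sketch, assuming the USER#/BUG# key layout from the question; buildCreateBugTransaction is a hypothetical helper that only constructs the parameter object and does not call DynamoDB:

```javascript
// Build the parameters for a transaction that writes a bug only if
// the owning user item exists. Nothing is sent to DynamoDB here.
function buildCreateBugTransaction(tableName, userId, bugId, description) {
  const userKey = `USER#${userId}`;
  return {
    TransactItems: [
      {
        // Guard: the user item (PK = SK = USER#<id>) must already exist.
        ConditionCheck: {
          TableName: tableName,
          Key: { PK: userKey, SK: userKey },
          ConditionExpression: "attribute_exists(PK)"
        }
      },
      {
        // The bug item, stored under the same partition key as the user.
        Put: {
          TableName: tableName,
          Item: {
            PK: userKey,
            SK: `BUG#${bugId}`,
            Type: "Bug",
            Description: description
          }
        }
      }
    ]
  };
}
```

The returned object has the same shape as the transaction above, so it could be passed to the client's transactional write call.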
For a more thorough explanation of DynamoDB Condition Expressions, I highly recommend you check out Understanding DynamoDB Condition Expressions by Alex DeBrie.

How to update only part of the JSON?

I have a DynamoDB table:
+--------------------------------------------------+
| Customer ID (Primary Key)|Gamestats (JSON entry) |
+--------------------------------------------------+
JSON:
{
"Gamestats": [
{
"ID": "QuickShootingMode",
"status": 1
},
{
"ID": "FastReloadMode", // Just want to update this and not update the entire JSON
"status": 0
}
],
"CustomerID": "xyz"
}
I want to update only part of the JSON. What is the best way to do it? E.g., update the status of QuickShootingMode to false.
One way is to fetch the JSON, iterate over it, update the value, and then put the new JSON back into DynamoDB. That takes two calls:
A) one to get the data and
B) one to put the data in the DB.
Is there a better way to update the data directly and avoid the extra network calls? I could make each key of the JSON its own column in DynamoDB, but if the number of keys grows I'll end up with lots of columns (which might be bad design), so I think saving the JSON in a single Gamestats column makes more sense.
Map<String, AttributeValue> key = new HashMap<>();
AmazonDynamoDB dynamoDB = dynamoDBClient.getDynamoDB();
key.put(USER_ID_KEY, new AttributeValue().withS("xyz"));
key.put("Gamedata", new AttributeValue().withS("some JSON"));
PutItemRequest request = new PutItemRequest()
.withTableName(table)
.withItem(key);
PutItemResult result = dynamoDB.putItem(request);
Is there a better way to achieve what I want?
It looks like from your question you are storing stringified JSON. If so an update won't help you, but as far as I can tell there is no value in storing stringified JSON instead of using dynamodb maps and lists.
You can use an update to set a nested attribute in a map or a list. Using a map instead of a list for the gamestats attribute is better because then you don't have to worry about the order of the attributes.
JavaScript example with Gamestats as a map:
dynamodb.update({
  TableName: table,
  Key: key,
  UpdateExpression: 'SET #gs.#qs.#status = :newStatus',
  ExpressionAttributeNames: { '#gs': 'Gamestats', '#qs': 'QuickShootingMode', '#status': 'status' },
  ExpressionAttributeValues: { ':newStatus': false }
}, callback)
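To get from the list in the question to a map, and to build the update for any mode, something like the following could work. It is a sketch in plain JavaScript: gamestatsListToMap and buildStatusUpdate are hypothetical helpers, the key attribute name CustomerID is taken from the question's JSON, and nothing here calls DynamoDB.

```javascript
// Convert [{ ID: "X", status: 1 }, ...] into { X: { status: 1 }, ... }
// so that each mode can be addressed by name instead of by list index.
function gamestatsListToMap(list) {
  const map = {};
  for (const { ID, ...rest } of list) {
    map[ID] = rest;
  }
  return map;
}

// Build update parameters that set Gamestats.<modeId>.status only.
function buildStatusUpdate(tableName, customerId, modeId, newStatus) {
  return {
    TableName: tableName,
    Key: { CustomerID: customerId },
    UpdateExpression: "SET #gs.#mode.#status = :newStatus",
    ExpressionAttributeNames: {
      "#gs": "Gamestats",
      "#mode": modeId,
      "#status": "status"
    },
    ExpressionAttributeValues: { ":newStatus": newStatus }
  };
}
```

Passing the mode name through ExpressionAttributeNames rather than concatenating it into the expression avoids problems with reserved words and special characters.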

DynamoDb - .NET Object Persistence Model - LoadAsync does not apply ScanCondition

I am fairly new to this area, and any help is appreciated.
I have a table in a DynamoDB database named Tenant, as below:
"TenantId" is the hash primary key, and I have no other keys. I also have a field named "IsDeleted", which is a boolean.
[Image: Table Structure]
I am trying to run a query to get the record with the specified "TenantId" as long as it is not deleted ("IsDeleted == 0").
I get the correct result by running the following code (it returns 0 items):
var filter = new QueryFilter("TenantId", QueryOperator.Equal, "2235ed82-41ec-42b2-bd1c-d94fba2cf9cc");
filter.AddCondition("IsDeleted", QueryOperator.Equal, 0);
var dbTenant = await
_genericRepository.FromQueryAsync(new QueryOperationConfig
{
Filter = filter
}).GetRemainingAsync();
But I have no luck with the following code snippet, which returns 1 item even though that item is deleted:
var queryFilter = new List<ScanCondition>();
var scanCondition = new ScanCondition("IsDeleted", ScanOperator.Equal, new object[]{0});
queryFilter.Add(scanCondition);
var dbTenant2 = await
_genericRepository.LoadAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig
{
QueryFilter = queryFilter,
ConditionalOperator = ConditionalOperatorValues.And
});
Any idea why the ScanCondition has no effect?
Later I also tried this, which throws an exception:
var dbTenant2 = await
_genericRepository.QueryAsync("2235ed82-41ec-42b2-bd1c-d94fba2cf9cc", new DynamoDBOperationConfig()
{
QueryFilter = new List<ScanCondition>()
{
new ScanCondition("IsDeleted", ScanOperator.Equal, 0)
}
}).GetRemainingAsync();
It throws with: "Message": "Must have one range key or a GSI index defined for the table Tenants"
Why does it complain about Range key or Index? I'm calling
public AsyncSearch<T> QueryAsync<T>(object hashKeyValue, DynamoDBOperationConfig operationConfig = null);
You simply can't query a table by giving only a hash key when that is the whole primary key, because there is one and only one item for that key. The result of the Query would still be that single item, which makes it a Load operation, not a Query. You can only query if you have a composite primary key (here, hash key TenantId plus a range key) or a GSI (which doesn't impose key uniqueness and therefore accepts duplicate keys on the index).
The second code attempts to filter the Load. DynamoDBOperationConfig's QueryFilter has a description ...
// Summary:
// Query filter for the Query operation operation. Evaluates the query results and
// returns only the matching values. If you specify more than one condition, then
// by default all of the conditions must evaluate to true. To match only some conditions,
// set ConditionalOperator to Or. Note: Conditions must be against non-key properties.
So it works only with Query operations.
Edit: So after reading your comments on this...
I don't think conditional expressions are for read operations. The AWS documentation indicates they are for put or update operations. However, I am not entirely sure about this, since I have never needed to do a conditional Load. There is also no CheckIfExists functionality in general; you have to read the item and see whether it exists. A conditional load would still consume read throughput, so the only advantage would be not retrieving the item, in other words saving bandwidth (which is negligible for a single item).
My suggestion is to read it and filter it in your application layer. Don't query for it. If you really need to, you can instead use TenantId as the hash key and isDeleted as the range key. If you do so, you always have to query when you want to get a tenant, and in the query you can set the range key (isDeleted) to 0 or 1. This isn't how I would do it. As I said, I would just read it and filter it in my application.
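Reading and filtering in the application layer can be as small as the sketch below. It is plain JavaScript rather than the .NET SDK: loadTenant is a hypothetical stand-in for a lookup by TenantId (a real SDK call would be asynchronous), and getActiveTenant is a made-up name.

```javascript
// Load by key first, then apply the "not deleted" rule in application code.
// `loadTenant` is a hypothetical function that returns the tenant or undefined.
function getActiveTenant(loadTenant, tenantId) {
  const tenant = loadTenant(tenantId);
  // Treat a missing item and a deleted item the same way.
  // IsDeleted may be stored as 0/1 or false/true; both work with this check.
  if (!tenant || tenant.IsDeleted) {
    return null;
  }
  return tenant;
}
```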
Another suggestion is to set a GSI on the isDeleted field and write null when it is 0. That way the attribute appears in your table only when it is 1. A GSI on such an attribute is called a sparse index. Later, if you need to get all the tenants that are deleted (isDeleted = 1), you can simply scan that entire index without conditions. Because you write null when it is 0, DynamoDB won't put the item in the index in the first place.
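The sparse index idea can be illustrated without DynamoDB. In the sketch below, toStoredTenant and sparseIndexView are hypothetical names; the second function only models which items would appear in an index keyed on IsDeleted, under the assumption that items lacking the attribute are left out of the index.

```javascript
// Prepare a tenant for storage: keep IsDeleted only when the tenant is deleted.
function toStoredTenant(tenant) {
  const { IsDeleted, ...rest } = tenant;
  return IsDeleted ? { ...rest, IsDeleted: 1 } : rest;
}

// Model of the sparse index: only items that carry the attribute are included.
function sparseIndexView(storedItems) {
  return storedItems.filter((item) => "IsDeleted" in item);
}
```

Scanning such an index then returns only deleted tenants, with no filter condition needed.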

DynamoDB Descending Order fetch records

I have 100 records in a collection named 'users':
{
"name":'senthilkumar',
"email":'senthily88#gmail.com', //HashKey
"age":21,
"created":1465733486137, //RangeKey-timestamp
}
I need to fetch records the way the following SQL query would:
select * from users order by created desc limit 10
How can I get records in this form from DynamoDB?
DynamoDB sorts Query results by the range key attribute. You can set the ScanIndexForward boolean parameter to true for ascending or false for descending order.
Resource: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Use the KeyConditionExpression parameter to provide a specific value
for the partition key. The Query operation will return all of the
items from the table or index with that partition key value. You can
optionally narrow the scope of the Query operation by specifying a
sort key value and a comparison operator in KeyConditionExpression.
You can use the ScanIndexForward parameter to get results in forward
or reverse order, by sort key.
To save JSON data to DynamoDB, use put():
var Newparams = {
  TableName: this.SuffleTableName,
  Item: {
    "userId": /* YOUR PRIMARY KEY */,
    "addedAt": /* YOUR SORT KEY */,
    "status": /* Additional data */,
  }
}
To fetch data from DynamoDB, use query():
var QueryParam = {
  TableName: 'YOUR TABLE NAME HERE',
  IndexName: 'YOUR INDEX NAME HERE', // only if you created a new index
  KeyConditionExpression: "userId = :userId", // your partition key; attribute names are case-sensitive
  ExpressionAttributeValues: {
    ":userId": UserId,
  },
  ScanIndexForward: false, // descending order; set to true for ascending order
  ExclusiveStartKey: LastEvalVal, // pagination: the LastEvaluatedKey from the previous page
  Limit: 10 // items per request
}
If you want to return all rows in your table, you cannot use the Query API, because that API requires you to provide a partition key value to filter your results by (i.e. assuming that your partition key is name, you would only be able to use the Query API to bring back the subset of results that have name equal to a given value, e.g. name = senthilkumar).
If you want to return all rows in your table, you must use the Scan API: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Scan.html
Note that Scan does not return results in any overall sort order (items are ordered by range key only within each partition key value), and you cannot reverse the order with the Scan API. You would need to sort the result set in the application tier, using whatever language you're writing your code in.
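Sorting in the application tier, for the "order by created desc limit 10" case from the question, could look like this. It is a minimal sketch in plain JavaScript that assumes all scanned items are already in memory and that created is a numeric timestamp; latestUsers is a hypothetical helper name.

```javascript
// Return the `limit` most recently created items, newest first.
// The input array is copied so the caller's array is not reordered.
function latestUsers(items, limit) {
  return [...items]
    .sort((a, b) => b.created - a.created)
    .slice(0, limit);
}
```

This is only practical for small tables such as the 100 records in the question, because every item has to be read before sorting.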
Scan does not scale well and it is not possible to use Scan to create a paginated, reverse sorted solution if your table contains items with unique partition keys.
If this is your situation, and if you want to return paginated + reverse sorted sets back from DynamoDB, you will need to re-consider the design of your table and which columns are the partition key/range key/index so that you can use the Query API.

Resources