Geo shard key + Cosmos DB

I am trying to create a collection in Cosmos DB and I don't know how to create a good shard key.
I had something like this in mind, but it does not accept my shard key:
{
  "shard_key": ["50.836421", "4.355267"],
  "position": {
    "type": "Point",
    "coordinates": [50.836421, 4.355267]
  }
}
Does anyone have experience with this?

You could set the shard_key to the string "[\"50.836421\", \"4.355267\"]"; that is accepted by the Cosmos DB Mongo API.
Based on the book and the link, a shard key built from an array is not supported by MongoDB:
Shard keys cannot be arrays. sh.shardCollection() will fail if any key
has an array value, and inserting an array into that field is not
allowed. Once inserted, a document's shard key value cannot be
modified. To change a document's shard key, you must remove the
document, change the key, and reinsert it. Thus, you should choose a
field that is unchangeable or changes infrequently.
Hope it helps you.
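As a sketch of that workaround (the helper name and document shape are my own invention, not from the answer), the coordinates can be kept as a proper GeoJSON point for geo queries while the shard key is stored as a single stringified value:

```javascript
// Hypothetical helper: Cosmos DB's Mongo API rejects array shard keys, so we
// store the coordinate pair as one JSON string in shard_key while keeping a
// real GeoJSON Point in position.
function buildGeoDocument(lat, lng) {
  return {
    // single string value, e.g. '["50.836421","4.355267"]'
    shard_key: JSON.stringify([String(lat), String(lng)]),
    position: {
      type: 'Point',
      coordinates: [lat, lng]
    }
  };
}
```

Calling buildGeoDocument(50.836421, 4.355267) yields a document whose shard_key is a plain string, which the API accepts.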

Related

What do I make my CosmosDB partition key when my JSON top level token is variable?

I am confused about what to make my CosmosDB partition key when my JSON looks like this:
{
  "AE": [
    {
      "storeCode": "XXX",
      "storeClass": "YYY"
    }
  ],
  "AT": [
    {
      "storeCode": "ZZZ",
      "storeClass": "XYZ"
    }
  ]
}
Normally the top level would be country: AT and so on, and I would make the partition key /country, but in this case I have nothing to use at the top level, so what do I do?
The JSON comes from a third party, so I don't have the option to change it at the source.
Since I did not find any statements about partition keys on sub-arrays in the official documentation, I can only provide you with a similar thread for your reference: CosmosDB - Correct Partition Key
Here is an explanation by @Mikhail:
Partition Key has to be a single value for each document; it can't be
a field in a sub-array. Partition Key is used to determine which
database node will host your document, and that wouldn't be possible if
you specified multiple values, of course.
If your single document contains data from multiple entities, and you
will query those entities separately, it might make sense to split
your documents per entity. If all those "radars" are related to some
higher level entity, use that entity ID as partition key.
For rigor, I would suggest contacting the Azure Cosmos team to check whether this feature is simply not supported so far, and whether it will be implemented in the future.
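To illustrate Mikhail's suggestion of splitting documents per entity (the function name and "country" field are assumptions, not from the thread), the third-party payload can be reshaped so each store becomes its own document with a top-level country field usable as the /country partition key:

```javascript
// Hypothetical reshaping step: turn { AE: [...], AT: [...] } into one document
// per store, promoting the variable top-level token to a "country" field.
function splitByCountry(payload) {
  const docs = [];
  for (const country of Object.keys(payload)) {
    for (const store of payload[country]) {
      docs.push(Object.assign({ country: country }, store));
    }
  }
  return docs;
}
```

Each resulting document can then be inserted individually, with /country as the partition key.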

Auto-increment integer in DynamoDB

I'm modeling a DynamoDB diagram for an invoice app, and I'm looking to generate a unique invoice id that needs to be incremented from 1 to X. Is there, in 2019, a solution to this kind of problem with AWS AppSync and DynamoDB as the data source?
Auto-incrementing integers are not a recommended pattern in DynamoDB, although it is possible to implement something similar using application-level logic. A DynamoDB table is distributed across many logical partitions according to the table's partition key, and items are then sorted within a partition according to their sort key. You will need to decide what structure makes sense for your app and what an auto-incrementing id means for it. The simplest case would be to omit a sort key and treat the auto-incremented id as the partition key. That guarantees its uniqueness, but it also means every row lives in its own partition: listing all invoices would have to be a Scan, which does not preserve order, and that may or may not make sense for your app.
As mentioned in this SO post (How to use auto increment for primary key id in dynamodb) you can use code like this:
const params = {
  TableName: 'CounterTable',
  Key: { HashKey: 'auto-incrementing-counter' },
  UpdateExpression: 'ADD #a :x',
  ExpressionAttributeNames: { '#a': 'counter_value' },
  ExpressionAttributeValues: { ':x': 1 },
  ReturnValues: 'UPDATED_NEW' // ensures the new counter value is returned
};
new AWS.DynamoDB.DocumentClient().update(params, function(err, data) {
  // data.Attributes.counter_value holds the freshly incremented id
});
to atomically increment the integer stored in the CounterTable row designated by the partition key "auto-incrementing-counter". After the atomic increment, you can use the returned id to create the new Invoice.
You can implement this pattern using DynamoDB & AppSync, but the first thing to decide is whether it suits your use case. You may also be interested in the RDS integration via the RDS Data API, which has more native support for auto-incrementing IDs but loses out on the set-it-and-forget-it scaling of DynamoDB.
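Putting the two steps together, one possible shape for the flow is sketched below (the table names and the createInvoice helper are assumptions; the client is passed in so the logic can be exercised without a live DynamoDB endpoint):

```javascript
// Sketch: atomically bump the counter, then use the returned value as the id
// of the new invoice. docClient is an AWS.DynamoDB.DocumentClient (or any
// object exposing the same update/put interface).
async function createInvoice(docClient, invoiceData) {
  const { Attributes } = await docClient.update({
    TableName: 'CounterTable',
    Key: { HashKey: 'auto-incrementing-counter' },
    UpdateExpression: 'ADD #a :x',
    ExpressionAttributeNames: { '#a': 'counter_value' },
    ExpressionAttributeValues: { ':x': 1 },
    ReturnValues: 'UPDATED_NEW'
  }).promise();
  const invoice = Object.assign({ id: Attributes.counter_value }, invoiceData);
  await docClient.put({ TableName: 'Invoices', Item: invoice }).promise();
  return invoice;
}
```

Note that these are two separate writes, not a transaction: if the put fails after the update succeeds, that counter value is skipped, leaving a gap in the sequence.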

DynamoDB sub item filter using .Net Core API

First of all, I have a table structure like this:
Users: {
  UserId
  Name
  Email
  SubTable1: [{
    Column-111
    Column-112
  },
  {
    Column-121
    Column-122
  }]
  SubTable2: [{
    Column-211
    Column-212
  },
  {
    Column-221
    Column-222
  }]
}
As I am new to DynamoDB, I have a couple of questions regarding this, as follows:
1. Can I create a structure like this?
2. Can we set a primary key for the subtables?
3. Luckily, I found a DynamoDB helper class to do some operations on my DB:
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But I don't know how to fetch only a particular subtable.
4. Can we fetch only specific columns from my main table? I also need suggestions for the subtables.
Note: I am using .NET Core (C#) to communicate with DynamoDB.
Can I create a structure like this?
Yes.
Can we set a primary key for the subtables?
No, the hash key can be set on top-level scalar attributes only (String, Number, etc.).
Luckily, I found a DynamoDB helper class to do some operations on my DB:
https://www.gopiportal.in/2018/12/aws-dynamodb-helper-class-c-and-net-core.html
But I don't know how to fetch only a particular subtable.
When you say subtables, I assume that you are referring to the Array data type in the above sample table. In order to fetch data from a DynamoDB table, you need the hash key to use the Query API. If you don't have the hash key, you can use the Scan API, which scans the entire table; Scan is a costly operation.
A GSI (Global Secondary Index) can be created to avoid the scan operation. However, it can be created on scalar attributes only; a GSI can't be created on an Array attribute.
The other option is to redesign the table to match your query access patterns.
Can we fetch only specific columns from my main table? I also need suggestions for the subtables.
Yes, you can fetch specific attributes using a ProjectionExpression. This way you get only the required attributes in the result set.
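As a sketch of that last point (attribute names are taken from the sample structure above; the '#n' alias is needed because Name is a DynamoDB reserved word):

```javascript
// Hypothetical GetItem parameters that project only Name, Email, and the
// SubTable1 array; every other attribute of the item is not returned.
function buildProjectionParams(userId) {
  return {
    TableName: 'Users',
    Key: { UserId: userId },
    ProjectionExpression: '#n, Email, SubTable1',
    ExpressionAttributeNames: { '#n': 'Name' } // Name is a reserved word
  };
}
// usage: new AWS.DynamoDB.DocumentClient().get(buildProjectionParams('u1'), cb)
```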

DynamoDB: How to fetch single item where attribute value is not in a given list of values?

I understand this query might be inefficient, since it can involve a full table scan in the worst case, but I only need to fetch a single item at a time.
For example, I have a table containing values like this:
{
  id: 'bc63a25e-b92b-483e-9ad3-ad6d474dfae2',
  domain: 'xyz.com',
  template_url: 'https://s3.us-east-2.amazonaws.com/bucket/some-random-url.html',
  data_elements: {
    message_link: 'http://www.google.com',
    zodiac_sign: 'Scorpio'
  }
}
I have a GSI with domain as the hash key. Now I want to fetch items from this table:
WHERE domain == 'xyz.com'
AND id NOT IN <a list of ids>
LIMIT 1;
How can I achieve this type of query? I checked the documentation and I could see there is an IN operator, but I could not find any NOT IN operator.
I had the same issue, and I don't think you can. You will need the key for the 'get' method, and otherwise fall back to the 'scan' method. The only alternative (I think) would be to fetch all items and then do a string comparison on each and every one. I don't think I need to mention how incredibly expensive that would be.
As mentioned, I had to deal with the same issue, and I ended up changing my data structure. It was a bit cumbersome to begin with, and I have twice the data entries of a relational DB, but that is negligible, and the query is incredibly fast even on a micro AWS instance.
You can't always do the same operations on a NoSQL DB that you can do on a MySQL DB, and this is a prime example of that.
I am not sure why you mention a scan, as you have the hash key of the GSI. You can use the Query API with the params below. Note that the IN operator compares against a comma-separated list of values, so each id needs its own placeholder rather than passing the whole array as a single value (domain is also aliased, since it is a DynamoDB reserved word):
const idArray = ["1", "2"];
const attributeValues = { ":domainVal": "xyz.com" };
const idPlaceholders = idArray.map(function(id, i) {
  attributeValues[":id" + i] = id;
  return ":id" + i;
});
const params = {
  TableName: "tablename",
  IndexName: "your_index_name",
  KeyConditionExpression: "#d = :domainVal",
  FilterExpression: "NOT #id IN (" + idPlaceholders.join(", ") + ")",
  ExpressionAttributeNames: { "#id": "id", "#d": "domain" },
  ExpressionAttributeValues: attributeValues
};
I have tested NOT IN on my table. It works fine for me.
You can run SQL queries on DynamoDB if you use EMR Hive or Redshift. In that case you can use any SQL operator to query your data.
Of course, this is not intended for interactive queries, only for analytics queries that are executed infrequently.
Here is how to use DynamoDB with Redshift.
Here is how to use DynamoDB with EMR Hive.

CosmosDB: applying a sort by field removes all documents that do not have that field

We are migrating from MongoDB to Cosmos DB using the Mongo API.
We have encountered the following difference in query behavior around sorting.
Using the Cosmos DB Mongo API, sorting by a field removes all documents that don't have that field. Is it possible to modify the query to include the nulls and replicate the Mongo behavior?
For example if we have the following 2 documents
[{
  "id": "p1",
  "priority": 1
}, {
  "id": "p2"
}]
performing:
sort({"priority":1})
Cosmos DB will return a single result, 'p1'.
Mongo will return both results in the order 'p2', 'p1'; the null documents come first.
As far as I know, null values are not included in the sort scan of the query result.
Here is a workaround: you could add a non-existent field to the sort method to force the engine to scan all the data.
Like this:
db.getCollection('brandotestcollections').find().sort({"test": 1, "aaaa": 1})
I had the same problem, and it got solved after some reading.
Refer to the document...
You have to update the indexing policy of the container to change the default way Cosmos DB sorts!
