Conditional update all documents in all CosmosDB partitions - azure-cosmosdb

I have a Cosmos DB container with (simplified) documents like this:
{
id,
city,
propertyA,
propertyB
}
The partition key is 'city'.
Usually this works well, but there has been a bug in our code and now I have to update a lot of documents across a lot of different partitions.
I made a conditional update document like so:
{
"conditions": "from c where c.propertyA = 1",
"operations": [
{
"op": "set",
"path": "/propertyB",
"value": true
}
]
}
I send this document to Cosmos DB with the REST API.
Because I want to update all documents in all partitions that satisfy the condition, I set the x-ms-documentdb-query-enablecrosspartition header to 'True'.
But I still need to supply a partition key in the x-ms-documentdb-partitionkey header.
Is there a way to use the REST API to update all the documents for which the condition is true, whatever the partition key is?

Yes, the REST API lets you programmatically create, query, and delete databases, collections of documents, and individual documents. To change a document partially, you can use the PATCH HTTP method.
You will still need to supply a partition key in the x-ms-documentdb-partitionkey header: if the container was created with a partition key, every point operation on a document, including PATCH, must include that document's partition key value in this header. There is no single REST call that patches across all partitions, so you first query for the affected documents and then patch them one by one.
For more information, see the documentation:
https://learn.microsoft.com/en-us/rest/api/cosmos-db/patch-a-document
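That query-then-patch loop can be sketched in Python. This is only a sketch: the endpoint, names, and auth token are placeholders, the x-ms-version value is an assumption (check the docs for the minimum version supporting PATCH), and the HTTP call itself is omitted.

```python
import json

def build_patch_request(endpoint, db, coll, doc_id, partition_key, auth_token):
    """Build URL, headers, and body for a REST PATCH of one document.

    A cross-partition *query* can find the affected documents, but each
    PATCH is a point operation, so it must carry that document's own
    partition key value.
    """
    url = f"{endpoint}/dbs/{db}/colls/{coll}/docs/{doc_id}"
    headers = {
        "authorization": auth_token,
        "x-ms-version": "2020-07-15",  # assumed minimum for PATCH
        # The partition key header is a JSON array holding the key value:
        "x-ms-documentdb-partitionkey": json.dumps([partition_key]),
    }
    body = {
        "conditions": "from c where c.propertyA = 1",
        "operations": [{"op": "set", "path": "/propertyB", "value": True}],
    }
    return url, headers, body

# Typical flow: run one cross-partition query such as
#   SELECT c.id, c.city FROM c WHERE c.propertyA = 1
# to list the affected documents, then PATCH each one with its own
# city as the partition key:
url, headers, body = build_patch_request(
    "https://myaccount.documents.azure.com", "mydb", "mycoll",
    "some-id", "Amsterdam", "<signed master key token>")
```

So the cross-partition header helps only for the initial query; each subsequent patch still names its document's partition key explicitly.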

Related

How should we model many-to-many relationships in DynamoDB when aiming for single table design

Quick high level background:
DynamoDB with single table design
OpenSearch for full text search
DynamoDB Stream which indexes into OpenSearch on DynamoDB Create/Update/Delete via Lambda
The single table design approach has been working well for us so far, but we also haven't really had many-to-many relationships to deal with. However, a new relationship we recently needed to account for is Tags for Entry objects:
interface Entry {
readonly id: string
readonly title: string
readonly tags: Tag[]
}
interface Tag {
readonly id: string
readonly name: string
}
We want to try and stick to a single query/read to retrieve a list of Entries / single Entry but also want to find a good balance between having to manage updates.
A few ways we've considered storing the data:
Store all tag data in the Entry
{
"id": "asdf1234",
"title": "Entry Title",
"tags": [
{
"id": "1234asdf",
"name": "stack"
},
{
"id": "4321hjkl",
"name": "over"
},
{
"id": "7657gdfg",
"name": "flow"
}
]
}
This approach makes reads easy, but updates become a pain - anytime a tag is updated, we would need to find all Entries that reference that tag and update each of them.
Store only the tag ids in the Entry
{
"id": "asdf1234",
"title": "Entry Title",
"tags": ["1234asdf", "4321hjkl", "7657gdfg"]
}
With this approach, no updates would be required when a Tag is updated, but now we have to do multiple reads to return the full data - we would need to query each Tag by id to retrieve its data before returning the full content back to the client.
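The read-side cost of this option amounts to a small hydration step, sketched here in Python with an in-memory map standing in for the per-id reads (in practice a DynamoDB GetItem per tag, or one BatchGetItem for all ids; hydrate_entry is a hypothetical helper):

```python
def hydrate_entry(entry, fetch_tag):
    """Resolve an Entry's tag ids into full Tag objects before returning
    the Entry to the client. fetch_tag is a stand-in for a per-id read
    against the Tags items in the table."""
    return {**entry, "tags": [fetch_tag(tag_id) for tag_id in entry["tags"]]}

# In-memory stand-in for the Tag items:
tag_table = {
    "1234asdf": {"id": "1234asdf", "name": "stack"},
    "4321hjkl": {"id": "4321hjkl", "name": "over"},
    "7657gdfg": {"id": "7657gdfg", "name": "flow"},
}
entry = {"id": "asdf1234", "title": "Entry Title",
         "tags": ["1234asdf", "4321hjkl", "7657gdfg"]}
full = hydrate_entry(entry, tag_table.__getitem__)
```

The extra reads scale with the number of tags per Entry, which is what makes this option cheap on writes but heavier on the read path.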
Store only the tag ids in the Entry but use OpenSearch to query and get data
This option, similar to the one above, would store only the tag ids on the Entry, but have the Entry document that is indexed on the search side include all Tag data, denormalized by our stream lambda. Updates to a Tag would still require querying and updating each affected Entry individually (on the search side) - the question is whether it's more cost-effective to just do it in DynamoDB.
This scenario presents an interesting uni-directional flow:
writes go straight to DynamoDB
DynamoDB Stream -> Lambda transforms the data -> indexes into OpenSearch
reads are exclusively done via OpenSearch
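The transform step in that lambda can be sketched as a pure function (Python; tag_lookup is a hypothetical id -> Tag map that would really come from a BatchGetItem against the table):

```python
def to_search_document(entry, tag_lookup):
    """Denormalize an Entry stream record into the document indexed in
    OpenSearch: the table stores only tag ids, the index stores full
    Tag objects so reads need a single search query."""
    return {
        "id": entry["id"],
        "title": entry["title"],
        "tags": [tag_lookup[tag_id] for tag_id in entry["tags"]],
    }

tag_lookup = {"1234asdf": {"id": "1234asdf", "name": "stack"}}
doc = to_search_document(
    {"id": "asdf1234", "title": "Entry Title", "tags": ["1234asdf"]},
    tag_lookup)
```

Keeping the denormalization in one pure function like this also makes it easy to re-run over all Entries if a Tag rename forces a bulk re-index.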
The overall question is: how do applications using NoSQL with single table design handle these many-to-many scenarios? Is the uni-directional flow stated above a good idea / worth it?
Things to consider:
our application leans more heavily on the read side
our application will also utilize search capability quite heavily
Tag updates will not be often

Transformation of data across the whole collection

We are currently using Cosmos DB in a production environment. The scenario arises where we want to update the contents of a particular property in nearly all documents of a collection. The property is used as a lookup/search field, so gradually modifying documents as they are accessed is not an option here.
The example document below uses the "key" property as the main lookup field. From this field, the punctuation should be removed.
{
"id": 1,
"key": "123.123.123",
...
}
What would be a proper solution in this use case?
Assuming you're using the SQL API in Cosmos DB, partial updates to a document are not allowed (at least as of this writing).
Thus your approach would be to fetch the documents, make the necessary changes, and replace each document. If you use the .NET SDK, you can perform the replacements in batches for faster updates.
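The fetch-modify-replace flow can be sketched like this (Python; the document list stands in for the results of a cross-partition query, and strip_key_punctuation is a hypothetical helper - with the real SDK you would iterate the query results and call its replace operation, batching for throughput):

```python
import re

def strip_key_punctuation(doc):
    """Remove punctuation from the 'key' lookup field in place,
    e.g. '123.123.123' -> '123123123'."""
    doc["key"] = re.sub(r"[^0-9A-Za-z]", "", doc["key"])
    return doc

# Stand-in for the documents returned by SELECT * FROM c:
documents = [{"id": 1, "key": "123.123.123"}]
updated = [strip_key_punctuation(dict(d)) for d in documents]
# Each updated document would then be written back with a replace call.
```

Since the whole document is replaced, it is worth re-reading the etag before writing (or using optimistic concurrency) so concurrent production writes are not silently overwritten.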

Using multiple consumers with CosmosDB change feed

I am trying to use cosmos db change feed (I'm referring to https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor and https://github.com/Azure/azure-cosmos-dotnet-v2/tree/master/samples/code-samples/ChangeFeedProcessorV2).
When I start multiple instances of a consumer, the observer seems to see only one partition key range. I only see the message "Observer opened for partition Key Range 0", and that instance starts receiving the change feed. So the feed is received by only one consumer at any given point. If I close one consumer, the next one picks up happily.
I can't seem to understand the partition keys / ranges in cosmos db. In cosmos db, I've created a database and a collection within it. I've defined a partition key - /myId. I store a unique guid in myId. I've saved about 10000 transactions in the collection.
When I look at the partition key ranges using the API (/dbs/db-name/colls/coll-name/pkranges), I see only one node under PartitionKeyRanges. Below is the output I see:
{
"_rid": "LEAgAL7tmKM=",
"PartitionKeyRanges": [
{
"_rid": "LEAgAL7tmKMCAAAAAAAAUA==",
"id": "0",
"_etag": "\"00007d00-0000-0000-0000-5c3645e70000\"",
"minInclusive": "",
"maxExclusive": "FF",
"ridPrefix": 0,
"_self": "dbs/LAEgAA==/colls/LEAgAL7tmKM=/pkranges/LEAgAL7tmKMCAAAAAAAAUA==/",
"throughputFraction": 1,
"status": "online",
"parents": [],
"_ts": 1547060711
}
],
"_count": 1
}
Shouldn't this show more partition key ranges? Is this behavior expected?
How do I get multiple consumers to receive data as shown under https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor?
TL;DR - you should be able to ignore partition key ranges and the number of them you have and just let Change Feed Processor manage that for you.
The partition key ranges are an implementation detail we currently leak. The short answer is that we add new partition key ranges when we want to restructure how your data is stored in the backend. This can happen for lots of reasons: you add more data, you consume a lot of RUs for a subsection of that data, or we just want to shuffle things around. Theoretically, if you kept adding data, we'd eventually split the range in two.
We're working on some updates for the v3 SDKs, currently in preview, to abstract this a bit further, since even the answer above is pretty hand-wavy and we should have a more easily understood contract for public APIs.

Validate map keys in Firestore rules

How can I validate all keys in map in Firestore rules? Each key in map is an identifier related to a document.
Example:
User creates an invitation document with other documents access like this:
{
"email": "foo@bar.com",
"documents": {
"document1": true,
"document2": true,
...
}
}
How can I check that document1 exists in the database (i.e. in the documents collection)?
You can't write loops in security rules, so if you have an unknown number of documents to work with, you won't be able to validate them. Instead, consider using Cloud Functions to check the values after it's been added, and delete the invitation if it's not valid.
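The check itself is simple once it lives in a Cloud Function rather than in rules. A minimal sketch in Python, where document_exists stands in for a per-id lookup (with the Admin SDK this would be db.collection('documents').document(doc_id).get().exists; invalid_document_ids is a hypothetical helper):

```python
def invalid_document_ids(invitation, document_exists):
    """Return the ids under 'documents' that do not exist in the database.
    A function triggered on invitation creation can run this check and
    delete the invitation when the result is non-empty."""
    return [doc_id for doc_id in invitation.get("documents", {})
            if not document_exists(doc_id)]

existing = {"document1"}  # stand-in for the documents collection
invitation = {"email": "foo@bar.com",
              "documents": {"document1": True, "document2": True}}
bad = invalid_document_ids(invitation, existing.__contains__)
```

Note this is post-hoc validation: the invalid invitation briefly exists before the function deletes it, so clients should tolerate that window.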

IAM Policy to prevent DynamoDB from creating records for UpdateItem

My application allows users direct access to DynamoDB from their browser. They can view and edit any record for which they know the partition key (a UUID).
My problem is that users can create new records by 'editing' a non-existent partition key. Is there a way to use an IAM policy to prevent that?
You can use AWS Cognito to build fine-grained access control for your DynamoDB table.
You can also do it in code with a Lambda, but then you need to write all the authorization logic yourself.
This reference covers fine-grained authorization, including both row-level and table-level authorization, combined with AWS Cognito:
https://aws.amazon.com/blogs/mobile/building-fine-grained-authorization-using-amazon-cognito-user-pools-groups/
Hope it helps.
EDIT1:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/api-permissions-reference.html
Omitting dynamodb:PutItem from the allowed actions prevents users from creating new records through PutItem.
Example policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowAccessToOnlyItemsMatchingUserID",
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:BatchGetItem",
"dynamodb:Query",
"dynamodb:UpdateItem"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:123456789012:table/TableName"
]
}
]
}
The permissions reference linked above explains how to build an IAM policy that blocks users from creating new records.
Conditional Update:
Edits an existing item's attributes, or adds a new item to the table
if it does not already exist. You can put, delete, or add attribute
values. You can also perform a conditional update on an existing item
(insert a new attribute name-value pair if it doesn't exist, or
replace an existing name-value pair if it has certain expected
attribute values).
You can use ConditionExpression to only do the update if a certain criteria is met. Since all items must have a hash key (primary key), you can use a ConditionExpression to only do the update if the hash key exists, which will only be true for existing items. Therefore UpdateItem will only update existing items, not create new ones.
For example:
ConditionExpression: 'attribute_exists(myHashKey)'
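Sketched as boto3 request parameters (Python; the table name comes from the policy above, myHashKey is this answer's placeholder for the partition key attribute, the title attribute is invented for illustration, and the actual update_item call is left commented out):

```python
# Parameters for an UpdateItem that can never create a new record:
params = {
    "TableName": "TableName",
    "Key": {"myHashKey": {"S": "some-uuid"}},
    "UpdateExpression": "SET title = :v",
    "ExpressionAttributeValues": {":v": {"S": "new title"}},
    # DynamoDB rejects the call with ConditionalCheckFailedException
    # when no item with this key exists, so the 'upsert' path is blocked:
    "ConditionExpression": "attribute_exists(myHashKey)",
}
# import boto3
# boto3.client("dynamodb").update_item(**params)
```

With this condition in place, UpdateItem behaves as a pure update: an existing item is modified, and a request against an unknown key fails instead of inserting.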
