We are currently using Cosmos DB in a production environment. A scenario has come up where we want to update the contents of a particular property in nearly all documents of a collection. The property is used as a lookup/search field, so gradually modifying documents as they are accessed is not an option here.
The example document below uses the "key" property as the main lookup field. From this field, the punctuation should be removed.
{
"id": 1,
"key": "123.123.123",
...
}
What would be a proper solution in this use case?
Assuming you're using the SQL API in Cosmos DB, partial updates to a document are not supported, at least as of now.
So your approach would be to fetch the documents, make the necessary changes to each document, and update it. If you use the .NET SDK, you can update them in batches for faster processing.
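As a rough illustration, here is a minimal sketch of that fetch-modify-replace loop using the Node.js SDK (@azure/cosmos); the database and container names and the connection settings are assumptions, and the same pattern applies to the .NET SDK.

const { CosmosClient } = require("@azure/cosmos");

const endpoint = process.env.COSMOS_ENDPOINT;                    // assumed connection settings
const key = process.env.COSMOS_KEY;
const client = new CosmosClient({ endpoint, key });
const container = client.database("mydb").container("items");   // database/container names are assumptions

async function stripKeyPunctuation() {
  const { resources: docs } = await container.items
    .query("SELECT * FROM c")                                    // optionally filter to documents that still contain punctuation
    .fetchAll();
  for (const doc of docs) {
    doc.key = doc.key.replace(/[^0-9a-zA-Z]/g, "");              // remove punctuation from the lookup field
    await container.item(doc.id).replace(doc);                   // pass the partition key value as the second argument to item() if the container is partitioned
  }
}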
Related
I use this code to get a collection snapshot from Firestore.
firestore().collection('project').where('userID', '==', authStore.uid).onSnapshot(onResult, onError);
This returns a huge amount of data, but I only need a few fields. Is it possible to query only a specific field? For example, if I only need the projectName and the creationDate fields.
Is it possible to query only a specific field?
No, that is not possible. A Firestore listener fires at the document level. This means that you'll always get the entire document.
For example if I only need the projectName and the creationDate fields.
You cannot get only the values of a specific set of fields. It's the entire document or nothing. If, however, you only need to read those values and nothing more, then you should consider storing them in a separate document. This is called denormalization, and it's a common practice when it comes to NoSQL databases.
You might also consider using the Firebase Realtime Database for the duplicated data.
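As a rough sketch of that denormalization, assuming a separate projectSummaries collection (the name is an assumption) that mirrors only the fields the listener needs; projectId and project are placeholders for values you already have when writing a project:

// Whenever a project is written, also write the lean summary document.
firestore()
  .collection('projectSummaries')          // hypothetical lean collection
  .doc(projectId)
  .set({
    projectName: project.projectName,
    creationDate: project.creationDate,
    userID: authStore.uid,
  });

// The listener then attaches to the lean collection instead of 'project':
firestore()
  .collection('projectSummaries')
  .where('userID', '==', authStore.uid)
  .onSnapshot(onResult, onError);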
I have a hashmap in a document. Let's say it looks like:
userHasFinished: {
'user1': false,
'user2': false,
'user3': false,
}
If I'm updating specific fields in this hashmap from false to true, and I know that only one user can initiate a write for a particular field (this is guarded by authentication), do I need a transaction for this update?
Put another way, do I need a transaction to make concurrent updates to a hashmap even though those concurrent updates will always be to different keys in the hashmap?
I'm assuming not because inherently an entire Firestore document is essentially a hashmap and you certainly don't need transactions to update individual fields in a document.
You only need to use a transaction if the data that you write depends on the current data in the same document.
A user adding their own UID to a map does not require the existing data in the document, so it can be safely (and more efficiently) done with a merging set or update call, as long as you address the specific subfield with a . (dot notation). For example: { "userHasFinished.user1": false }.
Also see the documentation on updating fields in nested objects, which contains example code for many supported languages.
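For example, a minimal sketch of such an update in the same style as the snippet earlier in this thread; the collection name and document id are placeholders:

firestore()
  .collection('games')                         // placeholder collection name
  .doc(gameId)                                 // placeholder document id
  .update({ 'userHasFinished.user1': true });  // dot notation: only this nested key is written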
In Firestore there is a principle of creating small documents and big collections. However, this doesn't make sense if we use Firestore through its REST API, because listing a collection also returns the content of all documents within it. This doesn't make sense to me at all; shouldn't it just return the ids of all documents? Here is the exact method I'm talking about: https://firebase.google.com/docs/firestore/reference/rest/v1beta1/projects.databases.documents/list
The API is working correctly. If you don't want any document fields (or want only a limited set of fields), you can use the mask parameter as described in the API docs you linked:
object(DocumentMask)
The fields to return. If not set, returns all fields.
If a document has a field that is not present in this mask, that field
will not be returned in the response.
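As a rough illustration, the mask is passed as repeated mask.fieldPaths query parameters; the project id and access token below are placeholders, and the collection path matches the 'project' collection from the question:

const accessToken = '<oauth-access-token>';    // placeholder
const url =
  'https://firestore.googleapis.com/v1beta1/projects/<project-id>/databases/(default)/documents/project' +
  '?mask.fieldPaths=projectName&mask.fieldPaths=creationDate';

const res = await fetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
const { documents } = await res.json();        // each document now carries only the masked fields (plus its name and timestamps)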
Is it possible in some way to have multiple access conditions that prevent a document from being saved to Cosmos DB if they aren't met?
Today I have an access condition on the ETag, to prevent an old version of the document from being saved. But I want another condition based on the status of the document, so that if the document in my store is in a 'closed' status, it prevents anyone from modifying it.
I can always do a load -> check -> save routine, but the access condition works like a charm for the ETag, so I wonder if there is a way to have multiple access conditions specified when saving the document.
Best Regards
Magnus
Based on the detailed statements in the blogs below:
1. https://codeopinion.com/documentdb-optimistic-concurrency/
2. https://chapsas.com/understanding-optimistic-concurrency-in-cosmos-db/
The ETag in Cosmos DB only provides optimistic concurrency; it can be used with an AccessCondition to make the write fail if the document changed between the retrieval and the attempted update of the document.
AccessConditionType only has IfMatch and IfNoneMatch; there is no condition based on other document state.
So, back to your requirements: it seems you have to add an ifClosed property to your document and check it yourself when you make a modification.
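A minimal sketch of that combination, assuming the Node.js SDK (@azure/cosmos) with container, id and partitionKey already in scope; only the ETag is enforced by the server as an access condition, while the closed status is checked in application code:

const { resource: doc } = await container.item(id, partitionKey).read();

if (doc.ifClosed) {
  throw new Error('Document is closed and must not be modified');   // status check done in code, not by the server
}

doc.someProperty = 'new value';                                     // apply the change

await container.item(id, partitionKey).replace(doc, {
  accessCondition: { type: 'IfMatch', condition: doc._etag },       // server rejects with 412 if the document changed in the meantime
});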
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call the table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control the change feed for specific operations, such as updates only and not inserts, is not yet available. You can add a "soft marker" on the item for updates and filter based on that when processing items in the change feed. Currently the change feed doesn't log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted; for example, you can add an attribute in the item called "deleted", set it to "true", and set a TTL on the item so that it can be automatically deleted. You can read the change feed for historic items, for example items that were added five years ago. If the item is not deleted you can read the change feed as far back as the origin of your container.
So the change feed, on its own, does not cover your requirements.
My idea:
Use an Azure Functions Cosmos DB trigger to collect all the operations on your specific Cosmos DB collection. Follow this document to configure Cosmos DB as the input of the Azure Function, then follow this document to configure Azure Queue Storage as the output.
Get the ids of the changed items and send them to queue storage as messages. When you want to query the changed items, just read the messages from the queue, consume them at a specific time, and then clear the entire queue. No items will be missed.
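A minimal sketch of the function body (Node.js), assuming a cosmosDBTrigger input binding named documents and a queue output binding named outputQueue declared in function.json (both binding names are assumptions):

module.exports = async function (context, documents) {
  if (documents && documents.length > 0) {
    // forward only the ids of the changed documents as individual queue messages
    context.bindings.outputQueue = documents.map(doc => JSON.stringify({ id: doc.id }));
  }
};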
With your approach, you can get added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpointing information (epoch number, sequence number and offset value) in a blob; at any time only one function can take a lease on that blob.
If you go with the change feed, you can create a listener (a Function or a job) to listen for all added/updated data in the collection and store those values in another collection; while saving the data you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.