I'm using Azure Cosmos DB to store documents, with TTL enabled for the documents.
If I upsert or replace an item, does the TTL countdown reset and start counting from the moment I update it, or does it continue from when the document was first created?
Thank you!
Every item has a _ts system property, which holds its last-modified timestamp (in epoch seconds). According to the documentation (Set time to live on an item), an item's TTL is counted from that last-modified time.
So if you update or replace an item, the TTL countdown resets and starts counting from the moment you modify it.
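To make the rule concrete, here is a small Python sketch of how the expiry time follows the last-modified timestamp. It only models the documented rule (expiry = _ts + ttl seconds); the `Item` class and the `upsert` and `expires_at` helpers are hypothetical stand-ins, not the Cosmos DB SDK.

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: str
    ttl: int  # seconds (per-item value or the container default)
    _ts: int  # last-modified time in epoch seconds, set by the service on every write


def upsert(item: Item, now: int) -> Item:
    # Any write (upsert or replace) sets _ts to the time of the write,
    # which is what restarts the TTL countdown.
    return Item(item.id, item.ttl, now)


def expires_at(item: Item) -> int:
    # The item becomes eligible for deletion ttl seconds after its last modification.
    return item._ts + item.ttl


doc = Item("a", ttl=3600, _ts=1_000)
print(expires_at(doc))  # 4600
doc = upsert(doc, now=3_000)
print(expires_at(doc))  # 6600
```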
I am working on a project where, when a user cancels their plan, all of their documents should be deactivated except for a pre-defined number of documents that are allowed to stay active. The pre-defined number determines how many documents may stay active, and their creation date determines which ones.
For example, if customer A has 1,000 documents and cancels their plan, all of their documents except the first 100 created should be deactivated.
My first attempt was to get all document IDs with .listDocuments(), but I noticed that the created date is not part of Firestore's DocumentReference. Therefore I can't exclude the pre-defined number of documents allowed to stay active.
I could use .get() and use the created value, but I'm afraid that getting all the documents at once (which could be a million) would cause my cloud function to run out of memory, even if I have it set to the maximum allowed configuration.
Another option I thought of: use .listDocuments() and write each document ID to a temporary collection in Firestore, which could kick off a Cloud Function for each document. That function would only have to work with one document, so it wouldn't have to worry about running out of resources. However, I am unsure how the function would determine whether the document it is working on should be deactivated or is allowed to stay active.
I am not too worried about the number of reads and writes, as this workflow should not happen very often. Any help would be appreciated.
Thank you
One possible approach would be to mark the documents to be excluded.
I don't know your exact algorithm, but if you want to mark the first 100 documents created in a collection, you can use a Cloud Function that runs for each new document and checks whether there are already 100 docs in the collection.
If not, you set a field in the new document to its rank (using a Transaction to get the highest existing rank and increment it). If 100 documents have already been created in the collection, you set the field to 0 instead, for example, so that later on you can query with where("rank", "==", 0).
Then, when you want to delete (or, in your case, deactivate) all the docs except the first 100, just use the where("rank", "==", 0) query.
So, concretely:
The first doc is created: you set the rank field to 1.
The Nth doc (N != 1) is created: in a Transaction, you run a query ordered by rank and limited to 1 doc (collecRef.orderBy("rank", "desc").limit(1)), which returns the doc with the highest rank. Since you are in a Cloud Function, you can use a Query in the Transaction (which you cannot do with the Client SDKs). Then, still in the Transaction:
If the rank of the single doc returned by the Query is < 100, you set the rank of the newly created doc to [single doc's rank + 1].
If the rank of the single doc returned by the Query is 100, you set the rank to 0.
If I didn't make any mistakes (I didn't test it! :-)), you end up with 100 docs with a rank between 1 and 100 (the first 100 created docs), and the rest of the docs with a rank equal to 0.
Then, as said above you can use the where("rank", "==", 0) query to select all the docs to be deleted.
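The per-document logic above can be simulated in memory to check the outcome. This is a hypothetical Python sketch, not Firestore code: `assign_rank` plays the role of the Cloud Function body and the `docs` list stands in for the collection. In real code, reading the highest rank and writing the new one must happen inside the same Transaction, which this sketch does not model.

```python
LIMIT = 100  # number of documents allowed to stay active


def assign_rank(docs: list[dict], new_doc: dict) -> None:
    """Set new_doc["rank"] following the rule described above, then store it."""
    ranked = [d["rank"] for d in docs if "rank" in d]
    if not ranked:
        new_doc["rank"] = 1  # the first document created
    else:
        highest = max(ranked)  # equivalent of orderBy("rank", "desc").limit(1)
        new_doc["rank"] = highest + 1 if highest < LIMIT else 0
    docs.append(new_doc)


docs: list[dict] = []
for i in range(250):
    assign_rank(docs, {"id": f"doc{i}"})

to_deactivate = [d for d in docs if d["rank"] == 0]  # where("rank", "==", 0)
print(len(to_deactivate))                    # 150
print(docs[99]["rank"], docs[100]["rank"])   # 100 0
```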
At first sight, it's clear what the continuation token does in Cosmos DB: attaching it to the next query gives you the next set of results. But what does "next set of results" mean exactly?
Does it mean:
1. the next set of results as if the original query had been executed completely, without paging, at the time of the very first query (skipping the appropriate number of documents)?
2. the next set of results as if the original query had been executed now (skipping the appropriate number of documents)?
3. something completely different?
Option 1 would seem preferable, but unlikely, given that the server would need to store unlimited amounts of state. But option 2 is also problematic, as it may result in inconsistencies: e.g. the same document may be served multiple times across pages if the underlying data has changed between the page queries.
Cosmos DB query executions are stateless on the server side. The continuation token is used to recreate the state of the index and to track the progress of the execution.
"Next set of results" means that the query is executed again, starting from a "bookmark" left by the previous execution. This bookmark is provided by the continuation token.
Documents created during continuations
They may or may not be returned, depending on where the new document falls relative to the bookmark of the query being executed.
Example:
SELECT * FROM c ORDER BY c.someValue ASC
Suppose the bookmark is at someValue = 10: the query engine resumes processing from someValue = 10 using the continuation token.
If you insert a new document with someValue = 5 between query executions, it will not show up in the next set of results.
If the new document sorts after the bookmark (e.g. someValue = 15), it will show up in the next set of results.
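The bookmark behaviour can be illustrated with a toy model in which each page is computed statelessly from the current data plus the last sort value seen. This is a hypothetical Python sketch that assumes unique sort values and a single partition; the real continuation token is opaque and encodes more state than a single value.

```python
def next_page(docs, bookmark, page_size):
    """Return (page, new_bookmark) for ORDER BY someValue ASC,
    resuming strictly after `bookmark` (None means start from the beginning)."""
    remaining = sorted(
        (d for d in docs if bookmark is None or d["someValue"] > bookmark),
        key=lambda d: d["someValue"],
    )
    page = remaining[:page_size]
    new_bookmark = page[-1]["someValue"] if page else bookmark
    return page, new_bookmark


docs = [{"id": c, "someValue": v} for c, v in [("a", 2), ("b", 10), ("c", 20), ("d", 30)]]
page1, bm = next_page(docs, None, 2)           # a, b -> bookmark at someValue = 10
docs.append({"id": "low", "someValue": 5})     # sorts before the bookmark
docs.append({"id": "high", "someValue": 15})   # sorts after the bookmark
page2, bm = next_page(docs, bm, 2)
print([d["id"] for d in page2])  # ['high', 'c']
```

The document with someValue = 5 is never served, while the one with someValue = 15 appears on the next page.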
Documents updated during continuations
The same logic applies to updates as well (see "Chances of duplicates" below).
Documents deleted during continuations
They will not show up in the next set of results.
Chances of duplicates
In case of the below query,
SELECT * FROM c ORDER BY c.remainingInventory ASC
If a document's remainingInventory is updated after it was returned in the first set of results, and its new value now places it after the bookmark, the document will show up again on a later page.
Cosmos DB doesn’t provide snapshot isolation across query pages.
However, according to the product team, this is a very uncommon scenario, because queries over continuations are very quick and in most cases all query results are returned on the first page.
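The same kind of toy model shows how an update can produce a duplicate: the document is served on page 1, then its sort key moves past the bookmark, so it is served again. Again, this is a hypothetical Python sketch of stateless bookmark paging, not the Cosmos DB query engine.

```python
def next_page(docs, bookmark, page_size, key="remainingInventory"):
    """Stateless paging for ORDER BY <key> ASC, resuming strictly after `bookmark`."""
    remaining = sorted(
        (d for d in docs if bookmark is None or d[key] > bookmark),
        key=lambda d: d[key],
    )
    page = remaining[:page_size]
    return page, (page[-1][key] if page else bookmark)


docs = [
    {"id": "x", "remainingInventory": 1},
    {"id": "y", "remainingInventory": 3},
    {"id": "z", "remainingInventory": 7},
]
page1, bm = next_page(docs, None, 2)   # x, y -> bookmark at 3
docs[0]["remainingInventory"] = 9      # x is updated after page 1 was served
page2, bm = next_page(docs, bm, 2)
print([d["id"] for d in page1], [d["id"] for d in page2])  # ['x', 'y'] ['z', 'x']
```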
Based on preliminary experiments, the answer seems to be option #2, or more precisely:
Documents created after serving the first page are observable on subsequent pages
Documents updated after serving the first page are observable on subsequent pages
Documents deleted after serving the first page are omitted on subsequent pages
Documents are never served twice
The first statement above contradicts information from MSFT (cf. Kalyan's answer). It would be great to get a more qualified answer from the Cosmos DB Team specifying precisely the semantics of retrieving pages. This may not be very important for displaying data in the UI, but may be essential for data processing in the backend, given that there doesn't seem to be any way of disabling paging when performing a query (cf. Are transactional queries possible in Cosmos DB?).
Experimental method
I used Sacha Bruttin's Cosmos DB Explorer to query a collection with 5 documents, because this tool allows playing around with the page size and other request options.
The page size was set to 1, and Cross Partition Queries were enabled. Different queries were tried, e.g. SELECT * FROM c or SELECT * FROM c ORDER BY c.name.
After retrieving page 1, new documents were inserted, and some existing documents (including documents that should appear on subsequent pages) were updated and deleted. Then all subsequent pages were retrieved in sequence.
(A quick look at the source code of the tool confirmed that ResponseContinuationTokenLimitInKb is not set.)
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call the table version. In a transaction I would create a new row in the version table, then do the updates to the other table (including setting the version column to the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in the documentation:
Today, you see all operations in the change feed. The functionality where you can control change feed, for specific operations such as updates only and not inserts is not yet available. You can add a “soft marker” on the item for updates and filter based on that when processing items in the change feed. Currently change feed doesn’t log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted, for example, you can add an attribute in the item called "deleted" and set it to "true" and set a TTL on the item, so that it can be automatically deleted. You can read the change feed for historic items, for example, items that were added five years ago. If the item is not deleted you can read the change feed as far as the origin of your container.
So the change feed on its own does not fit your requirements.
My idea:
Use an Azure Function with a Cosmos DB trigger to capture all the changes in your Cosmos DB collection. Follow this document to configure the input of the Azure Function as Cosmos DB, then follow this document to configure the output as Azure Queue Storage.
Send the IDs of the changed items to Queue Storage as messages. When you want to query the changed items, just read the messages from the queue to consume them at a given interval, and after that clear the entire queue. No items will be missed.
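A minimal in-memory sketch of that flow, in Python. The `on_changes` handler is a hypothetical stand-in for the Cosmos DB-triggered function and a `collections.deque` stands in for Azure Queue Storage; it shows only the order of the steps (enqueue IDs on change, drain the queue on read), not the Azure bindings.

```python
from collections import deque

queue: deque[str] = deque()  # stands in for Azure Queue Storage


def on_changes(changed_docs: list[dict]) -> None:
    # Trigger side: receives a batch of inserted/updated documents
    # and enqueues the ID of each one.
    for doc in changed_docs:
        queue.append(doc["id"])


def drain_changed_ids() -> list[str]:
    # Consumer side: take every pending message, leaving the queue empty.
    ids = []
    while queue:
        ids.append(queue.popleft())
    return ids


on_changes([{"id": "1"}, {"id": "2"}])
on_changes([{"id": "1"}])       # the same document updated again
print(drain_changed_ids())      # ['1', '2', '1']
print(drain_changed_ids())      # []
```

An ID can appear more than once if the document changed several times, so the consumer may want to de-duplicate before loading the documents.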
With your approach, you can query the added/updated documents and save a reference value (the _ts and id fields) somewhere, such as a blob:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts >= [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs: checkpointing information (epoch number, sequence number and offset) is stored in a blob, and only one function at a time can hold a lease on that blob.
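A sketch of the checkpointing idea, in Python over an in-memory list rather than the Cosmos DB SDK. It assumes the checkpoint stores the highest _ts seen plus the IDs already processed at that _ts, and filters with _ts >= checkpoint so that other writes in the same second are not skipped. The `Checkpoint` class and `poll` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    ts: int = 0
    ids_at_ts: set[str] = field(default_factory=set)  # IDs already processed at `ts`


def poll(docs: list[dict], user_id: str, cp: Checkpoint) -> list[dict]:
    """Return new/updated docs for user_id since the checkpoint, and advance it."""
    # Same idea as: userId = @user AND _ts >= @ts, minus the IDs already seen at @ts
    new = [
        d for d in docs
        if d["userId"] == user_id
        and d["_ts"] >= cp.ts
        and not (d["_ts"] == cp.ts and d["id"] in cp.ids_at_ts)
    ]
    for d in sorted(new, key=lambda d: d["_ts"]):
        if d["_ts"] > cp.ts:
            cp.ts, cp.ids_at_ts = d["_ts"], set()
        cp.ids_at_ts.add(d["id"])
    return new


docs = [{"id": "a", "userId": "u1", "_ts": 100}]
cp = Checkpoint()
print([d["id"] for d in poll(docs, "u1", cp)])        # ['a']
docs.append({"id": "b", "userId": "u1", "_ts": 100})  # same second, another client
print([d["id"] for d in poll(docs, "u1", cp)])        # ['b']
print([d["id"] for d in poll(docs, "u1", cp)])        # []
```

An item updated twice within the same second would still be missed by this scheme, because neither its _ts nor its id changes; that limitation comes from the one-second resolution of _ts.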
If you go with the change feed, you can create a listener (a Function or a Job) that receives all added/updated documents from the collection and stores them in another collection; while saving them, you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
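A rough Python sketch of that listener, assuming it is the only writer of the version numbers (for example, a single change feed processor instance). `handle_change_feed_batch`, `changed_since`, `version_counter` and `store` are hypothetical names, and the dict stands in for the second collection. Clients then filter on version instead of _ts.

```python
from itertools import count

version_counter = count(1)   # monotonically increasing; single writer assumed
store: dict[str, dict] = {}  # stands in for the second collection, keyed by id


def handle_change_feed_batch(changed_docs: list[dict]) -> None:
    # Each added/updated document is copied with a new version number.
    for doc in changed_docs:
        store[doc["id"]] = {**doc, "version": next(version_counter)}


def changed_since(user_id: str, last_version: int) -> list[dict]:
    # Same idea as: SELECT * FROM c WHERE c.userId = @user AND c.version > @v
    return sorted(
        (d for d in store.values() if d["userId"] == user_id and d["version"] > last_version),
        key=lambda d: d["version"],
    )


handle_change_feed_batch([{"id": "a", "userId": "u1"}, {"id": "b", "userId": "u1"}])
seen = changed_since("u1", 0)
print([(d["id"], d["version"]) for d in seen])  # [('a', 1), ('b', 2)]
handle_change_feed_batch([{"id": "a", "userId": "u1", "qty": 5}])
print([d["id"] for d in changed_since("u1", seen[-1]["version"])])  # ['a']
```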
This is what consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
I have a question
I am using DynamoDB and putting items into a table where the hash key is the same for all items and the range key is the current timestamp. If two or more events call PutItem at the same time, what would the result be? I want all items to be updated. What can I do here?
PutItem API, two items with the same hash key and range key:
The first request will create the item in the table.
The second request will replace (i.e. overwrite) the item in the table.
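An in-memory model of this behaviour in Python, where a table is a dict keyed by (hash key, range key) and `put_item` mimics PutItem's replace-on-same-key semantics. This is a hypothetical sketch, not boto3; the attribute names `pk`, `ts` and `event` are illustrative.

```python
table: dict[tuple[str, str], dict] = {}


def put_item(item: dict) -> None:
    # PutItem semantics: create the item, or replace it entirely if the
    # (hash key, range key) pair already exists.
    table[(item["pk"], item["ts"])] = item


put_item({"pk": "device-1", "ts": "2024-01-01T00:00:00Z", "event": "first"})
put_item({"pk": "device-1", "ts": "2024-01-01T00:00:00Z", "event": "second"})
print(len(table))                                            # 1
print(table[("device-1", "2024-01-01T00:00:00Z")]["event"])  # second
```

With truly concurrent requests, whichever write DynamoDB applies last is the one that remains.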
Updating items that have the same hash key but different range keys:
First, DynamoDB's UpdateItem API updates only one item per call.
Second, you need both the hash key and the range key to update an item.
Steps:
1. Query all the items for the hash key.
2. Update each item sequentially using the UpdateItem API, or use the BatchWriteItem API to update the items in bulk (up to 25 items per call). You can use a PutRequest in BatchWriteItem to update an item. The API documentation says that BatchWriteItem can't update items; what it actually means is that it can't update specific attributes of an item, because it replaces the entire item. You should be OK as long as you have the full item data that needs to be replaced. Following step 1 above, you should have the full item details (i.e. all the attributes of each item) that need to be updated (i.e. replaced, in the case of BatchWriteItem).
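The two steps can be sketched in memory in Python: fetch every item for the hash key, build full replacement items, and write them in chunks of at most 25, the way BatchWriteItem PutRequests would. The `query` and `batch_put` helpers and the `pk`/`sk`/`status` attributes are hypothetical, and the sketch ignores the UnprocessedItems retries that a real BatchWriteItem caller must handle.

```python
BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def query(table: dict, pk: str) -> list[dict]:
    # Step 1: fetch every item (with all attributes) for the hash key
    return [item for (h, _), item in table.items() if h == pk]


def batch_put(table: dict, items: list[dict]) -> int:
    # Step 2: write full replacement items in chunks; returns the number of "calls"
    calls = 0
    for start in range(0, len(items), BATCH_LIMIT):
        for item in items[start:start + BATCH_LIMIT]:
            table[(item["pk"], item["sk"])] = item  # the whole item is replaced
        calls += 1
    return calls


table = {
    ("user-1", f"ts{i:02d}"): {"pk": "user-1", "sk": f"ts{i:02d}", "status": "new"}
    for i in range(30)
}
table[("user-2", "ts00")] = {"pk": "user-2", "sk": "ts00", "status": "new"}

# Copy every attribute and change only what needs updating, since PutRequest replaces the item
updated = [{**item, "status": "processed"} for item in query(table, "user-1")]
print(batch_put(table, updated))                                # 2
print(sum(i["status"] == "processed" for i in table.values()))  # 30
```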
If one user is accessing record 1 of 10 records in a table, and at the same time a second user tries to access that same record, the second user should not be shown that record but should be shown the 2nd record instead. Because the first user will hold the record for some time to process and update it, the record should not be shown to any other user until then, even if a SELECT query is fired from the second user's application. Is this possible using a row lock? Please provide an example of how to implement ROWLOCK and HOLDLOCK and how to release the hold lock. Apart from this, if you have any other suggestions, please share them.
I am using SQL Server 2005 with ASP.NET.
Babu.M
I don't know if a row lock would stop the record being selected, but could you not use an audit table? For example, when user one gets access to a record, store the ID of that record in an audit table. Then, when user two tries to use a record, the application checks whether the primary key of that record is in the audit table: if it is, the second user does not gain access; if not, the second user gains access. Once a user has finished with the record, you can either delete the row in the audit table or keep it but set a flag to say it is no longer in use; that way, if you add a date/time stamp, you can see who changed which record and when.
Likewise, when running the SELECT command, just exclude any record whose primary key is present in the audit table.
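A runnable sketch of the audit-table idea, using Python's built-in sqlite3 as a stand-in for SQL Server 2005 (so the SQL is SQLite dialect). The `records` and `record_claims` tables, their columns, and the `claim_next`/`release` helpers are hypothetical. Making the record ID the primary key of the audit table means a second claim on the same record fails with a constraint error instead of silently succeeding. This version releases a record by deleting its claim row; keeping history with a flag would need a separate log table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE record_claims (
        record_id INTEGER PRIMARY KEY REFERENCES records(id),  -- one claim per record
        claimed_by TEXT NOT NULL,
        claimed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")
conn.executemany("INSERT INTO records (id, payload) VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 11)])
conn.commit()


def claim_next(user: str):
    """Return the first unclaimed record id and mark it as claimed by `user`."""
    while True:
        row = conn.execute(
            "SELECT id FROM records "
            "WHERE id NOT IN (SELECT record_id FROM record_claims) "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # every record is currently claimed
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO record_claims (record_id, claimed_by) VALUES (?, ?)",
                    (row[0], user),
                )
            return row[0]
        except sqlite3.IntegrityError:
            continue  # another user claimed it first; try the next record


def release(record_id: int) -> None:
    with conn:
        conn.execute("DELETE FROM record_claims WHERE record_id = ?", (record_id,))


print(claim_next("user1"))  # 1
print(claim_next("user2"))  # 2
release(1)
print(claim_next("user2"))  # 1
```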
Hope this helps