I have some documents in Firestore with two Timestamp fields, lastUpdated and lastProcessed, in addition to other fields. The lastUpdated field is set when a user updates the record's fields via the web console. The lastProcessed field is set when the backend function processes the document (as a result of the user clicking a button).
The following combinations of these two fields are possible:
1. User has only updated the document, but is yet to process it (lastUpdated == some_timestamp, lastProcessed == '')
2. User has updated the document, and then processed it (lastUpdated < lastProcessed)
3. User has updated the document, processed it and re-updated it (lastUpdated > lastProcessed)
My requirement is to execute a query that gets a subset of these records (say the top 10), ordered by each record's most recent timestamp. So when evaluating a record for the ordering, the lastUpdated field should be considered for scenarios 1 and 3 above, but the lastProcessed field should be considered for scenario 2.
Is this possible with Firestore?
When querying the Firestore database, it is not possible to execute the logic you describe in your question (i.e. determine on the fly which scenario applies and which field should be used in the query).
One classic solution is to add an extra field to the document that contains the value to be queried. The value of this field can be calculated (according to the business logic) when you modify the document from your frontend, or via a Cloud Function triggered in the backend each time the doc is changed.
The main advantage of using a Cloud Function is that it prevents users from modifying the value of this field themselves.
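As an illustration, here is a minimal sketch of such a Cloud Function. The field name latestTimestamp and the collection name records are assumptions made for the example; the function simply copies whichever of the two timestamps is most recent into a single queryable field:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Keeps a single queryable field (latestTimestamp) up to date on every write.
exports.syncLatestTimestamp = functions.firestore
  .document('records/{recordId}')
  .onWrite(async (change) => {
    if (!change.after.exists) return null; // document was deleted, nothing to do

    const data = change.after.data();
    const lastUpdated = data.lastUpdated || null;
    const lastProcessed = data.lastProcessed || null;

    // Pick the most recent of the two timestamps (covers scenarios 1, 2 and 3).
    let latest = lastUpdated;
    if (lastProcessed && (!lastUpdated || lastProcessed.toMillis() > lastUpdated.toMillis())) {
      latest = lastProcessed;
    }

    // Only write when the value actually changed, to avoid an endless trigger loop.
    if (latest && data.latestTimestamp && data.latestTimestamp.isEqual(latest)) return null;

    return change.after.ref.update({ latestTimestamp: latest });
  });

The query for the 10 most recently touched documents then becomes a simple ordered query on that one field, e.g. db.collection('records').orderBy('latestTimestamp', 'desc').limit(10).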
Related
My Cloud Firestore database has an "orders" collection, and in HTML I have a 'save' button that adds document(s) to that "orders" collection when clicked. Now, using add() will assign an auto-generated ID to each document.
What if I want to customise that ID based on timestamp, so that the document created yesterday is assigned an index of '1', the next document created is '2', etc.?
What you're trying to do is not compatible with the way Cloud Firestore was designed. Firestore will not assign monotonically increasing numbers for document IDs. That approach doesn't scale to the massive throughput Firestore is designed for and would introduce performance bottlenecks.
If you want to be able to sort documents by timestamp, the best strategy is to add a timestamp field to each document, then use that field in an ordered query.
Note that you could try to write a lot of code to get this done the way you want, but you are MUCH better off accepting the random IDs and using fields to filter and order data.
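As a concrete illustration of that strategy, here is a minimal sketch using the Firebase web SDK (v9 modular API); the createdAt field name is an assumption:

import { getFirestore, collection, addDoc, serverTimestamp,
         query, orderBy, limit, getDocs } from 'firebase/firestore';

const db = getFirestore();

// On "save": let Firestore generate the random document ID,
// but record when the order was created.
async function saveOrder(order) {
  await addDoc(collection(db, 'orders'), {
    ...order,
    createdAt: serverTimestamp(),
  });
}

// Later: read orders in creation order instead of relying on the ID.
async function loadRecentOrders() {
  const q = query(collection(db, 'orders'), orderBy('createdAt', 'desc'), limit(20));
  const snapshot = await getDocs(q);
  return snapshot.docs.map(d => ({ id: d.id, ...d.data() }));
}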
In some cases, when an event requires you to save several docs in different collections, it is better to save all of those docs under the same ID in the different collections, using a single Firestore server timestamp as that ID. You get the timestamp like below:
const admin = require('firebase-admin')
// Milliseconds since epoch as a string, usable as a shared document ID
const ts = admin.firestore.Timestamp.now().toMillis().toString()
By doing this, when you need to read all those docs, you only need to query once to get the timestamp, then read all the other docs by that timestamp (ID) directly.
It should be faster than querying a timestamp stored inside document fields for each collection.
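For instance, a minimal sketch of this pattern with the Admin SDK (the 'orders' and 'invoices' collection names are placeholders for your own collections):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function recordEvent(orderData, invoiceData) {
  // One server timestamp, used as the document ID in every collection.
  const ts = admin.firestore.Timestamp.now().toMillis().toString();

  const batch = db.batch();
  batch.set(db.collection('orders').doc(ts), orderData);
  batch.set(db.collection('invoices').doc(ts), invoiceData);
  await batch.commit();

  return ts;
}

async function readEvent(ts) {
  // Direct document reads by ID, no field queries needed.
  const [order, invoice] = await Promise.all([
    db.collection('orders').doc(ts).get(),
    db.collection('invoices').doc(ts).get(),
  ]);
  return { order: order.data(), invoice: invoice.data() };
}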
I am using Firebase FireStore database for the first time and I have the following question.
I have created a calendar collection. This collection will contain documents representing events that have to be shown in a calendar implemented by an Angular application.
So I am defining the following fields for these documents:
id: int. It is a unique identifier of the specific document/event.
title: string. It is the event title.
start_date_time: string. It specifies the date and the time at which the event starts.
end_date_time: string. It specifies the date and the time at which the event ends.
And here I have some doubts:
Is the id field required? From what I know, I will have the document UID, which ensures the uniqueness of the document. If it is not strictly required, could adopting an id field still be convenient, for example as something like an auto-increment field? (I know that I have to handle the auto-increment in some other way, because Firestore is not a relational DB and doesn't handle it automatically.) For example, I was thinking it could be useful for ordering my documents from the first one inserted to the last one inserted.
It seems to me that Firestore doesn't handle a DateTime field type (as, for example, a traditional relational database does). Is this assumption correct? How can I correctly handle my start_date_time and end_date_time fields? These fields have to contain the date and time used by my Angular application. So, for example, I was thinking that I could define them as string fields and put values such as 2020-07-20T07:00:00 into them, representing a specific date and a specific time. Could this be considered a valid approach to the problem or not?
Is the id field required?
No fields are required. Firestore is schema-less. The only thing a document requires is a string ID that is unique to the collection where it lives.
There is no autoincrement of IDs. That doesn't scale massively the way Firestore requires. If you need ordering, you will have to define that for yourself according to your needs.
In general, you are supposed to accept the randomly generated IDs that the Firebase client APIs will generate for you. Ordering is typically defined using a field in the document.
It seems to me that FireStore doesn't handle DateTime field
Firestore has a timestamp field type that stores moments in time to nanosecond precision. There is no need to store a formatted string, unless that's something you require for other reasons.
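For example, here is a minimal sketch (web SDK, v9 modular API) that stores the event times as Firestore timestamps; the calendar collection and field names come from the question, the rest is illustrative:

import { getFirestore, collection, addDoc, Timestamp } from 'firebase/firestore';

const db = getFirestore();

// startDate and endDate are plain JavaScript Date objects from the Angular app.
async function addEvent(title, startDate, endDate) {
  await addDoc(collection(db, 'calendar'), {
    title,
    start_date_time: Timestamp.fromDate(startDate),
    end_date_time: Timestamp.fromDate(endDate),
  });
}

// When reading, convert back to a Date for display:
//   const start = docSnapshot.data().start_date_time.toDate();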
I just watched the Get to know Cloud Firestore playlist, and I learned that every field of a document is indexed.
My question is: is there a way for certain fields to be excluded from Firestore indexing? Something like fields that I am pretty sure I will not be using for query lookups.
Thanks.
As you correctly found, Firestore automatically indexes all individual fields of the documents in the collection. You can exclude certain fields from the single field indexes panel in the Firebase console.
From there:
Cloud Firestore creates the indexes defined by your automatic index settings for each field you add, enabling most simple queries by default. You can add exemptions to manually set how a specific field is indexed.
On that panel, you can enter the collection (or collection group) and the field name, then select which indexes (ascending, descending, arrays) are auto-created or not.
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call the table version. In a transaction I would create a new row in the version table, then do the updates to the other table (including updating its version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control the change feed for specific operations, such as updates only and not inserts, is not yet available. You can add a "soft marker" on the item for updates and filter based on that when processing items in the change feed. Currently the change feed doesn't log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted; for example, you can add an attribute in the item called "deleted", set it to "true", and set a TTL on the item so that it can be automatically deleted. You can read the change feed for historic items, for example items that were added five years ago. If the item is not deleted you can read the change feed as far back as the origin of your container.
So the change feed on its own does not meet your requirements.
My idea:
Use an Azure Function with a Cosmos DB trigger to collect all the operations on your specific Cosmos collection. Follow this document to configure the Azure Function's input as Cosmos DB, then follow this document to configure the output as Azure Queue storage.
Get the ids of the changed items and send them to queue storage as messages. When you want to query the changed items, just read the messages from the queue to consume them at a specific point in time, and after that clear the entire queue. No items will be missed.
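A minimal sketch of such a function in JavaScript might look like the following; the binding names ("documents" for the Cosmos DB trigger, "outputQueueItem" for the queue output) are assumptions and must match what you configure in function.json:

// index.js – Azure Function with a Cosmos DB trigger and a Queue storage output.
module.exports = async function (context, documents) {
  if (documents && documents.length > 0) {
    // One queue message per changed document, carrying just its id.
    context.bindings.outputQueueItem = documents.map(doc => doc.id);
  }
};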
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (like a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs and store checkpointing information (epoch number, sequence number and offset value) in a blob, where only one function at a time can take a lease on that blob.
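Here is a minimal sketch of that checkpoint-driven query with the @azure/cosmos Node.js SDK; the database and container names are placeholders, and how the checkpoint (the _ts plus id of the last document seen) is persisted is up to you:

const { CosmosClient } = require('@azure/cosmos');

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database('mydb').container('mycollection');

async function fetchChangesSince(userId, checkpoint) {
  const querySpec = {
    query: 'SELECT * FROM c WHERE c.userId = @userId AND c._ts > @ts AND c.id != @lastId ORDER BY c._ts DESC',
    parameters: [
      { name: '@userId', value: userId },
      { name: '@ts', value: checkpoint.ts },
      { name: '@lastId', value: checkpoint.id },
    ],
  };

  const { resources } = await container.items.query(querySpec).fetchAll();

  // Persist the _ts and id of the newest document as the next checkpoint
  // before processing the results.
  return resources;
}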
If you go with the Change Feed, you can create a listener (a Function or a job) that listens for all added/updated data in the collection and stores those values in another collection; while saving the data you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
Let's say I have an employees collection where I have one document per employee, and I want to keep a record of all changes that were made to a single employee doc. I was thinking of the following approach:
Have a pendingEmployeeWrites collection where the client is only allowed to create documents. Each doc here will have an employeeId field (this id is generated on the client side for new employees).
A Cloud Function is invoked whenever such a doc is created, and it validates the data. If valid, the employeeId doc in the employees collection is overwritten with this data; otherwise the pendingEmployeeWrites doc is updated to set isFailed to true. The client app is only allowed to read from the employees collection.
Keeping pendingEmployeeWrites as a flat collection instead of a sub-collection allows me to pull all changes made by a user as well as all changes for a particular document. Does this approach make sense or is there a better approach that I should consider?
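For reference, a minimal sketch of the proposed Cloud Function could look like the one below; validateEmployee() is a placeholder for whatever business validation applies, everything else follows the collection and field names described above:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

exports.processPendingEmployeeWrite = functions.firestore
  .document('pendingEmployeeWrites/{writeId}')
  .onCreate(async (snapshot) => {
    const data = snapshot.data();

    if (!validateEmployee(data)) {
      // Mark the pending write as failed so the client can surface an error.
      return snapshot.ref.update({ isFailed: true });
    }

    // Overwrite the employee document that the client is allowed to read.
    return db.collection('employees').doc(data.employeeId).set(data);
  });

function validateEmployee(data) {
  // Placeholder validation: require an employeeId and a non-empty name field.
  return typeof data.employeeId === 'string' && !!data.name;
}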