I’m modeling data and have a question about the onSnapshot (web) listener. As pointed out in a couple posts on SO and in the docs, after the initial invocation, the listener only fetches the changed data. I am interested to know what the changed data is. If listening on a document, is it just the field or the entire document that is fetched?
In a scenario where we have a listener on a Document, and the value of a field on that document changes (or a field is added, or removed), is only that field fetched? In other words, is this similar to placing a child_changed/added/removed listener on a node in the RTDB?
The intent is to determine if I should keep frequently changing Documents, which clients must listen to, in RTDB or Firestore. I prefer not to resend the entire document to the client due to only a field change, if possible.
Example. We have the following document:
rando_id:
field1
field2
field3
If field2's value changes, will only field2 be the transmitted data from Firestore DB to the client? The same would apply to adding a field4 or removing field1. Would just those fields be sent to the client?
The unit of storage in Firestore is the document. There are no more granular ways to transmit data. There is no API to tell what exactly has changed in a document - you would have to determine that yourself using a prior snapshot, if available. You also can't target document fields in security rules. With documents in Firestore, it's either all or nothing.
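To illustrate the "determine that yourself using a prior snapshot" part, here is a minimal sketch of a client-side diff. It assumes you keep the previous result of snapshot.data() around and that the values are plain JSON-like data; the names DocData, FieldDiff and diffFields are made up for this example and are not part of the Firestore SDK.

```typescript
// Plain-object stand-in for the value returned by DocumentSnapshot.data().
type DocData = Record<string, unknown>;

interface FieldDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare the previous and current document data field by field.
// Values are compared via JSON.stringify, which is good enough for a sketch
// with plain JSON-like values but not for Timestamps, references, or
// objects whose keys arrive in a different order.
function diffFields(prev: DocData, next: DocData): FieldDiff {
  const diff: FieldDiff = { added: [], removed: [], changed: [] };
  for (const key of Object.keys(next)) {
    if (!(key in prev)) {
      diff.added.push(key);
    } else if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      diff.changed.push(key);
    }
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) {
      diff.removed.push(key);
    }
  }
  return diff;
}

// Example mirroring the question: field2 changes, field4 is added, field1 is removed.
const before = { field1: "a", field2: "b", field3: "c" };
const after = { field2: "B", field3: "c", field4: "d" };
const result = diffFields(before, after);
console.log(result);
```

Note that this only tells the client what changed after the fact. The whole document is still what gets transmitted.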
I use this code to get a collection snapshot from Firestore.
firestore().collection('project').where('userID', '==', authStore.uid).onSnapshot(onResult, onError);
This returns a huge amount of data, but I only need a few fields. Is it possible to query only a specific field? For example, if I only need the projectName and the creationDate fields.
Is it possible to query only a specific field?
No, that is not possible. A Firestore listener fires on the document level. This means that you'll always get the entire document.
For example if I only need the projectName and the creationDate fields.
You cannot get the values of only a specific set of fields. It's the entire document or nothing. If, however, you only need to read those values and nothing more, then you should consider storing them in a separate document. This practice is called denormalization, and it's common with NoSQL databases.
You might also consider using the Firebase Realtime Database for the duplicated data.
I have some documents in Firestore with two Timestamp fields named lastUpdated and lastProcessed in addition to other fields. lastUpdated field is updated when a user updates the record's fields via web console. lastProcessed field is updated when the backend function processes the document (as a result of user clicking a button).
Following are the possible combinations of these 2 fields.
1. User has only updated the document, but yet to process (lastUpdated == some_timestamp, lastProcessed == '')
2. User has updated the document, and then processed (lastUpdated < lastProcessed)
3. User has updated the document, processed and re-updated (lastUpdated > lastProcessed)
My requirement is to execute a query to get a subset of these records (say the top 10), ordered by their most recent timestamp. So when evaluating a record for the ordering, the lastUpdated field should be considered for scenarios 1 and 3 above, but the lastProcessed field should be considered for scenario 2.
Is this possible with Firestore?
When querying the Firestore database it is not possible to execute the logic you explain in your question (i.e. calculate on the fly the scenario to be applied and define which field shall be used in the query).
One classical solution is to add an extra field to the document which contains the value to be queried for. The value of this field can be calculated (according to the business logic) when you modify the document from your frontend, or via a Cloud Function triggered in the backend each time the doc is changed.
The main advantage of using a Cloud Function is to prevent users modifying the value of this field.
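As a rough sketch of how that extra field could be computed: the three scenarios in the question boil down to "take the later of the two timestamps, treating a missing lastProcessed as absent". The field name latestActivity and the use of epoch milliseconds instead of Firestore Timestamp objects are assumptions made for this example.

```typescript
interface RecordTimes {
  lastUpdated: number;          // epoch millis, stand-in for a Firestore Timestamp
  lastProcessed: number | null; // null when the document was never processed
}

// Value to store in the extra (hypothetical) latestActivity field.
// Scenario 1: never processed        -> lastUpdated
// Scenario 2: processed after update -> lastProcessed
// Scenario 3: updated after process  -> lastUpdated
function latestActivity(doc: RecordTimes): number {
  if (doc.lastProcessed === null) {
    return doc.lastUpdated;
  }
  return Math.max(doc.lastUpdated, doc.lastProcessed);
}

// Local illustration of the ordering the stored field would give you.
const records: (RecordTimes & { id: string })[] = [
  { id: "a", lastUpdated: 100, lastProcessed: null }, // scenario 1
  { id: "b", lastUpdated: 200, lastProcessed: 300 },  // scenario 2
  { id: "c", lastUpdated: 500, lastProcessed: 400 },  // scenario 3
];

const top2 = [...records]
  .sort((x, y) => latestActivity(y) - latestActivity(x))
  .slice(0, 2)
  .map((r) => r.id);
console.log(top2);
```

Once the value is written to the document on every change, the query itself can simply order by that one field in descending order and limit the result to 10.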
I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call the table version. In a transaction I would create a new row in the version table, then do the updates to the other table (including updating the version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality where you can control change feed, for specific operations such as updates only and not inserts is not yet available. You can add a “soft marker” on the item for updates and filter based on that when processing items in the change feed. Currently change feed doesn’t log deletes. Similar to the previous example, you can add a soft marker on the items that are being deleted, for example, you can add an attribute in the item called "deleted" and set it to "true" and set a TTL on the item, so that it can be automatically deleted. You can read the change feed for historic items, for example, items that were added five years ago. If the item is not deleted you can read the change feed as far as the origin of your container.
So the change feed on its own does not meet your requirements.
My idea:
Use an Azure Functions Cosmos DB trigger to collect all the operations in your specific Cosmos DB collection. Follow this document to configure the input of the Azure Function as Cosmos DB, then follow this document to configure the output as Azure Queue storage.
Get the ids of the changed items and send them into Queue storage as messages. When you want to query the changed items, just read the messages from the queue to consume them at a specific interval, and after that clear the entire queue. No items will be missed.
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (such as a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hub and store checkpointing information (epoch number, sequence number and offset value) in a blob. Only one function at a time can take a lease on that blob.
If you go with the Change Feed, you can create a listener (Function or Job) that listens for all added/updated data in the collection and stores those values in another collection. While saving the data, you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.
Let's say I have an employees collection where I have one document per employee, and I want to keep a record of all changes that were made to a single employee doc. I was thinking of the following approach:
Have a pendingEmployeeWrites collection where client is only allowed to create documents. Each doc here will have an employeeId field (this id is generated on client side for new employees).
A Cloud Function will be invoked whenever such a doc is created, and it then validates the data. If valid, the employeeId doc in the employees collection is overwritten with this data. Otherwise the pendingEmployeeWrites doc is updated to set isFailed to true. The client app is only allowed to read from the employees collection.
Keeping pendingEmployeeWrites as a flat collection instead of a sub-collection allows me to pull all changes made by a user as well as all changes for a particular document. Does this approach make sense or is there a better approach that I should consider?
I'd like my web app router slugs to correspond to my Firestore documents data.
For example:
www.mysite.com/restaurants/burger-king
/restaurants <- Firestore Collection
/restaurants/mcdonalds <- Firestore Document
/restaurants/burger-king <- Firestore Document
This is easy enough, as I can assign the name as a slug-friendly UID in Firestore. The difficulty arises with CRUD functionality. I need to be able to rename my item titles, but Firestore does not permit you to rename indexes, which is the issue I'm facing.
One SO solution I saw was to delete the old record and create a new one at the updated index. That's problematic for me, because sub-collections would be hard to transfer from the client side.
Are there more elegant solutions?
You don't have to identify a document by its ID. If you're able to ensure uniqueness of a document field value, you could instead query a collection for an ID value in a known field, then use the results of that query to satisfy your REST API. Then, you can change the value of that document field as often as you want, in order to satisfy required changes to the public API.
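Here is a small in-memory sketch of that idea, with an array standing in for the restaurants collection. The slug field name and the findBySlug/renameRestaurant helpers are hypothetical; against Firestore itself, the lookup would be an equality query on the slug field rather than an array search, and uniqueness would have to be enforced by your own write logic.

```typescript
interface Restaurant {
  id: string;    // stable document ID, never changes, sub-collections hang off this
  slug: string;  // public URL segment, renameable, assumed unique
  title: string;
}

// Array standing in for the restaurants collection.
const restaurants: Restaurant[] = [
  { id: "r1", slug: "mcdonalds", title: "McDonald's" },
  { id: "r2", slug: "burger-king", title: "Burger King" },
];

// Resolve a router slug to a document by field value instead of by document ID.
function findBySlug(docs: Restaurant[], slug: string): Restaurant | undefined {
  return docs.find((d) => d.slug === slug);
}

// Rename by updating fields in place; the document ID stays the same.
// Returns false if the document is missing or the new slug is already taken.
function renameRestaurant(
  docs: Restaurant[],
  id: string,
  newSlug: string,
  newTitle: string
): boolean {
  if (docs.some((d) => d.slug === newSlug && d.id !== id)) {
    return false;
  }
  const doc = docs.find((d) => d.id === id);
  if (!doc) {
    return false;
  }
  doc.slug = newSlug;
  doc.title = newTitle;
  return true;
}

const renamed = renameRestaurant(restaurants, "r2", "bk", "BK");
const lookedUp = findBySlug(restaurants, "bk");
console.log(renamed, lookedUp);
```

Because the ID never changes, nothing has to be deleted or recreated on a rename, so sub-collections stay where they are.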