I have read the Firebase docs about priorities but I don't think I understand it yet.
I think I understand that it is related to querying and sorting data. To give my question (and the answers) some weight, in what instances might you use priorities?
From the docs I read that you can set a priority when you set a value at some reference, and then when you query that reference, the priority determines the ordering based on its type and value. That makes some sense, but I'm not quite understanding it.
Disclosure: I work for Firebase.
Priorities are an optional (numeric or alphanumeric) value on each node, used to sort the children under a specific parent or the results of a query if no other sort condition is specified. The priority of a node is hidden from most views of the data. Where a priority has been set on a node, it shows up as a .priority property in the exportVal() of a snapshot.
Since Firebase added the ability to order children on a specified property, priorities have lost most of their value. They are a left-over artifact from the time before Firebase had orderByChild queries. If you are starting on a Firebase project today, you should use orderByChild instead of relying on priorities.
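To make the difference concrete, here is a minimal sketch (hypothetical /songs node and field names, legacy namespaced JavaScript SDK) contrasting priority-based ordering with the orderByChild equivalent:

var ref = firebase.database().ref();
var songId = 'song1';
var songData = { title: 'Example' };
var timestamp = Date.now();

// Old approach: the sort value lives in the hidden priority.
ref.child('songs').child(songId).setWithPriority(songData, timestamp);
ref.child('songs').orderByPriority().on('child_added', function(snap) {
  console.log(snap.key, snap.exportVal()['.priority']);
});

// Preferred approach: store the sort value as a named child and query on it.
ref.child('songs').child(songId).set(Object.assign({}, songData, { timestamp: timestamp }));
ref.child('songs').orderByChild('timestamp').on('child_added', function(snap) {
  console.log(snap.key, snap.val().timestamp);
});

The named-property version keeps the sort key visible in the data itself, which is why it's the recommended route today.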
Related
This article here recommends using the eventId as the document id to prevent multiple creations of a document due to background process retries. Is it guaranteed that there will never be a collision?
The article in question shows how to avoid duplicate items created by retries of an unsuccessful function. In short, it says that if you use the add method (reference) and the function is retried (but failed after the Firestore write), you may end up with 2 identical documents in Firestore, each with a different auto-generated ID.
As a solution, the author proposes building the document ID from the eventID and writing to it with set (reference).
This approach guarantees that retries of the same function invocation will not create duplicate items.
Back to the question... I think you are afraid that 2 different invocations will have the same event_id and the document could be overwritten. I think this is possible, but in my opinion it's outside the scope of the article, which answers a different question and keeps the use case as simple as possible to explain the approach.
Let's imagine we have two different functions invoked by the same event, each writing different content to the same collection. The result will be unpredictable, I think. However, in such a situation you can use the same mechanism, slightly upgraded, e.g. <function_name>_<event_id>. Using the example from the article, it's a small change:
...
return db.collection('contents').doc('<function_name>_'+eventId).set(content).then
...
So in my understanding, if you are afraid of collisions you should add additional elements to the created document references, like in the example above.
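Putting the pieces together, here is a sketch of the whole idea as a Cloud Function (v1 API); the function name recordUpload and the stored fields are hypothetical:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Background function: context.eventId is stable across retries of the same
// event, so set() on a deterministic document id makes the write idempotent.
exports.recordUpload = functions.storage.object().onFinalize((object, context) => {
  const eventId = context.eventId;
  const content = { name: object.name, bucket: object.bucket };
  // Prefix with the function name so two functions reacting to the same event
  // do not overwrite each other's documents.
  return db.collection('contents').doc('recordUpload_' + eventId).set(content);
});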
From my point of view, whether you can use an event_id as a Firestore document id depends on your context and requirements.
For example, from the "business" point of view: is the message/event really a unique business-related thing (so you really want to avoid duplication of messages)? Or is there some other business entity which must be unique, but about which there can be more than one message (with different event_ids)?
On top of that, to the best of my knowledge, it may be good practice to generate Firestore document ids randomly (as a hash, a GUID, etc.). In that case, search/retrieval from Firestore should work "faster". So I don't know whether the event_id is "random" enough in your context. Maybe it is OK, maybe not...
In my personal experience, I try to generate a document id as the hex digest of a hash of a string (possibly a composed string) which is supposed to be unique in the business context. For example, say the event/message is a google.storage.object.finalize event. In that case, I would use some metadata about the underlying object/file. Depending on the business context and requirements, that could (or could not) be the bucket name, object name, size, md5 or crc32c, etc., or a combination of those elements... The chosen elements are concatenated into a string, a hash is calculated over it, and the hex digest of that hash becomes the document id in the Firestore collection.
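As an illustration of that last approach, a sketch in Node.js (the chosen metadata fields are just an example, not a recommendation):

const crypto = require('crypto');

// Build a string that is unique in your business context (here: bucket,
// object name and generation from a storage.object.finalize event) and use
// the hex digest of its hash as the Firestore document id.
function docIdForObject(object) {
  const uniqueString = [object.bucket, object.name, object.generation].join('/');
  return crypto.createHash('sha256').update(uniqueString).digest('hex');
}

// Usage: db.collection('contents').doc(docIdForObject(object)).set(content)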
I have an entity that represents a relationship between two entity groups but the entity belongs to one of the groups. However, my queries for this data are going to be mostly with the other entity group. To support the queries I see I have two choices a) Create a global index that has the other entity group key as prefix b) Move the entity into the other entity group and create ancestor index.
I saw a presentation which mentioned that ancestor indexes map internally to a separate table per entity group while there is a single table for the global index. That makes me feel that ancestors are better than using global indexes which includes the ancestor keys as prefix for this specific use case where I will always be querying in the context of some ancestor key.
Looking for guidance on this in terms of performance, storage characteristics, transaction latency and any other architectural considerations to make the final call.
From what I was able to find, I would say it depends on the type of work you'll be doing. I looked at these docs and they suggest you avoid writing to an entity group more than once per second. Also, indexing a property can result in increased latency, and the docs state that if you need strong consistency for your queries, you should use an ancestor query. Those docs contain a lot of advice on how to avoid latency and other issues; they should help you make the call.
I ended up using a third option, which is to denormalize another copy of the entity into the other entity group and run ancestor queries against it. This lets me efficiently query data from either of the entity groups. Since I was using transactions already, denormalizing doesn't cause any inconsistencies and everything seems to work well.
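A sketch of how such a denormalized pair plus ancestor queries might look with the Node.js @google-cloud/datastore client (the kind names are made up; this only illustrates the approach, not the exact code used):

const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

// Write the relationship twice inside one transaction, once under each
// parent, so ancestor queries work from both sides.
async function saveMembership(groupAId, groupBId, data) {
  const tx = datastore.transaction();
  await tx.run();
  tx.save([
    { key: datastore.key(['GroupA', groupAId, 'Membership', groupBId]), data },
    { key: datastore.key(['GroupB', groupBId, 'Membership', groupAId]), data },
  ]);
  await tx.commit();
}

// Strongly consistent ancestor query from the "other" side of the relationship.
async function membershipsForB(groupBId) {
  const query = datastore
    .createQuery('Membership')
    .hasAncestor(datastore.key(['GroupB', groupBId]));
  const [entities] = await datastore.runQuery(query);
  return entities;
}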
The process to create/remove a composite index using the Cloud Datastore emulator is straightforward (here, here and here), but I couldn't find any way to exclude the literally hundreds (or even thousands) of non-composite indexes that Datastore generates automatically.
Is there any method to do so?
Note: I didn't use the standalone emulator yet, only the datastore emulation bundled inside the local development server for standard environment GAE apps, but I presume the implementation is similar.
The emulator replicates the real datastore behaviour. And the datastore creates these built-in indexes for all indexed properties. From Indexes:
Built-in indexes

By default, Cloud Datastore automatically predefines an index for each property of each entity kind. These single property indexes are suitable for simple types of queries.
The built-in indexes are also used for the datastore/admin GQL queries and for sorting entities in the datastore admin pages, both on GCP and in local emulation.
The only way to prevent creation of built-in indexes is to mark the respective properties as unindexed (or excluded). From Excluded properties:
If you know you will never have to filter or sort on a particular property, you can tell Cloud Datastore not to maintain index entries for that property by excluding it from indexes. This lowers the cost of running your application by reducing the storage size needed for index entries. An entity with an excluded property behaves as if the property were not set: queries with a filter or sort order on the excluded property will never match that entity.
Note: If a property appears in a composite index, then excluding the property will disable it in the composite index. For example, suppose that an entity has properties priority and done and that you want to create an index able to satisfy queries like WHERE priority = 4 AND done = FALSE. Also suppose that you don't care about the queries WHERE priority = 4 and WHERE done = FALSE. If you exclude priority from indexes and create an index for priority and done, Cloud Datastore will not create index entries for the priority and done index and so the WHERE priority = 4 AND done = FALSE query won't work. For Cloud Datastore to create entries for the priority and done indexes, both priority and done must be indexed.
Also note that excluded properties can't be used in projection queries.
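For illustration, a sketch of marking a property as excluded, assuming the Node.js @google-cloud/datastore client (on GAE standard you would do the equivalent by declaring the property unindexed in your model definitions); the kind and property names are made up:

const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

// Properties listed in excludeFromIndexes get no built-in (single-property)
// index entries, so they can no longer be filtered or sorted on.
function saveSong() {
  return datastore.save({
    key: datastore.key(['Song', 'some-song']),
    excludeFromIndexes: ['lyrics'],
    data: {
      title: 'Example',             // still indexed: can be filtered / sorted on
      lyrics: 'very long text ...', // excluded: no built-in index entries
    },
  });
}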
I have a legacy Firebase project I contribute to. In it I have the following rules for the songs resource:
"songs": {
".indexOn": ["artist_timestamp"]
},
Which allows me to do things like curl http://my-fire-base-ref/songs.json?orderBy="artist_timestamp"
However, I can also do orderBy="$priority", which is a property we add to all song objects. This works even though it is not explicitly in the rules JSON definition. Is this a secretly allowed property??
The .priority of each node is implicitly indexed, so you don't need to define an index for it.
Why are you using priorities though? While they still work, using named properties allows you to accomplish the same with more readable code. See What does priority mean in Firebase?
According to the documentation for indexing data:
Firebase provides powerful tools for ordering and querying your data. Specifically, Firebase allows you to do ad-hoc queries on a collection of nodes using any common child key. As your app grows, the performance of this query degrades. However, if you tell Firebase about the keys you will be querying, Firebase will index those keys at the servers, improving the performance of your queries.
This means you can order by any key at any time without specifying it as an index, but without a specific index specified for a key, performance may be very bad for large sets of data.
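To make that concrete, here is a small sketch (reusing the songs example from the question; key names are illustrative and this assumes the legacy namespaced JavaScript SDK) of declaring the index in the rules and then ordering by the same child key from a client:

// database rules: declare the index so the server can sort/filter efficiently
"songs": {
  ".indexOn": ["artist_timestamp"]
}

// client query that benefits from that index
firebase.database().ref('songs')
  .orderByChild('artist_timestamp')
  .limitToLast(20)
  .on('child_added', function(snap) {
    console.log(snap.key, snap.val().artist_timestamp);
  });

Without the .indexOn entry the same query still works, but the sorting happens without server-side index support, which is what degrades as the data set grows.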
I want to maintains user comments as an ordered list, based on the server write-time. A couple of questions regarding that:
1. Can I create (set / push) a location based on ServerValue.TIMESTAMP?
2. Can I validate (1) with rules?
3. Can I safely assume that no two posts will be written at the exact "same moment", overriding each other?
4. If I decide to use setWithPriority, and not order by the location name, can I use ServerValue.TIMESTAMP for the priority (and validate it with rules)?
5. "Bonus" question: is there still no way to create a COUNT query in Firebase?
=== EDIT ===
I'm trying to achieve a chat-like feature. Messages must be ordered chronologically (by a server-side timestamp) in order to maintain a logical order for the "discussion" (if I use a client-generated order, then one local machine's clock offset could ruin the entire discussion). I can use rules to validate that a ServerValue.TIMESTAMP field is persisted with every new message; however, I can't seem to find a way to make sure that clients actually use setWithPriority() when persisting data. I can't figure out any way to do this - am I missing something?
1. You can either .push(), which generates an auto ID as the key, or .setWithPriority(), where your key can be anything and the priority can be pretty much anything as well. As far as I'm aware, there is no option to have ServerValue.TIMESTAMP as a key. The only way is to retrieve the server time and explicitly set it as the key with .child(retrievedTime).set(someData).
2. See 1).
3. Not sure what you mean exactly, since there is no option to set a server value as a key, but IMHO this is one of the reasons why it is so.
4. You can use some field in the database, set it to ServerValue.TIMESTAMP and listen for the change - in the callback you will get the most current server time pretty much ASAP, as the placeholder is replaced with the timestamp. However, there is no guarantee that it will be unique. It is perfectly legitimate to have two records with the same priority.
5. It is promised, but not yet available. You can do it manually with transactions on every write/delete, see the docs.
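As a sketch of the push() + ServerValue.TIMESTAMP approach for the chat use case (the messages path and sentAt field name are illustrative, legacy namespaced JavaScript SDK):

// Write: push() gives a chronologically ordered key; the server fills in sentAt.
var ref = firebase.database().ref('messages');
ref.push({
  text: 'hello',
  sentAt: firebase.database.ServerValue.TIMESTAMP
});

// Read: order by the server-side timestamp instead of by priority.
ref.orderByChild('sentAt').on('child_added', function(snap) {
  console.log(snap.val().sentAt, snap.val().text);
});

// Rules (sketch): require sentAt to be the server's "now" on every new message,
// so clients cannot back-date or forward-date their messages.
// "messages": {
//   ".indexOn": ["sentAt"],
//   "$messageId": {
//     ".validate": "newData.child('sentAt').val() === now"
//   }
// }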