I need to create a document inside a transaction.
The transaction object doesn't have an add() function, probably because it would create duplicate documents if Firestore retried the transaction.
The only way I can think of is to use a GUID as the document ID, but GUIDs make for very long index keys to look up.
Is there any way around this, or another approach?
I wouldn't be too concerned about the length of a document ID. I don't think you'll find that performance suffers if you use a GUID as an ID. If you're concerned about size, only you can compute how much of your storage is consumed by IDs.
If you want to limit the size of a random document ID, you can simply generate your own random data and convert it to a string that follows the rules for Firestore document IDs. It could be something as simple as generating X random letters and concatenating them.
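Here is what that could look like. This is a minimal sketch that assumes Node.js (`crypto.randomInt`), an alphabet of ASCII letters and digits, and a default length of 12. The name `shortRandomId` is hypothetical. Limiting the alphabet to letters and digits keeps the ID within the Firestore document ID rules: no `/`, never `.` or `..`, and never matching `__.*__`.

```javascript
const { randomInt } = require("crypto");

// Letters and digits only: the ID can never contain "/", be "." or "..",
// or match the reserved __.*__ pattern.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Returns a random document ID of `length` characters drawn uniformly from ALPHABET.
function shortRandomId(length = 12) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += ALPHABET[randomInt(ALPHABET.length)];
  }
  return id;
}
```

With 62 symbols, 12 characters give 62^12 ≈ 3.2 × 10^21 possible IDs. Each character you drop divides that by 62, so a shorter ID means a higher chance that two documents collide. Inside the transaction you would use the generated string as the document ID, for example with `collection.doc(id)` followed by `transaction.set(...)`.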
Assume I am designing a new Firestore database. Assume I like the idea of a hierarchical design and, as a contrived example, each Year has a sequence of child Weeks of which each has Days.
What's the most performant way to retrieve a single document for today, i.e. 2021-W51-Thursday?
Answers are permitted to include changes to the model, e.g. "denormalizing" the day model such that it includes year, week and dayName fields (and querying them).
Otherwise a simple document reference may be the fastest way, like:
DocumentReference ref = db
.Collection("years").Document("2021")
.Collection("weeks").Document("51")
.Collection("days").Document("Thursday");
Thanks.
At the scale Firestore operates at, any query that identifies a single document to fetch performs the same as any other query that does so. The organization of collections or documents doesn't matter at all at scale. You might see some fluctuations in performance at small scale, depending on your data set, but that's not how Firestore is optimized to work.
All collections and all subcollections each have at least one index on the ID of the document that works the same way, independent of each other collection and index. If you can identify a unique document using its path:
/db/XXXX/weeks/YY/days/ZZZZ
Then it scales the same as a document stored using a more flat structure:
/db/XXXXYYZZZZ
It makes no difference at scale, since indexes on collections scale to an unlimited number of documents with no theoretical upper limit. That's the magic of Firestore: if the system allows the query, then it will always perform well. You don't have to worry about scaling and performance at all. The indexes are automatically sharded across computing resources to optimize performance (vs. cost).
All of the above is true for fields of a document (instead of a document ID). You can think of a document ID as a field of a document that must be unique within a collection. Each field has its own index by default, and it scales massively.
With NoSQL databases like Firestore, you should structure your data in such a way that eases your queries, as long as those queries can be supported by indexes that operate at scale. This stands in contrast with SQL databases, which are optimized for query flexibility rather than massive scalability.
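Whichever layout you choose, the client still has to work out which document is "today". This minimal sketch assumes ISO-8601 week numbering computed in UTC, since the question doesn't say which week scheme or time zone it uses. The function names are hypothetical. The hierarchical path mirrors the `years/weeks/days` layout from the question, and the flat ID is the denormalized `2021-W51-Thursday` form.

```javascript
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Returns { isoYear, isoWeek, dayName } for a date, using UTC and ISO-8601 week rules.
function isoWeekParts(date) {
  // Work on a copy at UTC midnight of the same calendar day.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayName = DAY_NAMES[d.getUTCDay()];
  // ISO weekday: Monday = 1 ... Sunday = 7.
  const isoWeekday = d.getUTCDay() || 7;
  // Move to the Thursday of the same ISO week; its calendar year is the ISO year.
  d.setUTCDate(d.getUTCDate() + 4 - isoWeekday);
  const isoYear = d.getUTCFullYear();
  const yearStart = Date.UTC(isoYear, 0, 1);
  const isoWeek = Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
  return { isoYear, isoWeek, dayName };
}

// Hierarchical path, e.g. "years/2021/weeks/51/days/Thursday".
function hierarchicalDayPath(date) {
  const { isoYear, isoWeek, dayName } = isoWeekParts(date);
  return `years/${isoYear}/weeks/${isoWeek}/days/${dayName}`;
}

// Flat, denormalized document ID, e.g. "2021-W51-Thursday".
function flatDayId(date) {
  const { isoYear, isoWeek, dayName } = isoWeekParts(date);
  return `${isoYear}-W${String(isoWeek).padStart(2, "0")}-${dayName}`;
}
```

You can use either string as a document path or ID for a single direct get. As the answer above explains, both forms perform the same at scale.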
Let's say we have an orders collection in Firestore where each order needs a unique, readable, random order number of, say, 8 digits:
{
orderNumber: '19456734'
}
So for every incoming order we want to generate this unique number. What is the recommended approach in Firestore to make sure no other document is using it?
Note: one solution would be to query existing docs before saving, but this doesn't work in a concurrent scenario where multiple orders arrive at the same time (?).
The easiest to guarantee that some value is unique in a collection, is to use that value as the key/ID for the documents in that collection. Since keys/IDs are by definition unique in their collection, this implicitly enforces your requirement.
The only built-in way to generate unique IDs is by calling the add() method, which generates a UUID for the new document. If you don't want to use UUIDs to identify your orders, you'll have to roll your own mechanism.
The two most common approaches:
Generate a unique number and check if it's already taken. You'd do this in a transaction of course, to ensure no two instances can claim the same ID.
Keep a global counter (typically in a document at a well-known location) of the latest ID you've handed out, and then read-increment-write that in a transaction to get the ID for any new document. This is typically what other databases do for their built-in auto-ID fields.
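Both approaches can be sketched in plain JavaScript. The sketch only illustrates the logic. A `Set` stands in for the orders collection and a plain object stands in for the counter document, and all function names are hypothetical. In real code the check-then-write (or the read-increment-write) must run inside a Firestore transaction. The transaction is what stops two concurrent orders from claiming the same number.

```javascript
const { randomInt } = require("crypto");

// Approach 1: pick a random 8-digit number and check whether it is taken.
// `takenIds` stands in for the orders collection (order number = document ID).
// In Firestore, the existence check and the write must happen in one transaction.
function claimRandomOrderNumber(
  takenIds,
  nextRandom = () => randomInt(10000000, 100000000), // 8 digits, no leading zero
  maxAttempts = 10
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = String(nextRandom());
    if (!takenIds.has(candidate)) {
      takenIds.add(candidate); // corresponds to the transactional write
      return candidate;
    }
  }
  throw new Error("No free order number found; try again");
}

// Approach 2: a global counter document at a well-known location.
// `counter` stands in for that document; in Firestore this read-increment-write
// would run inside a transaction.
function claimNextOrderNumber(counter) {
  counter.value += 1;
  return String(counter.value).padStart(8, "0");
}
```

A single counter document pushes every new order through one document, so it can become a bottleneck if orders arrive at a high rate. The random approach spreads the writes out, but it needs a retry whenever a number is already taken.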
Update:
TL;DR:
If you've reached this question, you should reconsider how you structure your DB.
Your document(s) probably grow over time (due to nested lists, etc.).
Original question:
I have a collection of documents that have a lot of fields. I don't query these documents at all, not even with simple queries.
To read the docs, I only use
db.collection("mycollection").doc(docName).get().then(....);
so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically, and because of the number of fields, the indexing limits are exceeded:
If I try to add a field to one of the documents, it throws an error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field indexes or to tell Firestore to stop generating them.
I found this in the Firestore console:
but there is no way to disable this, nor to disable automatic indexing for a specific collection.
Is there any way to do it?
You can delete simple indexes in Firestore.
See this answer for more up-to-date information on creating and deleting indexes: "Firestore composite index permutation explosion?"
If you go into Indexes after selecting the Firestore database and then select "single" indexes, there is an Add exemption button that lets you specify which fields in a collection (or subcollection) get simple indexes generated by Firestore. You have to specify the collection followed by the field. You then have to specify every field individually, because you cannot specify a whole collection. There does not seem to be any check for valid collection or field names.
The only way I can think of to check that this has worked is to run a query on the field; it should fail.
I do this for large string fields containing ordinary text. They would take a long time to index, and I know I will never search on them.
Firestore creates two indexes for every simple field (ascending and descending). You can also create an exemption that removes one of them if you will never need the other. This helps performance and makes it less likely that you hit the index limits. You can also choose whether arrays are indexed. If you add a lot of entries to an array, you can very quickly hit Firestore's limit on the number of index entries, so be careful when indexing arrays. It is often best to turn indexing off for arrays. The designer may have no control over how many items get added, and once the maximum index limit is reached, the application gets an error, as the original poster described.
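To get a feel for how quickly arrays eat into the index budget, here is a rough estimator. It is a sketch based only on the simplified rules described in this answer: two entries per scalar field, one per array element, and map subfields counted like top-level fields. It is not Firestore's exact counting algorithm. `estimateIndexEntries` and its `exemptFields` parameter are hypothetical names, and the sketch assumes plain JSON-like data.

```javascript
// Rough, simplified estimate of single-field index entries for one document.
// Assumptions (not Firestore's exact rules): scalar field = 2 entries (asc + desc),
// array = 1 entry per element (duplicates are over-counted), maps are recursed into.
// `exemptFields` holds dotted field paths whose indexing was disabled by an exemption.
function estimateIndexEntries(doc, exemptFields = new Set(), prefix = "") {
  let total = 0;
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (exemptFields.has(path)) continue; // exempted: no index entries
    if (Array.isArray(value)) {
      total += value.length;
    } else if (value !== null && typeof value === "object") {
      total += estimateIndexEntries(value, exemptFields, path);
    } else {
      total += 2;
    }
  }
  return total;
}
```

For example, a document with a `title` field and a 10,000-element `tags` array is already estimated at 10,002 entries. Exempting `tags` drops the estimate to 2.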
You can also remove any simple indexes you are not using, even if the field is included in a composite index. The composite index will still work.
Other things to keep an eye on.
If you index a timestamp field (or any field whose value increases or decreases sequentially between documents), the collection is limited to a maximum write rate of 500 writes per second. If you don't use that field to order query results, you can lift this limit by exempting the field from its ascending and descending indexes.
Note that, unlike in the Realtime Database, document IDs created with auto-ID do not guarantee any ordering. Firestore generates them to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means a timestamp is often needed for ordering, but you may be able to design your collection/subcollection layout so you don't need one. For example, if you use a timestamp to find the last document added to a collection, it might be better to simply store the ID of the last document added.
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing (see screenshot below).
Once you have added an exemption, you will see this screen.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is that you can't do that with Firebase right now. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1 MiB maximum document size.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as scalable as other options, especially if your data expands over time. With larger or growing lists, the document also grows, which can lead to slower document retrieval times.
See this page for more information about the advantages and disadvantages of the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure data you are trying to save?
A workaround for your problem would be to create composite indexes. As a last resort, Firestore may simply not suit your app's needs, and the Firebase Realtime Database may be a better fit.
See tradeoffs:
RTDB vs Firestore
I don't believe the switch you're looking for currently exists, so I think that leaves the following options:
Globally disable built-in indexes and create all indexes explicitly. This is painful, and those indexes have limits too.
A workaround where you treat your Cloud Firestore-unfriendly content as a BLOB, like so:
To store,
const objIn = { text: 'my object with a zillion fields' };
// Serialize the whole object into one string, so Firestore stores a single field instead of many
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };
To retrieve,
const objOut = JSON.parse(container.content);
We have an application that allows users to "follow" other users. When a user follows another, we register this data as a document within documentDB, like this:
{
"followerId": "userUUID",
"artistId": "artistUserUUID"
}
We now want to get a list of artists ordered by their follower count. So I'm looking for a way to ask the DB to use these documents to return an array of artistUserUUIDs, ordered by the number of followers each artist has (as expressed in documents like the example above).
Alternatively, we are also open to adding an array property to the artistUser's own document. Even then, I'm still unsure how to ORDER BY the count of a document's property (in this case, an array of follower IDs).
I guess a workaround would be to add a stored procedure or trigger that updates a counter property within the artistUser document, but I'd like to check whether there is a way to implement this counting feature natively without such a trick.
Unless you denormalize the follower count into artist user documents (as you suggest), you'll have to fetch every follower document to accomplish your goal. Whether that is prohibitive depends on how many there are. If you fetch them only into a stored procedure rather than into your actual client, it's conceptually no less efficient than an SQL GROUP BY clause. Design your stored procedure to do the count and return only the table of artists and counts. A robust implementation would update your output table incrementally in pages and be able to restart where it left off after a stored procedure timeout. For how I typically do this, see my countDocuments example stored procedure in documentdb-mock and my "Pattern for writing stored procedures" in the documentation for documentdb-utils.
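The counting step itself is a simple group-and-count. Here is a minimal sketch in plain JavaScript, which is also the language DocumentDB stored procedures are written in. It assumes the follow documents have already been read into an array. It leaves out the server-side query, paging, and continuation logic described above, so it is not a drop-in stored procedure. `rankArtistsByFollowers` is a hypothetical name.

```javascript
// Counts follow documents per artist and returns [{ artistId, followers }],
// sorted by follower count, highest first
// (the equivalent of GROUP BY artistId ORDER BY COUNT(*) DESC).
function rankArtistsByFollowers(followDocs) {
  const counts = new Map();
  for (const { artistId } of followDocs) {
    counts.set(artistId, (counts.get(artistId) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([artistId, followers]) => ({ artistId, followers }))
    .sort((a, b) => b.followers - a.followers);
}
```

If you denormalize instead, the trigger or stored procedure only has to add 1 to (or subtract 1 from) the artist's counter on each follow or unfollow. You can then order by that counter directly.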
I am using a Riak bucket to store a list of messages, with a UUID as the key and a JSON message as the value. This is working fine.
What I need is an efficient way to get a single message from the bucket without knowing its key, at least in one of these two scenarios:
Get the last inserted object (this is my preferred approach).
Get a random object from the bucket (if the first alternative is not possible).
Is there any efficient way to achieve that?
I think one alternative could be to retrieve the keys in the bucket and then get the first one. But this means making two calls to riak, one to obtain all the keys (just to discard all but one) and a second one to obtain the object. It does not seem very efficient.
As Riak is a key-value store, by far the most efficient way to retrieve data is through the keys. Listing or retrieving all keys in a bucket, even if you only use the first one returned, is one of the least efficient operations you can perform. It causes Riak to scan ALL keys in the system (not just the bucket), and the usual recommendation is NEVER to use it on a production system.
The most efficient way to get the last inserted object would probably be to store its ID in a separate, known record in a different bucket. This would require two writes on every insert and two reads for every read, but it would do so in the most efficient way. You could possibly implement a post-commit hook on the bucket containing messages so that the system updates the pointer for you, which would remove the need for the client to do the second write. The hook would have to be written in Erlang, as it is not currently possible to write records from JavaScript functions.
If you write a lot of data to the bucket containing messages, you may want to configure the separate bucket so that it does not allow multiple values and the last write wins. This reduces the risk of creating lots of siblings through frequent updates to this single record across the system. This would always give you one of the most recently written records, but not necessarily the very last one (especially if you write messages to the database frequently), because Riak does not support any kind of atomicity and is an eventually consistent database.
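The pointer-record approach described above can be sketched as follows. This is an illustration only. Two in-memory `Map`s stand in for the messages bucket and the separate pointer bucket, and the key name `last` and the function names are hypothetical. A real implementation would issue puts and gets through a Riak client, or move the pointer update into the Erlang post-commit hook.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical in-memory stand-ins for two Riak buckets.
const messages = new Map(); // bucket "messages": UUID -> JSON message
const pointers = new Map(); // separate bucket holding one well-known key, "last"

// Two writes per insert: the message itself, then the pointer to it.
function insertMessage(message) {
  const key = randomUUID();
  messages.set(key, JSON.stringify(message));
  pointers.set("last", key); // with last-write-wins, concurrent inserts may overwrite each other here
  return key;
}

// Two reads per lookup: the pointer, then the message it names.
function getLastMessage() {
  const key = pointers.get("last");
  return key === undefined ? undefined : JSON.parse(messages.get(key));
}
```

In this single-process sketch the pointer always names the latest message. Under concurrent writes in a real cluster it may instead name one of the most recent messages, as the answer points out.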
You could also create one or more secondary indexes if you are using the LevelDB backend, and use them to limit your scan to recent records only, which would be more efficient than a scan of all keys. You could then select either the most recent key or a random one through MapReduce, but this would be much less efficient than the approach described above.
I can not think of any efficient way to retrieve a random record in a bucket from Riak unless you know the range of keys you have inserted and can decide randomly on the client which one to get. One way to do this would be to generate all keys in sequence rather than using a UUID, but that is naturally not a good idea in a highly concurrent distributed system.
The first task is pretty easy to implement:
Add a post-commit hook that writes the last inserted key to some predefined bucket/key
Get the key from that predefined bucket/key and issue a get query using it
It's still two operations but both are just gets that are fast. Plus additional overhead on hook but nothing too heavy either.
The second scenario is also easy, but it is far too inefficient to use in practice:
Get all keys (extremely expensive operation)
Pick random
Issue get
I have come across the same scenario. In my case I have to save users, and for that I needed an auto-increment ID. As "Christian Dahlqvist" mentioned, I store the last inserted key in a separate bucket. That bucket holds only one value, under the key "LastKey", which is always known to us. Every time I want to insert a new record, I fetch the last inserted key from that key bucket, increment it, and write the incremented key back to the key bucket. So the key bucket always contains the latest key.
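Here is a sketch of that read-increment-write. An in-memory `Map` stands in for the key bucket, and the function names are hypothetical, not part of any Riak client API. As the first answer notes, Riak provides no atomicity. Two clients that both read "LastKey" before either one writes will therefore hand out the same key, and the last lines of the example simulate exactly that interleaving.

```javascript
// Hypothetical in-memory stand-in for the key bucket holding the single known key "LastKey".
const keyBucket = new Map([["LastKey", 0]]);

// Step 1: read the last key handed out.
function readLastKey() {
  return keyBucket.get("LastKey");
}

// Step 2: write back the incremented key and return it as the new record's key.
function writeNextKey(lastKey) {
  const next = lastKey + 1;
  keyBucket.set("LastKey", next);
  return next;
}

// Single client: read, increment, write.
function nextUserKey() {
  return writeNextKey(readLastKey());
}

// Two clients interleaving: both read before either writes.
const seenByA = readLastKey();
const seenByB = readLastKey();
const keyForA = writeNextKey(seenByA);
const keyForB = writeNextKey(seenByB);
console.log(keyForA === keyForB); // true: both users get the same key
```

This works as long as only one writer at a time updates "LastKey". With concurrent writers you need some form of coordination outside Riak, or you can keep UUID keys.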