A limit clarification for the new Firestore - firebase

So in the limits section (https://firebase.google.com/docs/firestore/quotas) of the new Firestore product from Firebase it says:
Maximum write rate to a collection in which documents contain
sequential values in an indexed field: 500 per second
We're pretty confused as to what that actually entails.
If we have, say, a root-level collection called users with 10 million entries in it, will this rate limit affect the collection in such a way that only 500 users can update their data in any given second?
Can anyone clarify?

Sorry for the confusion; an example might help.
If your user documents contained a last-updated timestamp, and you index on that timestamp, then each new write would end up clustering around the same value (now), creating a hotspot in the index.
Similarly, if you somehow assigned users a sequential value, like a place in line, that would also create a hotspot.
Incidentally this is why generated document IDs are random strings. This evenly distributes the writes on the primary key index.
If you avoid these kinds of patterns the sky's the limit, though during beta you'd hit the database-wide limit.
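As an illustration (a minimal sketch using the v8-style Firebase web SDK; the collection and field names are hypothetical), this is the pattern to watch out for: every update stamps an indexed lastUpdated field with the current time, so new index entries for that field all cluster around "now".
db.collection('users').doc(userId).update({
  displayName: newName,
  lastUpdated: firebase.firestore.FieldValue.serverTimestamp() // sequential values hotspot here
});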

A quick additional note: for the moment all properties are indexed by default, so if you had a last-updated timestamp it would necessarily be indexed, and you would not be able to avoid the hotspotting.
Index disablement will be available down the road though.

Related

Large arrays in Firestore Database (Best practices)

I am populating a series of dates and temperatures that I was thinking of storing in a Firestore Database to later be consumed by the front-end with the following structure:
{
  date: ['1920-01-01', '1920-01-02', '1920-01-03', '1920-01-04', '1920-01-05', ...],
  values: [20, 18, 19.5, 20.5, ...]
}
The array may cover many years, so it becomes huge, with thousands of entries. Firestore started complaining with a "too many index entries for entity" error, and even when I get the data uploaded, the console view (Firebase -> Firestore Database -> Panel View) collapses. That happens even with arrays of fewer than 3,000 entries.
The data is consumed on the front-end with an array structure very similar to the one described above (I want to plot it using the Echarts library). That is why I found this structure to be the most natural one, since any alternative would require rebuilding the arrays on the front-end.
Nevertheless, I see that Firestore Database very clearly does not like this structure. What should I do? What is the best practice for dealing with this kind of data in Firestore?
The indexes required for the most basic queries in Firestore are automatically created for you. However, there are some limits involved, and you're getting the error:
too many index entries for entity
because you hit the maximum number of index entries for a document, which is 40,000. If you add too many elements to an array, or too many fields to a document, you can reach that limit.
So most likely the number of elements in the date array plus the number of elements in the values array is bigger than 40k, hence the error.
To solve this, you might consider creating two separate documents, one for each array. If you still hit the maximum limit, then you might consider creating a document for each hour, and not for an entire day. In this way, you'll drastically reduce the number of elements that exist in an array.
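A minimal sketch of that chunking idea (the collection name and per-year granularity are illustrative): one document per year, say, keeps each array far below the index-entry limit.
db.collection('temperatures').doc('1920').set({
  date: ['1920-01-01', '1920-01-02' /* ... */],
  values: [20, 18 /* ... */]
});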
If you don't find these solutions useful, then you have to set some "Single-field index exemptions" to avoid the above error.
Firestore is not the best tool to deal with time series. The best solution I found in Firestore was creating an independent document for each day in my data. Nevertheless, that raises the number of documents I need to fetch from the front-end side and, therefore, the costs.
By using large arrays in Firestore, you easily reach the index limit, and you are forced to remove the index, which I feel is a big red flag, suggesting checking another tool.
The solution I found, in case it is useful for anyone, was building my API in Flask using MongoDB as the database. Although it takes more effort than just using Firestore, it deals better with time series and brings more flexibility.

How can I know if indexing a timestamp field on a collection of documents is going to cause problems?

I saw in the Firestore documentation that it is a bad idea to index monotonically increasing values, and that doing so will increase latency. In my app I want to query posts based on Unix time, which is a double, and that number will increase as time moves on; in my case not perfectly monotonically, though, because people will not be posting every second. In addition, I don't think my app will exceed 4 million users. Does anyone with expertise think this will be a problem for me?
It should be no problem. Just make sure to store it as a number and not as a String. Otherwise the sorting would not work as expected.
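For instance (a minimal sketch, with posts and createdAt as hypothetical names), storing the Unix time as a number keeps range queries and ordering numeric:
db.collection('posts').add({
  text: '...',
  createdAt: Date.now() / 1000 // a number: 9 < 10, whereas as strings '10' < '9'
});
db.collection('posts').orderBy('createdAt', 'desc').limit(20).get();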
This is exactly the problem that the Firestore documentation is warning you about. Your database code will incur a cost of "hotspotting" on the index for the timestamp at scale. Specifically, from that linked documentation:
Creates new documents with a monotonically increasing field, like a timestamp, at a very high rate.
The numbers don't have to be purely monotonic. The hotspotting happens on ranges that are used for sharding the index. The documentation just doesn't tell you what to expect for those ranges, as they can change over time as the index gains more documents.
Also from the documentation:
If you index a field that increases or decreases sequentially between documents in a collection, like a timestamp, then the maximum write rate to the collection is 500 writes per second. If you don't query based on the field with sequential values, you can exempt the field from indexing to bypass this limit.
In an IoT use case with a high write rate, for example, a collection containing documents with a timestamp field might approach the 500 writes per second limit.
If you don't have a situation where new documents are being added rapidly, it's not a near-term problem. But you should be aware that writes just don't scale against that index the way reads and queries will. Note that the number of concurrent users is not the issue at all; it's the number of documents being added per second to an index shard, regardless of how many people are causing the behavior.
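If your write rate does approach that limit, the Firestore documentation suggests sharding the sequential field; here is a rough sketch of that idea (the shard count and field names are illustrative, not from this answer):
const SHARD_COUNT = 5;
db.collection('posts').add({
  createdAt: Date.now() / 1000,
  shard: Math.floor(Math.random() * SHARD_COUNT) // spreads writes across index ranges
});
// Queries then filter on one shard at a time and merge results client-side, e.g.
// db.collection('posts').where('shard', '==', n).orderBy('createdAt', 'desc')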

Should null elements be stored in Cosmos DB or should they be ignored?

Is there a good reason to serialize null elements in a Cosmos DB document or is it better to ignore them?
With the is_defined function I can query for undefined elements similar to how I query for null elements.
Does either consume less RUs? In my tests they seem to perform similarly.
If your query truly depends on filtering based on the existence of, or value of, an optional property, then do exactly that: either check for existence (or non-existence), or check that an optional property is a specific value you're looking for.
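For example (a minimal sketch using the @azure/cosmos Node SDK; the container instance and the middleName property are hypothetical):
// documents where the optional property exists at all (including explicit nulls)
const { resources: defined } = await container.items
  .query('SELECT * FROM c WHERE IS_DEFINED(c.middleName)')
  .fetchAll();
// documents where the property is explicitly null
const { resources: nulls } = await container.items
  .query('SELECT * FROM c WHERE c.middleName = null')
  .fetchAll();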
Storing null properties is an anti-pattern with document databases such as Cosmos DB. It's not required, and if you do decide to do it, you'll have to add new null properties to existing documents every time you add a new property (potentially costly, since you'd have to perform a ReplaceDocument() on every single existing document, every time you add a new property that can be null). Same thing when you decide to remove an optional property, and cleaning up all of your extraneous nulls.
Cosmos DB doesn't require every document to be the same, and you'd be giving up very big benefit by approaching data the same way as a relational store (where you do have to deal with nulls in table columns). Just imagine a shopping site, with thousands of product types, each with varying properties (books, CDs, lawn mowers, coffee...). You'd end up with thousands of null properties per document (which seems like a very unmanageable scenario, not to mention the per-document size limit you'll likely exceed eventually).
Also, you will incur additional RU per write, since every index will need to be updated for every document.
Not sending keys that don't have values will save you a small amount of bytes (and thus RU/s), and there isn't any important performance difference in queries otherwise.
This could be significant if you have VERY sparse values among your keys. For instance, let's say each doc uses 1 of 1 million possible keys, at ~7 bytes per key. You'd be out of luck if you included all 1 million keys with a null value for all but one, because the keys alone would take 7MB and your doc can only be 2MB.
It can also add up at scale. If one 7-byte key in each of 1 million document reads is null (much more common) instead of undefined, it will theoretically cost 7,000 RU/s to read them. That's about $340 a month spent on a key with a null value, assuming you're doing 1M RPS the whole month (but that would only be 0.8% of your cost, so other optimizations, like using the right indexes, would make bigger differences).

How to delete Single-field indexes that are generated automatically by Firestore?

Update:
TL;DR: if you reached here, you should recheck the way you build your DB.
Your document(s) probably get expanded over time (due to nested lists, etc.).
Original question:
I have a collection of documents that have a lot of fields. I do not query the documents at all, not even with simple queries;
I am using only
db.collection("mycollection").doc(docName).get().then(....);
to read the docs,
so I don't need any indexing for this collection.
The issue is that Firestore generates single-field indexes automatically, and due to the number of fields the indexing limit is exceeded.
If I try to add a field to one of the documents, it throws an error:
Uncaught (in promise) Error: Too many indexed properties for entity: app: "s~myapp",path < Element { type: "tags", name: "aaaa" }>
at new FirestoreError (index.cjs.js:346)
at index.cjs.js:6058
at W.<anonymous> (index.cjs.js:6003)
at Ab (index.js:23)
at W.g.dispatchEvent (index.js:21)
at Re.Ca (index.js:98)
at ye.g.Oa (index.js:86)
at dd (index.js:42)
at ed (index.js:39)
at ad (index.js:37)
I couldn't find any way to delete these single-field-indexing or to tell firestore to stop generating them.
I found the single-field index exemptions page in the Firestore console, but there is no way there to disable this, or to disable auto-indexing for a specific collection.
Any way to do it?
You can delete single-field indexes in the Firestore console.
See this answer for more up-to-date information on creating and deleting indexes:
Firestore composite index permutation explosion?
If you go into Indexes after selecting the Firestore database, and then select "Single field" indexes, there is an "Add exemption" button which allows you to specify which fields in a collection (or sub-collection) have single-field indexes generated by Firestore. You have to specify the collection followed by the field, and you must specify every field individually, as you cannot exempt a whole collection. There does not seem to be any checking of valid collection or field names.
The only way I can think of to check that this has worked is to run a query using the field; it should fail.
I do this on large string fields which contain normal text, as they would take a long time to index and I know I will never search on them.
Firestore creates two indexes for every simple field (ascending and descending), but it is also possible to create an exemption which removes one of these if you will never need the other, which helps improve performance and makes it less likely that you hit the index limits. In addition, you can select whether arrays are indexed or not. If you create a lot of entries in an array, this can very quickly hit the Firestore limits on the number of index entries, so care has to be taken when using indexes. It will often be best to take the indexes off arrays, since the designer may have no control over how many array items are added, with the result that the maximum index limit is reached and the application gets an error, as the original poster explained.
You can also remove any simple indexes if you are not using them even if a field is included in a complex index. The complex index will still work.
Other things to keep an eye on.
If you are indexing a timestamp field (or any field that increases or decreases sequentially between documents) and you are not using this to force a sequence in queries, then there is a maximum write rate of 500 writes per second for the collection. In this case, this limit can be removed by removing the increasing and decreasing indexes.
Note that unlike the Realtime Database, fields created with auto-IDs do not guarantee any ordering, as they are generated by Firestore to spread writes and avoid hotspots or bottlenecks where all writes (and therefore reads) end up at a single location. This means that a timestamp is often needed to generate ordering, but you may be able to design your collection/sub-collection data layout to avoid the need for one. For example, if you are using a timestamp to find the last document added to a collection, it might be better to just store the ID of the last document added.
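A quick sketch of that last suggestion (collection and field names are hypothetical): write the new document and record its ID in a small metadata document in one batch, instead of ordering by an indexed timestamp:
const ref = db.collection('items').doc(); // random auto-ID, no index hotspot
const batch = db.batch();
batch.set(ref, itemData); // itemData: whatever you are storing
batch.set(db.collection('meta').doc('items'), { lastId: ref.id });
batch.commit();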
Large array or map fields can also cause the 20,000 index entries per document limit to be reached, so you can exempt the array from indexing.
See this link as well.
https://firebase.google.com/docs/firestore/query-data/index-overview
The short answer is you can't do that right now with Firebase. However, this is a good signal that you need to restructure your database models to avoid hitting limits such as the 1MB per document.
The documentation talks about the limitations on your data:
You can't run queries on nested lists. Additionally, this isn't as scalable as other options, especially if your data expands over time. With larger or growing lists, the document also grows, which can lead to slower document retrieval times.
See this page for more information about the advantages and disadvantages on the different strategies for structuring your data: https://firebase.google.com/docs/firestore/manage-data/structure-data
As stated in the Firestore documentation:
Cloud Firestore requires an index for every query, to ensure the best performance. All document fields are automatically indexed, so queries that only use equality clauses don't need additional indexes. If you attempt a compound query with a range clause that doesn't map to an existing index, you receive an error. The error message includes a direct link to create the missing index in the Firebase console.
Can you update your question with the structure data you are trying to save?
A workaround for your problem would be to create compound indexes, or as a last resource, Firestore may not be suited to the needs for your app and Firebase Realtime Database can be a better solution.
See tradeoffs:
RTDB vs Firestore
I don't believe the switch you are looking for currently exists, so I think that leaves the following:
Globally disable built-in indexes and create all indexes explicitly. Painful, and they have limits too.
A workaround where you treat your Cloud Firestore-unfriendly content like a BLOB, like so:
To store,
const objIn = { text: 'my object with a zillion fields' };
const jsonString = JSON.stringify(objIn);
const container = { content: jsonString };
To retrieve,
const objOut = JSON.parse(container.content);
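Putting it together (a sketch; the blobs collection and document name are hypothetical), the wrapped object round-trips through Firestore as a single string field:
db.collection('blobs').doc('myDoc').set(container)
  .then(() => db.collection('blobs').doc('myDoc').get())
  .then(snap => JSON.parse(snap.data().content));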

Maximum records can be stored at Riak database

Can anyone give an example of the maximum record limit in a Riak database, with specific hardware details? I'm going to build a CDR information system. Will Riak be a suitable database for this?
Riak uses the 2^160 SHA-1 hash value to identify the partitions to store data in. Data is then stored in the identified partitions based on the bucket and key name. The size of the hash space is therefore not related to the amount of data that can be stored. Two different objects that happen to hash to the same value will therefore not overwrite each other.
When working with Riak, it is important to model your data correctly and consider how it needs to be retrieved and queried during the design process. Ideally you should try to ensure that the vast majority of your queries can be done through direct key access. It is often recommended to de-normalise your data and use natural keys. For CDRs this may mean creating an object holding all CDRs for a subscriber per day. These objects can be named based on the subscriber id and date, making it easy to retrieve data directly by key. It is also often more efficient to retrieve a few larger objects than many small ones and perform filtering in the application rather than try to just get the exact data that is needed. I have described this approach in greater detail here.
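As a rough sketch of that keying scheme (using the Node.js basho-riak-client; the bucket name, key format, and node address are illustrative assumptions):
const Riak = require('basho-riak-client');
const client = new Riak.Client(['127.0.0.1:8087']);

// one object holding all of a subscriber's CDRs for a day,
// fetched directly by key with no secondary query needed
const key = subscriberId + '-' + day; // e.g. '46705551234-2013-03-01'
client.storeValue({ bucket: 'cdrs', key: key, value: cdrsForDay }, function (err, rslt) {
  if (err) throw err;
});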
The limit to the number of records (or key/value pairs) you can store in Riak is governed only by the size of the hash space: 2^160. According to WolframAlpha, this is the number:
1461501637330902918203684832716283019655932542976
In other words, go nuts. :)
