Can Google Datastore projection queries get all data in an entity? - google-cloud-datastore

According to this documentation Here
If I use a projection query with all of the properties in an entity, it will cost me 1 entity read for the query and a small operation per result.
Is that better than getting all of the keys with a keys-only query and then fetching each entity with get(key)? That would cost me 1 entity read for the query plus N entity reads for the entity data.
Thank you.

Note that while projecting will result in a single read op, you need all the fields you want to project to be present in the index. An additional index has a storage cost and can also increase write latency. So, if you are projecting just a couple of small fields, you can create such a composite index, do a projection, and it will only cost one operation.
If a composite index with projection is not an option, then you can still try to resolve as much of your where clause as possible via the index, and at that point a single query that fetches all the entities will only cost N (not 1+N, where you first get the keys and then the entities).
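For illustration, here is a minimal sketch of the two approaches with the google-cloud-datastore Python client. The Task kind and its priority/done properties are invented for the example, and the projection only works if those properties are covered by an index:

    from google.cloud import datastore

    client = datastore.Client()

    # Option A: projection query. Only the listed properties come back,
    # billed as one read for the query plus a small operation per result.
    proj_query = client.query(kind="Task")
    proj_query.projection = ["priority", "done"]
    for task in proj_query.fetch():
        print(task["priority"], task["done"])

    # Option B: keys-only query followed by a get per key.
    # This costs one read for the query plus one entity read per result.
    keys_query = client.query(kind="Task")
    keys_query.keys_only()
    keys = [entity.key for entity in keys_query.fetch()]
    entities = client.get_multi(keys)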

Related

Indexes and array of maps in Firestore

I want to be sure that no, or very little, Firestore storage is used for indexing an array containing many maps. From my reading about Firestore index types, no index is created for an array of maps in a document, since that cannot be queried. Am I right to think this?
For example, here is an image of the array of maps:
There will be a lot of map elements in those progressionArray arrays but not enough to exceed 1MB per document. Since all progression data always needs to be loaded by the user, it seems best to me to store this data in an array to minimize Firestore reading costs (and index storage costs). Also there is no need to index this data since it will always all be loaded once by the user.
What are the indexing storage costs associated to this progressionArray? Are they zero like I think since it can not be queried?
Thank you!
The documentation says “A single-field index stores a sorted mapping of all the documents in a collection that contain a specific field.” So indexes will be created for arrays of maps.
You can create an exemption for single field indexes as explained here.
The only cost the indexes have is the amount of storage it takes to save them. You can calculate the index cost with the values specified in this document.
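If you do decide to turn off single-field indexing for that field, one way is a field override in firestore.indexes.json deployed with the Firebase CLI. This is only a sketch, and the users collection-group name is an assumption:

    {
      "indexes": [],
      "fieldOverrides": [
        {
          "collectionGroup": "users",
          "fieldPath": "progressionArray",
          "indexes": []
        }
      ]
    }

The empty indexes list in the override exempts progressionArray from the automatic single-field indexes, so only the document data itself takes up storage.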

Many tiny documents in CosmosDB

I have many (order of 100s) pieces of data that I want to associate with a document in CosmosDB. Each piece of data is small (order of 100s of bytes).
My first solution was to store the data as an array inside the document. This works okay, but in order to append a new item to the array I need to read the document from CosmosDB, add the element, then replace the document back into CosmosDB.
Instead of doing this I would like to store each piece of data as its own document in the same partition. What are the drawbacks of having many tiny documents vs the one aggregated document?
What are the drawbacks of having many tiny documents vs the one aggregated document?
I would suggest storing each piece of data as its own document instead of one aggregated document.
Reason 1: As you mentioned in your question, if you want to add an element to the document you need to read the document from Cosmos DB and then replace it, because partial updates are not supported by Cosmos DB so far. (Please refer to this feedback and follow it if you need it: https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/6693091-be-able-to-do-partial-updates-on-document) That's a huge and tedious amount of work.
Reason 2: If you store the pieces of data separately, you can query them flat. (select * from c)
For a single array document, you need a join to reach the nested properties. (select a from c join a in c.array)
Reason 3: If you store the pieces of data separately, you can spread them across different partitions. Even if you don't need that now, why not keep the option open for the future?
Reason 4: As for cost, it all comes down to RUs and storage, and every request to Cosmos DB consumes RUs. If you store the pieces of data separately, you only need to access the specific documents you want, which I think is more economical.
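As a rough sketch of the trade-off with the azure-cosmos Python SDK (the database, container, field, and key names below are invented for the example):

    import os
    from azure.cosmos import CosmosClient

    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("mydb").get_container_client("items")

    # One aggregated document: appending means read + replace (two charged requests).
    doc = container.read_item(item="order-1", partition_key="customer-42")
    doc["events"].append({"ts": "2020-01-01T00:00:00Z", "status": "shipped"})
    container.replace_item(item=doc["id"], body=doc)

    # Many tiny documents: each new piece of data is a single create
    # in the same logical partition.
    container.create_item({
        "id": "order-1-event-0017",
        "orderId": "order-1",
        "customerId": "customer-42",  # partition key value
        "ts": "2020-01-01T00:00:00Z",
        "status": "shipped",
    })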
Depends on your use case.
For frequent add operations, you are first reading and then replacing the document (2 operations), which will incur more cost than creating a new document (1 operation).
However, if the documents have some sort of relationship (like foreign keys in traditional SQL), getting the data would require multiple queries if you go with approach #1 above (more cost); otherwise you'll get the complete data in a single query (low cost).
I'd recommend going through this and this post, which will give you better insight into which approach you can choose.
I'm facing this question right now and I want to leave my contribution here. I have to store some statuses; a status is a metric that I get once per hour, so I have two options:
Create a record per status -> 24 records per day
Create a record per day and add each status inside it -> 1 record per day with 24 statuses inside an array
I chose the second one because:
Both options involve the same number of operations on the database
I'm using this data in Power BI, and after doing some tests the data from the second option had a smaller size after import

A limit clarification for the new Firestore

So in the limits section (https://firebase.google.com/docs/firestore/quotas) of the new Firestore product from Firebase it says:
Maximum write rate to a collection in which documents contain
sequential values in an indexed field: 500 per second
We're pretty confused as to what that actually entails.
If we have, say, a root-level collection called users with 10 million entries in it, will this limit affect the collection in such a way that only 500 users can update their data in any given second?
Can anyone clarify?
Sorry for the confusion; an example might help.
If your user documents contained a last-updated timestamp and you index on that timestamp then each new write would end up clustering around the same value (now) creating a hotspot in the index.
Similarly if you somehow assigned users a sequential value like a place in line or something like that this would also create a hotspot.
Incidentally this is why generated document IDs are random strings. This evenly distributes the writes on the primary key index.
If you avoid these kinds of patterns the sky's the limit, though during beta you'd hit the database-wide limit.
A quick additional note: for the moment all properties are indexed by default, so if you had a last-updated timestamp it would necessarily be indexed, and you would not be able to avoid the hotspotting.
Index disablement will be available down the road though.
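To make the two patterns concrete, here is a small sketch with the google-cloud-firestore Python client; the users collection, document IDs, and lastUpdated field are assumptions for the example:

    from google.cloud import firestore

    db = firestore.Client()
    users = db.collection("users")

    # Pattern that can hotspot: an indexed, monotonically increasing value
    # such as a last-updated timestamp clusters every new write around "now"
    # in that field's index.
    users.document("user-123").set({"lastUpdated": firestore.SERVER_TIMESTAMP}, merge=True)

    # Auto-generated document IDs are random strings, so the writes themselves
    # are spread evenly across the primary-key index.
    new_user = users.document()  # random ID
    new_user.set({"name": "Ada"})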

Cosmos DB - Querying a heterogeneous collection?

Are there any performance penalties for having a heterogeneous collection (with multiple, completely different, document schemas)?
e.g.:
If I have 1000 docs with the same schema, will querying be faster than if I had 500 docs with schema A and 500 docs with schema B?
No, there are no penalties. By default all properties of your JSON documents are indexed automatically for you, so by filtering on a property like type you can easily filter down to distinct and varied document types inside the same collection. This is the way Cosmos is intended to be used.
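For example, with the azure-cosmos Python SDK, filtering a mixed container down to one document shape is just a WHERE clause on a discriminator property; the type field and database/container names here are assumptions:

    import os
    from azure.cosmos import CosmosClient

    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("mydb").get_container_client("items")

    # Schema-A and schema-B documents live in the same container;
    # the indexed "type" property narrows the query to one shape.
    orders = container.query_items(
        query="SELECT * FROM c WHERE c.type = @type",
        parameters=[{"name": "@type", "value": "order"}],
        enable_cross_partition_query=True,
    )
    for doc in orders:
        print(doc["id"])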
The same screenshot as in your question about indexing policies is relevant here too, because this topic comes up a lot with people investigating Cosmos. It's good to know that if you want to perform ORDER BY queries, the field you're ordering by must be covered by an index precision of -1. The default policy supports ordering on all numeric fields. If you intend to store dates as strings and order by them, you will need to modify the index paths to apply a precision of -1, meaning the highest level of indexing.

Maximum records can be stored at Riak database

Can anyone give an example of the maximum record limit in a Riak database, with specific hardware details? Please help me with this. I'm going to build a CDR information system. Will Riak be a suitable choice of database?
Riak uses the 2^160 SHA-1 hash value to identify the partitions to store data in. Data is then stored in the identified partitions based on the bucket and key name. The size of the hash space is therefore not related to the amount of data that can be stored. Two different objects that happen to hash to the same value will therefore not overwrite each other.
When working with Riak, it is important to model your data correctly and consider how it needs to be retrieved and queried during the design process. Ideally you should try to ensure that the vast majority of your queries can be done through direct key access. It is often recommended to de-normalise your data and use natural keys. For CDRs this may mean creating an object holding all CDRs for a subscriber per day. These objects can be named based on the subscriber id and date, making it easy to retrieve data directly by key. It is also often more efficient to retrieve a few larger objects than many small ones and perform filtering in the application rather than try to just get the exact data that is needed. I have described this approach in greater detail here.
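A minimal sketch of that key design with the official riak Python client; the bucket name and the subscriber-id/date key scheme are assumptions for the example:

    import riak

    client = riak.RiakClient()
    cdrs = client.bucket("cdrs")

    # One object holds all CDRs for a subscriber for a day, named so it can
    # be fetched directly by key with no secondary query.
    key = "subscriber-12345_2020-01-01"
    obj = cdrs.new(key, data={"cdrs": [
        {"called": "+15551234567", "duration_s": 42, "start": "2020-01-01T08:00:00Z"},
    ]})
    obj.store()

    # Direct key access when the data is needed; filter further in the application.
    day = cdrs.get(key)
    print(day.data["cdrs"])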
The limit to the number of records (or key/value pairs) you can store in Riak is governed only by the size of the hash space: 2^160. According to WolframAlpha, this is the number:
1461501637330902918203684832716283019655932542976
In other words, go nuts. :)
