What is an index entry on Firestore? - firebase

I am trying to use the COUNT() method on Firestore, which was released a few weeks ago and whose pricing is based on index entries. But what is an index entry? Assume that I have thousands of documents in a single collection like this, with no composite index, just the default index for each field.
nickname: Jack
age: 28
score: 72
Then run query like this
AggregateQuerySnapshot snapshot = collection.whereEqualTo("score", 72).count().get().get();
Am I charged for just 1 index entry, no matter how many documents match?

An index is essentially a sorted list. When you query whereEqualTo("score", 72), Firestore will use the default single-field index created for the score field.
Am I charged for just 1 index entry, no matter how many documents match?
As mentioned in the pricing documentation, there is a minimum charge of one document read even if no index entry matches the query. Beyond that, it depends entirely on the number of index entries that match your query: you are charged one document read per batch of up to 1,000 matching index entries. For example, if there are 34,553 documents with score equal to 72, the count costs you 35 reads:
const reads = Math.max(1, Math.ceil(34553 / 1000)); // 35

Related

Azure Cosmos DB - Default order of documents returned by WHERE clause

In what order are documents returned for a Select query without an Order by clause?
Example query - SELECT * FROM c WHERE c.type=someType
Is it based on the id of the documents, the last modified timestamp (_ts), the timestamp of creation, or some random order?
If this info helps - this query is performed on a collection with only one partition, whose partitionKey is null, and there are at most 3 documents per 'type'.
Is it based on the id of the documents, the last modified timestamp (_ts), the timestamp of creation, or some random order?
Based on my testing, if you do not set any sort rule, results are returned by default in the order the documents were created in the database, whether the collection is partitioned or not.
For the sample documents above, the order does not change if I change the id, the partition key (in this case, name), or _ts.
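That said, insertion order is not a documented guarantee, so if the order matters it is safer to request it explicitly with an ORDER BY clause, e.g. sorting on the last-modified timestamp:
Example query - SELECT * FROM c WHERE c.type = 'someType' ORDER BY c._ts ASC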

Some Firestore quota misunderstandings

I'm planning the structure of a Firestore database and cannot understand some standard points.
Link: https://firebase.google.com/docs/firestore/quotas
Point 1
Does this mean the total storage size in the database of the fields that are part of a composite index in a collection?
Point 3
Is 20000 effectively the maximum number of fields in a document, since per the docs every field is automatically indexed? Or do they mean something like
queryRef
.where('field1', '<', someNumber1)
...
.where('field20000', '<', someNumber20000);
Sorry for my poor English.
Point 1
You can see how size is calculated in the Storage Size Calculation documentation.
Index entry size
The size of an index entry is the sum of:
- The document name size of the indexed document
- The sum of the indexed field sizes
- The size of the indexed document's collection ID, if the index is an automatic index (does not apply to composite indexes)
- 32 additional bytes
Using this document in a Task collection with a numeric ID as an example:
Task id:5730082031140864
- "type": "Personal"
- "done": false
- "priority": 1
If you have a composite index on done + priority (both ascending), the total size of the index entry in this index is 84 bytes:
- 29 bytes for the document name
- 6 bytes for the done field name and boolean value
- 17 bytes for the priority field name and integer value
- 32 additional bytes
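To make the arithmetic concrete, here is a minimal sketch of that calculation, following the documented storage-size rules (a string costs its UTF-8 byte length + 1, an integer 8 bytes, a boolean 1 byte, and a document name costs its collection ID plus 8 bytes for a numeric document ID plus 16 additional bytes):
// Sketch of the 84-byte index entry for the Task example above.
long docName  = ("Task".length() + 1) + 8 + 16;  // collection ID + numeric doc ID + 16 -> 29
long done     = ("done".length() + 1) + 1;       // field name + boolean value          -> 6
long priority = ("priority".length() + 1) + 8;   // field name + integer value          -> 17
long entry    = docName + done + priority + 32;  // + 32 additional bytes               -> 84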
Point 3
For single-field indexes (the ones we automatically create), we create 2 index entries per field: one ascending and one descending.
This means 10000 fields will hit the 20000 index entry limit, so 10000 fields is the current maximum per document. It is fewer if you also have composite indexes, since each of those consumes some of the same 20000 index entry limit per document.
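As a rough rule of thumb (my own back-of-the-envelope sketch, ignoring array fields and map sub-fields, which produce additional entries):
// Rough estimate of index entries consumed per document:
// 2 per automatically indexed field (ascending + descending),
// plus about 1 per composite index the document participates in.
static long estimatedIndexEntries(long indexedFields, long compositeIndexes) {
    return 2 * indexedFields + compositeIndexes;
}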

DynamoDB Last Evaluated Key Expiration?

My application ingests data from a 3rd party REST API which is backed by DynamoDB. The results are paginated and thus I page forward by passing the last evaluated key to each subsequent request.
My question is does the last evaluated key have a shelf life? Does it ever expire?
Let's say I query the REST API and then decide to stop. If I save the last evaluated key, can I pick up exactly where I left off 30 days later? Would that last evaluated key still work and return the correct next page based on where I left off previously?
You shouldn't think of the last evaluated key like a "placeholder" or a "bookmark" in a result set from which to resume paused iteration.
You should think of it more like a "start from" place marker. An example might help. Let's say you have a table with a hash key userId and a range key timestamp. The range key timestamp will provide an ordering for your result set. Say your table looked like this:
user ID | Timestamp
1 | 123
1 | 124
1 | 125
1 | 126
By default, when you query the table for all of the records for userId 1, you'll get the records back in the order they're listed above, i.e. in ascending order by timestamp. If you wanted them back in descending order, you'd set DynamoDB's scanIndexForward flag to false to return them "newest to oldest", in descending order by timestamp.
Now, suppose there were a lot more than 4 items in the table and it would take multiple queries to return all of the records with a userId of 1. You wouldn't want to keep re-fetching pages you have already seen, so you can tell DynamoDB where to start by giving it the last evaluated key. Say the last result of the previous query was the record with userId = 1 and timestamp = 124. You tell DynamoDB in your query that this was the last record you got, and it will start your next result set with the record that has userId = 1 and timestamp = 125.
So the last evaluated key isn't something that "expires," it's a way for you to communicate to Dynamo which records you want it to return based on records that you've already processed, displayed to the user, etc.
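To illustrate, here is a minimal sketch using the AWS SDK for Java v2; the table name "Events" and the small page size are assumptions made up for the example above:
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;

public class ResumeQuery {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        QueryRequest.Builder request = QueryRequest.builder()
                .tableName("Events") // assumed table name
                .keyConditionExpression("userId = :uid")
                .expressionAttributeValues(
                        Map.of(":uid", AttributeValue.builder().n("1").build()))
                .limit(2); // tiny page size just to force pagination

        // First page.
        QueryResponse page = dynamo.query(request.build());
        // This is just the key of the last item read, not a server-side
        // cursor, so there is nothing to expire; you can store it for 30 days.
        Map<String, AttributeValue> lastKey = page.lastEvaluatedKey();

        // Later: resume from exactly where we left off.
        if (lastKey != null && !lastKey.isEmpty()) {
            QueryResponse next = dynamo.query(
                    request.exclusiveStartKey(lastKey).build());
            next.items().forEach(System.out::println);
        }
    }
}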

How to design DynamoDB table to facilitate searching by time ranges, and deleting by unique ID

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirements:
1. There must be a unique table per customer
2. Insert documents into the table (each doc has a unique ID and a timestamp)
3. Get X number of documents based on timestamp (ordered ascending)
4. Delete individual documents based on unique ID
So far I have created a table with a composite key (S: id, N: timestamp). However, when I come to query it, I realise that because my id is unique and I can't do a wildcard search on the ID, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s: customer_id, n: timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be hash (s: unique_doc_id) whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h: customer_id and r: unique_id. This makes sure all the elements in the table have distinct keys.
You will also have an attribute for the timestamp, with a Local Secondary Index on it.
You will use the LSI for requirement 3 (fetching documents ordered by timestamp) and the batchWrite API call for the batch deletes of requirement 4; a sketch of the table definition follows below.
This solution doesn't require (1) - all the customers can stay in the same table. (Heads up - there is a limit of 256 tables per account before you have to contact AWS to raise it.)
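For concreteness, a minimal sketch of that table definition with the AWS SDK for Java v2; the table name "Documents" and index name "by-timestamp" are placeholders:
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

public class CreateDocumentsTable {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        dynamo.createTable(CreateTableRequest.builder()
                .tableName("Documents") // placeholder name
                .attributeDefinitions(
                        AttributeDefinition.builder().attributeName("customer_id")
                                .attributeType(ScalarAttributeType.S).build(),
                        AttributeDefinition.builder().attributeName("unique_id")
                                .attributeType(ScalarAttributeType.S).build(),
                        AttributeDefinition.builder().attributeName("timestamp")
                                .attributeType(ScalarAttributeType.N).build())
                .keySchema(
                        KeySchemaElement.builder().attributeName("customer_id")
                                .keyType(KeyType.HASH).build(),
                        KeySchemaElement.builder().attributeName("unique_id")
                                .keyType(KeyType.RANGE).build())
                // The LSI shares the hash key but sorts by timestamp, which
                // serves the "get X documents ordered by time" requirement.
                // Note that LSIs must be defined at table creation time.
                .localSecondaryIndexes(LocalSecondaryIndex.builder()
                        .indexName("by-timestamp")
                        .keySchema(
                                KeySchemaElement.builder().attributeName("customer_id")
                                        .keyType(KeyType.HASH).build(),
                                KeySchemaElement.builder().attributeName("timestamp")
                                        .keyType(KeyType.RANGE).build())
                        .projection(Projection.builder()
                                .projectionType(ProjectionType.ALL).build())
                        .build())
                .billingMode(BillingMode.PAY_PER_REQUEST)
                .build());
    }
}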

Hbase schema design -- to make sorting easy?

I have 1M words in my dictionary. Whenever a user issues a query on my website, I check whether the query contains words from my dictionary and increment the counter corresponding to each of them individually. For example, if a user types in "Obama is a president" and "Obama" and "president" are in my dictionary, then I should increment the counter by 1 for both "Obama" and "president".
And from time to time, I want to see the top 100 words (the most queried words). If I use HBase to store the counters, what schema should I use? I have not come up with an efficient one yet.
If I use the word as the row key and "counter" as the column key, then updating the counter (an increment) is very efficient, but it's very hard to sort and return the top 100.
Can anyone give me some good advice? Thanks.
You can use the natural schema (row key as word and column as count) and use IHBase to get a secondary index on the count column. See https://issues.apache.org/jira/browse/HBASE-2037 for the initial implementation; the current code lives at http://github.com/ykulbak/ihbase.
From Adobe's presentation at HBaseCon 2012 (slide 28 in particular), I suggest using two tables and this sort of data structure for the row key:
name table:
President => 1000
Test => 900
count table:
429461296:President => dummyvalue
429461396:Test => dummyvalue
The second table's row keys are derived using Long.MAX_VALUE - count at that point in time.
As counts change, you add the "(Long.MAX_VALUE - count):word" row key to the count table (see the sketch below). That way, you always have the top words returned first when you scan the table.
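A minimal sketch of that key construction (my own illustration; the zero-padding is what makes lexicographic byte order match numeric order):
import org.apache.hadoop.hbase.util.Bytes;

public class TopWordsKey {
    // Build a row key for the "count" table: inverted, zero-padded count
    // followed by the word, so an ascending scan yields highest counts first.
    static byte[] countRowKey(long count, String word) {
        // Long.MAX_VALUE has 19 decimal digits, hence the %019d padding.
        return Bytes.toBytes(String.format("%019d:%s", Long.MAX_VALUE - count, word));
    }

    public static void main(String[] args) {
        System.out.println(Bytes.toString(countRowKey(1000, "President")));
    }
}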
Sorting 1M longs can be done in memory, so the sorting itself is not the hard part.
Store the words x, y, z issued at time t as key: t, cols: word:x=1, word:y=1, word:z=1 in a table. Then use a MapReduce job to sum up the counts per word and take the top 100.
This also enables further analysis later on.
