I'm planning the structure of a Firestore database and cannot understand some standard points.
Link: https://firebase.google.com/docs/firestore/quotas
Point 1
Does this mean the total size, in the database, of the fields that are composite-indexed in a collection?
Point 3
Is 20000 effectively the maximum number of fields in a document, since every field in a document is automatically indexed? Or do they mean something like
queryRef
.where('field1', '<', someNumber1)
...
.where('field20000', '<', someNumber20000);
Sorry for my poor English.
Point 1
You can see how size is calculated in the Storage Size Calculation documentation.
Index entry size
The size of an index entry is the sum of:
- The document name size of the indexed document
- The sum of the indexed field sizes
- The size of the indexed document's collection ID, if the index is an automatic index (does not apply to composite indexes)
- 32 additional bytes
Using this document in a Task collection with a numeric ID as an example:
Task id:5730082031140864
- "type": "Personal"
- "done": false
- "priority": 1
If you have a composite index on done + priority (both ascending), the total size of the index entry in this index is 84 bytes:
- 29 bytes for the document name
- 6 bytes for the done field name and boolean value
- 17 bytes for the priority field name and integer value
- 32 additional bytes
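As a quick sanity check of that figure, here is a minimal Python sketch that simply adds up the component sizes quoted above (the byte counts come straight from that breakdown, not from an independent calculation):

# Component sizes quoted from the breakdown above.
document_name_bytes = 29  # document name of the indexed Task document
done_bytes = 6            # "done" field name + boolean value
priority_bytes = 17       # "priority" field name + integer value
overhead_bytes = 32       # fixed additional bytes per index entry

index_entry_bytes = document_name_bytes + done_bytes + priority_bytes + overhead_bytes
print(index_entry_bytes)  # 84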
Point 2
For single-field indexes (the ones we automatically create), we create 2 index entries per field: ascending + descending.
This means 10000 fields will hit the 20000 index entry limit, so 10000 fields is the current maximum. Fewer if you also have composite indexes, since they consume part of the 20000 index entry limit per document.
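The arithmetic behind that limit, as a trivial sketch (assuming only the two automatic entries per field and no composite indexes):

index_entries_per_document_limit = 20000
automatic_entries_per_field = 2   # ascending + descending
max_fields = index_entries_per_document_limit // automatic_entries_per_field
print(max_fields)  # 10000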
Related
I have a table in DynamoDB whose records have a priority field, 1-N.
Records are shown to the user in a form and the user can update the priority field, which means I need to change a record's priority.
One solution is, when the priority of a record changes, to reorder all the records whose priority is greater than it.
For example, if I change a record's priority from 5 to 10, I need to reorder all records whose priority field is greater than 5.
What do you recommend?
DynamoDB stores all items (records) in order of a table's sort attribute. However, you cannot update a key value; you would need to delete and re-add the item every time you update it.
One way to overcome this is to create a GSI. Depending on the throughput required for your table, you may need to artificially shard the partition key. If you expect to consume less than 1000 WCU per second, you won't need to.
gsipk   gsisk   data
1       001     data
1       002     data
1       007     data
1       009     data
Now to get all the data in order of priority you simply Query your index where gsipk = 1.
You can also Update the order attribute gsisk without having to delete and put an item.
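A minimal boto3 sketch of what that could look like; the table name, index name, and key attribute names (other than gsipk/gsisk from the example above) are hypothetical:

import boto3

ddb = boto3.client("dynamodb")

# Read everything in priority order: query the GSI on a single (possibly
# sharded) partition key value, sorted by the gsisk attribute.
response = ddb.query(
    TableName="MyTable",              # hypothetical table name
    IndexName="gsipk-gsisk-index",    # hypothetical GSI name
    KeyConditionExpression="gsipk = :pk",
    ExpressionAttributeValues={":pk": {"N": "1"}},
    ScanIndexForward=True,            # ascending by gsisk
)

# Change an item's priority: gsisk is only a GSI key, not a base-table key,
# so a plain UpdateItem is enough; no delete + put needed.
ddb.update_item(
    TableName="MyTable",
    Key={"pk": {"S": "item-123"}},    # hypothetical base-table primary key
    UpdateExpression="SET gsisk = :new",
    ExpressionAttributeValues={":new": {"S": "010"}},
)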
I am trying to use the COUNT() method on Firestore, which was released a few weeks ago, and its pricing is based on index entries. But what is an index entry? Assume that I have thousands of documents in a single collection like this, without a composite index, just the default index for each field:
nickname: Jack
age: 28
score: 72
Then I run a query like this:
AggregateQuerySnapshot snapshot = collection.whereEqualTo("score", "72").count().get().get();
Am I being charged for just 1 index entry no matter how many documents match?
An index is essentially a sorted list. When you query whereEqualTo("score", "72"), Firestore will use the default index created for the score field.
Am I being charged for just 1 index entry no matter how many documents match?
As mentioned in the pricing documentation, there is a minimum charge of one document read even if no index entry matches the query. Beyond that, it depends entirely on the number of index entries that match your query. For example, if there are 34553 documents with score equal to 72, then that costs you 35 reads:
const reads = Math.floor(34553/1000) + 1 // 35
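The same rule can be written as a small helper (a sketch assuming one billed read per batch of up to 1,000 matching index entries, with a minimum of one read):

import math

def count_query_reads(matching_index_entries: int) -> int:
    # One billed read per batch of up to 1,000 matching index entries,
    # with a minimum charge of one read even when nothing matches.
    return max(1, math.ceil(matching_index_entries / 1000))

print(count_query_reads(34553))  # 35
print(count_query_reads(0))      # 1 (minimum charge)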
We're looking to migrate some data into InfluxDB. I'm working with InfluxDB 2.0 on a test server to determine the best way to store our data.
As of today, I have about 2.7 billion series to migrate to InfluxDB but that number will only go up.
Here is the structure of the data I need to store:
ClientId (332 values as of today, string of 7 characters)
Driver (int, 45k values as of today, will increase)
Vehicle (int, 28k values as of today, will increase)
Channel (100 values, should not increase, string of 40 characters)
value of the channel (float, 1 value per channel/vehicle/driver/client at a given timestamp)
At first, I thought of storing my data this way:
One bucket (as all data have the same data retention)
Measurements = channels (so 100 kind of measurements are stocked)
Tag Keys = ClientId
Fields = Driver, Vehicle, Value of channel
This gave me a cardinality of 1 * 100 * 332 * 3 = 99 600 according to this article
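To make that concrete, a point under this first layout would be written roughly like this, using the influxdb-client Python library (the channel name, bucket, connection details, and example values here are made up):

from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Hypothetical connection details, for illustration only.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# First proposed schema: measurement = channel, tag = ClientId,
# fields = Driver, Vehicle and the channel value.
point = (
    Point("engine_rpm")                 # one of the ~100 channels (made-up name)
    .tag("ClientId", "CLT0001")         # 7-character client id
    .field("Driver", 45012)             # driver id as a field
    .field("Vehicle", 27881)            # vehicle id as a field
    .field("value", 1450.0)             # channel value (float)
    .time(datetime.now(timezone.utc), WritePrecision.NS)
)

write_api.write(bucket="telemetry", record=point)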
But then I realized that InfluxDB handles duplicates based on "measurement name, tag set, and timestamp".
So for my data, this will not work, as I need duplicates to be determined by ClientId, Channel, and Vehicle at a minimum.
But if I change my data structure to be stored this way:
One bucket (as all data have the same data retention)
Measurements = channels (so 100 kind of measurements are stocked)
Tag Keys = ClientId, Vehicle
Fields = Driver, Value of channel
then I'll get a cardinality of 2 788 800 000.
I understand that I need to keep cardinality as low as possible. (And ideally I would even need to be able to search by driver as well as by vehicle.)
My questions are:
If I split the data into different buckets (e.g. 1 bucket per ClientId), will that decrease my cardinality?
What would be the best way to store data for such a large number of series?
I am querying DynamoDB from Python and I would like to specify the maximum ReadCapacityUnits that the query should use.
For example, my table has 100 ReadCapacityUnits, I would like to use only 5% of it which is 20.
Below is my query; how can I specify ReadCapacityUnits in this query?
paginator = ddb_client.get_paginator('query')
response_iterator = paginator.paginate(
    TableName=table_name,
    IndexName=INDEX_GSI,
    KeyConditionExpression=condition,
    ExpressionAttributeNames={ATTR_NAME: HASH_KEY},
    ExpressionAttributeValues={
        PLACEHOLDER: {'S': str(value)},
    },
    ConsistentRead=False,
    ScanIndexForward=False,
    PaginationConfig={"PageSize": 25},
)
You can't.
How big are your records?
Since an RCU is a read of up to 4 KB of data (per second), and you specified a page size of 25, your records would have to be larger than about 160 bytes for your query to consume more than 1 RCU.
Let's say your records are 1 KB, so for each RCU you can read 4 of them. Since your page size is 25, a page would take only 7 RCU.
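As a rough sketch of that arithmetic (using the 1 RCU per 4 KB premise above):

import math

item_size_kb = 1      # assumed item size from the example above
page_size = 25        # PageSize from the query
kb_per_rcu = 4        # 1 RCU = up to 4 KB read, per the premise above

rcu_per_page = math.ceil(page_size * item_size_kb / kb_per_rcu)
print(rcu_per_page)   # 7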
The key thing here is that you're not using a filter expression. (Good!)
With a filter expression, you still pay for the data to be read even if it's not returned.
Also note that DynamoDB will read at most 1 MB of data before returning (even if all records are filtered out). This is why Scan, as opposed to Query, can eat up your RCUs.
Can you constrain the max length of a string field?
The documentation describes only the internal limits for a field.
Strings are Unicode with UTF-8 binary encoding. The length of a string must be greater than zero and is constrained by the maximum DynamoDB item size limit of 400 KB.
The following additional constraints apply to primary key attributes that are defined as type string:
For a simple primary key, the maximum length of the first attribute value (the partition key) is 2048 bytes.
For a composite primary key, the maximum length of the second attribute value (the sort key) is 1024 bytes.
Unlike a traditional RDBMS, DynamoDB does not have a notion of a "maximum column size". The only limit is the item size limit, which is, as you've mentioned, 400 KB. That is a total limit; it includes attribute name lengths and attribute value lengths, i.e. the attribute names also count towards the total size limit.
Read more in the docs.
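As a rough sketch of how that total is made up (counting only attribute names and string values; the exact accounting for numbers, binary, and nested types differs):

def approx_item_size_bytes(item: dict) -> int:
    # Attribute names and string values both count toward the 400 KB
    # item size limit; other types have their own rules, ignored here.
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        if isinstance(value, str):
            total += len(value.encode("utf-8"))
    return total

item = {"pk": "user#123", "bio": "x" * 1000}
print(approx_item_size_bytes(item))          # 1013
assert approx_item_size_bytes(item) <= 400 * 1024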