How to get a Firestore document size? - firebase

From the Firestore docs, we get that the maximum size for a Firestore document is:
Maximum size for a document 1 MiB (1,048,576 bytes)
QUESTION
How can I know the current size of a single doc, to check whether I'm approaching
that 1 MiB limit?
Example:
var docRef = db.collection("cities").doc("SF");

docRef.get().then(function(doc) {
    if (doc.exists) {
        console.log("Document data:", doc.data());
        // IS THERE A PROPERTY THAT CAN DISPLAY THE DOCUMENT FILE SIZE?
    } else {
        // doc.data() will be undefined in this case
        console.log("No such document!");
    }
}).catch(function(error) {
    console.log("Error getting document:", error);
});

The calculations used to compute the size of a document are fully documented here. There is a lot of text there, so please navigate there to read it; it's not worthwhile to copy all of that text here.
If you're having to manually compute the size of a document as it grows, my opinion is that you're probably not modeling your data scalably. If you have lists of data that can grow without bound, you probably shouldn't be using a list field; instead, put that data in documents in a new collection or subcollection, as sketched below. There are some exceptions to this rule, but generally speaking, you should not have to worry about computing the size of a document in your client code.
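As a hedged illustration of that advice (the "landmarks" subcollection and its fields are made up for this sketch, not taken from the question), data that grows without bound can live in its own documents under a subcollection rather than in an array field on the parent:

// Instead of growing an array field on the city document...
// db.collection("cities").doc("SF").update({
//     landmarks: firebase.firestore.FieldValue.arrayUnion(landmark)
// });

// ...each landmark becomes its own document, so the parent never approaches 1 MiB:
db.collection("cities").doc("SF")
    .collection("landmarks")
    .add({ name: "Golden Gate Bridge", type: "bridge" })
    .then(function(ref) {
        console.log("Landmark stored at", ref.path);
    })
    .catch(function(error) {
        console.log("Error adding landmark:", error);
    });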

I've published an npm package that calculates the size of a Firestore document.
Other packages like sizeof or object-sizeof, which calculate the size of a JS object, will not give you a precise result, because some primitives have a different byte size in Firestore. For example, a boolean in JS takes 4 bytes, but in a Firestore document it's 1 byte; null is 0 bytes in JS, but 1 byte in Firestore.
In addition, Firestore has its own types with fixed byte sizes: geo point, date, and reference.
A reference is a large object. Packages like sizeof will traverse all of the methods and properties of the reference instead of doing the right thing here, which is to sum the string value of the document name plus its path plus 16 bytes. Also, if the reference points to a parent doc, sizeof or object-sizeof will not detect the circular reference, which might spell even bigger trouble than an incorrect size.
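If you only need a ballpark figure for plain JSON-like data and don't want to pull in a package, here is a minimal sketch that follows the rules on the storage-size page (field name and string: UTF-8 length + 1; boolean and null: 1 byte; number: 8 bytes; 32 extra bytes per document; document name: each path segment's length + 1, plus 16). It deliberately ignores Firestore-specific types such as timestamps, geo points, and references, and assumes ASCII strings; the packages above handle those cases properly.

// Rough size estimate for plain-JSON data, per the documented storage-size rules.
// Not exact: timestamps, geo points, references, and non-ASCII strings need extra handling.
function estimateFieldValue(value) {
    if (value === null) return 1;
    if (typeof value === 'boolean') return 1;
    if (typeof value === 'number') return 8;
    if (typeof value === 'string') return value.length + 1; // assumes ASCII (UTF-8 bytes)
    if (Array.isArray(value)) {
        return value.reduce((sum, v) => sum + estimateFieldValue(v), 0);
    }
    if (typeof value === 'object') {
        return Object.entries(value).reduce(
            (sum, [k, v]) => sum + k.length + 1 + estimateFieldValue(v), 0);
    }
    return 0;
}

function estimateDocumentSize(docPath, data) {
    // Document name: each collection ID and document ID costs its length + 1, plus 16 bytes.
    const nameBytes = docPath.split('/').reduce((sum, part) => sum + part.length + 1, 0) + 16;
    return nameBytes + estimateFieldValue(data) + 32; // 32 bytes of per-document overhead
}

console.log(estimateDocumentSize('cities/SF', { name: 'San Francisco', population: 860000 }));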

For Android users who want to check the size of a document against the 1 MiB (1,048,576 bytes) maximum, there is a library I have made that can help you calculate it:
https://github.com/alexmamo/FirestoreDocument-Android/tree/master/firestore-document
This way, you'll always be able to stay below the limit. The algorithm behind this library is the one explained in the official documentation regarding storage size.

I was looking in the Firebase reference expecting the metadata to have such an attribute, but it doesn't. You can check it here.
So my next approach would be to estimate the weight of the object as an approximation. The sizeOf library seems to have a reasonable API for it.
So it would be something like:
sizeof.sizeof(doc.data());
I wouldn't use the document snapshot, because it contains metadata, like whether there are pending saves. On the other hand, overestimating could be better in some cases.
[UPDATE] Thanks to Doug Stevenson for the wonderful insight.
I was curious how big the difference would actually be, so with my clunky JS I made a quick-and-dirty comparison; you can see the demo here.
Considering this object:
{
    "boolean": true,
    "number": 1,
    "text": "example"
}
Discounting the ID, this is the result:
| Method | Bytes |
|---------|-------|
| FireDoc | 37 |
| sizeOf | 64 |
So the sizeOf library could be a good predictor if we want to overestimate (assuming the calculations hold and behave more or less the same for more complex entities), but as explained in the comment, it is a rough estimation.
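For what it's worth, the 37 bytes reported by FireDoc line up with the per-field rules in the docs (field name: length + 1; boolean: 1; number: 8; string: length + 1), with the document name and the 32-byte per-document overhead left out:

const fieldBytes = (name, valueBytes) => name.length + 1 + valueBytes;
const total =
    fieldBytes('boolean', 1) +                  // booleans cost 1 byte
    fieldBytes('number', 8) +                   // numbers cost 8 bytes
    fieldBytes('text', 'example'.length + 1);   // strings cost length + 1 (ASCII here)
console.log(total); // 37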

For Swift users:
If you want to estimate the document size, I use the following. It returns the estimated size of the document in bytes. It's not 100% accurate, but it gives a solid estimate: it basically converts each key and value in the data map to a string and adds the string's length + 1. See the following link for details on how Firebase determines document size: https://firebase.google.com/docs/firestore/storage-size.
func getDocumentSize(data: [String: Any]) -> Int {
    var size = 0
    for (k, v) in data {
        // Each field name costs its length + 1.
        size += k.count + 1
        if let map = v as? [String: Any] {
            // Nested map: recurse.
            size += getDocumentSize(data: map)
        } else if let array = v as? [String] {
            // String array: each element costs its length + 1.
            for a in array {
                size += a.count + 1
            }
        } else if let s = v as? String {
            size += s.count + 1
        }
        // Note: numbers, booleans, and other value types are not counted by this estimate.
    }
    return size
}

You can use this calculator (code snippet), which I wrote myself.
Source: https://firebase.google.com/docs/firestore/storage-size
<!DOCTYPE html>
<html>
<head>
    <title>Calculate Firestore Size</title>
</head>
<body>
    <h1>Firestore Document Size Calculator</h1>
    <h2 id="response" style="color:red">Result</h2>
    <textarea id="id" style="width: 100%" placeholder="Firestore Doc Ref"></textarea>
    <textarea id="json" style="width: 100%; min-height: 200px" placeholder="Firestore Doc Value JSON STRING"></textarea>
    <textarea id="quantity" style="width: 100%;" placeholder="How many times to repeat this value?"></textarea>
    <script>
        document.getElementById("json").value = '{"type": "Personal","done": false , "priority": 1 , "description": "Learn Cloud Firestore"}';
        document.getElementById("id").value = 'users/jeff/tasks/my_task_id';
        calculate();

        // Returns what percentage of `total` the value `number` represents ("yuzdeBul" = "find percentage").
        function yuzdeBul(total, number) {
            if (number == 0) {
                return 0;
            }
            const sonuc = Math.ceil(parseInt(number) / (parseInt(total) / 100));
            return sonuc;
        }

        function calculate() {
            var quantity = parseInt(document.getElementById("quantity").value || 1);
            var firestoreId = document.getElementById("id").value;
            // Document name: each path segment costs its length + 1, plus 16 bytes.
            var refTotal = firestoreId
                .split("/")
                .map((v) => v.length + 1)
                .reduce((a, b) => a + b, 0) + 16;
            var idTotal = 0;
            var parseJson = JSON.parse(document.getElementById("json").value);
            idTotal += calculateObj(parseJson);
            idTotal += 32; // additional 32 bytes per document
            idTotal *= quantity;
            idTotal += refTotal;
            document.getElementById("response").innerHTML =
                idTotal + "/" + 1048576 + " %" + yuzdeBul(1048576, idTotal);
        }

        function calculateObj(myObj) {
            var total = Object.keys(myObj).map((key) => {
                var keySize = key.toString().length + 1; // field name: length + 1
                var findType = typeof myObj[key];
                if (findType == "string") {
                    keySize += myObj[key].length + 1;
                } else if (findType == "boolean") {
                    keySize += 1;
                } else if (findType == "number") {
                    keySize += 8;
                } else if (myObj[key] === null) {
                    keySize += 1; // null is stored as 1 byte
                } else if (findType == "object") {
                    keySize += calculateObj(myObj[key]); // nested map or array
                }
                return keySize;
            });
            return total.reduce((a, b) => a + b, 0);
        }

        document.getElementById("json").addEventListener("change", calculate);
        document.getElementById("id").addEventListener("change", calculate);
        document.getElementById("quantity").addEventListener("change", calculate);
    </script>
</body>
</html>

So I was looking for a way to reduce unnecessary document reads by accumulating data in arrays, and I got worried about the size.
Turns out I wasn't even close to the limit.
Here's what you can do:
Create a new collection and add a document with the worst-case scenario for your live data, then export that collection using the Cloud console; the export will show you the document size.
Here is a screenshot of my export.
Assuming all the documents are equal in size, each is 0.0003 MB.
You can also see in the console whether any document exceeds the 1 MiB limit.
Note: you can only export when you have enabled billing.

Related

AppSpreadsheet (GAS): avoid some problems with systematically tested data

In my current job with spreadsheets, all inserted data passes through a test that checks whether the same value is found at the same index in other sheets. If the test fails, a caution message is put in the current cell.
// minimalist algorithm
function safeInsertion(data, row_, col_) {
    let rrow = row_ - 1; // range row
    let rcol = col_ - 1; // range col
    const active_sheet_name = getActiveSheetName(); // does as its name suggests
    const all_sheets = SpreadsheetApp.getActiveSpreadsheet().getSheets();
    // test to evaluate the value to be inserted in the sheet
    for (let sh of all_sheets) {
        if (sh.getName() === active_sheet_name)
            continue;
        // getSheetValues does as its name suggests.
        if (getSheetValues(sh)[rrow][rcol] === data)
            return "prohibited insertion";
    }
    return data;
}
// usage (in cell): =safeInsertion("A scarce data", ROW(), COLUMN())
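For reference, the two helpers the snippet relies on are not shown in the question; assuming they are thin wrappers around the SpreadsheetApp API, they could look something like this (my guess, not the asker's code):

// Hypothetical reconstruction of the helpers the question assumes.
function getActiveSheetName() {
    // Name of the sheet the formula is currently being evaluated in.
    return SpreadsheetApp.getActiveSpreadsheet().getActiveSheet().getName();
}

function getSheetValues(sheet) {
    // All values of the sheet as a 2D array, indexed as [row][column].
    return sheet.getDataRange().getValues();
}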
The problems are:
Cached values confuse me sometimes. The script or data is changed but the change is not perceived by the sheet itself until I manually renew the cell's content or refresh the whole table. Is there any relevant configuration available for this issue?
Sometimes, at loading, a messy result appears: almost all data is marked as prohibited, for example, even though originally everything was fine.
What can I do to obtain a stable sheet using this approach?
PS: The original function does more testing on each data insertion. Those tests consist of counting the frequency of the value in the current sheet and in all sheets.
EDIT:
In fact, I can't create a stable sheet. For testing, I leave you a copy of my code with minimal adaptations.
function safelyPut(data, max_onesheet, max_allsheet, row, col) {
    // general initialization
    const data_regex = "^\\s*" + data + "\\s*$";
    const spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
    const activesheet = spreadsheet.getActiveSheet();
    const active_text_finder = activesheet.createTextFinder(data_regex)
        .useRegularExpression(true)
        .matchEntireCell(true);
    const all_text_finder = spreadsheet.createTextFinder(data_regex)
        .useRegularExpression(true)
        .matchEntireCell(true);
    const all_occurrences = all_text_finder.findAll();

    // test the general environment of the data
    const active_freq = active_text_finder.findAll().length;
    if (max_onesheet <= active_freq)
        return "Too much in a sheet";
    const all_freq = all_occurrences.length;
    if (max_allsheet <= all_freq)
        return "Too much in the work";

    // test uniqueness at a position
    const active_sname = activesheet.getName();
    for (const occurrence of all_occurrences) {
        const sname = occurrence.getSheet().getName();
        //if (SYSTEM_SHEETS.includes(sname))
        //    continue;
        if (sname != active_sname)
            if (occurrence.getRow() == row && occurrence.getColumn() == col)
                if (occurrence.getValue() == data) {
                    return `${sname} contains the same data at the same indexes.`;
                }
    }
    return data;
}
Create two or three cells and randomly put a value into a short range, following the usage:
=safelyPut("Scarce Data", 3, 5, ROW(), COLUMN())
Do this and you will probably get an unstable sheet.
About "cached values confuse me sometimes. The script is changed but not perceived by the sheet until manually renewing the cell's content or refreshing the whole table. Is there no relevant configuration available for this issue?": when you want to refresh your custom function safeInsertion, I thought that this thread might be useful.
About "Sometimes, at loading, a messy result appears. Almost all data are prohibited, for example (originally, all was fine!)." and "What can I do to obtain a stable sheet using this approach?": in this case, how about reducing the processing cost of your script? I thought that by reducing the processing cost of the script, your situation might become a bit more stable.
With that in mind, how about the following modification?
Modified script:
function safeInsertion(data, row_, col_) {
    const ss = SpreadsheetApp.getActiveSpreadsheet();
    const range = ss.createTextFinder(data).matchEntireCell(true).findNext();
    return range &&
        range.getRow() == row_ &&
        range.getColumn() == col_ &&
        range.getSheet().getSheetName() != ss.getActiveSheet().getSheetName()
        ? "prohibited insertion"
        : data;
}
The usage of this is the same as your current script, i.e. =safeInsertion("A scarce data", ROW(), COLUMN()).
In this modification, TextFinder is used, because I thought that when a value is searched across all sheets in a Google Spreadsheet, TextFinder is suitable for reducing the processing cost.
References:
createTextFinder(findText) of Class Spreadsheet
findNext()

Is there a better way to write this firebase cloud function than what I have right now?

I have been trying to learn Firebase Cloud Functions recently, and I have written an HTTP function that takes the itemName, sellerUid, and quantity. Then I have a background trigger (an onWrite) that finds the item price for the provided sellerUid and itemName, computes the total (item price * quantity), and writes it into a document in Firestore.
My question is:
With what I have right now, suppose my client purchases N items; this means that I will have:
N reads (from the N items' price lookups),
2 writes (one initial write for the N items and 1 for the total amount after computation),
N searches from the cloud function??
I am not exactly sure how cloud functions count towards reads and writes, or how much compute time this needs (it's all just text, so it should be negligible?).
Would love to hear your thoughts on whether what I have is already good enough or whether there is a much more efficient way of going about this.
Thanks!
exports.itemAdded = functions.firestore.document('CurrentOrders/{documentId}').onWrite(async (change, context) => {
    const snapshot = change.after.data();
    var total = 0;
    for (const [key, value] of Object.entries(snapshot)) {
        if (value['Item Name'] != undefined) {
            await admin.firestore().collection('Items')
                .doc(key).get().then((dataValue) => {
                    const itemData = dataValue.data();
                    if (!dataValue.exists) {
                        console.log('This is empty');
                    } else {
                        total += (parseFloat(value['Item Quantity']) * parseFloat(itemData[value['Item Name']]['Item Price']));
                    }
                });
            console.log('This is in total: ', total);
        }
    }
    snapshot['Total'] = total;
    console.log('This is snapshot afterwards: ', snapshot);
    return change.after.ref.set(snapshot);
});
With your current approach you will be billed for:
N reads (from the N items' price lookups);
1 write that triggers your onWrite function;
1 write that persists the total value.
One better approach that I can think of is to compare the values in change.before.data() and change.after.data(), read the current total (0 if this is the first run), and then add only the values that were newly added in change.after.data() instead of re-reading all N items, which would potentially result in you being charged for fewer reads (see the sketch below).
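This is an untested sketch of that idea, reusing the field names from the question; the guard against re-triggering on the total write is my addition:

exports.itemAdded = functions.firestore.document('CurrentOrders/{documentId}').onWrite(async (change, context) => {
    if (!change.after.exists) return null; // document was deleted

    const before = change.before.exists ? change.before.data() : {};
    const after = change.after.data();

    // Only price the item entries that were added by this write.
    const newKeys = Object.keys(after).filter(
        (key) => after[key]['Item Name'] !== undefined && before[key] === undefined);
    if (newKeys.length === 0) return null; // nothing new (e.g. the write of the total itself)

    let total = after['Total'] || 0; // start from the previously stored total
    for (const key of newKeys) {
        const dataValue = await admin.firestore().collection('Items').doc(key).get();
        if (dataValue.exists) {
            const itemData = dataValue.data();
            total += parseFloat(after[key]['Item Quantity']) *
                     parseFloat(itemData[after[key]['Item Name']]['Item Price']);
        }
    }
    // update() only touches the Total field instead of rewriting the whole document.
    return change.after.ref.update({ Total: total });
});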
As for the actual pricing, if you check this documentation for Cloud Functions, you will see that in your case only invocation and compute billing apply; however, there is a free tier for both, so if you are using this only to learn and the app does not get a lot of use, you should stay within the free tier with either approach.
Let me know if you need any more information.

How to write 5000 records into DynamoDB Table?

I have a use case where I have to write 5000 records into a DynamoDB table in one shot. I am using the batchSave API of the DynamoDBMapper library; it can write up to 25 records in one go.
Can I pass the list of 5000 records to it and have it internally convert them into batches of 25 records and write them to the DynamoDB table, or will I have to handle this in my own code with some logic and pass only 25 records at a time to batchSave?
According to the batchSave documentation, batchSave():
Saves the objects given using one or more calls to the AmazonDynamoDB.batchWriteItem
Indeed, it splits up the items you give it into appropriately-sized batches (25 items) and writes them using the DynamoDB BatchWriteItem operation.
You can see the code that does this in batchWrite() in DynamoDBMapper.java:
/** The max number of items allowed in a BatchWrite request */
static final int MAX_ITEMS_PER_BATCH = 25;

// Break into chunks of 25 items and make service requests to DynamoDB
for (final StringListMap<WriteRequest> batch :
        requestItems.subMaps(MAX_ITEMS_PER_BATCH, true)) {
    List<FailedBatch> failedBatches = writeOneBatch(batch, config.getBatchWriteRetryStrategy());
    ...
Here are the methods I use to achieve this. I manage to do it by first chunking the data array into smaller arrays (of length 25):
const queryChunk = (arr, size) => {
    const tempArr = []
    for (let i = 0, len = arr.length; i < len; i += size) {
        tempArr.push(arr.slice(i, i + size));
    }
    return tempArr
}

const batchWriteManyItems = async (tableName, itemObjs, chunkSize = 25) => {
    return await Promise.all(queryChunk(itemObjs, chunkSize).map(async chunk => {
        await dynamoDB.batchWriteItem({RequestItems: {[tableName]: chunk}}).promise()
    }))
}
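One caveat with this approach: BatchWriteItem can hand some items back as UnprocessedItems when the table is throttled, and the snippet above silently drops them. Here is a hedged sketch of a retry wrapper, where dynamoDB is assumed to be the same AWS SDK v2 low-level client used above:

// Retries the items that BatchWriteItem reports back as unprocessed.
const batchWriteItemWithRetry = async (tableName, writeRequests, maxRetries = 3) => {
    let requestItems = { [tableName]: writeRequests };
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const result = await dynamoDB.batchWriteItem({ RequestItems: requestItems }).promise();
        const unprocessed = result.UnprocessedItems || {};
        if (Object.keys(unprocessed).length === 0) {
            return; // every item in this chunk was written
        }
        // Retry only the leftovers, with a simple exponential backoff.
        requestItems = unprocessed;
        await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
    throw new Error(`Unprocessed items remained after ${maxRetries} retries`);
};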

DocumentDB Change Feed and saving Checkpoint

After reading the documentation, I'm having a hard time conceptualizing the change feed. Let's take the code from the documentation below. The second change-feed call picks up the changes made since the last time it was run, via the checkpoints. Let's say it is being used to create summary data, there was an issue, and it needs to be re-run from a prior time. I don't understand the following:
How do I specify a particular time the checkpoint should start from? I understand I can save the checkpoint dictionary and use that for each run, but how do you get the changes from time X, e.g. to rerun some summary data?
Secondly, let's say we are rerunning some summary data and we save the last checkpoint used for each piece of summarized data so we know where that one left off. How does one know whether a record is in or before that checkpoint?
Code that runs from the beginning of the collection and then from the last checkpoint:
Dictionary<string, string> checkpoints = await GetChanges(client, collection, new Dictionary<string, string>());

await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-201", MetricType = "Temperature", Unit = "Celsius", MetricValue = 1000
});
await client.CreateDocumentAsync(collection, new DeviceReading {
    DeviceId = "xsensr-212", MetricType = "Pressure", Unit = "psi", MetricValue = 1000
});

// Returns only the two documents created above.
checkpoints = await GetChanges(client, collection, checkpoints);

//
private async Task<Dictionary<string, string>> GetChanges(
    DocumentClient client,
    string collection,
    Dictionary<string, string> checkpoints)
{
    List<PartitionKeyRange> partitionKeyRanges = new List<PartitionKeyRange>();
    FeedResponse<PartitionKeyRange> pkRangesResponse;

    do
    {
        pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(collection);
        partitionKeyRanges.AddRange(pkRangesResponse);
    }
    while (pkRangesResponse.ResponseContinuation != null);

    foreach (PartitionKeyRange pkRange in partitionKeyRanges)
    {
        string continuation = null;
        checkpoints.TryGetValue(pkRange.Id, out continuation);

        IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
            collection,
            new ChangeFeedOptions
            {
                PartitionKeyRangeId = pkRange.Id,
                StartFromBeginning = true,
                RequestContinuation = continuation,
                MaxItemCount = 1
            });

        while (query.HasMoreResults)
        {
            FeedResponse<DeviceReading> readChangesResponse = query.ExecuteNextAsync<DeviceReading>().Result;

            foreach (DeviceReading changedDocument in readChangesResponse)
            {
                Console.WriteLine(changedDocument.Id);
            }
            checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
        }
    }
    return checkpoints;
}
DocumentDB supports check-pointing only by the logical timestamp returned by the server. If you would like to retrieve all changes from X minutes ago, you would have to "remember" the logical timestamp corresponding to the clock time (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), then use that to retrieve changes.
Change feed uses logical time in place of clock time because it can be different across various servers/partitions. If you would like to see change feed support based on clock time (with some caveats on skew), please propose/upvote at https://feedback.azure.com/forums/263030-documentdb/.
To save the last checkpoint per partition key/document, you can just save the corresponding version of the batch in which it was last seen (ETag returned for the collection in the REST API, ResponseContinuation in the SDK), like Fred suggested in his answer.
Regarding "How to specify a particular time the checkpoint should start":
You could try to provide a logical version/ETag (such as 95488) instead of a null value as the RequestContinuation property of ChangeFeedOptions.

SQLite storage API Insert statement freezes entire firefox in bootstrapped(Restartless) AddOn

The data to be inserted has just two TEXT columns whose individual lengths don't even exceed 256.
I initially used executeSimpleSQL since I didn't need to get any results.
It worked smoothly for simultaneous inserts of up to 20K records, i.e. no lag or freezing observed in the background.
However, with 0.1 million records I could see horrible freezing during insertion.
So, I tried these two approaches:
Insert in chunks of 500 records - this didn't work well, since even for 20K records it showed visible freezing. I didn't even try it with 0.1 million.
So, I decided to go async and used executeAsync along with binding, etc. This also shows visible freezing for just 20K records. This was the whole array being inserted, not in chunks.
var dirs = Cc["@mozilla.org/file/directory_service;1"]
               .getService(Ci.nsIProperties);
var dbFile = dirs.get("ProfD", Ci.nsIFile);
var dbService = Cc["@mozilla.org/storage/service;1"]
                    .getService(Ci.mozIStorageService);
dbFile.append('mydatabase.sqlite');
var connectDB = dbService.openDatabase(dbFile);

let insertStatement = connectDB.createStatement(
    'INSERT INTO my_table (my_col_a, my_col_b) VALUES (:myColumnA, :myColumnB)');

var arraybind = insertStatement.newBindingParamsArray();
for (let i = 0; i < my_data_array.length; i++) {
    let params = arraybind.newBindingParams();

    // Individual elements of the array are CSV
    my_data_arrayTC = my_data_array[i].split(',');

    params.bindByName("myColumnA", my_data_arrayTC[0]);
    params.bindByName("myColumnB", my_data_arrayTC[1]);

    arraybind.addParams(params);
}
insertStatement.bindParameters(arraybind);

insertStatement.executeAsync({
    handleResult: function(aResult) {
        console.log('Results are out');
    },

    handleError: function(aError) {
        console.log("Error: " + aError.message);
    },

    handleCompletion: function(aReason) {
        if (aReason != Components.interfaces.mozIStorageStatementCallback.REASON_FINISHED)
            console.log("Query canceled or aborted!");
        console.log('We are done inserting');
    }
});
connectDB.asyncClose(function() {
    console.log('[INFO][Write Database] Async - plus domain data');
});
Also, I seem to get the async callbacks after a long time. Usually, executeSimpleSQL is way faster than this. If I use the SQLite Manager Tool extension to open the DB immediately, this is what I get (as expected):
SQLiteManager: Error in opening file mydatabase.sqlite - either the file is encrypted or corrupt
Exception Name: NS_ERROR_STORAGE_BUSY
Exception Message: Component returned failure code: 0x80630001 (NS_ERROR_STORAGE_BUSY) [mozIStorageService.openUnsharedDatabase]
My primary objective was to dump data as big as 0.1 million+ records and then later perform reads when needed.
