Store large Json in Cosmos DB - azure-cosmosdb

I have a large JSON file, say 100 MB - 300 MB, but as I understand it Cosmos DB only supports a 2 MB item size. If that is the case, what is the alternative? How can I save my JSON and query over it? Basically my JSON is unstructured, so Cosmos DB would be a perfect choice for me, but due to the size limitation I am unable to proceed.

If you want to use Cosmos DB, you'll need to transform the data into more reasonably sized items that stay under the 2 MB limit.
If that's not possible, consider using something like Azure Cognitive Search to index your JSON files from blob storage.
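If splitting into smaller items is feasible, a minimal sketch of that approach using the azure-cosmos Python SDK might look like the following. It assumes the large file is essentially an array of records; the account, database, container, and field names are placeholders, and the size threshold is an assumption rather than a documented value.

```python
# Hypothetical sketch: split one large JSON array into chunked Cosmos DB items,
# each kept well under the 2 MB item size limit. All names are illustrative.
import json

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("items")

MAX_ITEM_BYTES = 1_500_000  # stay comfortably below the 2 MB service limit


def split_and_store(path: str, document_id: str) -> None:
    """Break a large JSON array into chunk items that share a partition key."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    chunk, chunk_no = [], 0
    for record in records:
        chunk.append(record)
        # Flush the chunk once serializing it approaches the size limit.
        if len(json.dumps(chunk).encode("utf-8")) >= MAX_ITEM_BYTES:
            _write_chunk(document_id, chunk_no, chunk)
            chunk, chunk_no = [], chunk_no + 1
    if chunk:
        _write_chunk(document_id, chunk_no, chunk)


def _write_chunk(document_id: str, chunk_no: int, records: list) -> None:
    container.upsert_item({
        "id": f"{document_id}-{chunk_no}",  # unique id per chunk
        "documentId": document_id,          # assumed partition key path: /documentId
        "chunk": chunk_no,
        "records": records,
    })
```

Queries can then reassemble a logical document by filtering on documentId, since all of its chunks land in the same partition.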

Related

What is the maximum size of a document in the Realtime Database vs. Firestore?

I am working on a more complicated database where I want to store lots of data. The issue with Firestore is the 1 MB limit per document; I am splitting my data into different documents, but according to my calculations the size will still be bigger than the limit. I cannot find the limit for the Realtime Database, and I want to be sure before switching to it; a single document in some cases could hit 6-9 MB when scaling big. At first I wanted to go with MongoDB, but I wanted to try the Google Cloud services. Any idea if the document size limit is the same for both the Realtime Database and Firestore?
Documents are a Firestore concept (each has a maximum size of 1 MB), while the Realtime Database is essentially one large JSON tree. You can find the limits of the Realtime Database in the documentation:
| Property | Limit | Description |
| --- | --- | --- |
| Maximum depth of child nodes | 32 | Each path in your data tree must be less than 32 levels deep. |
| Length of a key | 768 bytes | Keys are UTF-8 encoded and can't contain newlines or any of the following characters: . $ # [ ] / or any ASCII control characters (0x00 - 0x1F and 0x7F). |
| Maximum size of a string | 10 MB | Data is UTF-8 encoded. |
There isn't a limit on the number of child nodes you can have, but keep the maximum depth in mind. It might also be best if you could share a sample of what currently takes over 6 MB in Firestore and maybe restructure the database.
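As one illustration of a possible restructuring (a sketch only, assuming the bulk of the data is a large list of records; the collection and field names are hypothetical), the oversized payload could be split across a subcollection of small documents instead of one 6-9 MB document:

```python
# Hypothetical sketch: store a bulky list in a subcollection of small Firestore
# documents (each far below the 1 MB limit) plus one small summary document.
from google.cloud import firestore

db = firestore.Client()


def save_dataset(dataset_id: str, readings: list, chunk_size: int = 500) -> None:
    parent = db.collection("datasets").document(dataset_id)
    parent.set({"count": len(readings)})  # lightweight summary document
    for i in range(0, len(readings), chunk_size):
        # Each chunk becomes its own document, so no single write nears 1 MB.
        parent.collection("chunks").document(f"chunk-{i // chunk_size}").set(
            {"readings": readings[i:i + chunk_size]}
        )
```

Whether 500 records per chunk is appropriate depends entirely on the size of each record, so treat the number as a placeholder.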

Huge amount of RUs to write a document of 400 KB - 600 KB on Azure Cosmos DB

This is the log of my Azure Cosmos DB for the last write operations:
Is it possible that write operations on documents with sizes between 400 KB and 600 KB have these costs?
Here is my document (a list of coordinates):
Basically I thought at the beginning that it was a hot-partition problem, but afterwards I understood (I hope) that it is a problem with loading documents ranging in size from 400 KB to 600 KB. I wanted to understand whether there was something wrong in the database settings, in the indexing policy, or elsewhere, as it seems anomalous to me that about 3000 RUs are used to load a 400 KB JSON document, when the documentation indicates that loading a file of about 100 KB takes roughly 50 RUs. Basically the document to be loaded is a road route, so I would not know how else to model it.
This is my indexing policy:
Thanks to everybody. I have spent months on this problem without finding a solution...
It's hard to know for sure what the expected RU cost should be to ingest a 400 KB - 600 KB item. The cost of this operation depends on the size of the item, your indexing policy, and the structure of the item itself; greater hierarchy depth is more expensive to index.
You can get a good estimate of the cost of a single write using the Cosmos Capacity Calculator. In the calculator, click Sign-In, copy/paste your indexing policy, upload a sample document, reduce the writes per second to 1, then click Calculate. This should give you the cost to insert a single item.
One thing to note here is that if you have frequent updates to a small number of properties, I would recommend splitting the document into two: one with the static properties, and another that is frequently updated. This can drastically reduce the cost of updates on large documents.
Hope this is helpful.
You can also pull the RU cost for a write using the SDK.
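For example, with the azure-cosmos Python SDK, one common way is to inspect the x-ms-request-charge header of the last response. This is a sketch only, with placeholder database, container, and field names and an assumed partition key path:

```python
# Minimal sketch of reading the RU charge reported for a single write with the
# azure-cosmos Python SDK. The partition key path is assumed to be /partitionKey.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("routes-db").get_container_client("routes")

container.create_item({
    "id": "route-1",
    "partitionKey": "route-1",
    "points": [[45.07, 7.69], [45.08, 7.70]],  # tiny stand-in for a real route
})

# The service reports the cost of the last operation in this response header.
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"Write consumed {charge} RUs")
```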
Check storage consumed
To check the storage consumption of an Azure Cosmos container, you can run a HEAD or GET request on the container and inspect the x-ms-request-quota and x-ms-request-usage headers. Alternatively, when working with the .NET SDK, you can use the DocumentSizeQuota and DocumentSizeUsage properties to get the storage consumed.
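A rough Python equivalent is a hedged sketch like the one below: read the container with quota info populated and print whatever quota/usage headers come back, since the exact header names can vary between API versions. The account, database, and container names are placeholders.

```python
# Hedged sketch: read the container with quota info and print whichever
# quota/usage response headers the service returns.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("routes-db").get_container_client("routes")

container.read(populate_quota_info=True)
for name, value in container.client_connection.last_response_headers.items():
    if "quota" in name.lower() or "usage" in name.lower():
        print(name, "=", value)
```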

Does CosmosDB's Mongo API support the full BSON spec?

In particular, binary data? Cosmos DB's Core SQL/Document API supports JSON, which does not allow binary data on the wire easily.
As I read it, support is minimal. From the docs:
"Some of Azure Cosmos DB's internal formats for encoding information, such as binary fields, are currently not as efficient as one might like. Therefore this can cause unexpected limitations on data size. For example, currently one couldn't use the full one Meg of a table entity to store binary data because the encoding increases the data's size."
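To make that limitation concrete: the usual workaround for carrying binary data through a JSON-only API is to base64-encode it into a string field, which also shows the size inflation the quote alludes to. The item shape below is invented for illustration, not a documented schema:

```python
# Sketch of the common workaround: carry binary data through a JSON API as a
# base64 string, stored alongside its content type.
import base64

with open("photo.jpg", "rb") as f:  # any local binary file
    raw = f.read()

encoded = base64.b64encode(raw).decode("ascii")

item = {
    "id": "photo-1",
    "contentType": "image/jpeg",
    "data": encoded,
}

# base64 output is roughly 4/3 the size of the input, so large binary payloads
# hit size limits sooner than the raw byte count suggests.
print(len(raw), len(encoded))
```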

Should I upload images to Cloud Firestore or Firebase Storage?

This might be a silly question, but it needs clearing up for a beginner like me.
I know how to upload an image to the Firestore database, but instead of using Firebase Storage, why are we not using a base64 string to upload the image? It might be much easier. Or is there some limitation on string length?
PS: I haven't tried uploading an image to Firestore as a base64 string.
The Cloud Firestore "Usage and limits" documentation states that the "Maximum size for a document" is 1 MiB (1,048,576 bytes).
This is a small amount of data for images and will only allow you to store low-resolution images in single documents. You could store images across multiple documents as a workaround; however, that is unnecessarily complicated, and Firebase Storage solves this issue perfectly.
Additionally, you will download the whole image (the whole document) whenever you listen to any change in that document. If you have a document with an image (base64 string) field and a likes (int count) field and you listen for real-time updates of the likes, you will download the full image on the client every time.
Firebase Storage is therefore most likely what you are searching for, as you can store objects of any size (as large as your capacity), it provides you with download URLs, and you do not have to handle any encoding or decoding (although Cloud Firestore allows blobs, i.e. byte data, as well).
There are no benefits to storing files (as binary data) in Firestore over Firebase Storage.
Firestore is billed mostly for reads, writes, and deletes, with a monthly free storage quota of only 1 GB! This can be a serious pitfall if not done right. Read "How we spent 30k USD in Firebase in less than 72 hours".
Firebase Storage is by far the best option to store files, with nearly no file size limitation and much cheaper prices; then simply save the files' URLs to Firestore and you are done.
From the technical perspective, yes, you can always store a (base64) string and use it later to retrieve the image. However, a base64 string is roughly 33% larger than the raw bytes, so you'll lose performance as well as storage if you use base64.
Secondly, using Firebase Storage has another advantage: it can upload the image while the mobile app is in the background, and we, as developers, don't have to write any specific code for this.
Limits of Cloud Firestore and Firebase Storage were already pointed out by #creativecreatorormaybenot.
My conclusion: Prefer Firebase Storage over base64.
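As a concrete sketch of the pattern recommended above (upload the bytes to Firebase Storage, keep only the URL in Firestore), using the firebase_admin SDK with invented bucket, collection, and field names:

```python
# Hypothetical sketch: image bytes go to Firebase Storage; Firestore only keeps
# a small URL string plus lightweight fields, staying far below the 1 MiB limit.
import firebase_admin
from firebase_admin import firestore, storage

firebase_admin.initialize_app(options={"storageBucket": "my-project.appspot.com"})

bucket = storage.bucket()
blob = bucket.blob("images/post-1.jpg")
blob.upload_from_filename("local/post-1.jpg")  # raw bytes live in Storage
blob.make_public()                             # or generate a signed URL instead

firestore.client().collection("posts").document("post-1").set({
    "imageUrl": blob.public_url,
    "likes": 0,
})
```

Listening for changes to likes then never re-downloads the image itself, only the small metadata document.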

Using Azure Search index to index blobs in Azure Blob Storage (Images and Videos)

I want to index blobs of type image and video.
From what I have read, Azure Search cannot index image and video types.
What I was thinking of doing is using the blob's metadata_storage_path. However, that is my key, and it is encoded.
Decoding it is a real performance killer.
Is there any way I can index images and videos using an Azure Search index?
If not, is there any other way?
IIUC, you want to index the metadata attached to the blob but not its content, correct? If so, set the dataToExtract parameter to storageMetadata as described in Controlling which parts of the blob are indexed.
The cost of base64-decoding the encoded metadata_storage_path to correlate with the rest of your system is likely to be negligible compared to the other work your app is doing, such as calls to the database or to Azure Search. However, you can avoid the need for decoding altogether if you fork metadata_storage_path into a new non-key field in your index, which won't need to be encoded. You can use field mappings to fork the field.
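For illustration, here is a hedged sketch of what such an indexer definition could look like when created through the Azure Cognitive Search REST API using Python's requests library. The service, data source, index, and field names are placeholders, and the payload should be checked against the current REST API reference before use:

```python
# Sketch of an indexer that pulls only blob metadata and forks the storage path
# into both a base64-encoded key field and a plain non-key field.
import requests

indexer = {
    "name": "blob-metadata-indexer",
    "dataSourceName": "media-blobs",
    "targetIndexName": "media-index",
    # Only extract blob metadata, not image/video content.
    "parameters": {"configuration": {"dataToExtract": "storageMetadata"}},
    "fieldMappings": [
        # Keep the encoded path as the document key...
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "id",
            "mappingFunction": {"name": "base64Encode"},
        },
        # ...and fork it into a plain field that needs no decoding at query time.
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "storagePath",
        },
    ],
}

requests.put(
    "https://<service>.search.windows.net/indexers/blob-metadata-indexer"
    "?api-version=2023-11-01",
    headers={"api-key": "<admin-key>", "Content-Type": "application/json"},
    json=indexer,
)
```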

Resources