What is the size limit of a cosmosdb item? - azure-cosmosdb

I've been looking for an authoritative source of Azure Cosmos DB limits but I can't find one. In particular, I need to know the size limit for an individual item.

The maximum size of a document today is 2MB.
https://learn.microsoft.com/en-us/azure/cosmos-db/documentdb-resources#documents

So this is one of those things that always annoys me about documentation.
Sure it's 2MB, but by whose measuring stick?
TLDR: The limit falls between 2,090,014 and 2,100,014 bytes when measured with
Encoding.UTF8.GetByteCount(doc) or
Encoding.ASCII.GetByteCount(doc)
To get there I set up the following code:
for (int i = 10; i < 10000; i++)
{
    var docItem = new TestItem(new string('A', i * 10000));
    string doc = JsonConvert.SerializeObject(docItem);
    log.LogInformation(" ");
    log.LogInformation(" -------------------------------------------------------------------------------------------------------------------------------------");
    log.LogInformation($" ------------------------------------------------- Doc Size = {i*10000 } --------------------------------------------------");
    log.LogInformation(" -------------------------------------------------------------------------------------------------------------------------------------");
    log.LogWarning($"UTF7 - {Encoding.UTF7.GetByteCount(doc)}");
    log.LogWarning($"UTF8 - {Encoding.UTF8.GetByteCount(doc)}");
    log.LogWarning($"UTF32 - {Encoding.UTF32.GetByteCount(doc)}");
    log.LogWarning($"Unicode - {Encoding.Unicode.GetByteCount(doc)}");
    log.LogWarning($"Ascii - {Encoding.ASCII.GetByteCount(doc)}");
    log.LogInformation(" -------------------------------------------------------------------------------------------------------------------------------------");
    // ASCIIEncoding inherits these static properties from Encoding,
    // so the values below repeat the ones above.
    log.LogWarning($"UTF7 - {ASCIIEncoding.UTF7.GetByteCount(doc)}");
    log.LogWarning($"UTF8 - {ASCIIEncoding.UTF8.GetByteCount(doc)}");
    log.LogWarning($"UTF32 - {ASCIIEncoding.UTF32.GetByteCount(doc)}");
    log.LogWarning($"Unicode - {ASCIIEncoding.Unicode.GetByteCount(doc)}");
    log.LogWarning($"Ascii - {ASCIIEncoding.ASCII.GetByteCount(doc)}");
    try
    {
        await cosmosStore.CreateDocumentAsync(docItem);
    }
    catch (Exception e)
    {
        log.LogWarning(e.Message + "Caught");
    }
}
And here's where it broke:

Update: increasing the max size to 16 MB is now possible.
https://devblogs.microsoft.com/cosmosdb/larger-document-sizes-unique-index-improvements-expr-support-in-azure-cosmos-db-api-for-mongodb/

The maximum allowable document size is 2 MB. This is fixed for Azure Cosmos DB for NoSQL accounts.
If a document exceeds it, the request fails with a 413 error: 413 Entity too large. The document size in the request exceeded the allowable document size for a request.
Even if your environment is already in production, we still suggest reducing the document size as the solution.
You can reduce the document size or remodel your data.
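One way to act on this is to measure an item before writing it, so an oversized document is caught on the client instead of coming back as a 413. The following is only a minimal sketch, not an official API: it assumes the 2 MB limit means 2 * 1024 * 1024 = 2,097,152 bytes of UTF-8 encoded JSON (which is consistent with the 2,090,014 to 2,100,014 range measured in the answer above), and the names MAX_ITEM_BYTES, jsonByteSize and fitsInItemLimit are made up for illustration. The service may count bytes differently from a client-side serializer, so leave some headroom.

```javascript
// Assumption: the per-item limit is 2 MB of UTF-8 encoded JSON.
const MAX_ITEM_BYTES = 2 * 1024 * 1024;

// Returns the UTF-8 byte length of the item's JSON serialization.
function jsonByteSize(item) {
  return Buffer.byteLength(JSON.stringify(item), "utf8");
}

// Returns true when the serialized item is within the given limit.
function fitsInItemLimit(item, limit = MAX_ITEM_BYTES) {
  return jsonByteSize(item) <= limit;
}
```

For example, fitsInItemLimit({ id: "1", body: "A".repeat(3 * 1024 * 1024) }) returns false, which would be the cue to split or remodel the document before sending it.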
More Info:
Azure Cosmos DB service quotas | Microsoft Learn
https://learn.microsoft.com/en-us/azure/cosmos-db/concepts-limits#per-item-limits
Additional information:
For Azure Cosmos DB for MongoDB:
If your Azure Cosmos DB account uses the API for MongoDB, the limit is also 2 MB, but there is a preview feature that raises the limit to 16 MB per document. Please note that this preview feature is not recommended for production environments.
Ref:
https://azure.microsoft.com/en-us/updates/public-preview-16mb-limit-per-document-in-api-for-mongodb/
https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-42#data-types
However, it’s in preview and not recommended for production environments. There’s no ETA for when it will be GA.

Related

firebase real time database takes a long time to load data

Firebase realtime database is taking a long time to load data. Here's a screenshot of the data that I have in the database. What can I do to optimize the loading? Also are there other places that I can store the data other than firebase? The data is 3.8MB in size, and has the following structure
{"10-happier": {body: "test"}, "zero-to-one": {body: "test2"}}
Here's my code
var defer = Q.defer();
app.database().ref('content').once('value').then(snapshot => {
  if (snapshot && snapshot.val()) {
    defer.resolve(snapshot.val());
  } else {
    defer.resolve({});
  }
}).catch(error => {
  defer.reject(error);
});
return defer.promise;
There's nothing you can do to optimize a query like this. When you fetch an entire node:
app.database().ref('content').once('value')
The SDK will download everything, and it will take as long as it takes to get the whole thing. The total performance is going to be determined by the speed of the client's connection to the server. If you want the query to be faster, your only viable option is to get a faster connection to Realtime Database.
Alternatively, you can bypass the use of a database altogether and use a different method of storage that involves some form of compression or CDN to deliver the content more efficiently for the end user. Since recommendations for software and services are off-topic for Stack Overflow, you will have to do some research to figure out what your options are and what will work best for your specific situation.

Blazor preview 9/mono-wasm memory access out of bounds: max string size for DotNet.invokeMethod?

Since dotnet core 3 preview 9, I am facing an issue invoking a dotnet method passing a large string from JavaScript.
Code is worth more than a thousand words, so the snippet below reproduces the issue. It works when length = 1 * mb but fails when length = 2 * mb.
@page "/repro"
<button onclick="const mb = 1024 * 1024; const length = 2 * mb;console.log(`Attempting length ${length}`); DotNet.invokeMethod('@GetType().Assembly.GetName().Name', 'ProcessString', 'a'.repeat(length));">Click Me</button>
@functions {
    [JSInvokable] public static void ProcessString(string stringFromJavaScript) { }
}
The error message is:
Uncaught RuntimeError: memory access out of bounds
at wasm-function[2639]:18
at wasm-function[6239]:10
at Module._mono_wasm_string_from_js (http://localhost:52349/_framework/wasm/mono.js:1:202444)
at ccall (http://localhost:52349/_framework/wasm/mono.js:1:7888)
at http://localhost:52349/_framework/wasm/mono.js:1:8238
at Object.toDotNetString (http://localhost:52349/_framework/blazor.webassembly.js:1:39050)
at Object.invokeDotNetFromJS (http://localhost:52349/_framework/blazor.webassembly.js:1:37750)
at u (http://localhost:52349/_framework/blazor.webassembly.js:1:5228)
at Object.e.invokeMethod (http://localhost:52349/_framework/blazor.webassembly.js:1:6578)
at HTMLButtonElement.onclick (<anonymous>:2:98)
I need to process large strings, which represent the content of a file.
Is there a way to increase this limit?
Apart from breaking down the string into multiple segments and performing multiple calls, is there any other way to process a large string?
Is there any other approach for processing large files?
This used to work in preview 8.
Is there a way to increase this limit?
No (unless you modify and recompile blazor and mono/wasm that is).
Apart from breaking down the string into multiple segments and performing multiple calls, is there any other way to process a large string?
Yes, as you are on the client side, you can use the shared memory techniques. You basically map a .net byte[] to an ArrayBuffer. See this (disclaimer: My library) or this library for reference on how to do it. These examples are using the binary content of actual javascript Files but it's applicable to strings as well. There is no reference documentation on these API's yet. Mostly just examples and the blazor source code.
Is there any other approach for processing large files?
See 2)
I recreated your issue in a netcore 3.2 Blazor app (somewhere between 1 and 2 Mb of data kills it just as you described). I updated the application to netcore 5.0 and the problem is fixed (it was still working when I threw 50Mb at it).
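For anyone stuck on an affected version, the segment-by-segment fallback mentioned in the question can be sketched roughly as below. This is only an illustration: splitIntoChunks and sendInChunks are hypothetical helper names, and sendChunk is a caller-supplied function. In a Blazor page it might wrap a DotNet.invokeMethod call to a [JSInvokable] method that appends each piece to a buffer, but that method would have to be written separately and is not shown here.

```javascript
// Splits a string into consecutive pieces of at most chunkSize UTF-16 code units.
function splitIntoChunks(text, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Passes each chunk, in order, to a caller-supplied sendChunk(chunk, index, total).
// Returns the number of chunks sent.
function sendInChunks(text, chunkSize, sendChunk) {
  const chunks = splitIntoChunks(text, chunkSize);
  chunks.forEach((chunk, index) => sendChunk(chunk, index, chunks.length));
  return chunks.length;
}
```

Note that slicing by code units can cut a surrogate pair in half, so if the content may contain characters outside the Basic Multilingual Plane it is probably safer to chunk the raw bytes instead of the string.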

seemingly simple Firestore query is very slow [duplicate]

I'm having slow performance issues with Firestore while retrieving basic data stored in a document compared to the realtime database with 1/10 ratio.
Using Firestore, it takes an average of 3000 ms on the first call
this.db.collection('testCol')
  .doc('testDoc')
  .valueChanges().forEach((data) => {
    console.log(data); // 3000 ms later
  });
Using the realtime database, it takes an average of 300 ms on the first call
this.db.database.ref('/test').once('value').then(data => {
  console.log(data); // 300 ms later
});
This is a screenshot of the network console :
I'm running the Javascript SDK v4.50 with AngularFire2 v5.0 rc.2.
Did anyone experience this issue ?
UPDATE: 12th Feb 2018 - iOS Firestore SDK v0.10.0
Similar to some other commenters, I've also noticed a slower response on the first get request (with subsequent requests taking ~100ms). For me it's not as bad as 30s, but maybe around 2-3s when I have good connectivity, which is enough to provide a bad user experience when my app starts up.
Firebase have advised that they're aware of this "cold start" issue and they're working on a long term fix for it - no ETA unfortunately. I think it's a separate issue that when I have poor connectivity, it can take ages (over 30s) before get requests decide to read from cache.
Whilst Firebase fix all these issues, I've started using the new disableNetwork() and enableNetwork() methods (available in Firestore v0.10.0) to manually control the online/offline state of Firebase. Though I've had to be very careful where I use it in my code, as there's a Firestore bug that can cause a crash under certain scenarios.
UPDATE: 15th Nov 2017 - iOS Firestore SDK v0.9.2
It seems the slow performance issue has now been fixed. I've re-run the tests described below and the time it takes for Firestore to return the 100 documents now seems to be consistently around 100ms.
Not sure if this was a fix in the latest SDK v0.9.2 or if it was a backend fix (or both), but I suggest everyone updates their Firebase pods. My app is noticeably more responsive - similar to the way it was on the Realtime DB.
I've also discovered Firestore to be much slower than Realtime DB, especially when reading from lots of documents.
Updated tests (with latest iOS Firestore SDK v0.9.0):
I set up a test project in iOS Swift using both RTDB and Firestore and ran 100 sequential read operations on each. For the RTDB, I tested the observeSingleEvent and observe methods on each of the 100 top level nodes. For Firestore, I used the getDocument and addSnapshotListener methods at each of the 100 documents in the TestCol collection. I ran the tests with disk persistence on and off. Please refer to the attached image, which shows the data structure for each database.
I ran the test 10 times for each database on the same device and a stable wifi network. Existing observers and listeners were destroyed before each new run.
Realtime DB observeSingleEvent method:
func rtdbObserveSingle() {
    let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
    print("Started reading from RTDB at: \(start)")
    for i in 1...100 {
        Database.database().reference().child(String(i)).observeSingleEvent(of: .value) { snapshot in
            let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
            let data = snapshot.value as? [String: String] ?? [:]
            print("Data: \(data). Returned at: \(time)")
        }
    }
}
Realtime DB observe method:
func rtdbObserve() {
    let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
    print("Started reading from RTDB at: \(start)")
    for i in 1...100 {
        Database.database().reference().child(String(i)).observe(.value) { snapshot in
            let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
            let data = snapshot.value as? [String: String] ?? [:]
            print("Data: \(data). Returned at: \(time)")
        }
    }
}
Firestore getDocument method:
func fsGetDocument() {
    let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
    print("Started reading from FS at: \(start)")
    for i in 1...100 {
        Firestore.firestore().collection("TestCol").document(String(i)).getDocument() { document, error in
            let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
            guard let document = document, document.exists && error == nil else {
                print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
                return
            }
            let data = document.data() as? [String: String] ?? [:]
            print("Data: \(data). Returned at: \(time)")
        }
    }
}
Firestore addSnapshotListener method:
func fsAddSnapshotListener() {
    let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
    print("Started reading from FS at: \(start)")
    for i in 1...100 {
        Firestore.firestore().collection("TestCol").document(String(i)).addSnapshotListener() { document, error in
            let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
            guard let document = document, document.exists && error == nil else {
                print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
                return
            }
            let data = document.data() as? [String: String] ?? [:]
            print("Data: \(data). Returned at: \(time)")
        }
    }
}
Each method essentially prints the unix timestamp in milliseconds when the method starts executing and then prints another unix timestamp when each read operation returns. I took the difference between the initial timestamp and the last timestamp to return.
RESULTS - Disk persistence disabled:
RESULTS - Disk persistence enabled:
Data Structure:
When the Firestore getDocument / addSnapshotListener methods get stuck, it seems to get stuck for durations that are roughly multiples of 30 seconds. Perhaps this could help the Firebase team isolate where in the SDK it's getting stuck?
Update Date March 02, 2018
It looks like this is a known issue and the engineers at Firestore are working on a fix. After a few email exchanges and code sharing with a Firestore engineer on this issue, this was his response as of today.
"You are actually correct. Upon further checking, this slowness on getDocuments() API is a known behavior in Cloud Firestore beta. Our engineers are aware of this performance issue tagged as "cold starts", but don't worry as we are doing our best to improve Firestore query performance.
We are already working on a long-term fix but I can't share any timelines or specifics at the moment. While Firestore is still on beta, expect that there will be more improvements to come."
So hopefully this will get knocked out soon.
Using Swift / iOS
After dealing with this for about 3 days, it seems the issue is definitely the get(), i.e. .getDocuments and .getDocument. Things I thought were causing the extreme yet intermittent delays, but which turned out not to be the cause:
Not so great network connectivity
Repeated calls via looping over .getDocument()
Chaining get() calls
Firestore Cold starting
Fetching multiple documents (Fetching 1 small doc caused 20sec delays)
Caching (I disabled offline persistence but this did nothing.)
I was able to rule all of these out as I noticed this issue didn't happen with every Firestore database call I was making. Only retrievals using get(). For kicks I replaced .getDocument with .addSnapshotListener to retrieve my data and voila. Instant retrieval each time including the first call. No cold starts. So far no issues with the .addSnapshotListener, only getDocument(s).
For now, I'm simply dropping the .getDocument() where time is of the essence and replacing it with .addSnapshotListener then using
for document in querySnapshot!.documents {
    // do some magical unicorn stuff here with my document.data()
}
... in order to keep moving until this gets worked out by Firestore.
Almost 3 years later, firestore being well out of beta and I can confirm that this horrible problem still persists ;-(
On our mobile app we use the javascript / node.js firebase client. After a lot of testing to find out why our app's startup time is around 10sec we identified what to attribute 70% of that time to... Well, to firebase's and firestore's performance and cold start issues:
firebase.auth().onAuthStateChanged() fires approx. after 1.5 - 2sec, already quite bad.
If it returns a user, we use its ID to get the user document from firestore. This is the first call to firestore and the corresponding get() takes 4 - 5sec. Subsequent get() of the same or other documents take approx. 500ms.
So in total the user initialization takes 6 - 7 sec, completely unacceptable. And we can't do anything about it. We can't test disabling persistence, since in the javascript client there's no such option, persistence is always enabled by default, so not calling enablePersistence() won't change anything.
I had this issue until this morning. My Firestore query via iOS/Swift would take around 20 seconds to complete a simple, fully indexed query - with non-proportional query times for 1 item returned - all the way up to 3,000.
My solution was to disable offline data persistence. In my case, it didn't suit the needs of our Firestore database - which has large portions of its data updated every day.
iOS & Android users have this option enabled by default, whilst web users have it disabled by default. It makes Firestore seem insanely slow if you're querying a huge collection of documents. Basically it caches a copy of whichever data you're querying (and whichever collection you're querying - I believe it caches all documents within) which can lead to high Memory usage.
In my case, it caused a huge wait for every query until the device had cached the data required - hence the non-proportional query times for the increasing numbers of items to return from the exact same collection. This is because it took the same amount of time to cache the collection in each query.
Offline Data - from the Cloud Firestore Docs
I performed some benchmarking to display this effect (with offline persistence enabled) from the same queried collection, but with different amounts of items returned using the .limit parameter:
Now at 100 items returned (with offline persistence disabled), my query takes less than 1 second to complete.
My Firestore query code is below:
let db = Firestore.firestore()
self.date = Date()
let ref = db.collection("collection").whereField("Int", isEqualTo: SomeInt).order(by: "AnotherInt", descending: true).limit(to: 100)
ref.getDocuments() { (querySnapshot, err) in
    if let err = err {
        print("Error getting documents: \(err)")
    } else {
        for document in querySnapshot!.documents {
            let data = document.data()
            //Do things
        }
        print("QUERY DONE")
        let currentTime = Date()
        let components = Calendar.current.dateComponents([.second], from: self.date, to: currentTime)
        let seconds = components.second!
        print("Elapsed time for Firestore query -> \(seconds)s")
        // Benchmark result
    }
}
From what I've seen so far, testing on a Nexus 5X emulator and on a real Android phone (Huawei P8),
Firestore and Cloud Storage both give me a headache with slow responses
on the first document.get() and the first storage.getDownloadURL().
Each request takes more than 60 seconds to respond. The slow response only happens on the real Android phone, not in the emulator, which is another strange thing.
After the first request, the rest are smooth.
Here is the simple code where I see the slow response.
var dbuserref = dbFireStore.collection('user').where('email','==',email);
const querySnapshot = await dbuserref.get();
var url = await defaultStorage.ref(document.data().image_path).getDownloadURL();
I also found a link that looks into the same problem.
https://reformatcode.com/code/android/firestore-document-get-performance

MeteorJS: Store/Call MongoDB Document Size (bsonsize)

In my MeteorJS app, documents grow very rapidly. When a document reaches 16MB, the standard document size limit in MongoDB, my application starts erroring, notifying me that the document is too large to perform any more updates to it:
exception: BSONObj size: 16895320 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB)
To prevent my application from reaching this error state, I want to be able to look up the size (bsonsize) of the document ahead of time. If the bsonsize is above, say, 15.8MB, I would create a new document and start logging data there. This way, no errors will be encountered.
Using the Mongo Shell, bsonsize can be determined via: Object.bsonsize(db.collection.findOne({_id:'document id here'})). But Object.bsonsize() does not appear to be supported by Meteor within javascript:
TypeError: Object function Object() { [native code] } has no method 'bsonsize'
How can this be done in javascript/MeteorJS? Thanks!

Please suggest a way to store a temp file in Windows Azure

I have a simple feature in an ASP.NET MVC3 app hosted on Azure.
1st step: the user uploads a picture
2nd step: the user crops the uploaded picture
3rd step: the system saves the cropped picture and deletes the temp file, which is the original uploaded picture
Here is the problem I am facing now: where to store the temp file?
I tried somewhere on the Windows file system, and in LocalResources. The problem is that these resources are per instance, so there is no guarantee that the instance showing the picture to crop is the same instance that saved the temp file.
Do you have any ideas on this temp file issue?
Normally the file exists only for a short while before it is deleted.
The temp file needs to be instance independent.
Ideally the file would have an expiry setting (for example, 1 hour) so that it deletes itself in case the code crashes somewhere.
OK. So what you're after is basically something that is shared storage but expires. Amazon has just announced a rather nice setting called object expiration (https://forums.aws.amazon.com/ann.jspa?annID=1303). There is nothing like this for Windows Azure storage yet, unfortunately, but that doesn't mean we can't come up with some other approach, or even a better (more cost-effective) one.
You say that it needs to be instance independent, which means using a local temp drive is out of the picture. As others have said, my initial leaning would be towards Blob storage, but you will have cleanup effort there. If you are working with large images (>1MB) or low throughput (<100rps) then I think Blob storage is the only option. If you are working with smaller images AND high throughput then the transaction costs for blob storage will start to really add up (I have a white paper coming out soon which shows some modelling of this, but some quick thoughts are below).
For a scenario with small images and high throughput, a better option might be to use the Windows Azure Cache as your temporary storage area. At first glance it will be eye-wateringly expensive on a per GB basis (110GB/month for Cache, 12c/GB for Storage). But with storage your transactions are paid for, whereas with Cache they are 'free'. (Quotas are here: http://msdn.microsoft.com/en-us/library/hh697522.aspx#C_BKMK_FAQ8) This can really add up; e.g. using 100kb temp files held for 20 minutes with a system throughput of 1500rps, using Cache is about $1000 per month vs $15000 per month for storage transactions.
The Azure Cache approach is well worth considering, but to be sure it is the 'best' approach I'd really want to know:
Size of images
Throughput per hour
A bit more detail on the actual client interaction with the server during the crop process. Is it an interactive process where the user will pull the image into their browser and crop visually? Or is it just a simple crop?
Here is what I see as a possible approach:
1. The user uploads the picture.
2. Your code saves it to a blob and keeps a record in some data backend of the relation between the user session and the uploaded image (marked as a temp image).
3. Display the image in the cropping user interface.
4. When the user is done cropping on the client:
4.1. retrieve the original from the blob
4.2. crop it according to the data sent from the user
4.3. delete the original from the blob and the record in the data backend used in step 2
4.4. save the final image to another blob (final blob).
And have one background process checking for "expired" temp images in the data backend (used in step 2) to delete the images and the records in the data backend.
Please note that even in a WebRole you still have the RoleEntryPoint descendant, and you can still override the Run method. By implementing an infinite loop in Run() (that method must never exit!), you can check every N seconds whether there is anything to delete (the interval depends on your Thread.Sleep() in Run()).
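The expiry check that such a background loop would run on each pass can be sketched as follows. This is only an illustration under assumptions: the record shape ({ blobName, uploadedAt } with uploadedAt in milliseconds since the epoch), the one-hour default, and the names findExpired and sweepExpired are all hypothetical, and deleteBlob stands in for whatever call actually removes the blob from storage.

```javascript
// Assumed default lifetime for a temp image: one hour.
const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns the records whose age at time `now` has reached the TTL.
function findExpired(records, now, ttlMs = ONE_HOUR_MS) {
  return records.filter((record) => now - record.uploadedAt >= ttlMs);
}

// One cleanup pass: calls deleteBlob(blobName) for every expired record
// and returns the records that are still live.
function sweepExpired(records, now, deleteBlob, ttlMs = ONE_HOUR_MS) {
  findExpired(records, now, ttlMs).forEach((record) => deleteBlob(record.blobName));
  return records.filter((record) => now - record.uploadedAt < ttlMs);
}
```

The loop in Run() would then call something like sweepExpired with the current time, sleep for N seconds, and repeat. Keeping the time as a parameter rather than reading the clock inside the function makes the check easy to test.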
You can use Azure blob storage. Have a look at this tutorial.
The sample below may help you.
https://code.msdn.microsoft.com/How-to-store-temp-files-in-d33bbb10
You have two ways to handle temp files in Azure:
1. You can use the Path.GetTempPath() and Path.GetTempFileName() functions for the temp file name.
2. You can use an Azure blob to simulate it.
private long TotalLimitSizeOfTempFiles = 100 * 1024 * 1024;

private async Task SaveTempFile(string fileName, long contentLength, Stream inputStream)
{
    try
    {
        // First, check whether the container exists; create it if not.
        await container.CreateIfNotExistsAsync();
        // Get a reference to the blob.
        CloudBlockBlob tempFileBlob = container.GetBlockBlobReference(fileName);
        // If the blob already exists, delete the old one.
        await tempFileBlob.DeleteIfExistsAsync();
        // If the new file would push total usage over the limit, clear old blobs.
        await CleanStorageIfReachLimit(contentLength);
        // Upload the new file.
        await tempFileBlob.UploadFromStreamAsync(inputStream);
    }
    catch (Exception ex)
    {
        if (ex.InnerException != null)
        {
            throw ex.InnerException;
        }
        else
        {
            // Rethrow without resetting the stack trace.
            throw;
        }
    }
}

// If the new file would push total usage over the limit, delete the oldest blobs first.
private async Task CleanStorageIfReachLimit(long newFileLength)
{
    List<CloudBlob> blobs = container.ListBlobs()
        .OfType<CloudBlob>()
        .OrderBy(m => m.Properties.LastModified)
        .ToList();
    // Total size of all blobs.
    long totalSize = blobs.Sum(m => m.Properties.Length);
    // Size the existing blobs may occupy once the new file is uploaded.
    long realLimitSize = TotalLimitSizeOfTempFiles - newFileLength;
    // Delete blobs, oldest first, and stop as soon as there is enough free space.
    foreach (CloudBlob item in blobs)
    {
        if (totalSize <= realLimitSize)
        {
            break;
        }
        await item.DeleteIfExistsAsync();
        totalSize -= item.Properties.Length;
    }
}
