Maximum number of items in DocumentDb IN clause - azure-cosmosdb

I can't find any mention in the documentation of the maximum number of items supported by the IN keyword in DocumentDb.
I would assume that there is a limit.
Can anyone point me to where this is documented?

DocumentDB has virtually eliminated all limits by raising them to a level we don’t foresee our users surpassing, so they are no longer documented. In this case, up to 1000 arguments can be used in an IN clause.
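If you need more arguments than that, a common workaround is to split the list into batches of at most 1000 and merge the results client-side. A minimal sketch in Python with the azure-cosmos SDK, assuming the 1000-argument cap above; the account URI, key, database, container, and the c.id field are all placeholders:

# Sketch: batch a large id list into IN clauses of at most 1000 arguments.
from azure.cosmos import CosmosClient

client = CosmosClient("<account-uri>", credential="<key>")  # placeholders
container = client.get_database_client("mydb").get_container_client("mycoll")

MAX_IN_ARGS = 1000  # the IN-clause limit mentioned above

def query_in_batches(container, ids):
    results = []
    for start in range(0, len(ids), MAX_IN_ARGS):
        chunk = ids[start:start + MAX_IN_ARGS]
        # build one named parameter per value: @p0, @p1, ...
        params = [{"name": f"@p{i}", "value": v} for i, v in enumerate(chunk)]
        placeholders = ", ".join(p["name"] for p in params)
        results.extend(container.query_items(
            query=f"SELECT * FROM c WHERE c.id IN ({placeholders})",
            parameters=params,
            enable_cross_partition_query=True,
        ))
    return results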

Currently the limit for AWS DocumentDB is 10,000 items for the $in operator; exceeding it fails with:
OperationFailure
$in array size must not be greater than 10000
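If your list is longer than that, you can split it into chunks below the limit and run several queries. A minimal sketch with pymongo, assuming the 10,000-element cap above (connection string, database, collection, and field names are placeholders):

# Sketch: split a large $in list into chunks under the 10,000 limit.
from pymongo import MongoClient

client = MongoClient("<documentdb-connection-string>")  # placeholder
coll = client["mydb"]["mycoll"]

MAX_IN_SIZE = 10_000  # "$in array size must not be greater than 10000"

def find_in_chunks(collection, field, values):
    docs = []
    for start in range(0, len(values), MAX_IN_SIZE):
        chunk = values[start:start + MAX_IN_SIZE]
        docs.extend(collection.find({field: {"$in": chunk}}))
    return docs

ids = [f"user-{i}" for i in range(25_000)]      # example input
docs = find_in_chunks(coll, "userId", ids)      # three queries under the hood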

Related

Does DynamoDB have a maximum size limitation on query input?

I know a DynamoDB query has a limitation that the maximum response data size is 1 MB. But is there a limitation on the input filter parameters? I may need to send a filter expression with a long list of values, and I wonder whether that works without limitation.
As per AWS documentation:
The maximum length of any expression string is 4 KB, which includes ProjectionExpression, ConditionExpression, UpdateExpression, and FilterExpression.
Check more details here:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-expression-parameters
Hope this answers your question.
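In practice this means a very long value list has to be chunked across requests. A sketch with boto3 (table and attribute names are placeholders; note that DynamoDB's documented IN comparator also caps the list at 100 values, which conveniently keeps each expression string far below 4 KB):

# Sketch: keep each FilterExpression under the limits by chunking values.
import boto3

table = boto3.resource("dynamodb").Table("MyTable")  # placeholder table

CHUNK = 100  # documented max operands for IN; also keeps the string small

def scan_with_long_in(attr, values):
    items = []
    for start in range(0, len(values), CHUNK):
        chunk = values[start:start + CHUNK]
        names = [f":v{i}" for i in range(len(chunk))]
        resp = table.scan(
            FilterExpression=f"{attr} IN ({', '.join(names)})",
            ExpressionAttributeValues=dict(zip(names, chunk)),
        )
        items.extend(resp["Items"])
        # LastEvaluatedKey pagination omitted here for brevity
    return items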

new count() aggregation function performance

Firestore introduced the new aggregation query count(), which is pretty useful for me. I'm trying to understand its speed and cost.
The documentation mentions performance:
Performance depends on your index configuration and on the size of the dataset.
and
Most queries scale based on the size of the result set, not the dataset. However, aggregation queries scale based on the size of the dataset and the number of index entries scanned.
and also the pricing:
you are charged one document read for each batch of up to 1000 index entries matched by the query.
Now imagine I have 2 collections, the first one with 100,000 documents and the second one with 1,000,000 documents. I run someQuery.count() on both collections and both return 1500. Here are some questions:
Will both queries be charged the same (2 document reads)?
Will the second query take longer than the first query, since the second collection has more documents? If yes - how much longer (linear, log, etc.)?
What does the documentation mean by "performance depends on your index configuration"? Can I configure the collections and their indexes to make count() work faster?
It would be great to get some answers; I'm relying on count() heavily (I know, it's still in preview).
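For reference, this is roughly what I'm running, sketched with the Python client (assuming google-cloud-firestore 2.7+; collection and filter names are placeholders, and the exact result shape may vary by client version):

# Sketch: run a count() aggregation with the Firestore Python client.
from google.cloud import firestore
from google.cloud.firestore_v1.base_query import FieldFilter

db = firestore.Client()
query = db.collection("mycollection").where(
    filter=FieldFilter("status", "==", "active"))

# count() turns the query into an aggregation; only index entries are
# scanned, and no documents are downloaded.
results = query.count(alias="total").get()
print(results[0][0].value)  # e.g. 1500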
The charge for COUNT() queries is based on the number of index entries that are matched by that query. Essentially:
take the count you get back,
divide by 1000, rounding up,
if the result is 0, add 1.
That's the number of document reads you're charged.
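In code form, that billing rule is just ceiling division with a floor of one read:

import math

def count_query_reads(matched_index_entries: int) -> int:
    # one document read per batch of up to 1000 index entries, minimum 1
    return max(1, math.ceil(matched_index_entries / 1000))

print(count_query_reads(1500))  # 2
print(count_query_reads(0))     # 1

So for your first question: since both queries match 1500 index entries, both are charged the same 2 document reads, regardless of how large the collections are.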
The performance of a COUNT() query depends on the number of items you count. Counting more items will take longer, as the database has to match a larger number of index entries.
This is similar to actually retrieving the documents rather than just their count: getting more documents also takes more time. Of course, counting the documents is faster than actually getting them, but counting more documents still takes more time than counting fewer documents.
Some queries may be possible even when you don't have a specific index for the exact field combination you use, thanks to Firestore's ability to perform a zig-zag merge join across the indexes (that feature might have a different name now, but it's still doing pretty much the same thing). That reduces the number of indexes you need for a database, and thus reduces the associated cost. It could also affect the performance of the count, but not in a way that you can easily calculate. I recommend not trying to optimize here until you actually need it, i.e. when you notice certain queries being slower than other, similar queries in your database.

query with pagination causing high throughput usage

I use an Azure Cosmos DB API for MongoDB account, version 3.6. In queries using skip and limit I noticed higher throughput usage: the higher the skip, the more costly the query is.
db.MyCollection.find({Property:"testtest"}).skip(12000).limit(10)
The above query costs around 3000 RU. The property in the find clause is my partition key. I have read that Cosmos DB is currently capable of queries with offset and limit, but I found an OFFSET LIMIT clause officially documented only for the SQL API. Is this possible with the MongoDB API as well, or should I live with costly skip queries?
The SQL API will yield the same result with OFFSET LIMIT. You'll find an almost linear increase in RU as you increase the offset, since each query loops over all skipped documents.
You should try to use a continuation token if that's possible in your context. You could also adjust your filter criteria, using an indexed property to move over your data.
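For example, here is a minimal sketch with pymongo of moving over the data with a range filter on an indexed field instead of skip (the connection string is a placeholder; _id is used as the paging key, and "Property" is the partition key from the question):

# Sketch: range-based ("keyset") pagination instead of skip().
from pymongo import MongoClient

coll = MongoClient("<cosmos-connection-string>")["mydb"]["MyCollection"]

def get_page(last_id=None, page_size=10):
    query = {"Property": "testtest"}
    if last_id is not None:
        query["_id"] = {"$gt": last_id}  # resume right after the previous page
    docs = list(coll.find(query).sort("_id", 1).limit(page_size))
    next_cursor = docs[-1]["_id"] if docs else None  # hand back to the caller
    return docs, next_cursor

Each page then only reads the documents it returns, rather than looping over everything that was skipped.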
The RU charge of a query with OFFSET LIMIT will increase as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there is no "bookmark". If you wanted to return the query's next page, you would have to start from the beginning.
Source
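On the SQL API side, resuming with a continuation token can be sketched like this with the azure-cosmos Python SDK (container setup is assumed; by_page and continuation_token come from the SDK's paging protocol, and exact behavior may vary by SDK version):

# Sketch: fetch one page plus a resume token instead of using OFFSET.
def get_sql_page(container, token=None, page_size=10):
    pager = container.query_items(
        query="SELECT * FROM c WHERE c.Property = 'testtest'",
        partition_key="testtest",  # the question filters on the partition key
        max_item_count=page_size,
    ).by_page(token)
    items = list(next(pager))
    return items, pager.continuation_token  # pass back in to resume later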

Cosmos DB - Slow COUNT

I am working on an existing Cosmos DB where the number of physical partitions is less than 100. Each contains around 30,000,000 documents. There is an indexing policy in place on "/*".
I'm just trying to get a total count from SQL API like so:
SELECT VALUE COUNT(1) FROM mycollection c
I have set EnableCrossPartitionQuery to true, and MaxDegreeOfParallelism to 100 (so as to at least cover the number of physical partitions AKA key ranges). The database is scaled to 50,000 RU. The query is running for HOURS. This does not make sense to me. An equivalent relational database would answer this question almost immediately. This is ridiculous.
What, if anything, can I change here? Am I doing something wrong?
Microsoft support ended up applying an update to the underlying instance. The update was already in the development pipeline to be rolled out gradually; this instance received it earlier as a result of the support case. The update related to using indexes to serve this type of query.

Get 'X' number of records on DynamoDB

I'm trying to find out if something similar to SQL's TOP exists in DynamoDB.
I'm reading the documentation: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
But I didn't find anything similar.
Does anyone know a way to do it?
Limit in Scan/Query is what you're looking for.
Both ScanRequest and QueryRequest have a withLimit function for limiting the maximum number of items to evaluate per request.
From the documentation:
The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed data set size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Query and Scan in the Amazon DynamoDB Developer Guide.
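Putting that together, a short sketch with boto3 of emulating TOP n: Limit caps how many items each request evaluates, and LastEvaluatedKey lets you continue when a request returns fewer than you need (the table name is a placeholder):

# Sketch: "TOP n" via Limit plus LastEvaluatedKey pagination.
import boto3

table = boto3.resource("dynamodb").Table("MyTable")  # placeholder table

def top_n(n):
    items, start_key = [], None
    while len(items) < n:
        kwargs = {"Limit": n - len(items)}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key  # resume previous request
        resp = table.scan(**kwargs)
        items.extend(resp["Items"])
        start_key = resp.get("LastEvaluatedKey")
        if start_key is None:
            break  # table exhausted before n items were found
    return items[:n]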
