I'm trying to run a batch-get-item from the CLI and getting an error that I cannot pass more than 100 keys:
failed to satisfy constraint: Member must have length less than or equal to 100
This is the command I use:
aws dynamodb batch-get-item \
--request-items file://request-items.json \
--return-consumed-capacity TOTAL > user_table_output.txt
I would like to know if there's a way I can add pagination to my query, or if there's another way I can run it.
I have ~4000 keys which I need to query.
Thanks in advance.
You'll have to break your keys down into batches of no more than 100. Also, keep in mind that the response may not include all the items you requested if the size of the items being returned exceeds 16MB. If that happens the response will include UnprocessedKeys, which can be used to request the keys that were not retrieved.
The BatchGetItem API reference has information about the API, and the AWS CLI v2 documentation covers the batch-get-item command.
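A minimal sketch of that batching in Python with boto3. The table name user_table and the key attribute UserId are assumptions; substitute your real schema:

import boto3

def chunks(lst, size):
    # yield successive slices of at most `size` keys
    for i in range(0, len(lst), size):
        yield lst[i:i + size]

dynamodb = boto3.client("dynamodb")

# hypothetical key schema; ~4000 keys as in the question
keys = [{"UserId": {"S": str(i)}} for i in range(4000)]

items = []
for batch in chunks(keys, 100):
    request = {"user_table": {"Keys": batch}}
    # re-request UnprocessedKeys until the batch is fully served
    while request:
        response = dynamodb.batch_get_item(
            RequestItems=request,
            ReturnConsumedCapacity="TOTAL",
        )
        items.extend(response["Responses"].get("user_table", []))
        request = response.get("UnprocessedKeys") or None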
I'm trying to add an attribute to a whole table, without specifying an index.
In the examples I've found, an index is always used:
aws dynamodb update-item \
--region MY_REGION \
--table-name MY_TABLE_NAME \
--key='{"AccountId": {"S": "accountId"}}' \
--update-expression 'SET conf=:newconf' \
--expression-attribute-values '{":newconf":{"S":"new conf value"}}'
Plus, that's an update of an attribute that is already in the table.
How can I add a new attribute to each record of a table?
There is no API that will automatically add an attribute to all items in a table. DynamoDB just doesn't work that way.
The only way to add an attribute to all items in a table is to scan the table and, for each item, make an UpdateItem request to add the attribute you want. This works both for attributes that are missing (i.e. adding new ones) and for attributes that already exist and just need updating. A sketch of this loop follows the caveats below.
Some caveats:
If the table is small and not being updated too often, this may work as intended in a single pass.
If the table is larger and being updated relatively fast (i.e. every second), then you will need to make sure the code updating the table also adds the attribute to new or updated items, and that the updates don't clobber each other.
Lastly, if the table is large, this can consume a LOT of capacity because of the scan plus an update for each item, so plan on it taking a long time (and mind consumed vs. provisioned capacity); better to have some rate limiting on the update script.
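A minimal sketch of that scan-and-update pass in Python with boto3, reusing the table, key, and attribute names from the question. Two assumptions on my part: if_not_exists is used so a concurrent writer's value isn't clobbered, and the sleep is a crude stand-in for real rate limiting:

import time
import boto3

dynamodb = boto3.client("dynamodb")
paginator = dynamodb.get_paginator("scan")

# project only the key attribute to keep the scan cheap
for page in paginator.paginate(TableName="MY_TABLE_NAME",
                               ProjectionExpression="AccountId"):
    for item in page["Items"]:
        dynamodb.update_item(
            TableName="MY_TABLE_NAME",
            Key={"AccountId": item["AccountId"]},
            # if_not_exists sets conf only when it is missing
            UpdateExpression="SET conf = if_not_exists(conf, :newconf)",
            ExpressionAttributeValues={":newconf": {"S": "new conf value"}},
        )
        time.sleep(0.05)  # crude rate limiting; tune to your capacity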
I am querying a Cosmos DB using the REST API. I am having problems with the 'OFFSET LIMIT' clause. I have tested this both with my code (Dart) and Postman with the same results:
This query works ok:
SELECT * FROM Faults f WHERE CONTAINS(f.Key, 'start', true)
This query does not work. Same as 1, but using OFFSET and LIMIT to get a subset:
SELECT * FROM Faults f
WHERE CONTAINS(f.Key, 'start', true)
OFFSET 10 LIMIT 10
This query works ok. Same as 2, but with an additional filter:
SELECT * FROM Faults f
WHERE CONTAINS(f.Key, 'start', true)
AND f.Node = 'XI'
OFFSET 10 LIMIT 10
I don't get why 2 fails when 1 and 3 work.
This is the response from query 2:
{
"code": "BadRequest",
"message": "The provided cross partition query can not be directly served by the gateway. This is a first chance (internal) exception that all newer clients will know how to handle gracefully. This exception is traced, but unless you see it bubble up as an exception (which only happens on older SDK clients), then you can safely ignore this message.\r\nActivityId: 5918ae0e-71ab-48a4-aa20-edd8427fe21f, Microsoft.Azure.Documents.Common/2.11.0",
"additionalErrorInfo": "{\"partitionedQueryExecutionInfoVersion\":2,\"queryInfo\":{\"distinctType\":\"None\",\"top\":null,\"offset\":10,\"limit\":10,\"orderBy\":[],\"orderByExpressions\":[],\"groupByExpressions\":[],\"groupByAliases\":[],\"aggregates\":[],\"groupByAliasToAggregateType\":{},\"rewrittenQuery\":\"SELECT *\\nFROM Faults AS f\\nWHERE CONTAINS(f.Key, \\\"start\\\", true)\\nOFFSET 0 LIMIT 20\",\"hasSelectValue\":false},\"queryRanges\":[{\"min\":\"\",\"max\":\"FF\",\"isMinInclusive\":true,\"isMaxInclusive\":false}]}"
}
Thanks for your help
It seems that you can't execute a cross-partition query like this through the REST API.
You probably have to use the official SDKs.
Cosmos DB : cross partition query can not be directly served by the gateway
Thanks decoy for pointing me in the right direction.
OFFSET LIMIT is not supported by the REST API.
Pagination, though, can be achieved with headers, without using the SDK.
Set on your first request:
x-ms-max-item-count to the number of records you want to retrieve at a time, e.g. 10.
With the response you get the header:
x-ms-continuation, a string that points to the next document.
To get the next 10 documents, send a new request with the headers:
x-ms-max-item-count = 10, just like the first one.
x-ms-continuation set to the value you got from the response.
So it is very easy to get the next documents, but it is not straightforward to get the previous ones. I had to save the page number and x-ms-continuation strings as key-value pairs and use them to implement 'previous page' pagination.
I don't know if there is an easier way to do it.
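Putting those headers together, here is a minimal sketch in Python with requests. The account URL is a placeholder, and build_auth_headers() is a hypothetical helper standing in for the authorization, x-ms-date, and x-ms-version headers Cosmos DB requires:

import requests

URL = "https://<account>.documents.azure.com/dbs/<db>/colls/<coll>/docs"  # placeholder
QUERY = "SELECT * FROM Faults f WHERE CONTAINS(f.Key, 'start', true)"

def fetch_page(continuation=None, page_size=10):
    headers = build_auth_headers()  # hypothetical: auth + x-ms-date + x-ms-version
    headers.update({
        "Content-Type": "application/query+json",
        "x-ms-documentdb-isquery": "true",
        "x-ms-documentdb-query-enablecrosspartition": "true",
        "x-ms-max-item-count": str(page_size),
    })
    if continuation:
        headers["x-ms-continuation"] = continuation
    response = requests.post(URL, json={"query": QUERY, "parameters": []},
                             headers=headers)
    response.raise_for_status()
    # the x-ms-continuation header is absent on the last page
    return response.json()["Documents"], response.headers.get("x-ms-continuation")

documents, token = fetch_page()
while token:
    more, token = fetch_page(continuation=token)
    documents.extend(more)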
With Cosmos DB for MongoDB API (version 3.4), the following find query in combination with cursor.sort() seems to behave incorrectly:
db.test.find({"field1": "value1"}).sort({"field2": 1})
The error occurs if all of the following conditions are met:
The default indexing policy was discarded, regardless of whether custom indexes were created afterwards using createIndex().
The find() query does not return any documents (Find(filter).Count() == 0).
The sort document defining the sort order contains only one field. It doesn't matter whether this field exists or has been indexed. Using two fields in the sort document returns 0 hits, which is the correct behavior.
The error also occurs if all of the following conditions are met:
The default indexing policy was discarded.
The find() query returns one or more documents.
The sort document contains exactly one field, and this field has not been indexed.
The error message:
The index path corresponding to the specified order-by item is excluded.
The malfunction occurs only with Cosmos DB; native MongoDB (MongoDB Atlas, v4.0) behaves correctly.
Azure Cosmos DB for MongoDB API with the MongoDB 3.4 wire protocol (a preview feature) is used. The problem occurs with both the MongoDB C#/.NET driver and the mongo shell.
In addition, the problem only occurs with find(). An equivalent aggregation pipeline containing $match and $sort behaves correctly.
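For illustration, a pymongo sketch of that equivalent pipeline (the connection string is a placeholder; the collection matches the repro below):

from pymongo import MongoClient

client = MongoClient("<cosmos-connection-string>")  # placeholder
test = client["mydb"]["test"]

# find()+sort() raises the index-path error here, while the
# equivalent $match/$sort pipeline returns the expected documents
cursor = test.aggregate([
    {"$match": {"field1": "value1"}},
    {"$sort": {"field2": 1}},
])
print(list(cursor))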
Reproduction
Create an Azure Cosmos DB Account with the "Azure Cosmos DB for MongoDB API". Enable the preview feature MongoDB 3.4 (Version 3.2 has not been tested).
Create a new database
Create a new collection, define a shard key
Drop the default indexing policy (using db.test.dropIndexes())
(Optional) Create new custom indexes
(Optional) Insert documents
Execute command in mongo shell (or the equivalent code with mongoDB C#/.NET driver):
db.test.find({"field1": "value1"}).sort({"field2": 1})
Expected result
All documents that match the query criteria. If there are none, no documents should be returned.
Actual result
Error: error: {
"_t" : "OKMongoResponse",
"ok" : 0,
"code" : 2,
"errmsg" : "Message: {\"Errors\":[\"The index path corresponding to the specified order-by item is excluded.\"]}\r\nActivityId: c50cc751-0000-0000-0000-000000000000, Request URI: /apps/[...]/, RequestStats: \r\nRequestStartTime: 2019-07-11T08:58:48.9880813Z, RequestEndTime: 2019-07-11T08:58:49.0081101Z, Number of regions attempted: 1\r\nResponseTime: 2019-07-11T08:58:49.0081101Z, StoreResult: StorePhysicalAddress: rntbd://[...]/, LSN: 359549, GlobalCommittedLsn: 359548, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 400, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#359549, UsingLocalLSN: True, TransportException: null, ResourceType: Document, OperationType: Query\r\n, SDK: Microsoft.Azure.Documents.Common/2.4.0.0", [...]
Workaround
Adding an additional "dummy" field to the sort document prevents the error:
db.test.find({"field1": "value1"}).sort({"field2": 1, "dummyfield": 1}).count()
The workaround is not satisfactory, as it could falsify the result.
Am I doing something wrong, or is Cosmos DB behaving flawed here?
According to Microsoft support, an index needs to be created on the field being sorted. The default indexes can be dropped and custom indexes created. As for avoiding a change to the indexing policy every time a new field is added, there is no alternative other than performing a client-side sort. Unfortunately, client-side sorting takes a lot of CPU and memory on the client, and sorting on an index takes more work as more fields need to be indexed.
Thus I did not find a really satisfying solution:
Using the Default Indexing Policy. However, this can lead to a huge index.
Indexing all elements that need to be sorted. Every time a new element has to be indexed, this leads to a manual modification of the indexing policy.
Using client-side sort only. In my opinion this strongly limits MongoDB functionality.
Using the aggregation framework instead of the find method. This leads to increased complexity and traffic.
Migrating to native MongoDB.
For reference, a wildcard index covering all paths (similar to the default policy) can be created with:
db.collection.createIndex({ "$**": 1 });
I am using Python client SDK for Datastore (google-cloud-datastore) version 1.4.0. I am trying to run a key-only query fetch:
query = client.query(kind = 'SomeEntity')
query.keys_only()
The query filter has an EQUAL condition on field1 and a GREATER_THAN_OR_EQUAL condition on field2. Ordering is done based on field2.
For the fetch, I am specifying a limit:
query_iter = query.fetch(start_cursor=cursor, limit=100)
page = next(query_iter.pages)
keyList = [entity.key for entity in page]
nextCursor = query_iter.next_page_token
Though there are around 50 entities satisfying this query, each fetch returns around 10-15 results and a cursor. I can use the cursor to get all the results, but this adds call overhead.
Is this behavior expected?
A keys_only query is limited to 1000 entries in a single call. This operation counts as a single entity read.
For other limitations of Datastore, please refer to the detailed table in the documentation.
However, in your code you specified a cursor as the starting point for a subsequent retrieval operation. A query can be limited without a cursor:
query = client.query()
query.keys_only()
tasks = list(query.fetch(limit=100))
For detailed instructions on how to use limits and cursors, please refer to the Google Cloud Datastore documentation.
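Putting limits and cursors together, a sketch of a full pagination loop (kind and page size taken from the question; treat it as an illustration, not canonical usage):

from google.cloud import datastore

client = datastore.Client()
query = client.query(kind="SomeEntity")
query.keys_only()

keys = []
cursor = None
while True:
    query_iter = query.fetch(start_cursor=cursor, limit=100)
    page = next(query_iter.pages)
    keys.extend(entity.key for entity in page)
    cursor = query_iter.next_page_token
    if cursor is None:  # no more results to fetch
        break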
I am using MongoTemplate to execute my Mongo queries.
I wanted to know if count works with a limit set.
Also, why does the find query search the full collection (according to the query) although a limit is set?
For example, the query I wrote might match 10000 records, but I want only 100 of them, so I set the limit to 100 and then fired the find query. The query still goes on to search all 10000 records.
dataQuery.limit(100);
List<logs> logResultsTemp = mongoTemplate1.find(dataQuery, logs.class);
Are there any limitations in using the limit command?
Limit works fine (at least on Spring Data version 1.2.1, which I use). Perhaps it was a problem with your version?
About count: there is a specific method to get your collection count, so you don't need to care about the amount of data your system will fetch:
mongoTemplate.count(new Query(), MyCollection.class)
Btw, if you try this directly on your MongoDB console: db.myCollection.find().limit(1).count(), you will get the actual total of documents in your collection, not only one. And so it is for the mongoTemplate.count method, so:
mongoTemplate.count(new Query().limit(1), MyCollection.class)
will work the same way.
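The same split between counting and fetching can be sketched in Python with pymongo (the database, collection, and filter below are made up for illustration):

from pymongo import MongoClient

client = MongoClient()
logs = client["mydb"]["logs"]

query_filter = {"level": "ERROR"}  # hypothetical filter

total = logs.count_documents(query_filter)       # full matching count
page = list(logs.find(query_filter).limit(100))  # fetches at most 100 documents

print(total, len(page))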