I want to update all records matching a query in Firestore/Datastore. How do I do it?
The SQL equivalent would look like:
UPDATE transactions SET category = X WHERE category = Y
It seems that to do this I will have to query all records matching category = Y and then call set() on each one?
Firestore (like most NoSQL databases) does not support sending update queries directly to the database.
To update documents matching a query, you will indeed first need to execute that query and then update each document.
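Below is a minimal sketch of that two-step pattern. It works on a plain Python dict rather than the real client, so it runs anywhere. The dict `transactions` and the helper `update_where` are hypothetical.

The comments mark where the writes would go, assuming the Python `google-cloud-firestore` client:
- Step 2 would call `batch.update(...)` and `batch.commit()` on a `db.batch()`.
- Step 1 would be a query along the lines of `db.collection("transactions").where("category", "==", "Y").stream()`.

The default batch size of 500 reflects Firestore's long-documented per-batch write limit. Treat it as an assumption and check the current limits.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory stand-in for a Firestore collection: document ID -> fields.
Collection = Dict[str, Dict[str, Any]]


def update_where(coll: Collection, field: str, old: Any, new: Any,
                 batch_size: int = 500) -> Tuple[int, int]:
    """Emulate `UPDATE coll SET field = new WHERE field = old`.

    Returns (documents updated, batches committed).
    """
    # Step 1: run the query and collect the matching document IDs.
    matching = [doc_id for doc_id, doc in coll.items() if doc.get(field) == old]

    # Step 2: update each match, grouping the writes into batches of at most batch_size.
    batches = 0
    for start in range(0, len(matching), batch_size):
        for doc_id in matching[start:start + batch_size]:
            coll[doc_id][field] = new  # Firestore (assumed): batch.update(doc.reference, {field: new})
        batches += 1                   # Firestore (assumed): batch.commit()
    return len(matching), batches


transactions = {
    "t1": {"category": "Y", "amount": 10},
    "t2": {"category": "Z", "amount": 5},
    "t3": {"category": "Y", "amount": 7},
}
result = update_where(transactions, "category", "Y", "X", batch_size=1)
print(result)  # (2, 2)
```

The sketch collects the matching IDs before writing anything. That keeps the query step separate from the update step, which is the point here: there is no server-side UPDATE ... WHERE to lean on.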
I am using the Python client SDK for Datastore (google-cloud-datastore), version 1.4.0. I am trying to run a keys-only query fetch:
query = client.query(kind='SomeEntity')
query.keys_only()
The query filter has an EQUAL condition on field1 and a GREATER_THAN_OR_EQUAL condition on field2. Results are ordered by field2.
For fetch, I am specifying a limit:
query_iter = query.fetch(start_cursor=cursor, limit=100)
page = next(query_iter.pages)
keyList = [entity.key for entity in page]
nextCursor = query_iter.next_page_token
Although around 50 entities satisfy this query, each fetch returns only about 10-15 results and a cursor. I can use the cursor to get all the results, but this adds the overhead of extra calls.
Is this behavior expected?
A keys-only query is limited to 1000 entries in a single call, and this operation counts as a single entity read.
For other Datastore limits, please refer to the detailed table in the documentation.
However, your code specifies a cursor as the starting point for a subsequent retrieval operation. A query can be limited without a cursor:
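from google.cloud import datastore
client = datastore.Client()
query = client.query(kind='SomeEntity')  # same kind as in the question
query.keys_only()
# Iterating the whole result (not just the first page) keeps fetching until 100 results or no more data
keys = [entity.key for entity in query.fetch(limit=100)]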
For detailed instructions on using limits and cursors, please refer to the Google Cloud Datastore documentation.
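Short batches are not lost results. The query just has to be continued from the returned cursor. Below is a minimal, SDK-free sketch of that continuation loop.

`fetch_up_to`, `make_fake_backend` and the batch size of 12 are hypothetical. The fake backend stands in for Datastore returning partial batches, as described in the question.

With google-cloud-datastore, iterating the whole `query.fetch(limit=100)` result should perform this continuation for you, as in `list(query.fetch(limit=100))`. Taking only `next(query_iter.pages)` will not. Treat that as something to verify for your client version.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes (cursor, limit) and returns (items, next_cursor);
# next_cursor is None once the backend has no more results.
PageFetcher = Callable[[Optional[int], int], Tuple[List[str], Optional[int]]]


def fetch_up_to(fetch_page: PageFetcher, limit: int) -> List[str]:
    """Keep following cursors until `limit` items are collected or results run out."""
    results: List[str] = []
    cursor: Optional[int] = None
    while len(results) < limit:
        page, cursor = fetch_page(cursor, limit - len(results))
        results.extend(page)
        if cursor is None:  # backend reports that there is nothing left
            break
    return results


def make_fake_backend(keys: List[str], max_batch: int) -> PageFetcher:
    """Hypothetical backend that, like in the question, returns short batches plus a cursor."""
    def fetch_page(cursor: Optional[int], limit: int) -> Tuple[List[str], Optional[int]]:
        start = cursor or 0
        end = min(start + limit, start + max_batch, len(keys))
        return keys[start:end], (end if end < len(keys) else None)
    return fetch_page


backend = make_fake_backend([f"key-{i}" for i in range(50)], max_batch=12)
all_keys = fetch_up_to(backend, limit=100)
print(len(all_keys))  # 50
```

The trade-off is the one the question mentions. Every continuation is another round trip, so a loop like this hides the extra calls rather than removing them.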
If I use offset and limit to paginate a query, do I need to specify an order?
In other words, does a query with no order specified use some implicit order, like key order?
In a SQL database, if I don't specify an order, the SQL engine returns the results in whatever order it pleases. So the second time the query is run, the results may come back in a different order, and offset and limit would not cut the result as intended.
Assuming you want to order your documents by creation date, in Firestore this requires an extra field, as explained in the Firestore documentation (link):
Unlike "push IDs" in the Firebase Realtime Database, Cloud Firestore
auto-generated IDs do not provide any automatic ordering. If you want
to be able to order your documents by creation date, you should store
a timestamp as a field in the documents.
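The question's reasoning applies to offset/limit paging in general: pages only line up across runs if the order is fully determined. Below is a minimal in-memory sketch, assuming each document stores a `created_at` timestamp as the quote recommends.

Using the document `id` as a tie-breaker is an addition made here, not something the Firestore documentation prescribes. `get_page` and the sample documents are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

Doc = Dict[str, Any]


def get_page(docs: List[Doc], offset: int, limit: int) -> List[str]:
    """Return the IDs on one page, using an explicit total order.

    created_at is the primary sort key; id breaks ties so that documents
    created at the same instant always keep the same relative position.
    """
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["id"]))
    return [d["id"] for d in ordered[offset:offset + limit]]


t0 = datetime(2024, 1, 1, 12, 0, 0)
t1 = datetime(2024, 1, 1, 12, 5, 0)
docs = [
    {"id": "c", "created_at": t1},
    {"id": "a", "created_at": t0},
    {"id": "b", "created_at": t0},  # same timestamp as "a"
]

# The storage order of `docs` does not matter: the pages come out the same either way.
pages = [get_page(docs, 0, 2), get_page(docs, 2, 2)]
pages_reversed = [get_page(docs[::-1], 0, 2), get_page(docs[::-1], 2, 2)]
print(pages)  # [['a', 'b'], ['c']]
```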
Team,
I have a DynamoDB table with a hash key (userid) and a sort key (age). Let's say we want to retrieve, for each hash key (userid), the item with the smallest age. What would the query and filter expression be for the DynamoDB query?
Thanks!
I don't think you can do it in a single query; you would need to do a full table scan. If you have a list of the hash keys somewhere, you can do N queries (in parallel) instead.
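Below is a sketch of the "N queries in parallel" idea, assuming you already have the list of user IDs. The table is an in-memory stand-in, and `TABLE`, `query_smallest_age` and `smallest_age_per_user` are hypothetical.

With boto3, each `query_smallest_age` call would correspond to something like `table.query(KeyConditionExpression=Key("userid").eq(user_id), ScanIndexForward=True, Limit=1)`. Here `Key` comes from `boto3.dynamodb.conditions`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

# Hypothetical in-memory table: userid -> ages stored under that hash key (the sort key values).
TABLE: Dict[str, List[int]] = {
    "u1": [34, 21, 40],
    "u2": [19, 50],
}


def query_smallest_age(user_id: str) -> Optional[int]:
    """Stand-in for one Query on hash key = user_id, ascending sort order, Limit=1."""
    ages = sorted(TABLE.get(user_id, []))  # DynamoDB keeps items ordered by the sort key
    return ages[0] if ages else None        # Limit=1: only the first item is read


def smallest_age_per_user(user_ids: List[str]) -> Dict[str, Optional[int]]:
    """Run the N per-user queries concurrently and gather the results."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(user_ids, pool.map(query_smallest_age, user_ids)))


result = smallest_age_per_user(["u1", "u2", "u9"])
print(result)  # {'u1': 21, 'u2': 19, 'u9': None}
```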
[Update] Here is another possible approach:
Maintain a second table that has only a hash key (userID) and holds the record with the smallest age for each user. To achieve that, every time you update the main table, also update the second one if the new age is less than the current age stored there. A conditional update works for this. The update can be done either by the application itself or by an AWS Lambda function listening to the DynamoDB stream. If you then need the smallest age for each user, you still do a full table scan, but of the second table, which contains only the relevant records, so the scan is optimal.
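Below is a minimal sketch of the conditional "only lower it" update, using an in-memory dict as the second table. `min_age_table` and `record_age` are hypothetical names.

The condition mirrors a DynamoDB `ConditionExpression` along the lines of `attribute_not_exists(MinAge) OR MinAge > :age`. In DynamoDB a failed condition surfaces as a `ConditionalCheckFailedException`, and the check-and-write is atomic on the server. The Python check-then-write below is not atomic.

```python
from typing import Dict

# Hypothetical in-memory version of the second table: userID -> smallest age seen so far.
min_age_table: Dict[str, int] = {}


def record_age(user_id: str, age: int) -> bool:
    """Conditionally lower the stored minimum; return True if the write happened.

    Mirrors a condition like "attribute_not_exists(MinAge) OR MinAge > :age".
    """
    current = min_age_table.get(user_id)
    if current is not None and current <= age:
        return False  # condition failed: the stored minimum is already smaller or equal
    min_age_table[user_id] = age
    return True


for user, age in [("u1", 34), ("u1", 21), ("u1", 40), ("u2", 19)]:
    record_age(user, age)
print(min_age_table)  # {'u1': 21, 'u2': 19}
```

This only ever lowers the stored minimum. If the item holding a user's minimum is deleted from the main table, the second table is not corrected by this update alone.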
There are two ways to achieve that:
If you don't need this data in real time, you can export your data into other AWS systems, such as EMR or Redshift, and run complex analytics queries there. This lets you write SQL expressions using joins and GROUP BY operators.
You can even run EMR Hive queries on DynamoDB data, but they perform scans, so it's not very cost-efficient.
Another option is to use DynamoDB Streams. You can maintain a separate table that stores:
Table: MinAges
UserId - primary key
MinAge - regular numeric attribute
On every insert, update, or delete in the original table, you can query the minimum age for the affected user and store it in the MinAges table.
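Recomputing on every stream record also handles deletes, where a user's minimum can go up; a conditional "only lower it" update cannot do that. Below is a minimal sketch under stated assumptions:
- An in-memory set and dict stand in for the original table and MinAges.
- `apply_change` and `handle_stream_record` are hypothetical names.
- The event names follow the DynamoDB Streams `eventName` values INSERT, MODIFY and REMOVE.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory stand-ins: the original table as (UserId, Age) key pairs, and MinAges.
main_table: Set[Tuple[str, int]] = set()
min_ages: Dict[str, int] = {}


def apply_change(event_name: str, user_id: str, age: int) -> None:
    """Hypothetical stand-in for the write that the stream record describes."""
    if event_name == "INSERT":
        main_table.add((user_id, age))
    elif event_name == "REMOVE":
        main_table.discard((user_id, age))
    # "MODIFY" cannot change Age here, because Age is part of the item's key.


def handle_stream_record(user_id: str) -> None:
    """Stream consumer: recompute the affected user's minimum and store it in MinAges."""
    ages = [a for (u, a) in main_table if u == user_id]  # DynamoDB: Query with ScanIndexForward=True, Limit=1
    if ages:
        min_ages[user_id] = min(ages)
    else:
        min_ages.pop(user_id, None)  # no items left for this user


for name, user, age in [("INSERT", "u1", 34), ("INSERT", "u1", 21),
                        ("INSERT", "u2", 19), ("REMOVE", "u1", 21)]:
    apply_change(name, user, age)
    handle_stream_record(user)
print(min_ages)  # {'u1': 34, 'u2': 19}
```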
Another option is to write something like this (pseudocode):
storeNewAge(userId, newAge)
def smallestAge = getSmallestAgeFor(userId)
storeSmallestAge(userId, smallestAge)
But since DynamoDB did not have native transaction support when this was written, it's dangerous to run code like that: you may end up with inconsistent data. You can use the DynamoDB transactions library, but those transactions are expensive, whereas with streams you get consistent data at a very low price. (DynamoDB has since added native transactions, such as TransactWriteItems.)
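For a single user, you can do it with a Query that uses ScanIndexForward and a limit of 1. ScanIndexForward=true (the default) returns items in ascending sort-key order, so the first item has the smallest age:
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBQueryExpression;
import java.util.List;

// YourEntity is your @DynamoDBTable-annotated class; mapper is a configured DynamoDBMapper
YourEntity requestEntity = new YourEntity();
requestEntity.setHashKey(hashkey);
DynamoDBQueryExpression<YourEntity> queryExpression = new DynamoDBQueryExpression<YourEntity>()
        .withHashKeyValues(requestEntity)
        .withConsistentRead(false);
// queryExpression.setIndexName(indexName); // only if you are querying a secondary index
queryExpression.setScanIndexForward(true); // ascending sort-key order: smallest age first
queryExpression.setLimit(1);               // evaluate just one item
List<YourEntity> smallest = mapper.queryPage(YourEntity.class, queryExpression).getResults(); // one page only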
I'm planning on using Cosmos DB (DocumentDB), and I'm trying to understand how queries, indexing, and partitions relate to each other.
"How to partition and scale in Azure Cosmos DB" talks about the partition key, and other documentation indicates that partition key + id = the unique ID of a document. But then "SQL Query and SQL syntax in Azure Cosmos DB" says it provides automatic indexing of JSON documents without requiring an explicit schema or the creation of secondary indexes.
I understand that the partition key is important for scalability and for how data is stored. But when it comes to searching, is the partition key just a kind of extra filter/WHERE clause? All the documents are indexed, so I can execute a query like:
SELECT *
FROM Families
WHERE Families.address.state = "NY"
Should I still specify the partition key, or somehow indicate that cross-partition queries are allowed, when using this SQL query syntax?
Your first link gives the answer for this:
For partitioned collections, you can use PartitionKey to run the query against a single partition (though Cosmos DB can automatically extract this from the query text), and EnableCrossPartitionQuery to run queries that may need to be run against multiple partitions.
So, yes: you either need to specify a WHERE clause on the partition key, which makes the query run against a single partition, or set EnableCrossPartitionQuery to true in the query options.
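To make the difference concrete, below is a toy model of a partitioned container. It is not the Cosmos DB SDK: `PARTITIONS`, `query_state` and its flags are hypothetical.
- Passing a partition key confines the query to one partition.
- Enabling cross-partition querying fans it out to all partitions.

In the Python `azure-cosmos` SDK, the corresponding options on `container.query_items(...)` are `partition_key=` and `enable_cross_partition_query=True`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory container: partition key value -> documents in that partition.
PARTITIONS: Dict[str, List[dict]] = {
    "Andersen": [{"id": "1", "address": {"state": "NY"}}],
    "Wakefield": [{"id": "2", "address": {"state": "NY"}},
                  {"id": "3", "address": {"state": "WA"}}],
}


def query_state(state: str, partition_key: Optional[str] = None,
                enable_cross_partition: bool = False) -> Tuple[List[str], int]:
    """Return (matching ids, partitions touched) for WHERE address.state = state."""
    if partition_key is not None:
        targets = [partition_key]   # single-partition query
    elif enable_cross_partition:
        targets = list(PARTITIONS)  # fan out to every partition
    else:
        # Models the older SDK behaviour described above: refuse without the flag.
        raise ValueError("cross-partition query not enabled")
    ids = [d["id"] for pk in targets for d in PARTITIONS.get(pk, [])
           if d["address"]["state"] == state]
    return ids, len(targets)


single = query_state("NY", partition_key="Wakefield")
fanout = query_state("NY", enable_cross_partition=True)
print(single, fanout)  # (['2'], 1) (['1', '2'], 2)
```

The fan-out returns the same kind of result but touches every partition. That is why filtering on the partition key is preferable whenever you have one.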
You don't have to do that anymore: EnableCrossPartitionQuery is set to true by default nowadays. This means Cosmos won't complain if you don't specify the partition key in your query.
More info here.
You don't need to specify a partition key in the query. Recent versions enable cross-partition queries by default.
I know we can do an update in two operations: first get the primary key by querying the DB, and then update the item with a put operation. But does DynamoDB support an update in a single operation, as relational databases (such as MySQL) do? Two operations cost more time in network round trips.
My situation is as follows:
I have a table A with fields ID, Name, Location, Value.
Name + Location uniquely identifies a row.
Now I want to update the field "Value" when Name and Location satisfy some condition, but I don't know the ID. In MySQL, I could update it with UPDATE A SET Value = XXX WHERE Name = 'abc' AND Location = '123'.
But with DynamoDB, I first have to get the primary key ID.
Then I use the key to update the item. So my question is: does DynamoDB also support an update operation like MySQL's?
Thanks!
Chen hit it on the nose. Joey, the situation you described (a Get followed by a Put) is equivalent to two MySQL statements:
SELECT *
FROM TABLE
WHERE key = x;
UPDATE TABLE
SET var = param
WHERE key = x;
Do you see how the SELECT/GetItem isn't part of the update process? As long as you have the keys, you don't need to perform a query. I'm assuming you're performing the GetItem before the PutItem request because PutItem replaces the entire item/row (i.e., it deletes all attributes not specified in the Put request).
So if the original item looked like: < key-id=1, first-name=John, last-name=Doe, age=22>
and you perform a PutItem of: < key-id=1,location=NY>
The final item looks like: < key-id=1,location=NY>
If you perform an UpdateItem in place of PutItem then you would instead get:
< key-id=1, first-name=John, last-name=Doe, age=22, location=NY>
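The difference between the two calls can be modeled with plain dicts. This is a hypothetical in-memory sketch, not the SDK. `put_item` and `update_item` only mimic the replace-versus-merge behaviour described above, including UpdateItem creating the item when it does not exist yet.

```python
from typing import Any, Dict

Item = Dict[str, Any]


def put_item(table: Dict[Any, Item], item: Item) -> None:
    """PutItem semantics: the new item replaces the stored one entirely."""
    table[item["key-id"]] = dict(item)


def update_item(table: Dict[Any, Item], key: Any, attrs: Item) -> None:
    """UpdateItem semantics: only the given attributes change; the item is created if missing."""
    table.setdefault(key, {"key-id": key}).update(attrs)


original = {"key-id": 1, "first-name": "John", "last-name": "Doe", "age": 22}

after_put: Dict[Any, Item] = {1: dict(original)}
put_item(after_put, {"key-id": 1, "location": "NY"})

after_update: Dict[Any, Item] = {1: dict(original)}
update_item(after_update, 1, {"location": "NY"})

print(after_put[1])     # {'key-id': 1, 'location': 'NY'}
print(after_update[1])  # {'key-id': 1, 'first-name': 'John', 'last-name': 'Doe', 'age': 22, 'location': 'NY'}
```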
Here's a link for using UpdateItem with Java. There are also examples using .NET and PHP.
UpdateItem for Java
Correct me if I am wrong, but UpdateItem consumes only one operation: it takes the hash key value and updates the item if it exists, or else creates a new item (for items up to 1 KB).
Here is the link for reference: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#CapacityUnitCalculations
Hope that helps
You don't need to get the primary key first. If you know the primary key, you don't need to get anything and you can simply use the UpdateItem API call to update your item.
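Below is a sketch of what that single call can look like with boto3 (Python). It is a small helper that builds the arguments for `Table.update_item`: `Key`, `UpdateExpression`, `ExpressionAttributeNames` and `ExpressionAttributeValues`.

`build_update_kwargs` is hypothetical, and so is the key schema in the example. It assumes the table were keyed on Name (hash) and Location (range), which the question says uniquely identify a row. With the current ID key you would still need the ID.

```python
from typing import Any, Dict, List


def build_update_kwargs(key: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for a boto3 Table.update_item call.

    Attribute names go through #n0, #n1, ... placeholders so that names which may be
    DynamoDB reserved words (for example "Value") do not break the expression.
    """
    if not changes:
        raise ValueError("nothing to update")
    names: Dict[str, str] = {}
    values: Dict[str, Any] = {}
    assignments: List[str] = []
    for i, (attr, value) in enumerate(changes.items()):
        names[f"#n{i}"] = attr
        values[f":v{i}"] = value
        assignments.append(f"#n{i} = :v{i}")
    return {
        "Key": key,
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# Hypothetical schema: Name as hash key and Location as range key.
kwargs = build_update_kwargs({"Name": "abc", "Location": "123"}, {"Value": 42})
print(kwargs["UpdateExpression"])  # SET #n0 = :v0
# With boto3 this would then be sent as: table.update_item(**kwargs)
```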
If that still isn't clear, please edit your question and add some code samples of what you are trying to do.