Firebase web - transaction on query

Can I run a transaction on a query that refers to multiple locations?
In the docs I see that, for example, startAt returns a firebase.database.Query, which has a ref property of type firebase.database.Reference, which in turn has the transaction method.
So can I do:
ref.startAt(ver).ref.transaction(transactionUpdate).then(... ?
Would the transaction then operate on multiple locations and update them correctly?
What I'm trying to do is to get all locations since a particular version (key) and then mark them as 'read' so that a writing client will not update them. For that I need a transaction rather than a simple update.
Thx!

The answer is "no" to all questions.
The ref property of a Query gives you a reference to the node on which you set up the query; consider how you built the query in the first place. In other words, ref.startAt(x).ref is equivalent to ref.
Manipulating a reference (navigating to children, adding query options, etc.) is completely independent of any query results. It's just local, trivial path manipulation, very similar to formatting a URL.
Transactions can only operate on a single node, by definition, using that node's value snapshots for incremental updates. They cannot "operate on multiple locations and update them correctly". These are not SQL transactions; the only thing they have in common is the name, which is unfortunately confusing.
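To make the single-node behavior concrete, here is a toy Python model of the retry loop such a transaction goes through. It is not the Firebase SDK: `Node`, `compare_and_set` and `run_transaction` are hypothetical names, and the retry limit is arbitrary. The update function first runs against the client's local guess of the value (possibly None) and is re-run against the actual server value whenever the server rejects the write. If the node were a parent, its "value" would be the whole subtree, which is why transactions on large parents get slow.

```python
from typing import Any, Callable, Tuple


class Node:
    """Toy model of one database node holding a value on the server."""

    def __init__(self, value: Any = None) -> None:
        self.value = value

    def compare_and_set(self, expected: Any, new: Any) -> bool:
        # The server accepts the write only if the client's base value is current.
        if self.value != expected:
            return False
        self.value = new
        return True


def run_transaction(
    node: Node,
    update: Callable[[Any], Any],
    local_guess: Any = None,
    max_retries: int = 25,  # arbitrary limit for this toy model
) -> Tuple[bool, Any]:
    """Apply `update` to the client's guess; on rejection, retry with the server value."""
    current = local_guess
    for _ in range(max_retries):
        new = update(current)
        if node.compare_and_set(current, new):
            return True, new
        current = node.value  # rejected: learn the actual value and run `update` again
    return False, current
```

Note that `update` is called more than once when the client's guess is stale, which is also how real transaction update functions behave, so they should be free of side effects.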
The starting node doesn't have to be a leaf node. But if you start a transaction on a "parent" node, the client will have to download every child to create a whole snapshot, potentially multiple times if any of them is modified by another client.
This is most certainly a very slow, fragile and expensive operation, both for the user and you, the owner of the database. In general, it's not recommended to run transactions if the node might grow unbounded.
I suggest revising the presented strategy. Updating "all children" just to store a "read" marker simply does not scale.
You could for example store the last read ID of the client in a single node, and write security rules to enforce that no data with an ID less than this may be modified.
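A toy Python model of that alternative, assuming keys that sort chronologically (as Firebase push IDs do). The reading client only moves a single marker forward, and `can_write` stands in for what the security rule would enforce. `MessageLog`, `mark_read_up_to`, `can_write` and `write` are hypothetical names, and treating the marker key itself as already read (`<=`) is an assumption.

```python
class MessageLog:
    """Toy model: items keyed by IDs that sort chronologically, plus one marker node."""

    def __init__(self) -> None:
        self.items: dict = {}
        self.last_read_key = ""  # the single "last read ID" node

    def mark_read_up_to(self, key: str) -> None:
        # One small write instead of touching every child; the marker only moves forward.
        if key > self.last_read_key:
            self.last_read_key = key

    def can_write(self, key: str) -> bool:
        # Stand-in for the security rule: nothing at or below the marker may change.
        return key > self.last_read_key

    def write(self, key: str, value) -> None:
        if not self.can_write(key):
            raise PermissionError(f"{key!r} has already been read")
        self.items[key] = value
```

Marking a range as read is then a single write, no matter how many children the list has.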

Related

Does DynamoDB expose an API to query or detect when there is conflict in merging item data

DynamoDB is an AP system based on the original Dynamo paper.
Is there any API to detect when a merge conflict has happened or been resolved?
Is there any API to provide a strategy to resolve a conflict if it happens?
Your question is based on a wrong premise. Although DynamoDB shares the name, and some goals and implementation details, with the original "Dynamo" paper, it is not very close, and the data model in particular is completely different.
Whereas in the Dynamo paper multiple clients could store multiple different values for an item concurrently (and later readers need to resolve the conflict), DynamoDB does things very differently:
If two clients replace an item, DynamoDB offers "last write wins" semantics: one of these writes will win, and you don't know or care which.
If two clients modify different attributes in the same item concurrently, both changes will be merged. I never found this explicitly promised, but it appears to work this way.
You also have a powerful conditional update feature, which can do a modification to a single item based on some condition on the old value of this item. These conditional updates are guaranteed to be isolated, so they can be used to ensure safe concurrent modification. For example, a conditional update can be used to implement so-called optimistic locking: An item has a version attribute among other attributes, a client reads the old item, decides what to change it to, and then does the write - with the condition that the version still hasn't changed. If the condition fails (because some other client raced us), the write fails and the client tries the whole process again (read again, apply a change, and write back).
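A minimal sketch of the optimistic-locking write described above, assuming the boto3 resource API (`Table.update_item`) and a numeric attribute named `version`. The attribute name and the helper `build_versioned_update` are illustrative, not part of DynamoDB. The helper only builds the request parameters, so it runs without AWS access.

```python
from typing import Any, Dict


def build_versioned_update(
    key: Dict[str, Any], changes: Dict[str, Any], expected_version: int
) -> Dict[str, Any]:
    """Build Table.update_item kwargs that apply `changes` only if the item's
    `version` attribute still equals `expected_version`, bumping it by one."""
    if "version" in changes:
        raise ValueError("the version attribute is managed by this helper")
    names = {"#v": "version"}
    values = {":expected": expected_version, ":next": expected_version + 1}
    assignments = ["#v = :next"]
    # Placeholders avoid clashes with DynamoDB reserved words in attribute names.
    for i, (attr, value) in enumerate(sorted(changes.items())):
        names[f"#a{i}"] = attr
        values[f":a{i}"] = value
        assignments.append(f"#a{i} = :a{i}")
    return {
        "Key": key,
        "UpdateExpression": "SET " + ", ".join(assignments),
        "ConditionExpression": "#v = :expected",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```

You would pass the result as `table.update_item(**kwargs)`. If another client changed the item first, the call fails with a botocore `ClientError` whose error code is `ConditionalCheckFailedException`; the client then re-reads the item and retries, as described above.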
DynamoDB also has a newer feature: full (multi-item) transactions. This feature did not exist in Dynamo at all.

First query snapshot of snapshotListener

According to the Firebase Firestore documentation, a snapshot listener provides all of the available records matching our query when we enable it.
Firestore documentation:
The first query snapshot contains added events for all existing documents that match the query. This is because you're getting a set of changes that bring your query snapshot current with the initial state of the query. This allows you, for instance, to directly populate your UI from the changes you receive in the first query snapshot, without needing to add special logic for handling the initial state.
As far as I understand, it's not possible to disable this feature, but there are some workarounds.
My question is whether this behavior counts as one read for every record received during the first initialization.
The answer is yes: the "initial state of the query" implies that all documents corresponding to the query are read.
However, as explained in the documentation:
The initial state can come from the server directly, or from a local cache. If there is state available in a local cache, the query snapshot will be initially populated with the cached data.
If the initial state comes from a local cache (see offline data persistence), it will not count as any reads.
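A toy Python model of this answer: the first snapshot consists only of "added" changes, which can populate a local view directly, and it is billed one read per document only when it comes from the server. The names loosely mirror the web SDK's `docChanges()` / `change.type` and `snapshot.metadata.fromCache`, but `DocumentChange` and `populate_from_first_snapshot` are hypothetical, and the model ignores every other billing rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DocumentChange:
    type: str  # "added", "modified" or "removed"; the first snapshot only has "added"
    doc_id: str
    data: dict


def populate_from_first_snapshot(
    changes: List[DocumentChange], from_cache: bool
) -> Tuple[Dict[str, dict], int]:
    """Build the initial UI state and estimate billed reads (simplified model)."""
    view: Dict[str, dict] = {}
    for change in changes:
        if change.type == "added":
            view[change.doc_id] = change.data
    # One read per matching document from the server, none when served from cache.
    billed_reads = 0 if from_cache else len(view)
    return view, billed_reads
```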

Return entity updated by Axon command

What is the best way to get the updated representation of an entity after mutating it with a command?
For example, let's say I have a project like digital-restaurant and I want to be able to update a field on the restaurant and return its current state to the client making the update (to retrieve any modifications by different processes).
When a restaurant is created, it is easy to retrieve the current state (i.e. the projection representation) after dispatching the create command, by subscribing to a FindRestaurantQuery and waiting until a record is returned (see Restaurant CommandController).
However, it isn't so simple to detect when the result of an UpdateCommand has been applied to the projection. For example, if we use the same trick and subscribe to the FindRestaurantQuery, we will be notified if the restaurant has been modified, but it may not be our command that triggered the modification (in the case where multiple processes are concurrently issuing update commands).
There seem to be two obvious ways to detect when a given update command has been applied to the projection:
1. Have a unique ID associated with every update command.
   - Subscribe to a query that is updated when the command ID has been applied to the projection.
   - Propagate the unique ID to the event that is applied by the aggregate.
   - When the projection receives the event, it can notify the query listener with the current state.
2. Before dispatching an update command, query the existing state of the projection.
   - Calculate the destination state given the contents of the update command.
In the case of (1): is there any situation (e.g. batching / snapshotting) where the event carrying the unique ID may be skipped over somehow, preventing the query listener from being notified?
Is there a more reliable / more idiomatic way to accomplish this use case?
Axon 4 with Spring Boot.
Although fully asynchronous designs may be preferable for a number of reasons, it is a common scenario that back-end teams are forced to provide synchronous REST API on top of asynchronous CQRS+ES back-ends.
The part of the demo application that is trying to solve this problem is located here https://github.com/idugalic/digital-restaurant/tree/master/drestaurant-apps/drestaurant-monolith-rest
The case you are mentioning is totally valid.
I would go with the option 1.
My only concern is that you have to introduce into the domain (events) a new unique ID attribute associated with every update command. In my opinion, this ID attribute does not have any domain/business value. There is already an Audit (who, when) attribute associated with every event, and maybe you can use that to correlate commands and subscriptions. I believe there is more value in this solution (identity is part of the domain), if it is not too relaxed for your case.
Please note that Queries have to be extended with Audit in this case (so you know who requested the Query).
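A toy Python model of option 1, just to show the correlation mechanics: the caller registers interest in a command ID before dispatching, and the projection completes that waiter only when it applies the event carrying the same ID. `RestaurantRenamed`, `RestaurantProjection` and `wait_for` are hypothetical; in Axon 4 the same role would normally be played by a subscription query whose updates the projection emits through `QueryUpdateEmitter`. Using the Audit data instead of a dedicated ID would only change which field serves as the lookup key.

```python
from concurrent.futures import Future
from dataclasses import dataclass
from typing import Dict


@dataclass
class RestaurantRenamed:
    """Hypothetical event that carries the ID of the command that caused it."""
    restaurant_id: str
    name: str
    command_id: str


class RestaurantProjection:
    def __init__(self) -> None:
        self.state: Dict[str, dict] = {}
        self._waiters: Dict[str, Future] = {}

    def wait_for(self, command_id: str) -> Future:
        # Register interest before the command is dispatched.
        fut: Future = Future()
        self._waiters[command_id] = fut
        return fut

    def on_event(self, event: RestaurantRenamed) -> None:
        self.state[event.restaurant_id] = {"name": event.name}
        # Only the waiter for this exact command is completed, with the current state.
        fut = self._waiters.pop(event.command_id, None)
        if fut is not None:
            fut.set_result(dict(self.state[event.restaurant_id]))
```

Registering before dispatching avoids a race where a fast projection applies the event before anyone is waiting. The caller should still use a timeout (for example `fut.result(timeout=5)`), since nothing in this model guarantees that the event arrives.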

Can transaction be used on collection?

I am using Firestore and trying to remove a race condition in my Flutter app by using a transaction.
I have a subcollection to which at most 2 documents should be added.
Because of a race condition, more than 2 documents may be added, since the client code uses setData. For example:
Firestore.instance.collection('collection').document('document').collection('subCollection').document(subCollectionDocument2).setData({
  'document2': documentName,
});
I am trying to use a transaction to make sure at most 2 documents are added, so that if the collection changes while the transaction runs (for example, a new document is added to it), the transaction fails.
But from reading the docs, it seems transactions are meant more for race conditions when setting fields in a document, not for adding documents to a subcollection.
For example, if I try to implement:
Firestore.instance.collection('collection').document('document').collection('subCollection').runTransaction((transaction) async {
});
This gives the error:
error: The method 'runTransaction' isn't defined for the class 'CollectionReference'.
Can a transaction be used to monitor changes to a subcollection?
Does anyone know another solution?
Transactions in Firestore work by a so-called compare-and-swap operation. In a transaction, you read a document from the database, determine its current state, and then set its new state based on that. When you've done that for the entire transaction, you send the whole package of current-state-and-new-state documents to the server. The server then checks whether the current state in the storage layer still matches what your client started with, and if so it commits the new state that you specified.
Knowing this, the only way it is possible to monitor an entire collection in a transaction is to read all documents in that collection into the transaction. While that is technically possible for small collections, it's likely to be very inefficient, and I've never seen it done in practice. Then again, for just the two documents in your collection it may be totally feasible to simply read them in the transaction.
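A toy Python model of the compare-and-swap described above, applied to a capped subcollection. It assumes the two allowed documents live at fixed, known IDs (`slot1`, `slot2`, hypothetical), so the transaction can read both of them, including a missing one, and the commit is rejected if either has changed since it was read. `VersionedStore` and `add_to_capped_collection` are hypothetical and are not the Firestore API; this only keeps well-behaved clients consistent and does not stop a malicious client from writing directly.

```python
from typing import Any, Dict, Optional, Tuple


class VersionedStore:
    """Toy document store: doc_id -> (version, data); a missing doc reads as (0, None)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, Any]] = {}

    def read(self, doc_id: str) -> Tuple[int, Any]:
        return self.docs.get(doc_id, (0, None))

    def commit(self, expected: Dict[str, int], writes: Dict[str, Any]) -> bool:
        """Compare-and-swap: apply `writes` only if every read doc is unchanged."""
        if any(self.read(doc_id)[0] != version for doc_id, version in expected.items()):
            return False
        for doc_id, data in writes.items():
            self.docs[doc_id] = (self.read(doc_id)[0] + 1, data)
        return True


SLOTS = ("slot1", "slot2")  # hypothetical fixed IDs for the two allowed documents


def add_to_capped_collection(
    store: VersionedStore, data: Any, max_attempts: int = 5
) -> Optional[str]:
    for _ in range(max_attempts):
        snapshot = {slot: store.read(slot) for slot in SLOTS}  # read every slot
        free = [slot for slot, (_, doc) in snapshot.items() if doc is None]
        if not free:
            return None  # already at the maximum of two documents
        expected = {slot: version for slot, (version, _) in snapshot.items()}
        if store.commit(expected, {free[0]: data}):
            return free[0]
        # Another client committed in between: re-read and try again.
    raise RuntimeError("too much contention; giving up")
```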
Keep in mind though that a transaction only ensures consistent data, it doesn't necessarily limit what a malicious user can do. If you want to ensure there are never more than two documents in the collection, you should look at a server-side mechanism.
The simplest mechanism (infrastructure wise) is to use Firestore's server-side security rules, but I don't think those will work to limit the number of documents in a collection, as Doug explained in his answer to Limit a number of documents in a subcollection in firestore rules.
The most likely solution in that case is (as Doug also suggests) to use Cloud Functions to write the documents in the subcollection. That way you can simply reject direct writes from the client, and enforce any business logic you want in your Cloud Functions code, which runs in a trusted environment.

Look up the existence of a large number of keys (up to 1M) in Datastore

We have a table with 100M rows in Google Cloud Datastore. What is the most efficient way to look up the existence of a large number of keys (500K-1M)?
For context, a use case could be that we have a big content datastore (think of all webpages in a domain). This datastore contains pre-crawled content and metadata for each document. Each document, however, could be liked by many users. Now when a new user says he/she likes documents {a1, a2, ..., an}, we want to tell whether all these documents ak (k from 1 to n) have already been crawled. That's the reason we want to do the lookup mentioned above. If there is a subset of documents that we don't have yet, we would start to crawl them immediately. Yes, the ultimate goal is to retrieve the content of all these documents and use it to build the user profile.
My current thought is to issue a bunch of batch lookup requests. Each lookup request can contain up to 1K keys [1]. However, to check the existence of every key in a set of 1M, I still need to issue 1000 requests.
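A minimal sketch of that batching arithmetic, assuming a caller-supplied `lookup(batch)` function that returns the subset of keys that exist (with the google-cloud-datastore Python client, such a function could wrap `client.get_multi(keys, missing=...)`). `find_missing` and `chunks` are illustrative names; the 1,000-key constant comes from the limit cited in [1].

```python
from typing import Callable, Hashable, Iterator, List, Sequence, Set, Tuple

MAX_KEYS_PER_LOOKUP = 1000  # per-request lookup limit cited in [1]


def chunks(keys: Sequence[Hashable], size: int) -> Iterator[Sequence[Hashable]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def find_missing(
    keys: Sequence[Hashable],
    lookup: Callable[[Sequence[Hashable]], Set[Hashable]],
    batch_size: int = MAX_KEYS_PER_LOOKUP,
) -> Tuple[List[Hashable], int]:
    """Return (keys that do not exist, number of lookup requests issued)."""
    missing: List[Hashable] = []
    requests = 0
    for batch in chunks(keys, batch_size):
        found = lookup(batch)  # one batched lookup request
        requests += 1
        missing.extend(k for k in batch if k not in found)
    return missing, requests
```

For 1,000,000 keys this issues 1,000 requests, matching the estimate above. The requests are independent, so they could be sent concurrently, but the total count does not shrink.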
An alternative is to use a customized middle layer to provide a quick look up (for example, can use bloom filter or something similar) to save the time between multiple requests. Assuming we never delete keys, every time we insert a key, we add it through the middle layer. The bloom-filter keeps track of what keys we have (with a tolerable false positive rate). Since this is a custom layer, we could provide a micro-service without a limit. Say we could respond to a request asking for the existence of 1M keys. However, this definitely increases our design/implementation complexity.
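A minimal sketch of the Bloom-filter middle layer mentioned above, in plain Python with `hashlib`. The sizing formulas are the standard ones; the class name, the use of SHA-256 with double hashing, and the parameters are illustrative choices. Given the stated assumptions (every insert goes through the filter and keys are never deleted), a negative answer means the document is definitely not crawled, while a positive answer only means "probably", so positives would still need a Datastore lookup to be certain.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, capacity: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

For example, `BloomFilter(capacity=1_000_000, fp_rate=0.01)` needs roughly 9.6 million bits (about 1.2 MB) and 7 hash functions, small enough to answer a 1M-key existence query from memory.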
Is there any more efficient ways to do that? Maybe a better design? Thanks!
[1] https://cloud.google.com/datastore/docs/concepts/limits
I'd suggest breaking down the problem in a more scalable (and less costly) approach.
In the use case you mentioned you can deal with one document at a time, each document having a corresponding entity in the datastore.
The webpage URL uniquely identifies the page, so you can use it to generate a unique key/identifier for the respective entity. With a single key lookup (strongly consistent) you can then determine if the entity exists or not, i.e. if the webpage has already been considered for crawling. If it hasn't then a new entity is created and a crawling job is launched for it.
The length of the entity key can be an issue, see How long (max characters) can a datastore entity key_name be? Is it bad to haver very long key_names? To avoid it, you can have the URL stored as a property of the webpage entity. You'll then have to query for the entity by the url property to determine if the webpage has already been considered for crawling. Such a query is only eventually consistent, meaning that it may take a while from when the document entity is created (and its crawling job launched) until it appears in the query result. This is not a big deal: it can be addressed by a bit of logic in the crawling job to prevent and/or remove document duplicates.
I'd keep the "like" information as small entities mapping a document to a user, separated from the document and from the user entities, to prevent the drawbacks of maintaining possibly very long lists in a single entity, see Manage nested list of entities within entities in Google Cloud Datastore and Creating your own activity logging in GAE/P.
When a user likes a webpage with a particular URL, you just have to check if the matching document entity exists:
- if it does, just create the like mapping entity
- if it doesn't and you used the above-mentioned unique key identifiers:
  - create the document entity and launch its crawling job
  - create the like mapping entity
- otherwise:
  - launch the crawling job, which creates the document entity, taking care of deduplication
  - launch a delayed job to create the mapping entity later, when the (unique) document entity becomes available, possibly chained off the crawling job. Some retry logic may be needed.
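A toy in-memory Python model of the first two branches above (document entity keyed by its URL, so existence is a single key lookup). `LikeStore`, `handle_like` and `user_liked` are hypothetical. In real Datastore the check-then-create step would need to run inside a transaction so that two concurrent likes of a new URL do not both launch a crawl, and the eventually consistent "otherwise" branch is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class LikeStore:
    documents: Dict[str, dict] = field(default_factory=dict)  # key name = URL
    likes: Set[Tuple[str, str]] = field(default_factory=set)  # (user_id, url) mapping entities
    crawl_queue: List[str] = field(default_factory=list)  # stand-in for crawling jobs


def handle_like(store: LikeStore, user_id: str, url: str) -> None:
    """One like at a time: a single key lookup instead of a massive batch."""
    if url not in store.documents:  # key lookup by URL
        store.documents[url] = {"url": url, "crawled": False}
        store.crawl_queue.append(url)  # launch its crawling job
    store.likes.add((user_id, url))  # create the like mapping entity


def user_liked(store: LikeStore, user_id: str, url: str) -> bool:
    return (user_id, url) in store.likes
```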
Checking if a user liked a particular document becomes a simple query for one such mapping entity (with a bit of care as it's also eventually consistent).
With such a scheme in place you no longer have to make those massive lookups; you only do one at a time, which is OK: a user liking documents one at a time is IMHO more natural than providing a large list of liked documents.
