Uniqueness check with a query in Cloud Firestore in Datastore mode - google-cloud-datastore

Since Cloud Firestore in Datastore mode supports strong consistency for all queries,
https://cloud.google.com/datastore/docs/firestore-or-datastore#in_datastore_mode
could this be used to check for uniqueness? Say I have a User entity (a top-level entity) whose key is a Datastore-allocated ID. In the past, it wasn't possible to query by email within a transaction, because that is a global query. But it seems such queries are now possible, as clarified at
New Google Cloud Firestore in Datastore mode Queries Clarification
Does this mean it is now possible to ensure there are no duplicate User entities simply by indexing the email property and querying on it within the transaction that inserts the User entity?
My current implementation uses a separate entity with a named key built from the email, and does a key-based lookup on that entity within the transaction. I could get rid of that if I could query by email on the User entity itself within the transaction, with a guarantee that duplicate entities won't be created under a race condition.

After some research, below is all I could gather.
Even though Datastore mode is strongly consistent, global queries still cannot be used within transactions.
As per https://cloud.google.com/datastore/docs/concepts/entities#creating_an_entity , it is possible to do a get within a transaction and then a put based on the result, but this only works when uniqueness is enforced on the key.
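A minimal sketch of that get-then-put pattern with the Go client (the kind name UserEmail is an assumption for illustration):

package main

import (
	"context"
	"errors"

	"cloud.google.com/go/datastore"
)

// reserveEmail implements get-then-put inside a transaction: it succeeds only
// if no entity keyed by this email exists yet.
func reserveEmail(ctx context.Context, client *datastore.Client, email string) error {
	_, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
		key := datastore.NameKey("UserEmail", email, nil)
		var existing struct{}
		switch err := tx.Get(key, &existing); err {
		case nil:
			return errors.New("email already taken")
		case datastore.ErrNoSuchEntity:
			_, err := tx.Put(key, &struct{}{})
			return err
		default:
			return err
		}
	})
	return err
}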
There are some strategies outlined at Google cloud datastore only store unique entity for this same issue, and Dan suggested "insert" as opposed to "put". At first I didn't get this, as the App Engine Datastore API never had "insert". But the Cloud Datastore client API has mutations, which allow an explicit insert (as opposed to put, which maps to upsert).
Thanks to mutation support, I can keep the strategy of a separate entity (a different kind) whose named key is the unique property (such as the email), but avoid the extra get in the transaction: I call tx.Mutate with insert mutations for both the User entity and the uniqueness-tracking entity, as sketched below. A duplicate email makes the commit fail with an AlreadyExists error, which signals the violation.
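A minimal sketch of this approach with the Go client (kind names are assumptions; the AlreadyExists check assumes the error surfaces as a gRPC status):

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/datastore"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

type User struct {
	Email string
	Name  string
}

// createUser inserts the User entity (Datastore-allocated ID) and a
// uniqueness-tracking entity whose named key is the email, in one
// transaction and without any reads. A duplicate email makes the commit
// fail, rolling back both inserts.
func createUser(ctx context.Context, client *datastore.Client, u *User) error {
	_, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
		_, err := tx.Mutate(
			datastore.NewInsert(datastore.IncompleteKey("User", nil), u),
			datastore.NewInsert(datastore.NameKey("UserEmail", u.Email, nil), &struct{}{}), // tracking entity, no properties
		)
		return err
	})
	if status.Code(err) == codes.AlreadyExists {
		// Another User with this email was committed first.
		return fmt.Errorf("email %q already registered: %w", u.Email, err)
	}
	return err
}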

As of right now, there is no way to enforce uniqueness on a property. However, there are workarounds for what you are trying to do. One workaround is explained in the article linked above, and another is here.

Related

Are multiple DynamoDB queries in a single API request bad practice?

I'm trying to create my first DynamoDB-based project and I'm having some trouble figuring out best practices for working with a NoSQL database.
My use case currently is storing users and teams. I have a table with a partition key of either USER#{userId} or TEAM#{teamId}. If the PK is TEAM#{teamId}, I store records with SK either TEAM#{teamId} for team details, or USER#{userId} for the user's details in the team (acceptedInvite, joinDate, etc.). I also have a GSI based on the userId/email column that allows me to query all the teams a user has been invited to, or the user's team, depending on the value of the acceptedInvite field. Attached are screenshots of the table structure at the moment:
The table
The GSI
In my application I have an access pattern of getting a team's team members, given a user id.
Currently, I'm doing two queries in my lambda function:
Get the user's team, by querying the GSI on PK = {userId} and filtering on acceptedInvite = true
Get the team data, by querying the table on PK = {teamId} and SK begins_with USER#
This works fine, but I'm concerned that I need to perform two separate DynamoDB calls in my API function.
I'm wondering if there's a better way to represent this access pattern and if multiple dynamoDB calls are actually that bad, since I cannot see another way to do this.
Any kind of feedback is appreciated!
The best way to avoid making two queries like this is to supply the API caller with all the information needed to make a single DynamoDB request. For your case this means supplying the caller with the teamId. You can do this either as part of a list operation response or, if it is the authenticated user, as part of the claims in their JWT.
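For illustration, a minimal sketch of the single request the caller can then make, using the AWS SDK for Go v2 (the table name app-table is an assumption; the key layout follows the question):

package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// teamMembers fetches a team's member records in a single request once the
// caller already knows the teamId: PK = TEAM#{teamId}, SK begins_with USER#.
func teamMembers(ctx context.Context, client *dynamodb.Client, teamID string) ([]map[string]types.AttributeValue, error) {
	out, err := client.Query(ctx, &dynamodb.QueryInput{
		TableName:              aws.String("app-table"), // assumed table name
		KeyConditionExpression: aws.String("PK = :pk AND begins_with(SK, :sk)"),
		ExpressionAttributeValues: map[string]types.AttributeValue{
			":pk": &types.AttributeValueMemberS{Value: "TEAM#" + teamID},
			":sk": &types.AttributeValueMemberS{Value: "USER#"},
		},
	})
	if err != nil {
		return nil, err
	}
	return out.Items, nil
}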

Restrict specific object key values with authentication in Firestore

I have an object stored in the Firestore database. Among other keys, it has the userId of the user who created it. I now want to store an email address, which is a sensitive piece of info, in the object. However, I only want this email address to be retrievable by the logged-in user whose userId is equal to the userId of the object. Is it possible to restrict this using Firebase security rules? Or will I need to store the email address in a /private collection under the Firebase object, apply restrictive rules, and then retrieve it using my server?
TL;DR: Firestore document reads are all or nothing. Meaning, you can't retrieve a partial object from Firestore. So there is no feature at rule level that will give you granularity to restrict access to a specific field. Best approach is to create a subcollection with the sensitive fields and apply rules to it.
Taken from the documentation:
Reads in Cloud Firestore are performed at the document level. You either retrieve the full document, or you retrieve nothing. There is no way to retrieve a partial document. It is impossible using security rules alone to prevent users from reading specific fields within a document.
We solved this in two very similar approaches:
As you suggested, you can move your fields to a /private collection and apply rules there. However, this approach caused some issues for us because the /private collection is completely detached from the original doc. Resolving references implied multiple queries and extra calls to FS.
The second option, which is what the documentation also suggests, and IMHO is a bit better, is to use a subcollection. This is pretty much the same as a collection, but it keeps a hierarchical relationship with the parent collection.
From the same docs:
If there are certain fields within a document that you want to keep hidden from some users, the best way would be to put them in a separate document. For instance, you might consider creating a document in a private subcollection
NOTE:
Those docs also include a good step-by-step guide on how to create this kind of structure in FS, how to apply rules to it, and how to consume the collections in various languages.
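As a rough sketch of the read side of that structure, using the Go server client (the path users/{uid}/private/contact is an assumed layout, not from the question):

package main

import (
	"context"

	"cloud.google.com/go/firestore"
)

// readPrivateContact reads the sensitive fields from a private subcollection
// under the user's document. Security rules applied to that subcollection
// would restrict the read to request.auth.uid == uid.
func readPrivateContact(ctx context.Context, client *firestore.Client, uid string) (map[string]interface{}, error) {
	snap, err := client.Collection("users").Doc(uid).Collection("private").Doc("contact").Get(ctx)
	if err != nil {
		return nil, err
	}
	return snap.Data(), nil
}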

Can a transaction be used on a collection?

I am using Firestore and trying to remove a race condition in a Flutter app by using a transaction.
I have a subcollection that should hold at most 2 documents.
The race condition means more than 2 documents may be added, because the client code uses setData. For example:
Firestore.instance.collection('collection').document('document').collection('subCollection').document(subCollectionDocument2).setData({
  'document2': documentName,
});
I am trying to use a transaction to make sure at most 2 documents are added, so that if the collection changes (for example, a new document is added) while the transaction runs, the transaction will fail.
But I have read the docs, and it seems transactions are meant more for race conditions when setting fields in a document, not for adding documents to a subcollection.
For example, trying to implement:
Firestore.instance.collection('collection').document('document').collection('subCollection').runTransaction((transaction) async {
}),
gives the error:
error: The method 'runTransaction' isn't defined for the class 'CollectionReference'.
Can a transaction be used to monitor changes to a subcollection?
Does anyone know of another solution?
Can a transaction be used to monitor changes to a subcollection?
Transactions in Firestore work by a so-called compare-and-swap operation. In a transaction, you read a document from the database, determine its current state, and then set its new state based on that. When you've done that for the entire transaction, you send the whole package of current-state-and-new-state documents to the server. The server then checks whether the current state in the storage layer still matches what your client started with, and if so it commits the new state that you specified.
Knowing this, the only way to monitor an entire collection in a transaction is to read all of the documents in that collection into the transaction. While that is technically possible for small collections, it's likely to be very inefficient, and I've never seen it done in practice. Then again, for just the two documents in your collection it may be totally feasible to simply read them in the transaction, as sketched below.
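A rough sketch of that idea with the Go server client (server client libraries can run queries inside a transaction; mobile SDKs such as the Flutter plugin can only read individual documents by reference):

package main

import (
	"context"
	"errors"

	"cloud.google.com/go/firestore"
)

// addLimited creates a new document in the subcollection only while it holds
// fewer than two documents; Firestore fails/retries the transaction if the
// subcollection changes concurrently.
func addLimited(ctx context.Context, client *firestore.Client, documentName string) error {
	sub := client.Collection("collection").Doc("document").Collection("subCollection")
	return client.RunTransaction(ctx, func(ctx context.Context, tx *firestore.Transaction) error {
		docs, err := tx.Documents(sub).GetAll() // read the whole subcollection
		if err != nil {
			return err
		}
		if len(docs) >= 2 {
			return errors.New("subcollection already has two documents")
		}
		return tx.Create(sub.Doc("document2"), map[string]interface{}{
			"document2": documentName,
		})
	})
}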
Keep in mind though that a transaction only ensures consistent data, it doesn't necessarily limit what a malicious user can do. If you want to ensure there are never more than two documents in the collection, you should look at a server-side mechanism.
The simplest mechanism (infrastructure wise) is to use Firestore's server-side security rules, but I don't think those will work to limit the number of documents in a collection, as Doug explained in his answer to Limit a number of documents in a subcollection in firestore rules.
The most likely solution in that case is (as Doug also suggests) to use Cloud Functions to write the documents in the subcollection. That way you can simply reject direct writes from the client, and enforce any business logic you want in your Cloud Functions code, which runs in a trusted environment.

Cloud Datastore: batch insert entities while recognising if the same key already exists?

I'm trying to create a Cloud Dataflow 2.x (aka Apache Beam) PTransform to filter out elements of a PCollection that were already "seen" previously.
The basic idea is that the PTransform gets a function to calculate a natural id (a "primary key") for each of the PCollection's elements. Each of these ids will be used to create a Cloud Datastore Key and stored in a dummy entity kind ("table"); if the insertion fails due to a duplicate-key error, then I know I have already seen the corresponding natural id, and therefore the same PCollection element.
I assume the best way to implement this is by using bundles in the DoFn and issuing batch requests to Datastore in my @FinishBundle method.
I understand that I cannot use transactional commits in this case, because if just one insert mutation fails due to its key already existing, it will make the whole transaction fail, and the documentation says it's impossible to tell which of the keys is the already-existing one.
If I use non-transactional Inserts, am I guaranteed that concurrent inserts of the same Key will have at least one of them fail?
Otherwise, which other options do I have? I'd rather not use transactions with just one mutation, but batch multiple mutations together.
I'm using the datastore-v1-proto-client / datastore-v1-protos API (aka the plain Datastore v1 REST API), not the new google-cloud-datastore API.
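For illustration only, a sketch of the intended check using the newer cloud.google.com/go/datastore client rather than the v1 proto API the question targets (the kind name SeenNaturalID is made up):

package main

import (
	"context"

	"cloud.google.com/go/datastore"
)

// markSeen inserts one dummy entity per natural id, non-transactionally.
// A commit containing a duplicate key fails with AlreadyExists, but (per the
// documentation cited in the question) the error does not identify which key
// conflicted, so a failed batch would need bisecting or key-by-key retries.
func markSeen(ctx context.Context, client *datastore.Client, ids []string) error {
	muts := make([]*datastore.Mutation, 0, len(ids))
	for _, id := range ids {
		muts = append(muts, datastore.NewInsert(datastore.NameKey("SeenNaturalID", id, nil), &struct{}{}))
	}
	_, err := client.Mutate(ctx, muts...)
	return err
}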

What is the best practice for saving user data in Firestore/Firebase?

Specifically, should/can one think of 'Collections' as table names and 'Documents' as unique keys?
Should you use an auto-generated key, the Auth UID, or the user's email as document names?
What are the pros and cons of each, if any?
-Thanks
Yes, collections very closely resemble table names, as they represent entities from an object-oriented perspective. The documents are unique since each must have a unique id; the ids are the unique keys that identify each instance of an entity. No document can share a Firebase id with another in the same collection.
Auth UID keys seem to be the best idea for user document ids, as they allow you to sync between Firebase Auth and the Firestore/Firebase database right out of the box. This is what I usually prefer. I would use auto-generated ids for other objects which have not been integrated into any other Firebase service. Having a consistent user id for both Firebase Auth and Firestore makes things quite easy, since I only need to know one id to access both services from the client end.
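A minimal sketch of that choice with the Go server client (collection and field names are assumptions):

package main

import (
	"context"

	"cloud.google.com/go/firestore"
)

// saveProfile keys the profile document by the Firebase Auth UID, so a single
// identifier reaches both the Auth record and the Firestore profile.
func saveProfile(ctx context.Context, client *firestore.Client, authUID, email string) error {
	_, err := client.Collection("users").Doc(authUID).Set(ctx, map[string]interface{}{
		"email": email, // assumed field
	}, firestore.MergeAll)
	return err
}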
