Creating Empty Documents in Firestore

I understand that Firestore automatically removes empty documents from collections. However, I have a situation where the document's name itself serves a purpose: I have a collection named usernames, and within it, many documents whose ID is the username. For instance, usernames/bob_marley is what I might see in the database. The problem is that, since these documents have no fields, they get removed automatically, defeating the purpose of the set-up. How should I structure my database in cases like this?
Thank you

The easiest thing to do is simply not allow the document to ever become empty. Keep one property in it, for example exists = true, and make sure it never gets removed. Use a security rule to prevent this if you're concerned about users accidentally doing it to themselves.
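As a minimal sketch (assuming the JavaScript client SDK; the exists field name is just a placeholder):

// Create the username document with a single placeholder field so it is
// never empty and therefore never disappears.
db.collection('usernames').doc('bob_marley').set({ exists: true });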
Another option is to re-evaluate what exactly you're trying to do with an empty document, and whether it's worthwhile to restructure your data in a way that best serves the queries you want to perform.

Related

Absurd Firestore Collections and Security Rules inheritance

I have a Firestore collections structure like this:
../cards/{cardId}/data/{dataId}
To safely read data, I need this call in my Firestore security rules:
get(/databases/$(database)/documents/cards/$(cardId)).data
and then compare card fields. That's basically 2 reads every time I do this.
Yet if I change my structure to make everything top-level like this (though I also have to duplicate some similar fields on both models):
../cards/{cardId}
../data/{dataId}
it needs only 1 read. But I need 2 writes each time, because changes touch the similar fields on both documents. Writes happen less often than reads, which makes this cheaper, but it's annoying to code. And it makes Firestore inheritance useless.
I mean, can Firestore just have the ability to read parent fields too, at no cost? At least for security rules. Firestore basically makes an index for each field, right? So can it just also understand the meaning of inheritance, which is to make the child know/have the parent's fields too? Or is this just a limitation of NoSQL? It's really annoying every time I run into this.
can Firestore just have the ability to read parent fields too, at no cost?
Stack Overflow isn't the right place to ask a question like this. But I'll speak confidently and say that, no, Firestore can't offer free reads (outside of the free monthly allowance), or it would not be a service that can sustain itself on the revenue it receives.
If you have feedback to send to the Firebase team, contact Firebase support directly. But I suspect that asking to relax the billing requirements of the system is not going to be a request they will entertain.
can it just also understand the meaning of inheritance, which is to make the child know/have the parent's fields too?
What you're describing is not really "inheritance". It's simply nesting. Collections can be nested under documents within other collections. The relationship between those documents is only in the path prefix that they have in common. Other than that, each document stands fully on its own without any ties to any other documents in the system.
Or is this just a limitation of NoSQL?
It has nothing really to do with NoSQL. The relationship between collections and subcollections is just a way that you can organize data in the system. You choose the method of organization that best suits the queries (and security requirements, when using security rules) of your app. Whether that organization is nested or not is up to you.
But there is no way to organize your data to get free reads. Each document read always costs 1 read, no matter how it's organized.
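As an illustration of the trade-off the question describes, here is a minimal sketch (JavaScript client SDK; the duplicated field name is hypothetical) that keeps both copies in sync with a single atomic batch:

const batch = db.batch();
// Update the field on the card and its duplicated copy on the data
// document in one atomic commit, so the two copies can't diverge.
batch.update(db.collection('cards').doc(cardId), { ownerUid: uid });
batch.update(db.collection('data').doc(dataId), { ownerUid: uid });
batch.commit();

This still costs the 2 writes the question mentions; the batch only guarantees the copies stay consistent.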

Appropriate way to reject creation of a collection item in firestore, when parent collection doesn't exist?

I have an onRequest cloud function which inserts an item into my firestore instance:
...
db.collection('game').doc(game_id).collection('board').doc(board_id).set(new_piece)
...
If game_id doesn't exist in my database, the doc new_piece is still inserted, presumably under orphaned game and board collections (they show up italicized and slightly faded in the Firestore console).
This seems to be standard behavior. However, if I want to reject creation of documents if their parent path doesn't exist, what is the best way to do so?
One obvious way is to first check the existence of game and board collections. However, that adds additional latency and more .then blocks (I don't want to think in terms of monads here!). Is this the recommended way?
Is there a way I can simply instruct firestore to not create orphaned docs and collections in a given insert path and return an error (to be handled by the caller)?
No, it's not possible. If you want to know whether something exists, you will have to query for it first.
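For the "check first" approach, a minimal sketch (assuming the Admin SDK inside the onRequest function; the names come from the question, and the error handling is just illustrative):

const gameRef = db.collection('game').doc(game_id);
const gameSnap = await gameRef.get();
if (!gameSnap.exists) {
  // Reject the insert: the parent game document doesn't actually exist.
  res.status(404).send('No such game: ' + game_id);
  return;
}
await gameRef.collection('board').doc(board_id).set(new_piece);

If the game document could be deleted concurrently, the same check-then-write can be wrapped in a transaction.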

Using Firestore document's auto-generated ID versus using a custom ID

I'm currently deciding on my Firestore data structure.
I'll need a products collection, and the products items will live inside of it as documents.
Here are my product's fields:
uniqueKey: string
description: array of strings
images: array of objects
price: number
QUESTION
Should I use Firestore auto-generated IDs as the IDs of my documents, or is it better to use my uniqueKey (which I'll query for on many occasions) as the document ID? Is there a best option between the two?
I imagine that if I use my uniqueKey, it will make my life easier when retrieving a single document, but I'll have to query for more than 1 product on many occasions too.
Using my uniqueKey as ID:
db.collection("products").doc("myUniqueKey").get();
Using my Firestore auto-generated ID:
db.collection("products").where("uniqueKey", "==", "myUniqueKey").get();
Is this enough of a reason to go with my uniqueKey instead of the auto-generated one? Is there a rule of thumb here? What's the best practice in this case?
In terms of making queries from a client, using only the information you've given in the question, I don't see much practical difference between a document get using its known ID and a query on a field that is also unique. Either way, an index is used on the server side, and it costs exactly 1 document read. The document get() might be marginally faster, but it's not worthwhile to optimize like this (in my opinion).
When making decisions about data modeling like this, it's more important to think about things like system behavior under load and security rules.
If you're reading and writing a lot of documents whose IDs have a sequential property, you could run into hotspotting on those writes. So, if you want to use your own ID, and you expect to be reading and writing them in that sequence under heavy load, you could have a problem. If you don't anticipate this to be the situation, then it likely doesn't matter too much which ID you use.
If you are going to use security rules to limit access to documents, and you use the contents of other documents to help with that, you'll need to be able to uniquely identify those documents in your rules. You can't perform a query against a collection in rules, so you might need meaningful IDs that give direct access when used by rules. If your own IDs can easily be used this way in security rules, that might be more convenient overall. If you're forced to use Firestore's generated IDs, it might become inconvenient, difficult, or expensive to maintain a relationship between your IDs and Firestore's IDs.
In any event, the decision you're making is not just about which ID is "better" in a general sense, but which ID is better for your specific, anticipated situation, under load, with security in mind.
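For reference, the two write paths being compared look like this (a minimal sketch with the client JavaScript SDK; product stands in for an object holding the fields listed above):

// Using your own uniqueKey as the document ID:
db.collection('products').doc('myUniqueKey').set(product);
// Using an auto-generated ID and keeping uniqueKey as a queryable field:
db.collection('products').add({ uniqueKey: 'myUniqueKey', ...product });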

Managing Denormalized/Duplicated Data in Cloud Firestore

If you have decided to denormalize/duplicate your data in Firestore to optimize for reads, what patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
As an example, if I have a feature like a Pinterest Board where any user on the platform can pin my post to their own board, how would you go about keeping track of the duplicated data in many locations?
What about creating a relational-like table for each unique location where the data can exist, used to reconstruct the paths that require updating?
For example, creating a users_posts_boards collection that is firstly a collection of userIDs with a sub-collection of postIDs that finally has another sub-collection of boardIDs with a boardOwnerID. Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Also, if posts can additionally be shared to groups and lists, would you continue to make users_posts_groups and users_posts_lists collections and sub-collections to track duplicated data in the same way?
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
{
  postID: 'someID',
  locations: [ // <-- sub-collection
    "path/to/post/location1",
    "path/to/post/location2",
    ...
  ]
}
This would mean that you would basically need to have all writes to Firestore done through Cloud Functions that can keep track of this data for security reasons... unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
I'm basically looking for a sane way to track heavily denormalized data.
Edit: oh yeah, another great example would be the post author's profile information being embedded in every post. Imagine the hellscape of trying to keep all of that up to date as posts are shared across the platform and then a user updates their profile.
I'm answering this question because of your request from here.
When you are duplicating data, there is one thing you need to keep in mind: in the same way you add data, you need to maintain it. In other words, if you want to update/delete an object, you need to do it in every place where it exists.
What patterns (if any) are generally used to keep track of the duplicated data so that they can be updated correctly to avoid inconsistent data?
To keep track of all the operations we need to perform in order to keep the data consistent, we add them all to a batch. You can add one or more update operations on different references, as well as delete or add operations. For that, please see:
How to do a bulk update in Firestore
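As a hedged sketch of that pattern (Node.js Admin SDK; the paths and field names are hypothetical, based on the profile-in-every-post example from the question):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function updateDuplicatedAuthorName(authorId, newName, duplicateRefs) {
  const batch = db.batch();
  // Update the canonical profile document.
  batch.update(db.collection('users').doc(authorId), { name: newName });
  // Update every document that embeds a copy of the author's name.
  for (const ref of duplicateRefs) {
    batch.update(ref, { 'author.name': newName });
  }
  // All updates succeed or fail together (a batch holds up to 500 operations).
  await batch.commit();
}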
What about creating a relational-like table for each unique location that the data can exist that is used to reconstruct the paths that require updating.
In my opinion there is no need to add an extra "relational-like table", but if you feel comfortable with it, go ahead and use it.
Then you use those to reconstruct the paths of the duplicated data for a post (eg. /users/[boardOwnerID]/boards/[boardID]/posts/[postID])?
Yes, you need to pass to each document() method the corresponding document id in order to make the update operation work. Unfortunately, there are no wildcards in Cloud Firestore paths to documents. You have to identify the documents by their ids.
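A minimal sketch of that reconstruction (client JavaScript SDK, using the IDs tracked by the question's users_posts_boards idea):

const ref = db.collection('users').doc(boardOwnerID)
  .collection('boards').doc(boardID)
  .collection('posts').doc(postID);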
Alternatively, would you instead have a posts_denormalization_tracker that is just a collection of unique postIDs that includes a sub-collection of locations that the post has been duplicated to?
I don't consider that necessary either, since it requires extra read operations. Since everything in Firestore is about the number of reads and writes, I think you should reconsider this approach. Please see Firestore usage and limits.
unless Firestore security rules are sufficiently powerful to allow add operations to the /posts_denormalization_tracker/[postID]/locations sub-collection without allowing reads or updates to the sub-collection or the parent postIDs collection.
Firestore security rules are powerful enough to do that. You can allow or disallow reads and writes, and you can even apply rules to each operation (get, list, create, update, delete) individually.
I'm basically looking for a sane way to track heavily denormalized data.
The simplest way I can think of is to store each operation in a key-value data structure. Let's assume we have a map that looks like this:
import com.google.cloud.firestore.DocumentReference;
import java.util.HashMap;
import java.util.Map;

// Map each updated object to the reference of a document that duplicates it.
Map<Object, DocumentReference> map = new HashMap<>();
map.put(customObject1, reference1);
map.put(customObject2, reference2);
map.put(customObject3, reference3);
// And so on
Iterate through the map, add all those keys and values to a batch, commit the batch, and that's it.

Best way to update an entire document in marklogic

I would like to replace an XML document in a database without losing any metadata (e.g. permissions, properties, or collections). Managed documents (DLS) are not an option.
Using xdmp:document-insert() does not retain permissions, collections etc.
Using xdmp:node-replace() works well with parts of the document but requires knowing the root node in advance.
Is there a recommended way to update an entire document in MarkLogic?
You don't really need to know the root element itself. If you know the document URI, you can do something like:
xdmp:node-replace(fn:doc($uri)/*, $new-xml)
If you have any node of the document, you can also do:
xdmp:node-replace($node/fn:root(), $new-xml)
But just using xdmp:document-insert() isn't that much more difficult either:
xdmp:document-insert(
  $uri,
  $new-xml,
  xdmp:document-get-permissions($uri),
  xdmp:document-get-collections($uri),
  xdmp:document-get-quality($uri)
)
Note: document properties are preserved by xdmp:document-insert. See also: http://docs.marklogic.com/xdmp:document-insert
Additionally, there is not much performance difference between these methods. The biggest difference in that respect is that xdmp:node-replace() requires a node from the original document, meaning it has to be retrieved from the database first. If the replacement does not depend on the original document, then xdmp:document-insert() would be fastest.
HTH!
+1 to #grtjn. Note that the reason xdmp:node-replace is no more efficient than xdmp:document-insert is that all document updates rewrite the entire document. It is a common, understandable misconception that xdmp:node-replace operates like, say, an RDBMS field update, only 'touching' the affected field. In the RDBMS case that is often a mistaken assumption as well.
Similar to not needing to read the old document body: if you know what the permissions, collections, and quality should be, you can supply those (or defaults) rather than querying them with xdmp:document-get-permissions() etc. It may not make a measurable difference, but as with xdmp:node-replace(), if you don't need to query a value it's simpler not to, and it removes unneeded dependencies and error opportunities (such as: what if the document doesn't exist?).
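If you do know the values up front, here is a hedged sketch in MarkLogic Server-Side JavaScript (the role and collection names are made up; the XQuery equivalent is analogous):

declareUpdate();
xdmp.documentInsert(uri, newXml, {
  // Supply known metadata instead of querying it from the old document.
  permissions: [
    xdmp.permission('my-app-role', 'read'),
    xdmp.permission('my-app-role', 'update')
  ],
  collections: ['my-collection'],
  quality: 0
});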
