Let's say we have an order collection in Firestore where each order needs a unique, readable, random order number with, say, 8 digits:
{
orderNumber: '19456734'
}
So for every incoming order we want to generate this unique number. What is the recommended approach in Firestore to make sure no other document is already using it?
Note: one solution would be to query the existing docs before saving, but that presumably does not work in a concurrent scenario where multiple orders arrive at the same time.
The easiest way to guarantee that some value is unique in a collection is to use that value as the key/ID for the documents in that collection. Since keys/IDs are by definition unique within their collection, this implicitly enforces your requirement.
The only built-in way to generate unique IDs is by calling the add() method, which generates a random ID for the new document. If you don't want to use those auto-generated IDs to identify your orders, you'll have to roll your own mechanism.
The two most common approaches:
1. Generate a unique number and check if it's already taken. You'd do this in a transaction, of course, to ensure no two instances can claim the same ID (see the sketch after this list).
2. Keep a global counter (typically in a document at a well-known location) of the latest ID you've handed out, and then read-increment-write it in a transaction to get the ID for any new document. This is typically what other databases do for their built-in auto-ID fields.
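A minimal sketch of the first approach with the Node.js Admin SDK, assuming an orders collection and a newOrder payload (both names are made up here); on a collision you would simply retry with a freshly generated number:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Reserve a random 8-digit order number by using it as the document ID.
// If the number is already taken, the transaction throws and the caller
// can retry with a freshly generated number.
async function createOrder(newOrder) {
  const orderNumber = String(Math.floor(10000000 + Math.random() * 90000000));
  const orderRef = db.collection('orders').doc(orderNumber);

  await db.runTransaction(async (tx) => {
    const snapshot = await tx.get(orderRef);
    if (snapshot.exists) {
      throw new Error('Order number ' + orderNumber + ' is already taken');
    }
    tx.set(orderRef, { ...newOrder, orderNumber });
  });

  return orderNumber;
}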
Related
So I found this answer on how to create a unique field in Firebase: Firebase android : make username unique
But my question is: if I have multiple unique fields (in different collections), does that mean I have to create multiple collections like that usernames one, each holding one of my unique fields?
Here is an example. Say I have two collections, users and groups. In my users collection, I have an email field that must be unique. In my groups collection, I have an address field that must be unique. So does that mean (according to the above answer) I need to have these collections in my root:
users
uniqueUserEmails
groups
uniqueGroupAddresses
This seems horrible. Is this a big downside of NoSQL vs SQL? In SQL it would be so easy to just declare the field (column) UNIQUE when creating it.
If you need some value (or combination of values) to be unique, you need to create a node that contains that value (or combination) as its key. If you need to guarantee that multiple values (or combinations) are unique, you'll need multiple such nodes.
When you have a database that does support uniqueness constraints, it is pretty much doing the same thing behind the scenes. The only difference is that such a database does it automatically, whereas here you have to do it yourself.
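For example, in Firestore you could claim the email and create the user in one transaction, with the email itself as the document ID in uniqueUserEmails; this is just a sketch using the collection names from the question, and it assumes the email is a valid document ID:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Claim the email (as the document ID in uniqueUserEmails) and create the
// user atomically; if another user already claimed it, the transaction
// throws and the signup can be rejected.
async function createUser(userId, email, profile) {
  const emailRef = db.collection('uniqueUserEmails').doc(email);
  const userRef = db.collection('users').doc(userId);

  await db.runTransaction(async (tx) => {
    const claimed = await tx.get(emailRef);
    if (claimed.exists) {
      throw new Error('Email ' + email + ' is already in use');
    }
    tx.set(emailRef, { uid: userId });
    tx.set(userRef, { ...profile, email });
  });
}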
I need to create a document inside a transaction.
The transaction object doesn't have an add() function, probably because it would create multiple documents if Firestore retries the transaction.
The only way I can think of is to use a GUID as the document ID, but those make for very long index entries to look up.
Is there any way around this? Another approach?
I wouldn't be too concerned about the length of a document ID. I don't think you'll find that performance suffers if you use a GUID for an ID. If you're concerned about size, only you can compute how much of your storage is consumed by IDs.
If you want to limit the size of a random document ID, you can simply generate your own random data and convert it to a string that follows the rules for Firestore document IDs. It could be something as simple as generating some number of random letters and concatenating them, as sketched below.
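For illustration, a hypothetical helper that builds a shorter random ID from letters and digits and uses it with transaction.set(), which is how you create a document inside a transaction where add() is unavailable (the 12-character length and the orders collection name are arbitrary choices):

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Build a short random ID from letters and digits; the result follows the
// rules for Firestore document IDs. Note that calling .doc() with no
// argument also generates a client-side auto-ID.
function randomId(length = 12) {
  const chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  let id = '';
  for (let i = 0; i < length; i++) {
    id += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return id;
}

// Inside a transaction, set() on a pre-built reference takes the place of
// add(). The reference is created before the transaction, so retries reuse
// the same ID instead of creating duplicates.
async function createInTransaction(data) {
  const ref = db.collection('orders').doc(randomId());
  await db.runTransaction(async (tx) => {
    tx.set(ref, data);
  });
  return ref.id;
}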
In my Firestore database, I use the same collection name in different parts of my hierarchy. For example, imagine a Stack Overflow-like site with the following two collections:
/questions/{questionId}/votes/
/questions/{questionId}/answers/{answerId}/votes/
So now I want to create an index on one of these two collections. I would expect Firestore to require some kind of "path-with-wildcards" like I've used above to identify the data to be indexed. However, instead, it only requires the collection name: in this case, "votes".
So if I put an index on "votes", does it apply to both of these collections? Is there any way to put an index on one of these collections and not the other? Is it a best practice to use unique collection names to avoid this issue?
TL;DR:
Yes. Indexes are based on the collection id. This applies both to the ones we create automatically for you on single fields and to the composite indexes you create manually. If the collections are semantically different, we recommend you give them unique collection ids, so you could use question_votes and answer_votes.
More Info
Collection id is the identifier of the collection, excluding the full path. In your case, this is votes as you've noted.
The queries we currently serve use the subset of indexes for a specific path, although we have plans in the future to allow you to do a query that spans all collections with the same collection id (the collection group). This small bit of info adds some context as to why indexes work this way.
A second reason is that there is a 200 composite index limit in the system. If someone had a data model structured like /users/{user_id}/blog_posts/{post_id}, there would be no real way for them to create composite indexes on blog_posts for more than a handful of users (not to mention the operational burden of creating new indexes for every user!).
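To illustrate, a composite index definition in the Firebase CLI's firestore.indexes.json references only the collection id, never the full path; the field names below are made up, and the exact JSON shape may differ between CLI versions:

{
  "indexes": [
    {
      "collectionGroup": "question_votes",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "voterId", "order": "ASCENDING" },
        { "fieldPath": "createdAt", "order": "DESCENDING" }
      ]
    }
  ]
}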
We have an application that allows users to "follow" other users. When a user follows another, we register this as a document in DocumentDB, like this:
{
"followerId": "userUUID",
"artistId": "artistUserUUID"
}
We now want to get a list of artists, ordered by the number of followers they have. So I am looking for a way to ask the DB, based on these documents, to give me back an array of artistUserUUIDs, ordered by the number of followers each has registered (as expressed in documents like the example above).
Alternatively, we are also open to adding an array property to the artistUser document itself, though even in that scenario I am still unsure how to ORDER BY based on the count of a document's property (that property being an array of follower IDs).
I guess a workaround would be to add a stored procedure or trigger that updates a counter property within the artistUser document, but I'd like to verify whether there is a way to implement this counting feature natively, without such a trick.
Unless you denormalize the follower count into the artist user documents (as you suggest), you'll have to fetch every follower to accomplish your goal. Fetching every follower document may or may not be prohibitive depending upon how many there are. If you fetch them only into a stored procedure rather than your actual client, it's conceptually no less efficient than a SQL GROUP BY clause. Design your stored procedure to do the count and return only the table of artists and counts. A robust implementation would incrementally update your output table in pages and be able to restart where it left off after a stored procedure timeout. Look at my countDocuments example stored procedure in documentdb-mock, as well as my "Pattern for writing stored procedures" in the documentation for documentdb-utils, for how I typically accomplish this.
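A rough sketch of what such a counting stored procedure might look like (this is not the countDocuments example itself); it ignores the paging/continuation handling that a robust version needs, as described above:

// Counts follower documents per artistId and returns the map as the
// response body. Paging/continuation and restart handling are omitted
// for brevity, so this only works while the query fits in one page.
function countFollowersByArtist() {
  var collection = getContext().getCollection();
  var counts = {};

  var accepted = collection.queryDocuments(
    collection.getSelfLink(),
    'SELECT c.artistId FROM c WHERE IS_DEFINED(c.artistId)',
    {},
    function (err, docs) {
      if (err) throw err;
      for (var i = 0; i < docs.length; i++) {
        counts[docs[i].artistId] = (counts[docs[i].artistId] || 0) + 1;
      }
      getContext().getResponse().setBody(counts);
    }
  );

  if (!accepted) throw new Error('Query was not accepted; retry with paging.');
}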
I am using a Riak bucket to store a list of messages, using a UUID as the key and a JSON message as the value. This is working fine.
What I need is an efficient way to get a single message from the bucket without knowing its key, at least in one of these two scenarios:
Get the last inserted object (this is my preferred approach).
Get a random object from the bucket (if the first alternative is not possible).
Is there any efficient way to achieve that?
I think one alternative could be to retrieve all the keys in the bucket and then get the first one. But this means making two calls to Riak: one to obtain all the keys (just to discard all but one) and a second one to obtain the object. It does not seem very efficient.
As Riak is a key-value store, the by far most efficient way to retrieve data is through the keys. Listing or retrieving all keys in a bucket, even if you only end up using the one returned first, is one of the least efficient operations you can perform as it causes Riak to scan ALL keys in the system (not just the bucket), and it is usually recommended NEVER to use this on a production system.
The most efficient way to get the last inserted object would probably be to store its key in a separate, known record in a different bucket. This would require you to perform two writes on every insert and two reads for every read, but would do so in the most efficient way. You could possibly implement a post-commit hook (it would have to be in Erlang, as it is currently not possible to write records using JavaScript functions) on the bucket containing messages to get the system to perform the update for you, which would remove the need for that last write.
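Over Riak's HTTP API, that write path could look roughly like the following sketch; the messages and message_meta bucket names, the last_inserted key, and the localhost endpoint are all assumptions, and the post-commit hook variant would move the second write server-side:

const crypto = require('crypto');
const RIAK = 'http://localhost:8098';

// Two writes per insert: store the message under its UUID key, then update
// the well-known pointer record in a separate bucket.
async function storeMessage(message) {
  const key = crypto.randomUUID();

  await fetch(RIAK + '/buckets/messages/keys/' + key, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(message),
  });

  await fetch(RIAK + '/buckets/message_meta/keys/last_inserted', {
    method: 'PUT',
    headers: { 'Content-Type': 'text/plain' },
    body: key,
  });

  return key;
}

// Two reads per lookup, both by known keys: fetch the pointer, then fetch
// the message it references.
async function getLastMessage() {
  const res = await fetch(RIAK + '/buckets/message_meta/keys/last_inserted');
  const key = await res.text();
  const msg = await fetch(RIAK + '/buckets/messages/keys/' + key);
  return msg.json();
}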
If you write a lot of data to the bucket containing messages, you may want to configure the separate bucket so that it does not allow multiple values (siblings) and the last value wins. This way you reduce the risk of lots of siblings being created due to frequent updates to this single record from across the system. It would always give you one of the last written records, but not necessarily the very last one (especially if you frequently write messages to the database), as Riak does not support any type of atomicity and is an eventually consistent database.
You could also create one or more secondary indexes if you are using the LevelDB backend, and use these to limit your scan to only recent records, which would be more efficient than a scan of all keys. You could then select either the most recent key or a random one through MapReduce, but this would be much less efficient than the previously described approach.
I cannot think of any efficient way to retrieve a random record from a bucket in Riak unless you know the range of keys you have inserted and can decide randomly on the client which one to get. One way to do this would be to generate all keys in sequence rather than using UUIDs, but that is naturally not a good idea in a highly concurrent distributed system.
The first task is pretty easy to implement:
Add a post-commit hook that writes the last inserted key to some predefined key/bucket location.
Get the key from that predefined key/bucket and issue a get query using it.
It's still two operations, but both are plain gets, which are fast. There is some additional overhead from the hook, but nothing too heavy either.
The second scenario is also easy, but way too inefficient to be used in practice:
Get all keys (an extremely expensive operation).
Pick one at random.
Issue a get.
I have come across the same scenario. In my case I have to save users, and for that I needed an auto-incrementing ID. So what I did is place the last inserted key in a separate bucket, as mentioned by Christian Dahlqvist. Every time I want to insert a new record, I fetch the last inserted key from that key bucket. That bucket holds only one value, stored under the key "LastKey", which is always known to us. I increment the fetched key, use it for the new record, and then update the key bucket again, so the key bucket always contains the latest key.
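A rough sketch of that read-increment-write pattern over Riak's HTTP API (the users and user_keys bucket names are made up); note that without atomicity, concurrent inserts can still race on LastKey, as the earlier answer points out:

const RIAK = 'http://localhost:8098';

// Read-increment-write: fetch the last handed-out key, increment it, store
// the new record under the incremented key, and write the counter back.
// NOT atomic: concurrent callers can read the same LastKey and collide.
async function saveUserWithAutoId(user) {
  const res = await fetch(RIAK + '/buckets/user_keys/keys/LastKey');
  const lastKey = res.ok ? parseInt(await res.text(), 10) : 0;
  const newKey = String(lastKey + 1);

  await fetch(RIAK + '/buckets/users/keys/' + newKey, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(user),
  });

  await fetch(RIAK + '/buckets/user_keys/keys/LastKey', {
    method: 'PUT',
    headers: { 'Content-Type': 'text/plain' },
    body: newKey,
  });

  return newKey;
}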