Firebase unique field value

So I found this answer on how to create a unique field in Firebase: Firebase android : make username unique
But my question is: if I have multiple unique fields (in different collections), does that mean I have to create multiple usernames-style collections to hold all my unique fields?
Here is an example. Say I have two collections, users and groups. In users, I have an email field that must be unique. In groups, I have an address field that must be unique. Does that mean (according to the above answer) I need to have these collections in my root:
users
uniqueUserEmails
groups
uniqueGroupAddresses
This seems horrible. Is this a big downside of NoSQL vs. SQL? In SQL it would be so easy to just declare UNIQUE when creating the field (column).

If you need some value (or combination of values) to be unique, you need to create a node that contains that value (or combination) as its key. If you need to guarantee that multiple values (or combinations) are unique, you'll need multiple such nodes.
When you have a database that does support uniqueness constructs, it is pretty much doing the same behind the scenes. The only difference is that the database then does it automatically, where here you have to do it yourself.
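As an illustration, here's a minimal sketch of that pattern with the firebase-admin Realtime Database SDK. The uniqueUserEmails path and the key-encoding helper are assumptions for this example, not a fixed API:

import { getDatabase } from "firebase-admin/database";

// RTDB keys cannot contain ".", so encode the email first (hypothetical scheme).
const emailKey = (email: string) => email.replace(/\./g, ",");

// Atomically claim an email for a user; resolves to false if already taken.
async function claimEmail(email: string, uid: string): Promise<boolean> {
  const ref = getDatabase().ref(`uniqueUserEmails/${emailKey(email)}`);
  const result = await ref.transaction((current) =>
    current === null ? uid : undefined // returning undefined aborts the write
  );
  return result.committed;
}

Note that when a user later changes their email, you'd also need to release the old key.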

Related

DynamoDB: Duplicate a bunch of data to save a second lookup?

There are two typical uses for my database:
(1) A user accesses their record/item via their userID, and manages their info and a number of devices, each with a unique devID.
(2) A device connects and, using its devID, finds the owning userID, then takes action based on attributes in the user item.
There are two options I could use, each with a single DynamoDB table:
A. The table has items that are users and devices, with a partition key of ID and sort key of itemType. User items have associated attributes like addresses, account and profile info, etc. Devices have associated attributes like their preferences, their type, their capabilities.
You can access both users and devices really quickly. If you are doing (1), you look up and find a user, then you have to use a set attribute that lists the one or more deviceIDs they own, and make individual lookups for each device. That's 2 lookups for a user that owns one device, and more for multiple devices.
Or if you are doing (2), you search and find a device, grab its userID attribute, and then look up the userID item. That's 2 lookups.
B. I could reduce the multiple lookups this way:
Still one table, but all entries in the table are more homogeneous: every item includes all user-related attributes plus one device's attributes. The partition key is the userID, the sort key is the deviceID, and another indexed attribute is just the deviceID. If you are doing (1), then you look up the userID and get one or more records, depending on whether the user owns one device or more. If you are doing (2), then we quickly find the device, and that same item includes all the user info we need, so we don't need another lookup.
The problem with B is that I am duplicating a lot of data about the user in each of the items. Keeping them all synced is going to be problematic too, but updates are a lot rarer.
So, am I overthinking the lookup costs and should just go with the multiple lookups as in A, or are the multiple lookups going to be expensive enough that I need a better data design?
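To make option B concrete, here is a sketch of its two access patterns with the AWS SDK v3 Document Client; the table name, index name, and attribute names are placeholders (and it assumes an ES module, where top-level await is available):

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// (1) User logs in: one query returns every device row, and the duplicated
// user attributes ride along on each item, so no second lookup is needed.
const byUser = await ddb.send(new QueryCommand({
  TableName: "UsersDevices",
  KeyConditionExpression: "userID = :u",
  ExpressionAttributeValues: { ":u": "user-123" },
}));

// (2) Device connects: one query on a GSI keyed by devID returns the combined
// user-plus-device item in a single round trip.
const byDevice = await ddb.send(new QueryCommand({
  TableName: "UsersDevices",
  IndexName: "devID-index",
  KeyConditionExpression: "devID = :d",
  ExpressionAttributeValues: { ":d": "dev-456" },
}));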

How to achieve sorting by any attribute of an item in DynamoDB

I have a DynamoDB structure as following.
I have patients, with patient information stored in their documents.
I have claims, with claim information stored in their documents.
I have payments, with payment information stored in their documents.
Every claim belongs to a patient. A patient can have one or more claims.
Every payment belongs to a patient. A patient can have one or more payments.
I created only one DynamoDB table, since all of the AWS DynamoDB documentation indicates that using only one table, if possible, is the best solution. So I ended up with the following:
In this table, ID is the partition key and EntryType is the sort key. Every claim and payment holds its owner.
My access patterns are as following :
Listing all patients in the DB, with pagination, sorted by creation date.
Listing all claims in the DB, with pagination, sorted by creation date.
Listing all payments in the DB, with pagination, sorted by creation date.
Listing claims of a particular patient.
Listing payments of a particular patient.
I can achieve these with two global secondary indexes. I can list patients, claims, and payments sorted by creation date by using a GSI with EntryType as the partition key and CreationDate as the sort key. I can also list a particular patient's claims and payments by using another GSI with EntryType as the partition key and OwnerID as the sort key.
My problem is that this approach only gives me sorting by creation date. My patients and claims have many more attributes (around 25 each) and I need to sort by each of those as well. But Amazon DynamoDB limits every table to at most 20 GSIs, so I tried creating GSIs on the fly (dynamically, upon request), but that also ended up being very inefficient, since creating a GSI copies the items to another partition (as far as I know). So what is the best way to sort patients by their name, claims by their description, and so on for the other fields?
Sorting in DynamoDB happens only on the sort key. In your data model, your sort key is EntryType, which doesn't support any of the access patterns you've outlined.
You could create a secondary index on the fields you want to sort by (e.g. creationDate). However, that pattern can be limiting if you want to support sorting by many attributes.
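For instance, here is a hedged sketch of adding such an index with the AWS SDK v3; the table and index names are assumptions, and it presumes on-demand billing (a provisioned table would also need a ProvisionedThroughput block):

import { DynamoDBClient, UpdateTableCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Add a GSI partitioned by EntryType and sorted by CreationDate.
await client.send(new UpdateTableCommand({
  TableName: "PatientsTable",
  AttributeDefinitions: [
    { AttributeName: "EntryType", AttributeType: "S" },
    { AttributeName: "CreationDate", AttributeType: "S" },
  ],
  GlobalSecondaryIndexUpdates: [{
    Create: {
      IndexName: "EntryType-CreationDate-index",
      KeySchema: [
        { AttributeName: "EntryType", KeyType: "HASH" },
        { AttributeName: "CreationDate", KeyType: "RANGE" },
      ],
      Projection: { ProjectionType: "ALL" },
    },
  }],
}));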
I'm afraid there is no simple solution to your problem. While this is super simple in SQL, DynamoDB sorting just doesn't work that way. Instead, I'll suggest a few ideas that may help get you unstuck:
Client Side Sorting - Use DDB to efficiently query the data your application needs, and let the client worry about sorting it. For example, if your client is a web application, you could use JavaScript to sort the fields on the fly, depending on which field the user wants to sort by.
Consider using KSUIDs for your IDs - I noticed most of your access patterns involve sorting by CreationDate. The KSUID (K-Sortable Unique IDentifier) is a globally unique ID that is sortable by generation time. It's a great option when your application needs to create unique IDs and sort by a creation timestamp. If you build a KSUID into your sort keys, your query results can automatically support sorting by creation date (see the sketch after this list).
Reorganize Your Data - If you have the flexibility to redesign how you store your data, you could accommodate several of your access patterns with fewer secondary indexes (example below).
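Here is a minimal sketch of the KSUID idea, assuming the npm ksuid package; the CLAIM# prefix is just an illustrative key-naming convention:

import KSUID from "ksuid";

// A KSUID is a 27-character string whose lexicographic order matches its
// generation time, so items sorted by this key come back in creation order.
const id = KSUID.randomSync();
const sortKey = `CLAIM#${id.string}`; // e.g. "CLAIM#0ujsswThIGTUYm2K8FjOOfXtY1K"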
Finally, I notice that your table example is very "flat" and doesn't appear to model the relationships in a way that supports any of your access patterns (without adding indexes). Perhaps it's just an example data set to highlight your question about sorting, but I wanted to show a different way to model your data in case you are unfamiliar with these patterns.
For example, consider your access patterns that require you to fetch a patient's claims and payments, sorted by creation date. Here's one way that could be modeled:
This design handles four access patterns:
get patient claims, sorted by date created.
get patient payments, sorted by date created.
get patient info (names, etc...)
get patient claims, payments and info (in a single query).
The queries would look like this (in pseudocode):
query where PK = "PATIENT#UUID1" and SK < "PATIENT#UUID1"
query where PK = "PATIENT#UUID1" and SK > "PATIENT#UUID1"
query where PK = "PATIENT#UUID1" and SK = "PATIENT#UUID1"
query where PK = "PATIENT#UUID1"
These queries take advantage of the sort keys being lexicographically sorted. When you ask DDB to fetch the PATIENT#UUID1 partition with a sort key less than "PATIENT#UUID1", it returns only the CLAIM items, because CLAIM comes before PATIENT when sorted alphabetically. The same pattern is how I access the PAYMENT items for the given patient, since PAYMENT sorts after PATIENT. I've used KSUIDs in this scenario, which gives you the added feature of having the CLAIM and PAYMENT items sorted by creation date!
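As a concrete illustration, here is what the first query (claims only) might look like with the AWS SDK v3 Document Client; the table name is an assumption, and the key names follow the pseudocode above:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Fetch only the CLAIM items: "CLAIM#..." sorts before "PATIENT#...", so a
// "less than" condition on the sort key returns just the claims.
const claims = await ddb.send(new QueryCommand({
  TableName: "PatientsTable",
  KeyConditionExpression: "PK = :pk AND SK < :sk",
  ExpressionAttributeValues: {
    ":pk": "PATIENT#UUID1",
    ":sk": "PATIENT#UUID1",
  },
}));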
While this pattern may not solve all of your sorting problems, I hope it gives you some ideas of how you can model your data to support a variety of access patterns with sorting functionality as a side effect.

How to generate and guarantee unique values in firestore collection?

Let's say we have an orders collection in Firestore where each order needs a unique, readable, random order number of, say, 8 digits:
{
orderNumber: '19456734'
}
So for every incoming order we want to generate this unique number. What is the recommended approach in Firestore to make sure no other document is using it?
Note: one solution would be querying existing docs before saving, but this doesn't work in a concurrent scenario where multiple orders arrive at the same time.
The easiest way to guarantee that some value is unique in a collection is to use that value as the key/ID for the documents in that collection. Since keys/IDs are by definition unique within their collection, this implicitly enforces your requirement.
The only built-in way to generate unique IDs is by calling the add() method, which generates a random ID for the new document. If you don't want to use those auto-generated IDs for your orders, you'll have to roll your own mechanism.
The two most common approaches:
Generate a unique number and check if it's already taken. You'd do this in a transaction of course, to ensure no two instances can claim the same ID.
Keep a global counter (typically in a document at a well-known location) of the latest ID you've handed out, and then read-increment-write that in a transaction to get the ID for any new document. This is typically what other databases do for their built-in auto-ID fields.
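As a sketch of the first approach, assuming the firebase-admin SDK and hypothetical orders and orderNumbers collection names:

import { getFirestore } from "firebase-admin/firestore";

const db = getFirestore();

// Try to claim a random 8-digit number; retry a few times on collision.
async function createOrder(orderData: object): Promise<string> {
  for (let attempt = 0; attempt < 5; attempt++) {
    const candidate = String(Math.floor(10000000 + Math.random() * 90000000));
    const claimRef = db.collection("orderNumbers").doc(candidate);
    const claimed = await db.runTransaction(async (tx) => {
      const snap = await tx.get(claimRef);
      if (snap.exists) return false; // number taken; retry with a new one
      tx.set(claimRef, { claimedAt: new Date() });
      tx.set(db.collection("orders").doc(), { ...orderData, orderNumber: candidate });
      return true;
    });
    if (claimed) return candidate;
  }
  throw new Error("could not find a free order number");
}

Because the claim document's ID is the order number itself, two concurrent transactions can never both claim the same number.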

How to choose a good value set for CosmosDB id field?

According to the docs, the id property is special in Azure Cosmos DB documents, as it must always be set and have a unique value per partition. It also has additional restrictions on its content:
The following characters are restricted and cannot be used in the Id
property: '/', '\', '?', '#'
Obviously, this field is one of the document "keys" (in addition to _rid) and is used somehow in internal plumbing. Other than the restrictions above, it is unclear how exactly this key is used internally and, more importantly for practitioners, which values make technically better ids than others.
Wild guess 1: for example, in some DB worlds one would prefer short primary key values, since the PK is included in index entries, and shorter keys allow a more compact index for storage and lookup. Would the id field's length matter at all, besides the one-time storage cost?
Wild guess 2: in some systems, better throughput is achieved if common prefixes are avoided in names (e.g. Azure Storage container/blob names), and it's even suggested to add a small random hash as a prefix. Does Cosmos DB care about id prefix similarities?
Anything else one should consider?
EDIT: Clarification: I'm interested in what's good for the Cosmos DB server storage/execution side, given that my data model is still in design and/or has multiple candidate keys the designer can choose from.
First and foremost, let's clear something up: the id property is NOT unique. Your collection can have multiple documents with the exact same id. The id is ONLY unique within its own logical partition.
That said, based on all the compiled info we have from documentation and talks, it doesn't really matter what value you choose. It is a string and Cosmos DB will treat it as such, but it is also considered a "primary key" internally, so restrictions apply, such as ordering by it.
Where it does matter is in your consuming application's business logic. The id plays the double role of being both a Cosmos DB property and your property: you get to set it. This is the value you will use to make direct reads from the database. If you use any other value, it's no longer a read but a query, which makes it more expensive and slower.
A good value to set is the id of the entity that is hosted in this collection. That way you can use the entity's id to read quickly and efficiently.
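For illustration, a hedged sketch of the read-vs-query difference with the @azure/cosmos SDK; the database, container, and partition key values are placeholders:

import { CosmosClient } from "@azure/cosmos";

const container = new CosmosClient(process.env.COSMOS_CONNECTION_STRING!)
  .database("shop")
  .container("orders");

// Point read by id + partition key: the cheapest, fastest access path.
const { resource: order } = await container.item("order-123", "customer-42").read();

// Looking up the same document by any other field is a query: slower, more RUs.
const { resources } = await container.items
  .query({
    query: "SELECT * FROM c WHERE c.orderNumber = @n",
    parameters: [{ name: "@n", value: "19456734" }],
  })
  .fetchAll();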

Indexes on firestore collections with the same name at different "paths"

In my Firestore database, I use the same collection name in different parts of my hierarchy. For example, imagine a Stack Overflow-like site with the following two collections:
/questions/{questionId}/votes/
/questions/{questionId}/answers/{answerId}/votes/
So now I want to create an index on one of these two collections. I would expect Firestore to require some kind of "path-with-wildcards" like I've used above to identify the data to be indexed. Instead, however, it only requires the collection name: in this case, "votes".
So if I put an index on "votes", does it apply to both of these collections? Is there any way to put an index on one of these collections and not the other? Is it a best practice to use unique collection names to avoid this issue?
TL;DR:
Yes. Indexes are based on the collection id. This applies both to the ones we create automatically for you on single fields and to the composite indexes you create manually. If they are semantically different indexes, we recommend you give the collections unique ids; for example, question_votes and answer_votes.
More Info
Collection id is the identifier of the collection, excluding the full path. In your case, this is votes as you've noted.
The queries we currently serve use the subset of indexes for a specific path, although we have plans to eventually let you run a query that spans all collections with the same collection id (the collection group). That small bit of info adds some context as to why.
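For reference, collection group queries did later ship; a minimal sketch with firebase-admin (the value field is hypothetical, and such a query needs its own collection-group-scoped index):

import { getFirestore } from "firebase-admin/firestore";

// Queries every collection named "votes", at any depth in the hierarchy.
const allVotes = await getFirestore()
  .collectionGroup("votes")
  .where("value", ">", 0)
  .get();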
A second reason is that there is a 200-composite-index limit in the system, so if someone had a data model structured like /users/{user_id}/blog_posts/{post_id}, there would be no real way for them to create composite indexes on blog_posts for more than a handful of users (not to mention the operational burden of creating new indexes for every user!).
