Firebase Realtime Database schema design

I have two sets of entities in my Firebase Realtime Database schema, called orders and customers.
So far I was not actually relating them in my app, only showing them as related. The current schema looks like this:
{
"orders" : [
{"id" : 1, "name": "abc", "price": 200, "customer": "vik"}
],
"customers" : [
{"cust_id" : "10", "name" : "vik", "type": "existing"}
]
}
So I have an order list page that shows all the orders in a table, which I get by requesting /orders.json.
But in practice, instead of having the customer name directly in the orders, I should have a cust_id attribute, as that is the key.
That naturally makes it a standard relational schema where I am free to change customer attributes without worrying about mismatches in orders.
However, the downside I see right away is that if I have, say, 20 orders to show in the order list table, then instead of 1 I will end up firing 21 REST calls (1 to get the order list and 20 to fetch the customer name for each order).
What are the recommendations or standards around this?

Firebase is a NoSQL database. So the rules of normalization that you know from relational databases don't necessarily apply.
For example: having the customer name in each order is actually quite normal. It saves having to do a client-side join for each customer record, significantly simplifying the code and improving the speed of the operation. But of course it comes at the cost of having to store data multiple times (quite normal in NoSQL databases), and having to consider if/how you update the duplicated data in case of updates of the customer record.
I recommend reading NoSQL data modeling, watching Firebase for SQL developers, and reading my answer on keeping denormalized data up to date.
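If the customer name stays duplicated in each order, a rename has to reach every copy. In the Realtime Database this can be done with one multi-location update, where each key is a path and each value is the new data. Below is a minimal sketch of building such an update. It assumes orders and customers are keyed by ID and that each order stores both a `cust_id` and a duplicated `customer_name`; those field names and the helper `build_rename_update` are hypothetical, not part of the schema in the question.

```python
def build_rename_update(orders, cust_id, new_name):
    """Return a path -> value map that renames a customer everywhere.

    `orders` is a dict of order_id -> order. Each order is assumed to
    store `cust_id` plus a duplicated `customer_name` (hypothetical fields).
    """
    # The customer record itself.
    update = {f"customers/{cust_id}/name": new_name}
    # Every order that carries a copy of the name.
    for order_id, order in orders.items():
        if order.get("cust_id") == cust_id:
            update[f"orders/{order_id}/customer_name"] = new_name
    return update


orders = {
    "1": {"name": "abc", "price": 200, "cust_id": "10", "customer_name": "vik"},
    "2": {"name": "def", "price": 50, "cust_id": "11", "customer_name": "sam"},
}
update = build_rename_update(orders, "10", "vikram")
```

The resulting map would be sent as a single update on the database root, so the customer record and its copies change together. The order list page still needs only the one /orders.json request.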

Related

How to filter list value in DynamoDB

I am new to AWS DynamoDB and NoSQL.
I am lost on how to filter on a list value in DynamoDB.
Let's say that I have 2 items in this table.
[
{
"id": 1,
"title": "Robots in Music",
"topics": ["Robots", "Violin"]
},
{
"id": 2,
"title": "Where are good places to see stars",
"topics": ["Robots", "Stars"]
}
]
I want to filter using the topics attribute.
For example:
1. User wants to get the items with topics containing "Robots".
2. User gets the items with id 1 and 2.
3. User wants to get the items with topics containing "Stars".
4. User gets the item with id 2.
I searched the internet and found that I can use 'QueryFilter' with 'contains'.
However, I know that 'contains' scans the whole table, and DynamoDB returns at most 1 MB of data per request. That means the action needs to be repeated, and it would cost far more than using a single index.
Is there any way to use a GSI and filter the list effectively?
Unfortunately, you cannot index a list type or any other type of nested attribute, so your use case would require you to Scan the entire table to find which items contain a particular topic.
1. Would require a Scan.
2. A GetItem if the user wants just id 1, or a BatchGetItem if the user wants both id 1 and id 2.
3. Same as 1.
4. GetItem.
If your use case requires searching nested attributes, you can consider using a relational database or something more flexible like OpenSearch.
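To see why a Scan with a `contains` filter is costly, it helps to remember that the filter is applied after each page of items has been read, so every item in the table is read even when few match. The sketch below is an in-memory model of that behaviour, not the DynamoDB API. The function `scan_with_filter` and the `page_size` parameter (standing in for the 1 MB page limit) are assumptions for illustration.

```python
def scan_with_filter(table, topic, page_size):
    """Simulate a paginated Scan with a `contains` filter on `topics`.

    Every item on every page is read before the filter is applied;
    only the matching items are returned to the caller.
    """
    matches, items_read, pages = [], 0, 0
    for start in range(0, len(table), page_size):
        page = table[start:start + page_size]
        pages += 1
        items_read += len(page)  # read cost is paid for the whole page
        matches.extend(item for item in page if topic in item["topics"])
    return matches, items_read, pages


table = [
    {"id": 1, "title": "Robots in Music", "topics": ["Robots", "Violin"]},
    {"id": 2, "title": "Where are good places to see stars", "topics": ["Robots", "Stars"]},
]
matches, items_read, pages = scan_with_filter(table, "Stars", page_size=1)
```

Here one item matches, but both items are read across two pages. On a real table the same ratio applies at a much larger scale.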

Organizing a Cloud Firestore database

I can't work out the best way to organize the database for my app:
My users can create items identified by a unique ID.
The queries I need :
- Query 1: Get all the items created by a user
- Query 2 : From the UID of an item, get its creator
My database is organized as follows:
Users database
user1 : {
item1_uid,
item2_uid
},
user2 : {
item3_uid
}
Items database
item1_uid : {
title,
description
},
item2_uid : {
title,
description
},
item3_uid : {
title,
description
}
For query 1 it's quite simple, but for query 2 I need to parse the whole users database and list all the item IDs to see if the one I am looking for is there. It works right now, but I'm afraid it will slow down requests as the database grows.
Should I add in the items data a row with the user id ? If yes the query will be simpler but I heard that I am not supposed to have twice the same data in the database because it can lead to conflicts when adding or removing items.
Should I add in the items data a row with the user id ?
Yes, this is a very common approach in the NoSQL world and is called denormalization. Denormalization is described, in this "famous" post about NoSQL data modeling, as "copying of the same data into multiple documents in order to simplify/optimize query processing or to fit the user’s data into a particular data model". In other words, the main driver of your data model design is the queries you plan to execute.
More concretely, you could have an extra field in your item documents that contains the ID of the creator. You could even have another one with, e.g., the name of the creator. This way, in one query, you can display the items and their creators.
Now, for maintaining these different documents in sync (for example, if you change the name of one user, you want it to be updated in the corresponding items), you can either use a Batched Write to modify several documents in one atomic operation, or rely on one or more Cloud Functions that would detect the changes of the user documents and reflect them in the item documents.
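As a rough illustration of the denormalized layout, the sketch below keeps the creator ID on each item so that both queries become direct lookups. It uses plain in-memory dicts rather than the Firestore SDK. The field name `creator_id` and the helper names are hypothetical. In Firestore, the two writes inside `add_item` would go into one batched write so that they succeed or fail together.

```python
def add_item(users, items, user_id, item_id, title, description):
    """Write the item (with its creator) and the user's item list together."""
    items[item_id] = {
        "title": title,
        "description": description,
        "creator_id": user_id,  # duplicated link back to the user
    }
    users.setdefault(user_id, []).append(item_id)


def items_of(users, user_id):
    """Query 1: all item IDs created by a user."""
    return users.get(user_id, [])


def creator_of(items, item_id):
    """Query 2: direct lookup on the item, no pass over all users."""
    return items[item_id]["creator_id"]


users, items = {}, {}
add_item(users, items, "user1", "item1_uid", "First", "...")
add_item(users, items, "user1", "item2_uid", "Second", "...")
add_item(users, items, "user2", "item3_uid", "Third", "...")
```

Routing every write through one function such as `add_item` is what keeps the two copies of the relationship from drifting apart.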

Using multiple consumers with CosmosDB change feed

I am trying to use cosmos db change feed (I'm referring to https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor and https://github.com/Azure/azure-cosmos-dotnet-v2/tree/master/samples/code-samples/ChangeFeedProcessorV2).
When I start multiple instances of a consumer, the observer seems to see only 1 partition key range. I only see a message - Observer opened for partition Key Range 0 and it starts receiving the change feed. So, the feed is received by only 1 consumer at any given point. If I close one consumer, the next one picks up happily.
I can't seem to understand the partition keys / ranges in cosmos db. In cosmos db, I've created a database and a collection within it. I've defined a partition key - /myId. I store a unique guid in myId. I've saved about 10000 transactions in the collection.
When I look at partition key ranges using the API (/dbs/db-name/colls/coll-name/pkranges), I see only one node under PartitionKeyRanges. Below is the output I see:
{
"_rid": "LEAgAL7tmKM=",
"PartitionKeyRanges": [
{
"_rid": "LEAgAL7tmKMCAAAAAAAAUA==",
"id": "0",
"_etag": "\"00007d00-0000-0000-0000-5c3645e70000\"",
"minInclusive": "",
"maxExclusive": "FF",
"ridPrefix": 0,
"_self": "dbs/LAEgAA==/colls/LEAgAL7tmKM=/pkranges/LEAgAL7tmKMCAAAAAAAAUA==/",
"throughputFraction": 1,
"status": "online",
"parents": [],
"_ts": 1547060711
}
],
"_count": 1
}
Shouldn't this show more partition key ranges? Is this behavior expected?
How do I get multiple consumers to receive data as shown under https://learn.microsoft.com/en-us/azure/cosmos-db/change-feed-processor?
TL;DR - you should be able to ignore partition key ranges and the number of them you have and just let Change Feed Processor manage that for you.
Partition key ranges are an implementation detail we currently leak. The short answer is we add new partition key ranges when we want to restructure how your data is stored in the backend. This can happen for lots of reasons, like you add more data, you consume a lot of RUs for a subsection of that data, or we just want to shuffle things around. Theoretically, if you kept adding data, we'd eventually split the range in two.
We're working on some updates for the v3 SDKs that are currently in preview to abstract this a bit further, since even the answer I have given above is pretty hand wavey and we should have a more easily understood contract for public APIs.
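The behaviour observed in the question is consistent with how the processor shares work: there is one lease per partition key range, and a lease has a single owner at a time. The sketch below is a toy model of that idea, not the SDK's actual balancing code. The round-robin rule, the function `assign_leases` and the consumer names are assumptions for illustration.

```python
def assign_leases(partition_key_ranges, instances):
    """Spread leases (one per partition key range) over consumer instances.

    Each lease has exactly one owner, so instances beyond the number
    of leases end up with nothing to process.
    """
    assignment = {name: [] for name in instances}
    for i, pk_range in enumerate(partition_key_ranges):
        owner = instances[i % len(instances)]
        assignment[owner].append(pk_range)
    return assignment


consumers = ["consumer-a", "consumer-b", "consumer-c"]
one_range = assign_leases(["0"], consumers)
after_split = assign_leases(["1", "2"], consumers)
```

With the single range "0" from the output above, only one consumer owns a lease and the others sit idle until it stops. If the range were split in two, a second consumer would pick up work.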

Cloud Firestore and data modeling: From RDBMS to No-SQL

I am building an iOS app that is using Cloud Firestore (not Firebase realtime database) as a backend/database.
Google is trying to push new projects towards Cloud Firestore, and to be honest, developers with new projects should opt-in for Firestore (better querying, easier to scale, etc..).
My issue is the same that any relational database developer has when switching to a no-SQL database: data modeling
I have a very simple scenario, that I will first explain how I would configure it using MySQL:
I want to show a list of posts in a table view, and when the user clicks on a post, expand it and show more details for that post (let's say the user who wrote it). Sounds easy.
In a relational database world, I would create 2 tables: one named "posts" and one named "users". Inside the "posts" table I would have a foreign key indicating the user. Problem solved.
Poor Barry, never had the time to write a post :(
Using this approach, I can easily achieve what I described, and also, if a user updates his/her details, you will only have to change it in one place and you are done.
Let's now switch to Firestore. I like to think of an RDBMS's table names as Firestore's collections and the content/structure of the table as the documents.
In my mind I have 2 possible solutions:
Solution 1:
Follow the same logic as the RDBMS: inside the posts collection, each document should have a key named "userId" and the value should be the documentId of that user. Then by fetching the posts you will know the user. Querying the database a second time will fetch all user related details.
Solution 2:
Data duplication: Each post should have a map (nested object) with a key named "user" and containing any user values you want. By doing this the user data will be attached to every post it writes.
Coming from the normalization realm of RDBMS this sounds scary, but a lot of no-SQL documents encourage duplication(?).
Is this a valid approach?
What happens when a user needs to update his/her email address? How easily can you make sure that the email is updated in all places?
The only benefit I see in the second solution is that you can fetch both post and user data in one call.
Is there any other solution for this simple yet very common scenario?
ps: go easy on me, first time no-sql dev.
Thanks in advance.
Use solution 1. Guidance on nesting vs not nesting will depend on the N-to-M relationship of those entities (for example, is it 1 to many, many to many?).
If you believe you will never access an entity without accessing its 'parent', nesting may be appropriate. In Firestore (or other document-based NoSQL databases), you should decide whether to nest that entity directly in the document or in a subcollection based on the expected size of that nested entity. For example, messages in a chat should be a subcollection, as in total they may exceed the maximum document size.
Mongo, a leading noSQL db, provides some guides here
Firestore also provided docs
Hope this helps
@christostsang I would suggest a combination of option 1 and option 2. I like to duplicate data for the view layer and reference the user_id as you suggested.
For example, you will usually show a post and the created_by or author_name with the post. Rather than having to pay additional money and cycles for the user query, you could store both the user_id and the user_name in the document.
A model you could use would be an object/map in Firestore. Here is an example model for you to consider:
posts = {
id: xxx,
title: xxx,
body: xxx,
likes: 4,
user: {refId: xxx123, name: "John Doe"}
}
users = {
id: xxx,
name: xxx,
email: xxx,
}
Now when you retrieve the posts document(s) you also have the user/author name included. This makes things easy on a postList page, where you might show posts from many different users/authors without needing to query each user to retrieve their name. When a user clicks on a post and you want to show additional user/author information, like their email, you can perform the query for that one user on the postView page. FYI - you will need to consider changes that users make to their name, and whether you will update all posts to reflect the name change.
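The two pages described above can be sketched with in-memory dicts standing in for the `posts` and `users` collections. The sample data, email addresses and function names are made up for illustration, and a real app would replace the dict lookups with Firestore reads.

```python
posts = {
    "p1": {"title": "Hello", "user": {"refId": "u1", "name": "John Doe"}},
    "p2": {"title": "World", "user": {"refId": "u2", "name": "Jane Roe"}},
}
users = {
    "u1": {"name": "John Doe", "email": "john@example.com"},
    "u2": {"name": "Jane Roe", "email": "jane@example.com"},
}


def post_list_rows(posts):
    """postList page: title and author come from the post documents alone."""
    return [(p["title"], p["user"]["name"]) for p in posts.values()]


def post_view(posts, users, post_id):
    """postView page: one extra user lookup for details such as the email."""
    post = posts[post_id]
    author = users[post["user"]["refId"]]
    return {"title": post["title"], "author": author["name"], "email": author["email"]}


rows = post_list_rows(posts)
detail = post_view(posts, users, "p2")
```

The list page never touches `users`. Only the detail page pays for a second read, and only for the one author being viewed.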

Normalized many-to-many schema for client-side data store

I am developing the browser front end of a social network application. It has lots of relational data, having one-to-many (1:m) and mostly many-to-many (m:m) relationships as in below list.
I want to use Flux data flow architecture in the application. I am using Vuex.js with Vue.js.
As expressed in the Redux.js docs it is better to have flat, normalized, store state shape for various reasons for usage with React, and I think that is the case for usage with Vue.js also.
posts have categories (m:m)
posts have tags (m:m)
post has comments (1:m)
posts have hashtags in them (m:m) // or users creates hashtags
posts have mentions in them (m:m) // or users creates mentions of users
users like posts (m:m)
users follow users, posts, post categories etc. (m:m)
users favorite posts (m:m)
etc.
I will need to show post feeds with all the related data from other entities, such as users, comments, categories, and tags. For this, treating each relation like a 1:many relation and holding the "many" side's data in the "one" side (the parent), even when the relation is actually many-to-many, seems OK for the usual queries that compose the parent, that is, posts. However, I will also need to query the store state in the inverse direction, for example getting the posts with a certain category or tag.
In that case, it is not as easy as it is for posts. I need a relation entity that holds the ID pairs for the two connected data entities, just like a join table or association table in an RDBMS, for ease of access and updating, to avoid digging deep into state, and to avoid unnecessary re-renders (that requirement is specific to React or Vue.js and the GUI).
How can I achieve this relatively easily and effectively, e.g. as one does for 1:many relations?
Pursuant to your last comment, I'll present the data structure I currently use for this circumstance:
Tag
{
"type": "tag",
"id": "tag1",
"name": "Tag One"
}
Tag To Post
{
"id": "someId",
"type": "postTag",
"tagId": "tag1",
"postId": "post1"
}
Post
{
"id": "post1",
"name": "Post 1"
}
I found that storing the relationship IDs on each side of an M:M relationship can produce orphans. Managing these IDs in two places duplicates steps and increases cognitive load, as every function managing the M:M relationship has to act in two places rather than one. Additionally, the relationship itself may need to contain data; where would that data go?
M:M Without Entity
{
"id": "post1",
"name": "Post 1",
"tagIds": [
{"id": "tag1", "extraAttribute": false} //this is awkward
]
}
Tag To Post - Additional Attributes
{
"id": "someId",
"extraAttribute": false,
"postId": "post1",
"type": "postTag",
"tagId": "tag1"
}
There are additional options available to speed up extracting tags with minor elbow grease.
Post
{
"id": "post1",
"name": "Post 1",
"tagIds" : ["tag1", "tag4"]
}
Hypothetically, a post would not have more than 20 tags, which makes this a generally negligible storage cost in exchange for fewer lookups. I have found no urgent need for this currently with a database of 10000 relationships.
Ease of access and updating
1:M is an object directly pointing at what it wants. M:M is two different entities pointing at their relationship. Model that relationship, and centralise the logic.
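Centralising the logic around the relationship entity could look roughly like the sketch below. It is written in plain Python rather than as a Vuex module. The class name `PostTagStore`, its methods and the deterministic `postId:tagId` relation ID are assumptions for illustration; the structure above uses an opaque ID such as "someId".

```python
class PostTagStore:
    """Holds postTag relation entities and answers lookups in both directions."""

    def __init__(self):
        # relation id -> {"id", "type", "postId", "tagId", ...extra attributes}
        self.relations = {}

    def link(self, post_id, tag_id, **attrs):
        rel_id = f"{post_id}:{tag_id}"  # deterministic id prevents duplicate links
        self.relations[rel_id] = {
            "id": rel_id, "type": "postTag",
            "postId": post_id, "tagId": tag_id, **attrs,
        }
        return rel_id

    def unlink(self, post_id, tag_id):
        # One place to remove the link, so neither side is left orphaned.
        self.relations.pop(f"{post_id}:{tag_id}", None)

    def tags_of(self, post_id):
        return sorted(r["tagId"] for r in self.relations.values() if r["postId"] == post_id)

    def posts_with(self, tag_id):
        return sorted(r["postId"] for r in self.relations.values() if r["tagId"] == tag_id)


store = PostTagStore()
store.link("post1", "tag1", extraAttribute=False)
store.link("post1", "tag4")
store.link("post2", "tag1")
store.unlink("post1", "tag4")
```

Both directions of the query read from the same set of relation entities, and extra attributes live on the relation rather than on the post or the tag.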
Rerenders
If your application renders long lists of data (hundreds or thousands
of rows), we recommend using a technique known as “windowing”. This
technique only renders a small subset of your rows at any given time,
and can dramatically reduce the time it takes to re-render the
components as well as the number of DOM nodes created.
https://reactjs.org/docs/optimizing-performance.html#virtualize-long-lists
I feel solutions may be use case specific and subject to broader opinions. I ran into this issue utilising couch/pouch with Vuex and a very large table of 20,000 entries. Again, in production the issues were not extremely noticeable. Results will always vary.
A couple things I try here:
Load partial data sets: in-file (non-reactive) vs in memory (loaded in Vuex)
Sort, paginate, search in-file and load results
