Let's assume a bank and address example:
class Address:
    id: string
    street: string
    city: string
class BankBranch:
    id: string
    name: string
    employee_count: number
    address: object
Address is its own table because other services should be able to use instances of this Address table. For example each BankBranch has an address. Now when I store a BankBranch instance in DBB I'd imagine it will look like this:
{
    name: "BranchXYZ",
    id: "123456",
    employee_count: 5,
    address: {
        street: "123 main st",
        city: "maincity",
        id: "456789"
    }
}
and the address instance looks like this:
{
    street: "123 main st",
    city: "maincity",
    id: "456789",
}
What's the best procedure to keep the data consistent? If I update the address instance, do I issue another call to the BankBranch and update it there as well? So one update operation is actually 2 database operations?
Or, instead of saving the entire address object, do I save just the id (like a foreign key), and then whenever I issue a get request for the BankBranch instance, issue another request for the address instance? That sounds a bit expensive.
You’re asking if you should denormalize or not. That depends on many factors.
First, are you doing this at a scale where it actually matters? You say it’s expensive. At the highest price, EC lookups cost about 12.5c per million. How many secondary lookups would you really be doing?
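To put that price in perspective, here is a rough back-of-the-envelope sketch. The only input taken from this answer is the 12.5c per million figure; the function name and the sample volumes are made up for illustration.

```python
# Price quoted above: 12.5 cents (0.125 USD) per million lookups.
PRICE_USD_PER_MILLION_LOOKUPS = 0.125


def extra_lookup_cost_usd(lookups: int) -> float:
    """Cost in USD of `lookups` secondary reads at the quoted price."""
    return lookups / 1_000_000 * PRICE_USD_PER_MILLION_LOOKUPS


if __name__ == "__main__":
    # Sample volumes are arbitrary; plug in your own estimate.
    for n in (100_000, 10_000_000, 1_000_000_000):
        print(f"{n:>13,} lookups -> {extra_lookup_cost_usd(n):,.4f} USD")
```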
Meanwhile you haven’t explained in depth why the address has to be in another table. Seems like it’s an intrinsic part of the branch. You’re really going to have other services referring to the address by some random ID number unrelated to the branch? Can’t they refer to it by branch ID and pull out the address?
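For comparison, the two options from the question can be sketched with plain dicts standing in for the two tables. This is only an illustration of where the extra write or the extra read shows up; the dict names and helper functions are hypothetical and no real database client is involved.

```python
# In-memory stand-ins for the two tables; a real store would replace these dicts.
addresses = {"456789": {"id": "456789", "street": "123 main st", "city": "maincity"}}

# Option 1: the branch embeds a copy of the address (denormalized).
branches_embedded = {
    "123456": {"id": "123456", "name": "BranchXYZ", "employee_count": 5,
               "address": dict(addresses["456789"])},
}

# Option 2: the branch stores only the address id (normalized).
branches_by_ref = {
    "123456": {"id": "123456", "name": "BranchXYZ", "employee_count": 5,
               "address_id": "456789"},
}


def update_address_embedded(address_id, **changes):
    """One logical update, several writes: the address plus every embedded copy."""
    addresses[address_id].update(changes)
    writes = 1
    for branch in branches_embedded.values():
        if branch["address"]["id"] == address_id:
            branch["address"].update(changes)
            writes += 1
    return writes


def get_branch_by_ref(branch_id):
    """One logical read, two lookups: the branch, then its address."""
    branch = dict(branches_by_ref[branch_id])  # copy so the stored item is untouched
    branch["address"] = addresses[branch.pop("address_id")]
    return branch
```

With one branch per address, the embedded variant costs two writes per address change, and the reference variant costs two reads per branch fetch.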
I'm just getting started with TypeDB and have a basic question about the schema concepts:
How would I model that a Person has an Address? I.e. a composite attribute like an address that is composed of 3 values:
city,
street,
ZIP code?
My understanding is that an attribute can have exactly ONE value definition, but it can own any number of other attributes.
My attempts:
street sub attribute, value string;
city sub attribute, value string;
zip sub attribute, value long;
(1) Attribute without value?
address sub attribute,
    // value ... (does not make sense here) ???
    owns street,
    owns city,
    owns zip;
person sub entity,
    owns address;
(2) Modeled as relation?
address sub relation,
    relates street,
    relates city,
    relates zip,
    relates subject; // ???
person sub entity,
    plays address:subject;
(3) As entity?
address sub entity,
    owns street,
    owns city,
    owns zip;
person sub entity,
    owns address; // ??? is owning another entity possible?
Which one (if any) would be the recommended way to go?
An address works best as an entity because, as @2bigpigs said, an address is an entity as it has a distinct existence in the domain. Other examples of an entity include an organisation, a location or a person. You can see that an address fits right in among those.
I started a short time ago to experiment with gremlin and cosmos DB, with the idea of doing a prototype for a project at work.
How do I know if a "Request charge value" is good or bad?
For example, I have a query to get a list of flats located in a specific state which looks like this:
g.V().hasLabel('Flat').as('Flats').outE('has_property')
.inV().has('propertyType',eq('reference')).has('propertyName','City').outE('property_refers_to')
.inV().hasLabel('City').outE('has_property')
.inV().has('propertyType',eq('reference')).has('propertyName','State').outE('property_refers_to')
.inV().hasLabel('State').outE('has_property')
.inV().has('propertyType',eq('scalar')).has('name',eq('Some_State_Name'))
.values('name')
.select('Flats')
Each object-property is not stored directly in the node representing an "object" but has its own node. Some properties are "scalar" (like the name of a state or a city) and some are a reference to another node (like the city where a flat is located).
With this query, azure shows me a "Request charge" value of 73.179
Is it good?
What are the things I should try to improve?
Summary
How could I model my database in Firebase to keep, for example, the reviews on a specific page updated with the user's info? That is, if a user changes its avatar or name, the reviews should also display the user's updated data.
I've used MongoDB most of the time, with Mongoose, and I am now working on a mobile app with Firebase. In Mongo I would just store a ref to the user in the review, and populate the field to retrieve the data I wanted from the document. Is there something like this in Firebase, and is it even a good or acceptable practice?
Quick Questions
Is there something like ".populate()" in Firebase?
Should I model the documents as much as possible to have the data that will be used in the view, and avoid "joins"?
Example
We have a users collection, and a store collection with reviews in it.
As far as I've read, you should minimize the doc reads, so we should model our data with the specific values we need for the view where they will be used, so that we only need to do one query.
For the sake of simplification, let's say:
User has a name, email, avatar
users: {
    user_id_1: {
        email: "user1@gmail.com",
        name: "John Doe",
        avatar: "some_firestore_url"
    }
}
Should the store collection:
Have nested collection of reviews like this
stores: {
    store_id_1: {
        name: "Dat Cool Store!",
        reviews: {
            user_id_1: {
                name: "John Doe",
                avatar: "some_firestore_url",
                text: "Great store love it!",
                timestamp: "May 07, 2020 at 03:30"
            }
        }
    }
}
The problem I see with this, is that unless we use a function that updates every field in every document with the new values there is no other way to update the data in name and avatar.
Have the user_id in a field and query for the user information after:
stores: {
    store_id_1: {
        name: "Dat Cool Store!",
        reviews: {
            review_id_1: {
                user: "user_id_1",
                text: "Great store love it!",
                timestamp: "May 07, 2020 at 03:30"
            }
        }
    }
}
This mimics the way I would do it in MongoDB.
Sorry if some of it sounds confusing or I didn't explain myself the best way, but it's 4 o'clock in the morning here and I'm just trying to get it right :)
How could I model my database in Firebase to keep, for example, reviews in a specific page updated with the user's info, that is, if a user changes its avatar or name, the reviews should also display the updated data of the user.
Without knowing the queries you intend to perform, it's hard to provide a viable schema. We usually structure a Firestore database according to the queries that we want to perform.
In Mongo I would just store a ref to the user in the review, and populate the field to retrieve the data I wanted from the document. Is there something like this in Firebase, and is it even a good or acceptable practice?
Yes, there is. According to the official documentation on the data types Firestore supports, DocumentReference is one of them, meaning that you can store just the path to a document rather than the entire document. In the NoSQL world, it's quite common to duplicate data, that is, to keep the same data in more than one place. Again, without knowing the use case of your app, it's hard to say whether duplicating the data is better than holding only a reference. For a better understanding, I recommend you read my answer from the following post:
What is denormalization in Firebase Cloud Firestore?
And to answer your questions:
Is there something like ".populate()" in Firebase?
If you only store a DocumentReference, it doesn't mean that the data of the document that the reference is pointing to will be auto-populated. No, you first need to get the reference from the document, and right after that, based on that reference, you have to perform another database call, to actually get the data from the referenced document.
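The two-step read described above can be sketched as follows. This is an in-memory simulation, not the Firestore SDK: the `documents` dict, the path strings, and the helpers `get_document` and `get_review_with_user` are hypothetical stand-ins.

```python
# In-memory stand-in for a document store, keyed by document path.
documents = {
    "users/user_id_1": {"name": "John Doe", "avatar": "some_firestore_url"},
    "stores/store_id_1/reviews/review_id_1": {
        "user": "users/user_id_1",  # stored reference: only a path, not the data
        "text": "Great store love it!",
    },
}


def get_document(path):
    """Simulates one database read and returns a copy of the document."""
    return dict(documents[path])


def get_review_with_user(review_path):
    """Manual 'populate': read the review, then read the document it points to."""
    review = get_document(review_path)             # first read
    review["user"] = get_document(review["user"])  # second read, via the reference
    return review
```

Each review displayed this way costs two reads instead of one, which is the trade-off against duplicating the name and avatar into the review.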
Should I model the documents as much as possible to have the data that will be used in the view, and avoid "joins"?
Yes, you should only store the data that you actually need to be displayed in your views. Regarding a JOIN clause, there isn't something like this supported in Firestore. A query can only get documents in a single collection at a time. If you want to get, for example, data from two collections, you'll have at least two queries to perform.
Another solution would be to add a third collection with data already merged from both collections so you can perform a single query. This is already explained in the link above.
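Whenever data is duplicated or pre-merged like this, something has to keep the copies in step when the source changes. A minimal sketch of that fan-out, again using plain dicts rather than the Firestore SDK (the collection layout, `DUPLICATED_FIELDS`, and `update_user` are assumptions made for illustration):

```python
users = {"user_id_1": {"name": "John Doe", "avatar": "some_firestore_url"}}

# Reviews carry a copy of the author's name and avatar so a page needs one query.
reviews = {
    "review_id_1": {"user_id": "user_id_1", "name": "John Doe",
                    "avatar": "some_firestore_url", "text": "Great store love it!"},
}

DUPLICATED_FIELDS = ("name", "avatar")


def update_user(user_id, **changes):
    """Update the user, then copy the duplicated fields into that user's reviews.

    Returns the number of review documents that had to be rewritten.
    """
    users[user_id].update(changes)
    copied = {k: v for k, v in changes.items() if k in DUPLICATED_FIELDS}
    touched = 0
    for review in reviews.values():
        if copied and review["user_id"] == user_id:
            review.update(copied)
            touched += 1
    return touched
```

The return value makes the cost visible: one profile change turns into one extra write per review that user has written.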
Some other information that might be useful is explained in my answer from the following post:
Efficiency of searching using whereArrayContains
Where you can find the best practice to save data into a document, collection, or subcollection.
For me, the way I would structure my JSON collection also depends on the size of the data I am trying to store in it.
Let's say the number of users is small and I only want to support a thousand users. In that case, I can go with this structure.
{
    "store_id_1": {
        "name": "Dat Cool Store!",
        "reviews": [
            {
                "user_id_1": {
                    "name": "John Doe",
                    "avatar": "some_firestore_url"
                },
                "text": "Great store love it!",
                "timestamp": "May 07, 2020 at 03:30"
            },
            {
                "user_id_2": {
                    "name": "John Doe 2",
                    "avatar": "some_firestore_url 2"
                },
                "text": "Great store love it! TWO",
                "timestamp": "May 27, 2020 at 03:30"
            }
        ]
    }
}
So now, you can have all the user info embedded in the stores collection. This will reduce your reads too.
But in case you want to scale it, I would suggest storing only the user's metadata and then making another read from the users collection.
Hope this helps!
I'm working on a CQRS/ES architecture. We run multiple asynchronous projections into the read stores in parallel because some projections might be much slower than others and we want to stay more in sync with the write side for the faster projections.
I'm trying to understand the approaches on how I can generate the read models and how much data-duplication this might entail.
Let's take an order with items as a simplified example. An order can have multiple items, each item has a name. Items and orders are separate aggregates.
I could either try to save the read models in a more normalized fashion, where I create an entity or document for each item and order and then reference them - or I maybe would like to save it in a more denormalized manner where I have an order which contains items.
Normalized
{
    Id: Order1,
    Items: [Item1, Item2]
}
{
    Id: Item1,
    Name: "Foosaver 9000"
}
{
    Id: Item2,
    Name: "Foosaver 7500"
}
Using a more normalized format would allow a single projection to process the events that affect items and orders and update the corresponding objects. It would also mean that any change to an item name affects all orders. A customer might, for example, get a delivery note listing different items than the corresponding invoice (so obviously that model might not be good enough, and it leads us to the same issues as denormalizing...)
Denormalized
{
    Id: Order1,
    Items: [
        {Id: Item1, Name: "Foosaver 9000"},
        {Id: Item2, Name: "Foosaver 7500"},
    ]
}
Denormalizing however would require some source where I can look up the current related data - such as the item. This means that I either have to transport all the information I might need in the event, or I'll have to keep track of the data that I source for my denormalization. This would also mean that I might have to do this once for each projection - i.e. I might need a denormalized ItemForOrder as well as a denormalized ItemForSomethingElse - both only containing the bare minimum properties that each of the denormalized entities or documents need (whenever they are created or modified).
If I would share the same Item in the read store, I could end up mixing item definitions from different points of time, because the projections for items and orders might not run at the same pace. In the worst case, the projection for items might not have yet created the item I need to source for its properties.
Generally, what approaches do I have when processing relationships from an event stream?
update 2016-06-17
Currently, I'm solving this by running a single projection per denormalised read model and its related data. If I have multiple read models that have to share the same related data, then I might put them into the same projection to avoid duplicating the same related data I need for the lookup.
These related models might even be somewhat normalised, optimised for however I have to access them. My projection is the only thing that reads and writes to them, so I know exactly how they are read.
// related data
public class Item
{
    public Guid Id {get; set;}
    public string Name {get; set;}
    /* and whatever else is needed but not provided by events */
}
// denormalised info for document
public class ItemInfo
{
    public Guid Id {get; set;}
    public string Name {get; set;}
}
// denormalised data as document
public class ItemStockLevel
{
    public ItemInfo Item {get; set;} // when this is a document
    public decimal Quantity {get; set;}
}
// or for RDBMS
public class ItemStockLevel
{
    public Guid ItemId {get; set;}
    public string ItemName {get; set;}
    public decimal Quantity {get; set;}
}
However, the more hidden issue here is that of when to update which related data. This is heavily dependent on the business process.
For example, I wouldn't want to change the item descriptions of an order after it has been placed. I must only update the data that changed according to the business process when the projection processes an event.
Therefore, the argument could be made towards putting this information into the event (and using the data as the client sent it?). If we find that we need additional data later, then we might have to fall back to projecting the related data from the event stream and read it from there...
This could be seen as a similar issue for pure CQRS architectures: when do you update the denormalised data in your documents? When do you refresh the data before presenting it to the user? Again, the business process might drive this decision.
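As an illustration of the rule that item descriptions must not change once an order has been placed, here is a minimal Python sketch of one projection that owns both its lookup data and its read model. The event names (`ItemCreated`, `ItemRenamed`, `OrderPlaced`) and their fields are hypothetical; they are not taken from any real event store.

```python
class OrderProjection:
    """Single projection owning both its lookup data and its read model."""

    def __init__(self):
        self.items = {}   # related data: item id -> current name (private to this projection)
        self.orders = {}  # denormalised read model: order id -> document

    def handle(self, event):
        kind = event["type"]
        if kind in ("ItemCreated", "ItemRenamed"):
            # Only the lookup changes; existing order documents are left alone.
            self.items[event["item_id"]] = event["name"]
        elif kind == "OrderPlaced":
            # Copy item names as they are at the moment the order is placed.
            self.orders[event["order_id"]] = {
                "Id": event["order_id"],
                "Items": [{"Id": i, "Name": self.items[i]} for i in event["item_ids"]],
            }
```

Because the projection is the only reader and writer of `items`, an item rename after `OrderPlaced` affects future orders only, which is the business rule described above.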
First, I think you want to be careful in your aggregates about life cycles. In the usual shopping cart domain, the cart (Order) lifecycle spans that of the items. Udi Dahan wrote Don't Create Aggregate Roots, which I've found to mean that aggregates hold a reference to the aggregate that "created" them, rather than the other way around.
Therefore, I would expect the event history to look like
// Assuming Orders come from Customers
OrderCreated(orderId: Order1, customerId: Customer1)
ItemAdded(itemId: Item1, orderId: Order1, Name:"Foosaver 9000")
ItemAdded(itemId: Item2, orderId: Order1, Name:"Foosaver 7500")
Now, it's still the case that there are no guarantees here about ordering - that's going to depend on how the aggregates are designed in the write model, whether your event store linearizes events across different histories, and so on.
Notice that in your normalized views, you could go from the order to the items, but not the other way around. Processing the events I've described gives you that same limitation: instead of Orders with mysterious items, you have items with mysterious orders. Anybody who looks for an order either doesn't see it yet, sees it empty, or sees it with some number of items; and can follow links from those items to the key store.
Your normalized forms in your key value store don't need to change from your example; the projection responsible for writing the normalized form of orders needs to be smart enough to watch the item streams too, but it's all good.
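A rough Python sketch of such a projection, written so that it tolerates ItemAdded arriving before OrderCreated. The event shapes follow the history shown earlier; the in-memory dicts, the `Created` flag, and the `apply` function are assumptions made for this illustration.

```python
orders = {}  # order id -> {"Id", "Items", "Created"}
items = {}   # item id -> {"Id", "Name"}


def _order(order_id):
    # Create a placeholder if ItemAdded is seen before OrderCreated.
    return orders.setdefault(order_id, {"Id": order_id, "Items": [], "Created": False})


def apply(event):
    """Fold one event into the normalized read store."""
    if event["type"] == "OrderCreated":
        _order(event["orderId"])["Created"] = True
    elif event["type"] == "ItemAdded":
        items[event["itemId"]] = {"Id": event["itemId"], "Name": event["Name"]}
        order = _order(event["orderId"])
        if event["itemId"] not in order["Items"]:  # tolerate redelivery of the same event
            order["Items"].append(event["itemId"])
```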
(Also note: we're eliding ItemRemoved here)
That's ok, but it misses on the idea that reads happen more often than writes. For hot queries, you are going to want the denormalized form available: the data in the store is the DTO that you are going to send in response to the query. For example, if the query were supporting a report on the order (no edits allowed), then you wouldn't need to send the item ids either.
{
    Title: "Your order #{Order1}",
    Items: [
        {Name: "Foosaver 9000"},
        {Name: "Foosaver 7500"}
    ]
}
One thing that you might consider is tracking the versions of the aggregates in question, so that when the user navigates from one view to the next -- rather than getting a stale projection, the query pauses waiting for the new projection to catch up.
For instance, if your DTO were hypermedia, then it might look something like
{
    Title: "Your order #{Order1}",
    refreshUrl: /orders/Order1?atLeastVersion=20,
    Items: [
        {Name: "Foosaver 9000", detailsUrl: /items/Item1?atLeastVersion=7},
        {Name: "Foosaver 7500", detailsUrl: /items/Item2?atLeastVersion=9}
    ]
}
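On the query side, the atLeastVersion idea could be handled by a small polling helper like the one below. This is only a sketch under assumptions: the read model is assumed to expose a "version" field, and `read_at_least` and its parameters are hypothetical names.

```python
import time


def read_at_least(load, min_version, attempts=5, delay=0.0):
    """Return the document once its version is >= min_version, else None.

    `load` is any zero-argument callable returning a dict with a "version" key
    (or None if the projection has not written the document yet).
    """
    for _ in range(attempts):
        doc = load()
        if doc is not None and doc["version"] >= min_version:
            return doc
        time.sleep(delay)  # give the projection time to catch up
    return None
```

Returning None after the last attempt leaves it to the caller to decide whether to serve the stale view or report a timeout.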
I also had this problem and tried different things. I read this suggestion and, while I have not tried it yet, I think it may be the best way to go. Just enrich events before publishing them.
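A minimal sketch of what enriching could look like, assuming the publisher has access to the current item names at the time the event is raised. The `enrich` function, the `item_names` lookup, and the event fields are hypothetical.

```python
def enrich(event, item_names):
    """Return a copy of the event with the item's current name attached."""
    enriched = dict(event)  # leave the original event untouched
    enriched["Name"] = item_names[event["itemId"]]
    return enriched


if __name__ == "__main__":
    names = {"Item1": "Foosaver 9000"}
    raw = {"type": "ItemAdded", "itemId": "Item1", "orderId": "Order1"}
    print(enrich(raw, names))
```

Projections that receive the enriched event no longer need their own item lookup to build a denormalized order.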
I have a strong background in relational databases. However, I'm always looking to improve my skills. Recently, I've been exposed to Firebase. It seems pretty interesting. However, I'm slightly confused by the "schema" if that's even the correct term.
From what I can tell, each Firebase "app" basically represents a single "table". Thus, if I am building a web application that has two related, but separate entities, I would have to have two firebase "apps". For example, perhaps I am building a web application that has football teams, coaches and players. In a relational database, I may have something like this:
Relational Database
Team      Coach      TeamCoachLookup  Player     TeamPlayerLookup
----      -----      ---------------  ------     ----------------
ID        ID         ID               ID         ID
Name      FirstName  TeamID           FirstName  TeamID
Location  LastName   CoachID          LastName   PlayerID
The above shows a possible relational database structure. Some may want to have a Person table with a RoleID to represent whether the person is a Player or a Coach. That's one approach. Still, when I look at the Firebase model, I have trouble getting my head around how the above would be structured. Would it be:
http://teams.firebaseio.com
http://coaches.firebaseio.com
http://players.firebaseio.com
Where the JSON of each item would represent a row in the database? Or, should it just be http://teams.firebaseio.com and the schema would look like this:
{
coaches: [
{ id:1, firstName:'Joe', lastName:'Smith' }
],
players: [
{ id:1, firstName:'Bill', lastName:'Mans' },
{ id:2, firstName:'Zack', lastName:'Dude' }
]
}
The second approach seems to make more sense to me. However, I do not see how Firebase supports that. Instead, in my mind, it looks like Firebase has one URL for each "table" and the JSON isn't really hierarchical. Am I way off? Is there any documentation that anyone can recommend to me?
Thanks!
The corresponding concepts should be (Firebase <=> relational):
application <=> schema
root node <=> table
child node <=> row
node key <=> row id (typically push ids)
In your concrete example:
football-app.firebaseio.com
  teams
    fx7Q7q
      name: "Foo"
  coaches
    ix0GWF
      firstName: "Joe"
      lastName: "Smith"
  players
    uQ8fJK
      firstName: "Bill"
      lastName: "Mans"
  teamCoachLookup
    QkW9uH
      team: "fx7Q7q"
      coach: "ix0GWF"
  teamPlayerLookup
    BmI48N
      team: "fx7Q7q"
      player: "uQ8fJK"
See also https://www.firebase.com/docs/web/guide/structuring-data.html.
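To show how the lookup nodes stand in for join tables, here is a small Python sketch that resolves a team's players and coaches from that structure. The tree is held in a plain nested dict rather than read through any Firebase client, and `db`, `players_of`, and `coaches_of` are hypothetical names.

```python
# The tree from the example above, as a nested dict.
db = {
    "teams": {"fx7Q7q": {"name": "Foo"}},
    "coaches": {"ix0GWF": {"firstName": "Joe", "lastName": "Smith"}},
    "players": {"uQ8fJK": {"firstName": "Bill", "lastName": "Mans"}},
    "teamCoachLookup": {"QkW9uH": {"team": "fx7Q7q", "coach": "ix0GWF"}},
    "teamPlayerLookup": {"BmI48N": {"team": "fx7Q7q", "player": "uQ8fJK"}},
}


def players_of(team_key):
    """Follow teamPlayerLookup entries for one team to the player nodes."""
    return [db["players"][link["player"]]
            for link in db["teamPlayerLookup"].values()
            if link["team"] == team_key]


def coaches_of(team_key):
    """Follow teamCoachLookup entries for one team to the coach nodes."""
    return [db["coaches"][link["coach"]]
            for link in db["teamCoachLookup"].values()
            if link["team"] == team_key]
```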