I currently use MySQL, but after looking into document databases it seems like a move may be a good idea. I do a TON (95%) of querying for single records, and as my database gets larger, both reads and writes seem to be getting slower. Based on the (simplified) schema below, would a move to a document DB be a good one, and what would the layout be for said schema? (I'm a bit new to document DBs.)
User
    UserID
    Username
    CreatedDate

Tank
    TankID
    UserID REF User.UserID
    TankName
    Awards

Map
    MapID
    MapName
    MapFile

MapData
    MapID REF Map.MapID
    TankID REF Tank.TankID
    Rank
    Color
    TimePlayed
    Equipment
Every time a player joins, the data from Tank and MapData is queried to assemble a full tank object. Every time they die, win an award, kill somebody, or exit the game, the data is written back out to Tank and MapData.
The website queries the User table for login, which stores the username and a hash of the password. Once logged in, users can modify/delete/create tanks on the website, which inserts records into the Tank/MapData tables.
The website also stores the Top 25 in the world, top 25 per map, top 25 per color, and top 25 per color per map.
That's about the only query patterns I can think of at this moment.
Based on the provided information you have several schema designs to choose from (with JSON examples below). I've made some assumptions, such as that more than one tank can be on a map and that map data is linked to a single map; you'll have to tweak it for your needs. I also try to give some advantages and disadvantages of each solution.
Option #1 (Single collection)
This should be the easiest, but not the best solution. Here you put everything into one document with extreme "denormalization".
{
    "mapname": "map1",
    "mapfile": "mapfile1",
    "data": {
        "rank": "rank1",
        "color": "color1",
        ...
        "tanks": [
            {
                "name": "tank1",
                ...
                "user": {
                    "name": "user1",
                    ...
                }
            },
            {
                ...
            }
        ]
    }
}
This solution works best when you do a lot of writes, rare updates, and reads where you want all the information together. On the other hand, it has a number of disadvantages, such as storing user information (the password hash, for example) directly alongside your application data.
Option #2 (Two collections)
Put your user data into one collection and the other data into a second collection.
User collection
{
    "id": 1,
    "username": "user1",
    "password": "passwordhash",
    ...
}
Data collection
{
    "mapname": "map1",
    "mapfile": "mapfile1",
    "data": {
        "rank": "rank1",
        "color": "color1",
        ...
        "tanks": [
            {
                "name": "tank1",
                ...
                "user": userId
            },
            {
                ...
            }
        ]
    }
}
This option is a lot better than the first one. First, you don't want sensitive user data (such as the password hash) in the same collection as your other data. It also works better for reads of the user object, because you retrieve just the information you need without skipping a lot of unneeded fields. A disadvantage is that heavy write operations on the tank objects can become a problem.
Option #3 (Three collections)
The next step could be to move the tanks out of the data collection into their own collection.
User collection
{
    "id": 1,
    "username": "user1",
    "password": "passwordhash",
    ...
}
Tank collection
{
    "name": "tank1",
    ...
    "user": userId
}
Data collection
{
    "mapname": "map1",
    "mapfile": "mapfile1",
    "data": {
        "rank": "rank1",
        "color": "color1",
        ...
        "tanks": [
            idOfTank1,
            idOfTank2,
            ...
        ]
    }
}
This works best for many writes of single objects, such as the tanks, and for reading tanks from their own collection. This solution has problems when reading a lot of data together, for example when you want to get a map and all the tanks on that map. In that case you have to resolve the references between the tanks and the map data yourself.
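That resolution step can be sketched like this, using plain JavaScript over in-memory arrays with hypothetical data. With a real driver these would be two round trips: one query for the map document, one for its tanks.

```javascript
// Sketch of the dependency resolution Option #3 requires.
// Collection contents are made up for illustration.
const tanks = [
    { _id: "t1", name: "tank1", user: 1 },
    { _id: "t2", name: "tank2", user: 2 }
];

const maps = [
    { mapname: "map1", mapfile: "mapfile1", data: { tanks: ["t1", "t2"] } }
];

// First "query": fetch the map document.
const map = maps.find((m) => m.mapname === "map1");

// Second "query": resolve the tank references it contains.
const tanksInMap = tanks.filter((t) => map.data.tanks.includes(t._id));

console.log(tanksInMap.map((t) => t.name)); // [ 'tank1', 'tank2' ]
```

The cost is the second round trip on every full read, which is exactly the trade-off against Options #1 and #2.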
Summary
As you can see, schema design is not easy in a document-oriented database. This is the reason why I asked for the query patterns: to come up with a good design you have to know most of the query patterns in advance. To get started, create a simple prototype with a design you think makes sense and test your query patterns against some test data. If that works, you can make minor changes to get even better performance. If not, rethink your query patterns and what a better design could look like. Keep in mind that you don't need a full-blown application for this. Most of it can be tested before a single line of application code is written, for example with the MongoDB shell or a simple console application in the case of DocumentDB.
I think I'm doing something wrong but I can't figure out which part... Thanks for any clarification I can get.
So I have a collection named Bases that looks like this:
{
    "id1": {
        "name": "My base 1",
        "roles": {
            "idUser_123": {
                "name": "John"
            }
        }
    },
    "id2": {
        "name": "My base 2",
        "roles": {
            "idUser_456": {
                "name": "Jane"
            }
        }
    }
}
idUser_123 logs in and wants to access his bases. So I do:
db.collection('bases').get()
And I use a match rule to make sure John is not reading Jane's bases. That's where I think I'm wrong, because I'm using a rule for filtering purposes.
match /bases/{document=**} {
    allow read: if request.auth.uid in resource.data.roles;
}
Which failed because resource is null... I tried to do this:
match /bases/{baseId} {
    allow read: if request.auth.uid in resource.data.roles;
}
This works in the simulator when requesting a specific document, but fails when I get() without a baseId from the client - because I want them all.
So how am I supposed to handle this very basic use case (IMO)?
I can't put all the user's baseIds in user.token, as it'll be over 1000 bytes quite fast.
I could make another collection, Roles, to create a relation between a baseId and my user, but that seems overengineered for such a simple use case.
Or I could make the request on a server and filter with where("roles", "has", user.uid)? That defeats the purpose of fetching data quickly on the client side, in my opinion...
Any recommendation on how to address this will be gladly appreciated! Thanks a lot :)
There are two problems here.
Firstly, your query is demanding to receive all of the documents in bases, but your rules do not allow anyone to simply read all documents. It's important to realize that Firestore security rules are not filters. They do not strip documents from a query - that simply would not scale. If a rule puts requirements on who can query certain documents based on their contents, the query has to use a filter that matches what the rule requires.
Secondly, your data isn't structured in a way that lets the client filter down to only certain documents based on the user's UID. Firestore doesn't have queries that can check for the existence of a map key; it can only check map values. You might want to restructure your document data so that the client can actually perform a query with a filter to get only the documents for the current user.
If roles was an array of UIDs:
"id1": {
    "name": "My base 1",
    "roles": [ "idUser_123" ]
},
Then the client could filter on that field:
firebase.firestore()
    .collection("bases")
    .where("roles", "array-contains", uid)
And you could enforce the use of that filter in rules:
match /bases/{document=**} {
    allow read: if request.auth.uid in resource.data.roles;
}
But you might want to do something different.
I'm creating a database in Firebase to manage a local basketball league. For the first part of development I want to work on match management, mainly scoring and foul registration.
The thing with scoring is that a player can score 1, 2, or 3 points, and there are different kinds of fouls, like regular fouls and technical fouls; I want to be able to differentiate between those.
Also, a small detail: a person can play for different teams in different divisions.
Here is my idea for the data structure in Firebase:
divisions: {
    division1: {
        name: "first division",
        teams: {
            team1: true,
            team2: true
        }
    }
}

teams: {
    team1: {
        name: "Team 1",
        division: division1,
        players: {
            player1: true,
            player2: true
        },
        matches: {
            match1: true
        }
    }
}

players: {
    player1: {
        name: "Player 1",
        phoneNumber: "555-XXXX",
        address: "123 address",
        teams: { // A player can play for different teams in different divisions
            team1: true,
            team2: true
        }
    }
}
matches: {
    match1: {
        date: 10-20-2019,
        court: "West Park",
        referee: "John Doe",
        players: {
            player1: {
                /* Should I store points scored and fouls committed in here
                   and in the players collection? */
            },
            player2: {...}
        }
    }
}
I'm unsure where to put the data regarding points and fouls. In the future I plan to use the database to create statistics and such, but in the meantime I just want a registry of matches including the players, their points scored, and their fouls.
There is no singular correct way to model data in a NoSQL database. It all depends on the use-cases of the app you want to build. In fact, when using a NoSQL database, it is quite common to adapt your data model as you add new use-cases to your app.
To your current model, I would add an additional top-level data structure to store information about each individual match. Something like this:
matches: {
    matchid1: {
        teams: {
            team1: true,
            team2: true
        },
        events: {
            "-Lasdkjhd31": {
                time: "2m44",
                type: "foul",
                player: "player1id",
                team: "team1"
            }
        }
    }
}
But as I said, this depends on the use-cases of the app. The structure above allows storing event information for each match, which would clearly be useful if you want to show a timeline of what happened in each match.
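Such an events list also covers the statistics use-case you mention: per-player totals can be derived from it on the client. A sketch with made-up data follows; in particular the "score" event type and its "points" field are assumptions, since a scoring event would need the 1/2/3-point value stored alongside it.

```javascript
// Sketch: deriving per-player totals from a match's events list.
// Event shapes mirror the structure above; "score"/"points" and
// "technicalFoul" are assumed extensions.
const events = [
    { time: "2m44", type: "foul", player: "player1", team: "team1" },
    { time: "5m10", type: "score", player: "player1", team: "team1", points: 3 },
    { time: "7m02", type: "score", player: "player2", team: "team2", points: 2 },
    { time: "9m33", type: "technicalFoul", player: "player1", team: "team1" }
];

const stats = {};
for (const e of events) {
    // Create a zeroed record the first time we see a player.
    const s = stats[e.player] || (stats[e.player] = { points: 0, fouls: 0, technicals: 0 });
    if (e.type === "score") s.points += e.points;
    if (e.type === "foul") s.fouls += 1;
    if (e.type === "technicalFoul") s.technicals += 1;
}

console.log(stats.player1); // { points: 3, fouls: 1, technicals: 1 }
```

Once statistics queries become frequent, you would typically precompute and store these totals rather than replay every event on each read.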
Beyond general guidance, it's hard to be concrete. I do recommend that you read/watch these though:
NoSQL data modeling
Firebase for SQL developers
The Firebase documentation on data modeling
Getting to know Cloud Firestore (which is about Firestore, but the same logic often applies to Realtime Database too)
Referencing this API tutorial/explanation:
https://thinkster.io/tutorials/design-a-robust-json-api/getting-and-setting-user-data
The tutorial explains that to 'follow a user', you would use:
POST /api/profiles/:username/follow.
In order to 'unfollow a user', you would use:
DELETE /api/profiles/:username/follow.
The user Profile initially possesses the field "following": false.
I don't understand why the "following" field is being created/deleted (POST/DELETE) instead of updated from true to false. I feel as though I'm not grasping what's actually going on - are we not simply toggling the value of "following" between true and false?
Thanks!
I think the database layer has to be implemented in a slightly more complex way than just having a boolean column for "following".
Given that you have three users, what would it mean for one of them to have "following": true? Is that user following something? That alone cannot mean the user is following all other users, right?
The database layer probably consists of (at least) two different concepts: users and followings; users contain information about the user, and followings specify what users follow one another.
Say that we have two users:
[
    {"username": "jake"},
    {"username": "jane"}
]
And we want to say that Jane is following Jake, but not the other way around.
Then we need something to represent that concept. Let's call that a following:
{"follower": "jane", "followee": "jake"}
When the API talks about creating or deleting followings, this is probably what they imagine is getting created. That is why they use POST/DELETE instead of just PUT. They don't modify the user object, they create other objects that represent followings.
The reason they have a "following": true/false part in their JSON API response is because when you ask for information about a specific user, as one of the other users, you want to know if you as a user follows that specific user.
So, given the example above, when jane would ask for information about jake, at GET /api/profiles/jake, she would receive something like this:
{
    "profile": {
        "username": "jake",
        "bio": "...",
        "image": "...",
        "following": true
    }
}
However, when jake would ask for the profile information about jane, he would instead get this response:
{
    "profile": {
        "username": "jane",
        "bio": "...",
        "image": "...",
        "following": false
    }
}
So, the info they list as the API response is not exactly what is stored in the database about this specific user; it also contains information that is calculated based on who asked the question.
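That calculation can be sketched like this, with hypothetical data and a made-up function name (bio/image omitted for brevity; the tutorial's actual storage layer may differ):

```javascript
// Sketch: computing the per-requester "following" flag from stored
// following relationships, as described above.
const users = [{ username: "jake" }, { username: "jane" }];
const followings = [{ follower: "jane", followee: "jake" }];

function profileFor(username, requester) {
    const user = users.find((u) => u.username === username);
    return {
        profile: {
            username: user.username,
            // true only if the *requester* follows this profile's user
            following: followings.some(
                (f) => f.follower === requester && f.followee === username
            )
        }
    };
}

console.log(profileFor("jake", "jane").profile.following); // true
console.log(profileFor("jane", "jake").profile.following); // false
```

POST and DELETE on the follow endpoint then simply add or remove entries in `followings`; no user object is ever modified.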
Using a microPUT would certainly be a reasonable alternative. I don't think anybody is going to be able to tell you why a random API tutorial made certain design decisions. It may be that they just needed a contrived example to use POST/DELETE.
Unless the author sees this question, I expect it's unanswerable. It's conceivable that they want to store meta information, such as the timestamp of the follow state change, but that would be unaffected by POST/DELETE vs. PUT.
I have a Meteor application made up of "notepads", each containing an array of "notes" which can be inserted into at any position, deleted from, or have rows edited. This array is contained within an object with a variety of other information (e.g. name, users, etc.). Each object in my primary collection will contain one of these arrays. For example:
{
    "_id": "1234",
    "name": "NotePad123",
    "notes": [
        { note: "this is my first test note" },
        { note: "this is my second test note" },
        { note: "this is my third test note" }
    ]
},
{
    "_id": "4321",
    "name": "NotePad321",
    "notes": [
        { note: "noteA" },
        { note: "noteB" },
        { note: "noteC" }
    ]
}
Is there any way I can pass the "notes" to my client as its own collection so that the client can edit it directly, as if it were not embedded? I am worried about a performance hit if I need to pass the full notes array to the server every time I want to update it; with many updates it could become quite large.
I realize that I could create a new document and reference it, as described here, but this could become quite hectic with many "notepads", as ordering is important and I will have many rows associated with each of my primary objects.
You can make a client-side collection that you put the notes in. Then, call a method to make the changes once you want to save.
Here's how you make a client-side collection:
var notes = new Meteor.Collection(null)
From https://github.com/meteor/meteor/wiki/Oplog-Observe-Driver:
As of Meteor 0.7.2, we use OplogObserveDriver for most queries. There
are a few types of queries that still use PollingObserveDriver:
...
Queries specifying the skip option
...
That means that whenever you use paging based on skip - which is probably whenever you need a paging mechanism, if you have a lot of records the user can navigate - it will use that old, very inefficient poll-and-diff algorithm.
It still looks to me like Meteor is only good for a limited sort of app, where just a few people need to work together and some realtime changes are propagated.
If I had something like Stack Overflow, it would be really slow, because each client could be on a different page, and that means rerunning, for instance, 1000 queries each time a message is added or removed, because Meteor can't tell from the Mongo oplog which queries with a skip operator are affected.
Am I right?
There are a lot of ways to paginate that are not based on skip. Because you have some sort order (otherwise you wouldn't be paginating in any meaningful way), you can say something like "give me the next 50 items past this value in the sort order".
For example, if you have a pagination query like this:
Posts.find({ author: "Nick" }, { sort: { timestamp: -1 }, limit: 50, skip: 200 })
you can rewrite it w/o skip like this:
Posts.find({ author: "Nick", timestamp: { $lt: X } }, { sort: { timestamp: -1 }, limit: 50 })
where X is the timestamp of the last seen post. (Note that with a descending sort, the next page contains timestamps *below* the last seen one, hence $lt.)
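The same range-based idea, sketched over an in-memory array with hypothetical data (page size 2 for brevity; with a descending sort, each next page holds timestamps below the last one seen):

```javascript
// Sketch: range ("cursor") pagination without skip, sorted by
// timestamp descending.
const posts = [
    { author: "Nick", timestamp: 5 },
    { author: "Nick", timestamp: 4 },
    { author: "Nick", timestamp: 3 },
    { author: "Nick", timestamp: 2 }
];

function page(after, limit) {
    return posts
        .filter((p) => after === undefined || p.timestamp < after)
        .sort((a, b) => b.timestamp - a.timestamp)
        .slice(0, limit);
}

const page1 = page(undefined, 2);                         // timestamps 5, 4
const page2 = page(page1[page1.length - 1].timestamp, 2); // timestamps 3, 2

console.log(page2.map((p) => p.timestamp)); // [ 3, 2 ]
```

Because each page is defined by a range filter instead of a skip count, the oplog driver can tell which queries a new document affects, avoiding the poll-and-diff path.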