I had to manage relationships between documents in a NoSQL engine (Couchbase), and I worked out a way to solve my problem. Here are my solution and the steps that led me to it:
https://forums.couchbase.com/t/document-relationships-using-arrays-and-views-passing-though-graph-theory/3281
My questions are:
What do you think about this solution?
Have you ever used something like this? How did it work out?
Are there any better ideas? Pointing out the critical points of this solution would also be helpful.
Thank you.
Interesting post Matteo. After reading it I realized that you can possibly improve on a few aspects:
Consider 1-1 node relationships. In your post you focus on N-N node relationships (sure, one can argue that 1-1 is a subset of N-N); however, I think there is potential for a different (optimized) implementation of 1-1 relationships. For 1-1 I use the related node's key as a field in my JSON doc, e.g. user: {name:string, dob:date, addressID:string} (see the sketch below).
Node key design to address relationships: you can encode relationship information in the key value, e.g. key: "user#11", "user#11#address#123", "address#123#user#11", etc.
Data integrity aspects: take into consideration the lack of complex transactions, i.e. you can't mutate several documents in one transaction. The design should compensate for that.
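To make the first two points concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the kv_get/kv_upsert helpers stand in for whatever get/upsert calls your Couchbase SDK provides, and the key scheme and field names are invented.

# In-memory stand-in for the KV store; swap in your SDK's get/upsert.
_store = {}

def kv_upsert(key, doc):
    _store[key] = doc

def kv_get(key):
    return _store[key]

def make_key(*parts):
    # Encode relationship information in the key itself, e.g. "user#11".
    return "#".join(parts)

def save_user_with_address(user_id, user, address_id):
    # 1-1 relationship: store the related node's key directly in the doc.
    user["addressID"] = make_key("address", address_id)
    kv_upsert(make_key("user", user_id), user)

def load_user_address(user_id):
    # Two point lookups: the user doc, then the embedded address key.
    return kv_get(kv_get(make_key("user", user_id))["addressID"])

kv_upsert("address#123", {"city": "Milan"})
save_user_with_address("11", {"name": "Ann", "dob": "1990-01-01"}, "123")
assert load_user_address("11") == {"city": "Milan"}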
I have used a similar solution in my model design for Couchbase in the past. It's been in production for several years already and it's performing just fine (the load is about 250 tps). I was trying to avoid creating complex node relations as much as possible and ended up having very few 1-1 and 1-N types.
I tested out this solution and it works well. I like the flexibility of the 'always possible' N-N relationships, because you can simply add the relationship document when you need it, without changing the application logic. There is a drawback: you need to implement your own application-level constraints to avoid relationship abuse.
I noticed that using arrays doesn't give a great advantage compared to JSON objects, and sometimes it may be useful to store other relationship data, for example the weight (or cost) of the relationship. So I suggest you use a relationship document that has its own type:
{
"type": "relationship",
"documents": ["key1", "key2"],
"all-the-data-you-need": { ... }
}
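For instance, a helper that writes such a weighted relationship document might look like this sketch (kv_upsert again stands in for your SDK's upsert, and the key scheme and field names are invented):

_store = {}

def kv_upsert(key, doc):  # stand-in for your Couchbase SDK's upsert
    _store[key] = doc

def save_relationship(key1, key2, weight, kind):
    # One document per relationship; extra data (weight, kind, ...)
    # lives alongside the linked keys instead of in a bare array.
    kv_upsert("rel#" + key1 + "#" + key2, {
        "type": "relationship",
        "documents": [key1, key2],
        "weight": weight,
        "kind": kind,
    })

save_relationship("user#11", "user#42", weight=0.8, kind="follows")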
Looking at performance, there isn't much difference between objects and arrays.
Hope this helps someone! ;)
My question is essentially the one in the title.
The AWS documentation recommends that you keep related data in one table. But reading the various comments from other users, for the most part I saw advice not to use nested objects (lists of maps). So what is it really supposed to look like?
When we use maps/lists, we are often modeling one-to-many relationships. For example, here's how I might model the relationship between Users and their Hobbies.
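A sketch of such an item, with the hobbies embedded as a list of maps (the key layout and attribute names here are assumptions for illustration):

# One item per user; hobbies live inside the item as a list of maps.
user_item = {
    "PK": "USER#123",
    "SK": "USER#123",
    "Name": "Jane",
    "Hobbies": [
        {"Name": "photography", "SkillLevel": "beginner"},
        {"Name": "sailing", "SkillLevel": "intermediate"},
    ],
}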
This strategy works fine to model one-to-many relationships if you don't have any access patterns around searching users by hobby. For example, if your access pattern includes "fetch all users who like photography", this is not a good way to store your data. Instead, you may consider storing hobbies like this
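For example, each user-hobby pair can become its own item with the hobby in the partition key (again, the key layout is an assumption), so fetching all users for a hobby is a single Query:

# One item per user-hobby pair; the hobby is in the partition key.
hobby_items = [
    {"PK": "HOBBY#photography", "SK": "USER#123", "UserName": "Jane"},
    {"PK": "HOBBY#photography", "SK": "USER#456", "UserName": "Raj"},
    {"PK": "HOBBY#sailing", "SK": "USER#123", "UserName": "Jane"},
]
# Query(KeyConditionExpression="PK = :h") with :h = "HOBBY#photography"
# returns every user who likes photography.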
This data model allows you to fetch users by hobby, something you couldn't do in the prior example.
There is no single way to model data in DynamoDB. Instead, there are common strategies and patterns that model various relationships (one-to-many, many-to-many, etc). The best strategy for any particular situation will depend on the needs of your application.
After watching a few videos regarding DynamoDB and its best practices, I decided to give it a try; however, I cannot help but feel that what I'm doing may be an anti-pattern. As I understand it, the best practice is to use as few tables as possible while taking advantage of GSIs to do some of the 'heavy' lifting. Unfortunately, I'm working with a use case that doesn't have strictly defined access patterns yet, since we're still in early development.
Some early access patterns that we may see are:
Retrieve the number of wins for a particular game: rock paper scissors, boxing, etc. [1 quick lookup]
Retrieve the amount of coins a user has. [1 quick lookup]
Retrieve all the items that someone has purchased (don't care about date). [Not sure?]
Possibly retrieve all the attributes associated with a user (rps wins, box wins, coins, etc). [I genuinely don't know.]
Additionally, there may be two operations we need to complete together. For example, if the user wins a particular game, they may receive "coins". Effectively, we'll need to add coins to the user's "coins" attribute and update their number of wins for the game.
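If both values live on the same item, a single UpdateItem can change them together; if they are separate items, DynamoDB transactions can keep them in step. A sketch with boto3 (the table name, key layout, and attribute names are assumptions):

import boto3

client = boto3.client("dynamodb")

# Award 10 coins and bump the rock-paper-scissors win count atomically.
client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "game",
                "Key": {"PK": {"S": "USER#123"}, "SK": {"S": "PROFILE"}},
                "UpdateExpression": "ADD coins :c",
                "ExpressionAttributeValues": {":c": {"N": "10"}},
            }
        },
        {
            "Update": {
                "TableName": "game",
                "Key": {"PK": {"S": "USER#123"}, "SK": {"S": "GAME#rps"}},
                "UpdateExpression": "ADD wins :w",
                "ExpressionAttributeValues": {":w": {"N": "1"}},
            }
        },
    ]
)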
Do you think I should revisit this strategy? Additionally, we'll probably start creating 'logs' associated with various games and each individual play.
Designing a DynamoDB data model without fully understanding your application's access patterns is the anti-pattern.
Take the time to define your entities (Users, Games, Orders, etc.), their relationships to one another, and your application's key access patterns. This can be hard work when you are just getting started, but it's absolutely critical when working with DynamoDB. How else can we (or you, or anybody) evaluate whether or not you're using DDB correctly?
When I first worked with DDB, I approached the process in a similar way to the one you are describing. I was used to working with SQL databases, where I could define a few tables and rely on the magic of SQL to support new access patterns as my understanding of the application evolved. I quickly realized this was not going to work if I wanted to use DynamoDB!
Instead, I started from the front end of my application. I sketched out the different pages in my app and pinned down its most important concepts. Granted, I may not have covered all the access patterns in my application, but the exercise certainly identified the minimal access patterns I'd need for a usable app.
If you need to rapidly prototype your application to get a better understanding of your access patterns, consider using the skills you and your team already have. If you already understand data modeling with SQL databases, go with that for now. You can always revisit DynamoDB once you have a better understanding of your access patterns and have determined that your application can benefit from a NoSQL database.
For educational reasons I want to build a functional, full relational database. I'm aware LMDB has been used as a storage backend for SQLite, but I don't know C. I'm on .NET, and I'm not interested in just duplicating a "traditional" RDBMS (so, for example, I'm not worried about implementing a SQL parser, since I'm building my own custom scripting language), but I do want to expose the full relational model.
Consider this question similar to "How do I implement a programming language on top of LLVM?" before worrying about why I'm not using SQLite or similar.
From the material I've read, LMDB looks great, especially because it provides transactions and reliability, plus the low-level plumbing. How that translates to changes that could touch several rows across several tables is another question.
Is there material that explains how to implement a relational layer on top of something like LMDB? Is using LMDB (or one of its competitors) good enough, or is there a better way to get results?
Is it possible to use LMDB to store other structures like hashtables, arrays and (the one I'm most interested in, for a columnar database) bitmap arrays, i.e. something similar to Redis?
P.S.: Is there a forum or another place to talk more about this subject?
I had this idea too. You should realize that this is a ton of work and most likely no one will care. I haven't built a full-blown relational DB, as that's crazy for one person to do. You could check it out here
Anyway, I've used LevelDB (and later RocksDB), so you have key-value pairs sorted by key, the ability to get a value by key, iterate over keys, perform atomic writes of many values (WriteBatch), and get a consistent view of the data at a given time (snapshots). These features are enough to build correct thread-safe reading of table rows (using snapshots), correct all-or-nothing writing of data and related indexes (using WriteBatch), and even transactions.
Each column has its own on-disk index (keys sorted by value), so you can efficiently run various operations on it, plus the keys with the values themselves, so you can efficiently read the values for a given id.
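A minimal sketch of that layout using the plyvel LevelDB bindings (the key encoding and helper names are invented for illustration; real values would need an order-preserving encoding):

import plyvel

db = plyvel.DB("/tmp/toy-table", create_if_missing=True)

def put_row(table, row_id, row):
    # Write the row and its per-column index entries in one atomic batch.
    with db.write_batch() as wb:
        for col, val in row.items():
            wb.put(f"row:{table}:{row_id}:{col}".encode(), str(val).encode())
            # Index key is (column, value, row_id); LevelDB keeps keys
            # sorted, so this doubles as a sorted per-column index.
            wb.put(f"idx:{table}:{col}:{val}:{row_id}".encode(), b"")

def find_ids(table, col, val):
    # Prefix scan over the sorted index to find matching row ids.
    prefix = f"idx:{table}:{col}:{val}:".encode()
    return [k[len(prefix):].decode() for k, _ in db.iterator(prefix=prefix)]

put_row("users", "11", {"name": "ann", "city": "milan"})
put_row("users", "42", {"name": "bob", "city": "milan"})
print(find_ids("users", "city", "milan"))  # ['11', '42']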
This setup is efficient for writing and reading, using the available operations, on tables with little data (say, fewer than a million rows). However, if a table grows, iterating over many keys can become slow. To solve this, and to add a GROUP BY statement, I decided to add in-memory indexes, but that's another story. So, all in all, it might be a fun idea, but in reality it's a lot of work, often with frustrating results. Why would you want to do that?
I'm coding an application to create surveys with Symfony3 and Doctrine. I would like to understand the best way to model the relation between surveys, items, and answers. A survey is composed of multiple items, each of which has a particular type of answer. For instance, I could have the following types:
AnswerChoice
AnswerText
AnswerRange
etc..
Which is the best way to model this scenario with Doctrine?
I have thought of two possible solutions:
I create a single Answer object including every possible feature of the answers. The Item object would have a one-to-one relationship with this object.
Pros: I have just one answer object
Cons: Confusing and against the single responsibility principle
I create a generic Item object containing a specific Answer object (AnswerChoice, AnswerText, ...) in a predefined class property. The Survey object would have a one-to-many relationship with Item, which in turn would have a one-to-one relationship with a specific Answer object.
Pros: Nice solution but...
Cons: I need a property for each type of answer!
Could you please help me choose the best solution? I have the feeling that I'm not approaching this problem well. Thanks
It's inheritance, and Doctrine actually handles inheritance pretty well.
There are a few ways of implementing inheritance in Doctrine, but I think that in your case Single Table Inheritance is what you're looking for.
That way you will be able to get a repository for the parent (abstract) answer, but you'll get instances of the actual child types in return.
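For illustration, here is the same single-table-inheritance idea sketched in Python with SQLAlchemy (the class and column names are invented); Doctrine's mapping, with a discriminator column on one shared table, follows the same shape:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Answer(Base):
    # All subclasses share one table; "type" is the discriminator column,
    # playing the role of Doctrine's @DiscriminatorColumn.
    __tablename__ = "answer"
    id = Column(Integer, primary_key=True)
    type = Column(String(30), nullable=False)
    __mapper_args__ = {"polymorphic_on": type, "polymorphic_identity": "answer"}

class AnswerText(Answer):
    text = Column(String(255))
    __mapper_args__ = {"polymorphic_identity": "text"}

class AnswerRange(Answer):
    min_value = Column(Integer)
    max_value = Column(Integer)
    __mapper_args__ = {"polymorphic_identity": "range"}

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([AnswerText(text="free text"),
                     AnswerRange(min_value=1, max_value=5)])
    session.commit()
    # Querying the parent returns instances of the concrete child types.
    print([type(a).__name__ for a in session.query(Answer).all()])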
I am asking about algorithms that would be useful for querying a Semantic Web database to get all the RDF resources related to an original object.
I.e., if the original object is the movie "Inception", I want an algorithm to build queries that fetch the RDF data for the cast of the movie, the studio, the country, etc., so that I can build a relationship graph.
The closest example is the answer to this question, especially this class. I want similar algorithms, or maybe titles to search for in order to produce such an algorithm. I am thinking that some modifications of graph-traversal algorithms might work, but I'm not sure.
NOTE: My project is in ASP.NET, so it would help to use existing .NET libraries.
You should be able to do a simple breadth-first search to get all the objects that are within a certain distance of a given node.
You'll need to know something about the schema because some neighboring nodes are more meaningful than others. For example, in Freebase, we have intermediate nodes that link a film to an actor and a role. You need to know to go 2-ply deep to get at the actor and the role because just saying that the film is related to the intermediate nodes is not very interesting.
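A minimal sketch of that bounded breadth-first search (the neighbors function is a hypothetical stand-in for whatever call fetches a node's adjacent resources from your triple store):

from collections import deque

def related_within(start, neighbors, max_depth=2):
    # Bounded BFS: collect every node reachable within max_depth hops.
    # Depth 2 also picks up resources behind intermediate linking nodes,
    # like the film -> (intermediate) -> actor/role pattern above.
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

# Toy usage with an in-memory graph standing in for the triple store.
graph = {
    "inception": ["cast_node", "studio"],
    "cast_node": ["leonardo_dicaprio", "cobb_role"],
}
print(related_within("inception", lambda n: graph.get(n, [])))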
Did you take a look at "property paths"?
Property Paths give a more succinct way to write parts of basic graph patterns and also extend matching of triple patterns to arbitrary length paths. Property paths do not invalidate or change any existing SPARQL query.
Triple stores and SPARQL engines such as OWLIM and AllegroGraph support them.
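For example, a property-path query against a public endpoint might look like this sketch in Python (the DBpedia endpoint and the specific predicates are assumptions for illustration; from .NET you could issue the same query through a library such as dotNetRDF):

from SPARQLWrapper import SPARQLWrapper, JSON

# Fetch resources related to the movie "Inception" through a small
# alternative of predicates; '|' is a SPARQL 1.1 property-path operator.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?related WHERE {
        dbr:Inception (dbo:starring|dbo:distributor|dbo:country) ?related .
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["related"]["value"])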