Using LMDB to implement an SQLite-like relational database, relevant resources? - sqlite

For educational reasons I wish to build a functional, full, relational database. I'm aware LMDB was once used as a storage backend for SQLite, but I don't know C. I'm on .NET and I'm not interested in just duplicating a "traditional" RDBMS (so, for example, I'm not worried about implementing a SQL parser; I'm building my own custom scripting language), but in exposing the full relational model.
Consider this question similar to "How do I implement a programming language on top of LLVM?" before worrying about why I'm not using SQLite or similar.
From the material I've read, LMDB looks great, especially because it provides transactions and reliability, plus the low-level plumbing. How that translates to changes that could touch several rows across several tables is another question.
Is there material that explains how a relational layer is implemented on top of something like LMDB? Is using LMDB (or its competitors) optimal enough, or is there a better way to get results?
Is it possible to use LMDB to store other structures like hashtables, arrays, and (the one I'm most interested in for a columnar database) bitmap arrays, i.e., similar to Redis?
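LMDB itself only stores sorted byte-string keys and values inside ACID transactions, so "tables", secondary indexes and bitmaps are conventions layered on top of named sub-databases. A minimal sketch of that idea, assuming the py-lmdb Python binding (the database names, key layouts and values here are invented for illustration; .NET bindings expose the same concepts):

import lmdb  # py-lmdb binding, assumed here purely for illustration

# One environment, several named sub-databases acting as "tables"/structures.
env = lmdb.open("/tmp/mini-rdbms", max_dbs=8, map_size=2**30)
users = env.open_db(b"users")                   # hashtable-style: id -> serialized row
users_by_name = env.open_db(b"users_by_name")   # secondary index: name -> id
flags = env.open_db(b"flags")                   # bitmap-style: column name -> packed bytes

# A single write transaction can touch several sub-databases: after commit,
# either every put below is visible or none of them is.
with env.begin(write=True) as txn:
    txn.put(b"user:42", b'{"name": "ada", "age": 36}', db=users)
    txn.put(b"ada", b"user:42", db=users_by_name)
    txn.put(b"is_admin", bytes([0b00000100]), db=flags)  # one bit per user id

# Readers see a consistent snapshot for the lifetime of their transaction.
with env.begin() as txn:
    print(txn.get(b"user:42", db=users))

The multi-table atomicity the question asks about comes from the single write transaction; the row, index and bitmap layouts are entirely up to the layer you build above it.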
P.S.: Is there a forum or another place to talk more about this subject?

I had this idea too. You should realize that this is tons of work and most likely no one will care. I haven't built a full-blown relational DB, as that is crazy to do for one person. You can check it out here.
Anyway, I've used LevelDB (and later RocksDB), so you have key-value pairs sorted by key, the ability to get a value by key, to iterate over keys, atomic writes of many values (WriteBatch), and a consistent view of the data at a given point in time (snapshots). These features are enough to build correct thread-safe reading of table rows (using snapshots), correct all-or-nothing writing of data and related indexes (using WriteBatch), and even transactions.
Each column has its own on-disk index - keys sorted by values - so you can efficiently do various operations on it, plus the keys with the values themselves, so you can efficiently read the values for a given id.
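For illustration, here is a minimal sketch of that layout in Python, assuming the plyvel LevelDB binding (the key naming scheme, table and column names are invented; RocksDB bindings work analogously):

import plyvel  # LevelDB binding for Python, assumed for this sketch

db = plyvel.DB("/tmp/mini-table", create_if_missing=True)

def insert_person(person_id, name, city):
    # One atomic WriteBatch: the row itself plus one index entry per column,
    # so readers never see a row without its index entries (or vice versa).
    with db.write_batch() as wb:
        wb.put(f"row:person:{person_id}".encode(), f"{name}|{city}".encode())
        wb.put(f"idx:person:name:{name}:{person_id}".encode(), b"")
        wb.put(f"idx:person:city:{city}:{person_id}".encode(), b"")

insert_person(1, "alice", "oslo")
insert_person(2, "bob", "berlin")

# "SELECT id FROM person WHERE city = 'oslo'" becomes a prefix scan over the
# index, done against a snapshot so the view stays consistent while iterating.
snap = db.snapshot()
for key, _ in snap.iterator(prefix=b"idx:person:city:oslo:"):
    print(key.decode().rsplit(":", 1)[-1])

A real implementation needs a more careful key encoding (fixed-width integers, escaped separators) so that the lexicographic order of the keys matches the order you actually want to scan in.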
This setup is efficient for writing and reading, using the available operations, on tables with little data (say, fewer than a million rows). However, if a table grows, iterating over many keys can become slow. To solve this, and to add a GROUP BY statement, I decided to add in-memory indexes, but that's another story. So, all in all, it might be a fun idea, but in reality it is a lot of work with often frustrating results - why would you want to do that?

Related

Firebase: Is it a good idea to use dimension / fact table design in NoSQL

So I was wondering: the one downside I see to NoSQL is that if my front-end app ever drastically changes, I would have a horrible time remodeling my database. This is because NoSQL is designed with the front end in mind first. So if the front end changes, the back end changes (at least that is the general idea).
So my idea is: would it be smart to store all my ORIGINAL/PURE copies of documents in multiple root collections, and then create "views" collections, which are the collections my app will actually call? What I like about this is that my data is always "SQL" at the root if I ever need to change my front end, but my "views" are what my app will use.
This is a lot like the dimension/reference table and fact table design people use.
The big reason, once again, for this idea: if my front end changes drastically, I would otherwise need to do serious work converting these "views" to other "views". Whereas with my idea, you would just delete your old "views" and create new "views" using your "SQL"/"root" reference tables.
Do I make sense? :) I have not used NoSQL (but I am building something with it now, so my brain is still battling with SQL to NoSQL haha). So if this is a "dude, don't worry about it" case, then you can give that as an answer as well haha
Yup, that is indeed a fairly common approach. In recent answers about NoSQL data modeling I started calling this out explicitly:
Make sure you have a single point of definition for each entity/value.
Make sure all other occurrences of that same value are derived from #1.
With these two in mind, fanning out/duplicating the data is a fairly straightforward process (literally, as it's unidirectional), and it can easily be redone by wiping the derived data and rerunning the fan-out process.
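As a rough sketch of what such a rerunnable fan-out can look like with the firebase-admin Python SDK (the collection and field names are invented for the example; the same pattern works from any Firestore client):

from firebase_admin import firestore  # assumes firebase_admin.initialize_app() was already called

db = firestore.client()

def rebuild_product_views():
    # 1. Wipe the derived "views" collection.
    for doc in db.collection("product_views").stream():
        doc.reference.delete()
    # 2. Re-derive every view document from its single point of definition
    #    in the root collection.
    for doc in db.collection("products").stream():
        source = doc.to_dict()
        view = {"title": source.get("title"), "price": source.get("price")}
        db.collection("product_views").document(doc.id).set(view)

Because the derivation only ever flows from the root collection to the views, rerunning this after a front-end change is safe: nothing in the root data is touched.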
Some good pointers to learn more about NoSQL data modeling:
NoSQL data modeling
Getting to know Cloud Firestore
And these previous questions:
Maxing out document storage in Firestore
How to write denormalized data in Firebase
Understanding NoSQL CRUD calls

How to deal with complex database using an ORM on Android?

I can't figure out how to deal properly with complex databases using an ORM on Android. I tried to find an open-source project to see how it works, but I can't find one that suits what I'm looking for...
I learned about relational databases some years ago and worked on SQL Server and Oracle databases, huge ones. The first thing I learned when designing a database is to avoid storing the same data several times. The second thing I learned is to never do in code what you can do with SQL. So I'm facing several problems with Android and ORMs, since it looks like you absolutely have to use an ORM on Android to be a good developer...
Let's take an example and say we have 100 buildings with 50 people in each of these buildings, and all buildings have a different address. I want to get all people with their building address. I can't put this in one table or the same strings will exist many times in the database: since there are 50 people at each address, using only one table I would have the same address string 50 times per building, so I create another table with only buildings and make a relationship between these two tables. This is a trivial case, but I have seen many Android apps storing the same data many times in one or two tables... What is the point of using a relational database if you replicate data?
Again, this is a simplistic example, but when you have 20 or 30 tables with complex relationships, ORM-style queries can quickly become unreadable compared to SQL. Moreover, not all SQL join types are generally supported by ORMs. So you fall back to raw SQL queries, but then what is the point of using an ORM, since you can't use the object mapping: you're not returning a table you can map to a class but the result of a query... or maybe there is something I didn't understand. What is the point of using an ORM if you don't use the relational object-mapping advantage, or if it makes the queries hard to maintain?
I have also seen a lot of code where the ORM is used to fetch data from several tables and then the filtering and joining is done in code... What is the point of using a relational database if you have to do this in code? A few years ago, doing this was seen as the worst thing to do... but now I see it so often on Android...
So another solution is to create a view in the database and map my object to this view. I can use the power of SQL and the power of the ORM's relational object mapping. But several ORMs don't support views, like greenDAO, which is one of the most used ORMs today as far as I know...
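To make the two-table design and the view idea concrete, here is a minimal sketch in plain SQLite (shown through Python's sqlite3 module rather than a specific Android ORM; the table, column and view names are invented):

import sqlite3

# The address lives exactly once in `building`; a view does the join so the
# app (or an ORM that supports views) can read one flat result set.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                           building_id INTEGER NOT NULL REFERENCES building(id));
    CREATE VIEW person_with_address AS
        SELECT p.id, p.name, b.address
        FROM person p JOIN building b ON b.id = p.building_id;
""")
con.execute("INSERT INTO building (id, address) VALUES (1, '1 Main Street')")
con.executemany("INSERT INTO person (name, building_id) VALUES (?, 1)",
                [("alice",), ("bob",)])
print(con.execute("SELECT * FROM person_with_address").fetchall())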
All the examples I can find here and there either don't deal with complex databases or show these kinds of bad practices. Or at least these were considered bad practices for years... has that changed?
So what's the best way to deal with "complex" databases on Android ?

What's a good strategy to move data from a SQL database to Google NDB?

Hello there fellow netizens,
I have a SQL database (about 600MB big) that I want to import into my GAE app. I know that one possibility would be to simply use Google Cloud SQL, but I'd rather have the data available in NDB to get the benefits thereof. So I'm wondering: how should I think about converting the SQL schema into an NDB schemaless structure? Should I simply set up Kinds to mirror each table? How ought I deal with foreign keys that relate different tables?
Any pointers are greatly appreciated!
- Lee
How should I think about converting the SQL schema into an NDB schemaless structure?
If you are planning to transfer your SQL data to the Datastore, you need to think about how different these two systems are.
Should I simply set up Kinds to mirror each table?
In thinking about making this transfer, simple analogies like this will only get you so far. Thinking SQL on a schemaless DB can get you into serious trouble due to the differences in implementation, even if at first it helps to think of a Kind as a table, Entity properties as columns, etc. In short, no, you should not simply set up Kinds to mirror each table. You could, but it depends on what kind of operations you want to support on these entities, how often these operations will occur, what kind of queries your system relies on, etc.
How ought I deal with foreign keys that relate different tables?
Honestly, if you're looking to use MySQL-specific features like foreign keys, your data model will require a lot of rethinking. A "foreign key" can be as little as maintaining, in an Entity of one Kind, a key reference to an Entity of the other Kind.
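A minimal sketch of that key-reference pattern with Python NDB (the Kinds and property names are made up for illustration):

from google.appengine.ext import ndb

class Building(ndb.Model):
    address = ndb.StringProperty()

class Person(ndb.Model):
    name = ndb.StringProperty()
    # The "foreign key": just a reference to a Building entity's key.
    building_key = ndb.KeyProperty(kind=Building)

# Writing the relationship:
b_key = Building(address="1 Main Street").put()
Person(name="alice", building_key=b_key).put()

# Following the "foreign key" is an explicit get, not a JOIN:
person = Person.query(Person.name == "alice").get()
building = person.building_key.get()

There is no referential integrity behind this: nothing stops the referenced Building from being deleted, so that discipline moves into your application code.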
I would suggest that you stick with Cloud SQL if your data storage solution is already built in SQL, unless you are willing to A) rethink your whole data model, B) implement the new data model, C) transfer the data you currently have, and D) re-code everything that interacts with data storage (unless you're using an ORM, in which case your life might be easier for this aspect).
Depending on how complex your SQL DB is, how much time you feel it will take to migrate to the Datastore, and how much time/brainpower you are willing to commit to learning a new system and new ways of thinking, you should either stick with SQL or do the above steps to rebuild your data storage solution.

How to store big data? [closed]

Suppose we have a web service that aggregates 20,000 users, and each one of them is linked to 300 unique user data entities containing whatever. Here's a naive approach to designing an example relational database that could store the above data:
Create table for users.
Create table for user data.
And thus, the user data table contains 6,000,000 rows.
Querying tables that have millions of rows is slow, especially since we have to deal with hierarchical data and do some uncommon computations quite different from SELECT * FROM userdata. At any given point we only need a specific user's data, not the whole thing - getting it is fast - but we have to do weird stuff with it later. Multiple times.
I'd like our web service to be fast, so I thought of the following approaches:
Optimize the hell out of the queries, do a lot of caching, etc. This is nice, but these are just temporary workarounds. When the database grows even further, these will cease to work.
Rewrite our model layer to use NoSQL technology. This is not possible due to the lack of relational database features, and even if we wanted this approach, early tests made some functionality even slower than it already was.
Implement some kind of scalability. (You hear a lot about cloud computing nowadays.) This is the most wanted option.
Implement some manual solution. For example, I could store all users with names beginning with the letters "A..M" on server 1, while all other users would belong to server 2. The problem with this approach is that I would have to redesign our architecture quite a lot, and I'd like to avoid that.
Ideally, I'd have some kind of transparent solution that would allow me to query a seemingly uniform database server with no changes to the code whatsoever. The database server would scatter its table data across many workers in a smart way (much like database optimizers do), thus effectively speeding everything up. (Is this even possible?)
In both cases, achieving interoperability seems like a lot of trouble...
Switch from SQLite to Postgres or an Oracle solution. This isn't going to be cheap, so I'd like some kind of confirmation before doing it.
What are my options? I want all my SELECTs and JOINs with indexed data to be real-time, but the bigger the userdata gets, the more expensive the queries become.
I don't think you should use NoSQL by default just because you have that amount of data. What kind of issue are you expecting it to solve?
IMHO this depends on your queries. You haven't mentioned any kind of massive writing, so SQL is still appropriate so far.
It sounds like you want to perform queries using JOINs. These can be slow on very large data even with appropriate indexes. What you can do is lower your level of decomposition and simply duplicate the data (so it all sits in one database row and is fetched together from the hard drive). If you are concerned about latency, avoiding joins is a good approach. But it still does not rule out SQL, since you can duplicate data in SQL as well.
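A tiny sketch of that trade-off in SQLite terms (the table and column names are invented): the per-user fields are copied into every userdata row, so fetching one user's 300 entities is a single indexed scan with no JOIN, at the cost of the duplication.

import sqlite3

con = sqlite3.connect(":memory:")
# Denormalized: fields that would normally live in a separate, joined users
# table are duplicated into each userdata row.
con.execute("""
    CREATE TABLE userdata (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        user_name TEXT    NOT NULL,   -- duplicated from the users table
        payload   TEXT    NOT NULL
    )
""")
con.execute("CREATE INDEX idx_userdata_user ON userdata(user_id)")
# One query, no JOIN:
rows = con.execute("SELECT user_name, payload FROM userdata WHERE user_id = ?",
                   (42,)).fetchall()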
Significant for your decision-making should be the structure of your queries. Do you want to SELECT only a few fields within your queries (SQL), or do you always want to fetch the whole document (e.g. Mongo & JSON)?
The second significant criterion is scalability, as NoSQL often relaxes the usual SQL guarantees (settling for eventual consistency, for example), so it can provide better results by scaling out.

Is it possible to reference custom code in the WHERE clause of Neo4j's Cypher Query Language?

Is it possible to use Neo4j's Cypher query language (or another declarative language) but still reference custom code snippets (for instance, to do custom WHERE clauses based on, say, the result of an ElasticSearch/Lucene search)?
If other graph DBs have declarative languages that support this, please shoot. I'm in no way bound to Neo4j.
Background:
I'm doing some research on whether to include Neo4j in my current stack, whose backend already consists of ElasticSearch, MongoDB and Redis.
Particularly with Redis' fast set-intersection capability, I could potentially create some crude graph-like querying (although likely not as performant as a graph DB). I'm a long way into defining a DSL, with the types of queries to support.
However, I'm designing a CMS, so the content types, and the relationships between these content types that I would like to model with a graph, are not known beforehand.
Therefore, the ideal case of populating the needed Redis collections (with Mongo as the source) to support all my querying, based on content types and their relationships that are not known at design time, will be messy to say the least. I hope you're still following.
This leads me to conclude that another solution may be needed, which is why I'm looking at graph DBs, and Neo4j in particular. (If others are potentially better suited to my use case, do shoot.)
If you model your content types as nodes, you don't need to know them beforehand.
User-defined functions in JavaScript are planned for Cypher later this year.
You can use a language like Gremlin to declare your functions in Groovy, though.
You can store the node IDs in Redis and then pass an array of IDs returned by Redis to a Cypher query for further processing:
start n=node({ids})
match n-[:HAS_TYPE]->content_type<-[:HAS_TYPE]-other_content
return content_type, count(*)
order by count(*) desc
limit 10
parameters: {"ids": [1,2,3,5]}
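A rough sketch of that Redis-to-Cypher hand-off from Python (connection details and key names are invented; the query above uses the old start n=node({ids}) syntax, so the sketch rewrites the same idea in the $ids parameter style used by current drivers):

import redis
from neo4j import GraphDatabase  # official Python driver; py2neo would look similar

r = redis.Redis()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Node IDs precomputed elsewhere (e.g. a Redis set intersection).
ids = [int(x) for x in r.smembers("matching:node:ids")]

query = """
MATCH (n) WHERE id(n) IN $ids
MATCH (n)-[:HAS_TYPE]->(content_type)<-[:HAS_TYPE]-(other_content)
RETURN content_type, count(*) AS uses
ORDER BY uses DESC
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query, ids=ids):
        print(record["content_type"], record["uses"])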

Resources