Are there any NoSQL standards emerging? [closed]

As with most new technologies, after a while a standard emerges.
Is there anything cooking for NoSQL?

The whole point of NoSQL is that there are no standard solutions. Every data storage problem is different, and you need to choose the data storage technology that is appropriate for your specific problem and not the one that is "the standard".
That's the whole premise of "Not Only SQL".
Take ACID (here's a piece of advice you never thought you'd get on StackOverflow, or really anywhere after 1987 :-) ), for example. There is a wide array of problems which don't need ACID guarantees. For those problems, ACID is overkill. Overkill that translates into wasted I/O, wasted CPU cycles, wasted performance. Which means wasted heat and wasted energy, which in turn means wasted money on electrical and utility bills.
Some problems only need weaker forms of those guarantees. For example, for a wide array of web applications, so-called eventual consistency is plenty. Other problems need stronger guarantees than SQL-style ACID provides.
So, some NoSQL databases don't have ACID guarantees, or only have them in a weaker form. Some can turn them on and off on a per-DB basis. Some can turn A, C, I and D on and off individually on a per-DB basis. Some can not only turn A, C, I and D on and off individually, they can fine-tune them on a sliding scale. Some can even do that on a per-query basis.
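To make that concrete, here is a minimal sketch of per-operation tuning, using MongoDB's write concerns through the pymongo driver (just one system that exposes such a dial; the database, collection, and field names are made up):

```python
# A minimal sketch, assuming a reachable MongoDB instance and pymongo installed.
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client["metrics"]

# Weak guarantees: unacknowledged writes, acceptable for lossy telemetry.
fast = db.get_collection("events", write_concern=WriteConcern(w=0))
fast.insert_one({"sensor": 42, "value": 3.14})

# Strong guarantees: wait for a replica-set majority and the on-disk journal.
safe = db.get_collection("events", write_concern=WriteConcern(w="majority", j=True))
safe.insert_one({"sensor": 42, "value": 3.14})
```

The same collection is addressed both times; only the durability dial moves, per operation.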
If you have hierarchical data, store it in a hierarchical database. If you have graph data, store it in a graph database. If you have key-value data, store it in a key-value database. If you have semi-structured document data, store it in a document database. If you have semantic RDF data, store it in a triple database. If you build a data warehouse, store it in a column database. And if you have relational data, then, by all means store it in a relational database. (But only if you actually have relational data!)

There is no single standard NoSQL solution, as Jörg explained (+1). The term NoSQL covers a wide array of database types, each tailored for a specific data domain.
Ayende's That No SQL Thing series takes a look at some of the mainstream NoSQL solutions and highlights the strengths and weaknesses of each type. He discusses the following:
Key/value stores
Column-family stores
Document databases
Graph databases
You can think of these different types as standards within NoSQL. Just remember that each of them is specialized for certain data storage problems. There's no "one size fits all" solution: all of them will continue to exist.

A query language for JSON, semi-structured and document databases called UnQL is being developed:
http://www.unqlspec.org/display/UnQL/Home

Some people have contemplated standards for document DBs: http://nosql.mypopescu.com/post/731261002/a-common-nosql-query-language .
However, key-value stores and document DBs don't do joins, which means their query languages are simple and easy to learn; there is less need for a common language like SQL.
That said, .NET developers can use LINQ to access the document DBs MongoDB and RavenDB, and some people are developing a LINQ provider for the document DB CouchDB: http://github.com/sinesignal/ottoman . LINQ isn't a NoSQL standard but a standard for everything related to data; you can use it to talk to a relational database or an XML file too.
Graph databases are very different from key-value stores and document DBs. I don't think you can unite them under one standard. I really don't know whether it is possible to develop a LINQ provider for a graph database; I guess not, but I'm not sure.

Some NoSQL products support SQL or a superset of it. This is the case with OrientDB, a document-graph NoSQL DBMS with SQL support. It's released under the Apache 2 license.
Furthermore, it can export documents in JSON format (you can export/import the entire database as JSON). Other NoSQL products also read/write JSON.

(Speaking specifically about the subset of NoSQL known as document databases.)
Many document databases do not expose a "query language". Instead, they often provide query APIs, and these APIs are specific to the implementation and controlled by its individual sponsors/owners (10gen for MongoDB, for example).
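For example, MongoDB's query API (shown here through the pymongo driver; the collection and field names are invented) expresses a query as plain data structures handed to the driver, not as text in a query language:

```python
from pymongo import MongoClient

client = MongoClient()
posts = client["blog"]["posts"]

# The "query" is just nested dicts: a filter document and a projection document.
cursor = (
    posts.find(
        {"author": "lee", "score": {"$gt": 10}},  # filter
        {"title": 1, "score": 1, "_id": 0},       # projection
    )
    .sort("score", -1)
    .limit(5)
)

for doc in cursor:
    print(doc["title"], doc["score"])
```

Nothing about this API is standardized across vendors; CouchDB, RavenDB and the others each expose their own, quite different, interfaces.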
In the XML database space (a subset of document databases), there is the W3C standard XQuery. It is a query and functional programming language designed for querying collections of XML data (says Wikipedia).
It is unclear yet whether there is any need/desire for a standard query API (or language) for JSON data. JSONPath (analogous to XPath) has been proposed, but it has received little attention other than its use by Kynetx.
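For the curious, here is what JSONPath looks like in practice, using the third-party jsonpath-ng Python package (one of several implementations; the sample document is invented):

```python
from jsonpath_ng import parse

doc = {
    "store": {
        "book": [
            {"author": "Nigel Rees", "price": 8.95},
            {"author": "Evelyn Waugh", "price": 12.99},
        ]
    }
}

# "$.store.book[*].author" is the JSONPath analogue of an XPath location path.
expr = parse("$.store.book[*].author")
print([match.value for match in expr.find(doc)])  # ['Nigel Rees', 'Evelyn Waugh']
```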

One potentially interesting option is AppScale, which provides a unified API for HBase, Hypertable, MySQL Cluster, Cassandra, Voldemort, MongoDB, MemcacheDB and Redis. The API is defined by Google for Google App Engine and is available for Java, Python and Go.

Related

Using LMDB to implement an SQLite-like relational database, relevant resources?

For educational reasons I wish to build a functional, full, relational database. I'm aware LMDB has been used as a storage backend for SQLite, but I don't know C. I'm on .NET, and I'm not interested in just duplicating a "traditional" RDBMS (so, for example, I don't worry about implementing a SQL parser, just the custom scripting language that I'm building), but I do want to expose the full relational model.
Consider this question similar to "How do I implement a programming language on top of LLVM?" before worrying about why I'm not using SQLite or similar.
From the material I've read, LMDB looks great, especially because it provides transactions and reliability, plus the low-level plumbing. How that translates to changes that could touch several rows in several tables is another question.
Does any material exist that explains how a relational layer is implemented on top of something like LMDB? Is using LMDB (or one of its competitors) good enough, or is there a better way to get results?
Is it possible to use LMDB to store other structures like hashtables, arrays and (the one I'm most interested in, for a columnar database) bitmap arrays, i.e., similar to Redis?
P.S.: Does a forum or other place exist to talk more about this subject?
I had this idea too. You should realize that this is a ton of work and most likely no one will care. I haven't built a full-blown relational DB, as that is crazy for one person to do. You could check it out here.
Anyway, I've used LevelDB (and later RocksDB), so you have keys and values sorted by key, the ability to get a value by key, iteration over keys, atomic writes of many values (WriteBatch), and a consistent view of the data at a given time (snapshots). These features are enough to build correct, thread-safe reading of table rows (using snapshots), correct all-or-nothing writing of data and related indexes (using WriteBatch), and even transactions.
Each column has its own on-disk index (keys sorted by values), so you can efficiently do various operations on it, and the keys with the values themselves, so you can efficiently read values for a given id.
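A rough sketch of that layout, using the plyvel LevelDB bindings (the key encoding here is my own invention, not the scheme described above):

```python
import plyvel

db = plyvel.DB("/tmp/toy-table", create_if_missing=True)

def put_row(row_id: bytes, name: bytes) -> None:
    # WriteBatch makes the row and its index entry land atomically: all or nothing.
    with db.write_batch() as wb:
        wb.put(b"row:" + row_id, name)                    # primary copy: id -> value
        wb.put(b"idx:name:" + name + b":" + row_id, b"")  # index: value -> id

put_row(b"0001", b"alice")
put_row(b"0002", b"bob")

# Because keys are sorted, the index supports range scans: all names starting with "a".
for key, _ in db.iterator(prefix=b"idx:name:a"):
    row_id = key.rsplit(b":", 1)[1]
    print(db.get(b"row:" + row_id))
```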
This setup is efficient for writing, and for reading with the available operations, on tables with little data (say, under a million rows). However, as a table grows, iterating over many keys can become slow. To solve this, and to add a group-by statement, I decided to add in-memory indexes, but that's another story. So, all in all, it might be a fun idea, but in reality it's a lot of work with often frustrating results. Why would you want to do that?

How well does UnQLite perform? How does it compare to SQLite (in performance)?

I've researched what I can about SQLite and UnQLite, but there are still a few things that haven't quite been answered. UnQLite appears to have been released within the past few years, which would account for the lack of benchmarks. "Performance" (read/write speed, querying, average database size before significant slowdown, etc.) comparisons may be somewhat apples-to-oranges here.
From all that I have seen, the two have very few differences, comparatively speaking, namely that SQLite is a relational database whereas UnQLite is a key-value-pair and document (via Jx9) database. They're both portable, cross-platform, and 32/64-bit friendly, and both allow single-write and multi-read connections. Very little can be found on UnQLite benchmarks, while SQLite has quite a few, with different implementations across various (scripting) languages. SQLite shows varied performance across in-memory databases, indexed data, and read/write modes with varying data sizes. Overall, SQLite appears quick and reliable.
All that I can find on UnQLite is unreliable and confusing; I cannot seem to find anything helpful. What read/write speeds does UnQLite seem to peak at? What languages are (not) recommended when using UnQLite? What are some known disadvantages and bugs?
If it helps to explain my intrigue: I'm developing a network utility that will read and process packets, with hot-swapping between network interfaces. Since the connections can, though unlikely, reach speeds up to 1 Gbps, a lot of raw data will be written out to a database. It's still in the early stages of development, and I'm having to find a way to balance performance. There are a lot of factors, such as missed packets, how large each write is, how quickly data can be processed and moved, how much organization will be required, how many tables will be needed, whether I can implement multiprocessing, how reliant each database is on HDD speeds, etc. My data will need tables, but whether or not I have to store them relationally is still in the air. Seeing how the two stack up with their own pros and cons (aside from the usual KVP-vs-relational debate) may push me towards either one or, if I'm crazy enough, a mix of both.
I've done a bit of fooling around with UnQLite using Python bindings I wrote. The bindings use Cython and are quite fast.
What I've found from my experimentation is that UnQLite's key/value APIs are pretty damn fast, comparable to other DBMs. Things slow down a bit when you start using Jx9 and the document store, though.
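For reference, the two modes look roughly like this through those Python bindings (an in-memory database; the data is made up):

```python
from unqlite import UnQLite

db = UnQLite()  # in-memory database

# Fast path: plain key/value access.
db["k1"] = "v1"
print(db["k1"])

# Slower path: the Jx9-backed document store.
users = db.collection("users")
users.create()
users.store([{"name": "alice"}, {"name": "bob"}])
print(users.fetch(0))  # first stored document, with its assigned __id
```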
Basically, it depends on what you need...
If you want SQL and ad-hoc querying, I'd suggest using SQLite. It is plenty fast and quite flexible.
If you want just keys and values, I'd use something like leveldb or rocksdb.
If you want a lightweight JSON document store, or key/value with a bit "extra", then UnQLite may be a good fit.

What's a good strategy to move data from a SQL database to Google NDB?

I have a SQL database (about 600 MB) that I want to import into my GAE app. I know that one possibility would be to simply use Google Cloud SQL, but I'd rather have the data available in NDB to get the benefits thereof. So I'm wondering: how should I think about converting the SQL schema into an NDB schemaless structure? Should I simply set up Kinds to mirror each table? How should I deal with foreign keys that relate different tables?
Any pointers are greatly appreciated!
How should I think about converting the SQL schema into an NDB schemaless structure?
If you are planning to transfer your SQL data to the Datastore, you need to think about how different these two systems are.
Should I simply set up Kinds to mirror each table?
In thinking about making this transfer, simple analogies like this will only get you so far. Thinking SQL on a schemaless DB can get you into serious trouble due to the differences in implementation, even if at first it helps to think of a Kind as a table, entity properties as columns, and so on. In short, no, you should not simply set up Kinds to mirror each table. You could, but it depends on what kinds of operations you want to support on these entities, how often these operations will occur, what kinds of queries your system relies on, etc.
How should I deal with foreign keys that relate different tables?
Honestly, if you're looking to use MySQL-specific features like foreign keys, your data model will require a lot of rethinking. A "foreign key" can be as little as maintaining a key reference to an entity of another Kind.
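A minimal sketch of that idea, using the (legacy) Python ndb API (the model names are made up):

```python
from google.appengine.ext import ndb

class User(ndb.Model):
    name = ndb.StringProperty()

class Invoice(ndb.Model):
    # The "foreign key": just a key reference to an entity of another Kind.
    owner = ndb.KeyProperty(kind=User)
    total = ndb.FloatProperty()

user_key = User(name="lee").put()
Invoice(owner=user_key, total=19.99).put()

# The "join" is done by hand: query by the reference, then follow it.
for inv in Invoice.query(Invoice.owner == user_key):
    print(inv.owner.get().name, inv.total)
```

Note there is no referential integrity here: nothing stops you from deleting the User while Invoices still point at its key.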
I would suggest you stick with Cloud SQL if your data storage solution is already built in SQL, unless you are willing to A) rethink your whole data model, B) implement the new data model, C) transfer the data you currently have, and D) rewrite all code that interacts with data storage (unless you're using an ORM, in which case your life might be easier for this aspect).
Depending on how complex your SQL DB is, how much time you feel migrating to the Datastore will take, and how much time/brainpower you are willing to commit to learning a new system and new ways of thinking, you should either stick with SQL or do the above steps to rebuild your data storage solution.

How to store big data? [closed]

Suppose we have a web service that aggregates 20,000 users, each of whom is linked to 300 unique user-data entities containing whatever. Here's a naive approach to designing an example relational database able to store the above data:
Create table for users.
Create table for user data.
And thus, the user data table contains 6,000,000 rows.
Querying tables with millions of rows is slow, especially since we have to deal with hierarchical data and do some uncommon computations, much different from SELECT * FROM userdata. At any given point we only need a specific user's data, not the whole thing; getting it is fast, but we have to do weird stuff with it later, multiple times.
I'd like our web service to be fast, so I've thought of the following approaches:
Optimize the hell out of queries, do a lot of caching, etc. This is nice, but these are just temporary workarounds: when the database grows even further, they will cease to work.
Rewrite our model layer to use NoSQL technology. This is not possible due to the lack of relational database features, and even though we wanted this approach, early tests made some functionality even slower than it already was.
Implement some kind of scalability. (You hear a lot about cloud computing nowadays.) This is the most desirable option.
Implement some manual solution. For example, I could store all the users with names beginning with the letters "A..M" on server 1, while all other users would belong to server 2 (a toy routing sketch follows this list). The problem with this approach is that I'd have to redesign our architecture quite a lot, and I'd like to avoid that.
Ideally, I'd have some kind of transparent solution that would allow me to query a seemingly uniform database server with no changes to my code whatsoever. The database server would scatter its table data across many workers in a smart way (much like database optimizers do), thus effectively speeding everything up. (Is this even possible?)
In both cases, achieving interoperability seems like a lot of trouble...
Switch from SQLite to a Postgres or Oracle solution. This isn't going to be cheap, so I'd like some kind of confirmation before doing it.
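To illustrate the manual sharding option above, here is a toy version of the kind of routing code I'd rather not write and maintain myself (the hostnames are placeholders; rebalancing when a shard fills up is the part that scares me):

```python
# Toy shard router: pick a server by the first letter of the user name.
SHARDS = {
    "server1": "postgresql://db1.example.internal/users",  # A..M
    "server2": "postgresql://db2.example.internal/users",  # everything else
}

def shard_for(username: str) -> str:
    first = username[:1].upper()
    return "server1" if "A" <= first <= "M" else "server2"

print(shard_for("Alice"))  # server1
print(shard_for("Zoe"))    # server2
```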
What are my options? I want all my SELECTs and JOINs over indexed data to run in real time, but the bigger the userdata gets, the more expensive the queries become.
I don't think you should use NoSQL by default with that amount of data. What kind of issue are you expecting it to solve?
IMHO this depends on your queries. You haven't mentioned any kind of massive writing, so SQL is still appropriate so far.
It sounds like you want to perform queries using JOINs. These can be slow on very large data even with appropriate indexes. What you can do is lower your level of decomposition and simply duplicate the data (so it all sits in one database row and is fetched together from the hard drive). If you are concerned about latency, avoiding joins is a good approach, but it still does not rule out SQL, since you can duplicate data in SQL too.
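A small sketch of that denormalization, using sqlite3 from the Python standard library (the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Instead of users JOIN userdata, duplicate the hot fields into one wide row.
conn.execute("""
    CREATE TABLE userdata_flat (
        user_id   INTEGER,
        user_name TEXT,   -- duplicated from the users table
        payload   TEXT    -- the entity itself, e.g. serialized JSON
    )
""")
conn.execute("CREATE INDEX idx_user ON userdata_flat(user_id)")
conn.execute(
    "INSERT INTO userdata_flat VALUES (?, ?, ?)",
    (1, "alice", '{"kind": "order", "total": 19.99}'),
)

# One indexed lookup, no join: everything arrives in a single row read.
print(conn.execute(
    "SELECT user_name, payload FROM userdata_flat WHERE user_id = ?", (1,)
).fetchall())
```

The price is the usual one: duplicated data must be kept in sync on writes.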
Significant for your decision should be the structure of your queries: do you want to SELECT only a few fields per query (SQL), or do you always want to fetch the whole document (e.g. Mongo & JSON)?
The second significant criterion is scalability: NoSQL often relaxes the usual SQL guarantees (settling, for example, for eventual consistency), and in exchange can provide better results when scaling out.

Comparing features in an ASP.NET web application using different database technologies

I have a webstore which sells components (it is an academic project), which looks like this. I have developed the same web application using the following database technologies:
MS SQL Server with stored procedures and SqlDataReader
LINQ to SQL
db4o using LINQ (client/server)
What features can I compare, apart from the technical and theoretical details, between a relational database and an object-oriented database?
It is my graduate/master's thesis final project. I want the features I compare to be practical and interesting, so that I can draw concrete and meaningful conclusions rather than abstract comparisons which don't create much interest and are hard to draw inferences from.
Here is a site that compares DALs; maybe you can get some ideas from what others think you can compare:
http://ormbattle.net/
Also, here is my first question on StackOverflow, in which I compared 4 DALs for speed and optimization:
Benchmark Linq2SQL, Subsonic2, Subsonic3 - Any other ideas to make them faster?
What features can I compare apart
In your case, I would try to compare speed, and whether a DAL gives you the same or more features than you can get without it. For example, can you run all the same queries that you can run directly with SQL, and what are the limitations?
Try creating some performance benchmarks and doing a side-by-side comparison of the three different DB technologies (these are technologies, not methodologies) for given types of queries.
