I want to shard OrientDB to run on multiple servers, but I couldn't find a proper way to do it.
How can I do it?
Sharding will be provided in 2.0.0, planned for December 2012/January 2013. Right now you can have multiple distributed databases.
Related
I have a question regarding some NoSQL databases. Ehcache, for example, exposes the JCache API, MapDB exposes the Map interface, and Riak KV runs as its own server process in a cluster. How exactly do I find out which implementation type a given database has? For example, RocksDB (I assume it runs as a process), and likewise LevelDB.
For reference, RocksDB and LevelDB perform very similar functions and can be interchangeable in some situations.
As for your question "Are RocksDB and LevelDB just like Riak?", they are not the same: Riak provides a scalable distributed platform that can connect to one or more backend databases simultaneously (currently supported backends are Bitcask, LevelDB, Leveled and memory). RocksDB and LevelDB are essentially stand-alone databases that can be used on their own or utilised by other software, such as Riak, as a backend. While you could technically implement RocksDB as a backend for Riak KV without a mountain of custom code, you probably wouldn't want to, as RocksDB does not scale well.
"How exactly do I find out which database fits which implementation type?" is rather a broad question. You might want to rephrase it as "Which databases offer me {my list of desired implementations/functions}?" to make it easier for community members to answer. Please note that some NoSQL databases offer several features. In Riak KV, which you mentioned, we have Maps, Sets, GSets, Flags, Registers, Solr Search, 2i and the standard CRDT options, but some of those are tied to other requirements: 2i only works with a LevelDB/Leveled backend, and Solr Search requires the Yokozuna package for Riak KV 3.0.0 and above but is built in for all Riak 2.x.x versions.
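To make the 2i restriction concrete, here is a minimal sketch of an exact-match secondary-index query over Riak's HTTP interface. It assumes Python 3 and a Riak node serving HTTP on the default port 8098; the bucket `users` and index `email_bin` are hypothetical names. The `_bin` suffix marks a string index (`_int` marks an integer one). Against a Bitcask-backed bucket the same request should be rejected, which is the backend requirement mentioned above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def secondary_index_url(base_url, bucket, index, value):
    """Build an exact-match 2i query URL: /buckets/<bucket>/index/<index>/<value>."""
    return "{}/buckets/{}/index/{}/{}".format(
        base_url.rstrip("/"),
        quote(bucket, safe=""),
        quote(index, safe=""),   # e.g. "email_bin" (string) or "age_int" (integer)
        quote(str(value), safe=""),
    )


def query_secondary_index(base_url, bucket, index, value):
    # Needs a running node whose bucket uses a LevelDB or Leveled backend.
    with urlopen(secondary_index_url(base_url, bucket, index, value)) as resp:
        return json.load(resp)["keys"]  # Riak answers with {"keys": [...]}
```

For example, `query_secondary_index("http://localhost:8098", "users", "email_bin", "ann@example.com")` would return the keys of all objects tagged with that email address.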
What you may also want to try to do is download a few different options to a VM or bare metal rig, have a play and see how it works out. There are often cases where two competing products do something very similar on paper but in your specific use case, one outperforms the other significantly.
To get you started, here are links to Riak 2.9.8 (the latest release of the 2.x.x series) and to the Riak 2.2.6 docs (the 2.9.x docs should be out later this month).
I'm not sure if this has directly answered your question but, hopefully, it will give you some pointers as to where to go next.
I am just trying to figure out whether we can join different versions of Riak KV in one cluster.
I currently run a 5-node cluster of riak-1.4.7. Can I join riak-2.0.x nodes to the same cluster? If so, how will the data be transferred to the new node running a different version?
I searched the official documentation, but I couldn't find what I am looking for.
Yes, you can. Some versions are compatible with each other, which is, for instance, how you can migrate to a new version of Riak without any downtime. Have a look at the migration documentation (for instance http://docs.basho.com/riak/kv/2.2.3/setup/upgrading/version/).
In your case, you have a 1.4.7 cluster and you want to join a 2.0 node to it. I'm pretty sure that it'll work fine, as a lot of backward-compatibility work has been done. Once that's done, though, you might want to think about upgrading all the nodes to the latest version.
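The join itself uses Riak's staged clustering commands: you stage a join from the new node, review the plan, then commit it. Once the plan is committed, the cluster reassigns some partitions to the new node and Riak's handoff process streams their data across, which is how the data reaches the new node. The sketch below is a minimal wrapper, assuming Python 3 and `riak-admin` on the new node's PATH; the node name `riak@10.0.0.1` is hypothetical. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess


def join_commands(existing_node):
    """Return the riak-admin steps to run on the NEW node to join a cluster."""
    return [
        ["riak-admin", "cluster", "join", existing_node],  # stage the join request
        ["riak-admin", "cluster", "plan"],                 # review the staged changes
        ["riak-admin", "cluster", "commit"],               # apply them; handoff starts
    ]


def join_cluster(existing_node, dry_run=True):
    for cmd in join_commands(existing_node):
        if dry_run:
            print(" ".join(cmd))             # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

While handoff is running, `riak-admin transfers` reports pending partition transfers and `riak-admin member-status` shows how ring ownership is shifting toward the new node.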
I am planning to use Titan Graph DB for my project.
I chose it because it is the only graph database that can use DynamoDB as its storage backend, which would free me from scalability/throughput worries.
But when I try to find tutorials to get started with Titan, I don't find many. This makes me doubt whether to use Titan or choose another graph database, like Neo4j or OrientDB.
Can someone tell me whether Titan is being used widely?
Is the community active?
Can I expect proper releases?
The last blog post on ThinkAurelius is dated Feb 3, 2015 and concerns the acquisition by DataStax. The DataStax website makes no mention of Titan.
Your question is more of an opinion question which might be suited for a different forum, but I will attempt to answer the main questions you stated anyway.
Is Titan being used widely? This is hard to tell since Aurelius doesn't disclose much information publicly about their users other than their client list. The Amazon Fulfillment gave a session on their usage of Titan with DynamoDB. This blog post identifies NASA's Jet Propulsion Laboratory and AdAgility as users also.
Is the community active? Somewhat. There are discussions occurring on the Titan mailing list and new messages come in every day. Commits are being made on the titan11 stream in GitHub. The most recent commit was on April 4, 2016.
Can I expect proper releases? "Expecting" releases of software is not something I would recommend in general. Software (both open source and proprietary) is released when the maintainers think the code is ready. For Titan specifically, the core maintainers have largely been absent from the community because they are busy working on DataStax Enterprise Graph.
A few months back I saw Teradata Express Edition. I have no idea about this beast. I want to know whether it still comes in an Express Edition and whether it is a good idea to use this database for mid-sized Windows-based apps.
It really depends on what you want to do with this application. From an application perspective, a great weakness of Teradata is that it does not support read committed transaction isolation. If you are attempting to use Teradata as an OLTP database, then you might want to try something else. If you are using it to crunch numbers, then yes, go with it. The one issue is that Teradata Express Edition is not supported that well. Express edition is essentially a snapshot of the database for a certain release. If you find and report a bug, it will take a long time for you to receive a fix. Teradata only releases the express editions once per db release. However (imo), if you buy the real version, you will receive a pretty quick patch which will be rolled into the database software.
I use teradata in my technical support work. I work on database middleware, and Teradata is one of our supported data stores. Define mid-sized? 1-200 transactions per second? I'd stand ANY commonly used database up against that. 10000 tps? Maybe not - maybe you go to the enterprise edition.
I am interested in building a portal on Cassandra, since I faced some performance and scaling issues starting at about 1 million records.
It could definitely be solved, but I am interested in other options.
My main issue is the cost of updating all the indexes needed to make reads fast.
First, is Cassandra a good choice for ASP.NET programmers? I mean, maybe there are other projects worth taking a look at.
And second, can you point me to any documentation or samples on how to start Cassandra programming from C#?
since I faced performance and scale issues starting from 1 million of records.
Maybe your design was not that good; NoSQL is not a magic bullet for bad design. I have multi-billion-row tables and 95% of responses are sub-second. Also, what do you mean by updating indexes: do you mean updating statistics or rebuilding indexes?
since I faced performance and scale issues starting from 1 million of records.
You know, for a modern database one million rows is just past the point where a table is so "totally ridiculously small" that you can get away without actually knowing what you are doing. Below one million is "tiny". I have an 800 million row table and run a LOT of SQL against it, with no problem at all.
First, is cassandra is good way for asp.net programmers?
I would rather suggest a basic book about SQL, reading the documentation and POSSIBLY throwing some hardware at the problem. Truly bad hardware will kill any data management system.
If you are using Cassandra for your .NET Application take a look at Aquiles. I developed it based on my company needs. If you find it useful or need any help let me know.
There isn't really any Cassandra documentation to speak of, just a myriad of partial tutorials on the web.
You may want to set up Linux in a virtual machine, because the Windows build process is quite challenging, to say the least. (http://www.virtualbox.org, http://www.ubuntu.com)
Here's the howto:
http://www.ridgway.co.za/archive/2009/11/06/net-developers-guide-to-getting-started-with-cassandra.aspx
Note that the Cassandra SVN URL and the code sample have changed since this tutorial was written.
Here's another C# client:
http://github.com/mattvv/hectorsharp
And here some sample code:
http://www.copypastecode.com/26752/
Note that you need to download the latest Java Development Kit (JDK) from Sun for Linux.
It's not in the repositories of Ubuntu 10.04.
Then you need to type
export JAVA_HOME="/path/to/jdk"
in order for Cassandra to find your Java installation.
You might also want to take a look at:
http://en.wikipedia.org/wiki/NoSQL
The taxonomy section is especially interesting.
Make sure Cassandra is the right type of NoSQL solution for your problem, e.g. use Neo4J if your problem actually is a graph problem.
Also, if your application needs transactional guarantees, you need to make sure your NoSQL solution is ACID-compliant.
For example, Neo4J is an ACID-compliant NoSQL graph engine.
Edit: Here are some jump-start guides for Windows that don't require compiling:
http://coderjournal.com/2010/03/cassandra-jump-start-for-the-windows-developer/
http://www.ronaldwidha.net/2010/06/23/running-cassandra-on-windows-first-attempt/
http://www.yafla.com/dforbes/Getting_Started_with_Apache_Cassandra_a_NoSQL_frontrunner_on_Windows/
Instead of Cassandra, you might take a look at RavenDB. It is supposedly a document store built with and for .NET. It has LINQ integration and is (again, supposedly) very fast.
As with any new technology, check whether it helps with your specific case and whether it is proven technology (do they have mainstream clients using it?).
Before you go down this route, see whether you can optimize your current solution first. Check that your queries are fast, that the indexes are set up correctly, and whether you can remove load by adding caching.
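As an illustration of the caching option, here is a minimal in-process sketch of cache-aside with a time-to-live: look in the cache first, fall back to the database on a miss, and invalidate after writes. It assumes Python 3; the `loader` callback stands in for whatever database query you already have and is hypothetical.

```python
import time


class TTLCache:
    """Cache-aside helper: serve repeated reads from memory for `ttl` seconds."""

    def __init__(self, loader, ttl=60.0, clock=time.monotonic):
        self._loader = loader    # function that actually hits the database
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # fresh entry: no database call
        value = self._loader(key)              # miss or expired: query the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after writing `key` so readers don't see stale data.
        self._entries.pop(key, None)
```

This version is not thread-safe and never evicts entries except through expiry or invalidation. A web application would normally use its framework's cache instead, but the read path and the invalidate-on-write rule are the same.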
Last but not least, if adding some processors to your SQL machine fixes the issues, that is typically a much cheaper solution.
If you want to do something new, then instead of going for NoSQL, you might want to consider trying a database cluster.
The idea is that when two machines each search half of the original database at the same time, you get roughly half the search time without totally redesigning your existing database.
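The split-the-search idea is a scatter-gather query: send the same search to every partition concurrently, then merge the partial results. Below is a minimal sketch, assuming Python 3, where each in-memory list is a hypothetical stand-in for the share of rows held by one database node.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(rows, predicate):
    # Stand-in for the query one database node runs over its own half.
    return [row for row in rows if predicate(row)]


def scatter_gather(partitions, predicate):
    """Run the same search on every partition at once and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(search_partition, p, predicate) for p in partitions]
        results = []
        for future in futures:          # collect in partition order
            results.extend(future.result())
    return results


rows = list(range(10))
halves = [rows[:5], rows[5:]]           # each "machine" holds half of the table
matches = scatter_gather(halves, lambda r: r % 3 == 0)
```

With real database nodes each partition search would be a network query, so the waits overlap. In this in-memory stand-in, CPython's GIL means the threads show the structure rather than an actual speedup. The "half the search time" estimate also assumes the data is split evenly and that merging the results is cheap.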