Couchbase vs Amazon DynamoDB

How can we compare Amazon DynamoDB to Couchbase? Which is better in terms of performance, scalability, availability, and fault tolerance?
Thanks for the help.
Regards,
Vinay

This is too generic and strategic a question to be a great fit for Stack Overflow, and the answer could go either way depending on the use case. However, here is what to consider while deciding.
Remember that Couchbase is NoSQL software you deploy and operate yourself, whereas DynamoDB is a NoSQL database offered as a managed service, so comparing them head-to-head may not be a great idea.
Couchbase can be hosted in-memory and be blazing fast if hosted on the local network inside the datacentre. For DynamoDB you have to host your application in EC2, otherwise network lag becomes a bottleneck. Most of the time you may not notice the lag, as they promise double-digit millisecond latency, i.e. under 100 ms. The advantage of DynamoDB is that you don't manage backup and recovery yourself. With Couchbase, if you host it yourself, upgrades and downtime can be painful: you have to take care of backups, data version compatibility, and rebalancing data across the nodes in the cluster.
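For illustration, here is a minimal sketch of reading an item from DynamoDB with boto3; the table name, key, and region are hypothetical, and it assumes AWS credentials are already configured. The point is that the service handles storage, replication, and backups, whereas with Couchbase you would also be operating the cluster itself:

```python
import boto3  # AWS SDK for Python

# Assumed: credentials are configured and a table named "users"
# with partition key "user_id" already exists in this region.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("users")

# Single-item read; DynamoDB manages storage, replication, and backups for you.
response = table.get_item(Key={"user_id": "vinay-42"})
print(response.get("Item"))
```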

Related

Keeping network architecture in mind, why is Redis Sentinel not a good architecture, while setting up an external ZooKeeper is good for production Solr?

I am confused, with my limited knowledge of Redis, ZooKeeper, and Solr.
Help me understand the network architecture of Redis Sentinel and ZooKeeper.
At a high level, Redis Sentinel and ZooKeeper look functionally similar: choosing masters and slaves, and monitoring them.
Redis Cluster was later introduced with a different architecture in which separate monitoring servers are not required; this is also cited as a drawback of Sentinel.
The Solr documentation says that in production it is good to set up an external ZooKeeper ensemble to manage Solr.
Can someone explain, at the network protocol/architecture level, why one approach is good and the other is not?
Update:
My question is not specific to Redis Sentinel or Solr; rather, it is about the architecture.
In Redis, keeping Sentinel outside was not really helping; it created unnecessary overhead, as Sentinel also has to be maintained on separate servers.
So they came up with Redis Cluster, where no external servers are required for monitoring or choosing the master/slaves.
In the case of Solr, although it ships with an embedded ZooKeeper, the best practice is to run an external ZooKeeper ensemble in production.
To me these two recommended practices look architecturally opposite.
Please help me understand, at an architectural level, why the external approach helps in the Solr case but not in the Redis case.
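For context, here is a minimal sketch (assuming redis-py and a Sentinel master group named "mymaster"; hosts are hypothetical) of how a client asks Sentinel for the current master. The Sentinel processes themselves run as separate servers, which is the maintenance overhead mentioned above:

```python
from redis.sentinel import Sentinel

# Assumed: three Sentinel processes monitoring a master group called "mymaster".
sentinel = Sentinel(
    [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)],
    socket_timeout=0.5,
)

# Sentinel tells the client which node is currently the master and which are replicas.
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("greeting", "hello")
print(replica.get("greeting"))
```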

Which is cheaper to use: AWS RDS or my own database?

I have a WordPress application running on an EC2 instance on AWS. I haven't decided between Amazon RDS and my own database on different hosting. Which one is cheaper? Say I have my own MySQL database at Lunarpages or Bluehost and let my WordPress site on EC2 connect remotely to that database instead of to Amazon RDS. I have heard people say Amazon RDS is very expensive, so I thought connecting WordPress to my own database rather than RDS might save costs. I don't know whether that is true, or how well it would perform. Which is the better option? Any suggestions appreciated. Thank you.
I don't agree with that. On AWS, the first thing you do is set up a virtual private cloud and create the corresponding network interfaces. My experience working with heavy CMSs is that the architecture is much more stable with EC2 + RDS, each in its own instance. In addition, RDS has automated version maintenance and is much less likely to fail or crash than a MySQL (or similar) server running on the same virtual machine.
Also, in terms of speed and performance, with this scheme (for example with WordPress) the system flies; the speed is noticeably higher, even with small machines.
Running on a different hosting will cause extra latency.
Let's do the math on AWS for the smallest instances (taking the eu-west-1 region as an example):
Running on RDS: db.t2.micro at $0.018 per hour, or $12.96 per month. Free for the first year under the AWS free tier.
Running on EC2: t2.micro (you configure MySQL, backups, etc.) at $0.0126 per hour, or $9.07 per month. Free for the first year under the AWS free tier.
If your application is small enough, you could host both your database and your application on the same machine (the second option).
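The monthly figures above are just the hourly rate multiplied by the hours in a month; a quick sketch of the arithmetic (rates as quoted, 720 hours assumed per month):

```python
HOURS_PER_MONTH = 720  # 30 days * 24 hours, the usual pricing approximation

options = {
    "RDS db.t2.micro (eu-west-1)": 0.018,   # USD per hour, as quoted above
    "EC2 t2.micro (eu-west-1)": 0.0126,     # USD per hour, self-managed MySQL
}

for name, hourly in options.items():
    print(f"{name}: ${hourly * HOURS_PER_MONTH:.2f} per month")
# RDS: $12.96 per month, EC2: $9.07 per month (before the free tier)
```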
Performance-wise, it is not good to have the database on a totally different network from the one hosting the website itself; every call adds latency, and if you make a lot of calls the delays multiply.
You can host a local database on the EC2 instance itself; this would be the best choice.

How to scale up write speed on a Galera cluster, using MaxScale as a DB proxy?

Currently, I am researching Galera Cluster using many servers (Linux, CentOS). Scaling read traffic is very effective and easy, but scaling writes seems difficult (no improvement).
I have used many servers, with MaxScale as the router (readconnroute) to distribute write queries in parallel to all servers, but write speed has not improved.
One option would be to use the Spider storage engine in MariaDB. It supports sharding of tables and should improve write speeds compared to a Galera cluster. On the other hand, you will lose the high availability of the Galera cluster in favor of increased write speeds.
This slide set by Kentoku Shiba on Spider is a good overview of how Spider improves write scalability.
Galera does not improve write speed, as every server still has to process every write, and MySQL in general is poor at scaling writes. What you can do is shard your data, using a proxy (like the MaxScale you mentioned) to route queries: pick a sharding key for each table and use it to distribute rows across multiple servers.
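A minimal sketch of what such key-based sharding looks like at the application (or proxy) level; the shard hosts and key choice are hypothetical:

```python
import hashlib

# Hypothetical shard map: each shard is an independent MySQL/MariaDB server.
SHARDS = [
    "db-shard-0.example.com",
    "db-shard-1.example.com",
    "db-shard-2.example.com",
]

def shard_for(key: str) -> str:
    """Route a sharding key (e.g. a user_id) to one of the shard servers."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# All writes for user "1234" always go to the same server,
# so each server only has to process a fraction of the total writes.
print(shard_for("1234"))
```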
I would suggest using a NoSQL server such as MongoDB, which has sharding built in for write-heavy use cases. MongoDB is much easier to set up and maintain than MySQL for this job.
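For reference, a rough sketch of enabling MongoDB's built-in sharding with pymongo; it assumes a sharded cluster is already deployed and that you connect through a mongos router, and the database, collection, and key names are hypothetical:

```python
from pymongo import MongoClient

# Assumed: a sharded cluster is running and this address points at a mongos router.
client = MongoClient("mongodb://mongos1:27017")

# Enable sharding for the database, then shard a write-heavy collection
# on a hashed key so inserts spread evenly across the shards.
client.admin.command("enableSharding", "appdb")
client.admin.command("shardCollection", "appdb.events", key={"user_id": "hashed"})

client.appdb.events.insert_one({"user_id": "1234", "action": "click"})
```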

Method to replicate sqlite database across multiple servers

I'm developing a distributed application, and I have a SQLite database that must be shared between the distributed servers.
If I change a SQLite row on server A, that change must show up on the other servers instantly; and if a server was offline and later comes back online, it must catch up so its data matches the other servers.
I'm trying to build an HA service on top of small SQLite databases.
I'm considering something like MongoDB or RethinkDB, because their replication works well and the data stays available regardless of which servers happen to be online.
Is there a library or other SQL-based approach for sharing data between servers?
I used the Raft consensus protocol to replicate my SQLite database. You can find the system here:
https://github.com/rqlite/rqlite
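To give a flavour of how rqlite is used, here is a rough sketch against its HTTP API using the Python requests library; the host, port, and schema are assumptions based on rqlite's defaults, so check the project's documentation for the current API:

```python
import requests

RQLITE = "http://localhost:4001"  # assumed default rqlite HTTP address

# Writes go through the Raft leader and are replicated to the other nodes.
requests.post(
    f"{RQLITE}/db/execute",
    json=["CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"],
)
requests.post(f"{RQLITE}/db/execute", json=["INSERT INTO notes(body) VALUES('hello')"])

# Reads can be served by any node (the consistency level is configurable).
result = requests.get(f"{RQLITE}/db/query", params={"q": "SELECT * FROM notes"})
print(result.json())
```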
Here are some options:
LiteReplica:
It supports master-slave replication for SQLite3 databases using a single master (writable node) and one or many replicas (read-only nodes).
If a device goes offline and later comes back online, the secondary/slave databases are updated incrementally from the primary/master one.
LiteSync:
It implements multi-master replication, so we can write to the database on any node, even when the device is offline.
On both we open the database using a modified URI, like this:
"file:/path/to/app.db?replica=master&bind=tcp://0.0.0.0:4444"
AergoLite:
Blockchain based, it has the highest level of security. Stores immutable relational data, secured by a distributed consensus with low resource usage.
Disclosure: I am the author of these solutions.
You can synchronize SQLite databases by embedding SymmetricDS in your application. It supports occasionally connected clients, so it will capture changes and sync them when a server comes online. It supports several different database platforms and can be used as a library or as a standalone service.
You can also use CopyCat, which supports SQLite as well as a few other database types.
Marmot looks good:
https://github.com/maxpert/marmot
From their docs:
What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS Jetstream. This means if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.

Local SQLite vs Remote MongoDB

I'm designing a new web project and, after studying some options aiming scalability, I came up with two database solutions:
Local SQLite files carefully designed in a scalable fashion (one new database file for every X users, since writes depend on user content and there is no cross-user data dependence);
Remote MongoDB server (like Mongolab), as my host server doesn't serve MongoDB.
I don't trust the MySQL server at my current shared host, as it goes down very frequently (and I had problems with MySQL on another host, too). For the same reason I'm not going to use Postgres.
Pros of SQLite:
It's local, so it should be faster (I'll take care to use indexes and transactions properly);
I don't need to worry about TCP sniffing, as the Mongo wire protocol is not encrypted;
I don't need to worry about server outage, as SQLite is serverless.
Pros of MongoDB:
It's more easily scalable;
I don't need to worry about splitting databases, as scaling seems natural;
I don't need to worry about schema changes, as Mongo is schemaless and SQLite doesn't fully support ALTER TABLE (especially considering changing many production files, etc.).
I would like help making a decision (and maybe considering a third option). Which one is better as write and read operations grow?
I'm going to use Ruby.
One major risk of the SQLite approach is that as your need to scale increases, you will not be able to (easily) deploy on multiple application servers. You may be able to partition your users across separate servers, but if one of those servers goes down, the subset of users on it cannot access their data.
Using MongoDB (or any other centralized service) alleviates this problem, as your web servers are stateless -- they can be added or removed at any time to accommodate web load without having to worry about what data lives where.
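To make the partitioning concrete, here is a minimal sketch of routing each user to one of many SQLite files (the file layout and bucket count are hypothetical). Every application server that handles a given user must be able to reach that user's file, which is exactly the coupling described above and which a centralized store like MongoDB avoids:

```python
import os
import sqlite3

BUCKETS = 64  # hypothetical number of database files (one per group of users)

def connect_for_user(user_id: int) -> sqlite3.Connection:
    """Open the SQLite file that holds this user's bucket of data."""
    os.makedirs("data", exist_ok=True)
    path = f"data/users_{user_id % BUCKETS:03d}.db"
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS posts (user_id INTEGER, body TEXT)")
    return conn

# Every server that serves user 1234 needs access to this same file.
conn = connect_for_user(1234)
conn.execute("INSERT INTO posts VALUES (?, ?)", (1234, "hello"))
conn.commit()
```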
