Percona XtraBackup incremental backup vs. replication (MariaDB)

I was playing with Percona XtraBackup (innobackupex) for incremental backups. It is a cool tool and very efficient and effective for incremental backups. However, I could not help but wonder why doing incremental backups would be any better than just running regular MySQL master-slave replication and, whenever point-in-time data needs to be retrieved, just using the binary log?
What advantages would doing incremental backups have over doing master-slave replication? When should you choose one over the other?

One disadvantage to using master-slave replication as a backup is that accidentally running data-damaging commands like
DROP TABLE users;
would replicate to the slave.
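One partial mitigation, if you still want replication to double as a safety net, is a delayed slave that applies events only after a configured lag. A minimal sketch (MariaDB 10.2.3+ or MySQL 5.6+; the one-hour delay is just an illustrative value), run on the slave:
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;  -- apply changes one hour behind the master
START SLAVE;
Even then, a delayed slave only buys you time to notice the mistake; point-in-time recovery still needs real backups plus the binary log.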

They are solutions to two different problems: master-slave replication is about redundancy, while backups are about resilience.
The MySQL JDBC driver has the ability to connect to many servers. If you look at the driver options (https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-url-format.html) you will notice that the host option is not just a single host, but a list of hosts. If you specify both the master and the slave in the URL and something happens to the master, the driver will automatically connect to the slave instead.
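For example, a URL along these lines (hypothetical host names; failOverReadOnly=false keeps the connection writable after a failover):
jdbc:mysql://master.example.com:3306,slave.example.com:3306/mydb?failOverReadOnly=false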
Backup, on the other hand, is, as was mentioned earlier, a way to recover from either a catastrophic crash (having your backups stored off-site is a must) or recover from a catastrophic mistake -- neither of which is served by a master-slave setup. (Well, technically you could have the slave at a different site but that still does not cover the mistake scenario)


Are Google Cloud Disks OK to use with SQLite?

Google Cloud disks are network disks that behave like local disks. SQLite expects a local disk so that locking and transactions work correctly.
A. Is it safe to use Google Cloud disks for SQLite?
B. Do they support the right locking mechanisms? How is this done over the network?
C. How do disk IOPS and throughput relate to SQLite performance? If I have a 1 GB SQLite file with queries that take 40 ms to complete locally, how many IOPS would that use? Which disk type should I choose (standard, balanced, SSD)?
Thanks.
Related
https://cloud.google.com/compute/docs/disks#pdspecs
Persistent disks are durable network storage devices that your instances can access like physical disks
https://www.sqlite.org/draft/useovernet.html
the SQLite library is not tested in across-a-network scenarios, nor is that reasonably possible. Hence, use of a remote database is done at the user's risk.
Yeah, the article you referenced essentially stipulates that since reads and writes are "simplified" at the OS level, they can be unpredictable, resulting in "loss in translation" issues when going local-network-remote.
They also point out that it may very well work fine in testing, and perhaps in production for a time, but there are known side effects which are hard to detect and mitigate against -- so it's a slight gamble.
Again, the implementation they describe is not Google Cloud Disk specifically, but simply a generic remote networked arrangement.
My point is more that Google Cloud Disk may be more "virtual" than purely network-attached storage... to my mind that is where to look, and evaluate it from there.
Check out this thread for some additional insight into the issues: https://serverfault.com/questions/823532/sqlite-on-google-cloud-persistent-disk
Additionally, I was looking around and found this thread, where one poster suggests using SQLite as a read-only asset and deploying updates in a far more controlled process.
https://news.ycombinator.com/item?id=26441125
The persistent disk acts like a normal disk in your VM, and is only accessible to one VM at a time.
So it's safe to use; you won't lose any data.
For the performance part, you just have to test it for your specific workload. If you have plenty of spare RAM, and your database is read-heavy and seldom writes, the whole database will be cached by the OS (Linux) disk cache, so it will be crazy fast, even on HDD storage.
But if you are low on spare RAM, the database won't be in the OS cache, and writes are always synced to disk, which causes lots of I/O operations.
In that case, use the highest-performing disk you can / are willing to afford.
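If write syncs turn out to dominate, it can also be worth experimenting with SQLite's journal and sync settings, which trade a little durability for far fewer flushes -- a hedged suggestion, to be validated against your own workload:
PRAGMA journal_mode = WAL;    -- write-ahead log batches changes and reduces fsync frequency
PRAGMA synchronous = NORMAL;  -- sync less aggressively than the default FULL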

How to set up bidirectional rsync?

I tend to run simulations on a cluster that produces files larger than 100 MB, and I can't keep my computer in sync with the cluster. So I considered setting up rsync between the two by following this link.
However, I believe this is just a cron job to sync the backup server with the main server and doesn't work in both directions. What would be the step-by-step instructions to set up a bidirectional rsync?
Both systems run Linux.
Rsync isn't really the right tool for this job. You can sort of get it to work, using cron jobs and extremely carefully chosen parameters, but there's significant danger of data loss, especially if you want file deletion to propagate.
I'd recommend a tool like Syncthing for bidirectional sync. You want something that maintains an independent database of what's changed and what hasn't, and real-time updates are nice to have too.

MariaDB master-slave with failover

I have a business need related to a MariaDB instance that should work in a master-slave configuration with failover.
Looking at the documentation, I have seen that it is possible to configure a multi-master cluster (Galera) or a simple master-slave replica.
Any suggestions for configuring master-slave + failover?
Many thanks in advance
Roberto
MySQL/MariaDB master-slave replication is great for handling read-heavy workloads. It's also used as a redundancy strategy to improve database availability, and as a backup strategy (i.e. take the snapshot/backup on the slave to avoid interrupting the master). If you don't need a multi-master solution with all the headaches that brings—even with MySQL Cluster or MariaDB Galera Cluster—it's a great option.
It takes some effort to configure. There are several guides out there with conflicting information (e.g. MySQL vs. MariaDB, positional vs. GTID) and several decision points that can affect your implementation (e.g. row vs. statement binlog formats, storage engine selection), and you might have to stitch various pieces together to form your final solution. I've had good luck with MariaDB 10.1 (GTID, row binlog format) and mixed MyISAM and InnoDB storage engines. I create one slave user on the master per slave, and I don't replicate the mysql database. YMMV. This guide is a good starting place, but it doesn't really cover GTID.
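To make the GTID flavor concrete, the core steps look roughly like this (hypothetical host names, user, and password; you would first seed the slave from a backup of the master and set gtid_slave_pos to match it):
-- on the master:
CREATE USER 'repl_slave1'@'slave1.example.com' IDENTIFIED BY 'change-me';
GRANT REPLICATION SLAVE ON *.* TO 'repl_slave1'@'slave1.example.com';
-- on the slave:
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl_slave1',
  MASTER_PASSWORD = 'change-me',
  MASTER_USE_GTID = slave_pos;
START SLAVE;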
Failover is a whole separate ball of wax. You will need some kind of a reverse proxy (such as MaxScale or HAproxy) or floating IP address in front of your master that can adjust to master changes. (There might be a way to do this client-side, but I wouldn't recommend it.) Something has to monitor the health of the cluster, and when it comes time to promote a slave to the new master, there is a whole sequence of steps that have to be performed. MySQL provides a utility called mysqlfailover to facilitate this process, but as far as I know, it is not compatible with MariaDB. Instead, you might take a look at replication-manager, which seems to be MariaDB's Go-based answer to mysqlfailover. It appears to be a very sophisticated tool.
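For reference, the manual promotion sequence that such tools automate looks roughly like this (hypothetical host name; details vary between GTID and positional replication):
-- on the slave being promoted, once it has applied everything it received:
STOP SLAVE;
RESET SLAVE ALL;
SET GLOBAL read_only = 0;
-- on each remaining slave, repoint replication at the new master:
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST = 'new-master.example.com', MASTER_USE_GTID = slave_pos;
START SLAVE;
On top of that, the reverse proxy or floating IP has to be redirected to the new master, which is exactly the coordination a failover tool exists to handle.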
Master-Slave helps with failover, but does not provide it.
MariaDB Cluster (Galera) does provide failover for most cases, assuming you have 3 nodes.

Method to replicate a SQLite database across multiple servers

I'm developing an application that runs distributed, and I have a SQLite database that must be shared between the distributed servers.
If I'm on serverA and change a SQLite row, this change must reach the other servers instantly; and if a server was offline and then comes back online, it must catch up so its data matches the other servers.
I'm trying to develop an HA service with small SQLite databases.
I'm thinking of something like MongoDB or RethinkDB, because their replication works well and the data stays available independently of which server is online.
Is there a library or other SQL-based approach to share data between servers?
I used the Raft consensus protocol to replicate my SQLite database. You can find the system here:
https://github.com/rqlite/rqlite
Here are some options:
LiteReplica:
It supports master-slave replication for SQLite3 databases using a single master (writable node) and one or many replicas (read-only nodes).
If a device goes offline and then comes back online, the secondary/slave DBs are updated incrementally from the primary/master.
LiteSync:
It implements multi-master replication so we can write to the db in any node, even when the device is off-line.
On both we open the database using a modified URI, like this:
"file:/path/to/app.db?replica=master&bind=tcp://0.0.0.0:4444"
AergoLite:
Blockchain based, it has the highest level of security. Stores immutable relational data, secured by a distributed consensus with low resource usage.
Disclosure: I am the author of these solutions
You can synchronize SQLite databases by embedding SymmetricDS in your application. It supports occasionally connected clients, so it will capture changes and sync them when a server comes online. It supports several different database platforms and can be used as a library or as a standalone service.
You can also use CopyCat, which supports SQLite as well as a few other database types.
Marmot looks good:
https://github.com/maxpert/marmot
From their docs:
What & Why?
Marmot is a distributed SQLite replicator with leaderless, and eventual consistency. It allows you to build a robust replication between your nodes by building on top of fault-tolerant NATS Jetstream. This means if you are running a read heavy website based on SQLite, you should be easily able to scale it out by adding more SQLite replicated nodes. SQLite is probably the most ubiquitous DB that exists almost everywhere, Marmot aims to make it even more ubiquitous for server side applications by building a replication layer on top.

Local SQLite vs Remote MongoDB

I'm designing a new web project and, after studying some options with scalability in mind, I came up with two database solutions:
Local SQLite files carefully designed in a scalable fashion (one new database file for every X users, as writes will depend on user content, with no cross-user data dependence);
Remote MongoDB server (like Mongolab), as my host server doesn't serve MongoDB.
I don't trust the MySQL server at my current shared host, as it comes down very frequently (and I had problems with MySQL on another host, too). For the same reason I'm not going to use Postgres.
Pros of SQLite:
It's local, so it must be faster (I'll take care of using indexes and transactions properly);
I don't need to worry about TCP sniffing, as the Mongo wire protocol is not encrypted;
I don't need to worry about server outage, as SQLite is serverless.
Pros of MongoDB:
It's more easily scalable;
I don't need to worry about splitting databases, as scalability seems natural;
I don't need to worry about schema changes, as Mongo is schemaless and SQLite doesn't fully support ALTER TABLE (especially considering changing many production files, etc.).
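(For reference, the usual workaround for schema changes SQLite's ALTER TABLE cannot express -- dropping or redefining a column, at least on older versions -- is the create/copy/rename sequence, sketched here with a hypothetical users table:
BEGIN;
CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO users_new (id, name) SELECT id, name FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
COMMIT;
Doing that across many production files is exactly the pain I want to avoid.)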
I want help to make a decision (and maybe consider a third option). Which one is better as write and read operations grow?
I'm going to use Ruby.
One major risk of the SQLite approach is that as your requirements to scale increase, you will not be able to (easily) deploy on multiple application servers. You may be able to partition your users into separate servers, but if that server were to go down, you would have some subset of users who could not access their data.
Using MongoDB (or any other centralized service) alleviates this problem, as your web servers are stateless -- they can be added or removed at any time to accommodate web load without having to worry about what data lives where.
