What is the difference between innodb_flush_log_at_trx_commit and sync_binlog?

I am reading about what affects the durability of InnoDB and found two config fields. I have the following questions:
What is the difference between the two config fields? My first guess is that innodb_flush_log_at_trx_commit affects the InnoDB redo log, and sync_binlog affects the standard MySQL binlog. Am I right?
My second question: if the logging process is divided into 3 phases (write to the log buffer, write to the OS cache, flush to disk), then in which phase does replication to the secondary happen?
About sync_binlog, I have another question. If sync_binlog is set to 0, then according to the InnoDB docs the flush is not done on commit but is delegated to the OS. Could it be the case that the binlog is synced before the commit, so that replicas see data that is not yet committed?

Add innodb_flush_method to the list.
innodb_flush_log_at_trx_commit should be 1 for safety (you won't lose any data in a crash) or 2 for speed. (0, I think, is there for historic reasons and has no advantage.)
sync_binlog keeps a Slave from trying to read past the end of the Master's binlog (after a crash). Since the data was already sent to the Slaves, it is not a "data loss" issue, but an annoying error that is easily rectified by hand (move the Slave to the next binlog).
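For illustration, here is a minimal JDBC sketch of applying the "safe" settings (the URL and credentials are placeholders, and the account needs the privilege to set global variables). In practice these settings usually go in my.cnf so they survive a restart:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DurabilitySettings {
        public static void main(String[] args) throws Exception {
            // Placeholder URL/credentials -- adjust for your server.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/", "root", "secret");
                 Statement st = conn.createStatement()) {
                // Write and fsync the InnoDB redo log at every COMMIT (full durability).
                st.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 1");
                // fsync the binlog at every COMMIT (keeps the binlog in step with InnoDB).
                st.execute("SET GLOBAL sync_binlog = 1");
            }
        }
    }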
I think this is the order of the replication steps. Note: In the case of a transaction, nothing happens until the COMMIT. (See also "binlog_cache_size".)
Send data to Slaves and flush the copy to the binlog if sync_binlog is on (else let it eventually be flushed). (I don't know which happens first, or even whether they are done by separate threads.)
The I/O thread on the Slave copies the data to its "relay log".
Eventually (usually right away), the execute thread performs the query.
(Multi-source replication and parallel execution on the Slave add further complications.)
Semi-sync gets involved somewhere.
Galera -- see "gcache", etc.
What do you mean by "secondary replication"?
Your item 3 may be referring to an obscure case where something could drop through the cracks because too many pieces of hardware are involved.
If you want reliability, see Galera Cluster and/or InnoDB Cluster. This goes beyond what is available with simple Master-Slave replication even with semi-sync.

Related

what is the thread model of mysql and innodb?

I am learning the architecture of MySQL and InnoDB, and the thread model combined with the pluggable engine system confuses me. It is claimed that a MySQL instance is one process with many threads, and that InnoDB has many background threads, such as the master thread and I/O threads, to deal with callbacks from kernel AIO. What's more, I find that the connection pool is managed by the MySQL layer, not by the pluggable InnoDB layer.
So how do MySQL threads manage connections and their requests? And how does a MySQL request get to kernel AIO; does it cooperate with the InnoDB I/O threads?
A connection is given its own process (or thread, depending on the OS). That process runs along as if it were a single process, except for access to a large number of shared caches, tables, etc. To deal with those, there are many "mutexes". These are used to give one process at a time exclusive access to, say, insert/delete something in the buffer_pool or table cache, etc.
In the old days, there were fewer mutexes. For example, to do anything with the buffer pool, a single Mutex (or maybe a few) granted access. This was fine when there was one CPU, or maybe two. But when servers had 8 cores, it became clear that this was not fine-grained enough. So there was an overhaul of the mutexes. (IIRC, it was spearheaded by Percona.)
Because of these mutexes, only about 4 processes could work "simultaneously" before they stumbled over each other. More than 8 cores --> throughput actually went down. (And latency went through the roof.)
As new versions came out, that number increased. With MySQL 8, it is possibly over 64.
Another thing that happened was "instances" of buffer_pools and table_open_caches. This was a straightforward way to split a mutex into several.
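As a toy illustration of that idea (Java, invented names; not MySQL's actual code): instead of one mutex guarding one big structure, split the structure into several instances, each with its own mutex, and pick an instance by hashing the key:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy illustration only: one big structure split into several "instances",
    // each guarded by its own mutex, chosen by hashing the key.  Threads working
    // on different instances no longer contend with each other.
    public class StripedCache<K, V> {
        private final List<Map<K, V>> instances = new ArrayList<>();

        public StripedCache(int n) {
            for (int i = 0; i < n; i++) {
                instances.add(new HashMap<>());
            }
        }

        private Map<K, V> instanceFor(K key) {
            int i = (key.hashCode() & 0x7fffffff) % instances.size();
            return instances.get(i);
        }

        public void put(K key, V value) {
            Map<K, V> m = instanceFor(key);
            synchronized (m) {           // contention is limited to one instance
                m.put(key, value);
            }
        }

        public V get(K key) {
            Map<K, V> m = instanceFor(key);
            synchronized (m) {
                return m.get(key);
            }
        }
    }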
I am less familiar with the I/O thread model. I suspect the thread (for one connection) somehow signals another thread to do the work for it.
Also, several things are left to background tasks: flushing to logs, updating index blocks from the "change buffer", read-ahead, flushing of "dirty" pages from the buffer_pool, cleaning up BTrees, etc. Note that all of these would use the appropriate Mutex(es) so as not to step on each other and the various connection processes.
The "pluggable engine" system needs a different kind of chatter -- between the generic handler and the engine-specific code. I suspect this is orthogonal to the mutexes, background threads, etc.
The "connection pool" would be yet-another array (or list) of something, and multiple processes would want to grab/release an item in that array. Hence its own mutex(es).

Mariadb galera cluster and cap theorem

Where does MariaDB Galera Cluster lie according to the CAP theorem, CP or AP? Please base the answer on a brief explanation of how it works.
Consistency -- For handling the "critical read" problem, Galera needs a little help. See http://mysql.rjweb.org/doc.php/galera#critical_reads
Otherwise, one can state that Galera survives "any" single-point-of-failure.
Galera is normally deployed as 3 nodes, one in each of 3 geographic locations. That means that no single machine failure, data center failure, earthquake, tornado, network outage, etc., can take out more than one node at a time. The other two nodes (whichever two survive and can still talk to each other) will declare that they "have a quorum" and continue to accept writes and deliver reads. Further, "split brain" is not possible; that is what keeps any attempt at dual-master, even with monitoring, from surviving every SPOF.
If the third node or the network is repaired, the Cluster goes about patching up the data as needed, so that the 3 nodes again have identical data.
Granted, this is not quite the same as the definition of CAP, but it is a reasonable goal for a computer cluster.
How it works (in a tiny nutshell)... Each node talks to each other node, but only during the COMMIT of a transaction. (Hence, it is reasonably efficient even when spread across a WAN, as is needed to survive natural disasters.) The COMMIT says to the other nodes, "I am about to do this write; is it OK?" Without actually doing the write, they check Galera's magic sauce to see whether it will succeed. Once everyone says "yes", the COMMIT returns success to the client. (That gives you a hint of the "critical read" issue.)
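A very rough sketch of that exchange (Java-style, with invented names; this is not Galera's actual code or API, just the flow described above):

    import java.util.List;

    public class CommitSketch {
        interface Node {
            // "I am about to do this write; is it OK?" -- certify the writeset
            // against local running/pending work WITHOUT applying it yet.
            boolean certify(byte[] writeset);

            // Actually apply the writeset.  On the other nodes this happens a
            // moment after the originating COMMIT returns -- hence "critical reads".
            void apply(byte[] writeset);
        }

        static boolean commit(byte[] writeset, Node origin, List<Node> others) {
            for (Node n : others) {
                if (!n.certify(writeset)) {
                    return false;            // conflict: the COMMIT fails on the origin
                }
            }
            origin.apply(writeset);          // commit locally; success returns to the client
            for (Node n : others) {
                n.apply(writeset);           // the other nodes apply it shortly afterwards
            }
            return true;
        }
    }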

How does Galera Cluster guarantee consistency?

I'm searching for a highly available SQL solution! One of the articles I read was about "virtually synchronized" replication in Galera Cluster: https://www.percona.com/blog/2012/11/20/understanding-multi-node-writing-conflict-metrics-in-percona-xtradb-cluster-and-galera/
He says
When the writeset is actually applied on a given node, any locking conflicts it detects with open (not-yet-committed) transactions on that node cause that open transaction to get rolled back.
and
Writesets being applied by replication threads always win
What will happen if the WriteSet conflicts with a committed transaction?
He also says:
Writesets are then “certified” on every node (in order).
How does Galera Cluster keep WriteSets ordered across the cluster? Is there a hidden master node that orders the WriteSets, something like ZooKeeper? Or what?
This is for the second question (about how Galera orders the writesets).
Galera implements Extended Virtual Synchrony (EVS) based on the Totem protocol. The Totem protocol implements a form of token passing, where only the node with the token is allowed to send out new requests (as I understand it). So the writes are ordered since only one node at a time has the token.
For the academic background, you can look at these:
The Totem Single-Ring Ordering and Membership Protocol
The database state machine and group communication issues
(This Answer does not directly tackle your Question, but it may give you confidence that Galera is 'good'.)
In Galera (PXC, etc), there are two general times when a transaction can fail.
On the node where the transaction is being run, the actions are compared to what is currently running on the same node. If there is a conflict, either one of the transactions is stalled (think innodb_lock_wait_timeout) or is deadlocked (and rolled back).
At COMMIT time, info is sent to all the other nodes; they check your transaction against anything on the node or pending (in gcache). If there is a conflict, a message is sent back saying that there would be trouble. So, the originating node has the COMMIT fail. For this reason, you must check for errors even on the COMMIT statement.
As with single-node systems, a deadlock is usually resolved by replaying the entire transaction.
In the case of autocommit, there is a small, configurable number of retries, after which the statement will fail. So, again, check for errors. However, since the retries have already been attempted, you may want to abort the program.
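A hedged JDBC sketch of that error handling (the host, credentials, and table are made up; error code 1213 is MySQL's ER_LOCK_DEADLOCK, which is also how a certification conflict is generally surfaced to the client):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CommitRetry {
        public static void main(String[] args) throws Exception {
            int maxRetries = 3;    // arbitrary; pick what suits your application
            // Placeholder host, credentials, and table.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://node1:3306/test", "app", "secret")) {
                conn.setAutoCommit(false);
                for (int attempt = 1; ; attempt++) {
                    try (Statement st = conn.createStatement()) {
                        st.executeUpdate("UPDATE t SET counter = counter + 1 WHERE id = 1");
                        conn.commit();           // COMMIT itself can fail, so it is inside the try
                        break;
                    } catch (SQLException e) {
                        conn.rollback();
                        boolean deadlock = e.getErrorCode() == 1213;   // ER_LOCK_DEADLOCK
                        if (!deadlock || attempt >= maxRetries) {
                            throw e;             // give up and surface the error
                        }
                        // otherwise loop around and replay the whole transaction
                    }
                }
            }
        }
    }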
Currently (in my opinion) Galera, with at least 3 nodes in at least 3 different physical locations, is the best available HA solution for MySQL. It can effectively survive any single-point-of-failure. (Group Replication / InnoDB Cluster, from Oracle, is coming soon, and is very promising.)
One thing to note is that the "critical read" problem has a solution in Galera, but you have to take action. See wsrep_sync_wait. (As of this writing, InnoDB Cluster has no solution.)
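For example (a JDBC sketch; the node, credentials, and table are made up, but wsrep_sync_wait is the real session variable), a session that needs read-your-writes behavior across nodes can turn it on before the critical SELECT:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CriticalRead {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://node2:3306/test", "app", "secret");   // placeholder node/creds
                 Statement st = conn.createStatement()) {
                // Make reads on this session wait until all received writesets are applied.
                st.execute("SET SESSION wsrep_sync_wait = 1");
                try (ResultSet rs = st.executeQuery(
                        "SELECT balance FROM accounts WHERE id = 42")) {   // made-up table
                    if (rs.next()) {
                        System.out.println("balance = " + rs.getLong(1));
                    }
                }
            }
        }
    }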
See http://mysql.rjweb.org/doc.php/galera for tips (some of which are included above) on coding differences when moving to PXC/Galera.

Synchronous vs Asynchronous Clustering

I was reading the MariaDB knowledge base on Galera Cluster and I came across this:
The basic difference between synchronous and asynchronous replication is that "synchronous" guarantees that if changes happened on one node of the cluster, they happened on other nodes "synchronously", or at the same time. "Asynchronous" gives no guarantees about the delay between applying changes on "master" node and the propagation of changes to "slave" nodes. The delay can be short or long. This also implies that if master node crashes, some of the latest changes may be lost
With the last sentence, I have always understood that even though the updates on the slave in an asynchronous setup are not performed at the same time, the master logs these updates to a binlog file as they are being made. So in the case that the master crashes before all the data has been passed on to the slave, the updates will still go through once the master is restored, since the binlog file recorded them. Can somebody please tell me if my understanding is wrong, and clarify the matter for me? Thanks.
In your example of a normal replication pair, the slave would catch up after the master comes back. Assuming the master does come back, you wouldn't really lose the data; but if the master is permanently dead, the data is lost. The knowledge base article you mention is talking about replication delay, not the overall integrity of the replication stream.
With normal replication, if the slave I/O thread (the part that gets the replication events from the master) is able to keep up with the master, then the slave may only lose a couple of seconds of data if the master crashes. However, if it cannot keep up and is, for example, 1 hour behind, the slave would lose access to 1 hour of data. Another way you could lose access to data on the slave is if you have a maximum relay log size set and that limit is reached.
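One way to keep an eye on that lag (a JDBC sketch with a placeholder host and credentials; SHOW SLAVE STATUS and its Seconds_Behind_Master column are the traditional names):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SlaveLag {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://slave1:3306/", "monitor", "secret");   // placeholder host/creds
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SHOW SLAVE STATUS")) {
                if (rs.next()) {
                    // NULL here means the slave SQL thread is not running.
                    System.out.println("Seconds_Behind_Master = "
                            + rs.getString("Seconds_Behind_Master"));
                }
            }
        }
    }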
Galera makes sure that the write is sent to every node in the cluster before it is actually committed on any of the nodes, so once the node that the write was done on commits it, all of the other nodes will commit the same write. With Galera, all writes basically happen at the same time on every node. Losing any node at any time during normal operation will not cause any data loss.

sqlite database connection/locking question

Folks
I am implementing a file-based queue (see my earlier question) using sqlite. I have the following threads running in the background:
thread-1 empties out a memory structure into the "queue" table (an insert into the "queue" table).
thread-2 reads and "processes" the "queue" table; it runs every 5 to 10 seconds.
thread-3 runs very infrequently; it purges old data that is no longer needed from the "queue" table and also runs VACUUM so the size of the database file remains small.
Now the behavior that I would like is for each thread to get whatever lock it needs (waiting with a timeout, if possible) and then complete the transaction. It is OK if the threads do not run concurrently; what is important is that a transaction, once it begins, does not fail due to "locking" errors such as "database is locked".
I looked at the transaction documentation but there does not seem to be a "timeout" facility (I am using JDBC). Can the timeout be set to a large quantity in the connection?
One solution (untried) I can think of is to have a connection pool of max 1 connection. Thus only one thread can connect at a time and so we should not see any locking errors. Are there better ways?
Thanx!
If it were me, I'd use a single database connection handle. If a thread needs it, it can allocate it within a critical section (or mutex, or similar); this is basically a poor man's connection pool with only one connection in the pool :) It can do its business with the database. When done, it exits the critical section (or frees the mutex, or whatever). You won't get locking errors if you carefully use the single db connection.
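A minimal sketch of that approach in Java/JDBC (the payload column is made up, and the jdbc:sqlite URL assumes a SQLite JDBC driver is on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Toy sketch: one shared SQLite connection; every thread that needs the
    // database enters the same synchronized block first -- effectively a
    // connection pool with a single connection, as described above.
    public class QueueDb {
        private final Connection conn;

        public QueueDb(String file) throws Exception {
            conn = DriverManager.getConnection("jdbc:sqlite:" + file);
        }

        public void enqueue(String payload) throws Exception {
            synchronized (conn) {                  // the "critical section"
                conn.setAutoCommit(false);
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO queue(payload) VALUES (?)")) {
                    ps.setString(1, payload);
                    ps.executeUpdate();
                    conn.commit();
                } catch (Exception e) {
                    conn.rollback();
                    throw e;
                }
            }
        }
    }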
-Don
