MariaDB slow query log - only query time is high

I have the following entry in my slow query log:
Query_time: 4.000347
Lock_time: 0.000095
Rows_sent: 0
Rows_examined: 0
update `my_table` set `a` = null, `b` = 'x', c= ... (updating around 20 fields) where `id` = 1234;
id is the primary key of this table, and there is a record matching this id.
A select by PK is fast, and in most cases an update by PK runs a lot faster, so I'm trying to work out the cause of the slower ones.
No significant spikes show up in CPU, I/O or memory monitoring, and the load on the system is relatively flat.
There seems to be a pattern in how close to exactly 4 seconds these queries are. It's not as if I have a distribution between 2 and 6 seconds; they are all right at 4!
The 4-second PK updates show up in little groups, and some of them came 4 seconds apart, one after the other.
All that seems to indicate something ... odd.
There are over 1000 db connections on the server, so I'm wondering if there could be some thread scheduling stuff clogging things up from time to time?
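If thread scheduling is the suspect, one thing worth checking (purely a diagnostic sketch; these are standard MariaDB variables and status counters, not a diagnosis) is whether the server runs the thread pool and how it is sized:

SHOW GLOBAL VARIABLES LIKE 'thread_handling';   -- 'pool-of-threads' means the thread pool is active
SHOW GLOBAL VARIABLES LIKE 'thread_pool%';      -- pool size, stall limit, queue limits
SHOW GLOBAL STATUS LIKE 'Threads_%';            -- connected vs. running threads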

Related

Can UPDATE referencing fields from the updated table cause deadlock?

I have a query like this:
UPDATE loginlogs SET rxbytes = rxbytes + ?, txbytes = txbytes + ? WHERE logid = ?
from time to time, my DB gets into a deadlock and SELECT * FROM sys.innodb_lock_waits shows pending locks like this:
wait_started: 2020-12-27 07:43:32
wait_age: 00:00:04 (wait_age_secs: 4)
locked_table: `db`.`loginlogs`
locked_index: PRIMARY
locked_type: RECORD
waiting_trx_id: 37075679818
waiting_trx_started: 2020-12-27 07:43:32 (waiting_trx_age: 00:00:04)
waiting_trx_rows_locked: 1
waiting_trx_rows_modified: 0
waiting_pid: 19194139
waiting_query: UPDATE loginlogs SET ... WHERE logid = 64634225227638257
waiting_lock_id: 37075679818:921:15944673:61
waiting_lock_mode: X
blocking_trx_id: 37075617021
blocking_pid: 19191704
blocking_query: UPDATE loginlogs SET ... WHERE logid = 64634225227638257
blocking_lock_id: 37075617021:921:15944673:61
blocking_lock_mode: X
blocking_trx_started: 2020-12-27 07:43:07 (blocking_trx_age: 00:00:29)
blocking_trx_rows_locked: 1
blocking_trx_rows_modified: 0
sql_kill_blocking_query: KILL QUERY 19191704
sql_kill_blocking_connection: KILL 19191704
As you can see, the 2 identical queries seem to run at the same time. And the 2nd one is waiting for the first one to complete.
I thought MySQL should handle simple UPDATE queries like this. Do I need to first select the bytes and then do the UPDATE without referencing rxbytes and txbytes in the new values?
BTW, this started happening after an upgrade from MariaDB 10.4.2 to 10.4.17, so I also suspect a MariaDB bug and have opened a bug report.
A lock-wait is not a deadlock!
I see this misunderstanding frequently.
What you have shown is a lock-wait. That is, one transaction is waiting for the other. Every UPDATE locks the rows it examines, and any concurrent UPDATE has to wait for the first one to COMMIT to release those row locks. That's a lock-wait. It's normal and common.
A deadlock is different. It's where two transactions get into a mutual lock-wait. UPDATE1 locks some rows, but does not commit yet. Then UPDATE2 tries to update the same rows, and begins waiting. Then the transaction for UPDATE1 tries another locking statement, that needs locks on some rows already held by the transaction for UPDATE2, possibly from a previous statement it ran. Thus both transactions are waiting for the other, and won't COMMIT to release the locks they hold, because they're waiting. The deadlock is so named because there's no way to resolve the mutual wait.
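A minimal two-session sketch of that cycle, reusing the loginlogs table from the question with hypothetical logid values 1 and 2:

-- session 1
BEGIN;
UPDATE loginlogs SET rxbytes = rxbytes + 10 WHERE logid = 1;  -- locks the row with logid 1

-- session 2
BEGIN;
UPDATE loginlogs SET rxbytes = rxbytes + 10 WHERE logid = 2;  -- locks the row with logid 2

-- session 1
UPDATE loginlogs SET rxbytes = rxbytes + 10 WHERE logid = 2;  -- waits for session 2's lock

-- session 2
UPDATE loginlogs SET rxbytes = rxbytes + 10 WHERE logid = 1;  -- waits for session 1's lock:
                                                              -- a cycle, so InnoDB rolls one back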
https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlocks.html says:
A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither ever releases the locks it holds.
You won't experience a delay from a true deadlock. MySQL watches for these cyclical lock-waits and forces one of the transactions to roll back. This happens nearly instantly, so there is virtually no waiting.
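In practice, one way to tell the two apart: a plain lock-wait that drags on eventually fails with "Lock wait timeout exceeded" after innodb_lock_wait_timeout seconds (50 by default), while a true deadlock is detected and reported as ER_LOCK_DEADLOCK almost immediately. The timeout is easy to check:

SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';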
So why did this start happening when you upgraded from MariaDB 10.4.2 to 10.4.17? Apparently something changed that affects either the number of rows locked or the duration of the transaction, making it more likely that concurrent transactions conflict in this way.
Or else the software did not change anything related to locking or transactions, and it was your traffic that changed, coincidentally, around the time you upgraded to the new version of MariaDB.

Executing SQL to get a few rows from a huge table is throwing a "No more spool space" error

Select row_id from table_name
Sample 20;
is throwing a "no more spool space" error...
Is there any query to get 20 arbitrary rows from a table quickly? Assume the table is a very large one.
This is pretty common depending on the primary index of the table. If you add a predicate using the primary index you should return results.
Just add a WHERE clause with the primary index in it to limit the results, and you should see results without a spool issue.
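For example, assuming row_id is the primary index column (and 12345 is just a placeholder value), either of these should be far cheaper than sampling the whole table (a sketch; actual behaviour depends on the table definition):

SELECT row_id
FROM table_name
WHERE row_id = 12345;     -- equality on the primary index is a single-AMP lookup

SELECT TOP 20 row_id
FROM table_name;          -- "any 20 rows" via TOP without an ORDER BY avoids the full sampling step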
Assuming
Table_T is the table name and
Table_v is a view on Table_T...
The following is the EXPLAIN for the query executed:
First, we lock table_T in view table_v for access.
Next, we do an all-AMPs retrieve step from table_T in view table_v by way of an all-rows scan with a condition of (table_T in view table_v.col2 is null) into spool 2 (all_amps), which is built locally on all AMPs. The input table will not be cached in memory, but is eligible for synchronized scanning. The size of spool 2 is estimated with high confidence to be 1 row (35 bytes). The estimated time for this step is 2 minutes and 16 seconds.
We do an all-AMPs STAT FUNCTION step from spool 2 by way of an all-rows scan into spool 5, which is redistributed by hash code to all AMPs. The result rows are put into spool 1 (group_amps), which is built locally on the AMPs. This step is used to retrieve the top 20 rows, then execute step 4. The size is estimated with high confidence to be 1 row (41 bytes).
We do an all-AMPs STAT FUNCTION step from spool 2 (last use) by way of an all-rows scan into spool 5 (last use), which is redistributed by hash code to all AMPs. The result rows are put into spool 1 (group_amps), which is built locally on the AMPs. This step is used to retrieve the top 20 rows. The size is estimated with high confidence to be 1 row (41 bytes).
Finally, we send an END TRANSACTION step to all AMPs involved.
The contents of spool 1 are sent back to the user.

SQLite3 + Raspberry Pi 3 - 1st select on startup slow

I am using the pi to record video surveillance data in a single, but highly populated table per camera. The table consists of 3 columns - Timestamp, offset and frame length. The video data is stored in a separate file on the filesystem. My code is written in C.
Timestamp is the date/time for a video frame in the stream, offset is the fseek/ftell offsets into the streaming data file and frame length is the length of the frame. Pretty self explanatory. The primary and only index is on the timestamp column.
There is one database writer forked process per camera and there could be multiple forked read-only processes querying the database at any time.
These processes are created by socket listeners in the classic client/server architecture which accept video streams from other processes that manage the surveillance cameras and clients that query it.
When a read-only client connects, it selects the first row in the database for the selected camera. For some reason, this select takes > 60 secs, while subsequent runs of the same query are very snappy (much less than 1 sec). I've debugged the code to confirm this is where the delay occurs.
I have these pragmas configured for both the reader and writer forked processes, and have tried larger and smaller values with minimal if any impact:
pragma busy_timeout=7000
pragma cache_size=-4096
pragma mmap_size=4194304
I am assuming the cause is due to populating the SQLite3 caches when a read-only client connects, but I'm not sure what else to try.
I've implemented my own write caching/buffering strategy to help prevent locks, which helped significantly, but it did not solve the delay at startup problem.
I've also split the table by weekday in an attempt to help control the table population size. It seems once the population nears 100,000 rows, the problem starts occurring. The population for a table can be around 2.5 million rows per day.
Here is the query:
sprintf(sql, "select * from %s_%s.VIDEO_FRAME where TIME_STAMP = "
"(select min(TIME_STAMP) from %s_%s.VIDEO_FRAME)",
cam_name, day_of_week, cam_name, day_of_week);
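For reference, the same row can be fetched without the scalar subquery; since TIME_STAMP is the primary key, SQLite can satisfy the ORDER BY directly from the index (a sketch only; whether it changes the cold-start behaviour here is untested):

select * from VIDEO_FRAME order by TIME_STAMP limit 1;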
(edit)
$ uname -a
Linux raspberrypi3 4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux
$ sqlite3
sqlite> .open Video_Camera_02__Belkin_NetCam__WED.db
sqlite> .tables
VIDEO_FRAME
sqlite> .schema VIDEO_FRAME
CREATE TABLE VIDEO_FRAME(TIME_STAMP UNSIGNED BIG INT NOT NULL, FRAME_OFFSET BIGINT, FRAME_LENGTH INTEGER, PRIMARY KEY(TIME_STAMP));
sqlite> explain query plan
   ...> select * from VIDEO_FRAME where TIME_STAMP = (select min(TIME_STAMP) from VIDEO_FRAME);
0|0|0|SEARCH TABLE VIDEO_FRAME USING INDEX sqlite_autoindex_VIDEO_FRAME_1 (TIME_STAMP=?)
0|0|0|EXECUTE SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE VIDEO_FRAME USING COVERING INDEX sqlite_autoindex_VIDEO_FRAME_1
After some further troubleshooting, the culprit seems to be the forked db writer process. I tried starting the read-only clients with no streaming data being written, and the select returned immediately. I haven't found the root problem, but at least I have isolated where it is coming from.
Thanks!
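Since the writer process appears to be the trigger, one standard knob worth trying is SQLite's write-ahead-log mode, which lets readers proceed while a write is in progress (a sketch; not verified against this particular workload):

pragma journal_mode=WAL;      -- persistent setting, stored in the database file
pragma synchronous=NORMAL;    -- per-connection, commonly paired with WAL to reduce fsync stalls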

NHibernate Query slows other queries

I'm writing a program in which I run two database queries using NHibernate. The first query is a large one: a select with two joins (the big SELECT query) whose result is about 50000 records. It takes about 30 secs. The next step in the program is iterating through these 50000 records and invoking a query on each of them. That query is a pretty small COUNT.
There are two interesting things though:
If I run the small COUNT query before the big SELECT, the COUNT query takes about 10 ms, but if I run it after the big SELECT query it takes 8-9 seconds. Furthermore, if I reduce the complexity of the big SELECT query, I also reduce the execution time of the COUNT queries that follow.
If I run the big SELECT query in SQL Server Management Studio it takes 1 sec, but from the ASP.NET application it takes 30 secs.
So there are two main questions. Why is the query taking so long to execute from code when it's so fast in SSMS? And why is the big SELECT query affecting the small COUNT queries afterwards?
I know there are many possible answers to this problem, but I have googled a lot and this is what I have tried:
Setting the SET parameters of the ASP.NET application and SSMS so they are the same, to avoid different query plans (see the sketch after this list)
Clearing the SSMS cache, so the good SSMS result is not caused by SSMS caching - same 1 second result after the cache clear
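For the SET-options comparison, one way to see what each session is actually running with is the sys.dm_exec_sessions DMV (the session ids below are placeholders; use the ids of the SSMS window and the ASP.NET connection):

SELECT session_id, arithabort, ansi_nulls, quoted_identifier
FROM sys.dm_exec_sessions
WHERE session_id IN (51, 52);   -- ARITHABORT is the usual suspect for SSMS-vs-application plan differences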
The big SELECT query:
var subjects = Query
.FetchMany(x => x.Registrations)
.FetchMany(x => x.Aliases)
.Where(x => x.InvalidationDate == null)
.ToList();
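For context, fetching two collections in one query produces SQL roughly shaped like two outer joins against the root entity (table and column names here are hypothetical), so each subject row is repeated once per registration/alias combination, which inflates both the result set and hydration time:

SELECT s.*, r.*, a.*
FROM Subjects s
LEFT JOIN Registrations r ON r.SubjectId = s.Id
LEFT JOIN Aliases a ON a.SubjectId = s.Id
WHERE s.InvalidationDate IS NULL;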
The small COUNT query:
Query.Count(x => debtorIRNs.Contains(x.DebtorIRN.CodIRN) && x.CurrentAmount > 0 && !x.ArchivationDate.HasValue && x.InvalidationDate == null);
As it turned out, the above-mentioned FetchMany calls were unavoidable for the program, so I couldn't just skip them. The first significant improvement I achieved was turning off the application's logging (as I mentioned, the above code is just a fragment). Performance without the logs was about 50% better, but it still took a considerable amount of time. So I decided to avoid using NHibernate for this query and wrote a plain SQL query executed through a data reader, which I then parsed into my objects. I was able to reduce the execution time from 2.5 days (50000 * 4 sec -> number of small queries * former execution time of one small query) to 8 minutes.

How come I still get deadlocks even after setting wsrep_retry_autocommit really high?

I have a cluster of 3 Percona XtraDB 5.5.34-55 servers, and since they are all writable, I get deadlock errors under any substantial load. Increasing the wsrep_retry_autocommit variable helped with this to some extent, but ER_LOCK_DEADLOCK did not disappear completely. So I've tried setting wsrep_retry_autocommit to 10000 (which seems to be the maximum), thinking it would make some queries really slow but that none of them would fail with ER_LOCK_DEADLOCK:
mysql-shm -ss -e 'show global variables like "%wsrep_retry_auto%"'
wsrep_retry_autocommit 10000
------------------------
LATEST DETECTED DEADLOCK
------------------------
140414 10:29:23
*** (1) TRANSACTION:
TRANSACTION 72D8, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 34, OS thread handle 0x7f11840d4700, query id 982 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of
table `shm`.`metric` trx id 72D8 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 72D7, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 3112, 141 row lock(s), undo log entries 40
MySQL thread id 50, OS thread handle 0x7f1184115700, query id 980 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0),('localhost','cpu-3/cpu-nice',8,0,0,0),
('localhost','cpu-3/cpu-system',8,0,0,0),('localhost','cpu-3/cpu-idle',8,0,0,0),
('localhost','cpu-3/cpu-wait',8,0,0,0),('localhost','cpu-3/cpu-interrupt',8,0,0,0),
('localhost','cpu-3/cpu-softirq',8,0,0,0),('localhost'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 504 index `unique-metric` of table
`shm`.`metric` trx id 72D7 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)
Shouldn't it be retried instead? Is there a way to verify Percona actually retried the query 10000 times?
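(One counter that gives at least indirect visibility is the standard Galera status variable wsrep_local_bf_aborts, which increments each time a local transaction is aborted by a replicated one; that abort is the event a wsrep_retry_autocommit retry responds to, though the retries themselves are not counted here.)

SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';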
I don't have an exact answer to your question, but with any write-intensive load (for example if you try to insert the same data as the damn Drupal does), deadlocks happen, and the only solution for me (still waiting to confirm it is a 100% OK solution) is to use HAProxy in front of the Galera nodes and define the first node (in the HAProxy backend definition) as the one to be used, with the other 2 nodes as backup.
This way all MySQL traffic flows from the clients, via HAProxy, to a single Galera node, and if that node fails, another node will be used.
Hope that helps...
Andrija
Regarding your answer, scalability is a concern: since we are in a cluster, making use of only one node is a poor use of resources. As an alternative, you can use any load balancer. If it's HAProxy, you can create 2 listeners on two ports, say 3306 and 3305; then:
the listener bound to 3306 gets all the write requests from the application; its backend has node1 first, with node2 and node3 as backup;
the listener bound to 3305 gets all the read requests from the application; its backend has all the nodes specified as normal.
So now reads are scalable, and writes have limited scalability, but deadlocks can be reduced to a great extent (a config sketch follows).
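A minimal HAProxy sketch of that two-listener layout (node addresses are placeholders; health-check details omitted):

listen galera_writes
    bind *:3306
    mode tcp
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check backup
    server node3 10.0.0.3:3306 check backup

listen galera_reads
    bind *:3305
    mode tcp
    balance roundrobin
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check
    server node3 10.0.0.3:3306 check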
