I have a cluster with 2 galera nodes and 1 arbitrator.
My node 1 crashed I don't understand why..
Here is the log of the node 1.
It seems that it is a problem with the pthread library.
Also every requests are proxied by 2 HAProxy.
2023-01-03 12:08:55 0 [Warning] WSREP: Handshake failed: peer did not return a certificate
2023-01-03 12:08:55 0 [Warning] WSREP: Handshake failed: peer did not return a certificate
2023-01-03 12:08:56 0 [Warning] WSREP: Handshake failed: http request
terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
what(): remote_endpoint: Transport endpoint is not connected
230103 12:08:56 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.13-MariaDB-1:10.5.13+maria~focal
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=101
max_threads=102
thread_count=106
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760333 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mariadbd(my_print_stacktrace+0x32)[0x55b1b67f7e42]
Printing to addr2line failed
mariadbd(handle_fatal_signal+0x485)[0x55b1b62479a5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ff88ea983c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7ff88e59e18b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7ff88e57d859]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7ff88e939911]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7ff88e94538c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7ff88e9453f7]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7ff88e9456a9]
/usr/lib/galera/libgalera_smm.so(+0x448ad)[0x7ff884b5e8ad]
/usr/lib/galera/libgalera_smm.so(+0x1fc315)[0x7ff884d16315]
/usr/lib/galera/libgalera_smm.so(+0x1ff7eb)[0x7ff884d197eb]
/usr/lib/galera/libgalera_smm.so(+0x1ffc28)[0x7ff884d19c28]
/usr/lib/galera/libgalera_smm.so(+0x2065b6)[0x7ff884d205b6]
/usr/lib/galera/libgalera_smm.so(+0x1f81f3)[0x7ff884d121f3]
/usr/lib/galera/libgalera_smm.so(+0x1e6f04)[0x7ff884d00f04]
/usr/lib/galera/libgalera_smm.so(+0x103438)[0x7ff884c1d438]
/usr/lib/galera/libgalera_smm.so(+0xe8eea)[0x7ff884c02eea]
/usr/lib/galera/libgalera_smm.so(+0xe9a8d)[0x7ff884c03a8d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7ff88ea8c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7ff88e67a293]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Fatal signal 11 while backtracing
PS: if you want more data ask me :)
OK it seems that 2 simultaneous scans of OpenVAS crashes the node.
I tried with version 10.5.13 and 10.5.16 -> crash.
Solution: Upgrade to 10.5.17 at least.
I submitted a job via slurm. The job ran for 12 hours and was working as expected. Then I got Data unpack would read past end of buffer in file util/show_help.c at line 501. It is usual for me to get errors like ORTE has lost communication with a remote daemon but I usually get this in the beginning of the job. It is annoying but still does not cause as much time loss as getting error after 12 hours. Is there a quick fix for this? Open MPI version is 4.0.1.
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: barbun40
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: barbun40
Local device: mlx5_0
--------------------------------------------------------------------------
[barbun21.yonetim:48390] [[15284,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in
file util/show_help.c at line 501
[barbun21.yonetim:48390] 127 more processes have sent help message help-mpi-btl-openib.txt / ib port
not selected
[barbun21.yonetim:48390] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
messages
[barbun21.yonetim:48390] 126 more processes have sent help message help-mpi-btl-openib.txt / error in
device init
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: barbun64
Local PID: 252415
Peer host: barbun39
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15284,1],35]
Exit code: 9
--------------------------------------------------------------------------
I'm seeing an IO error on the Riak console. I'm not sure what the cause is as the owner of the directory is riak. Here's how the error looks.
2018-01-25 23:18:06.922 [info] <0.2301.0>#riak_kv_vnode:maybe_create_hashtrees:234 riak_kv/730750818665451459101842416358141509827966271488: unable to start index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/730750818665451459101842416358141509827966271488/LOCK: already held by process"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,725}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},{riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,712}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,565}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,308}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}
2018-01-25 23:18:06.927 [info] <0.2315.0>#riak_kv_vnode:maybe_create_hashtrees:234 riak_kv/890602560248518965780370444936484965102833893376: unable to start index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/890602560248518965780370444936484965102833893376/LOCK: already held by process"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,725}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},{riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,712}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,565}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,308}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}
2018-01-25 23:18:06.928 [error] <0.27284.0> CRASH REPORT Process <0.27284.0> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/890602560248518965780370444936484965102833893376/LOCK: already held by process"}} in hashtree:new_segment_store/2 line 725 in gen_server:init_it/6 line 328
Any ideas on what the problem could be?
I have a soft PBX setup using asterisk,dahdi and libpri. Asterisk getting stopped frequently when handling more than 200 calls. Due to this, all processing calls are getting abandon.
Server Configurations :
RAM : 32 GB
Processor : 16 core
OS : debian Squeeze - 64 bit ( installed without X )
Asterisk Version : 13.10
Dahdi TE435/235 Version : 2.11.1 (we are using 4 port card 2 Nos)
Libpri Version : 1.4.11
We have changed maxfiles to 2000 in asterisk.conf for handling 240 calls
Getting below error in dmesg:
wcte43x 0000:05:00.0: Underrun detected by hardware. Latency at max of 12ms.
[406144.759396] __ratelimit: 48 callbacks suppressed
Getting below warning in asterisk log:
WARNING[4876][C-000000db] sig_analog.c: Ring/Off-hook in strange state 6 on channel 37
WARNING[4876][C-000000db] channel.c: Unexpected control subclass '2'
Getting below message in message log,
Altumivr kernel: [165794.686917] asterisk[32641] trap divide error ip:7f14375e75eb sp:7f1411b1c1a0 error:0 in res_musiconhold.so[7f14375e1000+b000]
Is there need to do any tweak in configuration level. Please assist and suggest.
The problem with digium card (wcte43x). The error
Underrun detected by hardware. Latency at max of 12ms is resolved once replaced the card. Thank you.
I am using percona-toolkit for analysing mysql-slow-query (logs). So the command is pretty basic:
pt-query-digest slowquery.log
Now the result(error) is:
18.2s user time, 100ms system time, 35.61M rss, 105.19M vsz
Current date: Thu Jul 7 17:18:43 2016
Hostname: Jammer
Files: slowquery.log
Pipeline process 5 (iteration) caused an error: Redundant argument in sprintf at /usr/bin/pt-query-digest line 2556.
Will retry pipeline process 4 (iteration) 2 more times.
..
..(same result prints twice)
..
The pipeline caused an error: Pipeline process 5 (iteration) caused an error: Redundant argument in sprintf at /usr/bin/pt-query-digest line 2556.
Terminating pipeline because process 4 (iteration) caused too many errors.
Now the specifics for the environment, I am using Ubuntu 16.04 , MariaDB 10.1.14, Percona-Toolkit 2.2.16
I found something here bug-report, but it is like a workaround and does not actually solve the error. Even after applying the patch the command result doesn't look satisfying enough.
I am facing same problem on ubuntu 16.04 MySql.
The contents of my slow query log is as follow.
/usr/sbin/mysqld, Version: 5.7.16-0ubuntu0.16.04.1-log ((Ubuntu)). started with:
Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock
Time Id Command Argument
/usr/sbin/mysqld, Version: 5.7.16-0ubuntu0.16.04.1-log ((Ubuntu)). started with:
Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock
Time Id Command Argument
Time: 2016-12-08T05:13:55.140764Z
User#Host: root[root] # localhost [] Id: 20
Query_time: 0.003770 Lock_time: 0.000200 Rows_sent: 1 Rows_examined: 2
SET timestamp=1481174035;
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TRIGGERS;
The error is same:
The pipeline caused an error: Pipeline process 5 (iteration) caused an
error: Redundant argument in sprintf at /usr/bin/pt-query-digest line 2556.
Ubuntu 16.04
MySql Ver 14.14 Distrib 5.7.16
pt-query-digest 2.2.16
The bug appears to be fixed in the current version of the toolkit (2.2.20), and apparently in previous ones, starting from 2.2.17.
This patch seems to do the trick for this particular place in pt-query-digest:
--- percona-toolkit-2.2.16/bin/pt-query-digest 2015-11-06 14:56:23.000000000 -0500
+++ percona-toolkit-2.2.20/bin/pt-query-digest 2016-12-06 17:01:51.000000000 -0500
## -2555,8 +2583,8 ##
}
return sprintf(
$num =~ m/\./ || $n
- ? "%.${p}f%s"
- : '%d',
+ ? '%1$.'.$p.'f%2$s'
+ : '%1$d',
$num, $units[$n]);
}
But as mentioned in the original question and bug report, quite a few tools/functions were affected, the full bugfix consisted of a lot of small changes:
https://github.com/percona/percona-toolkit/pull/73/files
I might be late here. I want to share how I overcame that same error as it might help someone who is searching for an answer. At this time the latest tag of Percona toolkit is 3.0.9
I tried to run pt-query-digest after installing via apt, by downloading deb file as methods provided by Percona documentation, but any of it didn't help. It was this same error.
Pipeline process 5 (iteration) caused an error:
Redundant argument in sprintf at /usr/bin/pt-query-digest line (some line)
1 - So I deleted/removed the installation of percona-toolkit
2 - first, I cleaned/updated perl version
sudo apt-get install perl
3 - then I installed Percona toolkit from source as mentioned in the repository's readme. like this. I used branch 3.0.
git clone git#github.com:percona/percona-toolkit.git
cd percona-toolkit
perl Makefile.PL
make
make test
make install
Thats it. Hope this help to someone.
i found error in this version percona-toolkit-3.0.12-1.el7.x86_64.rpm
and percona-toolkit-3.0.10-1.el7.x86_64.rpm is fine, percona-toolkit is very useful to me
at ./pt-query-digest line 9302.
Terminating pipeline because process 4 (iteration) caused too many errors.
Note that you will see the error message:
"Redundant argument in sprintf at"
if you forget to put a % in front of your format spec (first argument).