I have a cluster with 2 galera nodes and 1 arbitrator.
My node 1 crashed I don't understand why..
Here is the log of the node 1.
It seems that it is a problem with the pthread library.
Also every requests are proxied by 2 HAProxy.
2023-01-03 12:08:55 0 [Warning] WSREP: Handshake failed: peer did not return a certificate
2023-01-03 12:08:55 0 [Warning] WSREP: Handshake failed: peer did not return a certificate
2023-01-03 12:08:56 0 [Warning] WSREP: Handshake failed: http request
terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
what(): remote_endpoint: Transport endpoint is not connected
230103 12:08:56 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.5.13-MariaDB-1:10.5.13+maria~focal
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=101
max_threads=102
thread_count=106
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760333 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mariadbd(my_print_stacktrace+0x32)[0x55b1b67f7e42]
Printing to addr2line failed
mariadbd(handle_fatal_signal+0x485)[0x55b1b62479a5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ff88ea983c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7ff88e59e18b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7ff88e57d859]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911)[0x7ff88e939911]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c)[0x7ff88e94538c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7)[0x7ff88e9453f7]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9)[0x7ff88e9456a9]
/usr/lib/galera/libgalera_smm.so(+0x448ad)[0x7ff884b5e8ad]
/usr/lib/galera/libgalera_smm.so(+0x1fc315)[0x7ff884d16315]
/usr/lib/galera/libgalera_smm.so(+0x1ff7eb)[0x7ff884d197eb]
/usr/lib/galera/libgalera_smm.so(+0x1ffc28)[0x7ff884d19c28]
/usr/lib/galera/libgalera_smm.so(+0x2065b6)[0x7ff884d205b6]
/usr/lib/galera/libgalera_smm.so(+0x1f81f3)[0x7ff884d121f3]
/usr/lib/galera/libgalera_smm.so(+0x1e6f04)[0x7ff884d00f04]
/usr/lib/galera/libgalera_smm.so(+0x103438)[0x7ff884c1d438]
/usr/lib/galera/libgalera_smm.so(+0xe8eea)[0x7ff884c02eea]
/usr/lib/galera/libgalera_smm.so(+0xe9a8d)[0x7ff884c03a8d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7ff88ea8c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7ff88e67a293]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Fatal signal 11 while backtracing
PS: if you want more data ask me :)
OK it seems that 2 simultaneous scans of OpenVAS crashes the node.
I tried with version 10.5.13 and 10.5.16 -> crash.
Solution: Upgrade to 10.5.17 at least.
Related
I submitted a job via slurm. The job ran for 12 hours and was working as expected. Then I got Data unpack would read past end of buffer in file util/show_help.c at line 501. It is usual for me to get errors like ORTE has lost communication with a remote daemon but I usually get this in the beginning of the job. It is annoying but still does not cause as much time loss as getting error after 12 hours. Is there a quick fix for this? Open MPI version is 4.0.1.
--------------------------------------------------------------------------
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default. The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter
to true.
Local host: barbun40
Local adapter: mlx5_0
Local port: 1
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: barbun40
Local device: mlx5_0
--------------------------------------------------------------------------
[barbun21.yonetim:48390] [[15284,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in
file util/show_help.c at line 501
[barbun21.yonetim:48390] 127 more processes have sent help message help-mpi-btl-openib.txt / ib port
not selected
[barbun21.yonetim:48390] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
messages
[barbun21.yonetim:48390] 126 more processes have sent help message help-mpi-btl-openib.txt / error in
device init
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: barbun64
Local PID: 252415
Peer host: barbun39
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[15284,1],35]
Exit code: 9
--------------------------------------------------------------------------
I am trying to run a simple MPI job across multiple hosts of a cluster.
[capc#gpu6 mpi_tests]$ /opt/openmpi4.0.3/build/bin/mpirun --host gpu7,gpu6 ./a.out
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them). This is most certainly not what you wanted. Check your
cables, subnet manager configuration, etc. The openib BTL will be
ignored for this job.
Local host: gpu7
We have 2 processes.
WARNING: Open MPI accepted a TCP connection from what appears to be a
another Open MPI process but cannot find a corresponding process
entry for that peer.
This attempted connection will be ignored; your MPI job may or may not
continue properly.
Local host: gpu6
PID: 29209
[gpu6:29203] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
[gpu6:29203] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I have compiled the MPI program with mpicc and on running with mpirun it hangs.
Can anyone guide me regarding this?
I am still using Corda 1.0 version. when i try to redeploy nodes with existing data, getting below error while start-up but able to access the nodes . If i clear the data and redeploy the nodes, i didn't face these error message.
Logs can be found in :
C:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kotlin-
source\build\nodes\xxxxxxxx\logs
Database connection url is : jdbc:h2:tcp://xxxxxxxxx/node
E 18:38:46+0530 [main] core.client.createConnection - AMQ214016: Failed to
create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty
all-4.1.9.Final.jar:4.1.9.Final]
Incoming connection address : xxxxxxxxxxxx
Listening on port : 10014
RPC service listening on port : 10015
Loaded CorDapps : corda-finance-1.0.0, kotlin-
source-0.1, corda-core-1.0.0
Node for "xxxxxxxxxxx" started up and registered in 213.08 sec
Welcome to the Corda interactive shell.
Useful commands include 'help' to see what is available, and 'bye' to shut
down the node.
Wed May 23 18:39:20 IST 2018>>> E 18:39:24+0530 [Thread-6 (ActiveMQ-server-
org.apache.activemq.artemis.core.server.impl.ActiveMQServerImp
l$3#4a532271)] core.client.createConnection - AMQ214016: Failed to create
netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-
all-4.1.9.Final.jar:4.1.9.Final]
This looks like the Artemis failed to connect to the node which means the node fails to start.
You should look at the log and if there are any other previous Corda node started which occupy the node.
If there are any legacy Corda nodes that have not been killed, try ps -ef |grep java to see if there is any other java still alive. Especially look for the port number and check if they are overlapped
Cluster is constantly crashing.
Sometimes it works stable for 2 days. Sometimes five minutes after 2 servers stand.
server.cnf was tested with different parameters. The result did not change
When the setup is not installed, the single server is running without problems.
Clean installation several times.
iptables stop
selinux disable
yum repo
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.1/centos6-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
Centos 6.9 64bit
yum install MariaDB-server MariaDB-client
Erorr Log
/var/lib/mysql/maria1.err
180305 16:17:03 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.31-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=5
max_threads=153
thread_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467163 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48400
/etc/my.cnf.d/server.cnf
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#
# this is read by the standalone daemon and embedded servers
[server]
# this is only for the mysqld standalone daemon
[mysqld]
bind-address=0.0.0.0
skip-name-resolve
#
# * Galera-related settings
#
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.1.1.11,10.1.1.12"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='10.1.1.11'
wsrep_node_name='maria1'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:!!PassWordSs!!
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
# this is only for embedded server
[embedded]
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# This group is only read by MariaDB-10.1 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.1]
Test DB structure
Please report this crash to MariaDB bug tracker
I have three nodes to setup the mariadb cluster.
When I use /usr/sbin/mysqld --wsrep-new-cluster --user=root & to start the cluster in the node1, but get the below error:
2017-08-05 17:41:36 140123776886528 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-08-05 17:41:36 140123777161408 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.19-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
2017-08-05 17:41:41 140123698513664 [ERROR] InnoDB: InnoDB: Unable to allocate memory of size 18446744073709544120.
2017-08-05 17:41:41 7f7117464700 InnoDB: Assertion failure in thread 140123698513664 in file ha_innodb.cc line 22407
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
170805 17:41:41 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.19-MariaDB
key_buffer_size=0
read_buffer_size=131072
max_used_connections=3
max_threads=10002
thread_count=6
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 21969763 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x7f6fe0b7e008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7117463cc0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f711ca3dcde]
/usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x7f711c561005]
/lib64/libpthread.so.0(+0xf100)[0x7f711bb7b100]
/lib64/libc.so.6(gsignal+0x37)[0x7f7119ed65f7]
/lib64/libc.so.6(abort+0x148)[0x7f7119ed7ce8]
/usr/sbin/mysqld(+0x73880f)[0x7f711c6e480f]
/usr/sbin/mysqld(+0x78fbb7)[0x7f711c73bbb7]
/usr/sbin/mysqld(+0x78fcfc)[0x7f711c73bcfc]
/usr/sbin/mysqld(+0x82da1c)[0x7f711c7d9a1c]
/usr/sbin/mysqld(+0x82dc9e)[0x7f711c7d9c9e]
/usr/sbin/mysqld(+0x80ab5f)[0x7f711c7b6b5f]
/usr/sbin/mysqld(+0x8002cc)[0x7f711c7ac2cc]
/usr/sbin/mysqld(+0x7360f2)[0x7f711c6e20f2]
/usr/sbin/mysqld(_ZN7handler18index_read_idx_mapEPhjPKhm16ha_rkey_function+0x85)[0x7f711c561965]
/usr/sbin/mysqld(_ZN7handler21ha_index_read_idx_mapEPhjPKhm16ha_rkey_function+0xa6)[0x7f711c565ca6]
/usr/sbin/mysqld(+0x479524)[0x7f711c425524]
/usr/sbin/mysqld(+0x4796a0)[0x7f711c4256a0]
/usr/sbin/mysqld(+0x47edd6)[0x7f711c42add6]
/usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x72f)[0x7f711c432c1f]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x2f)[0x7f711c43551f]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x8f)[0x7f711c43565f]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x245)[0x7f711c4361c5]
/usr/sbin/mysqld(+0x4291a1)[0x7f711c3d51a1]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x5f8f)[0x7f711c3e14cf]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x352)[0x7f711c3e4e62]
/usr/sbin/mysqld(+0x439689)[0x7f711c3e5689]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1fb0)[0x7f711c3e7d10]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f711c3e8bb9]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f711c4af71a]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7f711c4af8f0]
/usr/sbin/mysqld(+0x96f37d)[0x7f711c91b37d]
/lib64/libpthread.so.0(+0x7dc5)[0x7f711bb73dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f7119f9721d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f6fd9420020): is an invalid pointer
Connection ID (thread ID): 16
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
We think the query pointer is invalid, but we will try to print it anyway.
Query: SELECT agents.id AS agents_id, agents.agent_type AS agents_agent_type, agents.`binary` AS agents_binary, agents.topic AS agents_topic, agents.host AS agents_host, agents.availability_zone AS agents_availability_zone, agents.admin_state_up AS agents_admin_state_up, agents.created_at AS agents_created_at, agents.started_at AS agents_started_at, agents.heartbeat_timestamp AS agents_heartbeat_timestamp, agents.description AS agents_description, agents.configurations AS agents_configurations, agents.resource_versions AS agents_resource_versions, agents.`load` AS agents_load
FROM agents
WHERE agents.agent_type = 'Linux bridge agent' AND agents.host = 'ha-node2'