RabbitMQ: Node down - TCP

I intermittently get a node down error on RabbitMQ.
I see the error below when I run sudo rabbitmqctl status or sudo rabbitmqctl list_queues:
Error: unable to connect to node : nodedown
connected to epmd (port 4369) on host-name
epmd reports node 'rabbit' running on port 25672
can't establish TCP connection, reason: timeout
suggestion: blocked by firewall?
version: {rabbit,"RabbitMQ","3.6.9"}
os: Ubuntu 16.04
I have checked the hostname; it looks correct and has not changed since the installation.
I am also able to telnet to localhost 25672.
What could be causing this error, and what is a possible solution?
One more question: I am checking the node status using the API below.
curl -s -X GET http://edx:edx@127.0.0.1:15672/api/healthchecks/node/
Is this API suitable for checking the health of the node? Please suggest anything better. I have set up a shell script that calls this API and restarts the rabbitmq-server service if the status is not ok. The script is run from cron every minute.
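The check-and-restart logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the endpoint returns a JSON body with a "status" field equal to "ok" when the node is healthy, and the names parse_health, node_is_healthy and should_restart, as well as the threshold of three consecutive failures, are hypothetical choices and not part of RabbitMQ.

```python
import base64
import json
import urllib.request
from typing import Sequence


def parse_health(body: str) -> bool:
    """Return True only when the response body reports status "ok".

    Assumption: the health check endpoint answers with JSON such as
    {"status": "ok"} or {"status": "failed", "reason": "..."}.
    """
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as unhealthy.
        return False


def node_is_healthy(base_url: str, user: str, password: str,
                    timeout: float = 10.0) -> bool:
    """Query the management API; any network or HTTP error counts as unhealthy."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/healthchecks/node",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_health(response.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def should_restart(recent_results: Sequence[bool], threshold: int = 3) -> bool:
    """Hypothetical policy: restart only after `threshold` failed checks in a row."""
    if len(recent_results) < threshold:
        return False
    return not any(recent_results[-threshold:])
```

Requiring several consecutive failures before restarting is a design choice rather than anything RabbitMQ prescribes; with a one-minute cron interval, a single slow response would otherwise trigger a restart on its own.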

Looks like your rabbitmq node is... down. rabbitmqctl needs a running node to perform these commands.
If you're using systemd, you can check the service status:
service rabbitmq-server status
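Or try to start the RabbitMQ application on the node (this only works if the Erlang node itself is still up; otherwise restart the service):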
rabbitmqctl start_app
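Being able to telnet to port 25672 only shows that something is accepting connections on the Erlang distribution port, which rabbitmqctl and other nodes use to talk to the node. AMQP clients do not connect on that port (by default, RabbitMQ listens for them on 5672).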
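Since the error output shows rabbitmqctl going through epmd on the machine's hostname rather than on localhost, it may help to probe both names on both ports. The following is a minimal sketch of a telnet-style check; can_connect and probe are hypothetical helper names, and the three-second timeout is an arbitrary assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe(hosts, ports, timeout: float = 3.0):
    """Try every host/port combination and map (host, port) to the result."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Calling probe(["localhost", socket.gethostname()], [4369, 25672]) on the RabbitMQ machine would show whether the hostname behaves differently from localhost. If localhost is reachable but the hostname times out, name resolution or a firewall rule on the non-loopback address would be a reasonable next thing to look at.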


web GRPC and Iroha (JS implementation for iroha)

I am trying to run this Docker file: https://gitlab.com/snippets/1713665
[Screenshot: consoles]
I have an Iroha container running on port 50051, as you can see in the right console, but when I run the above Docker file for web gRPC, the left console shows that it is unable to make a connection. I have tried with the firewall both enabled and disabled, and also opened port 50051 with sudo ufw allow 50051, but in the end I get the same result:
"Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused". Reconnecting... system=system"
I posted this issue a month ago but no one responded, which is why I am reposting it with further elaboration.
Try running the gRPC web proxy with the backend address set to localhost, instead of whatever the default is in the GitLab snippet.
For example: ./grpcwebproxy-v0.13.0-osx-x86_64 --backend_addr=localhost:50051 --run_tls_server=false
From the console logs, it looks like it is trying to connect to dev.localdomain:50051

ICp 2.1.0.1: Installation failed with error TASK [master: Waiting for MariaDB service to start]

I am installing ICp 2.1.0.1 and I received an error at the TASK [master: Waiting for MariaDB service to start] with msg: The MariaDB component failed to start.
After this message, the installation finished with a failed status.
We are installing ICp with 3 Masters, 3 Proxies and 2 Workers. We have 1 IP for VIP master and 1 for VIP proxy.
I tried to install multiple times and all installations got the same error.
In earlier cases of this error, the cause was that the wrong db admin password had been used, so check the db user and password first.
Would you validate whether each master host was able to access port 3306 on the other hosts?
If you run with .. install -vv | tee -a install-log.txt, do you get additional details as well?
The error was solved by following the steps below.
Check whether kubelet is running:
Log in to your master node.
Run the following command to check kubelet status:
systemctl status kubelet
If kubelet is not running, run the following command to get the logs:
journalctl -u kubelet &> kubelet.log
We found the error in the kubelet.log log:
Error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false.
We found this troubleshooting step at the first link below, and the solution in ICP issue 4651 (second link).
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0/troubleshoot/etcd_fails.html
https://github.ibm.com/IBMPrivateCloud/roadmap/issues/4651
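The log check above can be sketched in Python as follows. It is only an illustration: find_swap_errors and active_swap_devices are hypothetical helper names, the first searches the saved kubelet.log text for the message quoted above, and the second assumes the usual /proc/swaps layout of one header line followed by one line per active swap device.

```python
from typing import List


def find_swap_errors(log_text: str) -> List[str]:
    """Return the log lines that contain the kubelet swap complaint."""
    return [
        line
        for line in log_text.splitlines()
        if "swap on is not supported" in line
    ]


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Parse the contents of /proc/swaps and list the active swap devices.

    Assumption: the first line is a header and every further non-empty
    line starts with the device or file name.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]
```

On a master node, the inputs would come from reading kubelet.log and /proc/swaps. A non-empty result from both functions would match the situation described here, and the error message itself names the two ways out: disabling swap, or setting the --fail-swap-on flag to false.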

Permission denied while using 'Kaa-Node restart'

I am working on an application that previously worked, with data being persisted into MongoDB. Recently we changed our router, so we regenerated the SDK and so on, but we still get a connection error.
Error :
2017/01/26 9:24:27 [WARNING] [kaa_bootstrap_manager.c:612] (-7) - Could not find next Bootstrap access point (protocol: id=0x56C8FF92, version=1)
2017/01/26 9:24:27 [ERROR] [kaa_tcp_channel.c:307] (-7) - Kaa TCP channel [0x929A2016] error notifying bootstrap manager on access point failure
2017/01/26 9:24:27 [ERROR] [kaa_client.c:240] (-7) - Failed to process OUT event for the client socket 3
So we started troubleshooting, and one of the staff members I emailed sent me a link to the troubleshooting guide:
https://kaaproject.github.io/kaa/docs/v0.10.0/Administration-guide/Troubleshooting/
I followed it, but I am stuck at an error when running 'kaa-node restart' to restart the node service.
Here are the commands for troubleshooting:
Connect to your Kaa Sandbox via ssh:
$ ssh kaa@<YOUR-SANDBOX-IP>
password: kaa
Stop the Kaa service:
$ sudo service kaa-node stop
Clear the Kaa logs:
$ sudo rm -rf /var/log/kaa/*
Start the Kaa service:
$ sudo service kaa-node start
I typed 'sudo service kaa-node start' and it gave me:
kaa@kaa-sandbox.kaaproject.org:~$ sudo service kaa-node start
* Starting Kaa Node daemon (kaa-node):
/bin/bash: /var/log/kaa/kaa-node-server.init.log: Permission denied
Try verifying the Kaa host on the Management page. Also, the Sandbox Web UI (the Management page) is able to restart all the necessary Kaa services on the Sandbox after the Kaa host change.
Please note that the Kaa host should match the PC host IP address accessible from the network your applications are running in.
Please try and let me know if this works for you.

Getting "connection refused" when trying to access etcd from within a Docker container

I am trying to access etcd from within a running Docker container. When I run
curl http://172.17.42.1:4001/v2/keys
I get
curl: (7) Failed to connect to 172.17.42.1 port 4001: Connection refused
I have four other hosts where this works fine, but every container on this machine has this problem. I'm really at a loss as to what's going on, and I don't know how to debug it.
My etcd environment variables are
ETCD_ADVERTISE_CLIENT_URLS=http://10.242.10.2:2379
ETCD_DISCOVERY=https://discovery.etcd.io/<token_removed>
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.242.10.2:2380
ETCD_LISTEN_CLIENT_URLS=http://10.242.10.2:2379,http://127.0.0.1:2379,http://0.0.0.0:4001
ETCD_LISTEN_PEER_URLS=http://10.242.10.2:2380
I can also access etcd from the host with
curl http://localhost:4001/v2/keys
So there seems to be some error when routing from the container out to the host. But I can't figure out what it is. Can anyone point me in the right direction?
I found that I had to set both --advertise-client-urls and --listen-client-urls, like so:
./etcd --advertise-client-urls 'http://0.0.0.0:2379,http://0.0.0.0:4001' --listen-client-urls 'http://0.0.0.0:2379,http://0.0.0.0:4001'
Then I was able to successfully do
curl -L http://hostname:2379/version
from any machine that could reach that server and it worked.
It turns out etcd was only listening on localhost:4001 on that machine, which is why I couldn't access it from within a container. This is despite me configuring one of the listen client urls to 0.0.0.0:4001.
It turns out that I had run sudo systemctl enable etcd2, which caused it to run before the cloud-config service ran. As such, etcd started with default configuration instead of the one that I had specified in my cloud config. Running sudo systemctl disable etcd2 fixed the issue.
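To compare the intended configuration with what the running process actually uses, a small check like the following can help. It is a sketch only: listens_on is a hypothetical helper that takes a comma-separated listen URL list (as in ETCD_LISTEN_CLIENT_URLS) and reports whether any entry on the given port would accept connections addressed to a particular IP, such as the Docker bridge address 172.17.42.1 from the question. It handles IP literals only and skips hostnames.

```python
import ipaddress
from urllib.parse import urlsplit


def listens_on(listen_urls: str, target_ip: str, port: int) -> bool:
    """Return True if a listen URL on `port` covers `target_ip`.

    A URL covers the target when it binds the wildcard address (0.0.0.0)
    or exactly the target address.
    """
    target = ipaddress.ip_address(target_ip)
    for url in listen_urls.split(","):
        parts = urlsplit(url.strip())
        if parts.hostname is None or parts.port != port:
            continue
        try:
            bound = ipaddress.ip_address(parts.hostname)
        except ValueError:
            # Hostnames such as "localhost" are skipped in this sketch.
            continue
        if bound.is_unspecified or bound == target:
            return True
    return False
```

With the value from the question, listens_on("http://10.242.10.2:2379,http://127.0.0.1:2379,http://0.0.0.0:4001", "172.17.42.1", 4001) is true, so the configuration itself was fine. A process listening only on a loopback address for port 4001, which is what the answer describes, would give false, and that points at the service starting without the cloud-config settings rather than at the settings themselves.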

opensuse network management undefined

I did an update on my opensuse box and networking stopped working. The system is trying to use networkmanager, even though it isn't installed. I am using yast to try and get it to use ifup, but it complains about no network connection. I tried running:
ifup eth0
and I get back:
Network is managed by '' -> skipping
Does anyone out there know why it is coming back empty and if there is a config file that I can manually tweak to fix this?
I'm assuming you are running 12.3 or 13.1 with systemd.
Disable network manager if it exists:
systemctl disable NetworkManager.service
Enable network.service:
systemctl enable network.service
Make sure ifcfg-eth0 exists with a configuration in /etc/sysconfig/network/
Run ifup eth0
Hope this will help someone.
1. Disable NetworkManager and stop it, then enable it and restart it, in that order.
2. All of this happens in the console. Check the status of NetworkManager; the status messages should show that the (wireless) interface is disconnected. Confirm this by typing the command "sudo nmcli c"
3. Type the command "sudo iwlist (wireless-interface) scan" to show the available wireless networks
4. If the network you want to connect to is listed, type the command "nmcli a" and enter the corresponding passphrase/password to connect
