ICp 2.1.0.1: Installation failed with error TASK [master: Waiting for MariaDB service to start] - mariadb

I am installing ICp 2.1.0.1 and received an error at TASK
[master: Waiting for MariaDB service to start] with msg: The MariaDB
component failed to start.
After this message, the installation completed with a failed status.
We are installing ICp with 3 masters, 3 proxies, and 2 workers. We have one IP for the master VIP and one for the proxy VIP.
I tried to install multiple times and all installations got the same error.

In prior occurrences of this error, the correct db admin password was not used, so check the db user and password to resolve the issue.

Would you validate whether each master host was able to access port 3306 on the other hosts?
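For that check, something like the following from each master against the others would do (the host name is a placeholder, and nc is assumed to be available):
# Verify that MariaDB's port on another master is reachable
nc -zv other-master.example.com 3306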
If you run with .. install -vv | tee -a install-log.txt, do you get additional details as well?

The error was solved by following the steps below.
Check whether kubelet is running:
Log in to your master node.
Run the following command to check kubelet status:
systemctl status kubelet
If kubelet is not running, run the following command to get the logs:
journalctl -u kubelet &> kubelet.log
We found this error in kubelet.log:
Error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false.
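Based on that message, the fix is to disable swap on each master node before re-running the installer. A minimal sketch (the sed edit assumes standard swap entries in /etc/fstab):
# Turn swap off immediately
sudo swapoff -a
# Comment out swap entries so swap stays off after a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab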
We found this troubleshooting guidance at the link below, and the solution in ICP issue 4651:
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0/troubleshoot/etcd_fails.html
https://github.ibm.com/IBMPrivateCloud/roadmap/issues/4651

Podman build command unable to pull image

I have configured subuid and subgid after installing Podman on RHEL 7.
I created a simple Dockerfile to print hello world and was trying to build the image.
My Dockerfile
FROM alpine
CMD ["echo", "Hello World"]
To test, I am running the command below:
podman build -t imagename .
I received the error below:
STEP 1: FROM alpine
Error: error creating build container: The following failures happened while trying to pull image specified by "alpine" based on search registries in /etc/containers/registries.conf:
* "localhost/alpine": Error initializing source docker://localhost/alpine:latest: error pinging docker registry localhost: Get https://localhost/v2/: dial tcp [::1]:443: connect: connection refused
* "registry.access.redhat.com/alpine": Error initializing source docker://registry.access.redhat.com/alpine:latest: error pinging docker registry registry.access.redhat.com: Get https://registry.access.redhat.com/v2/: read tcp 10.70.85.174:17758->23.54.147.129:443: read: connection reset by peer
* "registry.redhat.io/alpine": Error initializing source docker://registry.redhat.io/alpine:latest: error pinging docker registry registry.redhat.io: Get https://registry.redhat.io/v2/: read tcp 10.70.85.174:36028->104.79.150.216:443: read: connection reset by peer
* "docker.io/library/alpine": Error initializing source docker://alpine:latest: error pinging docker registry registry-1.docker.io: Get https://registry-1.docker.io/v2/: read tcp 10.70.85.174:53352->18.213.137.78:443: read: connection reset by peer
Am I missing any configuration?
Thanks
Do you still have the Docker daemon running and/or Docker installed?
First, stop the Docker daemon:
sudo systemctl stop docker
OR
sudo service docker stop
Then uninstall Docker.
I'm on Ubuntu here, but whatever your distro needs, you can Google it :D
sudo apt-get remove docker docker-engine docker.io containerd runc
Try again.
If it still fails, try a fresh reinstall of Podman (apt-get syntax, per the first source below):
sudo apt-get install --reinstall podman
Sources
https://www.cyberciti.biz/faq/debian-ubuntu-linux-reinstall-a-package-using-apt-get-command/
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker
https://intellipaat.com/community/43965/how-to-stop-docker
https://podman.io/getting-started/installation
I suggest that you first search for your image in the registries:
podman search alpine
You should get a list of available images. Choose the one you want - version, name, tag, etc. - and put that in the Dockerfile.
To be sure it is accessible, do the pull manually:
podman pull alpine:<tag>
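Note that the errors in the question show Podman walking through each search registry from /etc/containers/registries.conf in turn. One way to sidestep short-name resolution entirely is to use a fully qualified image reference; a sketch:
# Pull with a fully qualified name so no registry search is needed
podman pull docker.io/library/alpine:latest
The FROM line in the Dockerfile can be fully qualified the same way (FROM docker.io/library/alpine).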

Rabbitmq: Node down

I am getting a node down error on RabbitMQ; this happens intermittently.
I see the error below when I execute sudo rabbitmqctl status or sudo rabbitmqctl list_queues:
Error: unable to connect to node : nodedown
connected to epmd (port 4369) on host-name
epmd reports node 'rabbit' running on port 25672
can't establish TCP connection, reason: timeout
suggestion: blocked by firewall?
version: {rabbit,"RabbitMQ","3.6.9"}
os: Ubuntu 16.04
I have checked the hostname, which looks fine and has not changed since the installation.
I am also able to telnet localhost 25672.
What could be the reason behind this error and possible solution?
And one more question: I am checking the node status using the API below:
curl -s GET http://edx:edx@127.0.0.1:15672/api/healthchecks/node/
Is the above API OK to use for checking the health status of the node? Please suggest if there is anything better. I have set up a shell script that calls this API and restarts the rabbitmq-server service if the status is not ok. The script is executed from cron every minute.
Looks like your rabbitmq node is... down. rabbitmqctl needs a running node to perform these commands.
If you're using systemd, you can check the service status:
service rabbitmq-server status
Or just try to restart the node:
rabbitmqctl start_app
Telnet on port 25672 only tells you that the Erlang distribution port (which rabbitmqctl uses for inter-node and CLI communication) is open; RabbitMQ itself does not accept client connections on that port (by default, AMQP listens on 5672).
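Regarding the cron script: a minimal sketch of such a check, assuming the management plugin is enabled and reusing the edx:edx credentials from the question:
#!/bin/bash
# Query the per-node health check; it returns {"status":"ok"} when healthy
RESPONSE=$(curl -s -u edx:edx http://127.0.0.1:15672/api/healthchecks/node/)
if ! echo "$RESPONSE" | grep -q '"status":"ok"'; then
    # Restart the broker if the check failed or returned nothing
    sudo systemctl restart rabbitmq-server
fi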

Starting OpenShift cluster never ends when starting minishift or takes too much memory

Whenever I run the command to start Minishift with the VirtualBox driver on my host OS, it takes a crazy amount of time and never finishes. Sometimes I even get an error message about the storage limit being reached.
I wonder if it is an error of the persistent storage volume configuration and usage that is described here.
mike@mike-thinks:~$ minishift start --vm-driver=virtualbox
-- Starting profile 'minishift'
-- Check if deprecated options are used ... OK
-- Checking if https://github.com is reachable ... OK
-- Checking if requested OpenShift version 'v3.9.0' is valid ... OK
-- Checking if requested OpenShift version 'v3.9.0' is supported ... OK
-- Checking if requested hypervisor 'virtualbox' is supported on this platform ... OK
-- Checking if VirtualBox is installed ... OK
-- Checking the ISO URL ... OK
-- Checking if provided oc flags are supported ... OK
-- Starting local OpenShift cluster using 'virtualbox' hypervisor ...
-- Starting Minishift VM ........................ OK
-- Checking for IP address ... OK
-- Checking for nameservers ... OK
-- Checking if external host is reachable from the Minishift VM ...
Pinging 8.8.8.8 ... OK
-- Checking HTTP connectivity from the VM ...
Retrieving http://minishift.io/index.html ... OK
-- Checking if persistent storage volume is mounted ... OK
-- Checking available disk space ... 8% used OK
-- OpenShift cluster will be configured with ...
Version: v3.9.0
-- Copying oc binary from the OpenShift container image to VM .... OK
-- Starting OpenShift cluster ..............................................
What can I do? I'm following this tutorial and I just want to get to the stage that allows me to add oc to my PATH.
Update: new error during OpenShift cluster start
-- Starting OpenShift cluster ...........Error during 'cluster up' execution: Error starting the cluster. ssh command error:
command : /var/lib/minishift/bin/oc cluster up --use-existing-config --host-volumes-dir /var/lib/minishift/openshift.local.volumes --host-pv-dir /var/lib/minishift/openshift.local.pv --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata --public-hostname 192.168.99.100 --routing-suffix 192.168.99.100.nip.io
err : exit status 1
output : Deleted existing OpenShift container
Using nsenter mounter for OpenShift volumes
Using public hostname IP 192.168.99.100 as the host IP
Using 192.168.99.100 as the server IP
Starting OpenShift using openshift/origin:v3.9.0 ...
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
Waiting for API server to start listening
FAIL
Error: cannot access master readiness URL https://192.168.99.100:8443/healthz/ready
Details:
Last 10 lines of "origin" container log:
E0625 14:47:40.905680 2341 proxier.go:252] Error removing userspace rule: error checking rule: fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.908353 2341 proxier.go:259] Error removing userspace rule: error checking rule: fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.910681 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.913452 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-PORTALS-HOST": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.919209 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-HOST": fork/exec /usr/sbin/iptables: exec format error:
W0625 14:47:40.931698 2341 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
E0625 14:47:40.932412 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.938345 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-NON-LOCAL": fork/exec /usr/sbin/iptables: exec format error:
W0625 14:47:40.941639 2341 iptables.go:151] Error checking iptables version, assuming version at least 1.4.11: fork/exec /usr/sbin/iptables: exec format error
F0625 14:47:40.949329 2341 network.go:177] error: Could not initialize Kubernetes Proxy. You must run this process as root (and if containerized, in the host network namespace as privileged) to use the service proxy: failed to initialize iptables: error creating chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
Caused By:
Error: Get https://192.168.99.100:8443/healthz/ready: dial tcp 192.168.99.100:8443: getsockopt: connection refused
In case this helps: the first time, it did not finish for me at all either. However, the VM was in a "running" state in VirtualBox.
The second time I ran it from an elevated command prompt, and it finished successfully. Perhaps it was not the elevated user that helped, though, but simply the fact that I ran it a second time.
If the issue is that you're stuck on Starting Minishift VM (not the case for the OP), then the issue may be that you're on a VPN. Try disconnecting the VPN and see if that fixes your issues.
Had the same problem. I noticed some network traffic during the first (failing) startup on a slower network connection. I waited until the network traffic was low, tried again, and it worked. So presumably some Docker image downloads happen during startup.
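If the storage or memory limit is what blocks the startup, it may also be worth recreating the VM with larger resources. A sketch (the flags are standard Minishift options; the sizes are only illustrative):
# Remove the old, possibly wedged VM, then start fresh with more resources
minishift delete
minishift start --vm-driver=virtualbox --memory 4GB --disk-size 40GB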

openwhisk postdeploy fails on single node ubuntu virtual machine

I am trying to run the OpenWhisk serverless framework on a single-node Ubuntu VM.
I am following the instructions here.
I followed the instructions for the database setup and then went over to the steps listed for the ansible single-node deployment (ansible/README.md).
Using the steps under "Deploy Using CouchDB", in the following step:
ansible-playbook -i environments/<environment> postdeploy.yml
I get an error in running installCatalog.sh
It looks like the URL 172.17.0.1 is not accessible. Where am I going wrong?
TASK [install the catalog from the catalog location] ***************************
Thursday 04 May 2017 10:41:29 +0000 (0:00:01.602) 0:00:09.063 **********
fatal: [ansible]: FAILED! => {"changed": true, "cmd": "./installCatalog.sh /home/techie/openwhisk/ansible/../ansible/files/auth.whisk.system 172.17.0.1 /whisk.system /home/techie/openwhisk/ansible/../bin/wsk", "delta": "0:00:01.840405", "end": "2017-05-04 10:41:32.380241", "failed": true, "rc": 7, "start": "2017-05-04 10:41:30.539836", "stderr": "error: Package update failed: Put 172.17.0.1/api/v1/namespaces/_/packages/websocket?overwrite=true: dial tcp 172.17.0.1:443: getsockopt: connection refused\nerror: Package update failed: Put 172.17.0.1/api/v1/namespaces/_/packages/combinators?overwrite=true: dial tcp 172.17.0.1:443: getsockopt: connection refused\nerror: Package update failed: Put 172.17.0.1/api/v1/namespaces/_/packages/watson-speechToText?overwrite=true: dial tcp 172.17.0.1:443: getsockopt: connection refused\nerror: Package update failed: Put 172.17.0.1/api/v1/namespaces/_/packages/utils?overwrite=true: dial tcp 172.17.0.1:443: getsockopt: connection refused\nerror: Package update failed:
.......
I ran docker ps after the deployment step. Several containers, such as zookeeper and kafka, were running. Is there supposed to be an nginx container running too? In my setup there was no nginx container running.
In the config files, I have the base URL set to 172.17.0.1 - is this OK, or could it be something else?
I found that I needed to also run edge.yml after apigateway.yml and before postdeploy.yml to get the postdeploy script to work, and then the wsk tool worked against the API endpoint.
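In command form, that ordering would look roughly like this (keeping the placeholder environment name from the question):
ansible-playbook -i environments/<environment> apigateway.yml
ansible-playbook -i environments/<environment> edge.yml
ansible-playbook -i environments/<environment> postdeploy.yml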

Cloudify 3.4 bootstrap error occurred

I installed Cloudify 3.4 according to the Cloudify docs. When installing the manager, I executed this:
# cfy bootstrap --install-plugins -p openstack-manager-blueprint.yaml -i openstack-manager-blueprint-inputs.yaml
an error occurred:
[ERROR] Workflow failed: Task failed 'fabric_plugin.tasks.run_script' -> Timed out trying to connect to 192.168.17.15 (tried 5 times)
I have already built an external network 192.168.17.0/24 and I have already installed:
cloudify_docker_plugin-1.3.2-py27-none-linux_x86_64-Ubuntu-trusty.wgn
cloudify_fabric_plugin-1.4.1-py27-none-linux_x86_64-centos-Core.wgn
cloudify_fabric_plugin-1.4.1-py27-none-linux_x86_64-redhat-Maipo.wgn
cloudify_host_pool_plugin-1.4-py27-none-linux_x86_64-centos-Core.wgn
cloudify_openstack_plugin-1.4-py27-none-linux_x86_64-redhat-Maipo.wgn
So, how can I solve this error? Thank you to everyone who helps me!
It seems that you can't connect to the manager.
Please make sure that you have an SSH connection from the CLI machine to the manager.
Since you are bootstrapping an OpenStack manager, make sure the manager has an external IP if you are outside of OpenStack, or that the CLI machine is on the same network if you are on OpenStack.
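A quick way to verify that connectivity from the machine where you run cfy (the key path and user below are placeholders; use the key and user from your manager blueprint inputs):
# Illustrative SSH check against the manager IP from the error message
ssh -i ~/.ssh/manager-key.pem centos@192.168.17.15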
