Error on bootstrap of Management VM - openstack

I am using Cloudify 2.7 with OpenStack Icehouse. In particular, I have configured the cloud driver to bootstrap 2 Management VMs (numberOfManagementMachines 2).
Sometimes, when I bootstrap the VMs, I receive the following error:
cloudify#default> bootstrap-cloud --verbose openstack-icehouse-<project_name>
...
Starting agent and management processes:
[VM_Floating_IP] nohup gs-agent.sh gsa.global.lus 0 gsa.lus 1 gsa.gsc 0 gsa.global.gsm 0 gsa.gsm 1 gsa.global.esm 1 >/dev/null 2>&1
[VM_Floating_IP] STARTING CLOUDIFY MANAGEMENT
[VM_Floating_IP] .
[VM_Floating_IP] Discovered agent nic-address=177.86.0.3 lookup-groups=gigaspaces-Cloudify-2.7.1-ga.
[VM_Floating_IP] Detected LUS management process started by agent null expected agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected LUS management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected GSM management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .failure occurred while renewing an event lease: Operation failed. net.jini.core.lease.UnknownLeaseException: Unknown event id: 3
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseInt(GigaRegistrar.java:5494)
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseDo(GigaRegistrar.java:5475)
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLease(GigaRegistrar.java:2836)
[VM_Floating_IP] at com.sun.jini.reggie.RegistrarGigaspacesMethodinternalInvoke16.internalInvoke(Unknown Source)
[VM_Floating_IP] at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:41)
[VM_Floating_IP] at com.gigaspaces.lrmi.LRMIRuntime.invoked(LRMIRuntime.java:464)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot.consumeAndHandleRequest(Pivot.java:561)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot.handleRequest(Pivot.java:662)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot$ChannelEntryTask.run(Pivot.java:196)
[VM_Floating_IP] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[VM_Floating_IP] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[VM_Floating_IP] at java.lang.Thread.run(Thread.java:662)
[VM_Floating_IP]
[VM_Floating_IP]
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
....
[VM_Floating_IP] ....Failed to add [Processing Unit Instance] with uid [8038e956-1ae2-4378-8bb1-e2055202c160]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896032_3_8060218823096628119_details[class org.openspaces.pu.container.servicegrid.PUServiceBeanImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSM] with uid [3c0e20e9-bf85-4d22-8ed6-3b387e690878]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7000/pid[4229]/154704895271_2_2245795805687723285_details[class com.gigaspaces.grid.gsm.GSMImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add GSC with uid [8070dabb-d80d-43c7-bd9c-1d2478f95710]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896020_2_8060218823096628119_details[class com.gigaspaces.grid.gsc.GSCImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSA] with uid [a0eec4e5-7fb0-4428-80e1-ec13a8b1c744]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7002/pid[4086]/153569177936_2_8701370873164361474_details[class com.gigaspaces.grid.gsa.GSAImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Failed to connect to LUS on 177.86.0.3:4174, retry in 73096ms: Operation failed. java.net.ConnectException: Connection timed out
...
[VM_Floating_IP] .Failed to add [ESM] with uid [996c8898-897c-4416-a877-82efb22c7ea6]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7003/pid[4504]/172954418920_2_5475350805758957057_details[class org.openspaces.grid.esm.ESMImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
Can someone suggest a solution? Do I need to configure a timeout value somewhere?
Thanks.
------------------------Edited-------------------
I would like to add some information.
Each manager instance has 4 vCPUs, 8 GB RAM and a 20 GB disk.
Each manager instance has the security groups created by Cloudify, that is:
cloudify-manager-cluster
Egress IPv4 Any - 0.0.0.0/0 (CIDR)
Egress IPv6 Any - ::/0 (CIDR)
cloudify-manager-management
Egress IPv4 Any - 0.0.0.0/0 (CIDR)
Egress IPv6 Any - ::/0 (CIDR)
Ingress IPv4 TCP 22 0.0.0.0/0 (CIDR)
Ingress IPv4 TCP 4174 cfy-mngt-cluster
Ingress IPv4 TCP 6666 cfy-mngt-cluster
Ingress IPv4 TCP 7000 cfy-mngt-cluster
Ingress IPv4 TCP 7001 cfy-mngt-cluster
Ingress IPv4 TCP 7002 cfy-mngt-cluster
Ingress IPv4 TCP 7003 cfy-mngt-cluster
Ingress IPv4 TCP 7010 - 7110 cfy-mngt-cluster
Ingress IPv4 TCP 8099 0.0.0.0/0 (CIDR)
Ingress IPv4 TCP 8100 0.0.0.0/0 (CIDR)
Moreover, Cloudify creates a private network "cloudify-manager-Cloudify-Management-Network" with subnet 177.86.0.0/24 and requests a Floating IP for each VM.

The ESM is Cloudify's orchestrator. Only one instance of it should be running at any one time. The error indicates that the bootstrap process was expecting to find a running ESM, but did not find one. This seems to be related to communication errors between the manager instances - is it possible that the security groups defined for the manager do not open all ports between the managers?
Security group/firewall configurations are the usual culprit. It is also possible that the manager VM is too small; it should have at least 4 GB of RAM and 2 vCPUs.
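One way to verify this on the OpenStack side is to list the rules Cloudify created and, if a management port is missing, open it between the manager groups. A rough sketch with the Icehouse-era clients (the group names are the ones listed above and the port ranges come from the rule table and the log, so adjust them to your tenant):

nova secgroup-list-rules cloudify-manager-management

# Example: open the LRMI port range between the managers if it is missing
neutron security-group-rule-create --direction ingress --protocol tcp \
  --port-range-min 7010 --port-range-max 7110 \
  --remote-group-id cloudify-manager-cluster cloudify-manager-management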
Please keep in mind that Cloudify 2.X has reached end-of-life and is no longer supported. You may want to check out Cloudify 3.

Related

Failed to dial target host "kong-proxy-service-external-ip:443": context deadline exceeded

I am trying to use Kong API Gateway following the "Using Ingress with gRPC" guide, but I am getting the error below.
Failed to dial target host "kong-proxy-service-external-ip:443": context deadline exceeded
I am using a minikube cluster for deployment.
I followed all the steps mentioned here - https://docs.konghq.com/kubernetes-ingress-controller/latest/guides/using-ingress-with-grpc/ - but when I tried to run grpcurl -v -d '{"greeting": "Kong Hello world!"}' -insecure $PROXY_IP:443 hello.HelloService.SayHello
I got the error Failed to dial target host.
If I use port forwarding on the service with kubectl port-forward service/grpcbin 9001:9001, it works, so the issue must be with the ingress or some other configuration.
Could you please help me with this issue?
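For reference, the two paths being compared look like this - the port-forward that works versus the Kong proxy that fails. A sketch only; the kong namespace and kong-proxy service name assume the standard install from that guide:

# Backend check through the port-forward, bypassing Kong entirely:
kubectl port-forward service/grpcbin 9001:9001 &
grpcurl -v -d '{"greeting": "Kong Hello world!"}' -insecure localhost:9001 hello.HelloService.SayHello

# Proxy path check - on minikube the kong-proxy LoadBalancer often stays <pending>,
# in which case $PROXY_IP is simply not reachable from the host:
kubectl get svc -n kong kong-proxy
kubectl get ingress -A -o wide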

Kubernetes Ingress nginx on Minikube fails

minikube v1.13.0 on Ubuntu 18.04 with Kubernetes v1.19.0 on Docker 19.03.8, using helm/helmfile ("v3.3.4"). The Ubuntu VM runs on VMware Workstation on Windows 10, networking set to NAT, everything on my home Wi-Fi network.
I am trying to use ingress-backend stable/nginx-ingress 1.36.0. I have the nginx-ingress-1.36.0.tgz in the ingress/charts folder, and I have enabled the ingress addon with minikube addons enable ingress.
Before I enabled ingress on minikube, everything deployed successfully (no errors), but the service/LB stayed pending:
ClusterIP 10.101.41.156 <none> 8080/TCP
ingress-controller-nginx-ingress-controller LoadBalancer 10.98.157.222 <pending> 80:30050/TCP,443:32294/TCP
After I enabled ingress on minikube, I now get this connection refused error:
STDERR:
Error: UPGRADE FAILED: cannot patch "ingress-service" with kind Ingress:
Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.kube-system.svc:443/extensions/v1beta1/ingresses?timeout=30s":
dial tcp 10.105.131.220:443: connect: connection refused
COMBINED OUTPUT:
Error: UPGRADE FAILED: cannot patch "ingress-service" with kind Ingress:
Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.kube-system.svc:443/extensions/v1beta1/ingresses?timeout=30s":
dial tcp 10.105.131.220:443: connect: connection refused
I don't know what this IP 10.105.131.220 is - it looks like a private IP. It is not my minikube IP, my VM IP, or my laptop IP, and I can't ping it.
It all still deploys fine, but the LoadBalancer still shows pending.
Update
I had missed one of the steps from the documentation:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.30.0/deploy/static/mandatory.yaml
I stopped/deleted minikube and redid everything. Now the error is gone, but the LoadBalancer is still <pending>.
By default, local solutions like minikube do not provide a LoadBalancer. Cloud solutions like EKS, Google Cloud and Azure do it for you automatically by spinning up a separate LB in the background. That's why you see the Pending status.
Solutions:
use MetalLB on minikube
MetalLB hooks into your Kubernetes cluster, and provides a network load-balancer implementation. In short, it allows you to create Kubernetes services of type LoadBalancer in clusters that don’t run on a cloud provider, and thus cannot simply hook into paid products to provide load-balancers.
Installation:
kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.8.1/manifests/metallb.yaml
namespace/metallb-system created
podsecuritypolicy.policy/speaker created
serviceaccount/controller created
serviceaccount/speaker created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
role.rbac.authorization.k8s.io/config-watcher created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/config-watcher created
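Note that the manifest above only deploys MetalLB itself; with this version it still needs a ConfigMap with an address pool before it will hand out external IPs. A minimal layer-2 sketch, where the address range is a placeholder - pick a range that is actually free on your minikube network:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.99.200-192.168.99.210
EOF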
use minikube tunnel
Services of type LoadBalancer can be exposed via the minikube tunnel command. It must be run in a separate terminal window to keep the LoadBalancer running. Ctrl-C in the terminal can be used to terminate the process, at which time the network routes will be cleaned up.
minikube tunnel runs as a process, creating a network route on the host to the service CIDR of the cluster using the cluster's IP address as a gateway. The tunnel command exposes the external IP directly to any program running on the host operating system.
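A short usage sketch (the service name is the one from the output above):

# Run in a separate terminal and keep it running:
minikube tunnel

# In another terminal the EXTERNAL-IP column should move from <pending> to an address:
kubectl get svc ingress-controller-nginx-ingress-controller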

Kubernetes API server failing to start: TLS handshake error

Out of nowhere one of our API servers has started to fail with the following error:
http: TLS handshake error from 172.23.88.213:17244: EOF
It throws this error for every single node in the cluster, thus failing to start. This started happening this morning with no changes to any infrastructure.
Things I've tried that haven't helped:
Manually restarted the weave Docker container on the master node.
Manually killed and rescheduled the api-server.
Manually restarted the Docker daemon.
Manually restarted the kubelet service.
Checked that all SSL certs are valid, which they are.
Checked inodes - thousands free.
Pinged the IP addresses of other nodes in the cluster; all return OK with 0 packet loss.
Checked journalctl and systemctl logs of the kubelet services, and the only significant errors I see are related to the TLS handshake error.
Cluster specs:
Cloud provider: AWS
Kubernetes version: 1.11.6
Kubelet version: 1.11.6
Kops version: 1.11
I'm at a bit of a loss as to how to debug this further.
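One additional check is to run the handshake against the API server directly from a node and inspect the certificate it presents. A sketch only - the address and port are placeholders for whatever your kops cluster exposes (typically 443 on the API ELB or 6443 on the master):

openssl s_client -connect <api-server-address>:443 </dev/null | openssl x509 -noout -subject -dates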

Kubectl gives "Unable to connect to the server: dial tcp i/o timeout"

I have installed minikube and kubectl on Ubuntu 16.04 LTS.
However, when I try any command with kubectl it gives the error below:
Unable to connect to the server: dial tcp x.x.x.x:x i/o timeout
kubectl version only shows the client version; the server version is not displayed.
Is there any workaround to fix this?
I had to ensure the interface was up and running.
So a sudo ifconfig vboxnet0 up resolved the issue.
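A sketch of the checks behind that fix, assuming the minikube VM sits behind the VirtualBox host-only interface vboxnet0:

# Is the host-only interface up?
ip link show vboxnet0

# If it is DOWN, bring it up and retry:
sudo ifconfig vboxnet0 up
kubectl version
kubectl get nodes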

Change IP Cloudify Manager

I installed Cloudify Manager on the Amazon cloud (http://getcloudify.org/guide/3.2/getting-started-bootstrapping.html) successfully.
However, after turning the machine off and starting it again, the IP changed, and when running:
cfy status
I get:
Getting management services status... [ip=54.83.41.97]
('Connection aborted.', error(110, 'Connection timed out'))
How do I change the IP 54.83.41.97 within Cloudify?
The internal IP is set during bootstrap of the manager.
If your internal IP has changed, you should tear it down and bootstrap again.
If it is only the Elastic IP that changed, you can run:
cfy use -t your_new_ip
And then the CLI will connect to the manager with the new IP.
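A short usage sketch (the address is a placeholder for your new Elastic IP):

cfy use -t <new_elastic_ip>
cfy status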
