Unable to access GCP service after gke upgrade - networking

Automatic upgrade in mid-May.
1.14.10-gke.27 → 1.14.10-gke.36
https://cloud.google.com/kubernetes-engine/docs/release-notes?_ga=2.196679278.-315187236.1572486593#may_13_2020
After that, I got Memorystore(Redis)connection error and crul6 error.
crul error
cURL error 6: Could not resolve host: www.googleapis.com (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)
This problem happens occasionally, not always.
It worked fine before the upgrade.
Role-based access control not use
Workload Identity not use
Please advise.

Close it once.
Probably not a gke version issue.
The reason is that the node was a preemptible node.
That is likely.
When I ran the reboot test of the Pretentive Node, I was able to reproduce the error.
gcloud compute instances simulate-maintenance-event <instance name> --zone <zone>

Related

Maxscale: maxctrl error when admin_ssl parameters are set in maxscale.cnf

System:
Maxscale 2.5.9
Ubuntu 20.04
In order to access the Web AdminGUI my maxsclale.cnf file looks like this:
[maxscale]
threads=auto
admin_host=0.0.0.0
admin_secure_gui=1
admin_auth=1
admin_enabled=1
admin_gui=1
admin_ssl_key=/etc/ssl/certs/maxscale-key.pem
admin_ssl_cert=/etc/ssl/certs/maxscale-cert.pem
admin_ssl_ca_cert=/etc/ssl/certs/ca-certificates.crt
[...all other configuration..]
With this configuration I can access the Web-AdminGUI on port 8989 from the internal ip address (not 127.0.0.1) by browser.
The SSL key/certs are self-signed .
BUT
When using the command line like:
maxctrl list servers
I get the following error:
Error: Error: socket hang up
When I remove or comment out the lines with the admin_ssl_XXX parameters and restart maxscale, command line works again, but of course the Web-AdminGUI does not.
I tried with various SSL certificate creations (also the one that is listed on the mariadb.com-Website
https://mariadb.com/docs/security/encryption/in-transit/create-self-signed-certificates-keys-openssl/#create-self-signed-certificates-keys-openssl),
the issue remains.
No errors in the maxscale.log whatsoever.
What is the best way to debug this issue?
Or do you have by any chance the right answer at hand?
YOUR help is greatly appreciated!
BR. Martin
You should use maxctrl --secure to encrypt the connections used by it.
Since you are using self-signed certificates, you have to also specify the CA certificate with --tls-ca-cert=/etc/ssl/certs/ca-certificates.crt if it's not installed in the system certificate store.
In addition, you probably need to use --tls-verify-server-cert=false to disable any warnings about self-signed certificates.

openstack network create command gives "The Keystone service is temporarily unavailable" and "The server is currently unavailable"

Currently, we are trying to setup neutron for our cloud server. Since everyone is new to this, we are struggling a bit. When we entered this command:
openstack network create --share --external \
--provider-physical-network provider \
--provider-network-type flat provider
And it throws this error:
Error while executing command: HttpException: 503, The Keystone service is temporarily unavailable.: 503 Service Unavailable: The server is currently unavailable. Please try again at a later time.
We are following openstack docs guide to a T.
Does anyone know what causes this error and how to fix it?
Thanks.
I fixed it, it was a problem with MariaDB. When we updated it from 10.1 to 10.3, I couldn't access my DB, so when I fixed that error it works now.

Hiveserver2: could not start ThriftHttpCliService

I'm attempting to enable SSL on hiveserver2.
I can run in the default binary mode fine. http mode works no problem. As soon as I enable SSL through hive-site.xml, i'm faced with the following error.
ERROR [Thread-28] thrift.ThriftCLIService: Error starting HiveServer2: could not start ThriftHttpCLIService
java.net.BindException: Address already in use
There is nothing using any of the ports, prior to starting hiveserver2. Checked with netstat -tupln
Ports i've configured in hive-site.xml are
hive.server2.webui.port 11002
hive.server2.thrift.http.port 11001
hive.server2.thrift.port 11000
and invoking hiveserver2 via the service /opt/hive/bin/hive --service hiveserver2 &
O/S ubuntu (on kubernetes)
Hive version 3.0.0
Any help greatly appreciate. Google search for problems with ThriftHTTPCliService came up short.
For anyone that come across this post.
I upgraded to Hive 3.1.0, along with the metastore schema.
This fixed the issue, although unsure as to the underlying cause.

Gitlab ports 80 & 8080 taken by a separate Gitlab instance?

I have Gitlab 8.6 running on an Ubuntu 14.04 server that seems to have gotten messed up. I consistently get a 502 error when accessing the site. The server likely has not been restarted since installing Gitlab initially, and a power outage caused the server to reboot. Now, I cannot start/restart Gitlab due to what appears to be port conflicts.
I installed Gitlab via source, I don't have any custom port configurations, and am using NGINX. nginx -t shows that the configuration appears to be correct syntax-wise.
When I run netstat -tupln, I see that Unicorn & a Gitlab instance is already running on :8080 and :80 respectively at boot up. I suspect that a 2nd instance of Gitlab was installed which is being run at boot and that is causing the proper instance to have port conflicts when I try to run it via service gitlab restart. I'm not even sure if that's possible, but I can't seem to figure out where to go from here. Every time I run sudo gitlab-ctl reconfigure or service gitlab start, it fails and the unicorn.stderror.log shows bind errors to the :8080 port. I tried moving the Unicorn service to :8081 as well, but I still receive the port binding error.
Does anyone know how I can detect if there are multiple Gitlab instances running, and maybe if there is a way to remove a duplicated one if it's possible? Thank you!
EDIT: Here is what is in the /etc/gitlab/gitlab.rb file. Everything else is commented out.
## Url on which GitLab will be reachable
external_url 'http://my-gitlab-instance.domain.com'
EDIT 2: My /home/git/gitlab/ directory is mapped to https://gitlab.com/gitlab-org/gitlab-ce.git, and is on the 8-7-stable branch. gitlab-shell and gitlab-workhorse are on the correct versions according to https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/update/8.6-to-8.7.md
EDIT 3: I have gotten to a point where the Gitlab seems to self-check okay by removing the gitlab-ce package (https://gitlab.com/gitlab-org/omnibus-gitlab/issues/135), but the server returns a 404. NGINX, Unicorn, Sidekiq, and gitlab-workhorse all say that they're running. I see that unicorn.rb is listening on :8080, and nginx is listening on 0.0.0.0:80 and :::80. I guess now I'm troubleshooting this 404 and hopefully I will be back to my install-from-source.
What I have found is that there were 2 issues causing the errors I had.
First, I removed a "gitlab-ce" package that was installed, following the instructions here: https://gitlab.com/gitlab-org/omnibus-gitlab/issues/135. For some reason, when I restart the machine now I have to restart these services, in order, for Gitlab to run properly redis-server, gitlab, nginx. However, Gitlab does start responding properly after that.
Second, the 404 error was due to a different server that was also listening on that IP address, causing a conflict.
I will likely move to using the omnibus package on a fresh, new server going forward, but at least the immediate issues appear resolved. Thanks for your help, SLY!

Getting "connection refused" when trying to access etcd from within a Docker container

I am trying to access etcd from within a running Docker container. When I run
curl http://172.17.42.1:4001/v2/keys
I get
curl: (7) Failed to connect to 172.17.42.1 port 4001: Connection refused
I have four other hosts where this works fine, but every container on this machine has this problem. I'm really at a loss as to what's going on, and I don't know how to debug it.
My etcd environment variables are
ETCD_ADVERTISE_CLIENT_URLS=http://10.242.10.2:2379
ETCD_DISCOVERY=https://discovery.etcd.io/<token_removed>
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.242.10.2:2380
ETCD_LISTEN_CLIENT_URLS=http://10.242.10.2:2379,http://127.0.0.1:2379,http://0.0.0.0:4001
ETCD_LISTEN_PEER_URLS=http://10.242.10.2:2380
I can also access etcd from the host with
curl http://localhost:4001/v2/keys
So there seems to be some error when routing from the container out to the host. But I can't figure out what it is. Can anyone point me in the right direction?
I observed I had to use the --advertise-client-urls and --listen-client-urls. Like so:
./etcd --advertise-client-urls 'http://0.0.0.0:2379,http://0.0.0.0:4001' --listen-client-urls 'http://0.0.0.0:2379,http://0.0.0.0:4001'
Then I was able to successfully do
curl -L http://hostname:2379/version
from any machine that could reach that server and it worked.
It turns out etcd was only listening on localhost:4001 on that machine, which is why I couldn't access it from within a container. This is despite me configuring one of the listen client urls to 0.0.0.0:4001.
It turns out that I had run sudo systemctl enable etcd2, which caused it to run before the cloud-config service ran. As such, etcd started with default configuration instead of the one that I had specified in my cloud config. Running sudo systemctl disable etcd2 fixed the issue.

Resources