unable to verify hash for node 'raspberrypi': hash does not match - raspberry-pi4

On a cluster of two RPi 4s I see this while the worker node is being set up. The master seems to be fine. I have only two RPis connected at the moment.
I think the network switch is fine, but if this is a networking issue I may be missing something.
I run: ansible-playbook site.yml -i inventory/rpi/hosts.ini
k3s_version: v1.23.4+k3s1
This error keeps repeating. But then I see this:
kubectl get nodes --kubeconfig ~/.kube/config-berry-pi
NAME          STATUS   ROLES                  AGE     VERSION
raspberrypi   Ready    control-plane,master   4m41s   v1.23.4+k3s1
This is from the journal on the master Pi.
Apr 01 11:58:14 raspberrypi k3s[13742]: time="2022-04-01T11:58:14+01:00" level=error msg="unable to verify hash for node 'raspberrypi': hash does not match"
Apr 01 11:58:15 raspberrypi k3s[13742]: I0401 11:58:15.180204 13742 request.go:665] Waited for 1.053982859s due to client-side throttling, not priority and fairness, request: POST:https://127.0.0.1:6443/api/v1/namespaces/kube-system/serviceaccounts/coredns/token
Apr 01 11:58:19 raspberrypi k3s[13742]: time="2022-04-01T11:58:19+01:00" level=error msg="unable to verify hash for node 'raspberrypi': hash does not match"
Apr 01 11:58:24 raspberrypi k3s[13742]: time="2022-04-01T11:58:24+01:00" level=error msg="unable to verify hash for node 'raspberrypi': hash does not match"
Apr 01 11:58:29 raspberrypi k3s[13742]: time="2022-04-01T11:58:29+01:00" level=error msg="unable to verify hash for node 'raspberrypi': hash does not match"
Apr 01 11:58:34 raspberrypi k3s[13742]: time="2022-04-01T11:58:34+01:00" level=error msg="unable to verify hash for node 'raspberrypi': hash does not match"

I was able to proceed by executing the following on each worker node.
sudo curl -sfL https://get.k3s.io | K3S_TOKEN="K10e9d200a500ad44ba0072af9ea9f38d19a40cfc4cfd96d753d01466c618007f8e::server:bb58186ddf5f34800004affa48c44a12" K3S_URL="https://192.168.1.29:6443" K3S_NODE_NAME="rpiworker1" sh -
This token is obtained from the k3s server.
sudo cat /var/lib/rancher/k3s/server/token
K10e9d200a500ad44ba0072af9ea9f38d19a40cfc4cfd96d753d01466c618007f8e::server:bb58186ddf5f34800004affa48c44a12
The Ansible playbook installed the master, but it seemed to get stuck after that. I still do not understand the reason for that error in the log, though.
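For what it's worth, once the agent install finishes on a worker you can confirm the join; a minimal sketch (k3s-agent is the service the install script creates when K3S_URL is set, and rpiworker1 is the node name passed above):
# on the worker: check the agent service and its journal
sudo systemctl status k3s-agent
sudo journalctl -u k3s-agent --no-pager | tail
# on the machine holding the kubeconfig: the worker should appear next to the master
kubectl get nodes --kubeconfig ~/.kube/config-berry-pi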

Related

sshd connection issue: Connection reset by [ip] port x [preauth]

I'm seeing the following error messages when trying to sftp from a Windows client to my Red Hat server:
Client:
C:\Users\Administrator\.ssh>sftp -P 7822 -v user@x.x.x.x
.
.
debug1: kex_input_ext_info: server-sig-algs=<rsa-sha2-256,rsa-sha2-512>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Next authentication method: publickey
debug1: Offering public key: RSA SHA256:FczboY8BDSWtdA87euFDWSDrwBNRMbYzHUR3VmMpbk C:\\Users\\Administrator/.ssh/id_rsa
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic
debug1: Trying private key: C:\\Users\\Administrator/.ssh/id_dsa
debug1: Trying private key: C:\\Users\\Administrator/.ssh/id_ecdsa
debug1: Trying private key: C:\\Users\\Administrator/.ssh/id_ed25519
debug1: Trying private key: C:\\Users\\Administrator/.ssh/id_xmss
debug1: No more authentication methods to try.
user@x.x.x.x Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Server:
Aug 4 23:27:09 3oy1jxwr1k81l.xxx.io sshd[16064]: Connection reset by x.x.x.x port 65256 [preauth]
Aug 4 23:27:14 3oy1jxwr1k81l.xxx.io sshd[16117]: Did not receive identification string from x.x.x.x port 12593
Aug 4 23:27:24 3oy1jxwr1k81l.xxx.io sshd[16259]: Did not receive identification string from x.x.x.x port 48329
Aug 4 23:27:34 3oy1jxwr1k81l.xxx.io sshd[16394]: Did not receive identification string from x.x.x.x port 2040
I'm positive that all the required ports are open in the firewall and that authorized_keys is set up correctly.
So I stopped the sshd service and ran it from the command line with -ddd, hoping to get more information.
However, when running in debug mode, the connection succeeds!?
/usr/sbin/sshd -D -ddd
Client:
C:\Users\Administrator\.ssh>sftp -P 7822 user@x.x.x.x
Connected to user@x.x.x.x.
sftp> exit
Any ideas what could be happening? (Note: this is 100% reproducible; it fails every time when sshd is run normally and always succeeds when run with -ddd.)
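As an aside, stopping the main daemon isn't strictly necessary to reproduce the debug behaviour; a minimal sketch, assuming a spare port such as 7823 is free and reachable, runs a second debug-mode instance side by side:
# one-off sshd in the foreground with full debug output on a spare port
/usr/sbin/sshd -D -ddd -p 7823
# then, from the client:
sftp -P 7823 -v user@x.x.x.x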
So it looks like the problem was due to a missing .bash_profile in the user's home directory on the server.
After adding the profile back, the issue was resolved.
Why sshd didn't care that it was missing when run in debug mode seems like a bug in sshd.
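If anyone hits the same thing, a minimal sketch of restoring the missing profile on a Red Hat-style server (assuming the affected account is named 'user'):
# copy the default profile back into the user's home and fix ownership/permissions
sudo cp /etc/skel/.bash_profile /home/user/
sudo chown user:user /home/user/.bash_profile
sudo chmod 644 /home/user/.bash_profile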
I was also getting the Connection reset by [ip] port x [preauth] message.
For me, however, it was a firewall issue on the client side. The IT department had blocked SSH outside the network. After updating the firewall, the connection worked.

Error "ldap_sasl_bind_s failed" on n-way multi-master openldap

I am trying to connect OpenLDAP nodes in a cluster, but I receive the
following message (the password has been updated on all the OpenLDAP nodes).
Which password is failing, and how can I force it to be updated?
Feb 25 18:57:01 ldap03 slapd[9556]: slapd starting
Feb 25 18:57:01 ldap03 slapd[9556]: slap_client_connect: URI=ldap://ldap01 DN="cn=admin,dc=clients,dc=enterprise,dc=com" ldap_sasl_bind_s failed (-1)
Feb 25 18:57:01 ldap03 slapd[9556]: do_syncrepl: rid=001 rc -1 retrying (4 retries left)
Thanks in advance.
I met the same issue...
625cf83c slapd starting
625cf83c slap_client_connect: URI=ldaps://ldap.example.com:636 DN="cn=admin,dc=example,dc=com" ldap_sasl_bind_s failed (-1)
625cf83c do_syncrepl: rid=123 rc -1 retrying
But in my case the issue was on the transport layer: the OpenLDAP server was built without SSL support. Reinstalling the OpenLDAP server with SSL support solved my issue.
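For anyone wanting to rule out the same cause, a rough way to check whether slapd was built with TLS support and whether the ldaps endpoint answers; the hostname and DNs are just the example values from the log above:
# check whether the slapd binary links against an SSL/TLS library (path may differ per distro)
ldd /usr/sbin/slapd | grep -Ei 'ssl|tls'
# try a simple bind over ldaps from the replica that is failing
ldapsearch -H ldaps://ldap.example.com:636 -x -D "cn=admin,dc=example,dc=com" -W -b "dc=example,dc=com" -s base dn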

uWSGI, Nginx, Flask app service keeps failing

Going to my app produces a 502 Bad Gateway error. I found out that it was because my how_lit.service is failing, but I am having trouble finding out why.
I tried editing the application and the ini file, but cannot figure out what's wrong.
The Nginx and uWSGI services are up and running fine.
Service Status:
lit@digitalocean:~/howlit$ sudo service how_lit status
[sudo] password for lit:
● how_lit.service - uWSGI instance to serve how lit rest api
Loaded: loaded (/etc/systemd/system/how_lit.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2016-08-04 00:30:44 EDT; 5 days ago
Process: 14294 ExecStart=/home/lit/howlit/env/bin/uwsgi --ini /home/lit/howlit/howlit.ini (code=exited, status=1/FAILURE)
Main PID: 14294 (code=exited, status=1/FAILURE)
Aug 04 00:30:44 digitalocean systemd[1]: Started uWSGI instance to serve how lit rest api.
Aug 04 00:30:44 digitalocean uwsgi[14294]: [uWSGI] getting INI configuration from /home/lit/howlit/howlit.ini
Aug 04 00:30:44 digitalocean systemd[1]: how_lit.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 00:30:44 digitalocean systemd[1]: how_lit.service: Unit entered failed state.
Aug 04 00:30:44 digitalocean systemd[1]: how_lit.service: Failed with result 'exit-code'.
Directory and Permissions:
lit@digitalocean:~/howlit$ ls -l .
total 16
drwx---r-x 6 lit www-data 4096 Jul 29 11:47 env
-rwx---r-x 1 lit www-data 202 Aug 3 23:29 howlit.ini
-rwx---r-x 1 lit www-data 1203 Aug 3 23:01 how_lit_restapi.py
-rwxr-xr-x 1 lit www-data 72 Aug 3 23:27 wsgi.py
/etc/systemd/system/how_lit.service:
lit@digitalocean:~/howlit$ cat /etc/systemd/system/how_lit.service
[Unit]
Description=uWSGI instance to serve how lit rest api
After=network.target
[Service]
User=lit
Group=www-data
WorkingDirectory=/home/lit/howlit/
Environment="PATH=/home/lit/howlit/env/bin"
ExecStart=/home/lit/howlit/env/bin/uwsgi --ini /home/lit/howlit/howlit.ini
[Install]
WantedBy=multi-user.target
howlit.ini file:
lit@digitalocean:~/howlit$ cat howlit.ini
[uwsgi]
module = wsgi:app
uid = lit
gid = www-data
master = true
processes = 5
socket = how_lit_restapi.sock
chmod-socket = 666
vacuum = true
die-on-term = true
logto = /var/log/uwsgi/%n.log
I tried running it by hand:
lit@digitalocean:~/howlit$ /home/lit/howlit/env/bin/uwsgi --ini /home/lit/howlit/howlit.ini
[uWSGI] getting INI configuration from /home/lit/howlit/howlit.ini
*** Starting uWSGI 2.0.13.1 (64bit) on [Tue Aug 9 18:28:25 2016] ***
compiled with version: 5.4.0 20160609 on 29 July 2016 11:48:08
os: Linux-4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016
nodename: digitalocean
machine: x86_64
clock source: unix
detected number of CPU cores: 1
current working directory: /home/lit/howlit
detected binary path: /home/lit/howlit/env/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
your processes number limit is 1896
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
bind(): Permission denied [core/socket.c line 230]
Permission error again?
SOLVED IT: by moving my socket into /tmp. But I'm still getting the bad gateway error when I navigate to my site :(
Solved my own problem.
First I checked my services.
sudo service nginx status
sudo service uwsgi status
sudo service how_lit status
Then I saw them all up and running, but I was still getting the bad gateway error. After checking, the logs showed no errors, so I had to assume it was my configs.
Then I realized my mistake... I never restarted all of it, just certain parts at certain times. So I restarted every single one:
sudo service nginx restart
sudo service uwsgi restart
sudo service how_lit restart
Now it works.
About the permission issue: I worked around it by putting the socket into the /tmp directory, so that users in the www-data group can access it as well as root. I learned that the process needs to be able to create the socket and that the web server needs access to it.
By the way, I later moved it out of /tmp for production, as I was told that was not best practice.
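For reference, a minimal sketch of a more permanent socket location than /tmp; the /run/howlit path is a placeholder, not from the original setup. Give the service user and the www-data group a shared runtime directory:
sudo mkdir -p /run/howlit
sudo chown lit:www-data /run/howlit
sudo chmod 775 /run/howlit
Then point the ini at it:
socket = /run/howlit/how_lit_restapi.sock
chmod-socket = 660
And have Nginx pass to the same path:
location / {
    include uwsgi_params;
    uwsgi_pass unix:/run/howlit/how_lit_restapi.sock;
}
Since /run is cleared on reboot, adding RuntimeDirectory=howlit to the [Service] section of the unit file would recreate the directory automatically.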

Solr closes connection to Zookeeper

I have two servers: server one running Apache ZooKeeper and server two running Solr.
When starting ZooKeeper I can connect to it on server one (through bin/zkCli.sh), but not from server two with Solr.
ZooKeeper is started through supervisor, but I have also tried starting it through bin/zkServer.sh without improvement.
When looking in the Tomcat log (which Solr logs to) I get:
WARNING: Overseer cannot talk to ZK
Jun 04, 2013 3:26:52 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader
WARNING:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:253)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:250)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:250)
at org.apache.solr.cloud.Overseer$ClusterStateUpdater.amILeader(Overseer.java:199)
at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:122)
at java.lang.Thread.run(Thread.java:722)
...
Jun 04, 2013 3:31:04 PM org.apache.zookeeper.ClientCnxn$SendThread logStartConnect
INFO: Opening socket connection to server XXX.XXX.XXX.XXX/XXX.XXX.XXX.XXX:2181. Will not attempt to authenticate using SASL (unknown error)
Jun 04, 2013 3:31:04 PM org.apache.zookeeper.ClientCnxn$SendThread run
INFO: Client session timed out, have not heard from server in 46974ms for sessionid 0x13f0f5a570c0006, closing socket connection and attempting reconnect
Jun 04, 2013 3:31:05 PM org.apache.zookeeper.ClientCnxn$SendThread logStartConnect
INFO: Opening socket connection to server XXX.XXX.XXX.XXXXXX.XXX.XXX.XXX.75:2181. Will not attempt to authenticate using SASL (unknown error)
Jun 04, 2013 3:32:01 PM org.apache.zookeeper.ClientCnxn$SendThread run
INFO: Client session timed out, have not heard from server in 56627ms for sessionid 0x13f0f5a570c0006, closing socket connection and attempting reconnect
How do I set up ZooKeeper so that it can be accessed by Solr on server two?
Additional info: Using netstat -l on server one, I get the following:
tcp6 0 0 [::]:2181 [::]:* LISTEN
I.e. it is only listening on tcp6, not tcp.
Check your firewall configuration on the ZooKeeper server and ensure ports 2181, 2888, and 3888 are all open. 2181 is the client communication port; 2888 and 3888 are used for ZooKeeper cluster communication (in case you decide to run ZooKeeper in an ensemble).
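A rough sketch of verifying this, assuming the ZooKeeper host uses firewalld (the commands differ for plain iptables), plus a quick reachability test from the Solr server:
# on the ZooKeeper host: open the client and ensemble ports
sudo firewall-cmd --permanent --add-port=2181/tcp --add-port=2888/tcp --add-port=3888/tcp
sudo firewall-cmd --reload
# from the Solr host: ZooKeeper should answer the "ruok" four-letter command with "imok"
echo ruok | nc <zookeeper-host> 2181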

Cannot start Plone production instances normally with plone.app.async enabled

After adding plone.app.async, I cannot start my production instances normally using 'bin/instance start'. However, the instances run fine using 'foreground', and I can start the production instances on my development machine just fine. (The machines have almost identical configurations, but the production machine has almost 100GB of data in blob storage.)
Additionally, I can start the instances normally if I remove support for plone.app.async, specifically the zcml-additional section, from my buildout. And I can start the worker instance for plone.app.async just fine. It uses almost all the same sections as the regular instances, except that 'zcml-additional' is for the worker instead of the instance.
This happens with both single and multi db for plone.app.async.
The instance log shows that it gets trapped in some sort of cycle during startup. Here is the log of what happens:
....
2012-02-09T18:31:27 INFO ZServer HTTP server started at Thu Feb 9 18:31:27 2012
Hostname: 0.0.0.0
Port: 8081
2012-02-09T18:31:32 INFO ZServer WebDAV server started at Thu Feb 9 18:31:32 2012
Hostname: 0.0.0.0
Port: 1980
2012-02-09T18:31:32 INFO Zope Set effective user to "plone"
2012-02-09T18:31:34 INFO ZEO.ClientStorage zeostorage ClientStorage (pid=16331) created RW/normal for storage: '1'
2012-02-09T18:31:34 INFO ZEO.cache created temporary cache file '<fdopen>'
2012-02-09T18:31:34 INFO ZEO.ClientStorage zeostorage Testing connection <ManagedClientConnection ('127.0.0.1', 8100)>
2012-02-09T18:31:34 INFO ZEO.zrpc.Connection(C) (127.0.0.1:8100) received handshake 'Z3101'
2012-02-09T18:31:34 INFO ZEO.ClientStorage zeostorage Server authentication protocol None
2012-02-09T18:31:34 INFO ZEO.ClientStorage zeostorage Connected to storage: ('localhost', 8100)
2012-02-09T18:31:34 INFO ZEO.ClientStorage zeostorage No verification necessary -- empty cache
2012-02-09T18:31:45 INFO ZServer HTTP server started at Thu Feb 9 18:31:45 2012
Hostname: 0.0.0.0
Port: 8081
2012-02-09T18:31:50 INFO ZServer WebDAV server started at Thu Feb 9 18:31:50 2012
Hostname: 0.0.0.0
Port: 1980
....
This repeats forever.
With a logging level of debug, I receive the following output: http://pastebin.com/nnyekuRA
Around line 58 is what I think is the culprit:
2012-02-09T17:18:22 DEBUG ZEO.ClientStorage pickled inval None '\x03\x94X\x8a\xa8\xe9\xf6\xee'
------
2012-02-09T17:18:22 BLATHER ZEO.zrpc (15892) CM.connect_done(preferred=1)
------
2012-02-09T17:18:22 BLATHER ZEO.zrpc (15892) CT: exiting thread: Connect([(2, ('127.0.0.1', 8100))])
But I have no idea why this is happening or even if this is correct.
Here is the buildout for deployment:
http://pastebin.com/u8D7swJs
The permissions were set incorrectly on the Plone 'parts' directory. This prevented 'uuid.txt' from being written in 'parts/instance/'. There were no error messages to indicate this problem.
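For anyone debugging the same symptom, a minimal sketch of what to check, assuming the buildout lives in /opt/plone and the daemon runs as the plone user (both hypothetical names):
# the effective user must be able to write parts/instance/uuid.txt
ls -ld /opt/plone/parts /opt/plone/parts/instance
sudo chown -R plone /opt/plone/parts/instance
# quick writability test, then try a normal start again
sudo -u plone touch /opt/plone/parts/instance/.writetest && sudo rm /opt/plone/parts/instance/.writetest
bin/instance start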

Resources