Hazelcast tcp-ip config REST API is not enabled. Unknown protocol: I^#^# - tcp

The scenario is the same as in "Hazelcast tcp-ip configuration cluster: Unwanted IPs join the cluster even after cluster-name is specified".
After I started two nodes in cluster1, everything ran fine. However, when I started one node from cluster2, the following errors appeared after a while. In the log, I have masked the IPs as "machineC_IP" and "transformed_IP". Note that I have enabled REST as follows:
hazelcast:
  rest:
    enabled: true
log:
com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz.thirsty_brahmagupta.IO.thread-in-1] [machineC_IP]:5702 [dev] [4.0.1] Connection[id=5, /machineC_IP:5702->/transformed_IP:51320, qualifier=null, endpoint=null, alive=false, connectionType=NONE] closed. Reason: Exception in Connection[id=5, /machineC_IP:5702->/transformed_IP:51320, qualifier=null, endpoint=null, alive=true, connectionType=NONE], thread=hz.thirsty_brahmagupta.IO.thread-in-1
java.lang.IllegalStateException: REST API is not enabled.
at com.hazelcast.internal.nio.tcp.UnifiedProtocolDecoder.onRead(UnifiedProtocolDecoder.java:105)
at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:137)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:382)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:367)
at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:293)
at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:248)
com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz.suspicious_brahmagupta.IO.thread-in-2] [machineC_IP]:5701 [cluster2] [4.0.1] Connection[id=18, /machineC_IP:5701->/transformed_IP:46468, qualifier=null, endpoint=null, alive=false, connectionType=NONE] closed. Reason: Exception in Connection[id=18, /machineC_IP:5701->/transformed_IP:46468, qualifier=null, endpoint=null, alive=true, connectionType=NONE], thread=hz.suspicious_brahmagupta.IO.thread-in-2
java.lang.IllegalStateException: Unknown protocol: I^#^#
at com.hazelcast.internal.nio.tcp.UnifiedProtocolDecoder.onRead(UnifiedProtocolDecoder.java:116)
at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:137)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:382)
at com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:367)
at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:293)
at com.hazelcast.internal.networking.nio.NioThread.run(NioThread.java:248)

Ansible Hetzner Cloud - Create a server in private network

I am using Ansible to create a server in the Hetzner Cloud, the playbook reads:
- name: create the server at Hetzner
  hetzner.hcloud.hcloud_server:
    name: "{{server_hostname}}"
    enable_ipv4: false
    enable_ipv6: false
    server_type: cx11
    location: "{{server_location}}"
    image: ubuntu-22.04
    ssh_keys:
      - "mykey"
    state: present
    api_token: "{{hetzner_secret}}"
    private_networks: ipfire
  register: server
My aim is to integrate the new server into the private network named 'ipfire' that I have previously created. The server should not be accessible via the internet, so I have disabled ipv4 and ipv6. Rather, I'd like to access the server by connecting via OpenVPN to the private network 'ipfire' and connect by use of ssh from there.
Unfortunately, I get an error message as follows:
PLAY [Order servers] ********************************************************************************************************
TASK [hetznerserver : create the server at Hetzner] *************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unsupported parameters for (hetzner.hcloud.hcloud_server) module: private_networks. Supported parameters include: rebuild_protection, api_token, location, enable_ipv6, upgrade_disk, ipv4, endpoint, ipv6, firewalls, server_type, state, force, labels, ssh_keys, delete_protection, image, id, name, enable_ipv4, placement_group, force_upgrade, user_data, datacenter, rescue_mode, allow_deprecated_image, volumes, backups."}
PLAY RECAP ******************************************************************************************************************
localhost : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
The parameter private_networks does not seem to work like this?
Error messages of the form Unsupported parameters for (<moduleName>) module: <givenParameter>. Supported parameters include: <supportedParametersList> usually mean that the task passes a parameter the installed module does not know, often because of a typo or a wrong parameter name.
The first step is therefore to look up the respective documentation, in this case hcloud_server module – Create and manage cloud servers on the Hetzner Cloud.
If the documentation shows that the parameter in question is available, this indicates
either a version mismatch, meaning the installed version of the module is too old and an update is necessary,
or a bug in the module, in which case further debugging and investigation of the module code is necessary.
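To compare the rejected parameter against the list the installed module accepts, the message can be split into its parts. The following is only a minimal sketch in Python: the helper name parse_unsupported is hypothetical, and it assumes the single-line message layout shown in the error above.

```python
import re

# Layout assumed from the error text quoted in the question.
_PATTERN = re.compile(
    r"Unsupported parameters for \((?P<module>[^)]+)\) module: "
    r"(?P<unsupported>.+?)\. "
    r"Supported parameters include: (?P<supported>.+?)\.?$"
)


def _split_names(text):
    """Turn 'a, b, c' into ['a', 'b', 'c']."""
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_unsupported(msg):
    """Return module name, rejected parameters and accepted parameters.

    Returns None when the message does not have the expected layout.
    """
    match = _PATTERN.search(msg.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "unsupported": _split_names(match.group("unsupported")),
        "supported": _split_names(match.group("supported")),
    }
```

If a parameter that the documentation lists is missing from the "supported" list, the installed module is most likely older than the documentation being read.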
Code and Documentation Links
Community Authors> hetzner> hcloud
ansible-collections / hetzner.hcloud
Further investigation may show that the parameter in question was introduced only recently, for example
GitHub hetzner.hcloud Issue #150 "Unable to create cloud server without public ipv4 and ipv6"
GitHub hetzner.hcloud Pull #160 "Add possibility to specify private network when creating or updating servers"
In your case this means you need to update the Ansible collection, since the parameter is not available in the version you are using; it was introduced in v1.9.0.
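Whether an installed collection is older than the release that added the parameter can be checked with a simple version comparison. This is a sketch under assumptions: the function names are hypothetical, the version strings are plain dotted numbers (no pre-release suffixes), and the installed version is read manually, for instance from the output of ansible-galaxy collection list.

```python
def parse_version(text):
    """Convert '1.9.0' or 'v1.9.0' into a comparable tuple (1, 9, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def needs_upgrade(installed, minimum="1.9.0"):
    """Return True when the installed version predates the minimum.

    The default minimum is the release named in the answer above.
    """
    return parse_version(installed) < parse_version(minimum)
```

Comparing tuples of integers instead of raw strings avoids the pitfall that "1.10.0" sorts before "1.9.0" as text.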

Kubernetes GKE Error dialing backend: EOF on random exec command

On GKE we are experiencing random errors with the API.
For a long time now we have been seeing "Error dialing backend: EOF".
We use Jenkins on top of K8s to manage our builds, and every so often a job is killed with this error:
Executing shell script inside container [protobuf] of pod [kubernetes-bad0aa993add416e80bdc1e66d1b30fc-536045ac8bbe]
java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
at com.squareup.okhttp.ws.WebSocketCall.createWebSocket(WebSocketCall.java:123)
at com.squareup.okhttp.ws.WebSocketCall.access$000(WebSocketCall.java:40)
at com.squareup.okhttp.ws.WebSocketCall$1.onResponse(WebSocketCall.java:98)
at com.squareup.okhttp.Call$AsyncCall.execute(Call.java:177)
at com.squareup.okhttp.internal.NamedRunnable.run(NamedRunnable.java:33)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This case looks a lot like: https://gitlab.com/gitlab-org/gitlab-runner/issues/3247
Many audit log entries look like this:
permission: "io.k8s.core.v1.pods.exec.create"
resource: "core/v1/namespaces/default/pods/pubsub-6132c0bc-2542-46a2-8041-c865f238698d-4ccc0-c1nkz-lqg5x/exec/pubsub-6132c0bc-2542-46a2-8041-c865f238698d-4ccc0-c1nkz-lqg5x"
and
permission: "io.k8s.core.v1.pods.exec.get"
resource: "core/v1/namespaces/default/pods/pubsub-a5a21f14-0bd1-4338-87b1-8658c3bbc7ad-9gm4n-8nz14/exec"
But I don't understand why this error occurs on Kubernetes...
Update:
These errors can be confirmed with two metrics from kube-state-metrics:
- ssh_tunnel_open_count
- ssh_tunnel_open_fail_count
In my case, the number of failed SSH tunnel opens grows once more than 200 SSH tunnels are open.
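To watch how the two counters relate over time, they can be read from a metrics dump and compared. The snippet below is only a sketch in Python: the function names are hypothetical, and it assumes each metric appears as a plain "name value" line without labels or timestamps.

```python
def parse_counters(metrics_text, names):
    """Collect the values of the named counters from 'name value' lines."""
    values = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, _, value = line.rpartition(" ")
        if name in names:
            values[name] = float(value)
    return values


def tunnel_failure_ratio(values):
    """Failed tunnel opens per opened tunnel; 0.0 when nothing was opened."""
    opened = values.get("ssh_tunnel_open_count", 0.0)
    failed = values.get("ssh_tunnel_open_fail_count", 0.0)
    return failed / opened if opened else 0.0
```

Sampling this ratio before and after a node-pool scaling event is one way to see whether the failures correlate with autoscaling.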
For information, we ran some tests with GKE:
- switching from a zonal to a regional cluster
- using the new native IP (formerly alias IP)
But this did not solve the problem.
After disabling auto-scaling on the node pool, we no longer see the error.
I could fix this issue by deactivating the auto-scaling profile optimize-utilization, that is, by resetting the profile back to the default, balanced. optimize-utilization is in beta status anyway.

Getting error while redeploying nodes

I am still using Corda 1.0. When I try to redeploy nodes with existing data, I get the errors below during start-up, although I am still able to access the nodes. If I clear the data and redeploy the nodes, I don't see these error messages.
Logs can be found in : C:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\kotlin-source\build\nodes\xxxxxxxx\logs
Database connection url is : jdbc:h2:tcp://xxxxxxxxx/node
E 18:38:46+0530 [main] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
Incoming connection address : xxxxxxxxxxxx
Listening on port : 10014
RPC service listening on port : 10015
Loaded CorDapps : corda-finance-1.0.0, kotlin-source-0.1, corda-core-1.0.0
Node for "xxxxxxxxxxx" started up and registered in 213.08 sec
Welcome to the Corda interactive shell.
Useful commands include 'help' to see what is available, and 'bye' to shut down the node.
Wed May 23 18:39:20 IST 2018>>> E 18:39:24+0530 [Thread-6 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3#4a532271)] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
It looks like Artemis failed to connect to the node, which would mean the node failed to start.
Check the log, and check whether a previously started Corda node is still running and occupying the same ports.
If legacy Corda nodes have not been killed, try ps -ef |grep java to see whether any other Java process is still alive. In particular, look at the port numbers and check whether they overlap.
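The port check suggested above can also be done without ps, which helps on Windows. The following is a minimal Python sketch with hypothetical helper names; it assumes the node listens on the local machine and uses the two ports printed in the question's log (10014 and 10015) purely as an example.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports, host="127.0.0.1"):
    """Return the subset of ports that already have a listener."""
    return [port for port in ports if port_in_use(port, host)]


if __name__ == "__main__":
    # Ports taken from the start-up log in the question.
    for port in busy_ports([10014, 10015]):
        print(f"port {port} is already in use")
```

Running it before starting the node shows whether an older process still holds one of the ports.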

openstack: Failed to launch instance from the glance

We have set up OpenStack using conjure-up on a single machine (Ubuntu Server 16.04.3 LTS). All services are up and running, and I am able to upload images to glance successfully.
We wanted to store the glance images created by "glance image-create" on a remote machine that runs an NFS server, so we configured the glance-api.conf file as below.
My glance-api.conf looks like this:
[glance_store]
filesystem_store_datadir = /var/lib/glance/images
default_store = file
On the glance controller node, I have mounted the remote directory (remote machine Ip/home/glance/images/) at /var/lib/glance/images, and specified that same mount path in the glance-api.conf file.
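To confirm that the configured directory really is the mounted share, the path can be read from the configuration and inspected. This is only a sketch in Python with hypothetical function names; it assumes glance-api.conf parses as a standard INI file and that the check runs on the glance controller node.

```python
import configparser
import os


def glance_datadir(conf_text):
    """Read filesystem_store_datadir from the [glance_store] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get("glance_store", "filesystem_store_datadir")


def describe_datadir(path):
    """Report whether the path exists, is a mount point, and is writable."""
    return {
        "exists": os.path.isdir(path),
        "is_mount": os.path.ismount(path),
        "writable": os.access(path, os.W_OK),
    }
```

If "is_mount" is False, the directory is a plain local directory and the NFS share is not mounted there. The "writable" result reflects the user running the script, so it is most meaningful when run as the user the glance service runs as.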
I have created two sample private networks (192.168.1.0 and 10.221.50.0) but have not created a public network, as at this moment I don't want to access the VM instance from outside.
When I try to launch the instance from the dashboard UI or through the CLI, I get the error below.
Error: Failed to perform requested operation on instance "Ubuntu_Hawkbit", the instance has an error status: Please try again later [Error: No valid host was found. There are not enough hosts available.].
Note: I have tried associating the instance with a different private network, thinking it might be a network IP address issue, but I get the same error.
When I check /var/log/nova/nova-compute.log, I see the errors below.
ERROR nova.image.glance [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] Error contacting glance server 'http://10.206.193.159:9292' for 'data', done trying.
ERROR nova.image.glance CommunicationError: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova_lxd.nova.virt.lxd.image [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to upload 6c30e2ab-1078-45ad-bed2-3e3a75f6af8c to LXD: Connection to glance
ERROR nova_lxd.nova.virt.lxd.operations [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Faild to start container instance-00000020: Connection to glance host http://10.206.193.159:9292 failed: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova.compute.manager [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Instance failed to spawn

Spark & SparkR Configuration on EC2 - Java Timeout

I'm trying to get Spark, and SparkR, running on a small EC2 cluster using the provided scripts and directions. Whenever I ask for an operation that would require computation on an RDD (e.g., collect(), reduce()), I get the error logged below. Workers do appear to startup correctly -- if I only parallelize, I see the workers running via the master's web ui.
The error I get is similar to the one in "Intermittent Timeout Exception using Spark", and I've been through all of the solutions there (modifying the conf file for URLs, disabling the firewall, etc.), with no luck.
Here is the error log, thank you in advance for your help:
15/02/17 19:10:22 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/02/17 19:10:22 INFO spark.SecurityManager: Changing view acls to: root,-
15/02/17 19:10:22 INFO spark.SecurityManager: Changing modify acls to: root,-
15/02/17 19:10:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, -); users with modify permissions: Set(root, -)
15/02/17 19:10:23 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/17 19:10:23 INFO Remoting: Starting remoting
15/02/17 19:10:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher#-.ec2.internal:60218]
15/02/17 19:10:23 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 60218.
15/02/17 19:10:53 ERROR security.UserGroupInformation: PriviledgedActionException as:- cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:161)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
... 7 more
This was ultimately resolved by a combination of:
- Updates to SparkR, which resolved a number of serialization issues.
- Recognizing that the spark-ec2 scripts require the control node and the master node to be the same machine.
- Replacing calls to parallelize() with distributing the data and then loading it through Hadoop.
I am writing an intro to SparkR for R programmers that I hope will help people with things like this in the future.
