nerdctl DNS timeout on Windows - networking

We just installed Rancher Desktop 1.4.1 (nerdctl v0.20.0) on Windows 10, and we seem to have a problem pulling images and logging in to a registry:
nerdctl pull alpine
docker.io/library/alpine:latest: resolving |--------------------------------------|
elapsed: 9.9 s total: 0.0 B (0.0 B/s)
INFO[0010] trying next host error="failed to do request: Head \"https://registry-1.docker.io/v2/library/alpine/manifests/latest\": dial tcp: lookup registry-1.docker.io on 192.168.167.172:53: read udp 192.168.167.172:47744->192.168.167.172:53: i/o timeout" host=registry-1.docker.io
FATA[0010] failed to resolve reference "docker.io/library/alpine:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/alpine/manifests/latest": dial tcp: lookup registry-1.docker.io on 192.168.167.172:53: read udp 192.168.167.172:47744->192.168.167.172:53: i/o timeout
Trying to log in results in similar errors:
nerdctl --debug-full login registry-1.docker.io
/usr/local/bin/docker-credential-rancher-desktop: source: line 5: can't open '/etc/rancher/desktop/credfwd': No such file or directory
Enter Username: myusername
Enter Password:
DEBU[0030] Ignoring hosts dir "/etc/containerd/certs.d" error="stat /etc/containerd/certs.d: no such file or directory"
DEBU[0030] Ignoring hosts dir "/etc/docker/certs.d" error="stat /etc/docker/certs.d: no such file or directory"
DEBU[0030] len(regHosts)=1
ERRO[0040] failed to call tryLoginWithRegHost error="failed to call rh.Client.Do: Get \"https://registry-1.docker.io/v2/\": dial tcp: lookup registry-1.docker.io on 192.168.167.172:53: read udp 192.168.167.172:36590->192.168.167.172:53: i/o timeout" i=0
FATA[0040] failed to call rh.Client.Do: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on 192.168.167.172:53: read udp 192.168.167.172:36590->192.168.167.172:53: i/o timeout
It looks like nerdctl is having problems resolving hostnames. It always times out after 10 seconds.
Is there a way to explicitly configure hostname resolution in Rancher or nerdctl?
Any help would be appreciated.
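One way to read the errors above: the timeout is reported for a UDP query sent from 192.168.167.172 to port 53 on that same address, so the resolver being queried appears to sit on the same address as the client rather than elsewhere on the network. The Python sketch below is only an illustration (the name `parse_lookup_timeout` and the regular expression are made up here and are not part of nerdctl). It pulls the hostname, resolver and UDP endpoints out of such a line, which makes it easier to see which resolver is failing.

```python
import re

# Matches the lookup part of a Go network error of the form:
#   lookup HOST on RESOLVER:PORT: read udp SRC:PORT->DST:PORT: i/o timeout
LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[\d.]+):(?P<port>\d+): "
    r"read udp (?P<src>[\d.]+):\d+->(?P<dst>[\d.]+):\d+: i/o timeout"
)


def parse_lookup_timeout(line):
    """Return host, resolver and UDP endpoints from a lookup timeout, or None."""
    match = LOOKUP_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["port"] = int(info["port"])
    # True when the query was sent from the resolver's own address.
    info["resolver_is_local"] = info["src"] == info["dst"]
    return info


# Error text copied from the nerdctl output in the question.
line = ('dial tcp: lookup registry-1.docker.io on 192.168.167.172:53: '
        'read udp 192.168.167.172:47744->192.168.167.172:53: i/o timeout')
result = parse_lookup_timeout(line)
print(result)
```

If `resolver_is_local` is true, as it is for the lines in the question, the DNS service on that address is the first thing to check.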

Related

Starting OpenShift cluster never ends when starting minishift or takes too much memory

Whenever I start Minishift with the VirtualBox driver on the host OS, it takes an extremely long time and never finishes. Sometimes I even get an error message about a storage limit being reached.
I wonder if it is an error in the persistent storage volume configuration and usage that is described here
mike@mike-thinks:~$ minishift start --vm-driver=virtualbox
-- Starting profile 'minishift'
-- Check if depereccated options are used ... OK
-- Checking if https://github.com is reachable ... OK
-- Checking if requested OpenShift version 'v3.9.0' is valid ... OK
-- Checking if requested OpenShift version 'v3.9.0' is supported ... OK
-- Checking if requested hypervisor 'virtualbox' is supported on this platform ... OK
-- Checking if VirtualBox is installed ... OK
-- Checking the ISO URL ... OK
-- Checking if provided oc flags are supported ... OK
-- Starting local OpenShift cluster using 'virtualbox' hypervisor ...
-- Starting Minishift VM ........................ OK
-- Checking for IP address ... OK
-- Checking for nameservers ... OK
-- Checking if external host is reachable from the Minishift VM ...
Pinging 8.8.8.8 ... OK
-- Checking HTTP connectivity from the VM ...
Retrieving http://minishift.io/index.html ... OK
-- Checking if persistent storage volume is mounted ... OK
-- Checking available disk space ... 8% used OK
-- OpenShift cluster will be configured with ...
Version: v3.9.0
-- Copying oc binary from the OpenShift container image to VM .... OK
-- Starting OpenShift cluster ..............................................
What can I do? I'm following this tutorial and I just want to get to the stage where I can add oc to the PATH.
Update: new error during OpenShift cluster start
-- Starting OpenShift cluster ...........Error during 'cluster up' execution: Error starting the cluster. ssh command error:
command : /var/lib/minishift/bin/oc cluster up --use-existing-config --host-volumes-dir /var/lib/minishift/openshift.local.volumes --host-pv-dir /var/lib/minishift/openshift.local.pv --host-config-dir /var/lib/minishift/openshift.local.config --host-data-dir /var/lib/minishift/hostdata --public-hostname 192.168.99.100 --routing-suffix 192.168.99.100.nip.io
err : exit status 1
output : Deleted existing OpenShift container
Using nsenter mounter for OpenShift volumes
Using public hostname IP 192.168.99.100 as the host IP
Using 192.168.99.100 as the server IP
Starting OpenShift using openshift/origin:v3.9.0 ...
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
Waiting for API server to start listening
FAIL
Error: cannot access master readiness URL https://192.168.99.100:8443/healthz/ready
Details:
Last 10 lines of "origin" container log:
E0625 14:47:40.905680 2341 proxier.go:252] Error removing userspace rule: error checking rule: fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.908353 2341 proxier.go:259] Error removing userspace rule: error checking rule: fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.910681 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.913452 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-PORTALS-HOST": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.919209 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-HOST": fork/exec /usr/sbin/iptables: exec format error:
W0625 14:47:40.931698 2341 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
E0625 14:47:40.932412 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
E0625 14:47:40.938345 2341 proxier.go:274] Error flushing userspace chain: error flushing chain "KUBE-NODEPORT-NON-LOCAL": fork/exec /usr/sbin/iptables: exec format error:
W0625 14:47:40.941639 2341 iptables.go:151] Error checking iptables version, assuming version at least 1.4.11: fork/exec /usr/sbin/iptables: exec format error
F0625 14:47:40.949329 2341 network.go:177] error: Could not initialize Kubernetes Proxy. You must run this process as root (and if containerized, in the host network namespace as privileged) to use the service proxy: failed to initialize iptables: error creating chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: exec format error:
Caused By:
Error: Get https://192.168.99.100:8443/healthz/ready: dial tcp 192.168.99.100:8443: getsockopt: connection refused
In case it helps: the first time I ran it, it did not finish for me either, although the VM was shown as "running" in VirtualBox.
The second time, I ran it from an elevated command prompt and it completed successfully. It may not have been the elevated prompt that helped, though, but simply the fact that I ran it a second time.
If the issue is that you're stuck on Starting Minishift VM (not the case for the OP), then the issue may be that you're on a VPN. Try disconnecting the VPN and see if that fixes your issues.
Had the same problem. I noticed some network traffic during the first (failing) startup on a slower network connection. I waited until the network traffic dropped, tried again, and it worked. So some Docker images are probably downloaded during startup.

FileZilla: able to connect via SFTP, but failed to list directories

I used FileZilla to connect to one of my Linux servers via the SFTP protocol, but got the error output below.
Status: Connecting to <server_ip>...
Response: fzSftp started, protocol_version=5
Command: keyfile "C:\ruifeng_ibm.ppk"
Command: open "root@<server_ip>" 22
Status: Connected to <server_ip>
Error: Connection timed out after 20 seconds of inactivity
Error: Could not connect to server
On the server when I ran lsof -i, I was able to see the established sshd connection.
sshd 12333 root 3u IPv4 109406 0t0 TCP <server_hostname>:ssh-><workstation_ip>:54315 (ESTABLISHED)
How could the directories not be listed when the connection is successful? I have no idea how to debug this either.
It turned out to be a silly problem.
I had put the welcome message below in the .bashrc file.
echo -e "\n\nHello Ruifeng...Welcome to the Arena! \n#>>------>---->>"
Either it contained some illegal characters that FileZilla does not accept, or this kind of output is not supported by FileZilla at all. I was too lazy to dig further. After I removed the message, the connection worked and the directories were listed.
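A likely explanation, offered as background rather than something verified on this server: SFTP is a binary protocol in which every packet starts with a 4-byte big-endian length, so any text the login shell prints before the SFTP server starts is read by the client as packet data. The Python sketch below illustrates this with the greeting from the .bashrc above. The helper `read_packet_length` and the sample version packet are made up for the illustration and are not FileZilla code.

```python
import struct


def read_packet_length(stream_start: bytes) -> int:
    """Interpret the first four bytes as a big-endian uint32 packet length."""
    (length,) = struct.unpack(">I", stream_start[:4])
    return length


# A well-formed first server packet: length 5, type 2 (SSH_FXP_VERSION),
# protocol version 3 (the version number is only an example).
clean = struct.pack(">IBI", 5, 2, 3)

# The same packet preceded by the text that the echo in .bashrc printed.
greeting = b"\n\nHello Ruifeng...Welcome to the Arena! \n#>>------>---->>\n"
polluted = greeting + clean

print(read_packet_length(clean))     # 5
print(read_packet_length(polluted))  # 168446053, i.e. the bytes "\n\nHe"
```

With the greeting in front, the client would wait for a packet of roughly 168 MB that never arrives, which fits the "timed out after 20 seconds of inactivity" message. A common remedy is to print such greetings only when the shell is interactive.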

Passing hostname to netty

Background: I've got two machines with identical hostnames and need to set up a local Spark cluster for testing. Setting up a master and a worker works fine, but running an application with the driver causes problems: netty doesn't seem to pick the correct host (regardless of what I put in there, it just picks the first host).
Identical hostname:
$ dig +short corehost
192.168.0.100
192.168.0.101
Spark config (used by master and the local worker):
export SPARK_LOCAL_DIRS=/some/dir
export SPARK_LOCAL_IP=corehost    # I tried various values, such as 192.168.0.x, for
export SPARK_MASTER_IP=corehost   # local, master and the driver
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_DIR=/some/dir
Spark starts up and I can see the worker in the web-ui.
When I run the spark "job" below:
val conf = new SparkConf().setAppName("AaA")
// tried 192.168.0.x and localhost
.setMaster("spark://corehost:7077")
val sc = new SparkContext(conf)
I get this exception:
15/04/02 12:34:04 INFO SparkContext: Running Spark version 1.3.0
15/04/02 12:34:04 WARN Utils: Your hostname, corehost resolves to a loopback address: 127.0.0.1; using 192.168.0.100 instead (on interface en1)
15/04/02 12:34:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/02 12:34:05 ERROR NettyTransport: failed to bind to corehost.home/192.168.0.101:0, shutting down Netty transport
...
Exception in thread "main" java.net.BindException: Failed to bind to: corehost.home/192.168.0.101:0: Service 'sparkDriver' failed after 16 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/04/02 12:34:05 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Process finished with exit code 1
Not sure how to proceed... it's a whole jungle of IP addresses.
Not sure if this is a netty issue either.
My experience with the identical problem is that it revolves around setting things up locally. Try being more explicit in your Spark driver code by adding the local IP and the driver host IP to the config:
val conf = new SparkConf().setAppName("AaA")
.setMaster("spark://localhost:7077")
.set("spark.local.ip","192.168.1.100")
.set("spark.driver.host","192.168.1.100")
This should tell netty which of the two identical hosts to use.
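To decide which address to put in that config, it can help to check which of the addresses behind the shared hostname actually belongs to the machine running the driver. In the log above, the bind to 192.168.0.101 fails while the warning mentions 192.168.0.100 on interface en1, which suggests the driver picked the other machine's address. The Python sketch below is an assumption-laden illustration, not part of Spark. The helper names are made up, and 192.0.2.1 is a documentation-only address standing in for "the other machine".

```python
import socket


def can_bind(address: str) -> bool:
    """Return True if this machine can bind a TCP socket to the address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, 0))  # port 0: let the OS pick a free port
            return True
        except OSError:
            return False


def local_candidates(addresses):
    """Keep only the addresses that belong to an interface on this machine."""
    return [addr for addr in addresses if can_bind(addr)]


# 127.0.0.1 is local; 192.0.2.1 (TEST-NET-1) is normally not assigned
# to any interface, so binding to it should fail.
print(local_candidates(["127.0.0.1", "192.0.2.1"]))
```

On the driver machine, the candidate list could come from `socket.gethostbyname_ex("corehost")[2]`. Whichever address survives the filter is the one to use for the driver host setting.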

Binding external IP address to Rabbit MQ server

I have box A, which runs a consumer that listens on a RabbitMQ server.
I have box B, which will publish a message to the listener.
As long as all of this is on box A and I start the RabbitMQ server with defaults, it works fine.
The defaults are host=127.0.0.1 on port 5672, but
when I telnet box.a.ip.addy 5672 from box B, I get:
Trying box.a.ip.addy...
telnet: connect to address box.a.ip.addy: No route to host
telnet: Unable to connect to remote host: No route to host
Telnet on port 22 is fine, and I can ssh into box A from box B.
So I assume I need to change the IP that the RabbitMQ server uses.
I found http://www.rabbitmq.com/configure.html and now have a config file in the location the documentation says to use, named rabbitmq.config, containing:
[
{rabbit, [{tcp_listeners, {"box.a.ip.addy", 5672}}]}
].
So I stopped the server and started the RabbitMQ server again. It failed. Here are the errors from the error logs. It's a little over my head (in fact, most of this is).
=ERROR REPORT==== 23-Aug-2011::14:49:36 ===
FAILED
Reason: {{case_clause,{{"box.a.ip.addy",5672}}},
[{rabbit_networking,'-boot_tcp/0-lc$^0/1-0-',1},
{rabbit_networking,boot_tcp,0},
{rabbit_networking,boot,0},
{rabbit,'-run_boot_step/1-lc$^1/1-1-',1},
{rabbit,run_boot_step,1},
{rabbit,'-start/2-lc$^0/1-0-',1},
{rabbit,start,2},
{application_master,start_it_old,4}]}
=INFO REPORT==== 23-Aug-2011::14:49:37 ===
application: rabbit
exited: {bad_return,{{rabbit,start,[normal,[]]},
{'EXIT',{rabbit,failure_during_boot}}}}
type: permanent
and here is some more from the start up log:
Erlang has closed
Error: {node_start_failed,normal}
Crash dump was written to: erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{rabbit,failure_during_boot}}}}})
Please help
Did you try adding
RABBITMQ_NODE_IP_ADDRESS=box.a.ip.addy
to the /etc/rabbitmq/rabbitmq.conf file?
Per http://www.rabbitmq.com/configure.html#customise-general-unix-environment
The documentation also states that the default is to bind to all interfaces. Perhaps a configuration setting or environment variable already set on your system restricts the server to localhost, overriding anything else you do.
UPDATE: After reading again, I realize that if the port were simply closed, telnet should have returned "Connection refused", not "No route to host". I would also check whether you have a firewall-related issue.
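The distinction between those two errors can be checked from box B without telnet. The Python sketch below is a rough illustration (the function name `probe` and the wording of the returned messages are made up). It attempts a TCP connection and reports which kind of failure occurred. The interpretation in the strings is a rule of thumb, not a guarantee: on many Linux systems a firewall that rejects packets produces "No route to host" even though routing is fine.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused: host reachable, nothing accepting on this port"
    except socket.timeout:
        return "timeout: packets are probably being dropped"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host: routing problem or a firewall rejecting the packet"
        return f"error: {exc}"


# Replace the loopback address with box A's address when running from box B.
print(probe("127.0.0.1", 5672))
```

If port 22 reports "open" and port 5672 reports "no route to host" for the same machine, a firewall rule on box A is the more likely cause than the address RabbitMQ binds to.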
You need to open up the TCP port on your firewall.
On Linux, find the iptables config file:
eric@dev ~$ find / -name "iptables" 2>/dev/null
/etc/sysconfig/iptables
Edit the file:
sudo vi /etc/sysconfig/iptables
Fix the file by adding a rule for the port (this example opens 15672; for the question above the port would be 5672):
# Generated by iptables-save v1.4.7 on Thu Jan 16 16:43:13 2014
*filter
-A INPUT -p tcp -m tcp --dport 15672 -j ACCEPT
COMMIT

JNDI over HTTP on JBoss 4.2.3GA

I've got a remote server on eapps.com that I'm using as my "production" server. I have my own computer at home that I'm using as my "development" server. I'm trying to use JNDI over HTTP to do some batch processing. The following works at home, but not on the eapps machine.
I'm connecting to some EJBs (stateless session), and have my jndi.properties set to this:
(this is for the eapps machine)
java.naming.factory.initial=org.jboss.naming.HttpNamingContextFactory
java.naming.provider.url=http://my.prodhost.com:8080/invoker/JNDIFactory
java.naming.factory.url.pkgs=org.jboss.naming.client:org.jnp.interfaces
# timeout is in milliseconds
jnp.timeout=15000
jnp.sotimeout=15000
jnp.maxRetries=3
(this is for my machine at home)
java.naming.factory.initial=org.jboss.naming.HttpNamingContextFactory
java.naming.provider.url=http://localhost:8080/invoker/JNDIFactory
java.naming.factory.url.pkgs=org.jnp.interfaces
java.naming.factory.url.pkgs=org.jboss.naming.client
# timeout is in milliseconds
jnp.timeout=15000
jnp.sotimeout=15000
jnp.maxRetries=3
As I said, it works at home, but when I try it remotely, I get:
Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://my.prodhost.com:4446//?dataType=invocation&enableTcpNoDelay=true&marshaller=org.jboss.invocation.unified.marshall.InvocationMarshaller&socketTimeout=600000&unmarshaller=org.jboss.invocation.unified.marshall.InvocationUnMarshaller]
...
Caused by: java.net.ConnectException: Connection timed out: connect
Am I doing something wrong here, or is it possibly a firewall issue? To the best of my knowledge, port 4446 is not blocked.
Are the differences in the jndi.properties intentional (at the java.naming.factory.url.pkgs property level)?
Also, can you run a netstat -a | grep 4446 on both machines and update the question with the output?
Update: If the netstat command didn't return anything for port 4446 (JBoss was running, right?), then the JBoss Remoting Connector for the UnifiedInvoker service is very likely not listening on your eApps host, hence the connection timeout. Maybe this service has been disabled by eApps; you should contact their support and discuss this with them.
Just in case, a sample Connector configuration can be found in the jboss-service.xml under the server node's conf directory. Maybe compare the remote one (if you have access to it) with your local file to confirm this (but if it's disabled, there must be a reason, so discuss it with support).
And by the way, this is what I get when I run the netstat command with JBoss 4.2.3.GA started on my GNU/Linux machine (default configuration):
$ netstat -a | grep 4446
tcp 0 0 localhost:4446 *:* LISTEN
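When reading such output, the local address column matters as well as the port. A socket bound to localhost only accepts connections from the same machine, so a remote client would still fail even though the port shows as LISTEN. The Python sketch below is a small illustration with a made-up helper name, and it assumes the six-column `netstat -a` layout shown above. It classifies a LISTEN line by the address it is bound to.

```python
def listening_scope(netstat_line: str) -> str:
    """Classify a `netstat -a` LISTEN line by the local address it is bound to."""
    fields = netstat_line.split()
    if len(fields) < 6 or fields[-1] != "LISTEN":
        return "not a LISTEN line"
    # Column 4 is the local address in host:port form; split off the port.
    host = fields[3].rsplit(":", 1)[0]
    if host in ("localhost", "localhost6", "127.0.0.1", "::1"):
        return "loopback only"
    if host in ("*", "0.0.0.0", "::", "[::]"):
        return "all interfaces"
    return f"bound to {host}"


# Line copied from the netstat output above.
print(listening_scope("tcp 0 0 localhost:4446 *:* LISTEN"))  # loopback only
```

If the production host reports "loopback only" for port 4446, the bind address JBoss was started with would be worth checking alongside any firewall rules.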
