Failed to install Jupyter in restricted Dataproc setup - jupyter-notebook

I am looking to set up a Dataproc cluster with the Jupyter optional component.
gcloud beta dataproc clusters create cluster-1ea3 --enable-component-gateway \
--region europe-west1 --subnet data-network --no-address --zone europe-west1-b \
--single-node --master-machine-type n1-standard-4 --master-boot-disk-size 500 \
--image-version 1.5-debian10 --optional-components ANACONDA,JUPYTER \
--scopes 'https://www.googleapis.com/auth/cloud-platform' --project clouddemoenvironment
"--no-address" ensures private IP and the network "data-network" is enabled with Google private access. Things works great if i am not installing Jupyter optional component but cluster fails to come up with below error with optional components.
<13>Nov 5 09:01:44 google-dataproc-startup[1466]: <13>Nov 5 09:01:44 activate-component-jupyter[2710]: Looking in links: /opt/dataproc/jupyter/gcp
<13>Nov 5 09:01:44 google-dataproc-startup[1466]: <13>Nov 5 09:01:44 activate-component-jupyter[2710]: Collecting https://github.com/GoogleCloudPlatform/jupyter-extensions/archive/2cb9d24fe01cd329a8c4352a07b0eb8f9771fb07.zip#subdirectory=jupyter-gcs-contents-manager (from -r /opt/dataproc/jupyter/jupyter_extra_packages.requirements (line 1))
<13>Nov 5 09:01:59 google-dataproc-startup[1466]: <13>Nov 5 09:01:59 activate-component-jupyter[2710]: WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x7f6b1afbac10>, 'Connection to github.com timed out. (connect timeout=15)')': /GoogleCloudPlatform/jupyter-extensions/archive/2cb9d24fe01cd329a8c4352a07b0eb8f9771fb07.zip
I understand that the cluster has no access to github.com, so it makes sense that it fails. On checking the documentation, I found it states:
If you create a Dataproc cluster with internal IP addresses only, attempts to access github.com over the Internet in an initialization action will fail unless you have configured routes to direct the traffic through Cloud NAT or a Cloud VPN. Without access to the Internet, you can enable Private Google Access, and place job dependencies in Cloud Storage; cluster nodes can download the dependencies from Cloud Storage from internal IPs.
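For context, that documented approach amounts to staging dependencies in a Cloud Storage bucket and pulling them from an initialization action over Private Google Access, along these lines (a generic illustration only; the bucket and file names are hypothetical):
gsutil cp gs://my-deps-bucket/my-dependency.tar.gz /tmp/my-dependency.tar.gz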
I don't want to use Cloud NAT or Cloud VPN. Is there something I can convey to the system to resolve the dependency in a different way? Unfortunately, an initialization script probably won't work either, as initialization actions run after the optional components are installed.
Any suggestions on how I can leverage optional components in a non-internet environment?
Regards,
Jill

This startup time dependency is a bug in the latest Dataproc images.
It should be fixed with the next Dataproc subminor image version release.
To work around this for now, you can use the previous subminor image version (--image-version=1.5.18-debian10).
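For example, the cluster-create command from the question can be pinned to that subminor version (same flags as above, only the image version changed):
gcloud beta dataproc clusters create cluster-1ea3 --enable-component-gateway \
--region europe-west1 --subnet data-network --no-address --zone europe-west1-b \
--single-node --master-machine-type n1-standard-4 --master-boot-disk-size 500 \
--image-version 1.5.18-debian10 --optional-components ANACONDA,JUPYTER \
--scopes 'https://www.googleapis.com/auth/cloud-platform' --project clouddemoenvironment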
UPDATE: this problem has been fixed in the Nov 9 2020 release, so you can just use the latest version.

Related

MPI A process or daemon was unable to complete a TCP connection

Open MPI: 4.0.1a
HostFile:
34bb0519eAAA
a2935f150BBB
I am on machine 34bb0519eAAA. I can use ssh a2935f150BBB to connect to a2935f150BBB successfully, and likewise ssh 34bb0519eAAA from machine a2935f150BBB to connect back to 34bb0519eAAA.
But when I run the mpiexec command, I get this error message:
Warning: Permanently added '[XX.XX.XX.XX]:XX' (a2935f150BBB's IP address) to the list of known hosts.
--------------------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: a2935f150BBB
Remote host: 34bb0519eAAA
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
I am very confused by this, because I can ssh between the machines successfully. How could this fail?
Here is the ssh connection:
ssh a2935f150BBB
Warning: Permanently added '[XX.XX.XX.XX]:XX' to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (XXXXXXXXXXXXXXXXXX)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login:XXXXXXXXXXXXX from XXXXXXXXXX

HAAst terminating with exit code 158

I'm just trying to do a POC test with Telium's HAAst before we offer it to a customer, but I've stalled before I can even start the haast daemon. Currently I have a single VM with Ubuntu 16.04 LTS and Digium's basic Asterisk 13 installation. I've configured haast.conf and it seems good, but I cannot start the haast daemon; it stops after a few seconds. Here is the relevant log output:
General, HAAst version 2.3.2.1 starting as daemon under process ID 2409
Controller, Local peer HAAst state changing to service start
License, License file not found. Switching to Free Edition
General, Settings contained 0 information; 0 warning; and 0 error messages.
Asterisk Controller, Unable to located executable to control Asterisk
Controller, Local peer HAAst state changing to service stop
Controller, Stopped
General, HAAst terminating with exit code 158 (failure to find asterisk control files) after running for 2 seconds
It seems haast is missing the event controller to start the Asterisk daemon; unfortunately, it wasn't included in the installation package. I've tried to create these files (asterisk.start & asterisk.stop) based on the other sample event files, set the executable bit, and written the shebang on the first line based on the installation guide, but nothing helped.
Does anybody have experience with this case?
Thanks, Zsolt
This error means that High Availability for Asterisk (HAAst) is unable to find the service/executable file needed to control Asterisk. Since the 'distribution' setting in the [asterisk] stanza of the haast.conf file is set to 2 (Digium Asterisk), it means there's a problem with the Asterisk service file.
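For reference, that setting sits in haast.conf roughly like this (a minimal sketch of the stanza described above; surrounding keys are omitted, so check Telium's documentation for the full syntax):
[asterisk]
distribution = 2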
Ubuntu 16 uses systemd, so have you installed Digium's asterisk.service (systemd) file? If you chose to install an init.d service file for Asterisk instead, then you may have to explicitly tell HAAst which one to look for. If you installed neither, then that's your problem. The maker of HAAst (Telium) has a support forum where this topic is addressed.
The pre- and post-Asterisk event handlers are available in the commercial versions of HAAst only, so that won't help (but it's also the wrong way to solve the problem). There are also a few Ubuntu-specific topics on the support forum at https://www.telium.io/haast in case that helps.
If you can't find an Asterisk systemd service file here's a sample:
[Unit]
Description=Asterisk PBX and telephony daemon
Documentation=man:asterisk(8)
Wants=network.target
After=network.target
[Service]
Type=simple
User=asterisk
Group=asterisk
ExecStart=/usr/bin/asterisk -f -C /etc/asterisk/asterisk.conf
ExecStop=/usr/bin/asterisk -rx 'core stop now'
ExecReload=/usr/bin/asterisk -rx 'core reload'
[Install]
WantedBy=multi-user.target
Just save that file as 'asterisk.service', place it in /etc/systemd/system/, and ensure its permissions match the other service/unit files.
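Once the unit file is in place, the usual systemd sequence applies (standard systemctl commands, shown here as a generic example rather than HAAst-specific instructions):
systemctl daemon-reload
systemctl enable asterisk.service
systemctl start asterisk.service
systemctl status asterisk.service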
The HAAst configuration is missing or not correct:
Unable to located executable to control Asterisk

bootstrap cloudify 3.4 error occurred

I installed Cloudify 3.4 according to the Cloudify docs. When I installed the manager, I executed it like this:
# cfy bootstrap --install-plugins -p openstack-manager-blueprint.yaml -i openstack-manager-blueprint-inputs.yaml
an error occurred:
[ERROR] Workflow failed: Task failed 'fabric_plugin.tasks.run_script' -> Timed out trying to connect to 192.168.17.15 (tried 5 times)
I have already built an external network, 192.168.17.0/24, and I have already installed:
cloudify_docker_plugin-1.3.2-py27-none-linux_x86_64-Ubuntu-trusty.wgn
cloudify_fabric_plugin-1.4.1-py27-none-linux_x86_64-centos-Core.wgn
cloudify_fabric_plugin-1.4.1-py27-none-linux_x86_64-redhat-Maipo.wgn
cloudify_host_pool_plugin-1.4-py27-none-linux_x86_64-centos-Core.wgn
cloudify_openstack_plugin-1.4-py27-none-linux_x86_64-redhat-Maipo.wgn
So, how do I solve this error? Thank you to everyone who helps me!
It seems that you can't connect to the manager.
Please make sure that you have an ssh connection from the CLI to the manager.
Since you are bootstrapping an OpenStack manager, you should make sure that you have an external IP if you are outside of OpenStack, or that the CLI is on the same network as the manager if you are on OpenStack.
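A quick way to check is to attempt the same SSH connection the bootstrap uses, from the CLI machine to the manager IP reported in the error. For example (the key path and user name below are assumptions; use whatever you supplied in the manager blueprint inputs):
ssh -i ~/.ssh/manager-key.pem centos@192.168.17.15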

Permission Issue with Docker Volume Driver for Azure File Storage

I am following the readme for this project (https://github.com/Azure/azurefile-dockervolumedriver/blob/master/contrib/init/upstart/README.md), but when I try to mount a volume in a container like this
docker volume create -d azurefile -o share=myshare --name=myvol
docker run -i -t -v myvol:/data busybox
(inside the container)
# cd /data
# touch file.txt
I get this error:
Error response from daemon: VolumeDriver.Mount: mount failed: exit status 32
output="mount.cifs kernel mount options: ip=168.61.57.82,unc=\\\\cmstoragecd.file.core.windows.net\\myshare,vers=3.0,dir_mode=0777,file_mode=0777,user=cmstoragecd,pass=********\nmount
error(13): Permission denied\nRefer to the mount.cifs(8) manual page (e.g. man mount.cifs)\n"
This is running on an Ubuntu 14.04 server on Azure. I have successfully used the extension with similar servers, but it is now not working. What can I do to debug this?
Your answer is correct. The CIFS implementation in many Linux distros currently does not have encryption support, which Azure File Storage requires for cross-region SMB traffic.
Quoting the note at https://azure.microsoft.com/en-us/documentation/articles/storage-how-to-use-files-linux/
Note: The Linux SMB client doesn’t yet support encryption, so mounting a file share from Linux still requires that the client be in the same Azure region as the file share. However, encryption support for Linux is on the roadmap of Linux developers responsible for SMB functionality. Linux distributions that support encryption in the future will be able to mount an Azure File share from anywhere as well.
In the future, please consider contacting us directly by opening a new issue on our GitHub repository at https://github.com/Azure/azurefile-dockervolumedriver/issues.
I managed to get around this error by using a storage account in the same region as the Azure VM. Originally I had a VM running in West Europe, using a file share in East US.
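As a rough sketch of that fix (the resource group and storage account names below are made up; the docker commands are the same ones from the question, and the volume driver still needs to be pointed at the new account's name and key per the project README):
az storage account create --name mystoragewesteu --resource-group my-resource-group --location westeurope --sku Standard_LRS
docker volume create -d azurefile -o share=myshare --name=myvol
docker run -i -t -v myvol:/data busybox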

openMPI/mpich2 doesn't run on multiple nodes

I am trying to install openMPI and mpich2 on a multi-node cluster, and I am having trouble running on multiple machines in both cases. Using mpich2 I am able to run on a specific host from the head node, but if I try to run something from a compute node to a different node I get:
HYDU_sock_connect (utils/sock/sock.c:172): unable to connect from "destination_node" to "parent_node" (No route to host)
[proxy:0:0#destination_node] main (pm/pmiserv/pmip.c:189): unable to connect to server parent_node at port 56411 (check for firewalls!)
If I try to use SGE to set up a job, I get similar errors.
On the other hand, if I try to use openMPI to run jobs, I am not able to run on any remote machine, even from the head node. I get:
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
The machines are connected to each other; I can ping and ssh passwordlessly, etc., from any of them to any other, and MPI_LIB and PATH are set correctly on all machines.
Usually this is caused by not setting up a hostfile or not passing the list of hosts on the command line.
For MPICH, you do this by passing the flag -host on the command line, followed by a list of hosts (host1,host2,host3,etc.).
mpiexec -host host1,host2,host3 -n 3 <executable>
You can also put these in a file:
host1
host2
host3
Then you pass that file on the command line like so:
mpiexec -f <hostfile> -n 3 <executable>
Similarly, with Open MPI, you would use:
mpiexec --host host1,host2,host3 -n 3 <executable>
and
mpiexec --hostfile hostfile -n 3 <executable>
You can get more information at these links:
MPICH - https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Hydra_with_Non-Ethernet_Networks
Open MPI - http://www.open-mpi.org/faq/?category=running#mpirun-hostfile
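For completeness, both launchers also let the hostfile state how many processes to place on each node; the syntax differs slightly (Open MPI uses slots=, Hydra uses a colon), for example:
host1 slots=2
host2 slots=4
versus, for MPICH/Hydra:
host1:2
host2:4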
