Unable to execute MPICH2 on multiple machines on Ubuntu 12.04 (HYDU_sock_connect issue)

I am having difficulty running an MPI program on two machines. The OS is Ubuntu 12.04 and the MPI implementation is MPICH2.
SSH works fine:
root@ubuntu:/home# ssh 192.168.1.9
root@gpuguy's password:
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-29-generic i686)
* Documentation: https://help.ubuntu.com/
131 packages can be updated.
67 updates are security updates.
Last login: Thu Oct 24 17:36:25 2013 from ubuntu.local
root@gpuguy:~#
But when I run my MPI program, it fails:
root@ubuntu:/home# mpiexec -f hosts.cfg -n 4 hello
root@192.168.1.9's password:
[proxy:0:0@gpuguy] HYDU_sock_connect (./utils/sock/sock.c:171): unable to get host address for ubuntu (1)
[proxy:0:0@gpuguy] main (./pm/pmiserv/pmip.c:209): unable to connect to server ubuntu at port 42104 (check for firewalls!)
I have already disabled the firewall on both machines, which is why ssh succeeds. How can I solve this issue?
My MPI code runs successfully on a single machine.

For MPICH (or any MPI implementation) to work, you need to have passwordless SSH set up. I should also mention that you really shouldn't have to be logged in as root to make this work. It's generally a very bad idea to be logged in as root all of the time.
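One common way to get the passwordless SSH mentioned above is key-based login. This is a minimal sketch, assuming OpenSSH on both machines; the account name mpiuser is hypothetical, and 192.168.1.9 is the worker address from the question. The function only checks whether a login succeeds without a password prompt.

```shell
# One-time setup, run manually on the machine that launches mpiexec
# ("mpiuser" is a hypothetical non-root account):
#   ssh-keygen -t rsa                  # create a key pair, accepting the defaults
#   ssh-copy-id mpiuser@192.168.1.9    # install the public key on the worker

# Return 0 if the given user@host accepts a login without asking for a password.
# BatchMode=yes makes ssh fail instead of prompting, so a missing key is detected.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

For example, can_ssh_without_password mpiuser@192.168.1.9 && echo "key login works" should print the message once the key has been copied.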

In the /etc/hosts file, add the IP address and hostname of each server.
Do this on all the servers.
For example:
10.10.0.5 server1
10.10.0.6 server2
10.10.0.7 server3
In /etc/hosts, separate the IP address and hostname with whitespace; do not type a literal "\t" sequence between them.
This is wrong:
10.10.0.5 \t server1
This is correct:
10.10.0.5 server1
Be careful not to delete or modify existing lines in the /etc/hosts file; only add new lines at the end of the file.
Also, you do not need to disable the firewall to fix this issue.
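To confirm that a name such as ubuntu (the one the error message could not resolve) really has an entry, a small lookup helper can be run on each node. This is only a sketch: hosts_lookup is a hypothetical helper name, and it reads the file directly rather than going through the system resolver.

```shell
# Print the IP address that a hosts-format file (argument 1) maps to a
# hostname (argument 2). Prints nothing if the name has no entry.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }             # drop comments before matching
        NF >= 2 {
            # field 1 is the address, the remaining fields are names/aliases
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; break }
        }
    ' "$1"
}
```

Running hosts_lookup /etc/hosts ubuntu on the remote machine (gpuguy in the question) should print the launch machine's address; empty output means the entry is missing there.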

Related

ERROR 2002 (HY000): Can't connect to server on 'xxx.xxx.XX' (60) (MariaDB 10.8)

For the past day or so I have been unable to access the databases on two of my servers.
I use
mysql -h host.sld.TLD -P 3306 -user user
for which I have configured my user to connect from my host without a password,
but I get the above error.
However, when I use
telnet host.sld.TLD 3306
I get
5.5.5-10.8.5-MariaDB-1:10.8.5+maria~ubu2004(si4cyW'Y��-n;{ypDA\)VU)mysql_native_passwordC
I am using Homebrew's MariaDB (currently 10.9.3) on my machine, which I can reach from the outside. One of the failing remotes runs Ubuntu with 10.8 and the other is a Mac, also with 10.8; outgoing connections work from both. OpenSSL is version 1.1.1s on both Macs.
I have installed a number of different MariaDB versions and all have the same issue, as do their Perl libraries. mysql itself works.
What am I doing wrong here?
This issue has been fixed in MariaDB 10.9.4, which was released yesterday. Brew still offers 10.9.3; it usually takes a couple of days until the latest 10.9 release is available via brew.
The issue doesn't affect the server itself, but Connector/C and command line tools which link against Connector/C.
See also: MariaDB connector in Python cannot connect to remote server

MPI A process or daemon was unable to complete a TCP connection

Open MPI: 4.0.1a
HostFile:
34bb0519eAAA
a2935f150BBB
I am on machine 34bb0519eAAA. I can use ssh a2935f150BBB to connect to a2935f150BBB successfully, and ssh 34bb0519eAAA from machine a2935f150BBB to connect to 34bb0519eAAA successfully.
But when I run the mpiexec command, I get this error message:
Warning: Permanently added '[XX.XX.XX.XX]:XX' (a2935f150BBB's IP address) to the list of known hosts.
--------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: a2935f150BBB
Remote host: 34bb0519eAAA
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
I am very confused, because ssh works in both directions. How can this fail?
Here is the ssh connection:
ssh a2935f150BBB
Warning: Permanently added '[XX.XX.XX.XX]:XX' to the list of known hosts.
Welcome to Ubuntu 18.04.1 LTS (XXXXXXXXXXXXXXXXXX)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
Last login:XXXXXXXXXXXXX from XXXXXXXXXX

mount.nfs: requested NFS version or transport protocol is not supported

NFS Mount is not working in my RHEL 7 AWS instance.
When I do a
mount -o nfsvers=3 10.10.11.10:/ndvp2 /root/mountme2/
I get the error:
mount.nfs: requested NFS version or transport protocol is not supported
Can anyone point me where I am wrong?
Thanks.
Check that the NFS service is running, or restart it.
sudo systemctl status nfs-kernel-server
In my case the service was not running, and the problem was in the /etc/exports file, where I had the same IP address for two machines.
So I commented out one of the IP addresses and restarted nfs-kernel-server using
sudo systemctl restart nfs-kernel-server and rebooted the machine.
It worked.
A clarification that might be useful for dummies like me: systemctl status nfs-server.service and systemctl start nfs-server.service must be executed on the server!
Some additional information:
If, like me, you deleted a VM without shutting it down properly, you might also need to manually edit the /etc/exports file, because NFS tries to connect to the deleted VM, fails, and dies instead of continuing with the next entry.
After that you can restart manually as mentioned in the other answers.
In my case, a simple reload didn't suffice. I had to perform a full restart:
sudo systemctl restart nfs-kernel-server
In my case, it didn't work correctly with NFS version 4.1.
So in the Vagrantfile, everywhere type: 'nfs' appears, I added a comma followed by nfs_version: 4, nfs_udp: false
A more detailed explanation is available in the NFS documentation.
If you're giving a specific protocol to connect with, also check to make sure your NFS server has that protocol enabled.
I got this error when trying to start up a Vagrant box, and my nfs server was running. It turns out that the command Vagrant uses is:
mount -o vers=3,udp,rw,actimeo=1 192.168.56.1:/dir/on/host /vagrant
Which specifically asks for UDP. My server was running but it was not configured to enable connecting over UDP. After consulting /etc/nfs.conf, I created /etc/nfs.conf.d/10-enable-udp.conf with the following contents to enable udp:
[nfsd]
udp=y
The name of the file doesn't matter, as long as it's in the conf.d directory and ends in .conf. Depending on your distribution it may be configured differently. You can directly edit nfs.conf, but using a conf.d file is more likely to preserve the changes after upgrading your system.
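To see which NFS versions and transports a server actually advertises before changing its configuration, the output of rpcinfo -p can be filtered for the NFS program number (100003). This is a sketch, assuming rpcinfo is installed on the client and the server's portmapper is reachable; nfs_transports is a hypothetical helper name.

```shell
# Read `rpcinfo -p` output on stdin and print one "version/protocol" pair
# (for example "3/tcp") for each registered NFS service.
# 100003 is the RPC program number assigned to NFS.
nfs_transports() {
    awk '$1 == 100003 { print $2 "/" $3 }' | sort -u
}

# Usage, with the server address from the Vagrant mount command above:
#   rpcinfo -p 192.168.56.1 | nfs_transports
```

If 3/udp is absent from the output, a mount that asks for vers=3,udp would be expected to fail with this error.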
Try pinging the server's IP address from the client ("ping <server IP>"). If you get a reply, install the NFS server on the host. Then edit the /etc/exports file; don't forget to add the port along with the IP address.
I found the solution: add an entry Defaultvers=3 to /etc/nfsmount.conf on the NFS server.
There will already be a line # Defaultvers=3; just uncomment it and then mount on the NFS client.
The issue will be resolved!

MPICH2 gethostbyname failed

I don't understand the error message. What I am trying to do is run an MPICH2 application after installing MPICH2 version 1.4 or 1.5 to /opt/mpich2 (both versions failed with the same error). My MPI application was compiled with 1.3, but I am able to run it with 1.4 on another workstation. I am testing it on Ubuntu 12.04.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(467)..............:
MPID_Init(177).....................: channel initialization failed
MPIDI_CH3_Init(70).................:
MPID_nem_init(319).................:
MPID_nem_tcp_init(171).............:
MPID_nem_tcp_get_business_card(418):
MPID_nem_tcp_init(377).............: gethostbyname failed, localhost (errno 3)
Solution for macOS
I stumbled upon this issue on macOS 10.12.1.
The solution is to add 127.0.0.1 computername.local to /etc/hosts. Your file will look more or less like this:
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
127.0.0.1 computername.local
255.255.255.255 broadcasthost
::1 localhost
You can change/check your computer's name if you go to System Preferences > Sharing > Computer Name.
What worked for me was the following:
Make sure your hostname is the same in 1 and 2 below:
1. terminal hostname
2. "/etc/hosts" hostname
So if you type cat /etc/hosts in terminal it should look like:
// 127.0.0.1 my_hostname
My hostname was not the same in 1 and 2. Once I changed them to match, my MPI program would execute.
To change your terminal hostname, type the following:
sudo scutil --set HostName my_new_hostname
To change your /etc/hosts hostname, type the following:
sudo nano /etc/hosts
and then add the line
127.0.0.1 my_new_hostname
This error indicates that there's a problem resolving localhost. Check your /etc/hosts file and make certain that localhost is correctly defined there; it should point to 127.0.0.1. Try using ssh to connect to localhost and make sure that works as well.
Although the question is different, the answer is probably the same one I gave some time ago for Open MPI:
gethostname() function missing in openMPI
The portable MPI solution is to use MPI_Get_processor_name().
Adding -host localhost to the command line solved this for me, as suggested in https://github.com/pmodels/mpich/issues/4710#issuecomment-661933489
e.g.
mpiexec -host localhost -np 4 ./testExecutable
Maybe your /dev/shm is full, try to clean it.

mount: nfs access denied by server

I am trying to mount an NFS device on my Linux machine.
My /etc/fstab looks like this:
192.168.0.5:/volume2/Asterisk_Recordings /var/spool/newnfs nfs rsize=32768,wsize=32768,intr,noatime 1 0
My /etc/mtab looks like this:
192.168.0.5:/volume2/Asterisk_Recordings /var/spool/newnfs nfs rw,addr=192.168.0.5 0 0
I have enabled NFS on my NAS device.
When I run "mount -t nfs -v 192.168.0.5:/volume2/Asterisk_Recordings /var/spool/newnfs/" I get this:
mount.nfs: timeout set for Thu Aug 1 07:01:04 2013
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.5,clientaddr=192.168.1.1'
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting 192.168.0.5:/volume2/Asterisk_Recordings
Any possible reasons?
Thanks in advance.
This error can also occur if the /etc/hosts file on the nfs server maps the hostname of the client to an incorrect IP address, or the IP address of the client to an incorrect hostname. It is quick and easy to check, so worth doing before looking for other problems. Note that, if you do have to change any entries then the nfs-server has to be stopped and re-started, as it reads the hosts file only when it is started.
Is there a config file on the NAS where you can list the allowed clients? E.g. on Debian-based systems the config file is "/etc/exports"; you would put "/volume2/Asterisk_Recordings 192.168.1.1(rw,sync)" there and activate it with "exportfs -a" (your NAS may do this automatically if you update the config via a web interface, I guess). Check also https://stackoverflow.com/questions/22246477/mounting-nfs-results-in-access-denied-by-server.
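If the server does expose an exports file, a quick way to see which clients it allows for a given directory is to print the client specs on the matching line. This is a sketch only: export_clients is a hypothetical helper name, and it assumes the plain one-export-per-line layout shown in the example above, without line continuations.

```shell
# Print each client spec (for example "192.168.1.1(rw,sync)") that an
# exports-format file (argument 1) lists for an exported directory (argument 2).
export_clients() {
    awk -v dir="$2" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $1 == dir { for (i = 2; i <= NF; i++) print $i }
    ' "$1"
}
```

For the question above, export_clients /etc/exports /volume2/Asterisk_Recordings run on the server should show an entry that covers 192.168.1.1, the clientaddr reported in the mount output; if it does not, an access denied error is the likely result.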
Remember to add the IP addresses/hostnames of your NFS clients to /etc/hosts.allow on the NFS server:
nfs: clienthost1, clienthost2, clienthost3
You may also need to restart the NFS config and NFS service on the NFS server and re-run the export:
systemctl restart nfs-config.service
systemctl status nfs.service
exportfs -arv
I have a Debian 10 system with a Debian 10 VM running inside it. I wanted to access a physical partition of the hard drive from the VM. I mounted the physical drive on the host and exported it. I was not able to mount it on the guest and kept getting an access denied error.
The solution, after many hours, was to add the no_all_squash option in the exports file. This is supposed to be the default, but I needed to add it explicitly. As soon as I did that, the problem went away and I could mount the file system. Unfortunately, I could not see the files on it.
/media/dev 192.168.100.0/24(rw,sync,no_subtree_check,no_root_squash,no_all_squash)
On the server I could see the files and on the host I could not.
I had to change the line to
/media/dev 192.168.100.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash,no_all_squash)
to see the actual files on the file system.
I saw this error, presumably due to an older NFS client, and adding -o nfsvers=3 fixed the issue for me, e.g. mount -t nfs -o nfsvers=3 x.x.x.x:/nfs_mount /mnt/nfs_mount
Or in /etc/fstab:
x.x.x.x:/nfs_mount /mnt/nfs_mount nfs proto=tcp,port=2049,nfsvers=3 0 0
Ref: https://www.thegeekdiary.com/mount-nfs-access-denied-by-server-while-mounting-how-to-resolve/

Resources