Run MPI program without an X11 server? - mpi

I'm testing simple MPI programs locally on my Ubuntu Focal server (Open MPI 4.0.3). However, whatever I run with mpirun produces an annoying message: No protocol specified. The problem appears to be related to the fact that mpirun is trying to connect to the X server. How can I disable this behavior so I can use mpirun without an X server available? I primarily work over SSH (text-only, with tmux).
An example of what I'm doing:
ubuntu@iBug-Server:~$ cat test.c
#include <mpi.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    // This is a stub program
    MPI_Finalize();
    return 0;
}
ubuntu@iBug-Server:~$ mpicc test.c
ubuntu@iBug-Server:~$ mpirun -np 2 a.out
No protocol specified
ubuntu@iBug-Server:~$
Update 1: It appears to be related to LightDM and Xorg. The unwanted message goes away after systemctl stop lightdm. Alternatively, running Open MPI in a graphical terminal (connected via VNC or RDP (xrdp), both work) also eliminates the message, as strace shows that the connection to the X server is successful.
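If you want to reproduce the strace observation above over plain SSH, one way (the exact filter is my own suggestion, not from the original post) is to trace only the connect() calls made by mpirun and its children:
strace -f -e trace=connect mpirun -np 2 a.out
This should show whether a connection to the X socket (e.g. /tmp/.X11-unix/X0) is being attempted when the No protocol specified message appears.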

Related

I get "There are not enough slots available in the system" when I run mpi

I am a high school student. An error occurred while I was studying and coding basic MPI theory. I searched on the internet and tried everything, but I couldn't understand it well.
The code is really simple; there is no problem with the code and I understand it well.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int num_procs, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    printf("Hello world! I'm rank %d among %d processes.\n", my_rank, num_procs);
    MPI_Finalize();
    return 0;
}
But there was a problem running MPI. It works well when I type it like this:
mpirun -np 2 ./hello
Hello world! I'm rank 1 among 2 processes.
Hello world! I'm rank 0 among 2 processes.
This error occurs with -np 3:
mpirun -np 3 ./hello
There are not enough slots available in the system to satisfy the 3
slots that were requested by the application:
./hello
Either request fewer slots for your application, or make more slots
available for use.
A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:
1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores
In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.
Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.
My laptop has an Intel i5 with 2 cores and 4 threads. Did this problem happen because there are only 2 cores? I don't exactly understand this part.
There is not much material about MPI in Korea, so I am always googling and studying. If that's the cause, is there any way to increase the number of processes? Other people wrote that they got this error at -np 17, so how did they get the process count into double digits? Is it a matter of what the computer is capable of? Please explain it simply so that I can understand it well.
My laptop has an Intel i5 with 2 cores and 4 threads. Did this problem happen because there are only 2 cores?
Yes. By default Open MPI uses the number of cores as the number of slots. Since you only have 2 cores, you can launch a maximum of 2 processes.
If that's the cause, is there any way to increase the number of processes?
Yes. If you use --use-hwthread-cpus with your mpirun command, you can use up to 4 MPI processes on your laptop, since it has 4 hardware threads. Try running: mpirun -np 4 --use-hwthread-cpus ./hello
Also, you can use the --oversubscribe option to allow more processes than the available cores/threads. For example, try: mpirun -np 10 --oversubscribe ./hello
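For reference, the hostfile route mentioned in the error message would look roughly like this; the file name myhostfile and the slot count are just placeholders for illustration. Put this single line into a file:
localhost slots=4
and then launch with:
mpirun -np 4 --hostfile myhostfile ./hello
This tells Open MPI that the local machine offers 4 slots, which for this example has a similar effect to --use-hwthread-cpus.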

How to open a tty device in noncanonical mode on Linux using .NET Core

I'm using .NET Core on an embedded Linux platform with good success so far. I just ran into a problem with trying to open a tty device in raw (noncanonical mode) though. If I was using regular C or C++ I would call cfmakeraw() after opening the device, but how do I do that from a .NET Core app?
The device I need to work with is a CDC ACM function driver for the USB client connector, i.e. it's a virtual COM port. It appears in my system as /dev/ttyGS0. I can open the device and then read from it and write to it using this code:
FileStream vcom = new FileStream("/dev/ttyGS0", FileMode.Open);
Because the tty device opens in canonical mode by default, I don't receive any characters until the user sends the carriage return character at the end of the line of text. I need to receive each character as it is sent, rather than waiting until the carriage return is sent, i.e. I need to use raw mode for the tty device.
The code below does not work because .NET Core does not realize that this device is a virtual serial port, so it throws an exception when I try to open it this way. When I open the real UART devices using SerialPort then they do behave in raw mode as expected.
SerialPort serialPort = new SerialPort("/dev/ttyGS0");
Since you have a terminal device, you could try to alter its termios configuration prior to actually using it.
Try issuing the shell command stty -F /dev/ttyGS0 raw before you run your program.
The raw setting will make the following termios changes (according to the stty man page) for noncanonical mode:
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixany -ixoff -imaxbel
-opost
-icanon -isig -xcase -iuclc
min 1 time 0
Note that neither c_cflag attributes (e.g. baud rate, parity, character size) nor echo attributes (as you already know) are changed by the raw setting.
For comparison the libc cfmakeraw() routine that you mention makes the following termios settings:
t->c_iflag &= ~(IGNBRK|BRKINT|PARMRK|ISTRIP|INLCR|IGNCR|ICRNL|IXON);
t->c_oflag &= ~OPOST;
t->c_lflag &= ~(ECHO|ECHONL|ICANON|ISIG|IEXTEN);
t->c_cflag &= ~(CSIZE|PARENB);
t->c_cflag |= CS8;
t->c_cc[VMIN] = 1; /* read returns when one char is available. */
t->c_cc[VTIME] = 0;
You can use stty -F /dev/ttyGS0 sane to restore the terminal to a default termios configuration.
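If you would rather make the change from code than from the shell, here is a minimal C sketch of what stty raw does under the hood; the helper name make_tty_raw and the idea of running it as a separate step before starting the .NET app are my assumptions, not something the original answer shows:
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Switch a tty device into noncanonical (raw) mode. The termios settings
 * normally stick to the device after close, which is why running a helper
 * like this (or the stty command) beforehand is enough. */
int make_tty_raw(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios t;
    if (tcgetattr(fd, &t) != 0) {
        close(fd);
        return -1;
    }

    cfmakeraw(&t);        /* applies the flag changes listed above */
    t.c_cc[VMIN] = 1;     /* read() returns as soon as one byte arrives */
    t.c_cc[VTIME] = 0;    /* no inter-character timeout */

    if (tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return 0;
}
After this has run once, the FileStream approach from the question should deliver bytes as they arrive instead of line by line.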

MPI (OpenMPI) - MPI_Publish_name cannot contact global ompi-server and throws error

I am attempting to write an MPI application that would consist of programs in the server-client mould. I am stuck trying to get the server to publish its name to the ompi-server in the global scope.
Here is the server code:
int main(int argc, char** argv) {
    int myrank, nprocs, errmpi;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "ompi_global_scope", "yes");
    MPI_Open_port(info, port_name);
    // Fails here
    MPI_Publish_name("ServerName", info, port_name);
    // Rest of code...
I get the following error on running it:
$ ./mpi/bin/mpirun -np 1 --mca btl self ServerName
--------------------------------------------------------------------------
Process rank 0 attempted to publish to a global ompi_server that
could not be contacted. This is typically caused by either not
specifying the contact info for the server, or by the server not
currently executing. If you did specify the contact info for a
server, please check to see that the server is running and start
it again (or have your sys admin start it) if it isn't.
--------------------------------------------------------------------------
[xxx:18205] *** An error occurred in MPI_Publish_name
[xxx:18205] *** reported by process [1424949249,139676631433216]
[xxx:18205] *** on communicator MPI_COMM_WORLD
[xxx:18205] *** MPI_ERR_INTERN: internal error
[xxx:18205] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[xxx:18205] *** and potentially your MPI job)
I do have the ompi-server process running in debug mode on the console:
$ ./ompi-server --no-daemonize -d -r +
[xxx:14140] [[9416,0],0] orte-server: up and running!
Ultimately I will distribute the processes across various nodes, but for now I would really like to get the framework working on a single node. Could someone please help? Thanks very much indeed!
EDIT 1: Thank you very much for your quick reply. I made the following changes:
$ mpi/bin/ompi-server --no-daemonize -d -r mpiuri
If I now run the program like this, it hangs at the point where it previously failed:
$ ./mpi/bin/mpirun --ompi-server file:mpiuri -mca btn tcp,self,sm -np 1 -v Server
Whereas if I run the program with the following,
$ ./mpi/bin/mpirun --ompi-server file:mpiuri -mca btn tcp,self,sm -np 1 -v --wait-for-server --server-wait-time 10 Server
it fails with this error:
--------------------------------------------------------------------------
mpirun was instructed to wait for the requested ompi-server, but was unable to
establish contact with the server during the specified wait time:
Server uri: 799801344.0;tcp://192.168.1.113:44487
Timeout time: 10
Error received: Not supported
Please check to ensure that the requested server matches the actual server
information, and that the server is in operation.
--------------------------------------------------------------------------
I must be close... but I can't quite figure it out.
I am fairly sure it is not the firewall, since I added the rule ALLOW 192.168.1.0/24 to ufw.
Here is how to connect with the ompi-server:
1) Ensure that the ompi-server is up and running and is writing its URI to a file, with the following command:
$ mpi/bin/ompi-server --no-daemonize -d -r mpiuri
2) Start all the MPI processes with this URI file, making sure you prefix the URI filename with "file:" when you pass the --ompi-server parameter.
3) Also pass the hostname of the node where you run mpirun, like so:
$ ./mpi/bin/mpirun --ompi-server file:mpiuri -host myHostName -np 1 -v Server
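For completeness, once the server has published its name, the client side would typically look the name up and connect to it. A rough sketch (the service name "ServerName" follows the question's code; everything else here is my illustration, not part of the original answer):
#include <mpi.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Init(&argc, &argv);

    /* Ask the naming service (the ompi-server) for the published port */
    MPI_Lookup_name("ServerName", MPI_INFO_NULL, port_name);

    /* Connect to that port; the server side calls MPI_Comm_accept on it */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);

    /* ... exchange messages over the "server" intercommunicator ... */

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}
The client also has to be started with the same --ompi-server file:mpiuri argument so that it can reach the naming service.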

Why doesn't U-Boot disable the console output

I have a system which is accessed through a serial debug port. I want to disable all of the output that is produced during the U-Boot boot. For that there is the
setenv silent 1
setting, which I put into the BOOTCMD string like this:
#define CONFIG_BOOTCOMMAND " setenv silent 1;" \
"bootm "
and there is the
#define CONFIG_SILENT_CONSOLE
config option. Neither one is working (the lines printed out are still the same and the boot time didn't change). Does somebody see the error?
For my target, U-Boot baseline 2013.10, the silent environment variable works at kernel boot time, but it needed more defines:
#define CONFIG_SILENT_CONSOLE
#define CONFIG_SYS_DEVICE_NULLDEV
#define CONFIG_SILENT_CONSOLE_UPDATE_ON_SET
That also killed the kernel serial console after a successful boot, until I added
#define CONFIG_SILENT_U_BOOT_ONLY
Refer to README.silent for more info.
U-Boot is doing exactly what it should (silencing the output) with the following configuration:
#define CONFIG_EXTRA_ENV_SETTINGS \
"silent=1\0" \

MPI_Barrier doesn't work properly in Ubuntu

I'm a beginner in using MPI. Here I wrote a very simple program to test if MPI can run. Here is my hello.c:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();
}
I use two nodes to test; the hostfile is:
node1
node2
So I have two machines named node1 and node2. I can ssh between them without a password.
I launch the program by typing: mpirun -np 2 -f hostfile ./hello
The executable hello is in the same directory on both machines.
Then after I run, I get an error:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 0 Fatal
error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425).........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(331)....: Failure during collective
MPIR_Barrier_impl(313)....: MPIR_Barrier_intra(83)....:
dequeue_and_set_error(596): Communication error with rank 1
If I comment out the MPI_Barrier(), it works properly. It seems the communication between the machines has a problem? Or did I not install openmpi correctly? Any ideas?
I'm using Ubuntu 12.10
I got some hints: this doesn't work well with MPICH2, but if I use openmpi, then it works. I installed MPICH2 just with sudo apt-get install mpich2. Am I missing something? The mpich2 package is much smaller than openmpi.
In /etc/hosts, newer versions of some Linux distros add the following types of lines at the top of the file:
127.0.0.1 localhost
127.0.0.1 [hostname]
This should be changed so that the hostname line contains your actual IP address. The MPI hydra process will abort with errors like the following if you do not make this change:
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(425)...........: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(292)......:
MPIR_Barrier_or_coll_fn(121):
MPIR_Barrier_intra(83)......:
dequeue_and_set_error(596)..: Communication error with rank 0
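For illustration, after the edit the top of /etc/hosts on node1 might look like this (192.168.1.10 is just a placeholder for the machine's real address):
127.0.0.1      localhost
192.168.1.10   node1
node2 needs the equivalent change with its own address, so that each hostname resolves to an address the other machine can actually reach.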

Resources