Open MPI on a hybrid cluster - mpi

I have set up a two-node NFS cluster for running Open MPI applications.
One node (the master) has Intel processors, while the other has AMD processors. Both are quad-core.
I have installed Open MPI in the same location on both systems (in /usr/local).
I am trying to run a simple Hello World example with 4 processes (two on each node).
I have tested running the application locally on each node, and it runs perfectly well.
However, if I pass a hostfile listing both nodes when running it (I use mpiexec), it throws errors.
If I compile on the Intel node, the job does not run on the AMD node.
If I compile on the AMD node, the job does not run on the Intel node.
Where could the problem be?
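For reference, the setup described above corresponds roughly to the sketch below. The hostnames intel-master and amd-node, the file name hosts.txt, the binary ./hello, and the helper write_hostfile are all placeholders that do not come from the post. It only illustrates the hostfile layout for two ranks per node and does not diagnose the error.

```shell
#!/bin/sh
# Print an Open MPI hostfile with two slots per node.
# Hostnames passed in are placeholders, not taken from the original post.
write_hostfile() {
    # printf reuses the format string once per argument, giving one line per host.
    printf '%s slots=2\n' "$@"
}

write_hostfile intel-master amd-node

# Save the output as, say, hosts.txt, then launch 4 ranks across both nodes.
# Left as a comment because it needs the actual cluster:
# mpiexec --hostfile hosts.txt -np 4 ./hello
```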

Related

Qt Rendering on headless compute cluster

My Qt-based simulation software uses OpenGL rendering. I would like to run it on a remote compute cluster. Since the cluster nodes have no monitors attached, I want to "virtualize" a monitor so that the Qt drivers can find an X server and run the corresponding commands. Some solutions (e.g. https://towardsdatascience.com/how-to-run-unity-on-amazon-cloud-or-without-monitor-3c10ce022639) suggest configuring a virtual display device for the X server, but I don't have root access on the cluster nodes.

Cordapp tutorial crashing in a Fedora VirtualBox Machine

I have downloaded the CorDapp example provided on the Corda website. I followed all the steps (to run it from the console) in
https://docs.corda.net/tutorial-cordapp.html
without any problem until "Running the example CorDapp". From there I get errors one way or another.
First, when running
workflows-kotlin/build/nodes/runnodes
one or more of the nodes would not start. I was using a virtual machine with 2 cores and 4GB of RAM. Eventually I noticed it seemed to be a RAM issue, so I changed the VM config to 4 CPUs and 10GB of RAM.
Now I can run
workflows-kotlin/build/nodes/runnodes
and get all 4 nodes working, but as soon as I run the following instruction
./gradlew runPartyXServer
where X=[A,B,C] for each of the possible nodes, after 20-30 seconds at most the machine suddenly slows down and aborts.
The VM has Fedora 30, 4 cores and 10GB of RAM. It is empty except for what I downloaded for the tutorial. I cannot believe those are not enough resources to run the tutorial. Am I wrong? Do I need more? Could it be something else?
Any help is welcome.
== Solved ==
The issue was the resources. I jumped to 8 cores and 32GB and it ran. I will try with 16GB at some point. In any case, the problem, from my point of view, is that with hardware requirements this large, the tutorial should include a section describing the minimum setup needed to run it.
From the given information, I believe you ran into a memory issue.
According to our documentation, Corda has a suggested minimum requirement of 1GB of heap and 2-3GB of host RAM per node.
https://docs.corda.net/docs/corda-enterprise/4.4/node/sizing-and-performance.html#sizing
I would suggest either reducing the number of nodes hosted on a single machine or increasing the RAM of the VM.
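As a rough sanity check of the sizing guidance above, the per-node figure can be multiplied out for the four nodes that runnodes starts. This is only a sketch: the 2-3GB range is the one quoted above, the function name estimate_ram_gb is made up for illustration, and the result ignores the operating system and anything else running on the VM, such as the servers started by runPartyXServer.

```shell
#!/bin/sh
# Rough host-RAM estimate (in GB) for a number of Corda nodes on one machine.
# The 2-3 GB per-node range is the figure quoted above; it covers the nodes only.
estimate_ram_gb() {
    nodes=$1
    low_per_node=2
    high_per_node=3
    echo "$((nodes * low_per_node))-$((nodes * high_per_node))"
}

# The tutorial's runnodes script starts 4 nodes.
estimate_ram_gb 4
```

For 4 nodes this prints 8-12, so a 10GB VM sits inside the suggested range for the nodes alone, with no headroom for anything else.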

Can iarbuild run in parallel mode?

I am using iarbuild on the command line to build my projects on an 8-core PC. The build is quite slow, and it looks like the multicore PC is not being fully utilized. Is there a build option that makes the build run in parallel? (Like the -j option in GNU make.)
I had an email from IAR last week
New version of IAR Embedded Workbench for ARM
Version 7.40 is now available
• Parallel build
The compiler can now run in several parallel processes to better use the available processor cores in the PC. To control parallel build, choose Tools>Options>Project>Enable parallel build.
I believe this is also becoming available for other targets, as I have seen something similar for the MSP430.

Torque + mpirun + resources allocation

I'm running Torque with Open MPI on a single machine with 24 cores. Why is it possible to specify, for instance, nodes=1:ppn=2 in my job.sh and still run a job with mpirun -np 12 WhatEverCommand? In that case the job is executed on 12 cores, even though the "nodes" option asks for 2 CPUs.
Doesn't specifying the "nodes" option place any restrictions on the resources used by the submitted job? If it doesn't, how can I prevent users from violating the server rules by overriding the declared resources?
On the other hand, specifying nodes=1:ppn=8 and running mpirun without the "-np" option gives me only 1 CPU running the job.
Am I that bad and missing something fundamental here?
By default, OpenMPI doesn't integrate with Torque at all. You have to compile OpenMPI using the --with-tm configure option, which doesn't seem to be enabled in most distro packages. The OpenMPI project mentions Torque integration in its FAQs on building and running OpenMPI.
Similarly, Torque doesn't actually restrict access to CPUs unless cpuset support is enabled. Again, this seems absent in most distro packages. This is why your OpenMPI app, when compiled without Torque integration, can hit all the cores without restriction.
Building both packages from source is not too difficult, so it's worth researching the configure options and building the support that makes sense for you.
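Even without tm support, a job script can stay inside its allocation by deriving the process count from the node file Torque writes for each job, instead of hard-coding -np. The following is a minimal sketch under that assumption: count_slots is a hypothetical helper, and WhatEverCommand stands in for the real program as in the question. It depends on users cooperating and does not enforce anything.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=8

# Count the slots Torque granted: $PBS_NODEFILE holds one line per allocated core.
# Blank lines are skipped; an empty file yields 0.
count_slots() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Only launch when running inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    np=$(count_slots "$PBS_NODEFILE")
    # Pass the allocation to mpirun explicitly, since a build without
    # Torque integration will not pick it up on its own.
    mpirun -np "$np" --hostfile "$PBS_NODEFILE" ./WhatEverCommand
fi
```

With nodes=1:ppn=8 the node file should contain eight lines, so mpirun starts eight processes rather than one or twelve.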

Parallel R on a Windows cluster

I've got a Windows HPC Server running with some nodes in the backend. I would like to run Parallel R using multiple nodes from the backend. I think Parallel R might be using SNOW on Windows, but I'm not too sure about it. My question is, do I also need to install R on the backend nodes?
Say I want to use two nodes, 32 cores per node:
cl <- makeCluster(c(rep("COMP01",32),rep("COMP02",32)),type="SOCK")
Right now, it just hangs.
What else do I need to do? Do the backend nodes need some kind of sshd running to be able to communicate with each other?
Setting up snow on a Windows cluster is rather difficult. Each of the machines needs to have R and snow installed, but that's the easy part. To start a SOCK cluster, you would need sshd running on each of the worker machines, and you can still run into trouble, so I wouldn't recommend it unless you're good at debugging and Windows system administration.
I think your best option on a Windows cluster is to use MPI. I don't have any experience with MPI on Windows myself, but I've heard of people having success with the MPICH and DeinoMPI MPI distributions for Windows. Once MPI is installed on your cluster, you also need to install the Rmpi package from source on each of your worker machines. You would then create the cluster object using the makeMPIcluster function. It's a lot of work, but I think it's more likely to eventually work than trying to use a SOCK cluster due to the problems with ssh/sshd on Windows.
If you're desperate to run a parallel job once or twice on a Windows cluster, you could try using manual mode. It allows you to create a SOCK cluster without ssh:
library(snow)
workers <- c(rep("COMP01",32), rep("COMP02",32))
cl <- makeSOCKcluster(workers, manual=TRUE)
The makeSOCKcluster function will prompt you to start each one of the workers, displaying the command to use for each. You have to manually open a command window on the specified machine and execute the specified command. It can be extremely tedious, particularly with many workers, but at least it's not complicated or tricky. It can also be very useful for debugging in combination with the outfile='' option.

Resources