Torque + mpirun + resources allocation - mpi

I'm running Torque with Open MPI on a single machine with 24 cores. Why is it possible to specify, for instance, nodes=1:ppn=2 in my job.sh and still run a job launched with mpirun -np 12 WhatEverCommand? In that case the job is executed on 12 cores, even though the "nodes" option asks for only 2 CPUs.
Doesn't specifying the "nodes" option place any restrictions on the resources used by the submitted job? If it doesn't, how can I prevent users from violating the server rules by overriding the declared resources?
On the other hand, specifying nodes=1:ppn=8 and running mpirun without the "-np" option gives me only 1 CPU running the job.
Am I that bad and missing something fundamental here?

By default, OpenMPI doesn't integrate with Torque at all. You have to compile OpenMPI using the --with-tm configure option, which doesn't seem to be enabled in most distro packages. The OpenMPI project mentions Torque integration in its FAQs on building and running OpenMPI.
Similarly, Torque doesn't actually restrict access to CPUs unless cpuset support is enabled. Again, this seems absent in most distro packages. This is why your OpenMPI app, when compiled without Torque integration, can hit all the cores without restriction.
Building both packages from source is not too difficult, so it's worth researching the configure options and building the support that makes sense for you.
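As a rough illustration of the point above, here is a minimal sketch of a job-script fragment for an Open MPI build that lacks Torque support. It assumes that ompi_info lists "tm" components when the build was configured with --with-tm, and it uses Torque's PBS_NODEFILE (one line per allocated core) to derive the process count. The helper name count_slots is made up for this example, and the mpirun line is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: count the slots Torque granted by counting the
# non-empty lines of the node file (one line per allocated core).
count_slots() {
    grep -c . "$1"
}

# Check whether this Open MPI build has Torque (tm) components.
# Guarded so the script still works where ompi_info is not installed.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep tm || echo "no tm components found: probably built without --with-tm"
fi

# Inside a Torque job, PBS_NODEFILE points at the allocation.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    # Printed rather than executed; WhatEverCommand is the question's placeholder.
    echo "mpirun -np $NP -machinefile $PBS_NODEFILE WhatEverCommand"
fi
```

This only makes mpirun follow the request voluntarily. A user can still pass a larger -np by hand, so actual enforcement needs the cpuset support mentioned above.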

Related

Are the compiler and shell an internal part of Unix?

I had this question on my exam. In the diagrams I saw, we have: hardware, kernel, system call interface to the kernel, then (compilers, shells, system libraries), and on top some applications. Does the OS scope include only the kernel, with everything else being additional functionality we choose to install, or does a Unix OS include everything from the list above?
An OS has more or less two definitions:
academic: the OS is the software that provides an abstraction layer between hardware and software
pragmatic: the OS is the software that comes with the hardware when we buy it.
The compiler and shell don't fall under definition 1. They can fall under definition 2.
And usually, users who are interested in a compiler or a shell prefer to consider the OS as an abstraction layer (the academic definition).
Simple answer, No. They are not an internal part of Unix but additional functionality to help make the Operating System more usable.
The OS scope applies primarily to the kernel only.
Whilst you need a compiler to build the kernel, you don't necessarily require one for general day-to-day use of the system. Most operating systems don't ship a compiler by default; instead, the kernel and applications are built on one machine and the resulting binaries are packaged and distributed either with the computer directly (Windows/Unix) or via the internet for others to download and install (Linux/BSD).
Likewise with the shell. Although all operating systems ship with a default one (sh/bash/dash on Linux/Unix systems, Command Prompt/PowerShell on Windows), most general users can go their entire lives without using it.
Having said that, if you were to delete the shell, you'd almost certainly find that your system won't boot. This is because a lot of core start-up scripts rely on the shell to start and stop the services that present interfaces between the user and the kernel.
In summary:
You need a compiler to build the kernel and applications but not for running the OS.
You need a shell to execute applications (which also includes the compiler).
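One way to see that the shell and the compiler are ordinary programs rather than part of the kernel is to ask where they live on disk. The sketch below is only an illustration; where_is is a made-up helper name, and whether cc is present depends on what is installed.

```shell
#!/bin/sh
# Report where a program lives, or that it is not installed.
where_is() {
    command -v "$1" 2>/dev/null || echo "$1: not installed"
}

where_is sh   # the shell is a regular executable found via PATH
where_is cc   # a compiler may or may not be present on this system
```

On a typical system the first call prints a path to an executable file, while the second may report that no compiler is installed, and the OS keeps running either way.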

How to profile an OpenMP code natively on Intel MIC?

I have an OpenMP code written in C. I ran it on the Intel MIC on Stampede, and I want to profile it to find the hotspots so that I can optimize it further. I tried the gprof profiler, but I read somewhere that gprof cannot be used on the MIC directly. I then tried perf by following a tutorial. I got as far as the perf annotate step, but when I execute it I get the error ")" unexpected. So I don't know how to proceed with profiling my code. Can anybody please help?
This is the site with the perf tutorial I followed: sandsoftwaresound.net/perf/perf-tutorial-hot-spots/ .
80% of optimization for the Xeon Phi is the same as for the host (Xeon). Use gprof, printf, compiler options, and the rest of your toolkit and carry your optimization as far as you can executing your code on the host only. After you can do no more, then focus on specific Xeon Phi optimizations.
As you are on Stampede, I assume you are using the Intel compiler. The compiler has a lot of diagnostic capabilities to profile your code and even provide suggestions. I'd provide you with more specific URLs but am on vacation with limited bandwidth.
Though this isn't specific to your question, here are some other suggestions. If you aren't using the Intel compiler, you'll most likely get a substantial boost by switching to it. Intel compilers are danged good at optimizations, especially on Intel architectures. Also, you should use Intel MKL where possible. All of MKL's routines are optimized for the different Intel architectures, and those most relevant to HPC are optimized specifically for MIC.
You have a few options.
The heavyweight approach is to use Intel Vtune. First, add -g to your compiler flags.
I use Vtune from the host command line quite a bit; here is the command I use to profile an application on the MIC. (This is executed on the host machine; Vtune on the host uses ssh to launch the application on the MIC.)
amplxe-cl -collect knc-hotspots -source-search-dir=/mysrc/dir -search-dir=/mybin/dir -- ssh mic0 /home/me/myapp
This assumes the app on the MIC is at /home/me/myapp, and that the source directory (/mysrc/dir) and the binary search directory (/mybin/dir) are on the host. (With Vtune update 15 at least, I need to specify both of these separately in order to get the Vtune GUI to show me symbol info.)
Once your app has finished, run the Vtune GUI on the host with amplxe-gui and open your result set.
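If you run this often, the collection command can be wrapped in a small function so the paths are passed as arguments. This is just a sketch built from the command above: vtune_mic_cmd is a made-up name, the paths are the same placeholders, and mic0 is assumed to be the coprocessor's host name. It prints the command instead of running it.

```shell
#!/bin/sh
# Build the Vtune collection command from the answer above.
# Arguments: source dir on host, binary dir on host, app path on the MIC.
vtune_mic_cmd() {
    src_dir=$1
    bin_dir=$2
    app=$3
    echo "amplxe-cl -collect knc-hotspots" \
         "-source-search-dir=$src_dir -search-dir=$bin_dir" \
         "-- ssh mic0 $app"
}

CMD=$(vtune_mic_cmd /mysrc/dir /mybin/dir /home/me/myapp)
echo "$CMD"
# To actually collect, run the printed command on the host.
```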
There are also some simplified open source profiling tools developed by Intel that support the MIC: Speedometer and Overhead. You can find information about them here.
Hopefully this is enough info to get you started.

Parallel R on a Windows cluster

I've got a Windows HPC Server running with some nodes in the backend. I would like to run Parallel R using multiple nodes from the backend. I think Parallel R might be using SNOW on Windows, but I'm not too sure about it. My question is, do I need to install R on the backend nodes as well?
Say I want to use two nodes, 32 cores per node:
cl <- makeCluster(c(rep("COMP01",32),rep("COMP02",32)),type="SOCK")
Right now, it just hangs.
What else do I need to do? Do the backend nodes need some kind of sshd running to be able to communicate with each other?
Setting up snow on a Windows cluster is rather difficult. Each of the machines needs to have R and snow installed, but that's the easy part. To start a SOCK cluster, you would need sshd running on each of the worker machines, but you can still run into trouble, so I wouldn't recommend it unless you're good at debugging and Windows system administration.
I think your best option on a Windows cluster is to use MPI. I don't have any experience with MPI on Windows myself, but I've heard of people having success with the MPICH and DeinoMPI MPI distributions for Windows. Once MPI is installed on your cluster, you also need to install the Rmpi package from source on each of your worker machines. You would then create the cluster object using the makeMPIcluster function. It's a lot of work, but I think it's more likely to eventually work than trying to use a SOCK cluster due to the problems with ssh/sshd on Windows.
If you're desperate to run a parallel job once or twice on a Windows cluster, you could try using manual mode. It allows you to create a SOCK cluster without ssh:
workers <- c(rep("COMP01",32), rep("COMP02",32))
cl <- makeSOCKcluster(workers, manual=TRUE)
The makeSOCKcluster function will prompt you to start each one of the workers, displaying the command to use for each. You have to manually open a command window on the specified machine and execute the specified command. It can be extremely tedious, particularly with many workers, but at least it's not complicated or tricky. It can also be very useful for debugging in combination with the outfile='' option.

How stable are multiple instances of R when one instance is running an external program?

I am running an external program via R that is pretty memory hungry and can take >8 hours to run. I'd like to open up another instance of R to do other tasks but am concerned about crashing the external program and having to restart the process. Should I expect any problems under these circumstances? The external program is Windows-only and I'm running it on a Boot Camp partition on a MacBook Pro.
On a proper operating system, both instances will be independent and will not interfere with each other. (Unless they compete for the same resources, but that does not seem to be the case from your description.)
This is no different from several users working on a server, each running one or two instances...

Is it possible to run OpenStack on a laptop/desktop?

I have some questions:
Is it possible to install OpenStack on a notebook with 4GB of DDR3 RAM? The website says it needs at least 8GB of RAM.
They say it requires a double quad-core; I assume that means octa-core. Can we install it on a quad-core?
They say that there is no possibility of installing it on a NAS. Did you find anywhere whether it is possible? I didn't find anything, even after asking our friend (Google).
All in all, is it at all possible to install it on a notebook/desktop?
That advice is for production environments, so:
1) If you just want to play around, your notebook will do fine. I had a successful test run on a 1.2 GHz, 1GB netbook. It became incredibly slow when it launched its first instance...
2) By a double quad-core they actually mean two separate quad-cores, as in two quad-core Xeon processors on a single motherboard. So yes, you can install it on a quad-core.
3) A NAS device running an OpenStack storage service does indeed seem unlikely. You will most likely need more computing power. However, if your NAS supports NFS or SSH or something similar, you can probably mount its drive and use it for storage.
4) You can perfectly well build an all-in-one OpenStack test setup on your notebook. Performance will be low, but acceptable for testing.
It depends on what you mean by "install OpenStack". OpenStack itself is an extremely modular framework consisting of many services (Compute, Networking, Image service, Block Storage, Object Storage, Orchestration, Telemetry, ...). On top of that, a typical production deployment of OpenStack also requires several other components, such as load balancers, caching systems, firewalls and web servers. It is definitely possible to install a minimal OpenStack system, even on an average laptop.
The simplest way to run OpenStack on a laptop/desktop is to use DevStack, a shell script that installs all services from source and runs them (by default) on a single machine. It is customizable enough to provide a very good testing ground; it's used by OpenStack developers as well as the OpenStack QA team to test the latest developments against "real" systems.
To avoid messing up your system, it's generally recommended to install OpenStack in a VM. From the DevStack documentation:
DevStack should run in any virtual machine running a supported Linux release. It will perform best with 2Gb or more of RAM.
As of the time of this writing (Jan 2015), supported distros are:
Ubuntu (latest LTS)
Fedora
CentOS
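To check a VM against the 2 GB figure quoted above before trying DevStack, something like the following sketch could be used. It assumes a Linux guest where /proc/meminfo reports MemTotal in kB; ram_ok and MIN_KB are names invented for this example. MemTotal is usually a little below the installed RAM, so treat the threshold as approximate.

```shell
#!/bin/sh
# Preflight sketch: compare total RAM against the 2 GB DevStack suggests.
MIN_KB=$((2 * 1024 * 1024))   # 2 GB expressed in kB

# ram_ok KB: succeeds if KB meets the threshold.
ram_ok() {
    [ "$1" -ge "$MIN_KB" ]
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if ram_ok "$total_kb"; then
        echo "RAM looks sufficient for a DevStack test VM: ${total_kb} kB"
    else
        echo "Below 2 GB, expect DevStack to be slow: ${total_kb} kB"
    fi
fi

# DevStack itself is started inside the VM with its stack.sh script
# (not executed here).
```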
Regarding NAS: you can of course use it, but "outside" the OpenStack APIs, by providing mount points to your VMs. Shared storage of this kind is even mandatory if you want to support live migration.

Resources