Note: I'm currently using MPICH, but I'm open to switching to Open-MPI if what I want is easier in that MPI implementation.
I have an MPI program that I want to run across several nodes. However, these nodes are not uniform in what is installed where, and unfortunately I have no control over that. As a result, I need to set the definition of "home" individually for a couple of machines. Is there a way to do this in the hostfile?
Something like:
localhost
host1 STARTPATH=/path/for/host1
host2 STARTPATH=/path/for/host2
etc.
Within these directories everything is identical.
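To make the intent concrete: what I effectively need is for each rank to start in (or cd into) a per-host directory before the program runs. A wrapper script could express that (rough sketch below; the hostnames, paths, and program name are placeholders), but a hostfile-level option would obviously be much cleaner:
#!/bin/sh
# wrapper.sh - pick the per-host "home" from the local hostname, then start the real program
case "$(hostname -s)" in
  host1) cd /path/for/host1 ;;
  host2) cd /path/for/host2 ;;
  *)     cd /default/path   ;;
esac
exec ./my_mpi_program "$@"
and then launch with something like mpiexec -f hostfile -n 4 ./wrapper.sh (MPICH/Hydra syntax).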
Right now, I am using Horovod to run distributed training of my PyTorch models. I would like to start using Hydra configs for the --multirun feature and enqueue all jobs with SLURM. I know there is the Submitit plugin, but I am not sure how the whole pipeline would work with Horovod. Right now, my command for training looks as follows:
CUDA_VISIBLE_DEVICES=2,3 horovodrun -np 2 python training_script.py \
--batch_size 30 \
...
Say I want to use Hydra --multirun to run several multi-GPU experiments, enqueue the runs with SLURM since my resources are limited (so most of the time they would run sequentially), and use Horovod to synchronize the gradients of my networks. Would this setup run out of the box? Would I still need to specify CUDA_VISIBLE_DEVICES if SLURM took care of the resources? How would I need to adjust my run command or other settings to make this setup work? I am especially interested in how the multirun feature handles GPU resources. Any recommendations are welcome.
The Submitit plugin does support GPU allocation, but I am not familiar with Horovod and have no idea if this can work in conjunction with it.
One new feature of Hydra 1.0 is the ability to set or copy environment variables from the launching process.
This might come in handy in case Horovod is trying to set some environment variables. See the docs for info about it.
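As a rough sketch of how the run command might end up looking with the Submitit SLURM launcher (launcher option names such as gpus_per_node are taken from the plugin's config and may differ in your version; batch_size and lr assume matching fields in your own config):
python training_script.py --multirun \
    hydra/launcher=submitit_slurm \
    hydra.launcher.gpus_per_node=2 \
    hydra.launcher.tasks_per_node=1 \
    batch_size=30 lr=1e-3,1e-4
Any environment variables Horovod relies on could then be set or copied per job via the job-level environment options mentioned above (hydra.job.env_set / hydra.job.env_copy in Hydra 1.0, if I recall the option names correctly).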
I don't know what the merit of a terminal multiplexer like screen or tmux is compared to the combination of a standard terminal application and the job-control features of a shell.
Typically cited features of a terminal multiplexer are:
persistence
multiple windows
session sharing
The first two features, however, can be achieved with a terminal application like iTerm2 and the job control of a shell like bash.
Session sharing is a distinctive feature, but it seems to be needed only in fairly rare situations.
So what is the merit of a terminal multiplexer? Why do you use it?
I'm especially interested in its merits for daily tasks.
I can tell you from my perspective as a devops/developer.
Almost every day I have to deploy a bunch of apps (a particular version) on multiple servers. Handling that without something like Terminator or Tmux would be a real pain.
In a single window I can put something like 4 panes (four windows in one) and monitor stuff on 4 different servers, which by itself is a huge deal, without tabs, extra terminal instances, and whatnot.
In the first pane I can shut down nginx, in the second I can shut down all the processes with supervisord (a process manager), and in the third I can run the deploy process; if I quickly need to jump to some other server, I just use the fourth pane.
Colleagues who only use a bunch of terminal instances can get really confused when they have to do several things quickly, constantly ssh-ing in and out, and if they are not careful they can end up on the wrong server because they switched to the wrong terminal instance and entered a command that wasn't meant for that particular machine.
A terminal multiplexer like Tmux really does help me to be quick and accurate.
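For a concrete picture of that workflow, here is a minimal sketch using tmux's default key bindings (the session name is arbitrary):
tmux new -s deploy                      # start a named session
# Ctrl-b %    split the window left/right
# Ctrl-b "    split the current pane top/bottom
# Ctrl-b o    jump to the next pane
# Ctrl-b z    zoom the current pane to full size and back
# Ctrl-b :setw synchronize-panes on     # type a command once, it runs in every pane
synchronize-panes in particular is handy when the exact same command has to go to all four servers at once.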
There is also a plugin manager for tmux, which lets you install plugins and really supercharge your terminal even more!
On a different note, a lot of people use tmux in combination with Vim, which lets you put together some awesome workflows...
All in all, those were my two cents on the benefit of using a terminal multiplexer...
I've got a Windows HPC Server running with some nodes in the backend. I would like to run parallel R using multiple nodes from the backend. I think parallel R might be using snow on Windows, but I'm not sure about that. My question is: do I need to install R on the backend nodes as well?
Say I want to use two nodes, 32 cores per node:
cl <- makeCluster(c(rep("COMP01",32),rep("COMP02",32)),type="SOCK")
Right now, it just hangs.
What else do I need to do? Do the backend nodes need some kind of sshd running to be able to communicate with each other?
Setting up snow on a Windows cluster is rather difficult. Each of the machines needs to have R and snow installed, but that's the easy part. To start a SOCK cluster, you would need an sshd daemon running on each of the worker machines, and even then you can still run into trouble, so I wouldn't recommend it unless you're good at debugging and Windows system administration.
I think your best option on a Windows cluster is to use MPI. I don't have any experience with MPI on Windows myself, but I've heard of people having success with the MPICH and DeinoMPI MPI distributions for Windows. Once MPI is installed on your cluster, you also need to install the Rmpi package from source on each of your worker machines. You would then create the cluster object using the makeMPIcluster function. It's a lot of work, but I think it's more likely to eventually work than trying to use a SOCK cluster due to the problems with ssh/sshd on Windows.
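As a rough sketch of what that would look like once MPI, Rmpi and snow are installed on every node (the worker count is just your 2 x 32 cores; depending on the MPI distribution you may also need to start R itself under mpiexec):
library(snow)
cl <- makeMPIcluster(64)                               # 2 nodes x 32 cores
clusterCall(cl, function() Sys.info()[["nodename"]])   # sanity check: which machines did the workers land on?
stopCluster(cl)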
If you're desperate to run a parallel job once or twice on a Windows cluster, you could try using manual mode. It allows you to create a SOCK cluster without ssh:
workers <- c(rep("COMP01",32), rep("COMP02",32))
cl <- makeSOCKcluster(workers, manual=TRUE)
The makeSOCKcluster function will prompt you to start each one of the workers, displaying the command to use for each. You have to manually open a command window on the specified machine and execute the specified command. It can be extremely tedious, particularly with many workers, but at least it's not complicated or tricky. It can also be very useful for debugging in combination with the outfile='' option.
Is this possible to do?
Conceptually, a solution should apply across a lot of possible configurations, ranging from two Vim instances running in separate virtual terminals in panes of a tmux window, to instances in separate terminals on separate machines in separate geographical regions, one or both connected over the network (in other words, the Vim instances are hosted by two separate shell processes, which they would already be under tmux anyhow).
The case that prompted me to ponder this:
I have two tmux panes, both with Vim open, and I want to use Vim's yank/paste to copy content between the files.
But it only works if I've got them both running in the same instance of Vim, so I am forced to either:
1. use tmux's copy/paste feature to get the content over (which is somewhat tedious and finicky), or
2. use the terminal's (PuTTY, iTerm2) copy/paste feature (similarly tedious but not subject to network latency; however, it only works up to a certain size of text payload, because the terminal does not know the contents of the parts of the file that are not currently visible), or
3. lose Vim buffer history/context, and possibly shell history/context, by reopening the file manually in one of the Vim instances, in a split or a tab, and then closing the other terminal context (much less tedious than 1 for large payloads, but more so for small ones).
This is a bit of a PITA, and it could all be avoided if I had the foresight to open my files in a terminal already running Vim, but workflow and habit rarely line up with what would have been convenient.
So the question is: does there exist a command, or the possibility of a straightforwardly constructed (shell) script, that allows me to join buffers across independently running Vim instances? I'm having a hard time getting Google to answer that adequately.
In the absence of an adequate answer (or if it is determined with reasonable certainty that Vim does not possess the features to accomplish the transfer of buffers across its instances), a good implementation (bindable to keys) for approach 3 above is acceptable.
Meanwhile I'll go back to customizing my vim config further and forcing myself to use as few instances of vim as possible.
No, Vim can't share a session between multiple instances. This is how it's designed and it doesn't provide any session-sharing facility. Registers, on-the-fly mappings/settings, command history, etc. are local to a Vim session and you can't realistically do anything about that.
But your title is a bit misleading: you wrote "buffer" but it looks like you are only after copying/pasting (which involves registers, not buffers) from one Vim instance to another. Is that right? If so, why don't you simply get yourself a proper build with clipboard support?
Copying/yanking across instances is as easy as "+y in one and "+p in another.
Obviously, this won't work if your Vim instances are on different systems. In such a situation, "+y in the source Vim and system-provided paste in the destination Vim (possibly with :set paste) is the most common solution.
If you are on a Mac, install MacVim and move the accompanying mvim shell script somewhere in your path. You can use the MacVim executable in your terminal with mvim -v.
If you are on Linux, install the vim-gnome package from your package manager.
If you are on Windows, install the latest "Vim without Cream".
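Whichever build you end up with, a quick sanity check that it actually has the feature:
vim --version | grep clipboard     # look for +clipboard (and +xterm_clipboard on Linux)
# or, from within Vim:
#   :echo has('clipboard')         " 1 means "+y and "+p will work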
But the whole thing looks like an XY problem to me. Using Vim's built-in :e[dit] command efficiently is probably the best solution to what appears to be your underlying problem: editing many files from many different shells.
Is there a way to run R interactively on a slave node?
Although we can log in to the server using qlogin, this does not allow us to launch emacs + ESS, which we can only run on the head node.
Thanks!
As per #newuser's request, here are the parts of the PATH variable that are present on the head node but removed from PATH on the slave nodes:
/opt/eclipse:/opt/maven/bin:/opt/dell/srvadmin/bin
There is a fair chance that emacs is not installed on the slave nodes. You may need to discuss installation with your cluster admin. They will need to install Emacs on each node in order for you to use it there. Keep in mind, however, that this sort of breaks the typical grid paradigm where the master node controls jobs and the slave nodes are worker bees. Generally in this paradigm one does not edit files manually on the slave nodes. I suspect that is why emacs is not installed on the slaves.
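If you want to confirm what is (and isn't) available on a slave node before going to the admins, something along these lines should tell you (assuming qlogin drops you into a shell on the node):
qlogin                                   # interactive session on a slave node
command -v emacs || echo "no emacs on $(hostname)"
echo "$PATH"                             # compare with the head node's PATH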