Create a shell alias that R can recognise

I have an alias (actually, a function) in my .bashrc, but R doesn't seem to recognise it.
fun() {
echo "Hello"
}
which runs correctly when I log in via ssh (it's a remote server). However, if I run system("fun"), I get
sh: 1: fun: not found
Warning message:
In system("fun") : error in running command
From a comment on this question I can get system("bash -i -c fun") to work, although with a weird warning/message
bash: cannot set terminal process group (27173): Inappropriate ioctl for device
bash: no job control in this shell
Hello
However, this doesn't apply to my case: I'm running external code, so I cannot modify the system() call. I need system("command") to run the command that I defined.
BTW, this is all running on Linux (Debian in the remote server, but I also tried on my machine running elementary OS with the same result).
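One possible workaround, offered only as a sketch: the error message shows that system() hands the command to sh, which never reads .bashrc, so a function defined there is invisible to it. An executable script of the same name on the PATH is visible to any shell. The temporary directory below is a stand-in (an assumption) for a real location such as ~/bin, and the PATH would also have to be set in the environment R is started from.

```shell
# Assumption: a throwaway directory stands in for a real one such as ~/bin.
bindir="$(mktemp -d)"

# Write the former shell function as a small executable script named "fun".
cat > "$bindir/fun" <<'EOF'
#!/bin/sh
echo "Hello"
EOF
chmod +x "$bindir/fun"

# Put the directory on the PATH for this shell and its child processes.
PATH="$bindir:$PATH"
export PATH

# Roughly what system("fun") does: pass the command string to sh.
out="$(sh -c 'fun')"
echo "$out"
```

Because the lookup now goes through PATH rather than through Bash's function table, no change to the system() call itself is needed.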

Related

Unable to export env variable from script

I'm currently struggling with running a .sh script I'm trying to trigger from Jenkins.
Within the Jenkins "execute shell" section, I'm connecting to a remote server (the Jenkins agent does not have the right OS to build what I need) using:
cp -r . /to/shared/drive/to/have/access/on/remote
ssh -t -t username@servername << EOF
cd /to/shared/drive/to/have/access/on/remote
source build.sh dev
exit
EOF
Inside build.sh, I'm exporting R_LIBS to build a package for different R versions.
...
for path in "${!rVersionPaths[@]}"; do
export R_LIBS="${path}"
Rscript -e 'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
...
Setting R_LIBS should function here like setting lib within install.packages(...). For some reason the R_LIBS export doesn't get picked up. Other environment variables such as http_proxy are ignored as well, which causes any request outside the network to fail.
Is there any particular way of achieving this?
Maybe pass those variables with env, like
env R_LIBS="${path}" Rscript -e 'install.packages(c("someDependency", .....
I'm not able to comment on the question, so I'm posting this as an answer.
I had a similar problem when calling a remote shell script from Jenkins: the bash_profile variables were somehow not loaded when the script was called from Jenkins, although it worked locally. Loading the bash profile in the ssh connection solved it for me.
Either source bash_profile in build.sh:
. ~/.bash_profile
(source ~/.bash_profile is equivalent in Bash.)
Or reload bash_profile in the ssh connection:
ssh -t -t username@servername << EOF
. ~/.bash_profile
your commands here
exit
EOF
You can set that variable in the same command line like this:
R_LIBS="${path}" Rscript -e \
'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
More variables can be prepended in the same way. Note that this sets those environment variables only for the command that follows them (and its child processes).
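A quick way to convince yourself of that scoping, as a sketch only (the /tmp/example-lib path is made up for illustration and sh stands in for Rscript):

```shell
# Start from a clean slate so the result is unambiguous.
unset R_LIBS

# The assignment prefix applies only to this one command (and its children).
inside="$(R_LIBS="/tmp/example-lib" sh -c 'printf "%s" "$R_LIBS"')"

# Back in the calling shell the variable was never set.
after="${R_LIBS:-unset}"

echo "inside the command:   $inside"
echo "in the calling shell: $after"
```

The child process sees the value, while the calling shell is left untouched, which is why this form needs no export and no cleanup afterwards.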
You said that "R_LIBS export doesn't get picked up". Question: is the value unset, or is it set to some other value that you are trying to override?
It is possible that SSH may be invoking "/bin/sh -c". Based on the second answer to: Why does 'cd' command not work via SSH?, you can simplify the SSH command and explicitly invoke the build.sh script in Bash:
cp -r . /to/shared/drive/to/have/access/on/remote
ssh -t -t username@servername "cd /to/shared/drive/to/have/access/on/remote && bash -f build.sh dev"
This makes the SSH invocation more similar to invoking the command within a remote interactive shell. (You can avoid sourcing scripts and exporting variables.)
You don't need export R_LIBS or env R_LIBS, because any command can be prefixed with environment variable overrides that apply only to that command (this agrees with Luis' answer):
...
for path in "${!rVersionPaths[@]}"; do
R_LIBS="${path}" Rscript -e 'install.packages(c("someDependency", "someOtherDependency"), repos="http://cran.r-project.org");'
...
Rscript may be doing a lot with environment variables. You can verify that R_LIBS is being set by replacing Rscript with the env command and observing the output:
...
for path in "${!rVersionPaths[@]}"; do
R_LIBS="${path}" env
...
According to this manual "Initialization at Start of an R Session", Rscript looks in several places to load "site and user files":
$R_PROFILE
$R_HOME/etc/Renviron
$R_HOME/etc/Renviron.site
$R_ENVIRON_USER
$R_PROFILE_USER
./.Rprofile
$HOME/.Rprofile
./.RData
The "Examples" section of that manual shows this:
## Not run:
## Example ~/.Renviron on Unix
R_LIBS=~/R/library
PAGER=/usr/local/bin/less
If you add the --vanilla command-line option to ignore all of these files and get different results, you will know that something in the site/init/environ files is affecting your R_LIBS. I cannot run this system myself, but hopefully this gives you some areas to investigate.
You probably don't want to source build.sh; just invoke it directly (i.e. remove the source command).
When you source the file, the script is executed by the SSH login shell (likely sh) rather than by bash, which sounds like what you intended.
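The difference between sourcing and invoking can be seen in a small sketch. The temporary script and the MARKER variable are made up for illustration; they are not part of the original build.sh.

```shell
# A one-line stand-in script that only sets a variable.
script="$(mktemp)"
printf 'MARKER=set_by_script\n' > "$script"
unset MARKER

# Invoking: a separate shell process runs the script, so the caller is untouched.
sh "$script"
after_invoke="${MARKER:-unset}"

# Sourcing: the current shell itself reads and runs the script.
. "$script"
after_source="${MARKER:-unset}"

echo "after invoking: $after_invoke"
echo "after sourcing: $after_source"
rm -f "$script"
```

Sourcing means whichever shell is already running interprets the file, so Bash-only syntax in build.sh depends on that shell being Bash; invoking it with bash removes the dependency.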

UNIX commands from R via shell function

I need to issue Unix commands from an R session. I'm on Windows Server 2012 R2 using RStudio 1.1.383 and R 3.4.3.
The shell() function looks to be the right one for me but when I specify the path to my bash shell (from Git for Windows install) the command fails with error code 127.
shell_path <- "C:\\Program Files\\Git\\git-bash.exe"
shell("ls -a", shell = shell_path)
## running command 'C:\Program Files\Git\git-bash.exe /c ls -a' had status 127
'ls -a' execution failed with error code 127
Pretty sure my shell path is correct:
What am I doing wrong?
EDIT: for clarity I would like to pass any number of UNIX commands, I am just using ls -a for an example.
EDIT:
After some playing about 2018-03-09:
shell(cmd = "ls -a", shell = '"C:/Program Files/Git/bin/bash.exe"', intern = TRUE, flag = "-c")
The correct location of my bash.exe was at .../bin/bash.exe. This uses shell with intern = TRUE to return the output as an R object. Note the use of single quote marks around the shell path.
EDIT: 2018-03-09 21:40:46 UT
In RStudio we can also call bash using knitr and setting chunk options:
library(knitr)
```{bash my_bash_chunk, engine.path="C:\\Program Files\\Git\\bin\\bash.exe"}
# Using a call to unix shell
ls -a
```
Two things stand out here. Bash will return exit code 127 if a command is not found; you should try running the fully qualified command name.
I also see that your shell is being run with a /c flag. According to the documentation, the flag argument specifies "the switch to run a command under the shell" and it defaults to /c, but "if the shell is bash or tcsh or sh the default is changed to '-c'." Obviously this isn't happening for git-bash.exe.
Try these changes out:
shell_path <- "C:\\Program Files\\Git\\git-bash.exe"
shell("/bin/ls -a", shell = shell_path, flag = "-c")
Not on Windows, so can't be sure this will work.
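For reference, the 127 status is easy to reproduce on any Unix-like shell. This is only a sketch; the command name is made up and assumed not to exist on the PATH.

```shell
# Assumption: no command with this made-up name exists on the PATH.
status=0
sh -c 'no_such_command_for_demo_12345' 2>/dev/null || status=$?

echo "exit status: $status"
```

A status of 127 therefore says the shell started but could not find the command it was asked to run, which is consistent with the /c flag being passed to a shell that expects -c.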
Perhaps you need to use shQuote?
shell( paste("ls -a ", shQuote( shell_path) ) )
(Untested. I'm not on Windows. But do read ?shQuote.)
If you just want to do ls -a, you can use the below commands:
shell("'ls -a'", shell="C:\\Git\\bin\\sh.exe")
#or
shell('C:\\Git\\bin\\sh.exe -c "ls -a"')
Let us know if the space in "Program Files" is causing problems.
And if you require login before you can call your command,
shell('C:\\Git\\bin\\sh.exe --login -c "ls -a"')
But if you are looking to run git commands from R, the git2r package by rOpenSci might suit your needs.

R Parallel - connecting to remote cores

Working in R 2.14.1, on Windows 7
Using the package parallel in R, I'm trying to take advantage of cores outside of my local machine available on my network, where all remote hosts I am connecting to are identical Windows machines.
The basic form of the commands to make the connection is as follows.
library(parallel)
#assume 8 cores per machine
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)))
Of course, trying to debug these things can be pretty tricky, but here is where I'm at with it.
If I specify the manual = TRUE flag as below
cl<-makePSOCKcluster(c(rep("localhost", 8), rep("otherhost", 8)), manual=TRUE)
there are no problems connecting to the remote host, and running a parallel process. The computers have identical setups to the one that I am working on. Yet, when this manual flag is not set, the connection command hangs.
Since the manual flag bypasses ssh when connecting to the host, this suggests that ssh is the problem when manual=FALSE.
It is not guaranteed at the moment that the remote computers have ssh on them. The question is: given that I have all the pertinent Windows login information for my remote hosts, and that I cannot change the settings on the remote computers, how would I connect to cores on remote machines with the package parallel in R without specifying manual = TRUE?
Alternatively, if ssh must be installed for this to happen, let's assume all computers have ssh on them. How would I connect to cores on the remote machines without circumventing ssh?
If you need any more information please let me know, I appreciate the time.
UPDATE 1
8-26-14
Thanks to Steve Weston for his insights. I will provide an update with the exact tools and setup I use to get my system working when it's up and running.
Feel free to comment or post if you have anything to add about the best route for remotely connecting from one Windows machine to another via makePSOCKcluster with the manual flag set to FALSE.
When creating a PSOCK cluster with manual=FALSE, the only way to start a worker on a remote machine is with "ssh", "rsh", or something command-line compatible, such as "plink" from PuTTY. The reason is that makePSOCKcluster starts the remote workers using the "system" function to execute commands of the form:
ssh -l user otherhost '/usr/lib/R/bin/Rscript' -e 'parallel:::.slaveRSOCK()' MASTER=myhost PORT=10187 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
You can confirm this by looking at the source code for the newPSOCKnode function in the file snowSOCK.R from the parallel package.
For this to work, the ssh-compatible command must be available on the local machine and a corresponding ssh daemon must be running on each of the remote machines, otherwise makePSOCKcluster will simply hang. I've found that installing a good, working ssh daemon is the difficult part on Windows.
Unfortunately, manual=TRUE is generally the easiest way to create a PSOCK cluster on multiple Windows machines.
Hello everyone, I had the same problem and managed to solve it. I'm writing this answer in June 2018; my OS is Windows 10 and the R version is 3.2.2. It is surprising to see that this problem still exists after 4 years. I hope it can be fixed in a future release.
Before you move on, please make sure you can access the server from cmd using ssh. I didn't put any password in my ssh command because I have a private key; you don't need one, and you will see why later.
Fixing The problem
File directory
Since makePSOCKcluster works when the workers are started manually, my first attempt was to set manual=TRUE and look at the output. Here is my result:
machineAddresses <- list(list(host='192.168.1.220', user='jeff'))
cl <- makePSOCKcluster(machineAddresses, manual = TRUE)
> Manually start worker on 192.168.1.220 with
"C:/PROGRA~1/R/R-32~1.2/bin/x64/Rscript" -e
"parallel:::.slaveRSOCK()" MASTER=DESKTOP-U5JA32O PORT=11756
OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
OK, here is the first problem: the Rscript location (the location of Rscript on the server) is incorrect. It is generally somewhere under C:\Program Files; on my server it is C:\Program Files\R\R-3.2.2\bin. So we need to correct it by adding an option that tells makePSOCKcluster where Rscript is:
machineAddresses <-list(list(host='192.168.1.220',
user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript"))
CMD problem
Once you fix the directory problem, you will find that the code still hangs forever. Next we need to check whether we can reach the server from within R. My code is:
system("ssh jeff@192.168.1.220")
> GetConsoleMode on STD_INPUT_HANDLE failed with 6
I honestly don't know what this error means, but we need to get around it. Inspired by @Steve Weston, I decided to use PuTTY, so I installed it and changed my code to:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
The -pw option supplies the password. Because I'm new to PuTTY, I don't know how to make the private key work automatically with it, so I took the easiest route and passed the password directly. The above code is equivalent to the following in cmd:
plink -pw qwer jeff@192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE
This is exactly what we would do to create the workers manually. For those who are new like me: you need to add the PuTTY directory to PATH in your environment variables to run plink. Here is my final code:
machineAddresses <-list(list(host='192.168.1.220',user='jeff',rscript="C:/Program Files/R/R-3.3.2/bin/Rscript",rshcmd="plink -pw qwer"))
cl <- makePSOCKcluster(machineAddresses,manual = F)
It ran with no problem at all. In summary, the function makePSOCKcluster gets two things wrong:
It assumes a wrong R directory on the server. (At least it should assume the same directory as on my local computer, but it didn't; I don't know where that strange directory comes from.)
It uses the ssh command to start the connection, which does not work from within R. It works well in cmd, but not in R, and I don't know why.
If you are still not able to use makePSOCKcluster, here is one trick: first try to connect to the server from R using the system function. It may give you an error message that points to where the problem is. Here is my debugging code:
system("plink -pw qwer jeff@192.168.1.220 Rscript -e parallel:::.slaveRSOCK() MASTER=DESKTOP-U5JA32O PORT=11063 OUT=/dev/null TIMEOUT=2592000 METHODS=TRUE XDR=TRUE")

Redirect not working correctly, 2> /dev/null becomes 2 > /dev/null and stderr doesn't get redirected

I am hoping someone can help me figure out what setting I might need to overwrite. I am working on a Unix terminal server, in an xterm running a Linux shell. Every time I use a command like grep "blah" 2> /dev/null at the shell prompt, the command is run as grep "blah" 2 > /dev/null and, needless to say, the redirection fails.
xterm version is X.Org 6.8.99.903(238)
I cannot update or install anything; this is a locked-down production server.
Thanks for any help and illumination on the topic, it is making my grep useless at high directory levels with recursion.
That's Bourne shell syntax, and it doesn't work in c-shell.
The best you can do is
( command >stdout_file ) >&stderr_file
This sends stdout to one file and stderr to another; redirecting just stderr on its own is not possible in csh.
In a comment, you say "A minor note, this is csh". That's not a minor note, that's the cause of the problem. xterm is just a terminal emulator, not a shell; all it does is set up a window that provides textual input and output. csh (or bash, or ...) is the shell, the program that interprets the commands you type.
csh has different syntax for redirection, and doesn't let you redirect just stderr. command > file redirects stdout; command >& file redirects both stdout and stderr.
You say the system doesn't have bash, but it does have ksh. I suggest just using ksh; it will be a lot more familiar to you. Both bash and ksh are derived from the old Bourne shell.
All (?) Unix-like systems will have a Bourne-like shell installed as /bin/sh. Even if you're using csh (or tcsh?) as your interactive shell, you can still invoke sh, even in a one-liner. For example:
sh -c 'command 2>/dev/null'
will invoke sh, which in turn will invoke command and redirect just its stderr to /dev/null.
The purpose of an interactive shell is (mostly) to let you use other commands that are available on the system. sh, or any shell, can be used as just another command.
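As a sketch of that one-liner in action, the inner command below writes one line to each stream (the strings are made up for illustration), and only the stdout line survives:

```shell
# The inner sh applies 2>/dev/null itself, so it works the same whether the
# calling shell is csh, ksh, or bash. The outer 2>&1 would capture any stderr
# that leaked through, which makes it visible that none does.
out="$(sh -c '{ echo "to stdout"; echo "to stderr" >&2; } 2>/dev/null' 2>&1)"

echo "$out"
```

From csh the same idea applies to the original case: wrap the grep in sh -c '...' and put the 2>/dev/null inside the single quotes.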

Why OpenMPI uses a different server given a different -n setting?

I am testing out OpenMPI, provided and compiled by another user, (I am using soft link to his directories for all bin, include, etc - all the mandatory directories) but I ran into this weird thing:
First of all, if I run mpirun with -n set to 10 or less, the command below works. testrunmpi.py simply prints "run." from each core.
# I am in serverA.
bash-3.2$ /home/karl/bin/mpirun -n 10 ./testrunmpi.py
run.
run.
run.
run.
run.
run.
run.
run.
run.
run.
However, when I try -n greater than 10, I run into this:
bash-3.2$ /home/karl/bin/mpirun -n 24 ./testrunmpi.py
karl@serverB's password: Could not chdir to home directory /home/karl: No such file or directory
bash: /home/karl/bin/orted: No such file or directory
--------------------------------------------------------------------------
A daemon (pid 19203) died unexpectedly with status 127 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
bash-3.2$
bash-3.2$
Permission denied, please try again.
karl@serverB's password:
Permission denied, please try again.
karl@serverB's password:
I see that the work is dispatched to serverB, although I am on serverA. I don't have an account on serverB. But if I invoke mpirun with -n <= 10, the work stays on serverA.
This is strange, so I checked /home/karl/etc/openmpi-default-hostfile, and tried setting the following:
serverA slots=24 max_slots=24
serverB slots=0 max_slots=32
But the problem persists and still gives out the same error message above. What must I do in order to have my program run on serverA only?
The default hostfile in Open MPI is system-wide, i.e. its location is determined while the library is being built and installed and there is no user-specific version of it. The actual location can be obtained by running the ompi_info command like this:
$ ompi_info --param orte orte | grep orte_default_hostfile
MCA orte: parameter "orte_default_hostfile" (current value: <LOOK HERE>, data source: default value)
You can override the list of hosts in several different ways. First, you can provide your own hostfile via the -hostfile option to mpirun. If so, you don't have to put hosts with zero slots inside it - simply omit machines that you have no access to. For example:
localhost slots=10 max_slots=10
serverA slots=24 max_slots=24
You can also change the path to the default hostfile by setting the orte_default_hostfile MCA parameter:
$ mpirun --mca orte_default_hostfile /path/to/your/hostfile -n 10 executable
Instead of passing the --mca option each time, you can set the value in an exported environment variable called OMPI_MCA_orte_default_hostfile. This could be set in your shell's dot-rc file, e.g. in .bashrc if using Bash.
You can also specify the list of nodes directly via the -H (or -host) option.
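Putting the hostfile and environment-variable options together, a minimal sketch might look like this. The temporary file is a stand-in (an assumption) for a real path of your choosing, and the mpirun lines are left commented out because they need a working Open MPI installation.

```shell
# Assumption: a temporary file stands in for a real hostfile path.
hostfile="$(mktemp)"

# List only the machines you can actually log in to.
printf 'serverA slots=24 max_slots=24\n' > "$hostfile"

# Equivalent to passing --mca orte_default_hostfile on every mpirun call.
export OMPI_MCA_orte_default_hostfile="$hostfile"

# With the variable exported, a plain invocation picks up the file:
# mpirun -n 24 ./testrunmpi.py
# Or pass the file explicitly for a single run:
# mpirun -hostfile "$hostfile" -n 24 ./testrunmpi.py

echo "default hostfile: $OMPI_MCA_orte_default_hostfile"
```

Leaving serverB out of the file entirely, rather than listing it with zero slots, follows the advice above about omitting machines you have no access to.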

Resources