How to set and use bash variables in R

I have a use case where I need to execute bash commands inside an R program. I can send and verify that bash commands are being executed, but for reasons I do not understand, I can't set and use variables. To begin with, a simple command works fine:
$ R
...
> system("ls")
1a.csv 1b.csv 2.csv 3.csv
[1] 0
>
Now moving on to the problem. I've tried as many approaches as I could find, but none seem to work:
> system("TEST_VAR=\"test_val\"")
[1] 127
Warning message:
In system("TEST_VAR=\"test_val\"") : 'TEST_VAR="test_val"' not found
> system("bash -c 'REPORT_S3=report_s3_test_val'")
[1] 0
> system("echo ${REPORT_S3}")
$REPORT_S3
[1] 0
> system('TEST_VAR=test_var')
[1] 127
Warning message:
In system("TEST_VAR=test_var") : 'TEST_VAR=test_var' not found
> Sys.setenv(TEST_VAR = "test_val")
> system("echo $TEST_VAR")
$TEST_VAR
[1] 0
> system("bash -c 'export TEST_VAR=\"test_val\"'")
[1] 0
> system(" echo ${TEST_VAR}")
$TEST_VAR
[1] 0
None of these attempts succeed.
What I need to do is set variables and subsequently use them to create successively more complex commands. This works fine in bash, but I can't seem to get it to work in R, apparently for the reasons above.
REPORT_S3="s3://xxxxxxxx-reports/r/html/"$RMD_FILE"_"$EMAIL_ADDRESS"_"$CLOUDWATCH_UUID".html"
PRESIGNED_URL=$(aws s3 presign --expires-in 604800 $REPORT_S3)
JSON_STRING='xxxxx"$CLOUDWATCH_UUID"xxxxxxx"$PRESIGNED_URL".....'
echo $JSON_STRING > message.json
echo '{"ToAddresses":["'$EMAIL_ADDRESS'"],"CcAddresses":[],"BccAddresses":[]}' > destination.json
aws ses send-email --from xx#yyyyyyyy.co.nz --destination file://destination.json --message file://message.json --region ap-southeast-2
Are there any other options for issuing bash commands, other than system, that would permit easier reuse of the original bash source code?

Environment variables are per process, and each system(..) call starts a new process. If you define and reference the variable in the same system call, it works fine:
> system('
var="foo"
echo "The variable is: $var"
')
The variable is: foo
If you put your entire script into a single system(..) call, instead of trying to run line by line, it should therefore work.
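The per-process behaviour can be reproduced in a plain shell, without R. The following is a minimal sketch in which each `sh -c` stands in for one system() call; DEMO_VAR is a made-up name used only for illustration.

```shell
# Each `sh -c` starts a separate child process, much like one system() call.
# DEMO_VAR is a hypothetical variable name used only for this illustration.
unset DEMO_VAR

# Attempt 1: set the variable in one child, read it in another.
sh -c 'DEMO_VAR=foo'
separate=$(sh -c 'echo "value: ${DEMO_VAR:-unset}"')

# Attempt 2: set and read inside the same child.
combined=$(sh -c 'DEMO_VAR=foo; echo "value: ${DEMO_VAR:-unset}"')

echo "$separate"   # prints "value: unset"
echo "$combined"   # prints "value: foo"
```

The first child exits and takes its variable with it, so the second child starts from the parent's unchanged environment.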
An alternative method is using Sys.setenv to set the variables in your current R process, so that future system() calls inherit it:
> Sys.setenv(var = "bar")
> system('echo "The variable is: $var"')
The variable is: bar
Obviously, since Sys.setenv is an R function, you must use R code to define your variables, and not rely on shell syntax like $(..)
PS: system() invokes sh and not bash, so all the code you pass it should be sh compatible.
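If the original code relies on bash-only features, one option is to start bash explicitly and hand it the whole script in a single string, which is what a call such as system("bash -c '...'") does. The sketch below uses a bash array, which plain sh need not support, and assumes a bash executable is on the PATH.

```shell
# Arrays are a bash feature, so the script is handed to bash explicitly.
# This assumes a `bash` executable is available on the PATH.
bash_out=$(bash -c 'parts=(alpha beta gamma); echo "${parts[1]}"')
echo "$bash_out"   # prints "beta"
```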

Related

Make reading RDS from stdin independent whether the file was created via pipe or not

readRDS=function(file = pipe('cat', 'rb'), ...) {
base::readRDS(file, ...)
}
saveRDS=function(object, file = pipe('cat', 'wb'), ...) {
base::saveRDS(object, file, ...)
}
I use the above function to read an RDS file from stdin and write an RDS file to stdout.
See below, one readRDS call works, the other doesn't. It depends on whether the rds file is written using pipe or not.
$ lr -e 'source("saveRDS.R"); saveRDS(1:3, file="/tmp/readRDS.rds");'
$ lr -e 'source("readRDS.R"); print(readRDS());' < /tmp/readRDS.rds
Error in base::readRDS(file, ...) : unknown input format
$ lr -e 'source("saveRDS.R"); saveRDS(1:3);' > /tmp/readRDS.rds
$ lr -e 'source("readRDS.R"); print(readRDS());' < /tmp/readRDS.rds
[1] 1 2 3
Why does it depend on the way the file is written? Is there a way to make readRDS from stdin always work whether the file is written via pipe or not?
The two methods differ in compression: saveRDS compresses with gzip when writing to a file name, but not when writing to a connection such as the pipe. Rather than reading from cat in readRDS, try
readRDS=function(file = base::gzcon(base::file("stdin", "rb")), ...) {
base::readRDS(file, ...)
}
gzcon will decompress the input if the gz magic number is found; otherwise it passes the data through unchanged. This should allow things to work in both cases.
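The "gz magic number" is the two-byte signature 0x1f 0x8b at the start of a gzip stream. The following is a small shell sketch, assuming gzip, head, od, and tr are available, that shows the signature on a compressed stream and its absence on plain data. The helper name first_two_bytes is made up for the example.

```shell
# Print the first two bytes of stdin as hex digits.
# first_two_bytes is a hypothetical helper defined only for this example.
first_two_bytes() { head -c 2 | od -An -tx1 | tr -d ' \n'; }

gz_magic=$(printf 'hello\n' | gzip -c | first_two_bytes)
plain_magic=$(printf 'hello\n' | first_two_bytes)

echo "$gz_magic"      # prints "1f8b"
echo "$plain_magic"   # prints "6865" (the letters "he")
```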

in R, invoke external program in path with spaces with command line parameters

A combination of frustrating problems here.
Essentially I want R to open an external program with command line parameters. I am currently trying to achieve this on a Windows machine, but ideally it would work cross-platform.
The program (chimera.exe) is in a directory containing spaces: C:\Program Files\Chimera1.15\bin\
The command line options could be for instance a --nogui flag and a script name, so from the shell I would write (space-specifics aside):
C:\Program Files\Chimera1.15\bin\chimera.exe --nogui scriptfile
This works if I go in windows cmd.exe to the directory itself and just type chimera.exe --nogui scriptfile
Now in R:
I've been playing with shell(), shell.exec(), and system(), but essentially I fail because of the spaces and/or the path separators.
Most of the time system() just prints "127" for whatever reason:
> system("C:/Program Files/Chimera1.15/bin/chimera.exe")
[1] 127
Back/forward slashes complicate the matter further but don't make it work:
> system("C:\Program Files\Chimera1.15\bin\chimera.exe")
Error: '\P' is an unrecognized escape in character string starting "C\P"
> system("C:\\Program Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
> system("C:\\Program\ Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
> system("C:\\Program\\ Files\\Chimera1.15\\bin\\chimera.exe")
[1] 127
When I install the program in a directory without spaces, it works. How can I escape or pass on the space in system() or related commands or how do I invoke the program otherwise?
Try system2 as it does not use the cmd line processor and use r"{...}" to avoid having to double backslashes. This assumes R 4.0 or later. See ?Quotes for the full definition of the quotes syntax.
chimera <- r"{C:\Program Files\Chimera1.15\bin\chimera.exe}"
system2(chimera, c("--nogui", "myscript"))
For example, this works for me (you might need to change the path):
R <- r"{C:\Program Files\R\R-4.1\bin\x64\Rgui.exe}" # modify as needed
system2(R, c("abc", "def"))
and when Rgui is launched we can verify that the arguments were passed by running this in the new instance of R:
commandArgs()
## [1] "C:\\PROGRA~1\\R\\R-4.1\\bin\\x64\\Rgui.exe"
## [2] "abc"
## [3] "def"
system
Alternately use system but put quotes around the path so that cmd interprets it correctly -- if it were typed into the Windows cmd line the quotes would be needed too.
system(r"{"C:\Program Files\Chimera1.15\bin\chimera.exe" --nogui myscript}")
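The same quoting rule applies in a POSIX shell, where an unquoted space in the program path also ends in status 127. The following is a sketch under stated assumptions: tool.sh and its temporary "Program Files" directory are hypothetical stand-ins for chimera.exe, and the temporary directory is assumed to allow executing files.

```shell
# Build a throwaway tool inside a directory whose name contains a space.
# The directory and tool.sh are hypothetical stand-ins for chimera.exe.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/Program Files"
tool="$demo_dir/Program Files/tool.sh"
printf '#!/bin/sh\necho "args: $*"\n' > "$tool"
chmod +x "$tool"

# Unquoted: the shell splits at the space and looks for ".../Program".
unquoted_status=0
sh -c "$tool --nogui myscript" 2>/dev/null || unquoted_status=$?

# Quoted: the whole path is treated as one word.
quoted_out=$(sh -c "'$tool' --nogui myscript")

echo "$unquoted_status"   # prints 127 (command not found)
echo "$quoted_out"        # prints "args: --nogui myscript"
rm -rf "$demo_dir"
```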

How to pass bash variable into R script

I have a couple of R scripts that process data in a particular input folder. I have a few folders I need to run these scripts on, so I started writing a bash script to loop through these folders and run those R scripts.
I'm not familiar with R at all (the script was written by a previous worker and it's basically a black box for me), and I'm inexperienced with passing variables through scripts, especially involving multiple languages. There's also an issue present when I call source("$SWS_output/Step_1_Setup.R") here - R isn't reading my $SWS_output as a variable, but rather a string.
Here's my bash script:
#!/bin/bash
# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"
# Output
SWS_output="$workspace/7_SKSattempt4_results/"
# create output directory
mkdir -p $SWS_output
# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output
# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
qdir_name=`basename $qdir`
echo -e 'source("$SWS_output/Step_1_Setup.R") \n source("$SWS_output/(Step_2_data.R") \n q()' | R --no-save
done
I need to pass the variable "qdir" into the second R script (Step_2_data.R) to tell it which folder to process.
Thanks!
My previous answer was incomplete. Here is a better effort to explain command line parsing.
It is pretty easy to use R's commandArgs function to process command line arguments. I wrote a small tutorial https://gitlab.crmda.ku.edu/crmda/hpcexample/tree/master/Ex51-R-ManySerialJobs. In cluster computing this works very well for us. The whole hpcexample repo is open source/free.
The basic idea is that in the command line you can run R with command line arguments, as in:
R --vanilla -f r-clargs-3.R --args runI=13 parmsC="params.csv" xN=33.45
In this case, my R program is a file r-clargs-3.R and the arguments that the file will import are three space separated elements, runI, parmsC, xN. You can add as many of these space separated parameters as you like. It is completely at your discretion what these are called, but it is required they are separated by spaces and there is NO SPACE around the equal signs. Character string variables should be quoted.
My habit is to name the arguments with suffix "I" to hint that it is an integer, "C" is for character, and "N" is for floating point numbers.
In the file r-clargs-3.R, include some code to read the arguments and sort through them. For example, my tutorial's example
cli <- commandArgs(trailingOnly = TRUE)
args <- strsplit(cli, "=", fixed = TRUE)
The rest of the work is sorting through the args. This is my most evolved stanza for doing so: it looks for the suffixes "I", "N", "C", and "L" (for logical) and coerces each input to the corresponding type (all inputs arrive as character strings unless coerced with as.integer(), etc.):
for (e in args) {
    argname <- e[1]
    if (! is.na(e[2])) {
        argval <- e[2]
        ## regular expression to delete initial \" and trailing \"
        argval <- gsub("(^\\\"|\\\"$)", "", argval)
    } else {
        # If arg specified without value, assume it is bool type and TRUE
        argval <- TRUE
    }
    # Infer type from last character of argname, cast val
    type <- substring(argname, nchar(argname), nchar(argname))
    if (type == "I") {
        argval <- as.integer(argval)
    }
    if (type == "N") {
        argval <- as.numeric(argval)
    }
    if (type == "L") {
        argval <- as.logical(argval)
    }
    assign(argname, argval)
    cat("Assigned", argname, "=", argval, "\n")
}
That will create variables in the R session named runI, parmsC, and xN.
The convenience of this approach is that the same base R code can be run with 100s or 1000s of command parameter variations. Good for Monte Carlo simulation, etc.
Thanks for all the answers; they were very helpful. I was able to get a solution that works. Here's my completed script.
#!/bin/bash
# Inputs
workspace="`pwd`"
preprocessed="$workspace/6_preprocessed"
# Output
SWS_output="$workspace/7_SKSattempt4_results"
# create output directory
mkdir -p $SWS_output
# Copy data from preprocessed to SWS_output
cp -a $preprocessed/* $SWS_output
cd $SWS_output
# Loop through folders in the output and run the R code on each folder
for qdir in $SWS_output/*/; do
qdir_name=`basename $qdir`
echo $qdir_name
export VARIABLENAME=$qdir
echo -e 'source("Step_1_Setup.R") \n source("Step_2_Data.R") \n q()' | R --no-save --slave
done
And then the R script looks like this:
qdir<-Sys.getenv("VARIABLENAME")
pathname<-qdir[1]
As a couple of comments have pointed out, this isn't best practice, but this worked exactly as I wanted it to. Thanks!
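For reference, the original attempt left $SWS_output unexpanded because the R code sat inside single quotes, which the shell passes through literally. Below is a minimal sketch of the difference; the path assigned to SWS_output is hypothetical.

```shell
# SWS_output here is a hypothetical path chosen only for the demonstration.
SWS_output="/data/7_results"

# Single quotes: the shell passes the text through untouched.
single='source("$SWS_output/Step_1_Setup.R")'
# Double quotes: the shell substitutes the variable; inner quotes are escaped.
double="source(\"$SWS_output/Step_1_Setup.R\")"

echo "$single"   # prints source("$SWS_output/Step_1_Setup.R")
echo "$double"   # prints source("/data/7_results/Step_1_Setup.R")
```

Exporting the value and reading it with Sys.getenv, as in the completed script, avoids the nested quoting altogether.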

R calls mGENOVA-an external Program

Recently I have been trying to use R to call a .exe program named mGENOVA. It is a command-line program. I have some screenshots to help me explain this (I use Windows 10).
Supposedly, it works this way:
double click on mGenova,
type card.txt
hit "enter" the cmd window will close
I have tried a lot. My attempts do invoke the program, but they fail to pass card.txt to its prompt:
shell(cmd="D:\\mgenova\\mGENOVA\\card.txt", shell="D:\\mgenova\\mGENOVA\\mGENOVA.exe",intern=F)
OR
system("\"D:\\mgenova\\mGENOVA\\mGENOVA.exe\" \"D:\\mgenova\\mGENOVA\\card.txt\""
,show.output.on.console=TRUE,invisible=T,intern=T)
And I always got this
[1] "Input the filename containing the control cards." "" "" "*** Control cards file is empty"
attr(,"status")
[1] 1
Warning message:
running command '"D:\mgenova\mGENOVA\mGENOVA.exe" "D:\mgenova\mGENOVA\card.txt"' had status 1
How can I get it to run? Thanks for helping!
You could create a batchfile (let's name it batch.bat) on Windows with the content
cd /D D:\mgenova\mGENOVA\
mGENOVA.exe < card.txt
All necessary input for GENOVA must be provided by the file card.txt.
Then in R run the command
system("batch.bat")
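The output in the question suggests that the program ignores its command-line argument and waits for input on standard input, which would be why redirection helps where an argument did not. The following is a shell sketch of that distinction; prompt_tool is a hypothetical stand-in, not mGENOVA itself.

```shell
# prompt_tool is a hypothetical stand-in for a program that ignores its
# arguments and reads a file name from standard input.
prompt_tool() {
  read -r answer || true
  echo "control cards file: ${answer:-<empty>}"
}

as_argument=$(prompt_tool card.txt < /dev/null)   # argument is never read
via_stdin=$(echo "card.txt" | prompt_tool)        # answer arrives on stdin

echo "$as_argument"   # prints "control cards file: <empty>"
echo "$via_stdin"     # prints "control cards file: card.txt"
```

R's system() also has an input argument that supplies text to the command's standard input, which may be an alternative to a batch file.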

Same Random Numbers Every Time

I'm running a script from the command line via
R CMD BATCH script.in.R script.out.R &
I have the following line, which picks 12 random row ids and sorts them:
test.index<-sort(sample(1:nrow(recoded),12))
It spits out the same 12 numbers every time if I don't change the script. If I change it a little bit (change a label or a string or anything) then the numbers are different...I need them to be different every time!
Any ideas?
This sounds weird. What's the rest of the script doing? If it (or some function it calls) calls set.seed, that would explain things. But you say the numbers change when you change (what I assume to be) the data, which would imply that the seed is set to some hash of your dataset?! Or do they change if you change the script in any way?
Anyway, you can insert a line like rm(.Random.seed, envir=globalenv()) before your call to sample, which should reset the seed to a random one...
Another way is to generate a unique seed yourself. Here's one way based on time and process id.
set.seed( as.integer((as.double(Sys.time())*1000+Sys.getpid()) %% 2^31) )
You probably have a call to set.seed() in there. Here is an example:
$ Rscript -e 'runif(4)'
[1] 0.639716 0.976892 0.486573 0.525979
$ Rscript -e 'runif(4)'
[1] 0.516927 0.951013 0.931756 0.741650
$ Rscript -e 'runif(4)'
[1] 0.159682 0.314798 0.356476 0.584326
$ Rscript -e 'set.seed(42); runif(4)'
[1] 0.914806 0.937075 0.286140 0.830448
$ Rscript -e 'set.seed(42); runif(4)'
[1] 0.914806 0.937075 0.286140 0.830448
$
The first three all differ, then I enforce a common seed and presto the numbers are identical.
Also, Rscript is nicer than R CMD BATCH.
Check that you have not loaded a previous workspace. If you have, the previous seed is loaded along with it, which gives you the same results.
