Low latency R submits

I have created some R code that accepts a CSV file and produces an output. I currently call it with:
Rscript code.R input.csv
Here code.R is the script to be executed and input.csv is the file it uses as input.
Problem:
The script takes 5 seconds or more to produce results, because R is launched from the shell each time and the libraries need time to load.
Question:
Is it possible to run R in the background, or as a service, with all libraries already loaded, so that I can just submit my job and it only takes the compute time?
Full disclosure:
The script is an ML model which loads an .RDA object and calls the predict function.

Open your R console and load all the libraries that are required.
Then use source() to run your R script.
Say test.R has the following code:
#This file has no library declarations
c <- ggplot(mtcars, aes(factor(cyl)))
c <- c + geom_bar()
print(c)
Now I run it from my console like this:
> library(ggplot2)
> source("<Path>/test.R")
Output: the ggplot bar chart is displayed.
Edit: to pass parameters along with the source() call, you can override commandArgs().
New test.R file code:
c <- ggplot(mtcars, aes(factor(cyl)))
c <- c + geom_bar()
print(c)
print(commandArgs())
Now from console:
> commandArgs <- function() c('a','b')
> source("<Path>/test.R")
[1] "a" "b"
(Along with the graph)
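Applied to your prediction workflow, a minimal sketch of the same idea (the file and object names below are placeholders, not taken from your script): keep one R session alive with the libraries and the .RDA model loaded, then for each new CSV override commandArgs() and source the prediction script, so only the prediction itself costs time.
## run once, in the long-lived R session
## (also load here whatever packages your predict() method needs)
load("model.rda")             # hypothetical file; loads the fitted model, e.g. an object `fit`
## then, for every new job, point the script at a new input file
commandArgs <- function(trailingOnly = TRUE) "input.csv"
source("predict.R")           # hypothetical script: reads commandArgs()[1], calls predict(fit, newdata = read.csv(...))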

Related

R - Create a separate environment in which to source() an R script, such that the latter does not affect the "caller" environment

Scenario: Let's say I have a master pipeline.R script as follows:
WORKINGDIR="my/master/dir"
setwd(WORKINGDIR)
# Step 1
tA = Sys.time()
source("step1.R")
difftime(Sys.time(), tA)
# Add as many steps as desired, ...
And suppose that, within step1.R, the following happens:
rm(list=ls())
Question:
How can I separate the pipeline.R (caller) environment from the step1.R environment?
More specifically, I would like to run step1.R in separate environment such that any code within it, like the rm, does not affect the caller environment.
There are a few ways to call an R script and run it. One of them is source().
source() evaluates the R script, and it can do so in a specified environment.
Say we have a Test.R script:
#Test.R
a <- 1
rm(list = ls())
b <- 2
c <- 3
and global variables:
a <- 'a'
b <- 'b'
c <- 'c'
Now you would like to run this script, but in an environment separate from the global environment you are calling the script from. You can do this by creating a new environment and then calling source():
step1 <- new.env(parent = baseenv())
#Working directory set correctly.
source("Test.R", local = step1)
These are the results after the run; as you can see, the objects in the global environment are not deleted.
a
#"a"
b
#"b"
step1$a
#NULL
#rm(list = ls()) actually ran in Test.R
step1$b
#2
Note:
You can also run an R script by using system(). This will, however, be run in a different R process, and you will not be able to retrieve anything from where you called the script.
system("Rscript Test.R")
Alternatively, create a new environment:
e1 <- new.env()
and use sys.source() to source the R script, specifying envir as the e1 created above:
sys.source("step1.R", envir = e1)

Input parameters when calling a script in R

I have an R script that looks like this (it's different in reality, but this is for reproducing purposes :)).
#createOutputFunction.R
createOutput <- function(parameter1, parameter2){
  x <- parameter1 + parameter2
  print(x)
}
This works. But the thing is that I would like to pass the parameters when executing the function. So basically I want to be able to do:
source("createOutputFunction.R") and supply the parameters directly,
so that just by calling the script I can get a different output, depending on the parameters I enter.
Any thoughts on how I can get this working?
Try this:
# main.R file
source("createOutputFunction.R")
args <- commandArgs(trailingOnly = TRUE)
createOutput(as.numeric(args[1]), as.numeric(args[2]))  # commandArgs() returns character strings
# end of main.R file
Now run Rscript with input arguments, which are passed on to main.R:
Rscript main.R 1 2
This should print out 3.
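If you only need this interactively rather than from the shell, a simpler sketch is to source the definition file and then call the function yourself with whatever parameters you want:
source("createOutputFunction.R")
createOutput(1, 2)   # prints [1] 3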

Passing command line arguments to R CMD BATCH

I have been using R CMD BATCH my_script.R from a terminal to execute an R script. I am now at the point where I would like to pass an argument to the command, but am having some issues getting it working. If I do R CMD BATCH my_script.R blabla then blabla becomes the output file, rather than being interpreted as an argument available to the R script being executed.
I have tried Rscript my_script.R blabla which seems to pass on blabla correctly as an argument, but then I don't get the my_script.Rout output file that I get with R CMD BATCH (I want the .Rout file). While I could redirect the output of a call to Rscript to a file name of my choosing, I would not be getting the R input commands included in the file in the way R CMD BATCH does in the .Rout file.
So, ideally, I'm after a way to pass arguments to an R script being executed via the R CMD BATCH method, though would be happy with an approach using Rscript if there is a way to make it produce a comparable .Rout file.
My impression is that R CMD BATCH is a bit of a relic. In any case, the more recent Rscript executable (available on all platforms), together with commandArgs(), makes processing command line arguments pretty easy.
As an example, here is a little script -- call it "myScript.R":
## myScript.R
args <- commandArgs(trailingOnly = TRUE)
rnorm(n=as.numeric(args[1]), mean=as.numeric(args[2]))
And here is what invoking it from the command line looks like:
> Rscript myScript.R 5 100
[1] 98.46435 100.04626 99.44937 98.52910 100.78853
Edit:
Not that I'd recommend it, but ... using a combination of source() and sink(), you could get Rscript to produce an .Rout file like that produced by R CMD BATCH. One way would be to create a little R script -- call it RscriptEcho.R -- which you call directly with Rscript. It might look like this:
## RscriptEcho.R
args <- commandArgs(TRUE)
srcFile <- args[1]
outFile <- paste0(make.names(date()), ".Rout")
args <- args[-1]
sink(outFile, split = TRUE)
source(srcFile, echo = TRUE)
To execute your actual script, you would then do:
Rscript RscriptEcho.R myScript.R 5 100
[1] 98.46435 100.04626 99.44937 98.52910 100.78853
which will execute myScript.R with the supplied arguments and sink interleaved input, output, and messages to a uniquely named .Rout.
Edit2:
You can run Rscript verbosely and place the verbose output in a file.
Rscript --verbose myScript.R 5 100 > myScript.Rout
After trying the options described here, I found this post from Forester on r-bloggers. I think it is a clean option to consider.
I put his code here:
From command line
$ R CMD BATCH --no-save --no-restore '--args a=1 b=c(2,5,6)' test.R test.out &
Test.R
## First read in the arguments listed at the command line
args = (commandArgs(TRUE))
## args is now a list of character vectors
## First check to see if arguments are passed.
## Then cycle through each element of the list and evaluate the expressions.
if (length(args) == 0) {
  print("No arguments supplied.")
  ## supply default values
  a = 1
  b = c(1,1,1)
} else {
  for (i in 1:length(args)) {
    eval(parse(text = args[[i]]))
  }
}
print(a*2)
print(b*3)
In test.out
> print(a*2)
[1] 2
> print(b*3)
[1] 6 15 18
Thanks to Forester!
You need to put arguments before my_script.R and use - on the arguments, e.g.
R CMD BATCH -blabla my_script.R
commandArgs() will receive -blabla as a character string in this case. See the help for details:
$ R CMD BATCH --help
Usage: R CMD BATCH [options] infile [outfile]
Run R non-interactively with input from infile and place output (stdout
and stderr) to another file. If not given, the name of the output file
is the one of the input file, with a possible '.R' extension stripped,
and '.Rout' appended.
Options:
-h, --help print short help message and exit
-v, --version print version info and exit
--no-timing do not report the timings
-- end processing of options
Further arguments starting with a '-' are considered as options as long
as '--' was not encountered, and are passed on to the R process, which
by default is started with '--restore --save --no-readline'.
See also help('BATCH') inside R.
In your R script, called test.R:
args <- commandArgs(trailingOnly = F)
myargument <- args[length(args)]
myargument <- sub("-","",myargument)
print(myargument)
q(save="no")
From the command line run:
R CMD BATCH -4 test.R
Your output file, test.Rout, will show that the argument 4 has been successfully passed to R:
cat test.Rout
> args <- commandArgs(trailingOnly = F)
> myargument <- args[length(args)]
> myargument <- sub("-","",myargument)
> print(myargument)
[1] "4"
> q(save="no")
> proc.time()
user system elapsed
0.222 0.022 0.236
I add an answer because I think a one-line solution is always good!
At the top of your myRscript.R file, add the following line:
eval(parse(text=paste(commandArgs(trailingOnly = TRUE), collapse=";")))
Then submit your script with something like:
R CMD BATCH [options] '--args arguments you want to supply' myRscript.R &
For example:
R CMD BATCH --vanilla '--args N=1 l=list(a=2, b="test") name="aname"' myscript.R &
Then:
> ls()
[1] "N" "l" "name"
Here's another way to process command line args, using R CMD BATCH. My approach, which builds on an earlier answer here, lets you specify arguments at the command line and, in your R script, give some or all of them default values.
Here's an R file, which I name test.R:
defaults <- list(a=1, b=c(1,1,1)) ## default values of any arguments we might pass
## parse each command arg, loading it into global environment
for (arg in commandArgs(TRUE))
  eval(parse(text=arg))
## if any variable named in defaults doesn't exist, then create it
## with value from defaults
for (nm in names(defaults))
  assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]))[[1]])
print(a)
print(b)
At the command line, if I type
R CMD BATCH --no-save --no-restore '--args a=2 b=c(2,5,6)' test.R
then within R we'll have a = 2 and b = c(2,5,6). But I could, say, omit b, and add in another argument c:
R CMD BATCH --no-save --no-restore '--args a=2 c="hello"' test.R
Then in R we'll have a = 2, b = c(1,1,1) (the default), and c = "hello".
Finally, for convenience we can wrap the R code in a function, as long as we're careful about the environment:
## defaults should be either NULL or a named list
parseCommandArgs <- function(defaults=NULL, envir=globalenv()) {
  for (arg in commandArgs(TRUE))
    eval(parse(text=arg), envir=envir)
  for (nm in names(defaults))
    assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]), envir=envir)[[1]], pos=envir)
}
## example usage:
parseCommandArgs(list(a=1, b=c(1,1,1)))

knitr: starting a fresh R session to clear RAM

I sometimes work with lots of objects and it would be nice to have a fresh start because of memory issues between chunks. Consider the following example:
warning: I have 8GB of RAM. If you don't have much, this might eat it all up.
<<chunk1>>=
a <- 1:200000000
#
<<chunk2>>=
b <- 1:200000000
#
<<chunk3>>=
c <- 1:200000000
#
The solution in this case is:
<<chunk1>>=
a <- 1:200000000
#
<<chunk2>>=
rm(a)
gc()
b <- 1:200000000
#
<<chunk3>>=
rm(b)
gc()
c <- 1:200000000
#
However, in my example (which I can't post because it relies on a large dataset), even after I remove all of the objects and run gc(), R does not clear all of the memory (only some). The reason is found in ?gc:
However, it can be useful to call ‘gc’ after a large object has
been removed, as this may prompt R to return memory to the
operating system.
Note the important word may. R hedges with may in a lot of places like this, so this is not a bug.
Is there a chunk option according to which I can have knitr start a new R session?
My recommendation would be to create an individual .Rnw file for each of the major tasks, knit them to .tex files, and then use \include or \input in a parent .Rnw file to build the full project. Control the building of the project via a makefile.
However, to address this specific question of using a fresh R session for each chunk, you could use the R package subprocess to spawn an R session, run the needed code, extract the results, and then kill the spawned session.
A simple example .Rnw file
\documentclass{article}
\usepackage{fullpage}
\begin{document}
<<include = FALSE>>=
knitr::opts_chunk$set(collapse = FALSE)
#
<<>>=
library(subprocess)
# define a function to identify the R binary
R_binary <- function() {
  R_exe <- ifelse(tolower(.Platform$OS.type) == "windows", "R.exe", "R")
  return(file.path(R.home("bin"), R_exe))
}
#
<<>>=
# Start a subprocess running vanilla R.
subR <- subprocess::spawn_process(R_binary(), c("--vanilla --quiet"))
Sys.sleep(2) # wait for the process to spawn
# write to the process
subprocess::process_write(subR, "y <- rnorm(100, mean = 2)\n")
subprocess::process_write(subR, "summary(y)\n")
# read from the process
subprocess::process_read(subR, PIPE_STDOUT)
# kill the process before moving on.
subprocess::process_kill(subR)
#
<<>>=
print(sessionInfo(), local = FALSE)
#
\end{document}
Knitting this document generates a PDF showing the output captured from the spawned R session.
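A lighter-weight variant of the same idea, using only base R, is to run a chunk's code in a fresh Rscript process via system2() and capture its printed output; a sketch (the inline expression is purely illustrative):
# run code in a completely fresh R session and capture what it prints
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript,
               args = c("--vanilla", "-e", shQuote("y <- rnorm(100, mean = 2); print(summary(y))")),
               stdout = TRUE)
cat(out, sep = "\n")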

R: is there a command for the end of a file that states whether any errors occurred?

Sometimes when I am running lots of long programs I think it would be nice if there was a statement or command I could add to the bottom of a file that would tell me whether R returned any error messages (or warning messages) while running a file.
I always scroll up through all of the code to visually check for error messages or warnings and keep thinking it would be nice if R simply told me at the bottom of the code whether any error messages or warnings occurred.
Can R do that? I suppose even if R can do that I would need a while to develop trust in the command line to catch all error messages or warning messages.
With SAS I used to use the find command and search the log window for the word ‘Error’ or ‘Warning’.
Thanks for any thoughts or advice about this.
Here is a very simple example of R code that returns 3 error messages.
x <- c(1,2,3,4)
y <- c(3,4)
z <- x / y
zz
a <- matrix(x, nrow=2, byrow=T)
b <- matrix(x, nrows=2, byrow=T)
z x a
z * a
I assume you are running from a GUI, where errors are not fatal. Here is a solution making use of options(error). The error handler is replaced by an expression that increments a counter variable:
.error.count <- 0
old.error.fun <- getOption("error")
new.error.fun <- quote(.error.count <- .error.count + 1)
options(error = new.error.fun)
### your code here ###
x <- c(1,2,3,4)
y <- c(3,4)
z <- x / y
zz
a <- matrix(x, nrow=2, byrow=T)
b <- matrix(x, nrows=2, byrow=T)
z x a
z * a
######################
cat("ERROR COUNT:", .error.count, "\n")
options(error = old.error.fun)
rm(.error.count, old.error.fun, new.error.fun)
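The question also asks about warnings; a minimal sketch that counts them with withCallingHandlers(), assuming your code can be wrapped in a single source() call (keeping in mind that source() will still stop at the first error):
warning.count <- 0
withCallingHandlers(
  source("yourscript.r"),
  warning = function(w) {
    warning.count <<- warning.count + 1   # bump the counter for every warning raised
    invokeRestart("muffleWarning")        # suppress the usual warning message
  }
)
cat("WARNING COUNT:", warning.count, "\n")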
This is not a good example, because when I run it, it stops on the first error. However, the general question is probably better solved by the OS and the standard file descriptors. Specifically, R sends its normal output to stdout and its warnings and errors to stderr, and you can deal with those streams separately rather than seeing them together. For example, you can send stdout to a file and keep stderr in the terminal:
Rscript myfile.R 1>output.txt
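Extending the same idea (a sketch using plain shell redirection), you can also send stderr to its own file and search it afterwards, much like grepping a SAS log:
Rscript myfile.R 1>output.txt 2>errors.txt
grep -icE "error|warning" errors.txt   # count of lines mentioning errors or warnings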
I get this functionality primarily by using source() to run code; i.e., just dump your code into a file and then run:
source("yourscript.r")
in R, which returns:
Error in source("yourscript.r") : yourscript.r:7:3: unexpected symbol
6: b <- matrix(x, nrows=2, byrow=T)
7: z x
^
It doesn't return all the errors in one pass: syntax errors will stop the file from executing at all, unlike checkorbored's Rscript method, which runs and then gives you the first error (see ?source for more details). But it might serve your purposes.
Similar to #checkorboard's approach,
what I do is put the code in a text file, say "yourscript.r", and then run:
R CMD BATCH yourscript.r &
This will automatically create a file like yourscript.Rout with the output of the program, and you can easily grep it to see if there was an error.
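For example, a quick shell check of the generated output file (adjust the file name to whatever .Rout file was actually produced):
grep -inE "error|warning" yourscript.Rout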
