Does shell() in R remember the previous command? - r

I'm trying to run a batch file via R and I'm trying to use shell() to do so. This is written over two separate instances of shell(), one to set the directory and one to run the file:
shell("cd <file location>")
shell("activate.bat")
While troubleshooting I ran shell("dir") and it returns the current working directory of the R and not the directory set via shell().
Is shell() unable to retain memory between different instances of it being used, and if so is it possible to run two terminal commands in R when one relies on the other being run first?

Related

SparkR: source() other R files in an R Script when running spark-submit

I'm new to Spark and newer to R, and am trying to figure out how to 'include' other R-scripts when running spark-submit.
Say I have the following R script which "sources" another R script:
main.R
source("sub/fun.R")
mult(4, 2)
The second R script looks like this, which exists in a sub-directory "sub":
sub/fun.R
mult <- function(x, y) {
x*y
}
I can invoke this with Rscript and successfully get this to work.
Rscript file.R
[1] 8
However, I want to run this with Spark, and use spark-submit. When I run spark-submit, I need to be able to set the current working directory on the Spark workers to the directory which contains the main.R script, so that the Spark/R worker process will be able to find the "sourced" file in the "sub" subdirectory. (Note: I plan to have a shared filesystem between the Spark workers, so that all workers will have access to the files).
How can I set the current working directory that SparkR executes in such that it can discover any included (sourced) scripts?
Or, is there a flag/sparkconfig to spark-submit to set the current working directory of the worker process that I can point at the directory containing the R Scripts?
Or, does R have an environment variable that I can set to add an entry to the "R-PATH" (forgive me if no such thing exists in R)?
Or, am I able to use the --files flag to spark-submit to include these additional R-files, and if so, how?
Or is there generally a better way to include R scripts when run with spark-submit?
In summary, I'm looking for a way to include files with spark-submit and R.
Thanks for reading. Any thoughts are much appreciated.

Setting .libPaths() For Running R Scripts From Command Line Using Rscript.exe

I am trying to run R scripts via BAT files on Windows Command Prompt.
The scripts require a few R packages such as data.table, tidyR, etc.
For operational reasons, all required R packages and dependencies (including data.table) are installed at C:\Users\username\Documents\R\R-3.5.1\library. I am not allowed to install RStudio in this environment.
When I try
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R, I get an error similar to
Error in library(data.table) : there is no package called 'data.table'
Execution halted
How can I set the .libPaths via Command Prompt to point to the correct location of the packages (i.e. to C:\Users\username\Documents\R\R-3.5.1\library)?
Thank you in advance.
Disclaimer: I'm unfamiliar with R.
From R: Search paths :
The library search path is initialized at startup from the environment
variable R_LIBS (which should be a colon-separated list of directories
at which R library trees are rooted) followed by those in environment
variable R_LIBS_USER. Only directories which exist at the time will be
included.
By default R_LIBS is unset, and R_LIBS_USER is set to directory
‘R/R.version$platform-library/x.y’ of the home directory (or
‘Library/R/x.y/library’ for CRAN macOS builds), for R x.y.z.
An environment variable can be created with set VARIABLE_NAME=YOUR_VALUE batch command.
So your batch file should probably be something like this:
cd /d "C:\INSERT_PATH_TO_DIRECTORY_CONTAINING_script.R"
set "R_LIBS=C:\Users\username\Documents\R\R-3.5.1\library"
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R
However for portability reasons (let's say a collegue asks for a copy of your script or your computer dies) I suggest putting the script, R library and batch file in a single directory, let's say C:\Users\username\Documents\R. The batch file C:\Users\username\Documents\R\script.bat becomes:
cd /d "%~dp0"
set "R_LIBS=%~dp0R-3.5.1\library"
"%PROGRAMFILES%\R\R-3.5.1\bin\x64\Rscript.exe" "%~dpn0.R"
%PROGRAMFILES% environment variable expands to full path of program files folder, %~dp0 parameter expands to full path of a directory that holds your batch file, and %~dpn0 is a batch-file full path without extension.
Notice that %~dp0R-3.5.1 is not a typo because %~dp0 includes trailing backslash.
This way you can copy C:\Users\username\Documents\R to D:\Users\SOMEOTHERNAME\Documents\R and the script will still run.
If you create another version of your script, just copy the batch file so that it has same filename as your script but .bat extension instead of .R and it should call the new script - this has proven to be very handy when debugging and distributing scripts.
Alternatively, if you would rather install libraries separately you may want to use %HOMEDRIVE%%HOMEPATH% which expands to C:\Users\username.
Extracting proper Documents folder path, as well as R installation path is possible but requires reading the registry and thus is a bit more complicated.

calling Qiime with system call from R

Hej,
When I try to call QIIME with a system call from R, i.e
system2("macqiime")
R stops responding. It's no problem with other command line programs though.
can certain programs not be called from R via system2() ?
MacQIIME version:
MacQIIME 1.8.0-20140103
Sourcing MacQIIME environment variables...
This is the same as a normal terminal shell, except your default
python is DIFFERENT (/macqiime/bin/python) and there are other new
QIIME-related things in your PATH.
(note that I am primarily interested to call QIIME from R Markdown with engine = "sh" which fails, too. But I strongly suspect the problems are related)
In my experience, when you call Qiime from unix command line, it usually creates a virtual shell of it`s own to run its commands which is different from regular system commands like ls or mv. I suspect you may not be able to run Qiime from within R unless you emulate that same shell or configuration Qiime requires. I tried to run it from a python script and was not successful.

Using R and with the Terminal

I have combed the internet looking for a tutorial on how to use R through the terminal and I can't find anything that exactly fits what I'm looking for. I have a group of R scripts. They need to be run in a specific order. Each produces a set of .csv files to be used as inputs for the next script that will be executed.
I have used R CMD BATCH filename.R and it makes an output file called filename.Rout
But then when I try to use cat filename.Rout, I get an error Fatal error: cannot open file 'filename.R': No such file or directory
Also I'm not sure where my output (.csv files) should be written to. In my R script, there are some write.csv lines pointing the new datasets to be written to a specific directory but when I run R CMD BATCH filename.R in the Terminal, I don't see the files where they should be. I see a filename.Rout in another directory but I can't open it.

batch process for R gui

I have created a batch file to launch R scripts in Rterm.exe. This works well for regular weekly tasks. The < PBWeeklyMeetingScriptV3.R > is the R script run by Rterm.
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rterm.exe"
%R_TERM% --slave --no-restore --no-save --args 20120401 20110403 01-apr-12 03-apr-11 < PBWeeklyMeetingScriptV3.R > PBWeeklyMeetingScriptV3.batch 2> error.txt
I've tried to modify this to launch the R GUI instead of the background process as I'd like to inspect and potentially manipulate and inspect the data.
If I change my batch file to:
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rgui.exe"
the batch file will launch the R GUI but doesn't start the script. Is there a way to launch the script too?
Alternatively is there a way to save/load the work space image to access the variables that are created in the script?
You can save and load workspaces by using save.image() and load(). I do this all the time when scripting to pass data sets between two separate script files, tied together using Python or bash. At the end of each R script, just add:
save.image("Your_image_name.RData")
The image will be the workspace that existed whenever the command was run (so, if it's the last command in the file, it's the workspace right before the exist of the file). We also use this at my job to create "snapshots" of input and output data, so we can reproduce the research later. (We use a simple naming convention to get the time of run, and then label the files with that).
Not sure about launching and then running the GUI with specific scripts in it; I don't think that's a feature you'll find in R, simply because the whole point of running a batch file is usually to avoid the GUI. But hopefully, you can just save the image to disk, and then look at it or pass it to other programs as needed. Hope that helps!

Resources