Calling bash from within R

I have R generating some .csv files for a Python program that runs in another folder. I know it is possible to call bash from R, but how can I run the make command in another directory on my Ubuntu virtual machine?

The simple way is to create a script that cds into your directory and then execs make:
script <- tempfile()                    # path for a throwaway script file
fhandle <- file(script)
writeLines("( cd /your_directory && make )", con = fhandle)  # subshell: cd, then run make
system2("/bin/bash", args = c(script))  # execute the script with bash
You may need to find the correct path to /bin/bash; mine is from macOS.
You can use system2's other arguments to control what happens to the output of the make command, and whether the process runs in parallel with your R task or waits for completion.
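As a minimal variation on the same idea, you can skip the temporary file and hand the compound command straight to bash with its -c flag; shQuote() keeps the whole command as a single argument:
# same effect without a temporary script file
system2("/bin/bash", args = c("-c", shQuote("cd /your_directory && make")))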

Related

Running system() with git-bash in R

I checked "How to execute git-bash command with system() or shell() in R", but it didn't exactly solve the problem for me. I'm very new to R, I'm using it on Windows, and I'm modifying another project; I suspect this project was originally written for a different OS.
At one point, the main script calls a .sh file from inside a for loop, using system() to run it. The .sh file creates a new directory, copies files from one directory to the other, and modifies them slightly (removes the first row and adds another).
The part of the code that calls the file goes like this (run_this.sh is the file we want to run):
directory_name = "data/clean"
for (i in 1:n) {
  filename = sprintf("%s.json", i)
  cmd = sprintf("run_this.sh %s %s", filename, directory_name)
  system(cmd)
}
I suspect system() calls the command prompt on Windows, which I've checked doesn't run this .sh file. I've found I can run the scripts from Git Bash, but I'd have to do them one by one that way, and since n is large this doesn't work very well for me.
So, I'm wondering:
1) Is there any way to direct system() or system2() to use Git Bash from inside R? (I have added Git Bash to my environment variables.)
2) Any other possible solutions to run .sh files from the command prompt?
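On question 1), one hedged possibility is to point system2() directly at Git Bash's bash.exe. The path below is the default Git for Windows install location, which is an assumption; adjust it to wherever bash.exe lives on your machine.
# sketch: run each .sh file through Git Bash instead of the command prompt
bash_exe <- "C:/Program Files/Git/bin/bash.exe"  # assumed install path
directory_name <- "data/clean"
for (i in 1:n) {
  filename <- sprintf("%s.json", i)
  system2(bash_exe, args = c("run_this.sh", filename, directory_name))
}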

Schedule multiple R scripts to run sequentially

I have multiple scripts named R001.r, R002.r, and so on. I need to schedule them so that they run sequentially, one after the other. What would be the best approach to do this?
I think you want to wrap your R scripts in a caller shell script and then invoke it from the terminal. Here is what I would do.
Open any available text editor and fill a new file with the following commands:
#!/bin/sh
Rscript R001.r
Rscript R002.r
Rscript R003.r
...
Save this file as something like call_my_scripts. You can then execute it via standard Unix shell commands as follows:
./call_my_scripts
This will run the scripts sequentially by definition. Make sure you give the file execute permission before you invoke it:
chmod u+x call_my_scripts
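If you would rather have the chain stop as soon as one script fails, a small variation is to join the commands with &&, which short-circuits on a non-zero exit status:
Rscript R001.r && Rscript R002.r && Rscript R003.r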

SparkR: source() other R files in an R Script when running spark-submit

I'm new to Spark and even newer to R, and am trying to figure out how to 'include' other R scripts when running spark-submit.
Say I have the following R script which "sources" another R script:
main.R
source("sub/fun.R")
mult(4, 2)
The second R script looks like this, which exists in a sub-directory "sub":
sub/fun.R
mult <- function(x, y) {
  x * y
}
I can invoke this with Rscript and it works:
Rscript main.R
[1] 8
However, I want to run this with Spark, and use spark-submit. When I run spark-submit, I need to be able to set the current working directory on the Spark workers to the directory which contains the main.R script, so that the Spark/R worker process will be able to find the "sourced" file in the "sub" subdirectory. (Note: I plan to have a shared filesystem between the Spark workers, so that all workers will have access to the files).
How can I set the current working directory that SparkR executes in such that it can discover any included (sourced) scripts?
Or, is there a flag/sparkconfig to spark-submit to set the current working directory of the worker process that I can point at the directory containing the R Scripts?
Or, does R have an environment variable that I can set to add an entry to the "R-PATH" (forgive me if no such thing exists in R)?
Or, am I able to use the --files flag to spark-submit to include these additional R-files, and if so, how?
Or is there generally a better way to include R scripts when run with spark-submit?
In summary, I'm looking for a way to include files with spark-submit and R.
Thanks for reading. Any thoughts are much appreciated.
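One hedged sketch for the --files route, assuming the standard spark-submit behavior applies to R jobs: files passed with --files are copied into the working directory of each executor (and of the driver in cluster deploy mode), so fun.R would land next to main.R on the workers:
# ship fun.R with the job; it is copied into each worker's working directory
spark-submit --deploy-mode cluster --files sub/fun.R main.R
With the file shipped that way, the include in main.R would become source("fun.R") rather than source("sub/fun.R").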

How do I pass the environment of one R script to another?

I'm effectively trying to tack save.image() onto the end of a script without modifying that script.
I was hoping something like Rscript target_script.R | saveR.R destination_path would work, where saveR.R reads:
args.from.usr <- commandArgs(TRUE)
setwd(args.from.usr[1])
save.image(file = ".RData")
But that clearly does not work. Any alternatives?
You can write an R script file that takes two parameters: (1) the script file you want to run, and (2) the directory you want to save the image to.
# runAndSave.R ------
args.from.usr <- commandArgs(trailingOnly = TRUE)
source(args.from.usr[1])     # run the target script in this session
setwd(args.from.usr[2])      # move to the destination directory
save.image(file = ".RData")  # snapshot the resulting workspace there
And then run it with
Rscript runAndSave.R target_script.R destination_path
You could also schedule a task within the OS of that computer. On Linux you would use the terminal and a tool called cron; on Windows you can use Task Scheduler. If you have the OS open a terminal, run the script, and then save the image, you get what you need: the data generated by the script is saved without the script itself being modified.
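For example, a hypothetical crontab entry (all paths are assumptions) that runs the runAndSave.R wrapper above every night at 02:00:
# m h dom mon dow  command
0 2 * * * Rscript /home/me/runAndSave.R /home/me/target_script.R /home/me/output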

Any way to schedule an R program to run daily?

I am using an R program to collect and update data from some local and online sources, which are updated frequently.
Since these sources are fixed, there are no arguments to pass to the program; everything is routine.
Now my supervisor wants me to set this up as a scheduled daily task. I know this is impossible for a .r file directly. Is there any way to compile the R file into an executable, such as a .exe or .bat?
I don't need the executable to be standalone; I can keep R on my computer.
Any suggestion is appreciated.
You need to use the standard OS facilities (cron/at on Unix) to run R with the appropriate argument.
E.g., if you add the functions you need to .Rprofile, you can do
R --no-save --no-restore -q -e 'MyFunc(my,args)'
Alternatively, you might want to use Batch Execution of R.
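Concretely, a hypothetical crontab line for a daily 06:00 run of that command, appending output to a log (the path and time are assumptions):
0 6 * * * R --no-save --no-restore -q -e 'MyFunc(my,args)' >> "$HOME/myfunc.log" 2>&1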
For Windows, I have hundreds of scripts set up with .bat files similar to the one below. It assumes you have a NameOfScript.bat and a NameOfScript.r in the same folder; you run the .bat file from Task Scheduler, and it logs everything from stdout/stderr to NameOfScript_yyyy-mm-dd.log in the same folder. I normally keep the log folder separate, but that can be changed just by redefining LOG_FILE. The script also passes its own folder to R, in case you need to output files there.
:: Locate Rscript.exe under 64- or 32-bit Program Files
IF DEFINED ProgramFiles(x86) (
    SET R_SCRIPT="%ProgramFiles(x86)%\R\R-2.15.2\bin\Rscript.exe"
) ELSE (
    SET R_SCRIPT="%ProgramFiles%\R\R-2.15.2\bin\Rscript.exe"
)
IF NOT EXIST %R_SCRIPT% GOTO FAIL
:: Folder this .bat lives in, with backslashes doubled for R
SET SCRIPT_DIR=%~dp0
SET SCRIPT_DIR=%SCRIPT_DIR:\=\\%
:: Derive the .r file name from the .bat file name
SET BATCH_FILE=%0
SET BATCH_FILE=%BATCH_FILE:"=%
SET SCRIPT_TO_RUN="%BATCH_FILE:.bat=.r%"
:: Build a yyyy-mm-dd stamp (assumes a dd/mm/yyyy system date format)
SET day=%DATE:~0,2%
SET month=%DATE:~3,2%
SET year=%DATE:~6,4%
SET yyyymmdd=%year%-%month%-%day%
SET LOG_FILE="%BATCH_FILE:.bat=%"_%yyyymmdd%.log
SET SCRIPT_DIR="%SCRIPT_DIR%"
:: Run the script, appending stdout and stderr to the log
%R_SCRIPT% --internet2 --max-mem-size=2047M --no-restore --no-save --args %SCRIPT_DIR% < %SCRIPT_TO_RUN% >> %LOG_FILE% 2>&1
PAUSE
EXIT /B 0
:FAIL
ECHO RScript not found. Failed process
You could also call the R script from C#, and run the C# project as a scheduled task.
