R: Startup script on cluster

I would like R to run a script every time I start R. I am using R on a Linux cluster where I do not have access to the R installation or its site-wide .Rprofile file, and I cannot find that file either. I can run R and I have my local packages/libraries. For example, I would like to set the library paths each time R is started:
.libPaths("path")
Is it possible to define a custom startup script file and tell R where to find it?
In a non-interactive session, I could do something like this from Bash:
Rscript script.R
and my script.R has
.libPaths('path')
...
more code
...
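Note that R also reads a user profile at startup even without access to the installation: the file named by the R_PROFILE_USER environment variable if that is set, otherwise a .Rprofile in the current directory or in your home directory (it is skipped only if R is started with --no-init-file or --vanilla). A minimal sketch of such a file, reusing the placeholder path from above:
# ~/.Rprofile (or the file pointed to by R_PROFILE_USER)
# Sourced automatically at startup, so the local library is always on the search path.
.libPaths("path")   # placeholder path, as in the question
message("User profile loaded; library paths: ", paste(.libPaths(), collapse = ", "))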

Related

Calling bash from within R

I have R generating some .csv files for a Python program that runs in another folder. I know it is possible to call Bash from R, but how can I call the make command on my Ubuntu virtual machine in another directory?
The simple way is to create a script that cds to your directory and then executes make:
# Write a one-line shell script to a temporary file, then run it with bash
script <- tempfile()
fhandle <- file(script)
writeLines("( cd /your_directory && make )", con = fhandle)
system2("/bin/bash", args = c(script))
You may need to find the correct path to /bin/bash; mine is from macOS.
You can use system2()'s arguments (such as stdout, stderr, and wait) to control what happens to the output of the make command and whether the process runs in parallel with your R task or blocks until completion.
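Alternatively, since make itself accepts a -C flag to change directory before reading the makefile, you can skip the temporary script entirely; a minimal sketch (same placeholder directory as above):
# Run make in another directory; stdout = TRUE captures its output as a character vector,
# and wait = TRUE (the default) blocks until make finishes.
out <- system2("make", args = c("-C", "/your_directory"), stdout = TRUE)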

SparkR: source() other R files in an R Script when running spark-submit

I'm new to Spark and newer to R, and am trying to figure out how to 'include' other R-scripts when running spark-submit.
Say I have the following R script which "sources" another R script:
main.R
source("sub/fun.R")
mult(4, 2)
The second R script, which lives in the subdirectory "sub", looks like this:
sub/fun.R
mult <- function(x, y) {
  x * y
}
I can invoke this with Rscript and it works:
Rscript main.R
[1] 8
However, I want to run this with Spark, and use spark-submit. When I run spark-submit, I need to be able to set the current working directory on the Spark workers to the directory which contains the main.R script, so that the Spark/R worker process will be able to find the "sourced" file in the "sub" subdirectory. (Note: I plan to have a shared filesystem between the Spark workers, so that all workers will have access to the files).
How can I set the current working directory that SparkR executes in such that it can discover any included (sourced) scripts?
Or, is there a flag/sparkconfig to spark-submit to set the current working directory of the worker process that I can point at the directory containing the R Scripts?
Or, does R have an environment variable that I can set to add an entry to the "R-PATH" (forgive me if no such thing exists in R)?
Or, am I able to use the --files flag to spark-submit to include these additional R-files, and if so, how?
Or is there generally a better way to include R scripts when run with spark-submit?
In summary, I'm looking for a way to include files with spark-submit and R.
Thanks for reading. Any thoughts are much appreciated.
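One workaround that sidesteps the working-directory question is to resolve paths relative to the script itself rather than to wherever the worker process happens to start. A minimal sketch, assuming the driver ultimately launches main.R via Rscript so that a --file=... entry shows up in commandArgs(); this is a generic R idiom, not a SparkR-specific API:
# main.R -- source sub/fun.R relative to this script's own location,
# so the lookup works regardless of the current working directory.
full_args  <- commandArgs(trailingOnly = FALSE)
file_arg   <- grep("^--file=", full_args, value = TRUE)
script_dir <- if (length(file_arg) > 0) dirname(sub("^--file=", "", file_arg[1])) else "."
source(file.path(script_dir, "sub", "fun.R"))
mult(4, 2)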

How to use local R installation on HPC with qsub

I work on a cluster where it is not possible to install a specific R version globally. Given that I built a specific R version in the folder:
<generic_path>/R/R-X.Y.Z
and I installed some packages locally in:
<generic_path>/R/packages
how can I set, in a shell script (bash), the environment variables and aliases to run this specific R version, loading the packages from the local package directory?
Option 1:
When using a shell script for HPC (in my case a qsub script), this is possible by sourcing a bash script that contains the following lines:
alias R="<path_to_R>/R/R-X.Y.Z/bin/R"
export R_LIBS="<path_to_R>/R/packages"
export PATH="<path_to_R>/R/R-X.Y.Z/bin:${PATH}"
The script (here I named it makeenv.sh) may be run inside the qsub script with:
source makeenv.sh
Option 2: Depending on your HPC system, you may have the module avail and module load commands available; if so, use:
myBsubFile.sh
#!/bin/bash
# some #BSUB headers...
# ...
module load /R/R-X.Y.Z
Rscript myRcode.R
Then load the libraries in the R script as:
myRcode.R
library("data.table", lib.loc = "path/to/my/libs")
# some more R code...
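Alternatively, rather than passing lib.loc to every library() call, you can prepend the local package directory to the search path once at the top of the script (the same effect as the R_LIBS export in Option 1); a minimal sketch reusing the placeholder path from the question:
# myRcode.R -- make the locally installed packages visible to all library() calls.
.libPaths(c("<path_to_R>/R/packages", .libPaths()))
library(data.table)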

Run shiny application with arguments in terminal

As the title says, I'd like to run a Shiny app with parameters. I need to specify the path of the database file from which the app will grab its data. The problem is that the file changes sometimes, so I have to modify the file path every time.
This is the command I use when running the application from the terminal:
R -e "shiny::runApp('../Shiny_visualization')"
I tried
R -e "shiny::runApp('../Shiny_visualization')" --args 'db_path'
but I got an error.
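One way around the quoting and --args issues with R -e is to pass the path through an environment variable and read it inside the app; a minimal sketch, where DB_PATH is a hypothetical variable name and the default value is only an example:
# At the top of the app (e.g. in global.R), read the database path from the environment.
db_path <- Sys.getenv("DB_PATH", unset = "/default/path/to/db.sqlite")   # DB_PATH is an assumed name
# ... load the data from db_path ...
You would then launch the app with something like DB_PATH='/new/db/file' R -e "shiny::runApp('../Shiny_visualization')".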

Running R file that has Dependencies From Linux Command Line

Let's say that I have a file file1.R that defines a function getMe().
Now I want to run file2.R, which calls getMe().
Do I need to run
R CMD BATCH file1.R
every time before I run
R CMD BATCH file2.R ?
Or will R somehow be able to determine that file2.R calls a function that is defined in file1.R?
What's the standard way to run a file that depends on functions defined in other files?
You need to source file1.R so that the functions defined there become available to other scripts.
source('file1.R')
You can do this at the top of your file2.R. Then simply run
R CMD BATCH file2.R
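For concreteness, a minimal file2.R might look like the following (getMe() comes from the question; its arguments, if any, are not specified, so it is called without any here):
# file2.R
source("file1.R")   # assumes file1.R sits in the working directory of the batch job
getMe()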
