How do I pass the environment of one R script to another?

I'm effectively trying to tack save.image() onto the end of a script without modifying that script.
I was hoping something like Rscript target_script.R | saveR.R destination_path would work, where saveR.R reads,
args.from.usr <- commandArgs(TRUE)
setwd(args.from.usr[1])
save.image(file=".RData")
But that clearly does not work. Any alternatives?

You can write an R script file that takes two parameters: first, the script file you want to run, and second, the directory you want to save the image to.
# runAndSave.R ------
args.from.usr <- commandArgs(trailingOnly=TRUE)
source(args.from.usr[1])
setwd(args.from.usr[2])
save.image(file=".RData")
And then run it with
Rscript runAndSave.R target_script.R destination_path
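If you would rather not change the working directory, a minimal variation is to build the destination path directly (this sketch assumes, as above, that the second argument is an existing directory):
# runAndSave.R, variant without setwd() ------
args.from.usr <- commandArgs(trailingOnly=TRUE)
source(args.from.usr[1])                                 # run the target script; its objects land in the global environment
save.image(file=file.path(args.from.usr[2], ".RData"))   # save the workspace into the given directory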

You could also schedule the task at the operating-system level. On Linux that is done from the terminal with cron; on Windows you can use Task Scheduler. If you have the OS open a terminal, run the script, and then save the image, you get what you need: the data generated by the script is saved without the script itself being modified.
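For example, a cron job (or a manual call) could run a one-liner with Rscript that sources the unmodified script and then saves the image; target_script.R and destination_path below are just the placeholders from the question:
Rscript -e 'source("target_script.R"); save.image(file="destination_path/.RData")'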

Related

SparkR: source() other R files in an R Script when running spark-submit

I'm new to Spark and newer to R, and am trying to figure out how to 'include' other R-scripts when running spark-submit.
Say I have the following R script which "sources" another R script:
main.R
source("sub/fun.R")
mult(4, 2)
The second R script, which lives in the sub-directory "sub", looks like this:
sub/fun.R
mult <- function(x, y) {
  x * y
}
I can invoke this with Rscript and successfully get this to work.
Rscript main.R
[1] 8
However, I want to run this with Spark, and use spark-submit. When I run spark-submit, I need to be able to set the current working directory on the Spark workers to the directory which contains the main.R script, so that the Spark/R worker process will be able to find the "sourced" file in the "sub" subdirectory. (Note: I plan to have a shared filesystem between the Spark workers, so that all workers will have access to the files).
How can I set the current working directory that SparkR executes in such that it can discover any included (sourced) scripts?
Or, is there a flag/sparkconfig to spark-submit to set the current working directory of the worker process that I can point at the directory containing the R Scripts?
Or, does R have an environment variable that I can set to add an entry to the "R-PATH" (forgive me if no such thing exists in R)?
Or, am I able to use the --files flag to spark-submit to include these additional R-files, and if so, how?
Or is there generally a better way to include R scripts when run with spark-submit?
In summary, I'm looking for a way to include files with spark-submit and R.
Thanks for reading. Any thoughts are much appreciated.
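One workaround that sometimes helps in plain R (offered only as a sketch; whether it applies under spark-submit depends on how the worker launches the script) is to have main.R compute its own directory from the --file= argument that Rscript passes, and then source() relative to that, so the current working directory no longer matters:
# main.R, locating its own directory before sourcing ------
args <- commandArgs(trailingOnly=FALSE)
file.arg <- grep("^--file=", args, value=TRUE)
script.dir <- dirname(normalizePath(sub("^--file=", "", file.arg)))
source(file.path(script.dir, "sub", "fun.R"))
mult(4, 2)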

Automating R to run scripts

I'm basically looking for a way to run R scripts automatically, just as they would run if I copied and pasted them into the console. I've tried the package 'taskscheduleR', but it only seems to write output to a log file in the directory, which is not the same as running the script inside the RStudio application.
For example, say I want to get the last closing prices of 5 stocks each night: all of the code would be in the script file, and after the run the resulting variables would be there in the session, just as if I had run the script in RStudio.
Any thoughts?
I would suggest the built-in Task Scheduler application if you are using Windows.
Create a task that runs a batch script file. This batch script file has only one line, which executes the Rscript you want. Set it to run each night (or whatever time you want).
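For example, the batch script might contain nothing but a single line like the following (the R version and paths are placeholders; point them at your own installation and script):
"C:\Program Files\R\R-3.6.3\bin\Rscript.exe" "C:\path\to\your_script.R"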
I am not that well-versed in Linux and macOS, but here's what I know:
Linux has cron. Add a job to crontab with your preferred timing and have it execute your script, e.g. 'path/to/bin/r /path/to/script.r'.
macOS has Automator + iCal (for scheduling). It also has crontab, like Linux.
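As a sketch (the nightly 18:00 schedule, the Rscript location, and the script path are all placeholders), such a crontab entry might look like:
# run the script every night at 18:00
0 18 * * * /usr/bin/Rscript /path/to/script.R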

Running Plink from within R

I'm trying to use a Windows computer to SSH into a Mac server, run a program, and transfer the output data back to my Windows. I've been able to successfully do this manually using Putty.
Now, I'm attempting to automate the process using Plink. I've added Plink to my Windows PATH, so if I open cmd and type in a command, I can successfully log in and pass commands to the server.
However, I'd like to automate this from R, to streamline the data analysis process. Based on some searching, the internet seems to think the shell command is best suited to this task. Unfortunately, shell doesn't seem to find Plink, although passing other commands through shell to the terminal works.
If I try the same thing but set the path to Plink manually within shell, no output is returned and the commands do not seem to run (e.g. TESTFOLDER is not created).
Does anyone have any ideas for why Plink is unavailable when I try to call it from R? Alternatively, if there are other ideas for how this could be accomplished in R, that would also be appreciated.
Thanks in advance,
-sam
I came here looking for an answer to this question, so I only have so much to offer, but I think I managed to get PLINK's initial steps to work in R using the shell function...
This is what worked for me:
NOT in R:
Install PLINK and add its location to your PATH.
Download the example files from PLINK's tutorial (http://pngu.mgh.harvard.edu/~purcell/plink/tutorial.shtml) and put them in a folder whose path contains NO spaces (unless you know something I don't, in which case, space it up).
Then, in R:
## Set your working directory as the path to the PLINK program files: ##
setwd("C:/Program Files/plink-1.07-dos")
## Use shell to check that you are now in the right directory: ##
shell("cd")
## At this point, the command "plink" should at least be recognized
# (though you may get a different error)
shell("plink")
## Open the PLINK example files ##
# FYI mine are in "C:/PLINK/", so replace that accordingly...
shell("plink --file C:\\PLINK\\hapmap1")
## Make a binary PED file ##
# (provide the full path, not just the file name)
shell("plink --file C:\\PLINK\\hapmap1 --make-bed --out C:\\PLINK\\hapmap1")
... and so on.
That's all I've done so far. But with any luck, mirroring the structure and general format of those lines of code should allow you to do what you like with PLINK from within R.
Hope that helps!
PS. The PLINK output should just print in your R console when you run the lines above.
All the best,
- CC.
Just saw Caitlin's response and it reminded me I hadn't ever updated with my solution. My approach was kind of a workaround, rather than solving my specific problem, but it may be useful to others.
After adding Plink to my PATH, I created a batch script in Windows which contained all my Plink commands, then called the batch script from R using the shell command:
So, in R:
shell('BatchScript.bat')
The batch script contained all my commands that I wanted to use in Plink:
:: transfer file to phosphorus
pscp C:\Users\Sam\...\file zipper@144.**.**.208:/home/zipper/
:: open connection to Dolphin using plink
plink -ssh zipper@144.**.**.208 Batch_Script_With_Remote_Machine_Commands.bat
:: transfer output back to local machine
pscp zipper@144.**.**.208:/home/zipper/output/ C:\Users\Sam\..\output\
Hope that helps someone!
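Not from the original answers, but another option to consider: if shell() cannot see plink because R inherits a different PATH than your interactive cmd session, calling the executable by its full path through system2() sidesteps the problem. The path, host, and remote command below are placeholders:
# the location of plink.exe is an assumption -- adjust to wherever PuTTY is installed
plink.exe <- "C:/Program Files/PuTTY/plink.exe"
out <- system2(plink.exe, args=c("-ssh", "user@host", "ls -l"), stdout=TRUE)  # capture remote output as a character vector
print(out)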

Problems scheduling an R Script and saving data to the wd

I've just set up an R script to run on my Windows machine. I'm trying to have it save a data frame in the working directory (which I know from getwd()).
I can see from the Task Scheduler that the script must be running, since the last run time is recorded; however, when I check the working directory for new time stamps on the data frames I'm trying to save, they haven't updated. (I save over them each time, or at least that's what I'd like to do; I manually saved them there to start with.)
I'm using this on the scheduler:
"C:\Program Files\R\R-2.13.1\bin\R.exe" CMD BATCH --vanilla --slave "C:\my projects\my_script.R"
That appears to be working, but can anyone offer a reason why the script that I call doesn't seem to be saving my new data frame to the working directory? I'm using this command to save it:
write.table(m23, file="m23.csv", sep=",", row.names=FALSE)
so the data frame m23 should get updated every day in the working directory when the scheduler calls the script at 6am?
Paul.
Are you sure you know what the current working directory is when the scheduler runs the script? My guess is it might not be what you think it is. I would look at the answers to this question: Rscript: Determine path of the executing script, especially the suggestion about using commandArgs to figure out where you are. Or you can explicitly set the working directory in your script with setwd(), as sketched below.
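For instance, making the location explicit in the script removes the dependence on whatever working directory the scheduler happens to use; the folder below is just the one from your batch command and may need adjusting:
setwd("C:/my projects")   # or skip setwd() and give write.table() the full path
write.table(m23, file="m23.csv", sep=",", row.names=FALSE)
# equivalently: write.table(m23, file="C:/my projects/m23.csv", sep=",", row.names=FALSE)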

batch process for R gui

I have created a batch file to launch R scripts in Rterm.exe. This works well for regular weekly tasks; PBWeeklyMeetingScriptV3.R is the R script run by Rterm.
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rterm.exe"
%R_TERM% --slave --no-restore --no-save --args 20120401 20110403 01-apr-12 03-apr-11 < PBWeeklyMeetingScriptV3.R > PBWeeklyMeetingScriptV3.batch 2> error.txt
I've tried to modify this to launch the R GUI instead of the background process, as I'd like to inspect and potentially manipulate the data.
If I change my batch file to:
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rgui.exe"
the batch file will launch the R GUI but doesn't start the script. Is there a way to launch the script too?
Alternatively is there a way to save/load the work space image to access the variables that are created in the script?
You can save and load workspaces by using save.image() and load(). I do this all the time when scripting to pass data sets between two separate script files, tied together using Python or bash. At the end of each R script, just add:
save.image("Your_image_name.RData")
The image will be the workspace that existed when the command was run (so, if it's the last command in the file, it's the workspace right before the script exits). We also use this at my job to create "snapshots" of input and output data, so we can reproduce the research later. (We use a simple naming convention based on the time of the run and label the files with that.)
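For instance, a time-stamped file name (just a sketch of that sort of convention) can be built with format() and Sys.time():
save.image(file=paste0("snapshot_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".RData"))  # e.g. snapshot_20240101_180000.RData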
Not sure about launching the GUI and then running specific scripts in it; I don't think that's a feature you'll find in R, simply because the whole point of running a batch file is usually to avoid the GUI. But hopefully you can just save the image to disk, then load it back in (see the snippet below) or pass it to other programs as needed. Hope that helps!
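To pick the work back up in an interactive R GUI session, just load the saved image (using the file name from the example above) and inspect what came back:
load("Your_image_name.RData")   # restores the saved objects into the current workspace
ls()                            # list the restored objects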
