I have created a batch file to launch R scripts in Rterm.exe. This works well for regular weekly tasks. The < PBWeeklyMeetingScriptV3.R > is the R script run by Rterm.
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rterm.exe"
%R_TERM% --slave --no-restore --no-save --args 20120401 20110403 01-apr-12 03-apr-11 < PBWeeklyMeetingScriptV3.R > PBWeeklyMeetingScriptV3.batch 2> error.txt
I've tried to modify this to launch the R GUI instead of the background process, as I'd like to inspect and potentially manipulate the data.
If I change my batch file to:
set R_TERM="C:\Program Files\R\R-2.14.0\bin\x64\Rgui.exe"
the batch file will launch the R GUI but doesn't start the script. Is there a way to launch the script too?
Alternatively is there a way to save/load the work space image to access the variables that are created in the script?
You can save and load workspaces by using save.image() and load(). I do this all the time when scripting to pass data sets between two separate script files, tied together using Python or bash. At the end of each R script, just add:
save.image("Your_image_name.RData")
The image will be the workspace that existed whenever the command was run (so, if it's the last command in the file, it's the workspace right before the script exits). We also use this at my job to create "snapshots" of input and output data, so we can reproduce the research later. (We use a simple naming convention to get the time of the run, and then label the files with that.)
Not sure about launching and then running the GUI with specific scripts in it; I don't think that's a feature you'll find in R, simply because the whole point of running a batch file is usually to avoid the GUI. But hopefully, you can just save the image to disk, and then look at it or pass it to other programs as needed. Hope that helps!
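A minimal sketch of the snapshot idea above, assuming a timestamp-based naming convention. The directory, the "snapshot_" prefix, and the variable weekly_total are made up for illustration; only save.image() and load() come from the answer itself.

```r
# Sketch: snapshot the workspace under a timestamped name, then reload it.
# `snapshot_dir`, the "snapshot_" prefix and `weekly_total` are hypothetical.
snapshot_dir <- tempdir()
stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
snapshot_file <- file.path(snapshot_dir, paste0("snapshot_", stamp, ".RData"))

weekly_total <- 42          # stand-in for data the batch script produced
save.image(snapshot_file)   # writes every object in the global workspace

# Later, in an interactive session (Rgui, RStudio, ...):
restored <- new.env()
loaded_names <- load(snapshot_file, envir = restored)  # returns the object names
print(loaded_names)
print(get("weekly_total", envir = restored))
```

Loading into a separate environment is optional; a plain load(snapshot_file) puts the objects straight into the current workspace, which is probably what you want when inspecting the data by hand.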
Related
I'm new to Spark and newer to R, and am trying to figure out how to 'include' other R-scripts when running spark-submit.
Say I have the following R script which "sources" another R script:
main.R
source("sub/fun.R")
mult(4, 2)
The second R script looks like this, which exists in a sub-directory "sub":
sub/fun.R
mult <- function(x, y) {
x*y
}
I can invoke this with Rscript and successfully get this to work.
Rscript main.R
[1] 8
However, I want to run this with Spark, and use spark-submit. When I run spark-submit, I need to be able to set the current working directory on the Spark workers to the directory which contains the main.R script, so that the Spark/R worker process will be able to find the "sourced" file in the "sub" subdirectory. (Note: I plan to have a shared filesystem between the Spark workers, so that all workers will have access to the files).
How can I set the current working directory that SparkR executes in such that it can discover any included (sourced) scripts?
Or, is there a flag/sparkconfig to spark-submit to set the current working directory of the worker process that I can point at the directory containing the R Scripts?
Or, does R have an environment variable that I can set to add an entry to the "R-PATH" (forgive me if no such thing exists in R)?
Or, am I able to use the --files flag to spark-submit to include these additional R-files, and if so, how?
Or is there generally a better way to include R scripts when run with spark-submit?
In summary, I'm looking for a way to include files with spark-submit and R.
Thanks for reading. Any thoughts are much appreciated.
I'm basically looking for any way to automatically run R scripts just as if I were copying and pasting them into the console. I've tried the package 'taskscheduleR', but it just seems to output to a log file in the directory, which isn't the same as running the script inside the RStudio application.
An example might be: say I want to get the last closing prices of 5 stocks each night. I'd like the script to run in RStudio so that the variables are available there, with all of the code kept in the script file.
Any thoughts?
I would suggest the built-in Task Scheduler application if you are using Windows.
Create a task that will run a batch script file. This batch file has only one line, which executes the R script you want. Set it to run each night (or at whatever time you want).
I am not that well-versed in Linux and MacOS, but here's what I know:
Linux has cron. Add a job to crontab with your preferred timing and have it execute your script: '/path/to/bin/Rscript /path/to/script.R'
MacOS has Automator + iCal (for scheduling). It also has crontab like Linux.
My goal is to use PowerShell to schedule the sourcing of an R script.
My current work flow is that I open RStudio, click the "Source" button in the upper right corner. Then I wait until it's finished, and close RStudio. I change nothing in the R script.
In PowerShell I've been using its Register-ScheduledJob cmdlet to kick off C# programs on a daily schedule. And here's the problem, I can't find an example of effectively using PowerShell to source an R script.
I believe the PowerShell script should probably use the Invoke-Expression cmdlet. But I'm not 100% sure.
To no avail I've tried this:
Start-Process "C:\Program Files\R\R-3.2.4revised\bin\x64\Rterm.exe" -RedirectStandardInput "C:\MyScript.R"
Also, I'd like to avoid the solution that uses CMD BATCH as that's defeating the purpose of using PowerShell.
If just sourcing the R script is what you're looking for, then one way to do it is something like this:
& "C:\Program Files\R\R-3.1.1\bin\Rscript.exe" "C:/Program Files/R/R-3.1.1/tests/demos.R"
where "C:\Program Files\R\R-3.1.1\bin\Rscript.exe" is the path to Rscript in your local R installation and "C:/Program Files/R/R-3.1.1/tests/demos.R" is the path to the script you'd normally source() directly in RStudio.
One thing to keep in mind: depending on the location of the files your R script needs, you might need to adjust your script with an appropriate setwd().
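As a rough sketch of the setwd() point, a script started through Rscript can try to work out its own directory from the "--file=" entry in commandArgs() and switch to it. The helper name script_dir is hypothetical, and this assumes the script is launched via Rscript; paths containing spaces may need extra handling on some R versions.

```r
# Sketch: find the directory of the script Rscript is running and make it
# the working directory. `script_dir` is a hypothetical helper name.
script_dir <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) {
    return(getwd())  # not started via Rscript; fall back to the current directory
  }
  script_path <- sub("^--file=", "", file_arg[1])
  dirname(normalizePath(script_path, mustWork = FALSE))
}

setwd(script_dir())
```

With this at the top of the script, relative paths inside it resolve against the script's own folder rather than wherever PowerShell happened to be when it launched Rscript.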
I'm effectively trying to tack save.image() onto the end of a script without modifying that script.
I was hoping something like Rscript target_script.R | saveR.R destination_path would work, where saveR.R reads,
args.from.usr<-commandArgs(TRUE)
setwd(args.from.usr[1])
save.image(file=".RData")
But that clearly does not work. Any alternatives?
You can write an R script file that takes two parameters: 1, the script file you want to run, and 2, the file you want to save the image to.
# runAndSave.R ------
args.from.usr <- commandArgs(trailingOnly=TRUE)
source(args.from.usr[1])
setwd(args.from.usr[2])
save.image(file=".RData")
And then run it with
Rscript runAndSave.R target_script.R destination_path
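A possible variation on runAndSave.R, offered only as a sketch: run the target script in its own environment and save just the objects it created, so that helper variables such as args.from.usr do not end up in the image. The function name run_and_save is hypothetical, and here the second argument is treated as the image file path rather than a directory.

```r
# Sketch: run a script in its own environment and save only what it created.
# `run_and_save` is a hypothetical helper, not part of the answer above.
run_and_save <- function(script, image_file) {
  env <- new.env(parent = globalenv())
  sys.source(script, envir = env)  # evaluate the script inside `env`
  save(list = ls(env, all.names = TRUE), envir = env, file = image_file)
  invisible(image_file)
}

# When called as: Rscript thisFile.R target_script.R output.RData
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) {
  run_and_save(args[1], args[2])
}
```

One difference to be aware of: sys.source() evaluates in the given environment, whereas source() as used above evaluates in the global workspace by default, so scripts that rely on global side effects may behave slightly differently.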
You could schedule this as a task in the operating system of that computer. On Linux you would use cron from the terminal; on Windows you can use Task Scheduler. If the scheduled task opens a terminal, runs the script, and then saves the image, you may get what you need: the data generated by the script gets saved without actually modifying the script.
I've just set up an R script to run on my Windows machine - I'm trying to have it save a dataframe on the working directory (which I know from getwd()).
I can see from the Task Scheduler that the script must be running, as it saves the last run time. However, when I check the wd for new time stamps on the dataframes I'm trying to save, they haven't updated. (I save over them each time, or at least that's what I'd like to do; I manually saved them in there to start with.)
I'm using this on the scheduler:
"C:\Program Files\R\R-2.13.1\bin\R.exe" CMD BATCH --vanilla --slave "C:\my projects\my_script.R"
That appears to be working, but can anyone offer a reason as to why the script that I call doesn't seem to be saving my NEW DF to the wd? I'm using this command to save the DF:
write.table(m23,file="m23.csv",sep=",",row.names=F)
so DF m23 should get updated every day in the wd when the scheduler calls the script at 6am?
Paul.
Are you sure you know what the current working directory is when the scheduler runs the script? My guess is it might not be what you think it is. I would look at the answers to this question: Rscript: Determine path of the executing script, especially the suggestion about using commandArgs to figure out where you are. Or you can explicitly set the working directory in your script with setwd().
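One way to take the working directory out of the picture, sketched under assumptions: build the output path explicitly and log getwd() so you can see where the scheduler actually started the script. The out_dir location and the contents of m23 below are placeholders; in the real script out_dir would be the folder you expect the CSV to land in.

```r
# Sketch: write to an explicit directory instead of relying on getwd().
# `out_dir` is a hypothetical location; `m23` here is stand-in data.
out_dir <- file.path(tempdir(), "scheduled_output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Record where the scheduler actually started the script, for debugging.
message("Working directory at run time: ", getwd())

m23 <- data.frame(id = 1:3, value = c(10.5, 20.1, 30.7))
out_file <- file.path(out_dir, "m23.csv")
write.table(m23, file = out_file, sep = ",", row.names = FALSE)
```

If the logged directory turns out to be something like the system folder rather than your project folder, that would explain why the old m23.csv never gets a new time stamp.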