I want an R script to continuously run and check for files in a folder and do something with those files.
The code simply checks for a file, then moves the file somewhere else and renames it, deleting the old file (in reality it's a bit more elaborate than this).
If I run the script it works fine, but I want R to detect new files automatically. In other words, is there a way to have R run the script continuously so that I don't have to run it by hand every time I put files in that folder?
In pure R you just need an infinite repeat loop...
repeat {
  print('Checking files')
  # Your code to do file manipulation
  Sys.sleep(time=5) # to stop execution for 5 sec
}
However there may be better tools suitable to do this kind of file manipulation depending on your OS.
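As a minimal sketch of what the body of that loop could look like, the following moves and renames every file found in a folder. The function names (process_incoming, watch_folder), the "processed_" prefix and the max_iter argument are assumptions made for illustration; they are not part of the original question.

```r
# Move every file from `inbox` to `outbox`, renaming it with a prefix.
# All names here are hypothetical; adapt them to your own folders.
process_incoming <- function(inbox, outbox, prefix = "processed_") {
  files <- list.files(inbox, full.names = TRUE)
  files <- files[!dir.exists(files)]          # skip sub-directories
  for (f in files) {
    target <- file.path(outbox, paste0(prefix, basename(f)))
    # file.rename() moves and renames in one step when it succeeds
    if (!suppressWarnings(file.rename(f, target))) {
      # fall back to copy + delete, e.g. across file systems
      if (file.copy(f, target)) file.remove(f)
    }
  }
  invisible(basename(files))
}

# Poll the folder every `interval` seconds; max_iter = Inf runs forever.
watch_folder <- function(inbox, outbox, interval = 5, max_iter = Inf) {
  i <- 0
  while (i < max_iter) {
    process_incoming(inbox, outbox)
    Sys.sleep(interval)
    i <- i + 1
  }
}
```

Calling watch_folder("incoming", "done") would then keep polling until the R process is stopped; the max_iter argument exists only so the loop can be tried out for a fixed number of rounds.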
You can use the function tclTaskSchedule from the tcltk2 package to schedule a function or expression to run on a regular interval. You can have multiple such tasks scheduled and still work in the R session (just be careful not to modify something that the scheduled task could also modify or you can get unpredictable results).
That said, an OS-based solution that runs a given R script on a schedule may still be a better approach.
I recently started looking into Makefiles to keep track of the scripts inside my research project. To really understand what is going on, I would like to understand the contents of .Rout files produced by R CMD BATCH a little better.
Christopher Gandrud is using a Makefile for his book Reproducible research with R and RStudio. The sample project (https://github.com/christophergandrud/rep-res-book-v3-examples/tree/master/data) has only three .R files: two of them download and clean data, the third one merges both datasets. They are invoked by the following lines of the Makefile:
# Key variables to define
RDIR = .
# Run the RSOURCE files
$(RDIR)/%.Rout: $(RDIR)/%.R
	R CMD BATCH $<
Neither of the first two scripts writes any data to disk, nor does the merge script explicitly import data; it just uses the objects created in the first two scripts. So how is the data preserved between the scripts?
To me it seems like the batch execution happens within the same R environment, preserving both objects and loaded packages. Is this really the case? And is it the .Rout file that transfers the objects from one script to the other or is it a property of the batch execution itself?
If the working environment is really preserved between the scripts, I see a lot of potential for issues if there are objects with the same names or functions with the same names from different packages. Another issue of this setup seems to be that the Makefile cannot propagate changes in the first two files downstream because there is no explicit input/prerequisite for the merge script.
I would appreciate learning whether my intuition is right and whether there are better ways to execute R files in a Makefile.
By default, R CMD BATCH restores any existing workspace at startup and saves the workspace to a hidden .RData file when it finishes, unless you pass --no-restore and --no-save. That is how the objects get from one script to the next, and it is also why it's not really the recommended way to run R scripts. The recommended way is Rscript, which will not save by default; you must write code explicitly to save if that's what you want. This is different from the .Rout file, which should only contain the output of the commands run in the script.
In this case, execution doesn't happen in the exact same environment. R is still called three times, but that environment is serialized and reloaded between each run.
You are correct that saving and reloading workspaces by default can cause a lot of problems. That's why most people recommend that you do not do it. In this case, the author presumably figured it made their workflow easier, so they used it. In general, though, it would be better to be explicit about input and output files.
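A minimal sketch of what being explicit could look like: the cleaning script writes its result with saveRDS() and the merge script reads it with readRDS(). The file name clean_data.rds and the toy data frame are assumptions for illustration only; they do not come from the book's project.

```r
# Hypothetical output file; in a real project it would sit in the data directory.
clean_path <- file.path(tempdir(), "clean_data.rds")

# --- end of the cleaning script: write the result explicitly ---
clean_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
saveRDS(clean_data, clean_path)

# --- start of the merge script: read the input explicitly ---
merged_input <- readRDS(clean_path)
```

With this layout the .rds file can be named as the target of the cleaning rule and as a prerequisite of the merge rule, so make can see that the merge step depends on the cleaning step, and each script can be run with Rscript without relying on a saved workspace.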
For running an Rnw file in RStudio, one can compile or run all. Compiling does not see the variables in the current environment, and the current environment does not see the variables created while compiling. I would like to see how the output would look when I compile, and I debug the code using the environment. This requires me to compile and run, which performs the same calculations twice, which is very impractical for large projects. Is there a way to compile and have the output be seen in the environment?
When you knit a document, the work happens in a different R session, which is why you can't examine the results in the current session.
But you have a lot of choices besides run all. Take a look at the Run button: it allows you to run chunks one at a time, or run all previous chunks, etc.
If some of your chunks take too long to run, then you should consider organizing your work differently. Put the long computations into their own script, and save the results of that script using save(). Run it once, then spend time editing the display of those results in multiple runs in the main .Rnw document.
Finally, if you really want to see variables at the end of a run of your vignette, you can add save.image(file = 'vignette.RData') at the end, and in your interactive session, use load('vignette.RData') to load the values for examination. This won't necessarily give you an accurate view of the state of things at the end of the run: it loads the values in addition to anything you've already got in your workspace, and it won't restore option settings or attach packages. But it might be enough for debugging.
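One way to soften the first caveat is to load the saved values into a separate environment rather than into the workspace. The sketch below uses save() on a single stand-in object so that it is self-contained; the object name fit_summary and the temporary path are assumptions, and save.image(file = rdata_path) would take the place of the save() call in a real document.

```r
rdata_path <- file.path(tempdir(), "vignette.RData")

# Stand-in for the end of the knitted document (hypothetical object).
fit_summary <- c(mean = 1.5, sd = 0.2)
save(fit_summary, file = rdata_path)

# In the interactive session: load into a fresh environment so that
# nothing already in the workspace is overwritten.
knit_env <- new.env()
loaded <- load(rdata_path, envir = knit_env)  # returns the names of the loaded objects

ls(knit_env)
knit_env$fit_summary
```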
I wrote an R function that updates the version number of a package in another question. I work a lot with GitHub and RStudio, and it would save me quite some time (plus be much more precise) if this function was automatically run every time I opened a certain project (or better yet, made a git commit/push, but I assume that is harder to do). But I don't know how to do this or if this is even possible.
I could use .Rprofile to run R code every time I start R, so I could just update versions whenever I start R (or build in a check so that it only updates the version if the date is not today or something), but that seems like overdoing it.
You can make a separate .Rprofile for each project. You have to put it in the main directory of the project (http://www.rstudio.com/ide/docs/using/projects).
Well, I would use .Rprofile for that. There is something to be said for being independent of the tool chain around you: knitr works from RStudio as well as without it, ditto for Rcpp/RInside etc.
You can hook into commit hooks for svn, either explicitly via hooks in the back end, or simply at your end by adding wrapper scripts. I presume you can do likewise with git, but I simply know much less about it. So to abstract this away, I would write myself a 'commitThis' or 'pushThis' or ... function that does the number increment, test run, code push and what have you.
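A rough sketch of such a wrapper, under several assumptions: the version lives in a DESCRIPTION file, it consists of dot-separated integers such as 0.1.9, git is on the PATH, and the working directory is the repository root. bump_version is a hypothetical stand-in for the asker's own function, not a reproduction of it.

```r
# Increment the last component of the Version field in a DESCRIPTION file.
# Assumes a purely dot-separated version such as "0.1.9".
bump_version <- function(desc_path = "DESCRIPTION") {
  desc <- read.dcf(desc_path)
  parts <- as.integer(strsplit(unname(desc[1, "Version"]), ".", fixed = TRUE)[[1]])
  parts[length(parts)] <- parts[length(parts)] + 1L
  new_version <- paste(parts, collapse = ".")
  desc[1, "Version"] <- new_version
  write.dcf(desc, file = desc_path)   # note: may re-wrap long fields
  invisible(new_version)
}

# Bump the version, then commit all tracked changes.
commitThis <- function(message, desc_path = "DESCRIPTION") {
  new_version <- bump_version(desc_path)
  msg <- paste0(message, " (v", new_version, ")")
  system2("git", c("commit", "-a", "-m", shQuote(msg)))
}
```

A test run or a push could be added inside commitThis in the same way, so that the whole sequence is one call regardless of which editor is in use.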
If your code needs RStudio to be already running (e.g. because it's relying on some rstudioapi:: function), putting it directly in .Rprofile won't work (.Rprofile is executed before RStudio is available).
Instead, you could set a hook for "rstudio.sessionInit":
setHook(
  hookName = "rstudio.sessionInit",
  value = function(newSession) {
    if (newSession) {
      # your code goes here
    }
  },
  action = "append"
)
I am new to R and have been trying to use JRI. Through JRI, I have used the "eval()" function to get certain results. If I want to execute an R script, I have used "source()". However, I am now in a situation where I need to execute a script on continuously incoming data. While I can still use "source()", I don't think that would be optimal from a performance perspective.
What I did was to read the entire R script into memory and then try and use "eval()" passing the script - but this does not seem to work. I have ensured that the script has been correctly loaded into memory - that is because if I write this script (loaded into the memory) into a file and source this newly created file, it does produce the expected results.
Is there a way for me to not keep sourcing the same file over and over again and execute it from memory? Each of my data units are independent and have to be processed independently and as soon as they become available. I cannot wait to collect a bunch of data units and then pass them on to the R script.
I have searched a lot and not found anything related to this. Any pointers which could help me in this direction would be really helpful.
The way I handled this is as below -
I enclosed the entire script into a function.
I sourced the script file (which now contains the function) at the start of the execution of my program.
The place where I was sourcing the file, I am now just calling the function which contains the script itself i.e. -
REXP result = rengine.eval("retVal<-" + getFunctionName() + "()");
Here, getFunctionName() gives me the name of the function which contains the script.
Since the function is loaded into memory and available, I do not have to source the script file every time I want to execute the script. Any arguments for the script are passed as environment variables.
This seems to be a workaround, but solves my problem. Any better options are welcome.
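On the R side, another option along the same lines is to parse the script text once and evaluate the parsed expressions for each data unit. This is only a sketch in plain R: the script text, the variable x and the helper run_unit are made up for illustration, and how the call is issued through JRI is not shown.

```r
# Script held in memory as text (hypothetical content).
script_text <- "result <- mean(x) * 2\nresult"

# Parse once; `parsed` is an expression vector that can be reused.
parsed <- parse(text = script_text)

# Evaluate the parsed script for one data unit in its own environment,
# so that units do not see each other's variables.
run_unit <- function(x) {
  env <- new.env(parent = globalenv())
  env$x <- x
  eval(parsed, envir = env)   # value of the last expression in the script
}

out1 <- run_unit(c(1, 2, 3))
out2 <- run_unit(c(10, 20))
```

Passing the data as a function argument, as here, is also an alternative to passing it through environment variables.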
I have about 5 scripts that are all part of a project to be run one after the other. I would like to open the first script, run it and then be prompted at the end, "Do you want to run XrefGenetic.r?" If yes, then XrefGenetic.r should open and run. I am 100% certain R can do this, in fact I think I used to know how but have forgotten and cannot find it anywhere.
How do I open another R script from within an R script?
Are you thinking of source() ?
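A small sketch of how the prompt and source() could be combined. The helper name maybe_run and its ask argument are assumptions for illustration; readline() only waits for input in an interactive session, which is why the question function can be swapped out.

```r
# Ask whether to run `script`, and source() it if the answer is yes.
# `ask` defaults to readline(), which needs an interactive session.
maybe_run <- function(script, ask = function(q) readline(q)) {
  answer <- ask(sprintf("Do you want to run %s? [y/n] ", basename(script)))
  if (tolower(trimws(answer)) %in% c("y", "yes")) {
    source(script)            # evaluates the script in the global environment
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# At the end of the first script one might then write:
# maybe_run("XrefGenetic.r")
```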
My usual recommendation is to create a package, as that alleviates all these issues: functions and symbols are known (or hidden if you chose not to export them) and you have generally much better control.