Executing an R script in a way other than using source() in JRI

I am new to R and have been trying to use JRI. Through JRI, I have used the eval() function to get certain results, and source() to execute an R script. However, I am now in a situation where I need to execute a script on continuously incoming data. While I could still use source(), I don't think that would be optimal from a performance perspective.
What I did was read the entire R script into memory and then try to pass it to eval() - but this does not seem to work. I have verified that the script is correctly loaded into memory: if I write the in-memory script back out to a file and source that newly created file, it produces the expected results.
Is there a way for me to avoid sourcing the same file over and over again and instead execute it from memory? Each of my data units is independent and has to be processed on its own, as soon as it becomes available. I cannot wait to collect a batch of data units and then pass them to the R script.
I have searched a lot and not found anything related to this. Any pointers in this direction would be really helpful.

The way I handled this is as follows:
I enclosed the entire script in a function.
I sourced the script file (which now contains the function) once, at the start of my program's execution.
Where I previously sourced the file, I now just call the function that contains the script, i.e.:
REXP result = rengine.eval("retVal<-" + getFunctionName() + "()");
Here, getFunctionName() gives me the name of the function that contains the script.
Since the function is loaded into memory and available, I do not have to source the script file every time I want to execute the script. Any arguments to the script are passed as environment variables.
This seems to be a workaround, but it solves my problem. Any better options are welcome.
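For illustration, here is a minimal sketch of what the wrapped script might look like on the R side (the function name, file name, and environment variable are placeholders, not the actual setup):
# myscript.R - the original script body, wrapped in a function so the
# file only needs to be sourced (and parsed) once per session
processData <- function() {
    # per the workaround above, arguments arrive as environment variables
    x <- as.numeric(Sys.getenv("DATA_VALUE"))
    x * 2  # stand-in for the real per-unit computation
}
After a single source("myscript.R") at startup, each incoming data unit costs only one rengine.eval() call of the already-loaded function.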

Related

Do .Rout files preserve the R working environment?

I recently started looking into Makefiles to keep track of the scripts inside my research project. To really understand what is going on, I would like to understand the contents of .Rout files produced by R CMD BATCH a little better.
Christopher Gandrud is using a Makefile for his book Reproducible Research with R and RStudio. The sample project (https://github.com/christophergandrud/rep-res-book-v3-examples/tree/master/data) has only three .R files: two of them download and clean data, the third one merges both datasets. They are invoked by the following lines of the Makefile:
# Key variables to define
RDIR = .
# Run the RSOURCE files
$(RDIR)/%.Rout: $(RDIR)/%.R
	R CMD BATCH $<
None of the first two files outputs data; nor does the merge script explicitly import data - it just uses the objects created in the first two scripts. So how is the data preserved between the scripts?
To me it seems like the batch execution happens within the same R environment, preserving both objects and loaded packages. Is this really the case? And is it the .Rout file that transfers the objects from one script to the other or is it a property of the batch execution itself?
If the working environment is really preserved between the scripts, I see a lot of potential for issues if there are objects with the same names or functions with the same names from different packages. Another issue of this setup seems to be that the Makefile cannot propagate changes in the first two files downstream because there is no explicit input/prerequisite for the merge script.
I would appreciate learning whether my intuition is right and whether there are better ways to execute R files in a Makefile.
By default, R CMD BATCH saves your workspace to a hidden .RData file after running, unless you pass --no-save. That's why it's not really the recommended way to run R scripts. The recommended way is Rscript, which does not save by default; you must write code explicitly to save the workspace if that's what you want. This is separate from the .Rout file, which should only contain the output from the commands run in the script.
In this case, execution doesn't happen in the exact same environment. R is still called three times, but that environment is serialized and reloaded between each run.
You are correct that there can be a lot of problems with saving and re-loading workspaces by default. That's why most people recommend against it. But in this case, the author just figured it made things easier for their workflow, so they used it. It would be better to be more explicit about input and output files in general, though.
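As a hedged sketch of the more explicit alternative, each cleaning script could end by writing its result to a file, and the merge script could read those files back in, so the data flow becomes visible to make (all file and object names here are made up):
# at the end of each cleaning script: save the result explicitly
saveRDS(cleaned_data, "data1.rds")

# in the merge script: read the inputs instead of relying on a saved workspace
d1 <- readRDS("data1.rds")
d2 <- readRDS("data2.rds")
merged <- merge(d1, d2)
With this layout, the Makefile rule for the merge step can list data1.rds and data2.rds as prerequisites, so changes in the upstream scripts propagate downstream.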

Very simple question on Console vs Script in R

I have just started learning to code in R, so I apologize for the very simple question. I understand it is best to type your code as a script so you can edit and save it. However, when I try to create an object in the script pane, it does not work. If I create an object in the console, R saves the object and it appears in my environment. I am typing a very simple line to try a quick exercise on rolling dice:
die <- 1:6
But it only works in the console and not when typed as a script. Any help/explanation appreciated!
Essentially, you interact with the R environment differently when running an .R script via Rscript.exe, via a console with R.exe, Rterm, etc., and in GUI IDEs like RGui or RStudio. (This applies to any programming language with an interactive interpreter, not just R.)
The script does save the die object in the R environment, but only during the run or lifetime of that script (i.e., from the first to the last line of code). Your line is simply an assignment of an object; you do nothing with it. Apply some function, output results, or take other actions in the script to see the effect.
In the console, the R environment persists interactively until you quit with q(), so assigned objects remain for the lifetime of your console session. After assigning, you can apply functions, output results, or take other actions in line-by-line calls.
Ultimately, a script gathers all the line-by-line code in advance of a run for automated execution, without relying on the user to supply lines. Imagine running 1,000 lines of code with nested if/else or for/while loops and apply functions on the console! Therefore, have all your R coding needs handled in scripts.
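For example, a minimal script along those lines, which assigns the object and then actually does something with it so the run produces visible output:
# roll.R - assign, then use the object so the script shows results
die <- 1:6
print(die)             # output the object
print(sample(die, 2))  # roll two dice
Run it with Rscript roll.R and the printed results appear, even though die itself vanishes once the script's run ends.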
It is always better to have a script: as you say, you can save, edit, and correct it without having to rewrite the code just to change a variable or a number.
I recommend using RStudio; it is very practical, will help you program more efficiently, and lets you see, among other things, the different objects that you have created.

Wait for child process to terminate in R

I have multiple R scripts which I source from one script. I want to run one script at a time, such that the second script waits until the first script is finished.
Is there a function like wait(), as used in Python's subprocess package?
Is there a similar package in R?
Running the following should work in the way you describe -- file_2.R will not run until the first source() is complete.
source('file_1.R')
source('file_2.R')
Note that, by default, elements in the called script's environment will be available in the global environment (and therefore to anything you source thereafter). You can avoid this by sourcing into a separate environment via the local argument, e.g. source('file_1.R', local = new.env()).
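If the scripts genuinely need to run as separate child processes (closer to Python's subprocess), here is a minimal sketch using base R's system2(), which by default blocks until the child terminates:
# each call starts a child R process and waits for it to exit
# before the next line runs (wait = TRUE is the default)
system2("Rscript", args = "file_1.R", wait = TRUE)
system2("Rscript", args = "file_2.R", wait = TRUE)
This also isolates the scripts from one another, since each child gets its own fresh R session.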

R - Execute a function in a file

There is an R file, and there is a function getInfo() in it.
I want to run only that function from the script file.
Is that possible?
I know that sourcing the file and then calling the function by name would work.
But that would also run the rest of the code in the script file, which I don't want.
What is the best way out here?
When you use source on a script file, all the code in that file is loaded into the currently active R session, and any code that is not inside a function is executed. I see two options:
Put the function in a separate source file, or even a package if the number of functions grows.
Set a global R option using options() and retrieve its value in the sourced file using getOption(), making the execution of the non-function code dependent on this option. This does require you to always set the option before sourcing the file, in any project you use it in (see the sketch after this answer).
I would go for option 1.
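For illustration, a sketch of option 2; the option name run.main is made up for this example, not an established convention:
# in the calling code: disable the script's top-level code, then source
options(run.main = FALSE)
source("myscript.R")
getInfo()  # the function is available, but the rest of the script did not run

# in myscript.R: guard everything that is not a function definition
getInfo <- function() {
    # ... body of the function ...
}
if (isTRUE(getOption("run.main", default = TRUE))) {
    # top-level code goes here; it runs only when the option allows it
}
Defaulting run.main to TRUE means running the script directly still executes everything.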

How to continuously restart/loop R script

I want an R script to run continuously, check for files in a folder, and do something with those files.
The code simply checks for a file, then moves it somewhere else and renames it, deleting the old file (in reality it's a bit more elaborate than this).
If I run the script, it works fine; however, I want R to detect the files automatically. In other words, is there a way to have R run the script continuously, so that I don't have to run it myself whenever I put files in that folder?
In pure R you just need an infinite repeat loop...
repeat {
    print('Checking files')
    # Your code to do file manipulation
    Sys.sleep(time = 5) # stop execution for 5 sec
}
However, there may be better tools for this kind of file manipulation, depending on your OS.
You can use the function tclTaskSchedule from the tcltk2 package to schedule a function or expression to run on a regular interval. You can have multiple such tasks scheduled and still work in the R session (just be careful not to modify something that the scheduled task could also modify or you can get unpredictable results).
Though an OS-based solution that runs a given Rscript on a schedule may still be a better approach.
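A minimal sketch of the tclTaskSchedule approach (the folder name and the body of the check are placeholders):
library(tcltk2)

check_files <- function() {
    files <- list.files("incoming", full.names = TRUE)  # hypothetical folder
    # move/rename the files here
}

# run check_files() every 5000 ms while the R session stays alive
tclTaskSchedule(5000, check_files(), id = "fileCheck", redo = TRUE)

# stop the task when it is no longer needed:
# tclTaskDelete("fileCheck")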
