Using an active R session when running a script via Rscript

I know that I can use Rscript to run an R script from the command line. Currently, I pass different parameters to my script, load a few packages, and run a few functions. Then I change the parameters (via a bash script) and run the same script again. This all works, but I was wondering whether there is a way to instantiate one R session and reuse it, instead of loading all my packages, etc. every time Rscript executes my script.

library(ipc) might be of interest.
It lets you set up continuous communication between parent and child R processes by creating so-called queues in the parent process.
Unfortunately, you haven't specified your use case with any code, so please see the vignette for examples.

One solution may be to use the session package. With it, you can call restore.session() at startup and save.session() before exit, analogous to what a standard R launch and exit do. The result is not 100% equivalent, since the q() function is .Internal.

Related

Spawn subprocess in R

I'm trying to spawn a sub-process in R using the subprocess library, as presented in this tutorial. The problem is that the program I'm trying to launch requires an additional command after the executable.
Example:
I would launch the command from the shell like this:
monetdbd create mydb
where 'create' is the additional command and 'mydb' a parameter.
I tried giving 'create mydb' as parameters in R like this:
handle <- spawn_process('/usr/local/bin/monetdb', c('create mydb'))
However from the output I got with
process_read(handle, PIPE_STDOUT, timeout = 3000)
I conclude that the parameters don't work as I'm getting the info message from monetdb on how to call it, just as if I call only 'monetdb' without the create command from the shell:
Usage: monetdb [options] command [command-options-and-arguments]
The second thing I tried was to include the create command in the path, but this leads to a "No such file or directory" error.
Any hints are appreciated.
monetdbd is the daemon process for MonetDB and has little to do with the (now old) version of MonetDBLite used in R. The latter has been removed from CRAN, and a newer version of MonetDBLite is expected to arrive early next year.
Without knowing anything about the package you’re using, and going purely by the documentation, I think you need to separate the command line arguments you pass to the functions:
handle <- spawn_process('/usr/local/bin/monetdb', c('create', 'mydb'))
This also follows the “conventional” API of spawn/fork/exec functions.
In addition, using c(…) is (almost) only necessary when creating a vector of multiple elements. In your code (and in the tutorial) it’s unnecessary around a single character string.
Furthermore, contrary to what the tutorial claims, this functionality is actually already built into R via the system2 and pipe functions (although I don’t doubt that the subprocess package is more feature-complete, and likely easier to use).
But if your ultimate goal is to use MonetDB in R then you’re probably better advised following the other answer, and using dedicated MonetDB R bindings rather than interacting with the daemon binary via subprocess communication.
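As a rough illustration of the built-in route mentioned above, here is a minimal sketch that starts a child process with system2 and captures its standard output. It assumes a Unix-alike shell, and it uses the Rscript binary of the running R installation as a stand-in for the external program; the variable names are only for illustration.

```r
# Rscript from the running R installation stands in for the external program.
rscript <- file.path(R.home("bin"), "Rscript")

# The child simply prints the trailing arguments it received, one per line.
child_code <- shQuote("writeLines(commandArgs(TRUE))")

# stdout = TRUE captures the child's standard output as a character vector.
out <- system2(rscript, c("-e", child_code, "create", "mydb"), stdout = TRUE)
print(out)
```

Unlike the subprocess package, system2 blocks until the child exits by default, so it suits run-to-completion commands better than long-lived interactive ones.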

Problems executing a batch file from R

I am creating a function in R, and in one step of the function I run a batch file. This batch file in turn runs a different program, which creates files that I then want to read in my function.
I am using shell.exec to run the batch file and it runs fine. The problem is that the next line of my code, which reads the output of the program run by the batch file, fails because the output has not been created yet.
So I get an error the first time I call my function, but if I simply call it again it runs fine. The error message says that .../bat_output.txt does not exist. Example of code below:
shell.exec("run.bat")
readout<-read.table("bat_output.txt")
Any suggestions?
shell.exec returns immediately while the script is running in the background. The reason bat_output.txt is not found the first time is likely that the script has not finished yet. shell.exec does not give you the ability to wait or any information to determine if the process is still running, so it might not be the best tool for this.
Alternatives:
system("cmd /c run.bat")
system2("cmd", c("/c", "run.bat"))
Realize that if you reference a different path, you might want/need to normalizePath and/or dQuote it going into those commands. (R's system* commands are bad at argument forming.)
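A minimal sketch of how the waiting alternative might be wrapped is shown below. The helper name run_and_read is hypothetical (not part of R); it relies on system2 blocking until the command exits, which is its default behaviour (wait = TRUE).

```r
# Hypothetical helper: run a command, wait for it to finish, then read its output file.
run_and_read <- function(command, args, output_file) {
  # system2() waits for the command by default and returns its exit status.
  status <- system2(command, args)
  if (status != 0) {
    stop("command failed with exit status ", status)
  }
  if (!file.exists(output_file)) {
    stop("expected output file not found: ", output_file)
  }
  read.table(output_file)
}

# On Windows, the call from the question would look something like:
# readout <- run_and_read("cmd", c("/c", "run.bat"), "bat_output.txt")
```

Checking the exit status and the file's existence before reading turns a confusing read.table error into a message that points at the batch step.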

How to run R script from command line repeatedly but only load packages the first time

I want to run an R script (in Win 7) from SQL Server 2014 each time a new record is added (to perform some analysis on the data). I saw that this can be done with the xp_cmdshell command which is like running it manually from the command line.
My problems (and questions) are:
1. I've gathered from various websites that probably the best option is to use Rscript. This would have to be used at the command line as:
"C:\Program Files\R\R-3.2.3\bin\x64\Rscript" "my_file_folder\my_file.r"
Can I copy Rscript.exe to the folder where my script is, such that I can run my script independently, even if R is not installed? What other files do I need to copy together with Rscript.exe such that it would work independently?
2. My script loads some packages that contain functions that it uses. Is there a way to somehow include these in the script such that they don't have to be loaded every time (it takes about 5 sec so far and I need this script to be faster)? Or is there a way to only load these packages the first time that the script runs?
In case the overall approach I've described here is not the best one, I am open to doing it differently. Maybe there is a way to somehow package the R script together with all the required dependencies (libraries and other parts of the R software which the script would need to run independently).
What I ultimately need is for the script to run silently and reasonably fast, without any windows or anything else popping up, each time a new record is added to my database, do the analysis, and exit.
Thanks in advance for any answers.
UPDATE:
I figured out an elegant solution to running the R script. I'm setting up a job in SQL Server and inside that job I'm using "xp_cmdshell" to run my script as a parameter to Rscript.exe, as detailed at point 1 above. I can start this job from any stored procedure and the beauty of it is that the stored procedure does not wait for the script to finish. It just triggers the job (that runs the script in a separate thread) and then it continues with its business.
But questions from points 1 and 2 still remain.

Can an Rscript be aware of other Rscripts in Ubuntu

I have a cron job that launches an Rscript to go fetch some data and stash it away in a database. Sometimes the source isn't ready so it waits. Sometimes the source skips a data point so the script ends up waiting until another instance is started from cron. Now that two instances are running they cause problems with each other. Is there something like the following pseudo code that I could put at the top of my scripts so they stop when they see another instance of themselves is already running:
stopifnot(!Sys.running('nameofscript.r'))
One thing I thought to do would be for the script to make a temp file with a fixed name at the start and then delete that temp file at the end. This way the script can check for the existence of the file to know if it's already running. I also see there's something called flock, which is probably a better solution than that, but I'd prefer a way R can do it without bash commands (I've never really learned any bash). If there isn't a pure R way to do this, can the R script call flock on itself, or would it only work if the cron task calls a bash script that then calls the Rscript? If it isn't already obvious, I don't really know how to use flock.
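The temp-file idea described above could be sketched in pure R roughly as follows. The helper name with_lock_file and the lock path are hypothetical. Note that the check-then-create sequence is not atomic, and a stale lock file is left behind if the process is killed, so this is weaker than flock.

```r
# Hypothetical helper: run `task` only if no lock file exists yet.
with_lock_file <- function(lock_path, task) {
  if (file.exists(lock_path)) {
    message("Another instance appears to be running; exiting.")
    return(invisible(FALSE))
  }
  file.create(lock_path)
  # Remove the lock when this function exits, even if task() raises an error.
  on.exit(unlink(lock_path), add = TRUE)
  task()
  invisible(TRUE)
}

# Example call at the top of a script (path and task are placeholders):
# with_lock_file("/tmp/nameofscript.lock", function() {
#   # fetch data and store it in the database
# })
```

Wrapping the work in a function is what makes on.exit usable here, since on.exit only takes effect inside a function call.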

R - Run source() in background

I want to execute an R script in the background from the R console.
From the console, I usually run an R script as source('~/.active-rstudio-document')
I have to wait until the script is completed before I can go ahead with the rest of my work.
Instead of this, I want R to be running in the background while I continue with my work in the console.
I should also be notified somehow when R completes the source command.
Is this possible in R?
This might be quite useful, as we often see jobs taking a long time.
PS - I want the sourced script to run in the same memory space rather than a new one. Hence solutions like fork, system, etc. won't work for me. I am looking into whether I can run the R script as a separate thread and not a separate process.
You can use system() and Rscript to run your script as an asynchronous background process:
system("Rscript -e 'source(\"your-script.R\")'", wait=FALSE)
At the end of your script, you may save your objects with save.image() in order to load them later, and notify of its completion with cat():
...
save.image("script-output.RData")
cat("Script completed\n\n")
Hope this helps!
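Put together, the approach above might look like the following sketch. It uses system2 rather than system to sidestep shell quoting, and the temporary script and file names are placeholders for illustration. The child runs in a separate process, so its objects only reach the console session through the saved file.

```r
# Placeholder paths for the background script and its saved workspace.
script <- tempfile(fileext = ".R")
result <- tempfile(fileext = ".RData")

# A stand-in for a long-running script that saves its workspace at the end.
writeLines(c(
  "answer <- sum(1:10)",
  sprintf("save.image(%s)", deparse(result))
), script)

# Launch the script without blocking the console.
system2(file.path(R.home("bin"), "Rscript"), shQuote(script), wait = FALSE)

# Later: wait (with a time limit) until the workspace file appears.
deadline <- Sys.time() + 60
while (!file.exists(result) && Sys.time() < deadline) Sys.sleep(0.5)
stopifnot(file.exists(result))

# Load the saved objects into their own environment to avoid name clashes.
env <- new.env()
load(result, envir = env)
env$answer  # 55
```

Polling for the file is only one way to detect completion; in an interactive session you could simply check file.exists(result) whenever convenient.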
