Running R scripts in Airflow? - r

Is it possible to run an R script as an Airflow DAG? I have tried looking online for documentation on this and have been unable to find any. Thanks

There doesn't seem to be an R operator right now.
You could either write your own and contribute it to the community, or simply run your task as a BashOperator calling Rscript.

Another option is to containerize your R script and run it using the DockerOperator, which is included in the standard distribution. This removes the need to have your worker nodes configured with the correct version of R and any needed R libraries.

Use BashOperator for executing R scripts.
For example:
opr_hello = BashOperator(task_id='xyz', bash_command='Rscript Pathtofile/file.r')
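As a slightly fuller sketch of the BashOperator approach, the helper below builds the shell command string with quoting, so that a script path containing spaces survives the shell. The helper name `rscript_command`, the example path, and the date argument are hypothetical. The commented lines show where the string would be handed to BashOperator in a DAG file, assuming the Airflow 2.x import path.

```python
import shlex


def rscript_command(script_path, *args):
    """Return a shell-safe 'Rscript <script> <args...>' command string."""
    parts = ["Rscript", script_path, *map(str, args)]
    return " ".join(shlex.quote(p) for p in parts)


# Hypothetical path and argument, for illustration only.
command = rscript_command("/opt/r jobs/report.R", "2024-01-01")
print(command)

# In a DAG file the string would be passed as bash_command, e.g.:
#   from airflow.operators.bash import BashOperator  # Airflow 2.x import path
#   run_r = BashOperator(task_id="run_r_script", bash_command=command)
```

Rscript still has to be installed on whichever worker runs the task; the helper only takes care of the quoting.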

There is an open pull request for an R operator; it is still waiting to be merged.
https://github.com/apache/incubator-airflow/pull/3115/files

Related

Spawn subprocess in R

I'm trying to spawn a sub-process in R using the subprocess library, as presented in this tutorial. The problem is that the program I'm trying to launch requires an additional command after the executable.
Example:
I would launch the command from the shell like this:
monetdbd create mydb
where 'create' is the additional command and 'mydb' a parameter.
I tried giving 'create mydb' as parameters in R like this:
handle <- spawn_process('/usr/local/bin/monetdb', c('create mydb'))
However from the output I got with
process_read(handle, PIPE_STDOUT, timeout = 3000)
I conclude that the parameters don't work as I'm getting the info message from monetdb on how to call it, just as if I call only 'monetdb' without the create command from the shell:
Usage: monetdb [options] command [command-options-and-arguments]
The second thing I tried is to include the create command into the path, but this leads to a "No such file and directory" error.
Any hints are appreciated.
monetdbd is the daemon process for MonetDB and has little to do with the (now old) version of MonetDBLite used in R. The latter has been removed from CRAN, and a newer version of MonetDBLite is expected to arrive early next year.
Without knowing anything about the package you’re using, and going purely by the documentation, I think you need to separate the command line arguments you pass to the functions:
handle <- spawn_process('/usr/local/bin/monetdb', c('create', 'mydb'))
This also follows the “conventional” API of spawn/fork/exec functions.
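The same rule holds for most process-spawning APIs, so it can be checked without MonetDB installed. The sketch below is a Python analog, not the R subprocess package: it starts a child Python interpreter that reports how many arguments it received, standing in for monetdb. The helper name `count_received_args` is made up for this illustration.

```python
import subprocess
import sys


def count_received_args(args):
    """Spawn a child process with `args` and return how many arguments it saw."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# One string containing a space reaches the child as a single argument.
print(count_received_args(["create mydb"]))
# Separate strings reach the child as separate arguments.
print(count_received_args(["create", "mydb"]))
```

The first call reports one argument and the second reports two, which is why a program that expects a command followed by a parameter only recognises the second form.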
In addition, using c(…) is (almost) only necessary when creating a vector of multiple elements. In your code (and in the tutorial) it’s unnecessary around a single character string.
Furthermore, contrary to what the tutorial claims, this functionality is actually already built into R via the system2 and pipe functions (although I don’t doubt that the subprocess package is more feature-complete, and likely easier to use).
But if your ultimate goal is to use MonetDB in R then you’re probably better advised following the other answer, and using dedicated MonetDB R bindings rather than interacting with the daemon binary via subprocess communication.

Scheduling R Script - OSX

I have written a series of R Scripts that create csv files. From there, Tableau will read the csv's and update various dashboards. As Tableau can easily be scheduled to update on a daily cadence, I was hoping to do the same with my R Script.
While there are a bunch of answers already with solutions for Windows, there hasn't been a solution posted for OSX. I have looked into running my script in Terminal and using Automator to do it, but couldn't quite figure it out. Basically, when the shell script runs it terminates midway through because there are errors in the R script, but I do not care about the errors. Automator didn't work either.
Additionally, I also looked into Data Integration/Pentaho but the additional software configuration and subsequent installation seemed difficult.
Any help or insight would be greatly appreciated! Thanks!
Type crontab -e and add this line to the resulting file:
@daily Rscript 1.R && Rscript 2.R
This runs 1.R at midnight every day and then, if it exits successfully, 2.R. Hope that helps.
The most flexible way to do this is to use launchd, the service that manages processes on OS X. You can look at some examples in the official documentation.
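As a rough sketch of what a launchd job for this could look like, the snippet below uses Python's standard plistlib module to generate the property list. `Label`, `ProgramArguments`, `StartCalendarInterval` and `StandardErrorPath` are standard launchd keys, but the label, the Rscript location, and the script and log paths are all assumptions to be replaced with real values.

```python
import plistlib

# Hypothetical label and paths; adjust to where Rscript and the script actually live.
job = {
    "Label": "com.example.rscript-daily",
    "ProgramArguments": ["/usr/local/bin/Rscript", "/Users/me/scripts/1.R"],
    # Run once a day at 00:00.
    "StartCalendarInterval": {"Hour": 0, "Minute": 0},
    # Keep R's error output somewhere it can be inspected later.
    "StandardErrorPath": "/tmp/rscript-daily.err",
}

plist_bytes = plistlib.dumps(job)
print(plist_bytes.decode("utf-8"))
```

The printed XML would typically be saved as a .plist file under ~/Library/LaunchAgents/ and registered with launchctl load followed by the file's path.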

is it possible to run R as a daemon

I have a script in R that is frequently called during the day (by other scripts). I call R in a terminal using
Rscript code.R
I notice it takes a lot of time to load packages and set up R.
Is it possible to run R as a background service which I hit using a port or something?
Yes, look into Rserve, which has been available for over a dozen years for exactly this reason. There are a couple of fairly high-profile applications too.
You can check out this add-in for RStudio. It is not a port-based solution, but maybe it can help you: https://github.com/bnosac/taskscheduleR

Running two instances of RStudio simultaneously on Linux

I've got a lengthy process running in RStudio and I would like to open a separate session of RStudio while the first one is running. I know I can run R from the command line to get as many sessions as I want, but I wanted to know if it is possible for me to do this in RStudio on a Linux computer. Thanks.
@infominer suggested a good solution, which is to simply type rstudio in the command line. That's what I ended up doing.
Another convenient way to deal with this is to start a separate R instance in the terminal by simply typing
R
and from there just run the script that has a lengthy process with
source("path-to-your-script/your-script.R")
You can then continue to edit and work with your two scripts in the already opened RStudio editor window.

Call R scripts in Matlab

Is it possible to call R scripts in a MATLAB program? How can I do that?
You can use R in batch mode. If R is in your path, then you can call from MATLAB:
system('R CMD BATCH infile outfile');
This will run the code in infile and place the output in outfile.
EDIT:
You can also try another approach using the R package rscproxy and the R(D)COM Server, described here.
After using R(D)COM and Matlab R-link for a while, I do not recommend it. The COM interface has trouble parsing many commands and it is difficult to debug the code. I recommend using a system command from Matlab as described in the R Wiki.
system is almost definitely the way to go, as described in other answers. For completeness, you could also use MATLAB's capability to run Java code, and JRI or RCaller to call R from Java. Similarly, you can use MATLAB's capability for running .NET code and R.NET.
Yes. On Windows, I have done a lot of this via the Matlab R-link and then R(D)COM server on the R side.
It works beautifully for passing commands and data back and forth. Calling R via the OS is feasible, but then you have to deparse (write) and parse (load) data passed between them. This is tedious and no fun, especially if you are passing a lot of data around. It also means that you lose state on the R side, so every invocation is just like the first time.
On Linux or another OS, or even for more general usage, I'd now try RStudio as a server -- see http://www.rstudio.org/docs/server/getting_started for more info.
Another way, recommended by the R Wiki:
CurrentDirectory=strrep(pwd,'\','/');
eval(['!C:\R\R-3.0.1\bin/Rscript "' CurrentDirectory '/Commands.R"'])
You can run command-line functions in MATLAB using the unix command. The easiest way would probably be to set up an R script that writes its results to a text file, run the script from MATLAB using the unix command, and then (in MATLAB) verify that the file exists and load it.
You could use the system command to execute R scripts. Something like the following:
[status] = system('R CMD BATCH [options] script.R [outfile]')
where [options] are the options you send to the R interpreter, and [outfile] is your output file.
