Calling R functions from Julia - julia

Is there a convenient way to call R functions from Julia?
If so, what mechanisms for doing so exist? (Potentially ranging from simply calling an R script from the shell & hand-coding the I/O to/from Julia, to interacting with an R environment over multiple Julia calls with Julia DataFrames being seamlessly converted to/from R DataFrames).

Calling R scripts and handcoding I/O is the best way to work with R for the moment. We have functions for reading the RDA binary format that R likes and should add some tools for working with it more easily and also writing data in that format, which will speed up I/O considerably relative to passing CSV files around -- which I've done in the past.
Converting between R and Julia DataFrames could be done, but would be quite costly as Julia isn't using a binary representation of data (e.g. NA) that's nearly equivalent to R's. So you'd need to do some non-trivial work to make this work in a way that would be substantially more efficient than using the RDA binary format.
One thing that would be really nice is to build solid Thrift bindings for both R and Julia and then call back and forth using those bindings.

For calling out to R from within Julia, the RCall package is currently your best bet. For calling out to Julia from within R, try the RJulia package. Both are a bit in the works.

Related

RJulia - sharing large data structures without copying them

I would like to call R functions from within Julia on very large data structures.
I understand that PyCall can share variables between Julia and Python without copying them.
I also understand that as of August 2019 variables still needed to be copied when transferring between R and Julia (https://github.com/Non-Contradiction/JuliaCall/issues/114).
Q1: Is there any way to call R functions in Julia without having to copy data structures?
Q2: Does Apache Arrow work with RJulia?
Q3: If there is a way to transfer variables without copying them, does this apply to all data types? Numerical array, categorical array, time array, table, list, and dictionary?
Thank you in advance

Read C++ binary file in R

Can I read a binary file written by C++ in R?
I have been using Rcpp in my R package and the simulations typically generate a large amount of data. I am planning to write the output to binary files in C++ and then read those back in R. This works if I write as text files but I didn't find a solution with binary files. The program sometimes crashes abruptly if I pass data using many NumericVectors (I am yet to fully understand the memory management using Rcpp).
Can this approach enable me to share larger datasets between C++ and R compared to what is possible by passing vectors? In C++, the maximum vector size is limited by RAM and address bus (may be?) but I think R is able to load larger vectors using swap. Am I correct or misunderstanding the concepts?
Yes you can. But it's "complicated".
You are embarking on a topic called binary serialization. There is a lot of work out there. In essence you are somewhere in the continum between of
minimal: open a file, write out N binary items; then on the other side read N binaries. We did something similar at work years ago where wrote some metadata with <rows,cols,version> and then a binary blob of rows * cols double to attach to a matrix
maximal: use a fully descriptive meta language like Protocol Buffer or MessagePack to describe the binary content, write it in C++ (using the appropriate library) and read in back in R (using the corresponding packages---I am involved with one each: RProtoBuf and RcppMsgPack).
And a lot in between. If you really only need to communicate between C(++) and R you could try the RData / rds format. There is one library: librdata and I experimented with it (and filed some bug reports and made some pull requests). I might start there.
So in short: do some research, figure out what to do and then do it :)
PS If you call C++ via Rcpp from R then you may not need files. We can pass large object back and forth -- the limit may be your RAM.

How to create and use an R function in SAS 9.4?

I have defined some R functions in R studio which has some complicated scripts and a lot of readlines. I can run them successfully in R studio. Is there any way, like macros to transfer these user-defined functions to SAS 9.4 to use? I am not pretty familiar with SAS programming so it is better just copy the R functions into SAS and use it directly. I am trying to figure out how to do the transformation. Thank you!
You can't natively run R code in SAS, and you probably wouldn't want to. R and SAS are entirely different concepts, SAS being closer to a database language while R is a matrix language. Efficient R approaches are terrible in SAS, and vice versa. (Try a simple loop in R and you'll find SAS is orders of magnitude faster; but try matrix algebra in R instead).
You can call R in SAS, though. You need to be in PROC IML, SAS's matrix language (which may be a separate license from your SAS); once there, you use submit / R to submit the code to R. You need the RLANG system option to be set, and you may need some additional details set up on your SAS box to make sure it can see your R installation, and you need R 3.0+. You also need to be running SAS 9.22 or newer.
If you don't have R available through IML, you can use x or call system, if those are enabled and you have access to R through the command line. Alternately, you can run R by hand separately from SAS. Either way you would use a CSV or similar file format to transfer data back and forth.
Finally, I recommend seeing if there's a better approach in SAS for the same problem you solved in R. There usually is, and it's often quite fast.

Sharing large datasets between Matlab and R

I need a relatively efficient way to share data between Matlab and R.
I have checked SaveR and MATLAB R-link, but SaveR formats Matlab's binary data as text strings first and then prints them to an ASCII file, which is not efficient for large datasets, and MATLAB R-link only works on Windows (it uses a COM-based interface).
Update:
Dirk has posted a list of what seem to be better solutions to this problem than SaveR and Matlab R-link. I also learned recently about RAM disks (see here and here for some implementation examples), and thought that they might facilitate the task of sharing large datasets between Matlab and R (or similar computational environments) further. This leads me to the following questions:
Assumming that the data fits in the machines' memory in Matlab's or R's native data containers:
Are any of the solutions listed so
far a better fit for RAM disks?
Are there any additional
considerations to be taken into
account when dealing with RAM disks
instead of with secundary-storage
solutions?
Thanks!
Couple of ideas, and with the caveat that I know more about the R side of things:
Tthe R.matlab package on CRAN can help: This package provides methods to read and write MAT files. It also makes it possible to communicate (evaluate code, send and retrieve objects etc.) with Matlab v6 or higher running locally or on a remote host
HDF5, as you suggested, is a possibility but I heard that the R support in CRAN package hdf5 is somewhat basic
NetCDF may be an alternative; CRAN has packages RNetCDF, ncdf and ncdf4
Use a database, especially a light and file-based one like SQLite or H4 both of which have R support
Use a common serialization / de-serialization format; R has support for Google Protocol Buffers via RProtoBuf and Google points to protobuf-matlab for Matlab
Write your own! Especially when you only need something basic like large rectangular matrices then nothing will beat a direct binary write; I did this once years ago for Octave (which is close to Matlab). You can extend Matab via mex files; R has its API and helpers like Rcpp. The larger your data sets, the more attractive this may look as you save the conversions.
Matlab use HDF5 natively in last versions ("save" and "load"). There is a package for R. Then HDF5 might be a good solution.

Calling Stata Functions from R

Is it possible to call Stata functions from R?
Not directly, i.e. there is no package I am aware of that implements a bridge.
You can always call external programs using system() but that is neither elegant nor efficient. That said, you could prepare data in R, write it out, call Stata and then read the results in; see help(system).
There's now an RStata package on CRAN that bridges R and Stata.
The real problem is that Stata doesn't have an interactive interpreter you can pass arguments to.
Dirk is right; you can just go ahead and write the data to a common format
(if size is large and speed is an issue, fixed width is safe), but you can also just use .dta throughout the process, using read.dta in R and natively reading in Stata.
Also, in R you can call to the system() you can pass a do file or a string containing a bunch of Stata commands.
So, generally, trying to use Stata for this or that task may or may not be worth it, especially if an R equivalent is close by.

Resources