I want to call an R function from a Scala script on Databricks. Is there any way to do this?
I use
JVMR_JAR=$(R --slave -e 'library("jvmr"); cat(.jvmr.jar)')
scalac -cp "$JVMR_JAR"
scala -cp ".:$JVMR_JAR"
on my Mac, and it automatically opens a Scala session that can call R functions.
Is there any way I can do similar stuff on databricks?
On Databricks Cloud, you can use the sbt-databricks plugin to deploy external libraries to the cloud and attach them to specific clusters. Both steps are necessary to make sure jvmr is available on the machines you're calling it from.
See the plugin's github README and the blog post.
If those resources don't suffice, consider asking Databricks support.
If you want to call an R function in a Scala notebook, you can use the %r shortcut. First, register the DataFrame as a temporary table:
df.registerTempTable("temp_table_scores")
Create a new cell, then use:
%r
scores <- table(sqlContext, "temp_table_scores")
local_df <- collect(scores)
someFunc(local_df)
If you want to pass the data back into the environment, you can save it to S3 or register it as a temporary table.
We are using a few R libraries in Azure Databricks which do not come preinstalled. To install these libraries during Job Runs on Job Clusters, we use an init script to install them.
sudo R --vanilla -e 'install.packages("package_name",
repos="https://mran.microsoft.com/snapshot/YYYY-MM-DD")'
During one of our production runs, the Microsoft Server was down (could the timing be any worse?) and the job failed.
As a workaround, we now install libraries in /dbfs/folder_x and when we want to use them, we include the following block in our R code:
.libPaths('/dbfs/folder_x')
library("libraryName")
This does work for us, but what is the ideal solution? If we want to update a library to another version, remove a library, or add one, we have to go through the following steps every time, and there is a chance of forgetting this during code promotions:
install.packages("xyz")
system("cp -R /databricks/spark/R/lib/xyz /dbfs/folder_x/xyz")
It is a very simple and workable solution, but not ideal.
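One way the two approaches described above could be combined in an init script is sketched below: try the repository first, and fall back to the copy cached on DBFS if the server is unreachable. The function name install_or_fallback, the R_BIN override, and the argument order are assumptions made for illustration, not Databricks features.

```shell
#!/bin/sh
# Hypothetical helper: install PKG from REPO into LIB_DIR, or fall back to a
# cached copy under CACHE_DIR. R_BIN may be overridden; it defaults to R.
install_or_fallback() {
    pkg=$1; repo=$2; cache_dir=$3; lib_dir=$4

    # Try the network install first. install.packages() only warns when a
    # download fails, so check explicitly that the package can be loaded.
    if "${R_BIN:-R}" --vanilla -e \
        "install.packages('$pkg', repos='$repo', lib='$lib_dir'); if (!requireNamespace('$pkg', lib.loc='$lib_dir', quietly=TRUE)) quit(status=1)"
    then
        return 0
    fi

    # Network install failed: copy the cached library directory instead.
    if [ -d "$cache_dir/$pkg" ]; then
        mkdir -p "$lib_dir" && cp -R "$cache_dir/$pkg" "$lib_dir/"
    else
        echo "no cached copy of $pkg in $cache_dir" >&2
        return 1
    fi
}
```

A call might look like install_or_fallback xyz "https://mran.microsoft.com/snapshot/YYYY-MM-DD" /dbfs/folder_x /databricks/spark/R/lib. The cached copy still has to be refreshed whenever a version changes, so this only removes the dependency on the server being up during a job run.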
I can only find information on how to install a ready-made R extension package; nowhere is it mentioned which commands the developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've allocated a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
author="My Name", email="my.email#example.com")
This allocated 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot in a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where, thanks to ccache, recompilation of unchanged code is immediate). For command-line use I rely on r from the littler package, which lets me write compact expressions that load the new package and evaluate something: r -lnewpackage -esomeFunc(somearg) tests newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both of those approaches are for packages, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and run sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle, as it is a one-off. For that, too, there are different approaches, many of which have been discussed here.
Edit: The other main misunderstanding in your question is that once you have a package, you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including tests placed into the tests folder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TDD approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what goes on under the hood, and RStudio is by no means required.
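The sequence described above could be wrapped in a small shell script. This is a sketch under assumptions: the helper names pkg_tarball and dev_cycle are made up here, and the tarball name is derived from the Package and Version fields of DESCRIPTION, which is how R CMD build names its output.

```shell
#!/bin/sh
# Print the tarball name that R CMD build produces for the package in DIR,
# i.e. <Package>_<Version>.tar.gz, read from DIR/DESCRIPTION.
pkg_tarball() {
    desc="${1:-.}/DESCRIPTION"
    name=$(sed -n 's/^Package:[[:space:]]*//p' "$desc")
    version=$(sed -n 's/^Version:[[:space:]]*//p' "$desc")
    echo "${name}_${version}.tar.gz"
}

# Hypothetical wrapper for one edit/compile/test iteration.
# Run it from the package's top-level directory.
dev_cycle() {
    # Regenerate the RcppExports files from the Rcpp attributes.
    Rscript -e 'Rcpp::compileAttributes()' || return 1
    # Build the source tarball.
    R CMD build . || return 1
    # Compile the package and run its checks, including tests/.
    R CMD check "$(pkg_tarball .)" || return 1
    # Optional: install into a library so the package can be attached.
    R CMD INSTALL "$(pkg_tarball .)"
}
```

With a TDD-style workflow the last step can be dropped, since R CMD check already compiles the code and runs the tests.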
I want to run an R command from command line (actually, from within a Makefile). The command is roxygen2::roxygenise(), if it is relevant. I don't want to create a new file and run that as a script - that will just clutter my directory.
In python, this is simple - you write python -c "import antigravity".
I use the Makefile to build, install and test a (Rcpp) package I'm working on.
This is generally done with so-called 'shebang scripts'.
Historically, littler was there first, about a decade or so ago. It is still widely used, and contains a number of helper scripts as for example roxy.r which does just what you desire: run roxygen2::roxygenize(). I use this all the time.
Next, Rscript started to ship with R. It is similar to littler but automatically available wherever R is, which is a plus. On the minus side, it starts slower and fails to load the methods package, which is the source of a number of bug reports and SO questions.
Much more recently, R itself added the ability to run expressions following the -e ... switch.
So you have plenty of choices. You can also study plenty of src/Makevars files many of which use Rscript.
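All three launchers mentioned above accept an expression on the command line, so no script file is needed. The sketch below wraps them in a hypothetical helper, r_eval; the helper name and the R_LAUNCHER variable are assumptions made for illustration.

```shell
#!/bin/sh
# Evaluate one R expression without creating a script file.
# R_LAUNCHER (hypothetical) selects the front end; the default is Rscript.
r_eval() {
    case "${R_LAUNCHER:-Rscript}" in
        r)       r -e "$1" ;;            # littler
        Rscript) Rscript -e "$1" ;;      # ships with R
        R)       R --slave -e "$1" ;;    # plain R with quiet output
        *)       echo "unknown launcher: $R_LAUNCHER" >&2; return 2 ;;
    esac
}
```

For example, r_eval 'roxygen2::roxygenise()' runs roxygen2 in the current directory. In a Makefile recipe the helper is not needed at all: a single line such as Rscript -e 'roxygen2::roxygenise()' does the same job.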
I am interested in providing a command line interface to an R package called Slidify that I am authoring. It uses Rscript and I think that would make it cross-platform. The scripts are stored in the subdirectory inst/slidify. In order to use the script from any directory, I added its path to my .bash_profile as I am on a Mac.
My question is
How should I handle installation of the script in an automated cross-platform way?
How can I make sure that the file permissions are retained in this process?
What should the shebang line for the script be? I am currently using
#!/usr/bin/Rscript --vanilla --slave
I would appreciate pointers on how to handle this and any examples of R packages that already do it. Just to make sure, I am clear on how this would work, a user would be able to generate a slide deck from slides.Rmd by just running slidify generate slides.Rmd from the command line.
UPDATE:
Here is how I install it on a Mac from the command line. I use the excellent sub library by 37signals to create the scripts.
echo "$(path/to/clidir/slidify init -)" >> ~/.bash_profile
exec bash
Two follow up questions
Can I package these commands into an R function install_slidify_cli?
How can I mirror these commands for Windows users?
Lovin' slidify so would be glad to help.
But in short, you can't.
R packages simply cannot install outside of $R_HOME or the chosen library folder. Ship the script in the package, and tell users to copy it. If there were a better way, our littler package, a predecessor of and alternative to Rscript, would long since have used it, and roxygen / roxygen2 would also have shipped something.
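Copying the script by hand, as suggested above, could look like the following sketch on a Unix-like system. The function name install_cli, a script file called slidify inside the installed package's slidify directory, and the ~/bin destination are all assumptions; system.file() is the standard way to locate a file that a package ships under inst/.

```shell
#!/bin/sh
# Hypothetical helper: copy a script shipped with an R package into DEST_DIR
# and set the executable bit explicitly instead of relying on it surviving.
install_cli() {
    src=$1; dest_dir=$2
    [ -f "$src" ] || { echo "script not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/" &&
    chmod +x "$dest_dir/$(basename "$src")"
}
```

A user would run something like install_cli "$(Rscript -e 'cat(system.file("slidify", "slidify", package = "slidify"))')" "$HOME/bin" once after installing the package, and would need ~/bin on the PATH. This does not cover Windows, where a separate wrapper would be required.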