Calling an R notebook in Databricks

Let's say I create a basic function in R:
Addn <- function(X,n)
{
X + n
}
and this is saved in a Databricks notebook at some filepath: "/shared/x/y/z/Addnfunction"
In RStudio, I would typically call that function from another script by writing something like:
source("/shared/x/y/z/Addnfunction.r")
If I open a new Databricks notebook and want to call the above function (for example, a shared team function) using the "source" approach, I just get an error about the function/connection.
Is there a best practice for leveraging shared functions/scripts in R on Databricks?

Actually this was pretty straightforward:
%run "../z/Addnfunction"
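A note for anyone following along: %run must be the only command in its cell (an absolute path such as %run "/shared/x/y/z/Addnfunction" should also work), and the shared function then becomes available in later cells:
Addn(5, 2)   # defined in the shared notebook; expect 7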


What's the most simple approach to name-spacing R files with `file::function`

Criteria for an answer to this question
Given the following function (within its own script)
# something.R
hello <- function(x){
paste0("hello ", x)
}
What is the most minimal amount of setup which will enable the following
library(something)
x <- something::hello('Sue')
# x now has value: "hello Sue"
Context
In python it's very simple to have a directory containing some code, and utilise it as
# here foo is a directory
from foo import bar
bar( ... )
I'm not sure how to do something similar in R though.
I'm aware there's source(file.R), but this puts everything into the global namespace. I'm also aware that there's library(package) which provides package::function. What I'm not sure about is whether there's a simple approach to using this namespacing within R. The packaging tutorials that I've searched for seem to be quite involved (in comparison to Python).
I don't know if there is a real benefit to creating a namespace for just one quick function; it is not really how namespaces are meant to be used (I think).
But anyway, here is a rather minimalistic solution:
First install once: install.packages("namespace")
The function you wanted to call in the namespace:
hello <- function(x){
paste0("hello ", x)
}
Create your namespace, assign the function, and export it:
ns <- namespace::makeNamespace("newspace")
assign("hello", hello, envir = ns)
base::namespaceExport(ns, ls(ns))
Now you can call your function with your new namespace
newspace::hello("you")
Here's the quickest workflow I know to produce a package, using RStudio. The default package already contains a hello function, which I overwrote with your code.
Notice there was also a box "create package based on source files", which I didn't use but you might.
A package built this way will contain exported, undocumented, untested functions.
If you want to learn how to document, choose what to export, write tests and run checks, include objects other than functions, include compiled code, or share on GitHub or CRAN, this book describes the workflow used by thousands of users and is designed so you can usually read sections independently.
If you don't want to do it from the GUI, you can use utils::package.skeleton() to build a package folder and remotes::install_local() to install it:
Reproducible setup
# create a file containing function definition
# where your current function is located
function_path <- tempfile(fileext = ".R")
cat('
hello <- function(x){
paste0("hello ", x)
}
', file = function_path)
# where you store your package code
package_path <- tempdir()
Solution:
# create package directory at given location
package.skeleton("something", code_files = function_path, path = package_path)
# remove sample doc to make remotes::install_local happy
unlink(file.path(package_path, "something", "man/"), recursive = TRUE)
# install package
remotes::install_local(file.path(package_path, "something"))
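Once installed, the package meets the question's criteria; a quick check (assuming the install above succeeded):
library(something)
x <- something::hello("Sue")
x
# "hello Sue"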

How do we set constant variables while building R packages?

We are building a package in R for our service (a robo-advisor here in Brazil) and we send requests all the time to our external API inside our functions.
As this is the first time we have built a package, we have some questions. :(
When we use our package to run scripts, we will need some information such as api_path, login, and password.
How do we place this information inside our package?
Here is a real example:
get_asset_daily <- function(asset_id) {
  api_path <- "https://api.verios.com.br"
  url <- paste0(api_path, "/assets/", asset_id, "/dailies?asc=d")
  data <- fromJSON(url)  # fromJSON() from a JSON package such as jsonlite
  data
}
Sometimes we use a staging version of the API and we have to constantly switch paths. How should we call it inside our function?
Should we set a global environment variable, a package environment variable, just define api_path in our scripts or a package config file?
How do we do that?
Thanks for your help in advance.
Ana
One approach would be to use R's options interface. Create a file zzz.r in the R directory (this is the customary name for this file) with the following:
.onLoad <- function(libname, pkgname) {
  options(api_path='...', username='name', password='pwd')
}
This will set these options when the package is loaded into memory.
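Package functions can then read those values back with getOption(), which also makes switching to the staging API a one-liner. A sketch built on the question's example (the fallback default and the staging URL below are only illustrations):
get_asset_daily <- function(asset_id) {
  # read the API path set in .onLoad, with a fallback if the option is unset
  api_path <- getOption("api_path", default = "https://api.verios.com.br")
  url <- paste0(api_path, "/assets/", asset_id, "/dailies?asc=d")
  fromJSON(url)
}

# switching the whole session to staging (hypothetical URL):
options(api_path = "https://staging.api.verios.com.br")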

How to catch R background code in rmr map reduce in Rhadoop

I am new to RHadoop. I am able to run MapReduce jobs with the rmr package on Hadoop. I assumed that, in the background, R runs this MapReduce code in Java, i.e. that R converts the R MapReduce code to Java. If so, can I get that background Java code when running MapReduce?
Can anyone help me?
In RHadoop, R is not converting the R MapReduce code to Java. RHadoop provides a MapReduce interface: the mapper and reducer are written in R and then called from R.
The RHadoop package submits your R code to the Hadoop cluster using Hadoop streaming. Hadoop streaming is a utility that comes with the Hadoop distribution; it allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
You can see how this works by going through the RHadoop package code on GitHub. The RHadoop package submits the Hadoop streaming job using the system() command in R. You can get an idea of this from the streaming.R script in the rmr package; the relevant code is given below.
final.command =
  paste(
    hadoop.command,
    stream.mapred.io,
    if(is.null(backend.parameters)) ""
    else
      do.call(paste.options, backend.parameters),
    input,
    output,
    mapper,
    combiner,
    reducer,
    image.cmd.line,
    m.fl,
    r.fl,
    c.fl,
    input.format.opt,
    output.format.opt,
    "2>&1")
if(verbose) {
  retval = system(final.command)
  if (retval != 0) stop("hadoop streaming failed with error code ", retval, "\n")}
else {
  console.output = tryCatch(system(final.command, intern=TRUE),
                            warning = function(e) stop(e))
  0}}
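To make that concrete, here is a stripped-down sketch (not the actual rmr code) of how an R script can assemble and submit a Hadoop streaming job with R scripts as mapper and reducer; the jar location and HDFS paths below are made up:
# hypothetical paths, for illustration only
streaming.jar <- "/usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar"
cmd <- paste(
  "hadoop jar", streaming.jar,
  "-input", "/user/demo/input",
  "-output", "/user/demo/output",
  "-mapper", shQuote("Rscript mapper.R"),
  "-reducer", shQuote("Rscript reducer.R")
)
status <- system(cmd)   # rmr builds and runs a similar command internally
if (status != 0) stop("hadoop streaming failed with error code ", status)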

R parallel computing with snowfall - writing to files from separate workers

I am using the snowfall 1.84 package for parallel computing and would like each worker to write data to its own separate file during the computation. Is this possible? If so, how?
I am using the "SOCK" type connection, e.g. sfInit(parallel=TRUE, ..., type="SOCK"), and would like the code to be platform independent (Unix/Windows).
I know it is possible to use the "slaveOutfile" option in sfInit to define a file to write the log output to, but this is intended for debugging purposes and all slaves/workers use the same file. I need each worker to have its own output file.
The data I need to write are large data frames, not simple diagnostic messages. These data frames need to be output by the slaves and cannot be sent back to the master process.
Does anyone know how I can get this done?
Thanks
A simple solution is to use sfClusterApply to execute a function that opens a different file on each of the workers, assigning the resulting file object to a global variable so you can write to it in subsequent parallel operations:
library(snowfall)
nworkers <- 3
sfInit(parallel=TRUE, cpus=nworkers, type='SOCK')
# open a different file on each worker; keep the connection in a variable in
# that worker's global environment so later parallel calls can write to it
workerinit <- function(datfile) {
  fobj <<- file(datfile, 'w')
  NULL
}
sfClusterApply(sprintf('worker_%02d.dat', seq_len(nworkers)), workerinit)

# each task writes to the file of whichever worker it runs on
work <- function(i) {
  write.csv(data.frame(x=1:3, i=i), file=fobj)
  i
}
sfLapply(1:10, work)
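# A useful extra step: close each worker's connection once the parallel work
# is done. sfClusterEval() evaluates the expression on every worker, where the
# 'fobj' created by workerinit() lives.
sfClusterEval(close(fobj))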
sfStop()

import R forecast library JAR files into java

I am trying to import the R package 'forecast' in NetBeans to use its functions. I have managed to make the JRI connection and also to import the javaGD library, and have experimented with it with some success. The problem with the forecast package is that I cannot find the corresponding JAR files to include as a library in my project. I am loading it normally: re.eval("library(forecast)"), but when I call one of the library's functions, a null value is returned. Although I am quite sure that the code is correct, I am posting it just in case.
Thanks in advance.
Rengine re = new Rengine(Rargs, false, null);
System.out.println("rengine created, waiting for R!");
if(!re.waitForR())
{
    System.out.println("cannot load R");
    return;
}
re.eval("library(forecast)");
re.eval("library(tseries)");
re.eval("myData <- read.csv('C:/.../I-35E-NB_1.csv', header=F, dec='.', sep=',')");
System.out.println(re.eval("myData"));
re.eval("timeSeries <- ts(myData,start=1,frequency=24)");
System.out.println("this is time series object : " + re.eval("timeSeries"));
re.eval("fitModel <- auto.arima(timeSeries)");
REXP fc = re.eval("forecast(fitModel, n=20)");
System.out.println("this is the forecast output values: " + fc);
You did not convert the values from R into Java. You should first create a numeric vector from the auto.arima/forecast output in R, and then use the method .asDoubleArray() to read it into Java.
I gave a complete example in How I can load add-on R libraries into JRI and execute from Java?, which shows exactly how to use the auto.arima function in Java using JRI.
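For example, on the R side you can reduce the forecast to a plain numeric vector first (a sketch reusing the question's timeSeries; note that forecast()'s horizon argument is h, not n), and then pull it into Java with re.eval("fcValues").asDoubleArray():
library(forecast)
fitModel <- auto.arima(timeSeries)
fc <- forecast(fitModel, h = 20)
fcValues <- as.numeric(fc$mean)   # plain numeric vector of the point forecasts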
