Installing R on Apache Zeppelin - r

I'm trying to install Apache Zeppelin on my old computer that runs Ubuntu. So far, I'm able to install Zeppelin very easily by cloning the latest 0.6.0 snapshot release using
git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin
mvn clean package -DskipTests
but I want to have R on Zeppelin. Supposedly, the 0.6.0 snapshot has two R interpreters, but when I run the R tutorial (the pre-made note that uses %r), I get this list of errors.
I followed several guides to try and install R as an interpreter, but each one resulted in some kind of error. I tried this instructional:
http://www.r-bloggers.com/interactive-data-science-with-r-in-apache-zeppelin-notebook/, and got a build failure on "R Interpreter". The error message was
"dependency 'evaluate' is not available for package 'rzeppelin'
* removing '/home/rebecca/Zeppelin-With-R/R/lib/rzeppelin'"
and then a bit lower down
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1
I also tried this Stack Overflow guide: Anyone tried to add R interpreter onto Apache Zeppelin?, and while I was able to run incubator-zeppelin, I received an error when I used either the %spark.r or %r interpreter tags, saying both "interpreter not found" and "prefix not found". Spark doesn't work either after following the first solution, getting the same error mentioned in the second solution (the jar file not being there), and then trying the second solution.
Does anyone have a guide for installing R onto the newest version of Zeppelin? I'm very flexible in the way I can install it. I can run other operating systems on my computer, and I also have VirtualBox installed on my other computer, which is a Mac.

There is currently a bug in the latest HEAD of Zeppelin that was recently introduced and prevents the R interpreter from launching cleanly.
Did anyone create a Zeppelin JIRA issue for that?
For me it is working on Zeppelin branch-0.6:
1. Build Zeppelin with the r profile: -DskipTests -Pr. This will...
   1.1 create a directory 'R' in the git repo root
   1.2 copy the 'zeppelin-rinterpreter*.jar' into git_repo_root/interpreter/spark
2. Build Zeppelin with the build distro profile, e.g. -DskipTests -Pbuild-distr -Pspark-1.6 -Phadoop-2.6
3. Use zeppelin-distribution/target/zeppelin*.tar.gz for installation
4. Ensure both 1.1 and 1.2 are present in your Zeppelin installation

The error you're getting is that you need to have the R package evaluate installed. You can install this simply by launching R and typing install.packages('evaluate').
That said, your excerpt mentions the directory Zeppelin-With-R. That's my repo, which is the R interpreter in the form it had when it was accepted into Zeppelin. That is version 0.5.6, not 0.6.0. There is currently a bug in the latest HEAD of Zeppelin that was recently introduced and prevents the R interpreter from launching cleanly. Your best bet for now is to use the one from my repo and install clean, without trying to pull in from Zeppelin HEAD.
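The missing-dependency fix described above can be sketched as follows. This is only an illustration: it assumes it is run in the same R installation, and as the same user, that the Maven build invokes, and the helper name missing_packages is made up for this example.

```r
# Return the subset of `pkgs` that cannot be loaded from the current
# library paths. `missing_packages` is a hypothetical helper name.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# 'evaluate' is the dependency named in the build error above.
to_install <- missing_packages("evaluate")
print(to_install)
```

If to_install is not empty, install.packages(to_install) should pull the package in, after which the Maven build can be retried.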

Related

Installing Julia Packages via GitHub

To preface this question I am coming from the perspective of an absolute beginner, trying to learn Julia. I was recommended to try the SciML tutorial and in trying to install it have fallen at the first hurdle.
So far I have:
Installed Julia
Installed Anaconda
Installed Jupyter Notebook
Added "git" through the Anaconda console
Added "IJulia"
From here I have been able to open Jupyter through Anaconda and access Julia 1.7.3 within it. Following this I attempted to follow the instructions on the SciML website for getting started with the tutorial, which is to enter this code:
using Pkg
pkg"add https://github.com/SciML/SciMLTutorials.jl"
using SciMLTutorials
SciMLTutorials.open_notebooks()
However, the second line is throwing this error
invalid git HEAD (reference 'refs/heads/master' not found)
I have also tried installing the package through the Julia console, but receive the same error.
I am at a loss of how to proceed - I can't find anything in the documentation that suggests why this may be or how to proceed - since I am so new to this I suspect I may have missed a step in my installation process, but can't figure out where that may be.

Unable to configure rHive for HDP2.4 Sandbox

Problem
I have hortonworks sandbox 2.4 on virtual box. I am following the tutorial found here for installing rHive on the sandbox. I am unable to duplicate steps 4 and 6 without producing errors. They are as follows:
Step 4 Error: Being in the ~/RHive/ directory and using ant build gives me the following error:
BUILD FAILED /root/RHive/build.xml:39: /root/RHive/usr/hdp/current/hive-server2/lib does not exist
Step 6 Error: Using R CMD INSTALL RHive_2.0-0.10.tar.gz produces the following error:
ERROR: dependencies "rJava", "Rserve" are not available for package "RHive"
Attempts
I have followed the directions as specified on here as well, which is the Rhive documents, but unfortunately have gotten nowhere.
Step 4: I am at a loss as to why ant build could be failing. I have verified I installed it correctly and it states Apache Ant(TM) version 1.9.7 compiled April 9 2016 when I run command ant -version. So I followed those procedures correctly.
Step 6:
I used RStudio to install rJava and Rserve using the install.packages() command. Indeed, the tutorial suggests this as well. I suspect something is wrong with my Java dependencies?
I have used Ambari to use Hive before, but this is the first time I am trying to use it in R and I am obviously still new to the Hortonworks VM, so I would appreciate any kindness and assistance to help me fix the issues I am encountering.
I think the answer is to abandon rHive. It was yanked from CRAN. Anyone who finds this post while thinking about using Hive from R should consider RJDBC instead.

How to reestablish the default library of an R project after updating Ubuntu?

I'm developing an R package in RStudio and have set up a local library to contain all my packages. After installing some updates in my Ubuntu system, it seems that my R project has lost track of the local library and is unable to load the libraries that were associated with it. If I try to Build & Reload the package with
R CMD INSTALL --no-multiarch --with-keep.source mypackage
The program tries to install to library '/usr/local/lib/R/site-library/' which is rejected with:
ERROR: No permission to install to directory '/usr/local/lib/R/site-library/'
As far as I remember, whenever I rebuilt my package, that line pointed to my local directory, where all my libraries were located:
installing to library ‘/home/user/R/x86_64-pc-linux-gnu-library/3.2’
It is clear that somehow R has lost track of the connection between the project and the libraries.
I tried re-including the path with
.LibPaths("/home/user/R/x86_64-pc-linux-gnu-library/3.2")
but, just after I rebuild the package again, the program created a 3.3 directory in x86_64-pc-linux-gnu-library. From there, it is unable to find the libraries that are associated to my program and throws another error:
ERROR: dependencies '...', '...' are not available for package "mypackage"
Is there a way to restore the program to the way it was before so I don't have to reinstall everything and start from scratch?
By default, R adds the major-minor version numbers to the library path (?.libPaths) for a good reason, assuming that the jump from 3.2 to 3.3 introduced efficiencies or incompatibilities. It is implied that this version jump requires new installation of packages.
If you override this, packages assembled in 3.2 may not always play nicely in 3.3. (I'm not going to test this theory, please report back if you can disprove this statement, I'm honestly interested!)
BTW: your call to .libPaths seems suspect: I don't know of a capital-L version, and when calling it you should include the previous path (unless you truly mean to omit the system R library paths entirely), such as:
.libPaths(c("/home/user/R/x86_64-pc-linux-gnu-library/3.2", .libPaths()))
If you choose to do that, any bugs you may find in others' packages are possibly due to that incompatibility and should not necessarily be reported to developers.
Another option would be to re-install all packages from your 3.2 installation into your 3.3 library path. Something like this should help automate the process:
# to reinstall packages installed in R-3.2 subdir into R-3.3
install.packages(list.files(path = "~/R/x86_64-pc-linux-gnu-library/3.2"))
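A slightly more cautious variant of the reinstall idea above only lists packages that exist in the old library but are not yet in the new one, so nothing already installed is fetched again. This is a sketch under assumptions: packages_to_reinstall is a hypothetical helper name, and the new library is assumed to be the first entry of .libPaths().

```r
# List packages present in an old library directory but not yet installed
# in the new one. `packages_to_reinstall` is a hypothetical helper name.
packages_to_reinstall <- function(old_lib, new_lib = .libPaths()[1]) {
  old_pkgs <- list.files(path.expand(old_lib))
  new_pkgs <- rownames(installed.packages(lib.loc = new_lib))
  setdiff(old_pkgs, new_pkgs)
}

# Path taken from the answer above; adjust it to your own setup.
todo <- packages_to_reinstall("~/R/x86_64-pc-linux-gnu-library/3.2")
print(todo)
```

Passing the result to install.packages(todo) would then reinstall them for the new R version. Packages that did not come from the configured repositories (for example, ones installed from GitHub) would presumably have to be handled by hand.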

specifying R library path for RKernel in Anaconda Jupyter notebook

First let me preface this with the disclaimer that I'm new to R, but a longtime Python power user. Given that I love the conda ecosystem and the Jupyter notebook, I'm trying to set them up as my R development environment as well.
So using the instructions at https://www.continuum.io/blog/developer/jupyter-and-conda-r, I've set up a Jupyter Notebook that uses an RKernel that should be hitting the installation of R in my Anaconda folder (I would think, anyway).
Getting it set up was easy peasy and everything is working great for standard R stuff, but my analysis requires some R libraries that are not available in the r-essentials channel. No problem, I think I know how to install an R library. I go to "C:\Anaconda\R\bin\x64\Rgui.exe" and install rgdal, dismo, and some other packages. To check my work I looked in C:\Anaconda\R\library and there they are.
But when I run a Jupyter notebook from the Anaconda command prompt and start a new R notebook, I get "Error in library(dismo): there is no package called 'dismo'". Wait a sec, I run ".libPaths()" from the notebook and it looks like it's pointing
You can add .libPaths('path_where_your_packages_are') in a code cell at the beginning of your notebook to tell jupyter where your packages are. For me that was .libPaths('~/R/win-library/3.2') (work-around from discnerd who filed this issue on github).
To find out the path to your packages, just install a random package in R and wait for the location to be printed to the console.
More details (likely specific to my system/installations): When running .libPaths() in R, I got 2 locations: one for which admin rights were required for writing, and one for which admin rights were not required for writing. While packages installed through R land in the location where admin rights are not required, jupyter looks at the location where admin rights are required.
You can find out the path to your library with installed.packages()
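The two suggestions above can be combined in a short sketch: first list the libraries that actually hold installed packages, then put the one you want at the front of the search path. The helper names package_libraries and add_library are invented for this example. Note that .libPaths() silently drops directories that do not exist.

```r
# Libraries that currently contain at least one installed package.
package_libraries <- function() {
  unique(installed.packages()[, "LibPath"])
}

# Put `path` at the front of the library search path while keeping the
# existing entries. `add_library` is a hypothetical helper name.
add_library <- function(path) {
  .libPaths(c(path, .libPaths()))
  invisible(.libPaths())
}

print(package_libraries())
```

Run package_libraries() in the R session where the packages were installed, then call add_library() with the reported path (for example '~/R/win-library/3.2' from the answer above) in the first cell of the notebook.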

Rscript pointing to incorrect R version in local build

I recently installed a local version of R 3.1.0 on a Linux Redhat server as follows:
# from R-3.1.0 directory
./configure --prefix=$(pwd)
make
make install
In addition, I've updated PATH and R_LIBS in my .bashrc. If I run path/to/local/R/bin/Rscript --version, then it returns the proper version number. However, if I give it a test script that prints sessionInfo it yields information from the system-wide R installation.
Is there any more I need to do to run the local version of R using Rscript? From reading a similar issue here, it looks like the code above should be all that's necessary. There's a similar SO issue here, but it's unresolved.
Edit:
I just fired up Ubuntu in a VM and was able to install R locally and run Rscript without problem (using the same commands listed above). Have I gone crazy? Is there anything that might be floating around the Redhat environment on this server that might mess up the installation? Sanity checks?
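As one possible sanity check, a small test script can report which R installation is actually executing it and which environment settings it sees. This is only a sketch: r_diagnostics is a hypothetical helper name, and the RHOME variable is included because, as far as I know from the Rscript help page, Rscript lets it override the R installation chosen at build time. Whether that is the cause on this server is an assumption.

```r
# Collect facts about the R process that is actually running the script.
# `r_diagnostics` is a hypothetical helper name.
r_diagnostics <- function() {
  list(
    version         = R.version.string,
    r_home          = R.home(),
    rhome_env       = Sys.getenv("RHOME", unset = NA),
    r_libs_env      = Sys.getenv("R_LIBS", unset = NA),
    rscript_on_path = unname(Sys.which("Rscript")),
    lib_paths       = .libPaths()
  )
}

str(r_diagnostics())
```

Saving this to a file and running it with path/to/local/R/bin/Rscript on both the Redhat server and the Ubuntu VM should make any difference in r_home or the environment variables visible.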

Resources