R to Hive connection using RHive Package

I installed R 2.15.2 on a Windows PC.
Hadoop & Hive are on another PC.
I loaded RHive and its dependencies into R.
Now I am trying to connect to Hive.
> Sys.setenv(HIVE_HOME="/home/hadoop/hive-0.7.0-cdh3u0")
> Sys.setenv(HADOOP_HOME="/home/hadoop/hadoop-0.20.2-cdh3u0")
> library(RHive)
> rhive.env(ALL=TRUE)
Hive Home Directory : /home/hadoop/hive-0.7.0-cdh3u0
Hadoop Home Directory : /home/hadoop/hive-0.7.0-cdh3u0
Hadoop Conf Directory :
No RServe
Disconnected HiveServer and HDFS
RHive Library List
C:/Program Files/R/R-2.15.2/library/RHive/java/rhive_udf.jar /home/hadoop/hive-0.7.0-cdh3u0/conf
> rhive.init()
[1] "there is no slaves file of HADOOP. so you should pass hosts argument when you call rhive.connect()."
Error in .jnew("org/apache/hadoop/conf/Configuration") :
java.lang.ClassNotFoundException
In addition: Warning message:
In file(file, "rt") :
cannot open file '/home/hadoop/hadoop-0.20.2-cdh3u0/conf/slaves': No such file or directory
> rhive.connect(hdfsurl="hdfs://212.63.135.149:9000/")
Error in .jnew("org/apache/hadoop/conf/Configuration") :
java.lang.ClassNotFoundException
The result is an error in connection!
I even tried
rhive.connect(host = "212.63.135.149", port = 10000, hdfsurl="hdfs://212.63.135.149:9000/"), but to no avail.

I had the same problem a few weeks ago when installing RHive. It is because some jar files are not in the classpath that is set in rhive.init.
You need to set the arguments hive, libs, hadoop_home, hadoop_conf, hlibs, which indicate where these jar files are located.
I first installed from source; that worked with rhive.init, but rhive.connect did not work properly. It worked like a charm when I installed Hive through the Cloudera manager https://ccp.cloudera.com/display/CDH4DOC/Hive+Installation. So I advise you to follow the instructions there; it is well documented.
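For reference, here is a sketch of such a call with the arguments spelled out. The paths are taken from the question above, and the exact mapping of each argument to a directory (in particular libs vs hlibs) is an assumption; adjust them to your installation.
# sketch only: paths and the argument-to-directory mapping are assumptions
rhive.init(hive = "/home/hadoop/hive-0.7.0-cdh3u0",
           libs = "/home/hadoop/hive-0.7.0-cdh3u0/lib",
           hadoop_home = "/home/hadoop/hadoop-0.20.2-cdh3u0",
           hadoop_conf = "/home/hadoop/hadoop-0.20.2-cdh3u0/conf",
           hlibs = "/home/hadoop/hive-0.7.0-cdh3u0/lib")
rhive.connect(host = "212.63.135.149", port = 10000,
              hdfsurl = "hdfs://212.63.135.149:9000/")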

Probably it is because of the wrong Hadoop version.
RHive does not work with YARN, so use hadoop-0.20.205.0 or earlier.

I fixed it by fixing the rhive_udf.jar classpath (the jar is found in the RHive source directory after the build):
mkdir -p /usr/lib64/R/library/RHive/java
cp rhive_udf.jar /usr/lib64/R/library/RHive/java
chmod 755 /usr/lib64/R/library/RHive/java/rhive_udf.jar
R
> library("rJava")
> .jinit()
> .jaddClassPath("/usr/lib64/R/library/RHive/java/rhive_udf.jar")
Then test the newly added classpath with:
> .jclassPath()
You should see '/usr/lib64/R/library/RHive/java/rhive_udf.jar' in the list!
Then restart R - and here you go!

Related

How to add a search path to R?

In this tutorial, there is a command pymol.dccm(cij, pdb, type="launch"). But I got:
> pymol.dccm(cij, pdb, type="launch")
Error in pymol.dccm(cij, pdb, type = "launch") :
Launching external program failed
make sure 'C:/python27/PyMOL/pymol.exe' is in your search path
In addition: Warning message:
running command 'C:/python27/PyMOL/pymol.exe -cq' had status 127
I already have PyMOL installed on my PC. Can I ask how to add another search path to R?
Now I think pymol is a sub-package of bio3d. But I already installed bio3d and other commands work (e.g. pdb <- read.pdb()), so why does the pymol command not work?
I tried
> .libPaths("path/to/pymol2/")
> .libPaths("path/to/pymol2/PyMOL")
> .libPaths("path/to/pymol2/PyMOL/PyMOLWin.exe")
> pymol.dccm(cij, pdb, type="launch")
Error in pymol.dccm(cij, pdb, type = "launch") :
Launching external program failed
make sure 'C:/python27/PyMOL/pymol.exe' is in your search path
In addition: Warning message:
running command 'C:/python27/PyMOL/pymol.exe -cq' had status 127
> PyMOLWin.dccm(cij, pdb, type="launch")
Error: could not find function "PyMOLWin.dccm"
So .libPaths did not return an error, but pymol.dccm and PyMOLWin.dccm did not work.
I also tried to install a pymol package in R:
> install.packages("pymol")
Warning in install.packages :
package ‘pymol’ is not available (for R version 3.2.2)
There's a mistake in the tutorial command itself. The correct syntax for dccm is
pymol(cij, pdb, type="launch", exefile="C:/Program Files/pymol")
where exefile is the file path to the 'PYMOL' program on your system (i.e. how 'PYMOL' is invoked). If NULL, an OS-dependent default path to the program is used.
Try the following code; it worked perfectly for me:
pymol(cm, pdb.open, type="launch", exefile="%userprofile%/PyMOL/PyMOLWin.exe")
.libPaths("path/to/package/library") probably does what you need.
.libPaths gets/sets the library trees within which packages are looked for.
Set the path to the parent directory of the directory with the package name rather than the package directory itself.
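For example, a minimal sketch (the path below is illustrative; point it at the library tree that contains the installed package folder, not at the package folder itself):
# illustrative path: the library tree that contains the bio3d/ folder
.libPaths("C:/Users/yourname/Documents/R/win-library/3.2")
.libPaths()   # the new tree should now appear first in the returned list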

Connecting to Spark with Sparklyr gives Permission Denied Error

After installing the sparklyr package I followed the instructions here (http://spark.rstudio.com/) to connect to Spark, but I am faced with this error. Am I doing something wrong? Please help me.
sc = spark_connect( master = 'local' )
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\USER\AppData\Local\Temp\RtmpYb3dq4\fileff47b3411ae_spark.log':
Permission denied
But I am able to find the file at the stated location, and on opening it I found it to be empty.
First of all, did you install sparklyr from GitHub (devtools::install_github("rstudio/sparklyr")) or from CRAN?
There were some issues some time ago with Windows installations.
The issue you have seems to be related to TEMP and TMP folder-level permissions on Windows, or to file creation permissions. Every time you start sc <- spark_connect(), it tries to create a folder and a file to write the log files.
Make sure you have write access to these locations.
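As a quick check from R (a sketch; these are standard base R calls, nothing sparklyr-specific):
# where R (and hence sparklyr) will write temporary files
Sys.getenv(c("TEMP", "TMP"))
tempdir()
file.access(tempdir(), mode = 2)   # 0 means the directory is writable, -1 means it is not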
I could observe the same error message with versions 2.4.3 and 2.4.4 in different cases:
when trying to connect to a non-"local" master using spark_connect(master="spark://192.168.0.12:7077", ..), if the master is not started or not responding at the specified master URL;
when setting a specific incomplete configuration, in my case trying to set dynamicAllocation to true without the other required dynamicAllocation settings:
conf <- spark_config()
conf$spark.dynamicAllocation.enabled <- "true"
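A sketch of a more complete dynamic-allocation configuration is shown below; the property names follow the Spark documentation, and the master URL and values are illustrative:
conf <- spark_config()
conf$spark.dynamicAllocation.enabled <- "true"
conf$spark.shuffle.service.enabled <- "true"        # required when dynamic allocation is enabled
conf$spark.dynamicAllocation.minExecutors <- 1
conf$spark.dynamicAllocation.maxExecutors <- 4
sc <- spark_connect(master = "spark://192.168.0.12:7077", config = conf)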

R set up local package repository

I'm trying to set up a local repository on my local network, following these instructions: Creating a local R package repository.
And I'm getting the same problem as the one described in the "Update" paragraph of: install.packages errors: Troubleshooting local repo usage
(even though the question is marked as solved, no working solution is provided there).
I've put my package into the folder:
"S:/outils/packages R/bin/windows/contrib/3.2"
As per Dirk's instructions in this SO answer, I ran the following commands:
setwd("S:/outils/packages R/bin/windows/contrib/3.2")
tools::write_PACKAGES(".", type="win.binary")
list.files()
[1] "BayesTree_0.3-1.3.zip" "Epi_2.0.zip" "PACKAGES" "PACKAGES.gz"
I ran the following command in order to point to my new local repository. I copied and pasted this command from the repos paragraph of the options help page:
local({r <- getOption("repos"); r["CRAN"] <- 'file://S:/Outils/packages R'; options(repos = r)})
And when I try to install some packages I get an error:
install.packages("Epi")
Warning in install.packages :
cannot open compressed file 'S:/Outils/packages R/src/contrib/PACKAGES', probable reason 'No such file or directory'
Error in install.packages : cannot open the connection
I tried to put the "PACKAGES" and "PACKAGES.gz" files into the S:/outils/packages R/bin/windows/contrib/ folder or into S:/outils/packages R/, without any success.
If you read to the bottom of Dirk's answer, you'll see he mentions the wonderful drat package. This takes care of the pain you're experiencing and creates the PACKAGES.gz file and associated paths pain-free.
As a bonus, drat ties in with github.io free of charge.
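A minimal sketch of that approach (the package file name and repository path are the ones from the question; using drat this way on a network share is an assumption):
install.packages("drat")
# copy a built package into the local repository and regenerate the PACKAGES index
drat::insertPackage("Epi_2.0.zip", repodir = "S:/outils/packages R")
After that, pointing the repos option at the repository as above should find the win.binary index.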

R is not connecting to HDFS

Why is R not connecting to Hadoop?
I am using R to connect to HDFS using the 'rhdfs' package. The 'rJava' package is installed and the rhdfs package is loaded.
The HADOOP_CMD environment variable is set in R using:
Sys.setenv(HADOOP_CMD='/usr/local/hadoop/bin')
But when the hdfs.init() function is called, the following error message is generated:
sh: 1: /usr/local/hadoop/bin: Permission denied
Error in .jnew("org/apache/hadoop/conf/Configuration") :
java.lang.ClassNotFoundException
In addition: Warning message:
running command '/usr/local/hadoop/bin classpath' had status 126
Also, the 'rmr2' library was loaded, and the following code was typed:
ints = to.dfs(1:100)
which generated the message given below:
sh: 1: /usr/local/hadoop/bin: Permission denied
The R-Hadoop packages are accessible only to the 'root' user and not 'hduser' (Hadoop user), since they were installed when R was run by the 'root' user.
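One way to address that (a sketch; the site-library path and archive file names are illustrative) is to install the packages, while running R as root, into a system-wide library that 'hduser' can read:
# run R as root; install into a system-wide library instead of root's personal library
.libPaths("/usr/lib/R/site-library")                                     # illustrative path
install.packages("rJava")                                                # from CRAN
install.packages("rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")   # local archives;
install.packages("rmr2_3.3.1.tar.gz",  repos = NULL, type = "source")   # file names illustrative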
Simple: there are only two reasons to get this type of problem.
1) A wrong path.
2) No privileges/permissions on that jar.
Not only that, also include the other system paths, such as the ones given below.
Sys.setenv(HADOOP_HOME="/home/hadoop/path")
Sys.setenv(HADOOP_CMD="/home/hadoop/path/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/home/hadoop/path/streaming-jar-file.jar")
Sys.setenv(JAVA_HOME="/home/hadoop/java/path")
Then load library(rmr2) and library(rhdfs); that error should not occur.
But your problem is a permission problem, so as root grant the privileges (755) on that jar to your user and then run it; the error should no longer appear.
Try it like this:
Sys.setenv(HADOOP_CMD='/usr/local/hadoop/bin/hadoop')
Sys.setenv(JAVA_HOME='/usr/lib/jvm/java-6-openjdk-amd64')
library(rhdfs)
hdfs.init()
Please give the correct HADOOP_CMD path, extended with /bin/hadoop (i.e. pointing at the hadoop executable itself, not just the bin directory).

sparkR hdfs error - Server IPC version 9 cannot communicate with client version 4

I have installed SparkR on Ubuntu to support Hadoop version 2.4.0, following the instructions here.
I can see that the assembly JAR for Spark with Hadoop 2.4.0 and YARN support is created at the following location: ./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar.
The R code below, reading a file from the local filesystem, works fine:
library(SparkR)
sc <- sparkR.init("local[2]", "SparkR", "/usr/local/spark",
list(spark.executor.memory="1g"))
lines <- textFile(sc, "//home//manohar//text.txt")
However, I get an error when trying to read the file from HDFS.
library(SparkR)
sc <- sparkR.init()
lines <- textFile(sc, "hdfs://localhost:9000//in//text.txt")
Error:
Error in .jcall(getJRDD(rdd), "Ljava/util/List;", "collect") :
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
Not sure what I'm doing wrong. I appreciate any help.
The link you gave doesn't have any SparkR installation steps. According to the SparkR README, SparkR by default links to Hadoop 1.0.4. To use SparkR with other Hadoop versions, you will need to rebuild SparkR against the same Hadoop version that Spark is linked to:
SPARK_HADOOP_VERSION=2.4.0 ./install-dev.sh
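After rebuilding, the HDFS read from the question should work again; a quick check (a sketch, reusing the question's HDFS URL and the old SparkR RDD API):
library(SparkR)
sc <- sparkR.init()
lines <- textFile(sc, "hdfs://localhost:9000/in/text.txt")
collect(lines)   # should now return the file contents instead of the IPC version error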
