How could R use RJDBC to connect to Hive?

I'm using hadoop-2.2.0 and hive-0.12. I followed these steps to try to connect to Hive from RStudio:
library("DBI")
library("rJava")
library("RJDBC")
for(l in list.files('/PATH/TO/hive/lib/')){ .jaddClassPath(paste("/PATH/TO/hive/lib/",l,sep=""))}
for(l in list.files('/PATH/TO/hadoop/')){ .jaddClassPath(paste("/PATH/TO/hadoop/",l,sep=""))}
options( java.parameters = "-Xmx8g" )
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/PATH/TO/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://HOST:PORT", USER, PASSWD)
But I got the following error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
Any tips will be appreciated.

The problem is solved.
I loaded all of the jar files in the Hadoop directory, and then I was able to connect to Hive.
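For reference, here is a minimal sketch of what that looks like; the paths are placeholders for your own Hive and Hadoop installation:
# load all jar files from the Hadoop and Hive directories onto the JVM classpath
library("DBI")
library("rJava")
library("RJDBC")
hadoop_jars <- list.files("/PATH/TO/hadoop/", pattern = "\\.jar$", full.names = TRUE, recursive = TRUE)
hive_jars <- list.files("/PATH/TO/hive/lib/", pattern = "\\.jar$", full.names = TRUE)
.jinit(classpath = c(hadoop_jars, hive_jars))
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/PATH/TO/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://HOST:PORT", USER, PASSWD)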

You can simply connect to HiveServer2 from R using the RHive package.
Below are the commands that I used:
library(RHive)
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
rhive.env(ALL=TRUE)
rhive.init()
rhive.connect("localhost")
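Once connected, queries can be run and the session closed with RHive's query functions; a small example ("show tables" is just an illustrative statement):
rhive.query("show tables")
rhive.close()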

Related

R connection to Hive protobuf class error

I'm trying to connect to a remote Hive using R; with each step forward I find a new error. At the moment I'm doing this:
library("DBI")
library("rJava")
library("RJDBC")
cp = c("/path/jars/hadoop-common-3.1.0.jar",
"/path/jars/hive-jdbc-2.3.3-standalone.jar")
.jinit(classpath=cp)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
"/path/jars/hive-jdbc-2.3.3-standalone.jar",
identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://<ip>:10000/default", "myuser", "")
And all I get is the following error. It's something about protobuf, but I have no idea whether it's a local problem (environment?) or server-side.
java.lang.NoClassDefFoundError: com/google/protobuf/ProtocolMessageEnum
Downloading the protobuf jar (or taking it from the Hadoop installation) and adding it to the classpath solved the problem.
cp = c("/path/jars/hadoop-common-3.1.0.jar",
"/path/jars/hive-jdbc-2.3.3-standalone.jar",
"/path/jars/protobuf-java-2.5.0.jar")

Error in .jfindClass(as.character(driverClass)[1]) : class not found - Hive R

I am connected to a remote R server built on the x86_64-redhat-linux-gnu (64-bit) platform. The R version installed on this server is 3.3.1. I want to connect to a remote Hive database from this R server so that I can extract data and do some analysis on it. I am trying the following:
options( java.parameters = "-Xmx8g" )
library(rJava)
library(RJDBC)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
"/home/username/R/x86_64-redhat-linux-gnu-library/3.3/hive-jdbc-0.10.0.jar",
identifier.quote="`")
I am getting the error Error in .jfindClass(as.character(driverClass)[1]) : class not found. I downloaded the jar file and kept it in this path: /home/username/R/x86_64-redhat-linux-gnu-library/3.3/. I have downloaded only this jar file. Inside this /home/username/R/x86_64-redhat-linux-gnu-library/3.3/ path there are three folders, DBI, rJava and RJDBC, plus the file hive-jdbc-0.10.0.jar.
Apart from this I have not downloaded anything else for now. Is there anything else I need to download in order to resolve this error?
Another attempt which I tried was,
hivedrv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
c(list.files("/home/username/R/x86_64-redhat-linux-gnu-library/3.3/",pattern="jar$",full.names=T),
list.files("/home/username/R/x86_64-redhat-linux-gnu-library/3.3/",pattern="jar$",full.names=T)))
which ran without any error. But when I try the following command,
hivecon <- dbConnect(hivedrv, "jdbc:hive://hostname:portname/", "username", "password")
I am getting the following error,
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException
Not sure how to solve this problem. Can anybody please help me in connecting the R server to Hive database? Any information would be helpful.

Unable to Connect to Cassandra Database from R using JDBC

I am trying to connect R with Cassandra. Following is my code:
library(RJDBC)
#Load in the Cassandra-JDBC driver
cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("D:/cassandra/lib",pattern="jar$",full.names=T))
#Connect to Cassandra node and Keyspace
casscon <- dbConnect(cassdrv, "jdbc:cassandra://192.168.1.20:9042/demodb")
When I run the above code in R, I get the following error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLNonTransientConnectionException: org.apache.thrift.transport.TTransportException: Read a negative frame size (-2080374784)!
Any ideas how to solve this error?
Thanks in advance!

connect to Remote Hive Server from R using RJDBC/RHive

I'm using RJDBC 0.2-5 to connect to Hive from RStudio. My server has hadoop-2.4.1 and hive-0.14. I followed the steps below to connect to Hive.
library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver",
c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
"/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
"/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
"/home/packages/hive/New folder3/hive-service-0.14.0.jar",
"/home/packages/hive/New folder3/libfb303-0.9.0.jar",
"/home/packages/hive/New folder3/libthrift-0.9.0.jar",
"/home/packages/hive/New folder3/log4j-1.2.16.jar",
"/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
"/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
"/home/packages/hive/New folder3/hive-common-0.14.0.jar",
"/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
"/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
"/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
identifier.quote="`")
conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
"usr",
"pwd")
But I am always getting the following error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect",
as.character(url)[1], : java.lang.NoClassDefFoundError: Could not
initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
I even tried different versions of the Hive jar, such as hive-jdbc-standalone.jar, but nothing seems to work. I also used RHive to connect to Hive, but there was no success there either.
Can anyone help me? I'm kind of stuck :(
I didn't try RHive because it seems to need a complex installation on all the nodes of the cluster.
I successfully connected to Hive using RJDBC; here is a code snippet that works on my Hadoop 2.6 CDH 5.4 cluster:
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#initialisation de la connexion
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases
The hardest part is finding all the needed jars and where to get them.
UPDATE
The Hive standalone jar contains everything needed to use Hive; this standalone jar together with the hadoop-common jar is enough.
So this is a simplified version: no need to worry about any jars other than hadoop-common and the Hive standalone jar.
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#initialization of the connection
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connection
show_databases <- dbGetQuery(conn, "show databases")
show_databases
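Once the connection works, queries go through the standard DBI functions; a small example for further use and cleanup (the table name is just a placeholder):
head(dbGetQuery(conn, "SELECT * FROM mytable LIMIT 10"))
dbDisconnect(conn)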
Ioicmathieu's answer works for me now, after I switched to an older Hive jar, for example from 3.1.1 to 2.0.0.
Unfortunately I can't comment on his answer, which is why I have written another one.
If you run into the following error, try an older version:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect",
as.character(url)[1], : java.sql.SQLException: Could not open
client transport with JDBC Uri:
jdbc:hive2://host_name: Could not establish
connection to jdbc:hive2://host_name:10000: Required
field 'client_protocol' is unset!
Struct:TOpenSessionReq(client_protocol:null,
configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000,
use:database=default})
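Concretely, that just means pointing JDBC() at an older standalone driver jar; a sketch, where the jar path and version are only illustrative and the credentials are placeholders:
library(rJava)
library(RJDBC)
# use an older hive-jdbc standalone jar (2.x here) to match an older HiveServer2
cp = c("/usr/lib/hadoop/client/hadoop-common.jar",
       "/path/to/hive-jdbc-2.0.0-standalone.jar")
.jinit(classpath=cp)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
            "/path/to/hive-jdbc-2.0.0-standalone.jar",
            identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://host_name:10000/default", "myuser", "mypassword")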

Connecting to Hive in R

I am trying to connect to Hive in R. I have loaded the RJDBC and rJava libraries in my R environment.
I am using a Linux server with Hadoop (Hortonworks Sandbox 2.1) and R (3.1.1) installed on the same box. This is the script I am using to connect:
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/default")
I get this error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hive.service.auth.HiveAuthFactory
I have checked that my classpath contains all the jar files in /usr/lib/hive and /usr/lib/hadoop, but I cannot be sure whether anything else is missing. Any idea what is causing the problem?
I am fairly new to R (and programming for that matter) so any specific steps are much appreciated.
I successfully connected to Hive from R with just RJDBC and a few configuration lines. I prefer RJDBC to RHive because RHive needs complex installations on all the nodes of the cluster (and I don't really understand why).
Here is my R solution:
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#init of the connexion to Hive server
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases
You can simply connect to HiveServer2 from R using the RHive package.
Below are the commands that I had to use:
library(RHive)
Sys.setenv(HIVE_HOME="/usr/local/hive")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop")
rhive.env(ALL=TRUE)
rhive.init()
rhive.connect("localhost")
