Kerberos connection error to Hive2 using JDBC in R

I used to be able to run R code that pulled a Hive table over JDBC under Cloudera CDH 4.5. However, after upgrading to CDH 5.3 I now get the connection error below ("Failed to find any Kerberos tgt"); it seems R can no longer connect to the cluster.
The Hive server has been upgraded to HiveServer2/Beeline.
Please see the code and error log below. Any experience or advice on how to fix this? Thanks.
options(width = 120)
options(java.parameters = "-Xmx4g")

query <- "select * from Hive_table"
user  <- "user1"
passw <- "xxxxxxx"

hiveQuerytoDataFrame <- function(user, passw, query) {
  library(RJDBC)
  .jaddClassPath("/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.10.0-cdh5.3.3.jar")
  drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
              classPath = list.files("/opt/cloudera/parcels/CDH/lib/",
                                     pattern = "jar$", full.names = TRUE, recursive = TRUE),
              identifier.quote = "`")
  # note: the Kerberos principal separator should be "@", not "#"
  conn <- dbConnect(drv,
                    "jdbc:hive2://server.domain.<>.com:10000/default;principal=hive/server.domain.com@SERVER.DOMAIN.COM",
                    user, passw)
  # dbListTables(conn)
  jdbc_out <- dbGetQuery(conn, query)
  str(jdbc_out)
  return(jdbc_out)
}
Log:
ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
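Before digging into the JDBC call itself, it may help to confirm that a valid ticket actually exists in the Kerberos cache, since the driver reads the ticket cache rather than the user/password arguments; the checklist in the Simba-driver answer further down suggests the same. A minimal pre-flight sketch (the principal name is a hypothetical placeholder):
# Hedged sketch: klist -s exits with status 0 only when a valid, unexpired ticket exists.
if (system2("klist", "-s") != 0) {
  # user1@SERVER.DOMAIN.COM is a hypothetical principal; kinit will prompt for its password.
  system2("kinit", "user1@SERVER.DOMAIN.COM")
}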

Related

R teradata DBI:dbConnect() error: TimedOut: No response received when attempting to connect to the Teradata server

I am going to ask and answer this question because I spent more time than I'd like to admit searching for a response and couldn't find one. I installed Teradata ODBC Driver 16.20. In the ODBC Data Source Administrator, I added a Data Source. I named it teradata, put in the name of the Teradata Server to connect to and my username and password for authentication. When I tried running the following code in RStudio:
con <- DBI::dbConnect(odbc::odbc(), "teradata")
I would get an error:
Error: nanodbc/nanodbc.cpp:1021: HY000: [Teradata][WSock32 DLL] (434) WSA E TimedOut: No response received when attempting to connect to the Teradata server
To solve this, I needed to pass a timeout argument:
con <- DBI::dbConnect(odbc::odbc(),
                      "teradata",
                      timeout = 20)
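For reference, the same connection can also be written without a pre-configured DSN; the driver name and host below are assumptions you would adapt to your install:
# Hypothetical DSN-less equivalent; Driver and DBCName must match your ODBC setup.
con <- DBI::dbConnect(odbc::odbc(),
                      Driver  = "Teradata Database ODBC Driver 16.20",
                      DBCName = "your.teradata.host",  # Teradata server name
                      UID     = "user",
                      PWD     = "password",
                      timeout = 20)                    # seconds to wait for a response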

Kerberos authentication for Windows R

I am trying to connect to my HDP cluster from RStudio Desktop (Windows) using the SparkR package.
Spark init is failing with a "no credentials" error message, which seems to be caused by missing Kerberos credentials (exact error messages below). I already have a Kerberos ticket, but it turns out RStudio Desktop doesn't support Kerberos authentication, and RStudio Server Pro is what I would need. However, RStudio Server Pro can't be installed on Windows.
If I want to stick with my current Windows-based R and RStudio environment, is there any other way to connect to Hadoop?
Also, is there any package in core R itself (without RStudio) I can use to authenticate with the Hadoop cluster?
It looks like I can install Microsoft R on Windows, but it doesn't appear to support Kerberos authentication?
sparkR.init(master = "yarn-client", appName = "sparkR", sparkHome = "C:/spark-1.6.2-bin-hadoop2.6", sparkEnvir = list(spark.driver.memory = "2g"))
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "HostName/IPAddress"; destination host is: "HostName:PORT;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy23.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethod
Microsoft R Server on Windows, acting as a client to Microsoft R Server running on Hadoop, does support Kerberos authentication. It has been tested with Cloudera, Hortonworks HDP, and MapR.
When the compute context on the Microsoft R Server client is set to RxHadoopMR(), R scripts can be executed remotely on the Hadoop cluster, and, as long as all nodes have valid Kerberos tickets, you should be all set; see the sketch after the links below.
Please see: https://msdn.microsoft.com/en-us/microsoft-r/rserver-install-hadoop for installing Microsoft R Server on Hadoop.
and: https://msdn.microsoft.com/en-us/microsoft-r/rserver-install-windows for installing Microsoft R Server on Windows.
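A minimal compute-context sketch, assuming RevoScaleR is installed on both sides; every host name, user, and share path below is a placeholder:
# Hedged sketch: adapt all values to your cluster.
library(RevoScaleR)
cc <- RxHadoopMR(
  sshUsername  = "user1",                    # hypothetical login on the edge node
  sshHostname  = "edge.cluster.example.com", # hypothetical edge-node host
  shareDir     = "/var/RevoShare/user1",     # local directory for intermediate files
  hdfsShareDir = "/user/RevoShare/user1"     # HDFS directory for intermediate files
)
rxSetComputeContext(cc)  # subsequent rx* functions now execute on the cluster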

R- mongolite on OS X Sierra- No suitable servers found

I am trying to follow the "Getting started with MongoDB in R" page to get a database up and running. MongoDB is on my PATH, so I am able to run mongod from the terminal and start an instance. But when I start an instance in the background and try running the following commands in R:
library(mongolite)
m <- mongo(collection = "diamonds") #diamonds is a built in dataset
It throws an error after that last statement saying:
Error: No suitable servers found (`serverSelectionTryOnce` set): [Failed to resolve 'localhost']
How do I enable it to find the connection I have open? Or is it something else? Thanks.
It could be that mongolite is looking in the wrong place for the local server. I solved this same problem for myself by explicitly adding the local host address in the connection call:
m <- mongo(collection = "diamonds", url = "mongodb://127.0.0.1")
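With the connection resolved, a quick end-to-end check using the ggplot2 diamonds data frame the tutorial refers to:
# Load the sample data into the collection and count the documents.
library(ggplot2)
m$insert(diamonds)  # insert the data frame
m$count()           # should report 53940 documents if the insert succeeded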

Connecting to Hive (Kerberos enabled) with the R RJDBC package from RStudio on Windows

The following issue comes up when trying to connect to Hive 2 (Kerberos authentication enabled) using R RJDBC. I used the Simba driver to connect to Hive.
hiveConnection <- dbConnect(hiveJDBC, "jdbc:hive2://xxxx:10000/default;AuthMech=1;KrbRealm=xx.yy.com;KrbHostFQDN=dddd.yy.com;KrbServiceName=hive")
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: [Simba]HiveJDBCDriver Invalid operation: Unable to obtain Principal Name for authentication ;
Make sure that:
- kinit has been issued and a Kerberos ticket has been generated (verify with klist)
- the right Java version for your R version (32/64-bit) is available on the classpath
- the right slf4j jars for your Java version are available
All these steps should resolve the issue, assuming your code does not have logic issues. A driver-setup sketch follows.
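For completeness, a driver-setup sketch for the dbConnect call above; the class name matches Simba's JDBC 4.1 Hive driver, and the jar directory is an assumption:
# Hedged sketch: point classPath at wherever the Simba jars are installed.
library(RJDBC)
hiveJDBC <- JDBC("com.simba.hive.jdbc41.HS2Driver",
                 classPath = list.files("C:/Simba/HiveJDBC41",
                                        pattern = "jar$", full.names = TRUE))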

Want to connect Redshift to R

I tried to use the code from this link, but I got an error:
driver <- JDBC("com.amazon.redshift.jdbc41.Driver", "RedshiftJDBC41-1.1.9.1009.jar", identifier.quote="`")
JavaVM: requested Java version ((null)) not available. Using Java at "" instead.
JavaVM: Failed to load JVM: /bundle/Libraries/libserver.dylib
JavaVM FATAL: Failed to load the jvm library.
Error in .jinit(classPath) : JNI_GetCreatedJavaVMs returned -1
This happens after loading the driver and trying to connect. I don't know how to connect Redshift to R.
This will not solve the error, but if you want to connect to Redshift from R, you can use the RPostgreSQL library, as in the answer to another R-Redshift connection issue:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
conn <- dbConnect(drv,
                  host = "your.host.us-east-1.redshift.amazonaws.com",
                  port = "5439",
                  dbname = "your_db_name",
                  user = "user",
                  password = "password")
You also need to make sure that your IP is in the Redshift security group whitelist.
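Once connected, a trivial query confirms the session works (the query is illustrative):
dbGetQuery(conn, "SELECT current_date")  # any cheap query verifies connectivity
dbDisconnect(conn)                       # close the connection when done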
