I am trying to connect to my HDP cluster from RStudio Desktop (Windows) using the SparkR package.
sparkR.init is failing with a "no credentials" error, which seems to be caused by missing Kerberos credentials (exact error messages below). I already have a Kerberos ticket, but it turns out RStudio Desktop doesn't support Kerberos authentication and RStudio Server Pro is what I need. However, it looks like RStudio Server Pro can't be installed on Windows.
If I want to stick with my current Windows-based R and RStudio environment, is there any other way to connect to Hadoop?
Also, is there any package in core R itself (without RStudio) I can use to authenticate with the Hadoop cluster?
It looks like I can install Microsoft R on Windows, but it doesn't appear to support Kerberos authentication either.
sparkR.init(master = "yarn-client", appName = "sparkR", sparkHome = "C:/spark-1.6.2-bin-hadoop2.6", sparkEnvir = list(spark.driver.memory = "2g"))
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "HostName/IPAddress"; destination host is: "HostName":PORT;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy23.getClusterMetrics(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethod
Microsoft R Server on Windows, as a client to Microsoft R Server running on Hadoop, does support Kerberos authentication. It has been tested with Cloudera, Hortonworks HDP, and MapR.
When the compute context on Microsoft R Server (client-side) is set to RxHadoopMR(), R scripts can be executed remotely on the Hadoop cluster, and as long as all nodes have valid Kerberos tickets, you should be all set.
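As a rough illustration, here is a minimal sketch of setting that compute context with RevoScaleR; the edge-node hostname, user, and share directories below are placeholders, and the exact arguments depend on your cluster layout:
library(RevoScaleR)
# Hypothetical edge node and share directories; substitute your own values
hadoopCC <- RxHadoopMR(
    sshUsername  = "ruser",                    # account that holds a valid Kerberos ticket
    sshHostname  = "edgenode.example.com",     # cluster edge node reachable from the client
    shareDir     = "/var/RevoShare/ruser",     # local share on the cluster
    hdfsShareDir = "/user/RevoShare/ruser"     # HDFS share for intermediate files
)
rxSetComputeContext(hadoopCC)  # subsequent rx* calls now execute on the Hadoop cluster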
Please see: https://msdn.microsoft.com/en-us/microsoft-r/rserver-install-hadoop for installing Microsoft R Server on Hadoop.
and: https://msdn.microsoft.com/en-us/microsoft-r/rserver-install-windows for installing Microsoft R Server on Windows.
Related
I created a virtual machine running Ubuntu Server 16.04. I've already installed Spark and all dependencies and prerequisites. My Spark cluster is running on the VM, and the master and all workers can be started with start-all.sh. Now I'm trying to submit SparkR jobs to this cluster using RStudio from my local computer. I specified the SparkContext with master="spark://192.168.0.105:7077" to connect to the cluster, which is clearly running, since the master web UI responds at IP:8080. Is there any configuration that needs to be specified so the master can be reached from another device that is not yet part of the cluster?
The error in R is:
Error in handleErrors(returnStatus, conn) :
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
You could try using the Livy REST API interface.
https://livy.incubator.apache.org/
see sparklyr - Connect remote hadoop cluster
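As a rough sketch, assuming a Livy server is already running on the cluster (the endpoint URL below is a placeholder), sparklyr can connect through Livy like this:
library(sparklyr)
# Hypothetical Livy endpoint exposed by the cluster
sc <- spark_connect(master = "http://livy-server.example.com:8998",
                    method = "livy")
iris_tbl <- copy_to(sc, iris)   # push a small data frame through the connection as a smoke test
spark_disconnect(sc)
Because the connection goes over HTTP(S) to the Livy server, the client machine does not need a local Spark or Hadoop configuration.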
I am trying to use the migration utility from within PingFederate, but I keep getting the following error:
List adapters... Downloading adapter index from source... ERROR:
Unable to download from source.
java.security.NoSuchAlgorithmException: Error constructing
implementation (algorithm: Default, provider: SunJSSE, class:
sun.security.ssl.SSLContextImpl$DefaultSSLContext) Done.
From the configcopy.log:
Caused by: java.net.SocketException: java.security.NoSuchAlgorithmException
Caused by: java.security.NoSuchAlgorithmException: Error constructing implementation
Caused by: java.io.IOException: Invalid keystore format
Windows 7 Professional SP1
java version "1.8.0_144" Java(TM) SE Runtime Environment (build
1.8.0_144-b01) Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
PingFederate: 8.4.2
I am attempting this because we want to automate a deployment process that has so far been manual. I am only trying to use the listadapters.conf template and have set source.conf to output to a file. The command I am entering is:
configcopy.bat -Dconfigcopy.conf.file=configcopy_templates\\source.conf;configcopy_templates\\listadapters.conf
and I am running this from the <PF_HOME>/bin directory. The contents of the two files I mentioned are:
source.conf
source.connection.management.service.url = <my local install url on port 9999>/pf-mgmt-ws/ws/ConnectionMigrationMgr
source.connection.management.service.user = Administrator
source.connection.management.service.password =
OBF:JWE:eyJhbGciOiJkaXIiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2Iiwia2lkIjoibGJhaGtDZlNiSiIsInZlcnNpb24iOiI4LjQuMi4wIn0..ryNLCcpzwEx6KGzXi1FboA.34NbypXUud45R77TLwMvjg.dQFNb9NpbDY_EWIePb9hMA
configcopy.connection.trust.keystore = C:\Program Files\Ping Identity\pingfederate-8.4.2\pingfederate\server\default\data\pf.jwk
output.file = c:\temp\pf-config.txt
The Administrator account is the default one from the install, with all three roles added to it, and the password was obfuscated using obfuscate.bat in the bin directory.
listadapters.conf
cmd=listadapters
debug=true
select.adapter.role = idp
Even though it doesn't look like it above, all backslashes are escaped; that just hasn't come through here.
I have tried:
removing the path to the keystore altogether, which gives:
ERROR: Unable to download from source.
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to
find valid certification path to requested target Done.
setting the path to cacerts in jre/lib (same error as above)
I have installed the data.zip from the DotNet-Integration-Kit-2-5-2.zip and that is the only setup on this PC (my dev box).
The integration kit installs two certificates (possibly the same one twice, I'm not sure) that can be viewed under
Server Configuration > Trusted CAs
Server Configuration > SSL Server Certificates
and I have also added one under
Server Configuration > SSL Client Keys & Certificates
The kit's certs show as RSA 1024 and the one I created shows as RSA 2048.
Questions:
Why does the error state algorithm: Default (is this the keystore format?)
Is there a setting I am missing that would change it from Default?
Does anyone know of any docs other than the admin manual (I almost know it by heart now)?
Why is pf.jwk the wrong format?
Any other ideas at all, please.
[Update] Damn, I have been trying to use the migration utility, but since I am on a version above 7.2 I should be using the administrative API instead. Back to the drawing board. Still looking for advice though!
The pf.jwk file is an encrypted Java web key. The truststore is a standard JKS file to which you add PingFederate's SSL key, or its signing CA's public key.
However, as you have found, you should use the admin API. Configcopy is no longer being developed.
The following issue comes up when trying to connect to Hive 2 (Kerberos authentication is enabled) using R's RJDBC package; the Simba driver is used to connect to Hive.
hiveConnection <- dbConnect(hiveJDBC, "jdbc:hive2://xxxx:10000/default;AuthMech=1;KrbRealm=xx.yy.com;KrbHostFQDN=dddd.yy.com;KrbServiceName=hive")
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: [Simba]HiveJDBCDriver Invalid operation: Unable to obtain Principal Name for authentication ;
Make sure kinit has been issued and a Kerberos ticket has been generated (check with klist).
Make sure the right Java version for your R build (32/64-bit) is available on the classpath.
Make sure the right slf4j jars for your Java version are available.
All these steps should resolve the issue, assuming your code does not have logic issues; see the sketch below.
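A minimal sanity check for the first two points from within R, assuming the MIT Kerberos tools (kinit/klist) are on the PATH and the rJava package is installed:
library(rJava)
system("klist")  # should list a valid, unexpired Kerberos ticket for your principal
.jinit()         # start the JVM that RJDBC will use
.jcall("java/lang/System", "S", "getProperty", "sun.arch.data.model")  # "64" must match a 64-bit R build
If either check fails, fix the ticket or the Java/R bitness mismatch before revisiting the JDBC URL.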
I used to be able to run R code to pull a Hive table using JDBC under Cloudera CDH 4.5. However, I now get the connection error below after upgrading to CDH 5.3 (failed to find any Kerberos tgt); it seems it can no longer connect to the cluster.
The Hive server has been upgraded to HiveServer2/Beeline.
Please see the code and error log below. Any experience or advice on how to fix this? Thanks.
options(width=120)
options( java.parameters = "-Xmx4g" )
query="select * from Hive_table"
user="user1"
passw="xxxxxxx"
hiveQuerytoDataFrame <- function(user, passw, query) {
  library(RJDBC)
  .jaddClassPath("/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.10.0-cdh5.3.3.jar")
  drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
              classPath = list.files("/opt/cloudera/parcels/CDH/lib/", pattern = "jar$", full.names = TRUE, recursive = TRUE),
              identifier.quote = "`")
  conn <- dbConnect(drv, "jdbc:hive2://server.domain.<>.com:10000/default;principal=hive/server.domain.com@SERVER.DOMAIN.COM", user, passw)
  # dbListTables(conn)
  jdbc_out <- dbGetQuery(conn, query)
  str(jdbc_out)
  return(jdbc_out)
}
Log:
ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
I'm using the RPostgreSQL 0.4 library (compiled on R 2.15.3) on R 2.15.2 under Windows 7 64-bit to interface with PostgreSQL. This works fine when connecting to my PostgreSQL databases on localhost. I'm now trying to get my R code to run against a remote PostgreSQL database on Heroku. I can connect to Heroku's PostgreSQL database from the psql command shell on my machine without a problem. I get the message:
psql (9.2.3, server 9.1.9)
WARNING: psql version 9.2, server version 9.1.
Some psql features might not work.
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Clearly, psql uses SSL to connect. When I try to connect using the RPostgreSQL library routine dbConnect(), however, supplying exactly the same credentials via dbname=, host=, port=, user=, and password=, the connection fails with the complaint:
Error in postgresqlNewConnection(drv, ...) :
RS-DBI driver: (could not connect <user>#<hostname> on dbname <dbname>)
Calls: source ... .valueClassTest -> is -> is -> postgresqlNewConnection -> .Call
Execution halted
I know that Heroku insists on an SSL connection if you want to access their database remotely, so it seems likely that the R interface routine dbConnect() isn't trying SSL. Is there something else that I can do to get a remote connection from R to PostgreSQL on Heroku to work?
To get the JDBC URL for your Heroku instance:
Get your hostname, username and password using [pg:credentials].
Your JDBC URL is going to be:
jdbc:postgresql://[hostname]/[database]?user=[user]&password=[password]&ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
Proceed as you would normally with JDBC.
Apparently there is a way using RJDBC. See:
http://ryepup.unwashedmeme.com/blog/2010/11/17/working-with-r-postgresql-ssl-and-mssql/
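For reference, a sketch of the RJDBC approach; the driver jar path, host, and credentials below are placeholders, and the important part is the ssl and sslfactory parameters on the URL:
library(RJDBC)
# Placeholder jar path; any reasonably recent PostgreSQL JDBC driver should work
drv <- JDBC("org.postgresql.Driver", classPath = "C:/drivers/postgresql-9.1-903.jdbc4.jar")
url <- "jdbc:postgresql://<hostname>:5432/<database>?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"
conn <- dbConnect(drv, url, user = "<user>", password = "<password>")
dbGetQuery(conn, "select 1")  # quick connectivity check
dbDisconnect(conn)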
Please note that in order to connect to a Heroku database with JDBC externally, it is important to set the sslfactory parameter as well. I hope the Heroku team goes through this and updates their documentation.
String dbUri = "jdbc:postgresql://ec2-54-243-202-174.compute-1.amazonaws.com:5432/xxxxxxx";
Properties props = new Properties();
props.setProperty("user", "xxxxx");
props.setProperty("password", "xxxxx");
props.setProperty("ssl", "true");                                            // ssl must be set to true
props.setProperty("sslfactory", "org.postgresql.ssl.NonValidatingFactory");  // sslfactory set as shown above
Connection c = DriverManager.getConnection(dbUri, props);
See the answer to a related question at https://stackoverflow.com/a/38942581. The suggestion there of using RPostgres (https://github.com/rstats-db/RPostgres) instead of RPostgreSQL resolved this same issue for me.
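For completeness, a minimal sketch with RPostgres, which passes libpq connection options such as sslmode straight through (the host, database, and credentials below are placeholders):
library(DBI)
library(RPostgres)
# Placeholder Heroku-style credentials
con <- dbConnect(RPostgres::Postgres(),
                 host = "<hostname>", port = 5432, dbname = "<dbname>",
                 user = "<user>", password = "<password>",
                 sslmode = "require")  # force an SSL connection, as Heroku requires for external access
dbGetQuery(con, "select version()")
dbDisconnect(con)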