I am running H2O on a 10-node Hadoop cluster (H2O was started using h2odriver.jar).
Using the command below in R to connect to the cluster,
h2o.init(ip = "ip-address", startH2O = FALSE)
fails with the following error:
Cannot connect to H2O server. Please check that H2O is running at
https://ip-address:54321
Any suggestions?
Found it to be a proxy issue. Checked and removed the proxy environment variables within R.
Check if there is a proxy set; I had one:
Sys.getenv("http_proxy")
Sys.getenv("https_proxy")
Unset the proxy:
Sys.setenv("http_proxy" = "")
Sys.setenv("https_proxy" = "")
I have a Dockerized all-in-one Devstack installation on Ubuntu 20.04. My goal is to connect to the host's KVM and create instances there. Nova was configured as follows for this purpose:
# /etc/nova/nova.conf
# /etc/nova/nova-cpu.conf
[libvirt]
connection_uri = qemu+ssh://root@172.10.1.1/system
When I try to build an instance, I get the following error:
Build of instance cdd6f8b4-6dcf-4a43-b96a-fb6166b20235 aborted: Failed to allocate the network(s), not rescheduling.
The ovs-vsctl commands cause the error. What is the problem? Does this need to be done differently?
I'm trying to connect RStudio to Tableau Desktop to do some data analysis work, but an error occurs during connection: localhost:6311: Connection refused
I'm using macOS version 10.13.6.
Code in R:
install.packages("Rserve")
library(Rserve)
Rserve()
Try adding the following parameter to your Rserve() call, which will hard-code the port:
Rserve(port = 6311)
If that doesn't work, it is worth troubleshooting the port with the following command in a terminal (telnet may need to be installed, as it is not available by default):
telnet localhost 6311
The return from the telnet command should be:
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Rsrv0103QAP1
More information on the above can be found here.
If the command returns a failure, the problem is certainly outside of Tableau.
From there, one option would be to edit the Rserve configuration to explicitly accept remote connections.
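As a sanity check independent of Tableau, you can also talk to Rserve from a second R session using the RSclient package. A hedged sketch, assuming RSclient is installed and Rserve is listening on port 6311:
library(RSclient)
# Open a connection to the local Rserve instance; this fails
# immediately if Rserve is not listening
conn <- RS.connect(host = "localhost", port = 6311)
# Evaluate a trivial expression remotely as a round-trip test
RS.eval(conn, R.version.string)
RS.close(conn)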
Is it possible to connect sparklyr to a remote Hadoop cluster, or can it only be used locally?
And if it is possible, how? :)
In my opinion, the connection from R to Hadoop via Spark is very important!
Do you mean a Hadoop or a Spark cluster? If Spark, you can try connecting through Livy; details here:
https://github.com/rstudio/sparklyr#connecting-through-livy
Note: Connecting to Spark clusters through Livy is under experimental development in sparklyr.
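For reference, the basic connection pattern from the linked sparklyr documentation looks like the sketch below; the host name is a placeholder and 8998 is Livy's default port:
library(sparklyr)
# Connect to the remote Spark cluster via its Livy REST endpoint
sc <- spark_connect(master = "http://<livy-host>:8998", method = "livy")
# ... run your sparklyr code against the remote cluster ...
spark_disconnect(sc)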
You could use Livy, which is a REST API service for the Spark cluster.
Once you have set up your HDInsight cluster on Azure, check for the Livy service using curl:
# curl test (the /livy/sessions endpoint lists active Livy sessions)
curl -k --user "admin:mypassword1!" -v -X GET "https://<yourclustername>.azurehdinsight.net/livy/sessions"
# r-studio code
library(sparklyr)
sc <- spark_connect(master = "https://<yourclustername>.azurehdinsight.net/livy/",
                    method = "livy",
                    config = livy_config(
                      username = "admin",
                      password = rstudioapi::askForPassword("Livy password:")))
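Once the connection succeeds, a quick smoke test (using the built-in iris data frame as a stand-in for your data) is to push a small table through Livy and query it back:
# Copy a local data frame to the cluster and read it back
iris_tbl <- dplyr::copy_to(sc, iris, overwrite = TRUE)
head(iris_tbl)
spark_disconnect(sc)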
A useful URL:
https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface
Using impala-shell, I can see the Hive metastore, use any database created by Hive, and query any table created by Hive. When I try to create a table in impala-shell or run "invalidate metadata", I get:
"ERROR: Couldn't open transport for localhost:26000(connect() failed: Connection refused)"
I have the following configuration. This is a multi-node cluster built by hand, i.e. without using Cloudera Manager:
CentOS 6
CDH4.5
Impala 1.2.1
Hive MySQL Metastore
impalad is running on multiple nodes, alongside the data nodes
statestored and catalogd are running on a single node that is NOT an impalad node
In /etc/default/impala, I have changed IMPALA_STATE_STORE_HOST to point to the IP of the statestored machine.
From /var/log/impala/catalogd.INFO, it seems port 26000 is used by the catalog service, as the file contains the line "--catalog_service_port=26000".
Just as /etc/default/impala has to tell impalad where the statestore is (using IMPALA_STATE_STORE_HOST), I am wondering if, for 1.2.1 (where catalogd was introduced), there has to be an additional entry for the catalogd location as well. Just a guess ....
Any help is appreciated.
Thanks,
You have to start impalad with the option -catalog_service_host=fqdn_to_your_catalog_host.
Unfortunately this is not yet in the default configuration, so you have to add it yourself.
In /etc/default/impala, add:
CATALOG_SERVICE_HOST=fqdn_to_your_catalog_host
and append -catalog_service_host=${CATALOG_SERVICE_HOST} to IMPALA_SERVER_ARGS.
Restart impalad and it should work now :-)
I'm trying to run analyses in parallel in R on an AWS EC2 cluster. I am using StarCluster to set up and manage the EC2 cluster, and am trying to use snow and foreach in R. To start off, I have 2 nodes in the cluster, 1 master and 1 worker.
starcluster start mycluster
starcluster listinstances
-----------------------------------------
mycluster (security group: @sc-mycluster)
-----------------------------------------
....
Cluster nodes:
master running i-xxxxxxxxx masterIP.compute-1.amazonaws.com
node001 running i-xxxxxxxxx node001IP.compute-1.amazonaws.com
Total nodes: 2
starcluster sshmaster mycluster
I then start R, load the snow package, and try to create a cluster object:
R
library("snow")
cl = makeCluster(c("masterIP.compute-1.amazonaws.com", "node001IP.compute-1.amazonaws.com"), type = "SOCK")
This, however, gives me the following error message:
The authenticity of host 'masterIP.compute-1.amazonaws.com (xx.xxx.xx.xx)' can't be established.
ECDSA key fingerprint is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'masterIP.compute-1.amazonaws.com,xx.xxx.xx.xx' (ECDSA) to the list of known hosts.
Permission denied (publickey).
So I tried copying my SSH key (keyname.rsa, to be specific) to the .ssh directory on EC2 and trying again. That still didn't work; I received the same Permission denied (publickey) error. It was my understanding that StarCluster handled the setup of SSH and communication between nodes, so I'm a little confused as to why I'm not able to set this up. I also tried adding just node001, i.e. cl = makeCluster(c("node001IP.compute-1.amazonaws.com"), type = "SOCK"), but the same error occurred.
It turns out, after much tinkering, that all that was needed was an update to R version 2.15. The command cl = makeCluster(c("masterIP.compute-1.amazonaws.com", "node001IP.compute-1.amazonaws.com"), type = "SOCK") worked perfectly after that.
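For completeness, a hedged sketch of the working pattern after the upgrade, with a quick check that both nodes respond (host names as in the question):
library(snow)
cl <- makeCluster(c("masterIP.compute-1.amazonaws.com",
                    "node001IP.compute-1.amazonaws.com"), type = "SOCK")
# Ask each worker for its host name to confirm the cluster is alive
clusterCall(cl, function() Sys.info()[["nodename"]])
stopCluster(cl)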