Connecting to a Spark standalone cluster does not work within RStudio

I created a virtual machine running Ubuntu Server 16.04 and installed Spark with all dependencies and prerequisites. The Spark cluster runs on the VM, and the master and all workers can be started with start-all.sh. Now I'm trying to submit SparkR jobs to this cluster from RStudio on my local computer. I specified the Spark context with master="spark://192.168.0.105:7077" to connect to the cluster, which is clearly running, since the master web UI at IP:8080 responds. Is there any configuration that has to be specified in order to call the master from another device that is not part of the cluster?
The error in R is:
Error in handleErrors(returnStatus, conn) :
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

You could try using the Livy REST API interface:
https://livy.incubator.apache.org/
See also: sparklyr - Connect remote hadoop cluster

Related

Connecting to a Remote Cloudera Spark Cluster using Sparklyr with the method Livy

I am not able to connect to a remote Spark cluster using the sparklyr Livy method.
config <- livy_config(username="<username>", password="<password>")
sc <- spark_connect(master = "<address>", method = "livy", config = config)
I am getting an error:
Error in value[3L] : Failed to initialize livy connection:
Unable to retrieve a spark_connection from object of class function
The Livy server is running on the cluster, and other Livy commands work fine.
The remote cluster is a Cloudera-managed cluster.
The problem was resolved by reinstalling sparklyr from CRAN. Previously, sparklyr had been installed with devtools::install_github("rstudio/sparklyr").

New-AzureRmHDInsightCluster "R Server" ClusterTier powershell

I am trying to create an Azure R Server HDInsight cluster via PowerShell.
I am using the cmdlet New-AzureRmHDInsightCluster.
It throws the following error regardless of the -ClusterTier value (the only choices appear to be Standard and Premium):
New-AzureRmHDInsightCluster : BadRequest: Cluster type 'R Server' is
not supported by 'Standard' tier.
Is it possible to create an HDInsight R Server cluster via PowerShell, and if so, what are the correct settings for New-AzureRmHDInsightCluster?

You can find documentation for creating R Server clusters via PowerShell here.

How to submit jobs to spark master running locally

I am using R and spark to run a simple example to test spark.
I have a spark master running locally using the following:
spark-class org.apache.spark.deploy.master.Master
I can see the status page at http://localhost:8080/
Code:
system("spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 --master local[*]")
suppressPackageStartupMessages(library(SparkR)) # Load the library
sc <- sparkR.session(master = "local[*]")
df <- as.DataFrame(faithful)
head(df)
This runs fine when I do the following (the code is saved as 'sparkcode.R'):
Rscript sparkcode.R
Problem:
A new Spark instance is created each time. I want R to use the existing master instance instead, so that the job shows up as a completed application at http://localhost:8080/#completed-app.
P.S.: I am using Mac OS X, Spark 2.1.0 and R 3.3.2.

A number of things:
If you use a standalone cluster, use the correct URL, which should be sparkR.session(master = "spark://hostname:port"). Both hostname and port depend on the configuration, but the standard port is 7077 and the hostname should default to the machine's hostname (as reported by the hostname command). This is the main problem.
Avoid using spark-class directly. This is what the $SPARK_HOME/sbin/ scripts are for (like start-master.sh). They are not crucial, but they handle small and tedious tasks for you.
The standalone master is only a resource manager. You have to start worker nodes as well (start-slave*).
It is usually better to use bin/spark-submit, though it shouldn't matter much here.
spark-csv is no longer necessary in Spark 2.x, and even if it were, Spark 2.1 uses Scala 2.11 by default. Not to mention that 1.0.3 is extremely old (from around Spark 1.3).
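To make the first point concrete, here is a minimal shell sketch that assembles a standalone master URL. The helper name spark_master_url is hypothetical and not part of Spark. It assumes the default port 7077 and falls back to the output of hostname when no host is given.

```shell
# Hypothetical helper (not part of Spark): print a standalone master URL.
# Arguments: [host] [port]. The defaults assume an unmodified master setup.
spark_master_url() {
  host="${1:-$(hostname)}"
  port="${2:-7077}"
  printf 'spark://%s:%s\n' "$host" "$port"
}
```

The printed string is the kind of value sparkR.session(master = ...) expects for a standalone cluster. The exact URL of a running master is also displayed at the top of its web UI on port 8080, so it is worth comparing the two before connecting.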

Hue's Spark notebook not working on cloudera quickstart vm

Recently I installed the Cloudera QuickStart VM 5.8 on my Windows machine on top of VMware. By default, the Spark UI link and the ZooKeeper link were not shown in Hue, so I edited hue.ini, changing
app_blacklist = zookeeper, spark
to
app_blacklist =
After doing this I was able to download some Spark examples, but the Spark UI link was still not displayed. However, I was able to get the ZooKeeper UI link.
From the downloaded examples I selected the sample notebook, which opened the Spark notebook UI. It had some examples, but when I run them I get the following error:
HTTPConnectionPool(host='localhost', port=8998): Max retries exceeded with url: /sessions (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fdd7c5f2e50>: Failed to establish a new connection: [Errno 111] Connection refused',))
Do I need to make any changes in addition to the one I made in the hue.ini file? Please guide me through this.

The error was rectified after I installed the Livy server and ran it using the following commands in a terminal:
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
$HOME/livy-server-0.2.0/bin/livy-server
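The connection-refused error in the question means nothing was listening on port 8998, which is where Hue expected Livy. A minimal shell sketch for checking this before opening the notebook is given here. The function names are hypothetical, and the sketch assumes curl is installed and that Livy listens on the host and port shown in the error message.

```shell
# Hypothetical helper: build the URL of Livy's session listing endpoint.
# Arguments: [host] [port]. The defaults match the error message above.
livy_sessions_url() {
  host="${1:-localhost}"
  port="${2:-8998}"
  printf 'http://%s:%s/sessions\n' "$host" "$port"
}

# Hypothetical helper: succeed only if the endpoint answers without an
# HTTP error within five seconds. Requires curl.
livy_is_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$(livy_sessions_url "$@")"
}
```

Running livy_is_up && echo "Livy reachable" after starting livy-server should print the message. If it prints nothing, the notebook will most likely fail with the same connection error.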

Shiny Server cannot use RODBC to connect to DB2 but RStudio can in a Docker Container

I am deploying a Shiny application in a Docker container to Bluemix, using the rocker/shiny Docker image (https://hub.docker.com/r/rocker/shiny/) as my starting point. I have installed unixODBC-dev, RODBC, the IBM Data Server Driver Package, the ibmdbR library for R, and all needed dependencies. My only problem is that the Shiny app fails to execute when I access it from a web browser. The error is:
Warning in odbcDriverConnect("DSN=BLUDB", :
[RODBC] ERROR: state 01000, code 0, message [unixODBC][Driver Manager]Can't open lib '/root/db2_cli_odbc_driver/dsdriver/odbc_cli_driver/linuxamd64/clidriver/lib/libdb2o.so' : file not found
Warning in odbcDriverConnect("DSN=BLUDB; :
ODBC connection failed
Error in idaInit(con) : con is not an open connection, please use idaConnect() to create an open connection to the data base.
Initially I had the same problem whenever I tried to connect to the database with isql or from RStudio. I ran ldd on that library file and found what was missing, which fixed connections from the command line and RStudio. However, Shiny Server still gives me the same error. Is there anything I am missing?

I ended up solving the problem myself. It turned out that the libraries were not accessible to shiny-server, which was running as a service. I moved the DB2 ODBC drivers to /usr/local/lib to make them accessible. I also ran ldd on the library mentioned in the error message and found that I had to install libxml2 as well. After that, I changed my odbcinst.ini file in /etc to reference the new location of the DB2 library, and now it all works. Hopefully anyone else trying to deploy Shiny apps that connect to a DB2 database will find this useful.
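The ldd step can be sketched as a small filter. The function name missing_libs is hypothetical. It assumes the usual ldd output format, in which an unresolved dependency appears as "name => not found".

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# shared libraries that the dynamic linker could not resolve.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
```

A typical call would be ldd /path/to/libdb2o.so | missing_libs, where the path is a placeholder for wherever the driver was moved. Running it as the same user as the Shiny Server service is the more telling check, since the original failure was about what that service could access.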
