New-AzureRmHDInsightCluster "R Server" ClusterTier

I am trying to create an Azure R Server HDInsight cluster via PowerShell.
I am using the cmdlet New-AzureRmHDInsightCluster.
It throws the following error regardless of the -ClusterTier value (the only choices appear to be Standard and Premium):
New-AzureRmHDInsightCluster : BadRequest: Cluster type 'R Server' is
not supported by 'Standard' tier.
Is it possible to create an HDInsight R Server cluster via PowerShell, and if so, what are the correct settings for New-AzureRmHDInsightCluster?

You can find documentation for creating R Server clusters via PowerShell here.
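Based on the error text, the problem is the tier rather than the cmdlet: the R Server cluster type is only offered on the Premium tier. Below is a minimal sketch (untested) of what the call might look like; the resource names, location, and the $storageKey variable are placeholders, and the exact parameter set may vary with your AzureRM.HDInsight module version.

$params = @{
    ClusterName               = "myrcluster"              # hypothetical name
    ResourceGroupName         = "myResourceGroup"         # hypothetical resource group
    Location                  = "North Europe"
    ClusterSizeInNodes        = 4
    ClusterType               = "RServer"                 # no space, unlike the error text
    ClusterTier               = "Premium"                 # Standard rejects R Server
    OSType                    = "Linux"                   # R Server runs on Linux clusters
    HttpCredential            = (Get-Credential -Message "Cluster (HTTP) login")
    SshCredential             = (Get-Credential -Message "SSH login")
    DefaultStorageAccountName = "mystorage.blob.core.windows.net"
    DefaultStorageAccountKey  = $storageKey               # assumed to hold your storage key
    DefaultStorageContainer   = "myrcluster"
}
New-AzureRmHDInsightCluster @params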


Grakn Error; trying to load schema for "phone calls" example

I am trying to run the example Grakn migration "phone_calls" (using Python and JSON files).
Before getting there, I need to load the schema, but I am having trouble getting it loaded, following the steps here: https://dev.grakn.ai/docs/examples/phone-calls-schema
System:
- Mac OS 10.15
- grakn-core 1.8.3
- python 3.7.3
The Grakn server is started. I checked, and TCP port 48555 is open, so I don't think there is a firewall issue. The schema file is in the same folder (phone_calls) as the JSON data files used in the next step. I am using a virtual environment. The error is below:
(project1_env) (base) tiffanytoor1@MacBook-Pro-2 onco % grakn server start
Storage is already running
Grakn Core Server is already running
(project1_env) (base) tiffanytoor1@MacBook-Pro-2 onco % grakn console --keyspace phone_calls --file phone_calls/schema.gql
Unable to create connection to Grakn instance at localhost:48555
Cause: io.grpc.StatusRuntimeException
UNKNOWN: Could not reach any contact point, make sure you've provided valid addresses (showing first 1, use getErrors() for more: Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=5f59fd46): com.datastax.oss.driver.api.core.connection.ConnectionInitException: [JanusGraph Session|control|connecting...] init query OPTIONS: error writing ). Please check server logs for the stack trace.
I would appreciate any help! Thanks!
Never mind -- I found the solution, in case anyone else runs into a similar problem. The server configuration file needs to be edited: point the data directory to your project data files (here: the phone_calls data files) and change the server IP address to your own.
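For reference, the settings in question live in the server's properties file (conf/grakn.properties in Grakn Core 1.8); the key names below are from memory, so treat them as assumptions and check the file shipped with your version:

# conf/grakn.properties (key names may differ between versions)
# Host/IP the server binds to; set this to your own address:
server.host=127.0.0.1
# Port the console connects to:
grpc.port=48555
# Point this at your project's data directory:
data-dir=server/db/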

connecting to spark standalone cluster does not work within RStudio

I created a virtual machine running Ubuntu Server 16.04. I have already installed Spark and all dependencies and prerequisites. My Spark cluster is running on the VM, and all workers and the master can be started by start-all.sh. Now I am trying to submit SparkR jobs to this cluster from RStudio on my local computer. I specified the Spark context with master="spark://192.168.0.105:7077" to connect to the cluster, which is clearly running, since the master web UI responds at IP:8080. Is there any configuration required to reach the master from another device that is not yet part of the cluster?
The error in R is:
Error in handleErrors(returnStatus, conn) :
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
You could try using the Livy REST API interface:
https://livy.incubator.apache.org/
See sparklyr - Connect remote hadoop cluster
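With sparklyr, a Livy connection looks roughly like the sketch below (assuming a Livy server is running on the cluster and listening on its default port, 8998):

# Connect from the local machine through Livy instead of spark://
library(sparklyr)
sc <- spark_connect(
  master = "http://192.168.0.105:8998",  # Livy endpoint, not the Spark master URL
  method = "livy"
)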

How to submit jobs to spark master running locally

I am using R and Spark to run a simple example to test Spark.
I have a Spark master running locally, started with the following:
spark-class org.apache.spark.deploy.master.Master
I can see the status page at http://localhost:8080/
Code:
system("spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 --master local[*]")
suppressPackageStartupMessages(library(SparkR)) # Load the library
sc <- sparkR.session(master = "local[*]")
df <- as.DataFrame(faithful)
head(df)
Now this runs fine when I do the following (the code is saved as 'sparkcode'):
Rscript sparkcode.R
Problem:
But what happens is that a new Spark instance is created; I want R to use the existing master instance instead (it should then show up as a completed job at http://localhost:8080/#completed-app).
P.S. Using Mac OS X, Spark 2.1.0 and R 3.3.2
A number of things:
If you use a standalone cluster, use the correct URL, which should be sparkR.session(master = "spark://hostname:port"). Both hostname and port depend on the configuration, but the standard port is 7077, and the hostname should default to the machine's hostname. This is the main problem (see the sketch after this list).
Avoid using spark-class directly. That is what the $SPARK_HOME/sbin/ scripts are for (like start-master.sh). They are not crucial but handle small and tedious tasks for you.
The standalone master is only a resource manager. You have to start worker nodes as well (start-slave.sh).
It is usually better to use bin/spark-submit, though it shouldn't matter much here.
spark-csv is no longer necessary in Spark 2.x, and even if it were, Spark 2.1 uses Scala 2.11 by default. Not to mention that 1.0.3 is extremely old (contemporary with Spark 1.3 or so).
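Putting the list together, a minimal sketch of the intended workflow (the master URL is a placeholder; substitute your machine's hostname):

# Start the standalone master and at least one worker first:
#   $SPARK_HOME/sbin/start-master.sh
#   $SPARK_HOME/sbin/start-slave.sh spark://$(hostname):7077
suppressPackageStartupMessages(library(SparkR))
# Connect to the existing standalone master instead of local[*]:
sc <- sparkR.session(master = "spark://localhost:7077")
df <- as.DataFrame(faithful)   # the example data frame from the question
head(df)
sparkR.session.stop()          # ends the application; it then shows as completed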

External Scripting and R (Kognitio)

I have created the R script environment (using this command to create it: create script environment RSCRIPT command '/usr/local/R/bin/Rscript --vanilla --slave') and tried running one R script, but it fails with the error message below.
ERROR: RS 10 S 332659 R 31A004F LO:Script stderr: external script vfork child: No such file or directory
Is it because of the lines below, which I am using in the script?
mydata <- read.csv(file=file("stdin"), header=TRUE)
if (nrow(mydata) > 0){
I am not sure what it is expecting.
I have one more question:
1) Do we need to install R on our Unix box, or does the Kognitio package include it?
I suspect the problem here is that you have not installed the R environment on ALL the database nodes in your system - it must be installed on every DB node involved in processing (as explained in chapter 10 of the Kognitio Guide, which you can download from http://www.kognitio.com/forums/viewtopic.php?t=3), or you will see errors like "external script vfork child: No such file or directory".
You would normally use a remote deployment tool (e.g. HP's RDP) to ensure the installation is identical on all DB nodes. Alternatively, you can use the Kognitio wxsync tool to synchronise files across nodes.
Section 10.6 of the Kognitio Guide also explains how to constrain which DB nodes are involved in processing - this is appropriate if your script environment should not run on all nodes for some reason (e.g. it has an expensive per-node/per-core licence). That does not seem applicable to R, though.

R function to connect to wireless

I am running some code for a client. Is there an R function or command to connect the computer to an available wireless network and enter the passcode automatically?
Probably not.
However, you can run any OS command with system, so if you have a shell command foo which does what you want, you can just do system("foo") in R.
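For example, on a Linux machine with NetworkManager installed, something like the sketch below might work (nmcli is assumed to be available; the SSID and passcode are placeholders, and macOS or Windows would need networksetup or netsh instead):

# Hypothetical example: join a Wi-Fi network via NetworkManager's nmcli
ssid     <- "MyNetwork"   # placeholder network name
passcode <- "secret123"   # placeholder passcode
status <- system(paste("nmcli dev wifi connect", shQuote(ssid),
                       "password", shQuote(passcode)))
if (status != 0) warning("Wi-Fi connection command failed")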
Most connections to the "outside world" from R are handled through the system-level 'libcurl' library. There is a package for R called RCurl. The authentication system is described in the various pages of:
help(package="RCurl") # brings up the index page.
help(getURL, package="RCurl") # has examples of authentication to a server with R code.
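For instance, a basic-authentication request with RCurl might look like this (the URL and credentials are placeholders):

library(RCurl)
# Fetch a page using HTTP basic authentication (hypothetical endpoint)
txt <- getURL("https://example.com/data.csv",
              userpwd = "user:passcode",
              httpauth = 1L)   # 1L = basic authentication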
You should also read:
?connections
