RStudio Connection to Spark on IBM Watson Studio

I'm trying to connect to Spark from an RStudio instance on IBM Watson Studio, but I'm getting the following error:
No encoding supplied: defaulting to UTF-8. Error in force(code) :
Failed during initialize_connection: attempt to use zero-length
variable name
Log: /tmp/Rtmpdee7QC/file1b33141066_spark.log
---- Output Log ----
hummingbird kernel
http://localhost:8081/apsrstudio/agent/v1/kernel/hb-connect ; Time
Diff :1.31352798938751
{"code": "import sparklyr._"} ; Time Diff :0.00552034378051758
Here's the code I'm using to create the connection:
kernels <- load_spark_kernels()
sc <- spark_connect(config = kernels[1])
Any help would be highly appreciated!

I was able to fix this issue! It turns out I was missing a project access token, which can be created manually on the Settings page of your project, as described here. From that documentation:
Create an access token on the Settings page of your project. Only project admins can create access tokens. The access token can have viewer or editor access permissions. Only editors can inject the token into a notebook.
After adding a project access token, I could connect to Spark using the code provided in the question with no problems.
kernels <- load_spark_kernels()
sc <- spark_connect(config = kernels[1])

If you are using RStudio within IBM Watson Studio on Cloud, you should use list_spark_kernels() to list the kernels:
kernels <- list_spark_kernels()
Then use spark_connect() to connect to one of them:
sc <- spark_connect(config = kernels[1])
One more thing: do not upgrade sparklyr, and if you already did, uninstall your copy. The sparklyr that RStudio on Watson Studio Cloud ships with is customized so that it can connect to the Spark service on IBM Cloud; removing your own version will load the original, customized one again.
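Putting that together, a minimal sketch of the full sequence (remove.packages() and installed.packages() are standard R; the kernel index is whatever position your kernel occupies in the list):
# Remove a manually upgraded sparklyr so the customized build that ships with RStudio loads again
if ("sparklyr" %in% rownames(installed.packages())) {
  remove.packages("sparklyr")
}
# List the Spark kernels available to this Watson Studio project
kernels <- list_spark_kernels()
print(kernels)
# Connect to the first kernel in the list
sc <- spark_connect(config = kernels[1])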
Hope it helps.

Related

Google Cloud VM - googleAuthR "no package called" error

I have an R script that I run in a Google Cloud environment; it pulls Google Analytics data and then stores it in Cloud Storage. I call the googleAuthR library in one line of my script, but I keep getting the same error. Has anyone had this problem before, or can anyone help?
I call the library like this:
library(googleAuthR)
library(googleCloudStorageR)
and the error text I get:
Error in library(googleAuthR) : there is no package called ‘googleAuthR’
Looks like your R installation cannot find the package; it is probably not installed where R is looking for it.
To fix it, open R from a terminal and execute:
install.packages("googleAuthR")
install.packages("googleCloudStorageR")
Remember that you will need to pass your Google credentials to work with Cloud Storage (for instance in a .json key file, with the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at it - see https://cloud.google.com/docs/authentication/getting-started).
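For example, a minimal sketch of the credential setup (the key-file path is hypothetical, and this assumes a googleCloudStorageR version whose gcs_auth() accepts a json_file argument):
# Point the environment variable at your service-account key (hypothetical path)
Sys.setenv(GOOGLE_APPLICATION_CREDENTIALS = "/home/user/my-service-account.json")
library(googleAuthR)
library(googleCloudStorageR)
# Authenticate explicitly against Cloud Storage with the same key file
gcs_auth(json_file = Sys.getenv("GOOGLE_APPLICATION_CREDENTIALS"))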

Publishing Tableau Dashboard with R scripts to Server

I have built a Dashboard that has multiple calculations using R Scripts and now I want to publish this to our internal Server. I get the following error message:
"This worksheet contains R scripts, which cannot be viewed on the target platform until the administrator configures an Rserve connection."
From what I understand, the administrator has to configure Rserve, but what about the rest of the packages I use? Should the administrator install those too, and should I inform them every time I start using a new package so they can install it?
You need to install, on the server, the packages your script will use. Then make sure Rserve is started there and connect your Tableau workbook to that server (using the Rserve connection settings in Tableau).
Tableau described the process pretty well:
http://kb.tableau.com/articles/knowledgebase/r-implementation-notes
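On the server side, a minimal sketch of what the administrator would run in R (the package list beyond Rserve is an example; substitute whatever your calculations load):
# Install Rserve plus every package the workbook's R scripts call
install.packages(c("Rserve", "forecast"))
library(Rserve)
# Start the Rserve daemon Tableau will connect to (default port 6311)
Rserve(args = "--no-save")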

Refresh the dataset in Azure machine learning

I have an experiment (exp) which is published as a web service (exp [Predictive Exp.]) in Azure Machine Learning Studio. The data used by this experiment was pushed from R using the AzureML package:
library(AzureML)
ws <- workspace(
id = 'xxxxxxxxx',
auth = 'xxxxxxxxxxx'
)
upload.dataset(data_for_azure, ws, "data_for_azure")
The above worked, but let's say I want to update the dataset (same schema, just more rows added).
I tried this, but it does not work:
delete.datasets(ws, "data_for_azure")
refresh(ws, what = c("everything", "data_for_azure", "exp", "exp [Predictive Exp.]"))
I get an error stating the following:
Error: AzureML returns error code:
HTTP status code : 409
Unable to delete dataset due to lingering dependants
I went through the documentation, and I know that a simple refresh is not possible (same name); the only alternative I see is to delete the web service and perform everything again.
Any solution would be greatly appreciated!
From the R documentation:
The AzureML API does not support uploads for replacing datasets with
new data by re-using a name. If you need to do this, first delete the
dataset from the AzureML Studio interface, then upload a new version.
Now, I think this is particular to the R SDK, as the Python SDK and the AzureML Studio UI do let you upload a new version of a dataset. I will check with the R team about this.
I would recommend uploading it as a new dataset with a new name, and then replacing the dataset in your experiment with this new dataset, as sketched below. Sorry this seems roundabout, but I think it is the easier option.
Alternatively, upload the new version using AzureML Studio: go to +NEW, Dataset, select your file, and select the checkbox that says this is an existing dataset. The filename should be the same.
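Following the new-name recommendation, a minimal sketch in R (the versioned name "data_for_azure_v2" is hypothetical):
library(AzureML)
ws <- workspace(
id = 'xxxxxxxxx',
auth = 'xxxxxxxxxxx'
)
# Upload the refreshed data under a new, versioned name instead of re-using the old one
upload.dataset(data_for_azure, ws, "data_for_azure_v2")
# Then point the experiment at "data_for_azure_v2" in the Studio UI before re-publishing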

How to configure target url for BPM 8.5.6 Standard?

I'm trying to install IBM BPM 8.5.6 in a Linux environment with an Oracle database.
The steps I followed to install were:
1. Installed IBM Installation Manager using the BPM PFS.
2. Installed WAS and BPM Process Center using Installation Manager.
3. Created 3 Oracle schemas: one each for the shared DB, the process server, and the performance server.
4. Configured the installation using the sample single-cluster Process Center file provided by IBM, via the BPMConfig -create option.
The installation was successful and I could see all the tables being created. Then I started it using the BPMConfig -start option. That too completed successfully.
I didn't change any ports, so it should be using all the default ports. Afterwards, when I try to access a console such as http://servername:9080/ProcessAdmin or http://servername:9080/ProcessCenter, I get a 404 error message: Error 404: com.ibm.ws.webcontainer.servlet.exception.NoTargetForURIException: No target servlet configured for uri: /ProcessAdmin
Do I have to do anything else? What is the starting point or default URL to reach Process Portal or the admin console? The WAS admin console is working fine.
Any help is appreciated. Thanks.
Since you probably used a custom installation, you have to initialize the Process Server data by running the following command (use the .sh variant on Linux):
bootstrapProcessServerData.bat -clusterName cluster_name

Connecting to Analysis Services from R or Nodejs

I am trying to connect to Analysis Services through either R or Node.js.
For R, I have found the following library:
https://github.com/overcoil/X4R
For Nodejs, I have found the following library:
https://github.com/rpbouman/xmla4js
The Analysis Services server is external; it is not on my local machine. Currently I am able to connect to it successfully from Excel using both Windows and basic authentication (username/password).
For accessing it through R or Node.js, the following link says I need to configure HTTP access using IIS. But since the server is not local, how can I get the msmdpump.dll file and configure it?
In this link https://www.linkedin.com/grp/post/77616-265568694, Sarah Lukens said at the end that I need to follow the steps described in https://msdn.microsoft.com/en-us/library/gg492140.aspx
Since I haven't worked with SSAS before, I don't have much clarity. Can anyone guide me in establishing the connection from R or Node.js to Analysis Services? I just want to submit MDX queries and get the results.
thanks,
r karthik.
It seems there is no way to connect to your SSAS server remotely without IIS: the server side has to expose msmdpump.dll over HTTP so that third-party APIs can reach your SSAS instance via the XMLA interface.
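Once such an endpoint exists, a minimal sketch in R of submitting an MDX query over XMLA with httr (the endpoint URL, credentials, catalog, and query are all hypothetical; this hand-rolls the SOAP envelope rather than using the X4R API):
library(httr)
# Hypothetical msmdpump.dll endpoint exposed through IIS
endpoint <- "http://ssas.example.com/olap/msmdpump.dll"
# Minimal XMLA Execute envelope carrying an MDX statement
body <- '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command>
        <Statement>SELECT {[Measures].[Sales]} ON COLUMNS FROM [MyCube]</Statement>
      </Command>
      <Properties>
        <PropertyList>
          <Catalog>MyCatalog</Catalog>
        </PropertyList>
      </Properties>
    </Execute>
  </soap:Body>
</soap:Envelope>'
# POST the envelope with basic authentication; the reply is XMLA (XML) to parse
resp <- POST(endpoint,
             authenticate("username", "password", type = "basic"),
             content_type("text/xml"),
             body = body)
cat(content(resp, as = "text", encoding = "UTF-8"))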
