How to create a table in SQL Server using RevoScaleR?

I'd like to be able to manage tables on SQL Server via my R script and the RevoScaleR library.
I have Machine Learning Services installed on a local instance and Microsoft R Client as an interpreter for R. I can get a connection to the server.
However, it seems I can't create a table on the server.
I've tried:
> predictionSql = RxSqlServerData(table = "PredictionLinReg", connectionString = connStr)
> predict_linReg = rxPredict(LinReg_model, input_data, outData = predictionSql, writeModelVars=TRUE)
...which returns:
Error in rxCompleteClusterJob(hpcServerJob, consoleOutput,
autoCleanup) : No results available - final job state: failed
Help would be appreciated. New to R.
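For reference, one common pattern is to create the table by writing a data frame to the RxSqlServerData target with rxDataStep, after forcing a local compute context so the job is not dispatched to a cluster. A minimal sketch, assuming connStr and input_data are defined as above (an illustration, not a confirmed fix for the rxPredict failure):
library(RevoScaleR)
rxSetComputeContext("local")  # run locally instead of a remote/cluster compute context
predictionSql <- RxSqlServerData(table = "PredictionLinReg", connectionString = connStr)
# writing any data frame to the RxSqlServerData target creates the table
rxDataStep(inData = input_data, outFile = predictionSql, overwrite = TRUE)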

Related

Connecting to Azure Databricks from R using jdbc and sparklyr

I'm trying to connect my on-premises R environment to an Azure Databricks backend using sparklyr and JDBC. I need to perform operations in Databricks and then collect the results locally. Some limitations:
No RStudio available, only a terminal
No databricks-connect. Only odbc or jdbc.
The configuration with odbc + dplyr works, but it seems too complicated, so I would like to use jdbc with sparklyr. RJDBC also works, but it would be great to have the tidyverse available for data manipulation; for that reason I would like to use sparklyr.
I have the Databricks jar file (DatabricksJDBC42.jar) in my current directory. I downloaded it from: https://www.databricks.com/spark/jdbc-drivers-download. This is what I have so far:
library(sparklyr)
config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "./DatabricksJDBC42.jar"
# something in the configuration should be wrong
sc <- spark_connect(master = "https://adb-xxxx.azuredatabricks.net/?o=xxxx",
                    method = "databricks",
                    config = config)
spark_read_jdbc(sc, "table",
                options = list(
                  url = "jdbc:databricks://adb-{URL}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{ORG_ID}/{CLUSTER_ID};AuthMech=3;UID=token;PWD={PERSONAL_ACCESS_TOKEN}",
                  dbtable = "table",
                  driver = "com.databricks.client.jdbc.Driver"))
This is the error:
Error: java.lang.IllegalArgumentException: invalid method toDF for object 17/org.apache.spark.sql.DataFrameReader fields 0 selected 0
My intuition is that the sc might not be working. Maybe there is a problem with the master parameter?
PS: this is the solution that works via RJDBC
databricks_jdbc <- function(address, port, organization, cluster, token) {
  location <- Sys.getenv("DATABRICKS_JAR")
  driver <- RJDBC::JDBC(driverClass = "com.databricks.client.jdbc.Driver",
                        classPath = location)
  url <- sprintf("jdbc:databricks://%s:%s/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/%s/%s;AuthMech=3;UID=token;PWD=%s",
                 address, port, organization, cluster, token)
  con <- DBI::dbConnect(driver, url)
  con
}
DATABRICKS_JAR is an environment variable with the path "./DatabricksJDBC42.jar"
Then I can use DBI::dbSendQuery(), etc.
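For illustration, a usage sketch of the helper above; the host, organization, cluster id, and token are placeholders, and my_table is a hypothetical table name:
con <- databricks_jdbc("adb-xxxx.azuredatabricks.net", "443", "{ORG_ID}", "{CLUSTER_ID}", "{PERSONAL_ACCESS_TOKEN}")
df <- DBI::dbGetQuery(con, "SELECT * FROM default.my_table LIMIT 10")  # my_table is hypothetical
DBI::dbDisconnect(con)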
Thanks,
I tried multiple configurations for master. So far I know that the JDBC string "jdbc:databricks:..." itself works: the JDBC connection succeeds, as shown in the code in the PS section.
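For example, one configuration sketch I have not verified: connect sparklyr to a local Spark master and use it purely as a JDBC bridge, so the known-good JDBC URL from the PS section does the actual connecting:
library(sparklyr)
config <- spark_config()
config$`sparklyr.shell.driver-class-path` <- "./DatabricksJDBC42.jar"
sc <- spark_connect(master = "local", config = config)  # local Spark, not the Databricks workspace URL
tbl <- spark_read_jdbc(sc, "my_table",
                       options = list(
                         url = "jdbc:databricks://adb-{URL}.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/{ORG_ID}/{CLUSTER_ID};AuthMech=3;UID=token;PWD={PERSONAL_ACCESS_TOKEN}",
                         dbtable = "my_table",
                         driver = "com.databricks.client.jdbc.Driver"))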
Configure RStudio with Azure Databricks: go to the cluster -> Apps -> set up Azure RStudio.
For more information, refer to this third-party link; it has detailed information about connecting Azure Databricks with R.
An alternative approach in Python:
Code:
Server_name = "vamsisql.database.windows.net"
Database = "<database_name>"
Port = "1433"
user_name = "<user_name>"
Password = "<password>"
jdbc_Url = "jdbc:sqlserver://{0}:{1};database={2}".format(Server_name, Port, Database)
conProp = {
    "user": user_name,
    "password": Password,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(url=jdbc_Url, table="<table_name>", properties=conProp)
display(df)

How to connect R Server (Microsoft Machine Learning Server) to a SQL Server? Can connect locally but not remotely

I'm trying to load a very large dataset into R that is stored in SQL. I am able to use local R (Microsoft R Client 3.3.3.0) to connect to SQL via the following:
library(RODBC)
db <- odbcDriverConnect('driver={SQL Server};server=servername\\servername;database=dbname;trusted_connection=true')
tb <- "select top 100 * from dbname"
df <- sqlQuery(db, tb)
And all that works fine. However, when I try to do this in a remote session (MMLS R version 3.4.3), it doesn't work:
library(mrsdeploy)
remoteLogin("http://some.url", session = TRUE)
REMOTE> #insert script from above
which returns the following error message:
Error in sqlQuery(db, tb) : first argument is not an open RODBC channel
The problem is with odbcDriverConnect(): running it locally returns, as expected, an object of class "RODBC" containing the connection details, but running it remotely returns a scalar of class "integer" (-1).
Am I doing something wrong or is it not possible to remotely connect to a SQL database while remotely connected to an R server?
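One way to narrow this down (a diagnostic sketch, not a confirmed answer): inspect what the remote machine can actually see, since the -1 return is accompanied by warnings carrying the driver error. Note also that trusted_connection=true authenticates as whatever account the remote R session runs under, which may not have access to the database:
REMOTE> library(RODBC)
REMOTE> db <- odbcDriverConnect('driver={SQL Server};server=servername\\servername;database=dbname;trusted_connection=true')
REMOTE> if (!inherits(db, "RODBC")) print(warnings())  # ODBC errors raised by the failed connect
REMOTE> odbcDataSources("all")                         # DSNs visible to the remote R server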

R Pool Order of Connections

I am using the R Pool package to connect to both SQL Server and Oracle databases within the same application. If I create the Oracle pool first I am unable to connect to SQL Server and get the following error:
Error in .verify.JDBC.result(jc, "Unable to connect JDBC to ", url) :
but if I create the SQL Server pool first I have no problems and can subsequently connect to Oracle.
wd <- getwd()
# note: file.path() would glue the ":" in as a path component; pass the jar
# paths as a character vector instead so both end up on the classpath
driver1 <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                classPath = c(file.path(wd, "mssql-jdbc-7.2.2.jre8.jar"),
                              file.path(wd, "jtds-1.3.1a.jar")))
connection1 <- dbPool(drv = driver1, url = **, user = **, password = **, minsize = 1, maxsize = 5)
driver2 <- JDBC("oracle.jdbc.driver.OracleDriver", classPath = file.path(wd, "ojdbc6.jar"))
connection2 <- dbPool(drv = driver2, url = **, user = **, password = **, minsize = 1, maxsize = 5)
Has anyone experienced something similar?
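One thing worth trying (an assumption on my part: the symptom looks like a JVM classpath initialization-order issue, since the first JDBC() call is what starts the JVM): initialize the JVM with all driver jars on the classpath before creating either pool:
library(rJava)
.jinit(classpath = c(file.path(wd, "mssql-jdbc-7.2.2.jre8.jar"),
                     file.path(wd, "jtds-1.3.1a.jar"),
                     file.path(wd, "ojdbc6.jar")))
# now create the pools in either order; both drivers are already on the JVM classpath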

Error When Running Data Function With "Force Server" On Run Location

I keep getting an error whenever I execute my data function with it forced to run on the Spotfire Server.
The script works fine in RStudio, and also in Spotfire when the Run Location configuration is set to "Force Local" or "Default".
When I use "Force Server", I get the following error message when executing a query:
Could not execute function call. TIBCO Spotfire Statistics Services returned an error:
'Error in sqlQuery(myconn, mappingQuery, errors = TRUE, rows_at_time = : first argument is not an open RODBC channel'.
at Spotfire.Dxp.Data.DataFunctions.Executors.RemoteFunctionClient.OnExecuting(FunctionClient funcClient)
at Spotfire.Dxp.Data.DataFunctions.Executors.AbstractFunctionClient.<RunFunction>d__0.MoveNext()
at Spotfire.Dxp.Data.DataFunctions.Executors.SPlusFunctionExecutor.<ExecuteFunction>d__0.MoveNext()
at Spotfire.Dxp.Data.DataFunctions.DataFunctionExecutorService.<ExecuteFunction>d__6.MoveNext()
Even with a straightforward script and query like the one below, the result is the same:
require(RODBC)
myconn <- odbcDriverConnect("Driver={SQL Server};Server=MY_SERVER;Database=MY_DATABASE;Trusted_Connection=True")
# myconn <- odbcDriverConnect("Driver={SQL Server};Server=MY_SERVER;Database=MY_DATABASE;UID=MY_USER;Pwd=MY_PASSWORD") ## Same result with trusted connection or user/password
query <- "SELECT * FROM MY_TABLE"
df <- sqlQuery(myconn, query)
print(df)
Has anyone ever seen this?
Thanks!
The error suggests that the RODBC package's odbcDriverConnect() function is not finding the ODBC drivers it needs on the server where TIBCO Spotfire Statistics Services (TSSS) is installed.
Try installing the required ODBC drivers on the machine where TSSS is installed.
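To confirm, a small diagnostic you could run as a data function under Force Server (a sketch; it assumes RODBC is installed on the TSSS machine) to list the data sources that machine can actually see:
library(RODBC)
dsns <- odbcDataSources("all")  # DSNs registered on the TSSS host, not on your desktop
result <- paste(names(dsns), dsns, sep = ": ", collapse = "; ")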

R- mongolite on OS X Sierra- No suitable servers found

I am trying to follow the "Getting started with MongoDB in R" page to get a database up and running. I have MongoDB on my PATH, so I can run mongod from the terminal and start an instance. However, when I start an instance in the background and run the following commands in R:
library(mongolite)
m <- mongo(collection = "diamonds") #diamonds is a built in dataset
the last statement throws an error:
Error: No suitable servers found (`serverSelectionTryOnce` set): [Failed to resolve 'localhost']
How do I enable it to find the connection I have open? Or is it something else? Thanks.
It could be that mongolite is looking in the wrong place for the local server. I solved this same problem by passing the localhost address explicitly in the connection call:
m <- mongo(collection = "diamonds", url = "mongodb://127.0.0.1")
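A quick end-to-end check with the explicit URL (assuming mongod is listening on the default port 27017; note that diamonds comes from the ggplot2 package, so it must be installed):
library(mongolite)
m <- mongo(collection = "diamonds", url = "mongodb://127.0.0.1:27017")
m$insert(ggplot2::diamonds)  # load the example data into the collection
m$count()                    # should return 53940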
