RJDBC dbGetQuery() error when creating an external table in Hive

I run into this problem: the statement only creates a table, yet the call fails because it cannot retrieve a JDBC result set:
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for
Calls: dbGetQuery ... dbSendQuery -> dbSendQuery -> .local -> .verify.JDBC.result
Execution halted
options( java.parameters = "-Xmx32g" )
library(rJava)
library(RJDBC)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/tmp/r_jars/hive-jdbc.jar")
for (jar in list.files("/tmp/r_jars", full.names = TRUE)) {
  .jaddClassPath(jar)
}
conn <- dbConnect(drv, "jdbc:hive2://10.40.51.75:10000/default", "myusername", "mypassword")
createSCOREDDL_query <- "CREATE EXTERNAL TABLE hiveschema.mytable (
myvariables
)
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mybucket/myschema/'"
dbGetQuery(conn, createSCOREDDL_query)
dbDisconnect(conn)

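Instead of dbGetQuery, can you try using dbSendUpdate? dbGetQuery expects the statement to return a result set, and DDL such as CREATE EXTERNAL TABLE returns none, which is why RJDBC reports "Unable to retrieve JDBC result set". dbSendUpdate executes the statement without asking for a result set. I was having similar issues and making this switch solved the problem.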
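If a script mixes queries and DDL, a small wrapper can pick the right call for each statement. This is a hypothetical sketch: returns_result_set() and run_sql() are not RJDBC functions, and the check only looks at the first keyword, so statements that begin with a comment or use other row-returning keywords would need extra handling.

```r
# Hypothetical helper (not part of RJDBC): guess from the first keyword
# whether a statement returns rows.
returns_result_set <- function(sql) {
  first_word <- regmatches(sql, regexpr("[A-Za-z]+", sql))
  if (length(first_word) == 0) return(FALSE)
  toupper(first_word) %in% c("SELECT", "WITH", "SHOW", "DESCRIBE", "EXPLAIN")
}

# Hypothetical wrapper: route row-returning statements to dbGetQuery()
# and everything else (DDL, INSERT, ...) to RJDBC's dbSendUpdate().
run_sql <- function(conn, sql) {
  if (returns_result_set(sql)) {
    DBI::dbGetQuery(conn, sql)      # rows expected: fetch them into a data frame
  } else {
    RJDBC::dbSendUpdate(conn, sql)  # no result set expected: just execute
  }
}
```

With the connection from the question, run_sql(conn, createSCOREDDL_query) would go through dbSendUpdate.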

I tried the following code as suggested by @KaIC and it worked:
dbSendUpdate(conn, "CREATE EXTERNAL TABLE hiveschema.mytable ( col_A string, col_B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE")
For multiple tables, you can collect the CREATE statements in a list or vector inside a function and use an apply() construct such as lapply() to run dbSendUpdate() on each of them.
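A minimal sketch of that approach, assuming hypothetical table names and column definitions (mytable_a, mytable_b and their columns are made up for illustration) and the same Hive DDL layout as above:

```r
# Hypothetical table names mapped to their column definitions
table_specs <- c(
  mytable_a = "col_A string, col_B string",
  mytable_b = "col_C string, col_D int"
)

# sprintf() is vectorized, so this yields one DDL statement per table
ddl_statements <- unname(sprintf(
  "CREATE EXTERNAL TABLE hiveschema.%s (%s) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
  names(table_specs), table_specs
))

# Execute every statement with dbSendUpdate(); invisible() only suppresses
# printing of the list that lapply() returns
create_tables <- function(conn, statements) {
  invisible(lapply(statements, function(q) RJDBC::dbSendUpdate(conn, q)))
}
```

Calling create_tables(conn, ddl_statements) runs the statements in order; a plain for loop would work just as well.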

Related

R dbWriteTable to Cloud Spanner requiring column names

I'm trying to insert data into a Cloud Spanner table using DBI's dbWriteTable in R; however, it is asking me to supply column names. My understanding is that as long as the data frame contains the same number of columns as the table requires, this should work. Here's my code and the error I'm facing (leaving out connection details):
Code:
write.to.spanner <- function(table, df){
  dbWriteTable(con, table, df, overwrite = FALSE, append = TRUE, row.names = FALSE)
}
req_df <- data.frame(req_id=123, req_name="test req")
write.to.spanner("dbi_test", req_df)
DDL for spanner table:
CREATE TABLE dbi_test (
req_id INT64,
req_name STRING(20),
) PRIMARY KEY(req_id);
Error:
Warning: Error in .local: execute JDBC update query failed in dbSendUpdate ([Simba][SpannerJDBCDriver](100605) There was an error while executing the DML query : INVALID_ARGUMENT: com.simba.cloudspanner.shaded.com.google.api.gax.rpc.InvalidArgumentException: com.simba.cloudspanner.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: INSERT must specify a column list [at 1:1]
INSERT INTO dbi_test VALUES(#var1,#var2)
I'm able to insert using dbSendUpdate but would prefer to use dbWriteTable and not write out insert statements.
This works fine:
write.to.spanner <- function(table, df){
  insert_qry <- paste0("INSERT INTO ", table, " (req_id, req_name) VALUES (", df[1,1], ",'", df[1,2], "');")
  dbSendUpdate(con, insert_qry)
}
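The error shows that the generated statement has the form INSERT INTO dbi_test VALUES(...) with no column list, which Spanner rejects. One way to avoid writing each INSERT by hand is to build a parameterized statement that names every column from the data frame and execute it once per row. This is a sketch under assumptions: build_insert_sql() and write_to_spanner() are hypothetical helpers, and it relies on RJDBC's dbSendUpdate() binding extra arguments to the ? placeholders; column types may still need converting to match INT64 and STRING.

```r
# Hypothetical helper: INSERT with an explicit column list and ? placeholders
build_insert_sql <- function(table, df) {
  cols <- paste(names(df), collapse = ", ")
  placeholders <- paste(rep("?", ncol(df)), collapse = ", ")
  sprintf("INSERT INTO %s (%s) VALUES (%s)", table, cols, placeholders)
}

# Hypothetical writer: one parameterized dbSendUpdate() call per row
write_to_spanner <- function(con, table, df) {
  sql <- build_insert_sql(table, df)
  for (i in seq_len(nrow(df))) {
    # Unnamed values are passed positionally and bound to the ? markers
    row_values <- unname(as.list(df[i, , drop = FALSE]))
    do.call(RJDBC::dbSendUpdate, c(list(con, sql), row_values))
  }
  invisible(nrow(df))
}
```

Using placeholders instead of pasting values into the SQL string also avoids breaking on values that contain single quotes.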

Read a View created from a procedure in SAP HANA from R

I have a schema in SAP HANA named "HYZ_ProcurementToSales" and a view "V_HYZ_P25_Market_Market_Orders" that is created from a procedure. I am trying to read the view from R server version 1.0.153. The code I am using is:
library(RJDBC)
# jdbcDriver is assumed to have been created earlier with JDBC(...)
conn <- dbConnect(jdbcDriver,
  "jdbc:sap://rdkom12.dhcp.pal.sap.corp:30015", "system",
  "manager")
res <- dbGetQuery(conn, "select * from
  HYZ_ProcurementToSales.V_HYZ_P25_Market_Market_Orders")
The error that I get is this:
"Unable to retrieve JDBC result set for
select * from HYZ_ProcurementToSales.V_HYZ_P25_Market_Market_Orders".
I suspect that something other than dbGetQuery is needed here, because it works fine if I simply run:
res <- dbGetQuery(conn,"select * from Tables")
The following works for me on HANA 1 SPS12 with a procedure that exposes a view called V_CURRENTUSERS:
library(RJDBC)
drv <- JDBC("com.sap.db.jdbc.Driver",
"C:\\Program Files\\SAP\\hdbclient\\ngdbc.jar",
identifier.quote='"')
conn <- dbConnect(drv, "jdbc:sap://<hanaserver>:3<instance>15/?", "**username**", "*pw*")
res <- dbSendQuery(conn = conn, statement = 'select * from v_currentusers;')
jusers <- dbFetch(res)
At this point, the whole result set is bound to jusers.
Once finished, release the result object (not the fetched data frame):
dbClearResult(res)
and finally close the connection:
dbDisconnect(conn)
Be aware that procedures with result views are deprecated and should no longer be used or developed. Instead, use table functions, as these can also be reused in information views and allow for dynamic parameter assignment.

R JDBC error "Unable to retrieve JDBC result set for insert into ..."

I am trying to write an R data.frame to a Netezza table. It has about 55K rows, and I have set the Java memory limit to 4 GB (options(java.parameters = "-Xmx4096m")).
Query:
insert into MY_TABLE_NAME select * from external 'csv_file_containing_data_frame.csv' using (delim ',' remotesource 'jdbc');
The above line of SQL works without any issues when I run it from a tool like DbVisualizer but I get the following error when I try to run it from RStudio.
R Code:
driver <- JDBC(driverClass="org.netezza.Driver", classPath = "drivers//nzjdbc.jar", "'")
connWrite <- dbConnect(driver, "jdbc:netezza://DB_SERVER:1234//DB_NAME", username, password)
str_insert_query <- paste(
  "insert into MY_TABLE select * from external '", OutputFile, "' using (delim ',' remotesource 'jdbc');", sep = ""
)
dbSendQuery(connWrite, str_insert_query[1])
dbDisconnect(connWrite)
Error Message:
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for insert into MY_TABLE select * from external 'C:/.../csv_file_containing_data_frame.csv' using (delim ',' remotesource 'jdbc'); (netezza.bad.query.result)
dbWriteTable works but is so slow that it cannot be used.
I also tried assigning the result of dbSendQuery() to a variable, but that didn't work either.
Any help will be greatly appreciated. Thank you!
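You need to use dbSendUpdate instead of dbSendQuery. An INSERT ... SELECT returns no result set, so dbSendQuery fails when it tries to retrieve one: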
dbSendUpdate(connWrite, str_insert_query[1])
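If the statement is built in several places, a small helper keeps the file path safely quoted. This is a hypothetical sketch (build_external_insert() and load_csv() are not part of RJDBC); it assumes the table name is trusted and only doubles single quotes inside the path so the SQL string literal stays valid.

```r
# Hypothetical helper: build the Netezza external-table INSERT for a CSV file
build_external_insert <- function(table, csv_path) {
  # Double any single quotes so the path remains one SQL string literal
  path <- gsub("'", "''", csv_path, fixed = TRUE)
  sprintf(
    "insert into %s select * from external '%s' using (delim ',' remotesource 'jdbc');",
    table, path
  )
}

# Execute with dbSendUpdate(), since the INSERT returns no result set
load_csv <- function(conn, table, csv_path) {
  RJDBC::dbSendUpdate(conn, build_external_insert(table, csv_path))
}
```

For example, load_csv(connWrite, "MY_TABLE", OutputFile) would run the same statement as above.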

dplyr & monetdb - appropriate syntax for querying schema.table?

In MonetDB I have set up a schema main, and my tables are created in this schema.
For example, the department table is main.department.
With dplyr I try to query the table:
mdb <- src_monetdb(dbname="model", user="monetdb", password="monetdb")
tbl(mdb, "department")
But I get
Error in .local(conn, statement, ...) :
Unable to execute statement 'PREPARE SELECT * FROM "department"'.
Server says 'SELECT: no such table 'department'' [#42S02].
I tried to use "main.department" and other similar combinations with no luck.
What is the appropriate syntax?
There is a somewhat hacky workaround for this: we can manually set the default schema for the connection. For example, I have a database testing containing a schema foo with a table called bar:
mdb <- src_monetdb("testing")
dbSendQuery(mdb$con, "SET SCHEMA foo");
t <- tbl(mdb, "bar")
The dbplyr package (the database backend for dplyr) has an in_schema() function for this case:
library(DBI)
library(MonetDB.R)
library(dplyr)
conn <- dbConnect(
MonetDB.R(),
host = "localhost",
dbname = "model",
user = "monetdb",
password = "monetdb",
timeout = 86400L
)
department = tbl(conn, dbplyr::in_schema("main", "department"))

How to select database column name with a dot in it in R?

The Vertica database table I'm using has a column called incident.date.
I can connect to it without problems:
install.packages("RJDBC",dep=TRUE)
library(RJDBC)
vDriver <- JDBC(driverClass="com.vertica.jdbc.Driver", classPath="C:/Vertica/vertica jar/vertica-jdbc-7.0.1-0.jar")
vertica <- dbConnect(vDriver, "jdbc:vertica://127.0.0.1:5433/dir", "name", "pass")
I can pull a regular query from it:
myframe = dbGetQuery(vertica, "Select * from output_servers")
but if I select a specific column that has a dot in its name, I get an error:
myframe = dbGetQuery(vertica, "Select product, incident, incident.date from output_servers")
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for Select product, incident, incident.date from output_servers ([Vertica][VJDBC](4566) ERROR: Relation "incident" does not exist)
I've tried square brackets, backticks, single and double quotes, and backslashes around the column name. I'm pretty sure it's simple, but what am I missing? Thanks!
I found it:
myframe = dbGetQuery(vertica, "Select product, incident, \"incident.date\" from output_servers")
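The quoting rule comes from Vertica, not R. Unquoted, incident.date is parsed as the column date of a relation named incident, hence the "Relation "incident" does not exist" error. Wrapping the name in double quotes makes it a single identifier, and inside an R string those quotes must be escaped as \".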
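To avoid escaping by hand, a tiny helper can wrap identifiers in double quotes, doubling any embedded double quote as standard SQL requires. quote_ident() is a hypothetical helper for illustration; DBI's dbQuoteIdentifier() does the equivalent job for a live connection.

```r
# Hypothetical helper: quote a SQL identifier with double quotes,
# doubling any double quote that appears inside the name
quote_ident <- function(x) {
  paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
}

# Build the query from the question without manual \" escaping
sql <- paste("Select product, incident,", quote_ident("incident.date"), "from output_servers")
```

The resulting string can then be passed to dbGetQuery(vertica, sql).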
