JDBC error when querying Oracle 11.1 (ORA-00933) - r

I have searched high and low for answers so apologies if it has already been answered!
Using R I am trying to perform a lazy evaluation of Oracle 11.1 databases. I have used JDBC to facilitate the connection and I can confirm it works fine. I am also able to query tables using dbGetQuery, although the results are so large that I quickly run out of memory.
I have tried dbplyr/dplyr with tbl(con, "ORACLE_TABLE"), but I get the following error:
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT *
FROM "ORACLE_TABLE" AS "zzz39"
WHERE (0 = 1) (ORA-00933: SQL command not properly ended)
I have also tried using db_table <- tbl(con, in_schema('ORACLE_TABLE'))
This is happening with all databases I am connected to, despite being able to perform a regular dbGetQuery.
Full Code:
# Libraries
library(odbc)
library(DBI)
library(config)
library(RJDBC)
library(dplyr)
library(tidyr)
library(magrittr)
library(stringr)
library(xlsx)
library(RSQLite)
library(dbplyr)
# Oracle Connection
db <- config::get('db')
drv1 <- JDBC(driverClass=db$driverClass, classPath=db$classPath)
con_db <- dbConnect(drv1, db$connStr, db$orauser, db$orapw, trusted_connection = TRUE)
# Query (This one works but the data set is too large)
db_data <- dbSendQuery(con_db, "SELECT end_dte, reference, id_number FROM ORACLE_TABLE where end_dte > '01JAN2019'")
# Query (this one won't work)
oracle_table <- tbl(con_db, "ORACLE_TABLE")
Solved:
Updated RStudio and packages.
Follow this manual:
https://www.linkedin.com/pulse/connect-oracle-database-r-rjdbc-tianwei-zhang/
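Insert the following code after 'con' so that dbplyr applies its Oracle SQL translation to the JDBC connection (Oracle does not accept the AS keyword before a table alias, which is what triggers ORA-00933 in the generated query):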
sql_translate_env.JDBCConnection <- dbplyr:::sql_translate_env.Oracle
sql_select.JDBCConnection <- dbplyr:::sql_select.Oracle
sql_subquery.JDBCConnection <- dbplyr:::sql_subquery.Oracle

Related

dbGetQuery and dbReadTable failing to return a really large table DBI

I have a really large table (8M rows) that I need to import into R for some processing. The problem is that when I try to bring it into R using the DBI package, I get an error.
My code is below:
options(java.parameters = "-Xmx8048m")
psql.jdbc.driver <- "../postgresql-42.2.1.jar"
jdbc.url <- "jdbc:postgresql://server_url:port"
pgsql <- JDBC("org.postgresql.Driver", psql.jdbc.driver, "`")
con <- dbConnect(pgsql, jdbc.url, user="", password= '')
tbl <- dbGetQuery(con, "SELECT * FROM my_table;")
And the error I get is
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT * FROM my_table; (Ran out of memory retrieving query results.)
I understand that this is because the result set is too big, but I am not sure how to retrieve it in batches instead of all at once. I have tried dbSendQuery, dbReadTable and dbGetQuery; all of them give the same error.
Any help would be appreciated!
I got it to work by using the RPostgreSQL package instead of RJDBC.
It was able to run dbSendQuery, and I then called dbFetch in a loop to get the data in chunks of 10,000 rows.
main_tbl <- dbFetch(postgres_query, n = -1)  # didn't work, so tried in chunks
df <- data.frame()
while (!dbHasCompleted(postgres_query)) {
  chunk <- dbFetch(postgres_query, 10000)
  print(nrow(chunk))
  df <- rbind(df, chunk)
}
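The chunked fetch above can be sketched end to end as follows. This is a minimal illustration rather than the original poster's code: it assumes the DBI and RSQLite packages are installed, uses an in-memory SQLite table as a stand-in for the real PostgreSQL server, and uses a chunk size of 10 instead of 10,000. It also collects the chunks in a list and binds them once at the end, which avoids repeatedly copying a growing data frame inside the loop.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table", data.frame(id = 1:25, value = (1:25) / 2))

res <- dbSendQuery(con, "SELECT * FROM my_table")
chunks <- list()
while (!dbHasCompleted(res)) {
  # Fetch at most 10 rows per call (10,000 in the answer above).
  chunks[[length(chunks) + 1]] <- dbFetch(res, n = 10)
}
dbClearResult(res)
dbDisconnect(con)

# Bind once at the end instead of growing a data frame inside the loop.
df <- do.call(rbind, chunks)
```

If the full result still does not fit in memory, each chunk would have to be processed or written to disk inside the loop instead of being kept.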

dplyr copy_to large data

Hi, I would like to know:
what is the best way to copy large data into an SQL database using dplyr?
copy_to fails because the data is larger than memory. I am using this code:
library(DBI)
library(dplyr)
db <- dbConnect(
RPostgreSQL::PostgreSQL(),
dbname = "postgres",
host="localhost",
user="user",
password="password")
Data<-rio::import("Data/data.feather")
Data <- copy_to(db, Data,temporary=FALSE)
This results in error:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 0 bytes by 1305608043 more bytes.
)
Is there a way to do this without having to import the data first?
Is there a way to do this through sparklyr, since this is using only one core?

Unable to debug a function in R

While debugging an R script I came across a strange error: “Error in debug(fun, text, condition) : argument must be a closure”.
PC features: Win7/64 bit, Oracle client 12 (both 32 and 64bit), R (64bit)
Earlier, the script could be debugged without errors. I have searched the Internet for a clue but found no clear explanation of what the mistake is or how to fix it.
Running the code as a plain script rather than as a function produces no errors.
I would be very grateful for your ideas.
The source script, which connects to an Oracle DB and executes a simple query, is as follows:
download1<-function(){
if (require("dplyr")){
#install.packages("dplyr")
}
if (require("RODBC")){
#install.packages("RODBC")
}
library(RODBC)
library(dplyr)
# to establish connection with DB or schema
con <- odbcConnect("DB", uid="ANALYTICS", pwd="122334fgcx", rows_at_time = 500,believeNRows=FALSE)
# Check that connection is working (Optional)
odbcGetInfo(con)
# Query the database and put the results into the data frame "dataframe"
ptm <- proc.time()
x<-sqlQuery(con, "select * from my_table")
proc.time()-ptm
# to extract all field names to the separate vector
#field_names<-sqlQuery(con,"SELECT column_name FROM all_tab_cols WHERE table_name = 'MY_TABLE'")
close(con)
}
debug(download1(),text = "", condition = NULL)
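debug() expects the function itself, not the result of calling it. debug(download1()) first runs download1() and then tries to debug its return value, which is not a function. Use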
debug(download1)
download1()
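The difference between passing a function and passing the result of a call can be checked without a database. The sketch below uses a hypothetical stand-in function, download_demo, in place of download1; it only sets and clears the debug flag and never enters the browser.

```r
# Hypothetical stand-in for download1(); no database is needed.
download_demo <- function() {
  x <- 1 + 1
  x
}

# The function object is a closure; the value it returns is not.
is_fun_object <- is.function(download_demo)    # TRUE
is_fun_result <- is.function(download_demo())  # FALSE

debug(download_demo)                  # flag the function itself for debugging
flagged <- isdebugged(download_demo)  # TRUE
undebug(download_demo)                # clear the flag again
```

With the flag set, the next interactive call to the function would open the browser at its first line.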

R: How to use RJDBC to download blob data from oracle database?

Does anyone know of a way to download blob data from an Oracle database using RJDBC package?
When I do something like this:
library(RJDBC)
drv <- JDBC(driverClass=..., classPath=...)
conn <- dbConnect(drv, ...)
blobdata <- dbGetQuery(conn, "select blobfield from blobtable where id=1")
I get this message:
Error in .jcall(rp, "I", "fetch", stride) :
java.sql.SQLException: Ongeldig kolomtype.: getString not implemented for class oracle.jdbc.driver.T4CBlobAccessor
Well, the message is clear, but still I hope there is a way to download blobs. I read something about 'getBinary()' as a way of getting blob information. Can I find a solution in that direction?
The problem is that RJDBC tries to convert every SQL data type it reads to either a double or a String in Java. This usually works because the Oracle JDBC driver has routines that convert most data types to String (accessed through the getString() method of the java.sql.ResultSet class). For BLOB columns, however, the driver at some point stopped supporting getString(). RJDBC still calls it, which results in the error.
I tried digging into the guts of RJDBC to see if I can get it to call proper function for BLOB columns, and apparently the solution requires modification of fetch S4 method in this package and also the result-grabbing Java class within the package. I'll try to get this patch to package maintainers. Meanwhile, quick and dirty fix using rJava (assuming conn and q as in your example):
s <- .jcall(conn@jc, "Ljava/sql/Statement;", "createStatement")
r <- .jcall(s, "Ljava/sql/ResultSet;", "executeQuery", q, check=FALSE)
listraws <- list()
col_num <- 1L
i <- 1
while(.jcall(r, 'Z', 'next')){
listraws[[i]] <- .jcall(r, '[B', 'getBytes', col_num)
i <- i + 1
}
This retrieves list of raw vectors in R. The next steps depend on the nature of data - in my application these vectors represent PNG images and can be handled pretty much as file connections by png package.
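As one possible next step, the raw vectors can be written to files with base R. The sketch below is an illustration under stated assumptions: listraws is filled with two made-up raw vectors that merely start with the PNG signature, standing in for the output of the getBytes loop, and the file names are hypothetical.

```r
# Hypothetical stand-in for the list returned by the getBytes loop:
# two raw vectors that start with the 8-byte PNG signature.
png_sig <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
listraws <- list(c(png_sig, as.raw(1:4)), c(png_sig, as.raw(5:8)))

# Write each raw vector to its own file in the session's temp directory.
paths <- file.path(tempdir(), sprintf("blob_%d.bin", seq_along(listraws)))
for (i in seq_along(listraws)) {
  writeBin(listraws[[i]], paths[i])
}

# Read one file back to confirm the bytes survived the round trip.
first_back <- readBin(paths[1], what = "raw", n = file.size(paths[1]))
```

For real image blobs, the written files could then be opened by whatever package matches the data format.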
Done using R 3.1.3, RJDBC 0.2-5, Oracle 11-2 and OJDBC driver for JDK >= 1.6

How to read data from Cassandra with R?

I am using R 2.14.1 and Cassandra 1.2.11, I have a separate program which has written data to a single Cassandra table. I am failing to read them from R.
The Cassandra schema is defined like this:
create table chosen_samples (id bigint , temperature double, primary key(id))
I have first tried the RCassandra package (http://www.rforge.net/RCassandra/)
> # install.packages("RCassandra")
> library(RCassandra)
> rc <- RC.connect(host ="192.168.33.10", port = 9160L)
> RC.use(rc, "poc1_samples")
> cs <- RC.read.table(rc, c.family="chosen_samples")
The connection seems to succeed but the parsing of the table into data frame fails:
> cs
Error in data.frame(..dfd. = c("#\"ffffff", "#(<cc><cc><cc><cc><cc><cd>", :
duplicate row.names:
I have also tried using JDBC connector, as described here: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
> # install.packages("RJDBC")
> library(RJDBC)
> cassdrv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver", "/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar", "`")
But this one fails like this:
Error in .jfindClass(as.character(driverClass)[1]) : class not found
Even though the location to the java driver is correct
$ ls /Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
/Users/svend/dev/libs/cassandra-jdbc-1.2.5.jar
You have to download apache-cassandra-2.0.10-bin.tar.gz and cassandra-jdbc-1.2.5.jar and cassandra-all-1.1.0.jar.
There is no need to install Cassandra on your local machine; just put the cassandra-jdbc-1.2.5.jar and the cassandra-all-1.1.0.jar files in the lib directory of the unzipped apache-cassandra-2.0.10-bin.tar.gz. Then you can use
library(RJDBC)
drv <- JDBC("org.apache.cassandra.cql.jdbc.CassandraDriver",
list.files("D:/apache-cassandra-2.0.10/lib",
pattern="jar$",full.names=T))
That is working on my unix but not on my windows machine.
Hope that helps.
This question is old now, but since it's one of the top hits for R and Cassandra I thought I'd leave a simple solution here, as I found frustratingly little up-to-date support for what I thought would be a fairly common task.
Sparklyr makes this pretty easy to do from scratch now, as it exposes a java context so the Spark-Cassandra-Connector can be used directly. I've wrapped up the bindings in this simple package, crassy, but it's not necessary to use.
I mostly made it to demystify the config around how to make sparklyr load the connector, and as the syntax for selecting a subset of columns is a little unwieldy (assuming no Scala knowledge).
Column selection and partition filtering are supported. These were the only features I thought were necessary for general Cassandra use cases, given CQL can't be submitted directly to the cluster.
I've not found a solution to submitting more general CQL queries which doesn't involve writing custom scala, however there's an example of how this can work here.
Right, I found an (admittedly ugly) way, simply by calling python from R, parsing the NA manually and re-assigning the data-frames names in R, like this
# install.packages("rPython")
# (don't forget to "pip install cql")
library(rPython)
python.exec("import sys")
# adding libraries from virtualenv
python.exec("sys.path.append('/Users/svend/dev/pyVe/playground/lib/python2.7/site-packages/')")
python.exec("import cql")
python.exec("connection=cql.connect('192.168.33.10', cql_version='3.0.0')")
python.exec("cursor = connection.cursor()")
python.exec("cursor.execute('use poc1_samples')")
python.exec("cursor.execute('select * from chosen_samples' )")
# coding python None into NA (rPython seem to just return nothing )
python.exec("rep = lambda x : '__NA__' if x is None else x")
python.exec( "def getData(): return [rep(num) for line in cursor for num in line ]" )
data <- python.call("getData")
df <- as.data.frame(matrix(unlist(data), ncol=15, byrow=T))
names(df) <- c("temperature", "maxTemp", "minTemp",
"dewpoint", "elevation", "gust", "latitude", "longitude",
"maxwindspeed", "precipitation", "seelevelpressure", "visibility", "windspeed")
# and decoding NA's
parsena <- function (x) if (x=="__NA__") NA else x
df <- as.data.frame(lapply(df, parsena))
Does anybody have a better idea?
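One caveat about the decoding step above: parsena receives a whole column from lapply, and if (x == "__NA__") inspects only a single value, so it does not decode each element (recent versions of R raise an error when the condition has more than one element). A vectorised replacement is sketched below. The data frame is a hypothetical two-column stand-in for the one built from the Python result, with character columns assumed.

```r
# Hypothetical stand-in for the data frame built from the Python result.
df <- data.frame(
  temperature = c("21.5", "__NA__", "19.0"),
  gust        = c("__NA__", "7.2", "__NA__"),
  stringsAsFactors = FALSE
)

# Replace the sentinel in every column at once.
df[df == "__NA__"] <- NA

# Convert the remaining strings to numbers, keeping the data frame shape.
df[] <- lapply(df, as.numeric)
```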
I had the same error message when executing Rscript with RJDBC connection via batch file (R 3.2.4, Teradata driver).
Also, when run in RStudio it worked fine in the second run but not first.
What helped was explicitly calling:
library(rJava)
.jinit()
It is not enough to just download the driver; you also have to download the dependencies and put them on your Java classpath (MacOS: /Library/Java/Extensions), as stated on the project main page:
"Include the Cassandra JDBC dependencies in your classpath: download dependencies"
As for the RCassandra package, right now it's still too primitive compared to RJDBC.
