Writing an R dataset to Oracle with very large fields (8000 char)

I need to write an R dataset to an Oracle database using the ROracle package (version 1.3-1), R 3.4.1, and Oracle OraClient 11g, and I am new to R.
The dataset includes variables of several different data types and lengths, including several character variables up to 8000 characters long.
Using dbWriteTable
dbWriteTable(conn, "OracleTableName", df)
I get this error:
Error in .oci.WriteTable(conn, name, value, row.names = row.names,
overwrite = overwrite, :
Error in .oci.GetQuery(con, stmt, data = value) :
ORA-01461: can bind a LONG value only for insert into a LONG column
or this
Error in .oci.GetQuery(con, stmt) :
ORA-02263: need to specify the datatype for this column
And this attempt, following the documentation's attribute approach:
drv <- dbDriver("Oracle")
conn <- dbConnect(
  drv,
  username = "username",
  password = "password",
  dbname = "dbname")
test.df1 <- subset(
  df, select = c(
    Var, Var2, Var3,
    Var4, Var5, Var6))
dat <- as.character(test.df1)
attr(dat, "ora.type") <- "clob"
dbWriteTable(conn, "test2", dat)
returns this
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbWriteTable’ for
signature ‘"OraConnection", "character", "character"’
From researching the error, it appears that the first error indicates that the larger fields, which should map to CLOB/BLOB columns, are not being recognized as LOBs by Oracle.
Documentation indicates that ROracle 1.3-1 should be able to handle larger data types. It suggests using an attribute to map NCHAR, CLOB, BLOB, and NCLOB columns correctly in dbWriteTable. I have not been able to follow this example successfully, as I keep getting the same error. Perhaps I just need a different example than the one provided in the documentation?
Initially I was using the RODBC package, but found that RODBC is known not to handle large data types (BLOB).
Any assistance or advice is appreciated.
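Per the ROracle documentation, the ora.type attribute is set on an individual data-frame column, not on the whole object; as.character(test.df1) above collapses the data frame into a plain character vector, which is why dbWriteTable can no longer dispatch on it. Here is a minimal sketch of the documented approach (the table name TEST2 and column names are made up, and the dbWriteTable call assumes a live connection, so it is left commented out):

```r
## Hypothetical data frame with a wide character column
df <- data.frame(id = 1:2,
                 big_text = c(strrep("x", 8000), strrep("y", 8000)),
                 stringsAsFactors = FALSE)

## Mark the wide column so dbWriteTable creates it as CLOB rather than
## trying (and failing) to bind it as a LONG/VARCHAR2
attr(df$big_text, "ora.type") <- "CLOB"

## With a live ROracle connection this would be:
## library(ROracle)
## drv  <- dbDriver("Oracle")
## conn <- dbConnect(drv, username = "username",
##                   password = "password", dbname = "dbname")
## dbWriteTable(conn, "TEST2", df)

attr(df$big_text, "ora.type")
```

The key design point is that the attribute travels with the column, so the rest of the data frame keeps its types.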

Related

rhandsontable writing NULL values to SQL database using dbWriteTable

I'm creating the following table using library(rhandsontable)
rhandsontable(df, className = "htCenter", stretchH = "all", columnSorting = TRUE) %>%
hot_col("Date", type = "date", dateFormat = "YYYY-MM-DD") %>%
hot_cols(format = '1')
The table is converted to a data frame with dfHot <- hot_to_r(input$hot) and then copied to a SQL Server database with
dbWriteTable(con, "TEST", dfHot, append = TRUE, overwrite = FALSE, row.names = FALSE)
The problem is the table can contain NULL values which appear as NA in R, but I need them to come across as NULL into SQL.
I'm getting this error when NA values are present in the table:
Warning: Error in .local: execute JDBC update query failed in
dbSendUpdate (The incoming tabular data stream (TDS) remote procedure
call (RPC) protocol stream is incorrect. Parameter 6 (""): The
supplied value is not a valid instance of data type float. Check the
source data for invalid values. An example of an invalid value is data
of numeric type with scale greater than precision.)
If no NA values are present, the table imports without any issues.
Instead of using dbWriteTable, I resolved this issue by building the INSERT ... VALUES statement myself. Details found here: https://www.pmg.com/blog/insert-r-data-frame-sql/
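The workaround in the linked post boils down to generating the INSERT statement by hand and emitting the literal NULL wherever R has NA. A sketch of that idea (the table name TEST and helper sql_value are made up for illustration; a real script would send the string with dbExecute() or dbGetQuery()):

```r
## Data frame with an NA that must become SQL NULL
df <- data.frame(id = c(1, 2), val = c(3.5, NA))

## Render one R value as a SQL literal: NA -> NULL, strings quoted
sql_value <- function(x) {
  if (is.na(x)) "NULL"
  else if (is.character(x)) sprintf("'%s'", gsub("'", "''", x))
  else format(x)
}

## Render each row as a parenthesised value list
rows <- vapply(seq_len(nrow(df)), function(i) {
  vals <- vapply(seq_along(df), function(j) sql_value(df[[j]][i]),
                 character(1))
  sprintf("(%s)", paste(vals, collapse = ", "))
}, character(1))

stmt <- sprintf("INSERT INTO TEST (id, val) VALUES %s",
                paste(rows, collapse = ", "))
stmt
## dbExecute(con, stmt)  # with a live connection
```

This sidesteps the driver's parameter binding entirely, which is what was choking on the NA values.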

RSQLite dbWriteTable not working on large data

Here is my code, where I am trying to write data from R to a SQLite database file.
library(DBI)
library(RSQLite)
library(readr)      # for read_rds()
library(dplyr)
library(data.table)
con <- dbConnect(RSQLite::SQLite(), "data.sqlite")
### Read the file you want to load to the SQLite database
data <- read_rds("data.rds")
dbSafeNames <- function(names) {
  names <- gsub('[^a-z0-9]+', '_', tolower(names))
  names <- make.names(names, unique = TRUE, allow_ = TRUE)
  names <- gsub('.', '_', names, fixed = TRUE)
  names
}
colnames(data) <- dbSafeNames(colnames(data))
### Load the dataset to the SQLite database
dbWriteTable(conn = con, name = "data", value = data, row.names = FALSE)
While writing the 80 GB dataset, I see the size of data.sqlite increase to about 45 GB; then it stops and throws the following errors.
Error in rsqlite_send_query(conn@ptr, statement) : disk I/O error
Error in rsqlite_send_query(conn@ptr, statement) :
no such savepoint: dbWriteTable
What is the fix, and what should I do? If this is specific to RSQLite, please suggest a more robust database back end, such as RMySQL or RPostgreSQL.
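One common mitigation for a load this size, regardless of back end, is to write the data in chunks with append = TRUE instead of one enormous dbWriteTable call, so a failure part-way through does not abort a single giant transaction. A sketch (the chunk size is arbitrary, and the dbWriteTable call assumes a live connection, so it is commented out):

```r
## Split row indices 1..n into chunks of at most chunk_size rows
chunk_indices <- function(n, chunk_size) {
  split(seq_len(n), ceiling(seq_len(n) / chunk_size))
}

## e.g. a 1-million-row data frame in 100k-row chunks
idx <- chunk_indices(1e6, 1e5)
length(idx)   # 10 chunks
## for (rows in idx) {
##   dbWriteTable(con, "data", data[rows, ], append = TRUE,
##                row.names = FALSE)
## }
```

Chunking also keeps the amount of data held in the driver's buffers at any one time bounded, which matters when the source object is tens of gigabytes.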

Using R, Error in rsqlite_send_query(conn@ptr, statement) : too many SQL variables

I use the sqldf package to create a SQLite database; my matrix has dimensions 2880 x 1951. When I write the table to the SQLite database, it fails with Error in rsqlite_send_query(conn@ptr, statement) : too many SQL variables. I read on the SQLite website that the number of bind variables is limited to 999 by default. Is there a simple way to increase this value?
Here is my syntax:
db <- dbConnect(SQLite(), dbname = "xxx.sqlite")
bunch_vis <- read.csv("xxx.csv")
dbWriteTable(db, name = "xxx", value = bunch_vis,
             row.names = FALSE)
and the output:
Error in rsqlite_send_query(conn@ptr, statement) : too many SQL variables
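The 999-variable cap (SQLITE_MAX_VARIABLE_NUMBER) is a compile-time limit, so with 1,951 columns even a single parameterised row exceeds it. One workaround that avoids recompiling SQLite is to store the matrix in long (row, col, value) form, which needs only three bind variables per row. A sketch with a tiny 2x3 matrix (table name xxx_long is a placeholder; the dbWriteTable call assumes a live connection):

```r
## Small stand-in for the 2880 x 1951 matrix
m <- matrix(1:6, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))

## Reshape to long form: one (row, col, value) record per cell
long <- data.frame(row   = rep(seq_len(nrow(m)), times = ncol(m)),
                   col   = rep(colnames(m), each = nrow(m)),
                   value = as.vector(m))
## dbWriteTable(db, "xxx_long", long, row.names = FALSE)
```

The wide shape can be recovered on the way out with a pivot, and SQLite indexes the long table efficiently.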

Errors using RODBC to Create/Populate ORACLE table

Similar symptoms to the question "create a table from a dataframe using RODBC".
First Attempt
I'm connecting to ORACLE successfully with readOnly=FALSE, and can retrieve data using sqlQuery. I created a test table RTest with a single NUMBER(16,8) field called Value. I can insert into the table from R using:
sqlQuery(channel,"insert into RTest (value) values(25)")
So I appear to have WRITE permissions.
Next Attempt
Following several internet examples, I created an R data.frame named test with a single row and a single column named Value, where I attempted:
sqlSave(channel, test, "RTest", fast=FALSE, append=TRUE)
and also with safer=TRUE. I receive an Oracle error:
ORA-00922 - missing or invalid option, and an RODBC error: "Could not SQLExecDirect 'CREATE TABLE DSN=....'"
So, sqlSave appears to be attempting to create the table. I've added and deleted options: rownames=FALSE, colnames=FALSE, safer=TRUE/FALSE, varTypes=list, and varInfo=list to no avail.
Final Attempt
Next I deleted the table in ORACLE and used sqlSave with safer=TRUE, which should have created the table based on the data.frame structure, but I received the same error.
Eventually, I need to periodically read data files, process them in R, and then upload 100+ fields with millions of rows. Queries against an Oracle table will help with memory requirements for analyses and avoid reading the data into R from zipped files every time.
The help articles on inserting a data.frame into ORACLE via RODBC appear simple with default options. I thought this was a no-brainer, but this error keeps cropping up. Any clues?
New Information
Some progress:
1) Noted that the connection uses case="nochange", and the Oracle default for table and field names is upper case, while my sqlSave attempts used lower case. The error changed when I matched the table-name case.
2) Discovered through trial and error that Oracle required a two-part name, USERNAME.TableName.
3) R numeric columns are doubles, while the Oracle fields were defined as NUMBER(16,8) (and later (34,17)), which should hold the necessary digits; R generated data-type errors trying to convert binary double, so I switched Oracle to BINARY_DOUBLE and that error disappeared.
Response to Information Requests and New Problem Description
See following session text. Note that the local server name was replaced with {DummyServerName}, and the user name with {UserName}.
**Create ORACLE table to match base R data frame USArrests**
## in Oracle create table to match data types in demo data.frame USArrests
SQL> CREATE TABLE "{UserName}"."RTEST"
2 ("Murder" BINARY_DOUBLE,
3 "Assault" DECIMAL,
4 "UrbanPop" DECIMAL,
5 "Rape" BINARY_DOUBLE,
6 "State" VARCHAR2(255 BYTE)
7 );
Table created.
SQL> desc RTEST
Name Null? Type
----------------------------------------- -------- ---------------
Murder BINARY_DOUBLE
Assault NUMBER(38)
UrbanPop NUMBER(38)
Rape BINARY_DOUBLE
State VARCHAR2(255)
**Connection Details**
## establish ORACLE connection in R
library(RODBC)
channel <- odbcConnect("{DummyServerName}", readOnly=FALSE, connection="TNS", case="nochange", believeNRows=FALSE)
odbcGetInfo(channel)
DBMS_Name DBMS_Ver Driver_ODBC_Ver Data_Source_Name Driver_Name
"Oracle" "11.02.0030" "03.52" "Default" "SQORA32.DLL"
Driver_Ver ODBC_Ver Server_Name
"11.02.0003" "03.80.0000" "{DummyServerName}"
The odbcGetInfo results indicate a successful connection.
**Insert manually into a table from R**
### manually insert value into table to show write capability
y <- sqlQuery(channel,"insert into {UserName}.RTEST values(25,26,27,28,'JUNK')")
y
character(0)
Examine the table:
x <- sqlQuery(channel,"select * from RTest")
x
Murder Assault UrbanPop Rape State
1 25 26 27 28 JUNK
So a manual insertion is successful. I can insert from R, but so far not from sqlSave.
**Attempt to add values from the data.frame using sqlSave**
##use data frame from R examples
attach(USArrests)
The following objects are masked from USArrests (pos = 3):
Assault, Murder, Rape, UrbanPop
##capture the state names from the rownames to match ORACLE table structure
State<-rownames(USArrests)
Arrests2<-cbind(USArrests[,1:4],State)
typeof(Arrests2)
[1] "list"
class(Arrests2)
[1] "data.frame"
Now attempt to populate the ORACLE table using sqlSave...
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, safer=TRUE,rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
table ‘{UserName}.RTEST’ already exists
**OK - safer=TRUE should have appended, but see if forcing an append will work using append=TRUE instead**
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, append=TRUE, rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
unable to append to table ‘{UserName}.RTEST’
**Now let R create the table**
##In ORACLE...
SQL> drop table RTEST;
Table dropped.
Back in R: safer=TRUE should create the table if it does not exist.
Apply sqlSave using the ?sqlSave defaults, omitting rownames to match the table structure:
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, safer=TRUE,rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.RTEST
(DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double,
DSN={DummyServerName}StateDSN={DummyServerName} varchar(255))'
**Follow the sqlSave example exactly - this should create table USArrests**
sqlSave(channel, USArrests, rownames = "state", addPK=TRUE)
Error in sqlSave(channel, USArrests, rownames = "state", addPK = TRUE) :
HY000 922 [Oracle][ODBC][Ora]ORA-00922: missing or invalid option
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE DSN={DummyServerName}USArrestsDSN={DummyServerName}
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
**Perhaps the uname qualifier is needed in tablename - force the table name**
sqlSave(channel, USArrests, tablename="{UserName}.USARRESTS", rownames = "state", addPK=TRUE)
Error in sqlSave(channel, USArrests, tablename = "{UserName}.USARRESTS", :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.USARRESTS
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
**Finally, keep the uname qualifier, force the table name, and use safer=TRUE, which should create the table**
sqlSave(channel, USArrests, tablename="{UserName}.USARRESTS", rownames = "state", addPK=TRUE, safer=TRUE)
Error in sqlSave(channel, USArrests, tablename = "{UserName}.USARRESTS", :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.USARRESTS
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
Still confused. The closest success was with an existing table and safer=TRUE, but the insert failed. Examining the sqlSave code that generates the error message shows that the error occurs when sqlwrite returns -1, which I assume indicates failure. The failure of the sqlSave examples may indicate a local ORACLE…
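Since the hand-written INSERT above succeeds where sqlSave fails, one pragmatic workaround is to generate the INSERT statements from the data frame and send each one with sqlQuery(). A sketch using the same USArrests example (the table name RTEST matches the session above; the sqlQuery loop assumes the live channel and is commented out):

```r
## Rebuild the data frame from the session above: move state names
## out of the rownames into a proper column
State <- rownames(USArrests)
Arrests2 <- cbind(USArrests[, 1:4], State)

## One INSERT per row; column names are quoted because the table was
## created with quoted mixed-case identifiers
insert_stmts <- sprintf(
  paste0('insert into RTEST ("Murder","Assault","UrbanPop","Rape","State") ',
         "values (%g, %g, %g, %g, '%s')"),
  Arrests2$Murder, Arrests2$Assault, Arrests2$UrbanPop, Arrests2$Rape,
  gsub("'", "''", Arrests2$State))

length(insert_stmts)   # one statement per state
## for (s in insert_stmts) sqlQuery(channel, s)
```

This is slower than a bulk path, but it bypasses whatever identifier mangling is corrupting the CREATE TABLE that sqlSave emits.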

RecordLinkage Package and RLBigDataLinkage-Class Objects

I am attempting to use R package RecordLinkage, and am using two articles by the package authors as usage guides, in addition to the package documentation.
I am using 2 large datasets (100k+ rows), which I hope to link, and so I am using those elements of the package which are built around S4 class RLBigDataLinkage.
I begin by running the following lines in R:
>library('RecordLinkage')
>data1 <- as.data.frame(#source)
>data2 <- as.data.frame(#source)
>rpairs <- RLBigDataLinkage(data1, data2, strcmp = 2:8, exclude = 9:10)
This works fine (though it takes some time), and writes the necessary .ff files to deal with the large data sets.
If I then try:
>rpairs <- epiWeights(rpairs)
Or:
>rpairs <- epiWeights(rpairs, e = 0.01, f = getFrequencies(rpairs))
Then when I run:
>summary(rpairs)
I get the error message:
Error in dbGetQuery(object@con, "select count(*) from data1") :
error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery': Error: no slot of name "con" for this object of class "RLBigDataLinkage"
If, on the other hand, I run:
>result <- epiClassify(rpairs, 0.5)
>getTable(result)
I get the error message:
Error in table.ff(object@data@pairs$is_match, object@prediction, useNA = "ifany") :
Only vmodes integer currently allowed - are you sure ... contains only factors or integers?
I'm clearly missing something about how these objects need to be handled. Does anyone have any experience with this package that sees my error? Thanks kindly.
When rpairs is of class RLBigDataLinkage, use print(rpairs) instead; it will give you the summary of rpairs.
