The dbWriteTable function in RPostgreSQL seems to ignore column names and tries to push data from R to PostgreSQL as-is. This is problematic when appending to existing tables, particularly if there are columns left unspecified in the R object that should be given default values.
RMySQL handles this case very gracefully by adding the column names to LOAD DATA LOCAL INFILE. How do I force RPostgreSQL to assign default values to unspecified columns in dbWriteTable when append=TRUE?
Here is an example:
CREATE TABLE test (
column_a varchar(255) not null default 'hello',
column_b integer not null
);
insert into test values (DEFAULT, 1);
Which yields the following table:
select * from test;
 column_a | column_b
----------+----------
 hello    |        1
(1 row)
I want to insert some new data to this table from R:
require('RPostgreSQL')
driver <- PostgreSQL()
con <- dbConnect(driver, host='localhost', dbname='development')
set.seed(42)
x <- data.frame(column_b=sample(1:100, 10))
dbWriteTable(con, name='test', value=x, append=TRUE, row.names=FALSE)
dbDisconnect(con)
But I get the following error:
Error in postgresqlgetResult(new.con) :
RS-DBI driver: (could not Retrieve the result : ERROR: missing data for
column "column_b"
CONTEXT: COPY test, line 1: "92"
)
This is because I have not specified the column_a field, so dbWriteTable is trying to write the data for column_b into column_a. I would like to force dbWriteTable to use the default for column_a and write the supplied values into column_b.
I should only get a failure if:
I fail to specify a column with no default value
I try to insert a column that doesn't exist in the table
I insert the wrong datatype into an existing column
I had exactly the same problem; this fixed it.
Check out the dbWriteTable2 function from package caroline.
The function also allows you to write a data frame without an id column into the database by passing add.id = TRUE, e.g.
dbWriteTable2(con_psql,"domains",data_domains,append=TRUE,overwrite=FALSE,row.names=FALSE,add.id=TRUE)
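If you would rather not add a dependency, a workable alternative is to let dbWriteTable fill a throw-away staging table and then run an INSERT that names only the columns you supplied, so PostgreSQL fills in the defaults. A minimal sketch, reusing x and the test table from the question (the staging table name test_staging is made up):
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), host='localhost', dbname='development')
# write the data frame to a throw-away staging table
dbWriteTable(con, name='test_staging', value=x, overwrite=TRUE, row.names=FALSE)
# insert only the columns present in x; column_a picks up its default
dbSendQuery(con, "insert into test (column_b) select column_b from test_staging")
dbSendQuery(con, "drop table test_staging")
dbDisconnect(con)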
Related
I created a table in Oracle like this:
Create table t1
(id_record NUMERIC GENERATED AS IDENTITY START WITH 500000 NOT NULL,
col1 numeric(2,0),
col2 varchar(10),
primary key(id_record))
where id_record is an identity column whose value is generated automatically when data is appended to the table.
I create a data.frame in R with two columns (table_in_R <- data.frame(col1, col2)). Let's skip the values of the data frame for simplicity.
When I append data from R to the Oracle DB using the following code
dbWriteTable(con, 't1', table_in_R,
append =T, row.names=F, overwrite = F)
where con is a connection object, the error ORA-00947 ("not enough values") arises and no data is appended.
When I slightly modify my code (append = FALSE, overwrite = TRUE):
dbWriteTable(con_dwh, 't1', table_in_R,
append =FALSE, row.names=F, overwrite = TRUE)
the data is written, but the table is recreated and the identity column id_record is dropped.
How can I append data to the Oracle table without dropping the identity column?
I'd never (based on this answer) recommend the one-step approach where dbWriteTable directly maintains the target table.
Instead I'd recommend a two-step approach, where the R part fills a temporary table (with overwrite = TRUE, i.e. DROP and CREATE):
df <- data.frame(col1 = c(as.integer(1),as.integer(0)), col2 = c('x',NA))
dbWriteTable(jdbcConnection,"TEMP", df, rownames=FALSE, overwrite = TRUE, append = FALSE)
In the second step you simply add the new rows to the target table using
insert into t1(col1,col2) select col1,col2 from temp;
You can run it directly against the database or from R:
res <- dbSendUpdate(jdbcConnection,"insert into t1(col1,col2) select col1,col2 from temp")
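If you do this regularly, the two steps can be wrapped in a small helper. append_via_temp is just an illustrative name, and the sketch assumes the data.frame columns all exist in the target table:
# sketch of a helper: stage the data.frame, then append it with explicit column names
append_via_temp <- function(conn, target, df, temp = "TEMP") {
  dbWriteTable(conn, temp, df, rownames = FALSE, overwrite = TRUE, append = FALSE)
  cols <- paste(names(df), collapse = ",")
  dbSendUpdate(conn, sprintf("insert into %s(%s) select %s from %s", target, cols, cols, temp))
}
append_via_temp(jdbcConnection, "t1", df)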
Note there is a workaround anyway:
Define the identity column as
id_record NUMERIC GENERATED BY DEFAULT ON NULL AS IDENTITY
This configuration of the identity column supplies the correct sequence value whenever NULL is inserted, but then you hit the above-linked problem of inserting NULL into a NUMBER column.
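With that change, the table from the question would be created along these lines (a sketch, issued from R via RJDBC and mirroring the original CREATE TABLE):
# sketch: recreate t1 with the BY DEFAULT ON NULL identity variant,
# keeping the other columns from the question's CREATE TABLE
dbSendUpdate(jdbcConnection, "
  create table t1 (
    id_record numeric generated by default on null as identity start with 500000 not null,
    col1 numeric(2,0),
    col2 varchar(10),
    primary key (id_record))")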
So the second trick is to use a character NA in the data.frame: add the identity column to your data.frame and fill it entirely with as.character(NA).
df <- data.frame(id_record =c(as.character(NA),as.character(NA) ), col1 = c(as.integer(1),as.integer(0)), col2 = c('x',NA))
dbWriteTable(jdbcConnection,"T1", df, rownames=FALSE, overwrite = F, append = T)
The test works fine, but as mentioned I'd recommend the two-step approach.
I am using RJDBC and dbWriteTable to write a data.table into an existing SQL Server database table.
Here is my sample data: mtcars
After I get a connection to the DB, I use dbWriteTable to create a DB table "mtcars".
dbWriteTable(conn, "mtcars", mtcars[1:5, ])
Next, I use append = T to insert two rows:
dbWriteTable(conn.pre.alg, "mtcars",mtcars[6:7, ], append = T)
Then I set an NA in one row:
mtcars[8, 2] = NA
I can insert the record without any problem.
dbWriteTable(conn.pre.alg, "mtcars",mtcars[8, ], append = T)
But when I set NA in two rows and try to insert them both:
mtcars[9:10, 2] = NA
dbWriteTable(conn.pre.alg, "mtcars",mtcars[9:10, ], append = T)
I get an error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate (The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 4 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision.)
I tried to set field.types, but I still get the same error.
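For reference, this is roughly what that attempt looks like. field.types support varies by DBI backend and RJDBC may simply ignore the argument, which would explain why the same error persists (cyl is column 2 of mtcars, the one set to NA above):
# roughly what "setting field.types" looks like; RJDBC may ignore the argument
dbWriteTable(conn.pre.alg, "mtcars", mtcars[9:10, ],
             append = TRUE,
             field.types = c(cyl = "float"))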
I am trying to use RSQLite to read in tables from my database. All the tables have column names containing ".".
For example, my test table has two columns: index and first.name.
How do I write a query that filters the test table on the first.name column?
My code is:
dbGetQuery(con,"SELECT * FROM test WHERE 'first.name' = 'Joe'")
and it gave me an error:
Error: no such column: first.name
The below should work, adding [] around the column name:
dbGetQuery(con,"SELECT * FROM test WHERE [first.name] = 'Joe'")
See the below thread:
How to write a column name with dot (".") in the SELECT clause?
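For completeness, a self-contained RSQLite illustration (in-memory database, made-up rows) showing that square brackets, double quotes, and backticks all work as identifier quoting in SQLite:
library(DBI)
library(RSQLite)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "test", data.frame(index = 1:2, first.name = c("Joe", "Ann"), check.names = FALSE))
# all three quoting styles keep the dot from being parsed as table.column
dbGetQuery(con, "SELECT * FROM test WHERE [first.name] = 'Joe'")
dbGetQuery(con, 'SELECT * FROM test WHERE "first.name" = \'Joe\'')
dbGetQuery(con, "SELECT * FROM test WHERE `first.name` = 'Joe'")
dbDisconnect(con)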
Similar symptoms as in create a table from a dataframe using RODBC.
First Attempt
I'm connecting to ORACLE successfully with readOnly=FALSE, and can retrieve data using sqlQuery. I created a test table RTest with a single NUMBER(16,8) field called Value. I can insert into the table from R using:
sqlQuery(channel,"insert into RTest (value) values(25)")
So I appear to have WRITE permissions.
Next Attempt
Following several internet examples, I created an R data.frame named test with a single row and a single column named Value, and attempted:
sqlSave(channel, test, "RTest", fast=FALSE, append=TRUE)
and with safer=TRUE. I receive an Oracle error:
ORA-00922: missing or invalid option, and an RODBC error: "Could not SQLExecDirect 'CREATE TABLE DSN=...."
So, sqlSave appears to be attempting to create the table. I've added and deleted options: rownames=FALSE, colnames=FALSE, safer=TRUE/FALSE, varTypes=list, and varInfo=list to no avail.
Final Attempt
Next I deleted the table in ORACLE and used sqlSave with safer=TRUE, which should have created the table based on the data.frame structure, but I receive the same error.
Eventually, I need to periodically read data files, process them in R, and then upload 100+ fields with millions of rows. Queries against an Oracle table will help with the memory requirements of the analyses and avoid reading the data into R from zipped files every time.
The help articles on inserting a data.frame into ORACLE via RODBC appear simple with default options. I thought this was a no-brainer, but this error keeps cropping up. Any clues?
New Information
Some progress:
1) Noted that the connection uses case="nochange", while ORACLE defaults table and field names to upper case; the sqlSave attempts used lower case. The error changed once the table name case was coordinated.
2) Discovered through trial and error that ORACLE required a two-part name, USERNAME.TableName.
3) R numeric columns are double; the Oracle fields were defined NUMERIC(16,8) [and later (34,17)], which should hold the necessary digits. R generated data type errors trying to convert binary double, so I switched Oracle to BINARY_DOUBLE and that error disappeared. A sketch combining these three points follows.
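For reference, a call combining those three points looks roughly like this (a sketch only; {UserName} stands in for the real schema and Value is the single NUMBER column from the first attempt):
# sketch: upper-case two-part table name plus an explicit Oracle type
# passed through sqlSave's varTypes argument
sqlSave(channel, test,
        tablename = "{UserName}.RTEST",
        append = TRUE,
        rownames = FALSE,
        varTypes = c(Value = "binary_double"))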
Response to Information Requests and New Problem Description
See the following session text. Note that the local server name was replaced with {DummyServerName}, and the user name with {UserName}.
**Create ORACLE table to match base R data frame USArrests**
## in Oracle create table to match data types in demo data.frame USArrests
SQL> CREATE TABLE "{UserName}"."RTEST"
2 ("Murder" BINARY_DOUBLE,
3 "Assault" DECIMAL,
4 "UrbanPop" DECIMAL,
5 "Rape" BINARY_DOUBLE,
6 "State" VARCHAR2(255 BYTE)
7 );
Table created.
SQL> desc RTEST
Name Null? Type
----------------------------------------- -------- ---------------
Murder BINARY_DOUBLE
Assault NUMBER(38)
UrbanPop NUMBER(38)
Rape BINARY_DOUBLE
State VARCHAR2(255)
**Connection Details**
## establish ORACLE connection in R
library(RODBC)
channel <- odbcConnect("{DummyServerName}", readOnly=FALSE, connection="TNS", case="nochange", believeNRows=FALSE)
odbcGetInfo(channel)
DBMS_Name DBMS_Ver Driver_ODBC_Ver Data_Source_Name Driver_Name
"Oracle" "11.02.0030" "03.52" "Default" "SQORA32.DLL"
Driver_Ver ODBC_Ver Server_Name
"11.02.0003" "03.80.0000" "{DummyServerName}"
The odbcGetInfo results indicate a successful connection.
**Insert manually into a table from R**
### manually insert value into table to show write capability
y <- sqlQuery(channel,"insert into {UserName}.RTEST values(25,26,27,28,'JUNK')")
y
character(0)
examine the table
x <- sqlQuery(channel,"select * from RTest")
x
Murder Assault UrbanPop Rape State
1 25 26 27 28 JUNK
So a manual insertion is successful. I can insert from R, but so far not from sqlSave.
Attempt to add value from data.frame using sqlSave
##use data frame from R examples
attach(USArrests)
The following objects are masked from USArrests (pos = 3):
Assault, Murder, Rape, UrbanPop
##capture the state names from the rownames to match ORACLE table structure
State<-rownames(USArrests)
Arrests2<-cbind(USArrests[,1:4],State)
typeof(Arrests2)
[1] "list"
class(Arrests2)
[1] "data.frame"
Now attempt to populate the ORACLE table using sqlSave...
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, safer=TRUE,rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
table ‘{UserName}.RTEST’ already exists
**OK - safer=TRUE should have appended, but see if forcing an append will work using append=TRUE instead**
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, append=TRUE, rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
unable to append to table ‘{UserName}.RTEST’
**Now let R create the table**
##In ORACLE...
SQL> drop table RTEST;
Table dropped.
back in R - safer=TRUE should create the table if it does not exist
apply sqlSave using ?sqlSave defaults, omit rownames to match table structure
sqlSave(channel,Arrests2,tablename="{UserName}.RTEST", fast=FALSE, safer=TRUE,rownames=FALSE)
Error in sqlSave(channel, Arrests2, tablename = "{UserName}.RTEST", fast = FALSE, :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.RTEST
(DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double,
DSN={DummyServerName}StateDSN={DummyServerName} varchar(255))'
**Follow the sqlSave example exactly - this should create table USArrests**
sqlSave(channel, USArrests, rownames = "state", addPK=TRUE)
Error in sqlSave(channel, USArrests, rownames = "state", addPK = TRUE) :
HY000 922 [Oracle][ODBC][Ora]ORA-00922: missing or invalid option
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE DSN={DummyServerName}USArrestsDSN={DummyServerName}
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
**Perhaps the username qualifier is needed in tablename - force the table name**
sqlSave(channel, USArrests, tablename="{UserName}.USARRESTS", rownames = "state", addPK=TRUE)
Error in sqlSave(channel, USArrests, tablename = "{UserName}.USARRESTS", :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.USARRESTS
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
**Finally, keep the username qualifier, force the table name, and use safer=TRUE, which should create the table**
sqlSave(channel, USArrests, tablename="{UserName}.USARRESTS", rownames = "state", addPK=TRUE, safer=TRUE)
Error in sqlSave(channel, USArrests, tablename = "{UserName}.USARRESTS", :
HY000 902 [Oracle][ODBC][Ora]ORA-00902: invalid datatype
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE {UserName}.USARRESTS
(DSN={DummyServerName}stateDSN={DummyServerName} varchar(255) NOT NULL PRIMARY KEY,
DSN={DummyServerName}MurderDSN={DummyServerName} binary_double,
DSN={DummyServerName}AssaultDSN={DummyServerName} decimal,
DSN={DummyServerName}UrbanPopDSN={DummyServerName} decimal,
DSN={DummyServerName}RapeDSN={DummyServerName} binary_double)'
Still confused. The closest success was with the existing table and safer=TRUE, but the insert failed. Examining the sqlSave code that generates the error message shows that the error occurs when sqlwrite returns -1, which I assume indicates failure. The failure of the sqlSave examples may indicate a local ORACLE issue.
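For reference, that inspection can be done directly from R; sqlSave is exported, and getAnywhere will locate the internal sqlwrite helper:
# print sqlSave's source and locate the internal sqlwrite function mentioned above
print(sqlSave)
getAnywhere("sqlwrite")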
I would like to use the RODBC package to partially overwrite a Microsoft Access table with a data frame. Rather than overwriting the entire table, I am looking for a way in which to remove only specific rows from that table -- and then to append my data frame to its end.
My method for appending the frame is pretty straightforward. I would use the following function:
sqlSave(ch, df, tablename = "accessTable", rownames = F, append = T)
The challenge is finding a function that will allow me to clear specific row numbers from the Access table ahead of time. The sqlDrop and sqlClear functions do not seem to get me there, since they will either delete or clear the entire table as a whole.
Any recommendation to achieve this task would be much appreciated!
Indeed, consider using sqlQuery to subset your Access table to the rows you want to keep, then rbind the result with your current data frame, and finally sqlSave, purposely overwriting the original Access table with append = FALSE.
# IMPORT QUERY RESULTS INTO DATAFRAME
keeprows <- sqlQuery(ch, "SELECT * FROM [accesstable] WHERE timedata >= somevalue")
# CONCATENATE df to END
finaldata <- rbind(keeprows, df)
# OVERWRITE ORIGINAL ACCESS TABLE
sqlSave(ch, finaldata, tablename = "accessTable", rownames = FALSE, append = FALSE)
Of course you can also do the reverse: delete rows from the table per the specified logic and then append (NOT overwrite) with sqlSave:
# ACTION QUERY TO RUN IN DATABASE
sqlQuery(ch, "DELETE FROM [accesstable] WHERE timedata <= somevalue")
# APPEND TO ACCESS TABLE
sqlSave(ch, df, tablename = "accessTable", rownames = FALSE, append = TRUE)
The key is finding the SQL logic that specifies the rows you intend to keep.
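For example, if the cutoff lives in an R variable, the DELETE criterion can be built on the fly before appending (a sketch; the cutoff value and the #...# Access date-literal syntax are illustrative and assume timedata is a date/time column):
# build the row-removal criterion from an R value, then append the data frame
cutoff <- "2016-01-01"
sqlQuery(ch, sprintf("DELETE FROM [accesstable] WHERE timedata <= #%s#", cutoff))
sqlSave(ch, df, tablename = "accessTable", rownames = FALSE, append = TRUE)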