Inserting Data into an Oracle Table in R

I am very new to R, so please forgive any obvious or naive errors. I need to insert multiple rows of data from R into an Oracle database table.
Make the data frame (I have made the RJDBC connection earlier in the script):
df <- data.frame("field_1" = 1:2, "field_2" = c("f","k"), "field_3"= c("j","t"))
This code runs without error, but inserts only the first row into the table:
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
dbSendUpdate(con, insert)
This code runs:
insert <- sprintf("into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
insert_all <- c("insert all", insert, "select * from dual")
dbSendUpdate(con, insert_all)
But gives me this error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate (ORA-00905: missing keyword
Both of the queries work on their own in Oracle. WHAT am I doing wrong?
Thank you!

Multiple SQL statements are not supported in dbGetQuery, dbSendQuery, or dbSendUpdate calls; you need to iterate through them, one call per statement. That is why only the first statement is processed. To resolve this, extend the anonymous function inside apply to call dbSendUpdate:
apply(df, 1, function(i) {
# BUILD SQL STATEMENT
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
paste0("'", i, collapse="',"))
# RUN QUERY
dbSendUpdate(con, insert)
})
However, RJDBC extends the DBI standard by supporting parameterization in dbSendUpdate, as mentioned in the RForge docs, which allows bulk inserts with no need to iteratively concatenate strings.
dbSendUpdate(conn, statement, ...) This function is analogous to
dbSendQuery, but works with DBML statements and thus doesn't return a
result set. It is more efficient than dbSendQuery. In addition, as of
RJDBC 0.2-9 it supports vectors in prepared statements which allows
bulk-inserts.
# ALL CHARACTER DATAFRAME
df <- data.frame(field_1=as.character(1:2), field_2=c("f","k"), field_3=c("j","t"),
stringsAsFactors=FALSE)
# PREPARED STATEMENT
sql <- "insert into temp_r_test_u_suck values (?, ?, ?)"
# RUN QUERY
dbSendUpdate(con, sql, df$field_1, df$field_2, df$field_3)
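To confirm the rows landed, a quick sanity check against the same connection (using the table name from the question):
dbGetQuery(con, "select * from temp_r_test_u_suck")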

Related

R - Concatenation of string and variable using the RODBC library

I would like to run some queries with RStudio using the RODBC library. Normally, code like this works fine:
query_6 <- sqlQuery(con, "Select * from my_table where condition = more_than_sth")
I would prefer to have some variable, defined by me beforehand, standing in for more_than_sth. Let's say it is x. Is there any method that would let me put this variable into the query string? Should I use some kind of paste, perhaps building the string beforehand, or put it in directly?
Regards,
Rafał
The concatenation function in R is paste; it automatically appends a whitespace between each object, which you can remove by using paste(..., sep = "") or paste0().
more_than_sth <- "x"
query_6 <- sqlQuery(con, paste0("Select * from my_table where condition ='", more_than_sth, "'"))
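If you prefer to keep the query template in one piece, sprintf is a common alternative to paste0 here (a minimal sketch, reusing the same connection and table names):
more_than_sth <- "x"
query_6 <- sqlQuery(con, sprintf("Select * from my_table where condition = '%s'", more_than_sth))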

Insert into table RODBC

I would like to insert contents of a dataframe into an existing table in an oracle database.
sqlSave(conn, df[1:3, c(which(names(df) == "x"), which(names(df) == "y"), which(names(df) == "z"))], tablename = "A_X", append = TRUE)
I get the error Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
missing columns in »data«, because the chosen columns do not match those of the Oracle table.
The Oracle table has more columns than the data frame, so the non-matching columns should be filled with NULL. How can I implement this in R? I would like to include the contents of the df in the SQL code at the bottom as follows:
INSERT INTO A_X
VALUES (df[1:3, c(which(names(df) == "x"), which(names(df) == "y"), which(names(df) == "z"))], AUTO_ID, NULL, NULL);
In Oracle SQL this would be possible with the following code:
INSERT INTO A_X
VALUES (300, 'text', 'text', AUTO_ID, NULL, NULL);
The second problem is to generate the ID AUTO_ID automatically. I have version Oracle DB 11.2.0.3 and it's not possible to update to version 12c currently.
Here is an alternate approach.
Instead of just using sqlSave, use a combination of sqlSave & sqlQuery from RODBC, or dbExecute from DBI.
You can then write the df as either a permanent or temp table and wrap the sqlQuery around an INSERT or UPDATE statement that operates on your target table and the temp table. This is not very elegant, but it should give you a scalable solution even if the schema changes in the future.
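A minimal sketch of that staging-table idea, assuming an RODBC connection con, a hypothetical staging table TEMP_A_X, and an Oracle sequence SEQ_A_X for the auto ID (adjust table, column, and sequence names to your schema):
library(RODBC)
# 1) Stage the relevant columns of the data frame as their own table
sqlSave(con, df[1:3, c("x", "y", "z")], tablename = "TEMP_A_X", rownames = FALSE)
# 2) Insert from the staging table into the target, drawing the ID from the
#    assumed sequence and filling the remaining columns with NULL
sqlQuery(con, "INSERT INTO A_X
               SELECT x, y, z, SEQ_A_X.NEXTVAL, NULL, NULL
               FROM TEMP_A_X")
# 3) Drop the staging table when done
sqlQuery(con, "DROP TABLE TEMP_A_X")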

How can we bulk insert data into SQL Server without creating a text file, using the RODBC package?

This question is an extension of How to quickly export data from R to SQL Server. Currently I am using the following code:
# DB Handle for config file #
dbhandle <- odbcDriverConnect()
# save the data in the table finally
sqlSave(dbhandle, bp, "FACT_OP", append=TRUE, rownames=FALSE, verbose = verbose, fast = TRUE)
# varTypes <- c(Date="datetime", QueryDate = "datetime")
# sqlSave(dbhandle, bp, "FACT_OP", rownames=FALSE,verbose = TRUE, fast = TRUE, varTypes=varTypes)
# DB handle close
odbcClose(dbhandle)
I have also tried the following approach, which works beautifully, and I have gained significant speed as well.
toSQL = data.frame(...);
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel,"BULK
INSERT Yada.dbo.yada
FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\\n'
)");
But my issue is that I can NOT keep my data at rest between transactions (writing data to a file is not an option because of data security), so I am looking for a solution to bulk insert directly from memory or a cache. Thanks for the help.
Good question - also useful in instances where the BULK INSERT permissions cannot be set up for whatever reason.
I threw together this poor man's solution a while back when I had enough data that sqlSave was too slow, but not enough to justify setting up BULK INSERT; it does not require any data to be written to a file. The primary reason that sqlSave and parameterized queries are so slow for inserting data is that each row is inserted with a new INSERT statement. Having R write the INSERT statement manually bypasses this, as in my example below:
library(RODBC)
channel <- ...
dataTable <- ...relevant data...
numberOfThousands <- floor(nrow(dataTable)/1000)
extra <- nrow(dataTable)%%1000
thousandInsertQuery <- function(channel,dat,range){
sqlQuery(channel,paste0("INSERT INTO Database.dbo.Responses (IDNum,State,Answer)
VALUES "
,paste0(
sapply(range,function(k) {
paste0("(",dat$IDNum[k],",'",
dat$State[k],"','",
gsub("'","''",dat$Answer[k],fixed=TRUE),"')")
})
,collapse=",")))
}
if(numberOfThousands)
for(n in 1:numberOfThousands)
{
thousandInsertQuery(channel, dataTable, (1000*(n-1)+1):(1000*n))
}
if(extra)
thousandInsertQuery(channel, dataTable, (1000*numberOfThousands+1):(1000*numberOfThousands+extra))
SQL Server's INSERT statements written out with VALUES will only accept up to 1000 rows at a time, so this code breaks the data up into chunks (much more efficient than one row at a time).
The thousandInsertQuery function will obviously have to be customized to handle whatever columns your data frame has - note also that there are single quotes around the character/factor columns and a gsub to handle any single quotes that might be in the character column. Other than this there are no safeguards against SQL injection attacks.
What about using the DBI::dbWriteTable() function?
Example below (I am connecting my R code to an AWS RDS instance of MS SQL Express):
library(DBI)
library(RJDBC)
library(tidyverse)
# Specify where your driver lives
drv <- JDBC(
"com.microsoft.sqlserver.jdbc.SQLServerDriver",
"c:/R/SQL/sqljdbc42.jar")
# Connect to AWS RDS instance
conn <- drv %>%
dbConnect(
host = "jdbc:sqlserver://xxx.ccgqenhjdi18.ap-southeast-2.rds.amazonaws.com",
user = "xxx",
password = "********",
port = 1433,
dbname= "qlik")
if(0) { # check what the conn object has access to
queryResults <- conn %>%
dbGetQuery("select * from information_schema.tables")
}
# Create test data
example_data <- data.frame(animal=c("dog", "cat", "sea cucumber", "sea urchin"),
feel=c("furry", "furry", "squishy", "spiny"),
weight=c(45, 8, 1.1, 0.8))
# Works in 20ms in my case
system.time(
conn %>% dbWriteTable(
"qlik.export.test",
example_data
)
)
# Let us see if we see the exported results
conn %>% dbGetQuery("select * FROM qlik.export.test")
# Let's clean the mess and force-close connection at the end of the process
conn %>% dbDisconnect()
It works pretty fast for small amounts of data transferred and seems rather elegant if you want a data.frame -> SQL table solution.
Enjoy!
Building on @jpd527's solution, which I found really worth digging into...
require(RODBC)
channel <- #connection parameters
dbPath <- # path to your table, database.table
data <- # the DF you have prepared for insertion, /!\ beware of column names and values types...
# Function to insert 1000 rows of data in one sqlQuery call, coming from
# any DF and into any database.table
insert1000Rows <- function(channel, dbPath, data, range){
# Defines columns names for the database.table
columns <- paste(names(data), collapse = ", ")
# Initialize a string which will incorporate all 1000 rows of values
values <- ""
# Not very elegant, but appropriately builds the values (a, b, c...), (d, e, f...) into a string
for (i in range) {
for (j in 1:ncol(data)) {
# First column
if (j == 1) {
if (i == min(range)) {
# First row, only "("
values <- paste0(values, "(")
} else {
# Next rows, ",("
values <- paste0(values, ",(")
}
}
# Value Handling
values <- paste0(
values
# Handling NA values you want to insert as NULL values
, ifelse(is.na(data[i, j])
, "null"
# Handling numeric values you want to insert as INT
, ifelse(is.numeric(data[i, j])
, data[i, j]
# Else handling as character to insert as VARCHAR
, paste0("'", data[i, j], "'")
)
)
)
# Separator for columns
if (j == ncol(data)) {
# Last column, close parenthesis
values <- paste0(values, ")")
} else {
# Other columns, add comma
values <- paste0(values, ",")
}
}
}
# Once the string is built, insert it into SQL Server
sqlQuery(channel,paste0("insert into ", dbPath, " (", columns, ") values ", values))
}
This insert1000Rows function is used in a loop in the next function, sqlInsertAll, for which you simply define which DF you want to insert into which database.table.
# Main function which uses the insert1000rows function in a loop
sqlInsertAll <- function(channel, dbPath, data) {
numberOfThousands <- floor(nrow(data) / 1000)
extra <- nrow(data) %% 1000
if (numberOfThousands) {
for(n in 1:numberOfThousands) {
insert1000Rows(channel, dbPath, data, (1000 * (n - 1) + 1):(1000 * n))
print(paste0(n, "/", numberOfThousands))
}
}
if (extra) {
insert1000Rows(channel, dbPath, data, (1000 * numberOfThousands + 1):(1000 * numberOfThousands + extra))
}
}
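For example (placeholder names), a single call then handles the batching:
sqlInsertAll(channel, "MyDatabase.dbo.MyTable", data)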
With this, I am able to insert 250k rows of data in 5 minutes or so, whereas it took more than 24 hours using sqlSave from the RODBC package.

How do I write data from R to PostgreSQL tables with an autoincrementing primary key?

I have a table in a PostgreSQL database that has a BIGSERIAL auto-incrementing primary key. Recreate it using:
CREATE TABLE foo
(
"Id" bigserial PRIMARY KEY,
"SomeData" text NOT NULL
);
I want to append some data to this table from R via the RPostgreSQL package. In R, the data doesn't include the Id column because I want the database to generate those values.
dfr <- data.frame(SomeData = letters)
Here's the code I used to try and write the data:
library(RPostgreSQL)
conn <- dbConnect(
"PostgreSQL",
user = "yourname",
password = "your password",
dbname = "test"
)
dbWriteTable(conn, "foo", dfr, append = TRUE, row.names = FALSE)
dbDisconnect(conn)
Unfortunately, dbWriteTable throws an error:
## Error in postgresqlgetResult(new.con) :
## RS-DBI driver: (could not Retrieve the result : ERROR: invalid input syntax for integer: "a"
## CONTEXT: COPY foo, line 1, column Id: "a"
## )
The error message isn't completely clear, but I interpret this as R trying to pass the contents of the SomeData column to the first column in the database (which is Id).
How should I be passing the data to PostgreSQL so that the Id column is auto-generated?
From the thread in hrbrmstr's comment, I found a hack to make this work.
In the postgresqlWriteTable function in the RPostgreSQL package, you need to replace the line
sql4 <- paste("COPY", postgresqlTableRef(name), "FROM STDIN")
with
sql4 <- paste(
"COPY ",
postgresqlTableRef(name),
"(",
paste(postgresqlQuoteId(names(value)), collapse = ","),
") FROM STDIN"
)
Note that the quoting of variables (not included in the original hack) is necessary to pass case-sensitive column names.
Here's a script to do that:
body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
'postgresqlTableRef(name), "FROM STDIN")',
'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")',
body_lines,
fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")
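With the patched function in place, the original dbWriteTable call from the question should now let PostgreSQL fill in "Id" from the bigserial default (a quick check, reusing the question's connection code):
conn <- dbConnect("PostgreSQL", user = "yourname", password = "your password", dbname = "test")
dbWriteTable(conn, "foo", dfr, append = TRUE, row.names = FALSE)
dbGetQuery(conn, "SELECT * FROM foo")
dbDisconnect(conn)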
I struggled with an issue very similar to this today and stumbled across this thread as I tried out different approaches. As of this writing (02/12/2018), it looks like the patch recommended above has been implemented in the latest version of RPostgreSQL::postgresqlWriteTable, but I still kept getting an error indicating that the primary key R assigned to my new rows was duplicated in the source data table.
I ultimately implemented a workaround that generates an incrementing primary key in R and appends it to the data being inserted into my PostgreSQL table. For my purposes, I only needed to insert one record into my table at a time, and I can't imagine this is an optimal solution for inserting a batch of records requiring a serially incremented primary key. Predictably, an error of "table my_table exists in database: aborting assignTable" was thrown when I omitted append=TRUE from my script; however, that option did not automatically assign an incrementing primary key as I had hoped, even with the code patch described above.
drv <- dbDriver("PostgreSQL")
localdb <- dbConnect(drv, dbname= 'MyDatabase',
host= 'localhost',
port = 5432,
user = 'postgres',
password= 'MyPassword')
KeyPlusOne <- sum(dbGetQuery(localdb, "SELECT count(*) FROM my_table"),1)
NewRecord <- t(c(KeyPlusOne, 'Var1','Var2','Var3','Var4'))
NewRecord <- as.data.frame(NewRecord)
NewRecord <- setNames(NewRecord, c("PK","VarName1","VarName2","VarName3","VarName4"))
postgresqlWriteTable(localdb, "my_table", NewRecord, append=TRUE, row.names=FALSE)

Insert in RMySQL from data frame

I'm trying to add data to a MySQL table using RMySQL. I only need to add one row at a time, and it's not working. What I'm trying to do is this:
dbGetQuery(con,"INSERT INTO names VALUES(data[1,1], data[1,2])")
So what I'm doing is this: I have values in a data frame named "data" and I need to put them into the MySQL table. Before that, I check whether they are already in the table, and if they are not, I add them, but this way it isn't working. The data is read from a .csv file with read.csv.
You can use paste to construct the actual query.
data <- matrix(1:4, 2, 2)
query <- paste("INSERT INTO names VALUES(",data[1,1], ",", data[1,2], ")")
query
#[1] "INSERT INTO names VALUES( 1 , 3 )"
dbGetQuery(con, query)
# If there are a lot of columns this could be tedious...
# So we could also use paste to add all the values at once.
query <- paste("INSERT INTO names VALUES(", paste(data[1,], collapse = ", "), ")")
query
#[1] "INSERT INTO names VALUES( 1, 3 )"
You could try:
dbWriteTable(con, "names", data[1, ], append = TRUE)
as detailed in the DBI package documentation.
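If your driver supports parameter binding (RMariaDB does; older RMySQL releases may not), DBI's dbExecute avoids building the query string by hand (a sketch, assuming the same con and data objects):
library(DBI)
dbExecute(con, "INSERT INTO names VALUES (?, ?)",
          params = list(data[1, 1], data[1, 2]))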
