Update selected rows in SQLite table in R

I am using the RSQLite package in a Shiny app. I need to be able to dynamically update an SQLite db as users progress through the app. I want to use SQLite's UPDATE syntax to achieve this, but I have come up against a problem when trying to update multiple rows for the same user.
Consider the following code:
# Load libraries
library("RSQLite")
## Path for SQLite db
sqlitePath <- "test.db"
# Create db to store tables
con <- dbConnect(SQLite(),sqlitePath)
## Create toy data
who <- c("jane", "patrick", "samantha", "jane", "patrick", "samantha")
tmp_var_1 <- c(1,2,3, 4, 5, 6)
tmp_var_2 <- c(2,4,6,8,10,12)
# Create original table
users <- data.frame(who = as.character(who), tmp_var_1 = tmp_var_1, tmp_var_2 = tmp_var_2)
users$who <- as.character(users$who)
# Write original table
dbWriteTable(con, "users", users)
# Subset users data
jane <- users[who=="jane",]
patrick <- users[who=="patrick",]
samantha <- users[who=="samantha",]
# Edit Jane's data
jane$tmp_var_1 <- c(99,100)
# Save edits back to SQL (this is where the problem is!)
table <- "users"
db <- dbConnect(SQLite(), sqlitePath)
query <- sprintf(
  "UPDATE %s SET %s = ('%s') WHERE who = %s",
  table,
  paste(names(jane), collapse = ", "),
  paste(jane, collapse = "', '"),
  "'jane'"
)
dbGetQuery(db, query)
## Load data to check update has worked
loadData <- function(table) {
  # Connect to the database
  db <- dbConnect(SQLite(), sqlitePath)
  # Construct the fetching query
  query <- sprintf("SELECT * FROM %s", table)
  # Submit the fetch query and disconnect
  data <- dbGetQuery(db, query)
  dbDisconnect(db)
  data
}
loadData("users")
Here I am trying to update the entries for Jane so that the values of tmp_var_1 are changed but all other columns remain the same. In response to questions from #zx8754 and #Altons posted below, the value of query is as follows:
UPDATE users SET who, tmp_var_1, tmp_var_2 = ('c(\"jane\", \"jane\")', 'c(99, 100)', 'c(2, 8)') WHERE who = 'jane'
The problem is almost certainly coming from the way that I am specifying the query to RSQLite. When I run dbGetQuery(db, query) I get the following error:
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: near ",": syntax error
Any suggestions for improvement would be most welcome.
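For reference, one way around this (a minimal sketch, not the only possible fix) is to update one column at a time with bound parameters rather than pasting R objects into the SQL string; SQLite's UPDATE wants one "SET column = value" pair per column. Because the table holds two rows for Jane with different new values, the sketch assumes a unique row_id column has been added to the table so each row can be targeted individually (the ids 1 and 4 are hypothetical):
## Sketch: parameterised, row-wise UPDATE (assumes a row_id column exists)
db <- dbConnect(SQLite(), sqlitePath)
sql <- "UPDATE users SET tmp_var_1 = ? WHERE row_id = ?"
dbExecute(db, sql, params = list(c(99, 100), c(1, 4)))  # new values and Jane's (hypothetical) row ids
dbDisconnect(db)
With RSQLite, dbExecute() runs the statement once per bound row, so both of Jane's rows are updated in a single call.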

Related

Delete specific rows in specific table in a SQLite database

I have multiple tables in a SQLite database. I am trying to delete specific rows of a table using the DBI package. Here is the code:
library(dplyr)
library(DBI)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "C:\\DB2.sqlite" , password="password")
DBI::dbWriteTable(con,"data_iris",iris,overwrite=TRUE)
query<-"DELETE FROM data_iris WHERE Species = ?;"
specie<-'setosa'
res <- dbExecute(con,query,params = list(specie))
res
[1] 50
The above code works fine. But why does the following code not work?
query <- 'DELETE FROM ? WHERE Species = ?;'
table_name<-"data_iris"
res <- dbExecute(con,query,params = c(table_name,specie))
#Error: near "?": syntax error
I cannot use the first approach as-is since the table_name changes dynamically (in a Shiny app).
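For what it's worth, the reason the second version fails is that bound parameters (?) can only stand in for values, never for identifiers such as table or column names. A common workaround (a minimal sketch; data_iris is just the example table from above) is to splice the table name into the statement separately, ideally quoted as an identifier, while still binding the value:
# Build the statement with a safely quoted table name; keep binding the value
table_name <- "data_iris"
specie <- "setosa"
query <- paste0("DELETE FROM ", dbQuoteIdentifier(con, table_name), " WHERE Species = ?;")
res <- dbExecute(con, query, params = list(specie))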

R Updating database with dbi

I have worked a little with DBI in R. My first question is more about best practice, as appending new data to the DB currently takes more time than I hoped. The second is an error that I'm receiving when trying to update old information in the database. Here is my current workflow when inserting new data into an existing table in the DB:
library(DBI)
library(odbc)
library(dplyr)
library(dbplyr)  # for in_schema()
con <- dbConnect(odbc(), "myDSN")
# Example table 1
tbl1 <- tibble(Key = c("A", "B", "C", "D", "E"),
               Val = c(1, 2, 3, 4, 5))
# Original table in DB
dbWriteTable(con, "tbl1", tbl1, overwrite = TRUE)
# Link to Original table
db_tbl <- tbl(con, in_schema("dbo", "tbl1"))
# New data
tbl2 <- tibble(Key = c("D", "E", "F", "G", "H"),
               val = c(10, 11, 12, 13, 14))
# Write it to Staging
dbWriteTable(con, "tbl1_staging", tbl2, overwrite = TRUE)
# Get a link to staging
db_tblStaging <- tbl(con, in_schema("dbo", "tbl1_staging"))
# Compare Info
not_in_db <- db_tblStaging %>%
  anti_join(db_tbl, by = "Key") %>%
  collect()
# Append missing info to DB
dbWriteTable(con, "tbl1", not_in_db, append = TRUE)
# Voila!
dbReadTable(con, "tbl1")
That will do the trick, but I'm looking for a better solution, as I hate the collect() part of the code: it means that I'm bringing data into R memory (as far as I understand it), which could be a problem in the future when I have bigger data. What I hoped would work is something like the following, which would let me append new data to the DB on the fly, without it ever visiting R's memory.
# What I hoped to have
db_tblStaging %>%
  anti_join(db_tbl, by = "Key") %>%
  dbWriteTable(con, "tbl1", ., append = TRUE)
The second problem is updating an existing table. Here is what I tried, but an error comes up that I can't figure out. Here is the link whose answer I tried to copy: How to pass data.frame for UPDATE with R DBI. I would like to update keys E and D with the new values in val.
# Trying to update tbl1
update_values <- db_tblStaging %>%
  semi_join(db_tbl, by = "Key") %>%
  collect()
update <- dbSendQuery(con, 'UPDATE tbl1
                             SET "val" = ?
                             WHERE Key = ?')
dbBind(update, update_values)
Error in result_bind(res@ptr, as.list(params)) :
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Incorrect syntax near the keyword 'Key'.
Has the package changed in some way? I can't spot my syntax error.
Consider running pure SQL after your staging table upload, as it looks like you need NOT EXISTS (to avoid duplicates) and UPDATE ... INNER JOIN (for existing records). This avoids any client-side query imports and exports on the R side.
And Key is a reserved word in SQL Server. Hence, escape it with square brackets.
apn_sql <- "INSERT INTO dbo.tbl (s.[Key], s.[Val])
SELECT s.[Key], s.[Val] FROM dbo.tbl_staging s
WHERE NOT EXISTS
(SELECT 1 FROM dbo.tbl t
WHERE t.[Key] = s.[Key])"
dbSendQuery(con, apn_sql)
upd_sql <- "UPDATE t
SET t.Val = s.Val
FROM dbo.tbl t
INNER JOIN dbo.tbl_staging s
ON t.[Key] = s.[Key]"
dbSendQuery(con, upd_sql)
Rextester demo
In fact, SQL Server has the MERGE query to handle both in one call:
MERGE dbo.tbl AS Target
USING (SELECT [Key], [Val] FROM dbo.tbl_staging) AS Source
ON (Target.[Key] = Source.[Key])
WHEN MATCHED THEN
    UPDATE SET Target.Val = Source.Val
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([Key], [Val])
    VALUES (Source.[Key], Source.[Val]);
Rextester demo
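For completeness, a minimal sketch of how this could be driven from R, reusing con and tbl2 from the question plus the apn_sql and upd_sql strings above (the table names in those strings, dbo.tbl and dbo.tbl_staging, would need to match your real tables; dbExecute() is the DBI call for statements that return no result set):
# Upload the new batch to staging, then let SQL Server do the set-based work
dbWriteTable(con, "tbl_staging", tbl2, overwrite = TRUE)
dbExecute(con, apn_sql)  # insert keys not yet present (NOT EXISTS)
dbExecute(con, upd_sql)  # update keys that already exist (INNER JOIN)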

How can we bulk insert data into SQL Server from the RODBC package without creating a text file?

This question is an extension of How to quickly export data from R to SQL Server. Currently I am using the following code:
# DB Handle for config file #
dbhandle <- odbcDriverConnect()
# save the data in the table finally
sqlSave(dbhandle, bp, "FACT_OP", append=TRUE, rownames=FALSE, verbose = verbose, fast = TRUE)
# varTypes <- c(Date="datetime", QueryDate = "datetime")
# sqlSave(dbhandle, bp, "FACT_OP", rownames=FALSE,verbose = TRUE, fast = TRUE, varTypes=varTypes)
# DB handle close
odbcClose(dbhandle)
I have also tried the approach below, which works beautifully and gives a significant speed gain.
toSQL = data.frame(...);
write.table(toSQL,"C:\\export\\filename.txt",quote=FALSE,sep=",",row.names=FALSE,col.names=FALSE,append=FALSE);
sqlQuery(channel,"BULK
INSERT Yada.dbo.yada
FROM '\\\\<server-that-SQL-server-can-see>\\export\\filename.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\\n'
)");
But my issue is that I can NOT keep my data at rest between transactions (writing data to a file is not an option because of data security), so I was looking for a solution to bulk insert data directly from memory or a cache. Thanks for the help.
Good question - also useful in instances where the BULK INSERT permissions cannot be setup for whatever reason.
I threw together this poor man's solution a while back, when I had enough data that sqlSave was too slow but not enough to justify setting up BULK INSERT; it does not require any data to be written to a file. The primary reason that sqlSave and parameterized queries are so slow for inserting data is that each row is inserted with a new INSERT statement. Having R write the INSERT statement manually bypasses this, as in my example below:
library(RODBC)
channel <- ...
dataTable <- ...relevant data...
numberOfThousands <- floor(nrow(dataTable)/1000)
extra <- nrow(dataTable)%%1000
thousandInsertQuery <- function(channel, dat, range){
  sqlQuery(channel, paste0("INSERT INTO Database.dbo.Responses (IDNum,State,Answer)
                            VALUES "
                           , paste0(
                               sapply(range, function(k) {
                                 paste0("(", dat$IDNum[k], ",'",
                                        dat$State[k], "','",
                                        gsub("'", "''", dat$Answer[k], fixed = TRUE), "')")
                               })
                             , collapse = ",")))
}
if (numberOfThousands)
  for (n in 1:numberOfThousands) {
    thousandInsertQuery(channel, dataTable, (1000 * (n - 1) + 1):(1000 * n))
  }
if (extra)
  thousandInsertQuery(channel, dataTable, (1000 * numberOfThousands + 1):(1000 * numberOfThousands + extra))
SQL Server's INSERT statements with a VALUES list will only accept up to 1000 rows at a time, so this code breaks the data up into chunks (much more efficiently than one row at a time).
The thousandInsertQuery function will obviously have to be customized to handle whatever columns your data frame has - note also that there are single quotes around the character/factor columns and a gsub to handle any single quotes that might be in the character column. Other than this there are no safeguards against SQL injection attacks.
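To make the shape of the generated statement concrete, a call over two rows of made-up data would send SQL roughly like the commented text below (the table and values are hypothetical):
# Hypothetical two-row illustration of what thousandInsertQuery() sends
dat <- data.frame(IDNum = 1:2, State = c("TX", "CA"),
                  Answer = c("Yes", "It's fine"), stringsAsFactors = FALSE)
thousandInsertQuery(channel, dat, 1:2)
# roughly: INSERT INTO Database.dbo.Responses (IDNum,State,Answer)
#          VALUES (1,'TX','Yes'),(2,'CA','It''s fine')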
What about using the DBI::dbWriteTable() function?
Example below (I am connecting my R code to AWS RDS instance of MS SQL Express):
library(DBI)
library(RJDBC)
library(tidyverse)
# Specify where your driver lives
drv <- JDBC(
  "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "c:/R/SQL/sqljdbc42.jar")
# Connect to AWS RDS instance
conn <- drv %>%
  dbConnect(
    host = "jdbc:sqlserver://xxx.ccgqenhjdi18.ap-southeast-2.rds.amazonaws.com",
    user = "xxx",
    password = "********",
    port = 1433,
    dbname = "qlik")
if (0) { # check what the conn object has access to
  queryResults <- conn %>%
    dbGetQuery("select * from information_schema.tables")
}
# Create test data
example_data <- data.frame(animal = c("dog", "cat", "sea cucumber", "sea urchin"),
                           feel = c("furry", "furry", "squishy", "spiny"),
                           weight = c(45, 8, 1.1, 0.8))
# Works in 20ms in my case
system.time(
  conn %>% dbWriteTable(
    "qlik.export.test",
    example_data
  )
)
# Let us see if we see the exported results
conn %>% dbGetQuery("select * FROM qlik.export.test")
# Let's clean the mess and force-close connection at the end of the process
conn %>% dbDisconnect()
It works pretty fast for small amounts of data transferred and seems rather elegant if you want a data.frame -> SQL table solution.
Enjoy!
Building on #jpd527's solution, which I found really worth digging into...
require(RODBC)
channel <- #connection parameters
dbPath <- # path to your table, database.table
data <- # the DF you have prepared for insertion, /!\ beware of column names and values types...
# Function to insert 1000 rows of data in one sqlQuery call, coming from
# any DF and into any database.table
insert1000Rows <- function(channel, dbPath, data, range){
  # Define column names for the database.table
  columns <- paste(names(data), collapse = ", ")
  # Initialize a string which will incorporate all 1000 rows of values
  values <- ""
  # Not very elegant, but appropriately builds the values (a, b, c...), (d, e, f...) into a string
  for (i in range) {
    for (j in 1:ncol(data)) {
      # First column
      if (j == 1) {
        if (i == min(range)) {
          # First row, only "("
          values <- paste0(values, "(")
        } else {
          # Next rows, ",("
          values <- paste0(values, ",(")
        }
      }
      # Value handling
      values <- paste0(
        values
        # Handle NA values you want to insert as NULL values
        , ifelse(is.na(data[i, j])
                 , "null"
                 # Handle numeric values you want to insert as INT
                 , ifelse(is.numeric(data[i, j])
                          , data[i, j]
                          # Else handle as character to insert as VARCHAR
                          , paste0("'", data[i, j], "'")
                 )
        )
      )
      # Separator for columns
      if (j == ncol(data)) {
        # Last column, close parenthesis
        values <- paste0(values, ")")
      } else {
        # Other columns, add comma
        values <- paste0(values, ",")
      }
    }
  }
  # Once the string is built, insert it into SQL Server
  sqlQuery(channel, paste0("insert into ", dbPath, " (", columns, ") values ", values))
}
This insert1000Rows function is used in a loop in the next function, sqlInsertAll, for which you simply define which DF you want to insert into which database.table.
# Main function which uses the insert1000rows function in a loop
sqlInsertAll <- function(channel, dbPath, data) {
  numberOfThousands <- floor(nrow(data) / 1000)
  extra <- nrow(data) %% 1000
  if (numberOfThousands) {
    for (n in 1:numberOfThousands) {
      insert1000Rows(channel, dbPath, data, (1000 * (n - 1) + 1):(1000 * n))
      print(paste0(n, "/", numberOfThousands))
    }
  }
  if (extra) {
    insert1000Rows(channel, dbPath, data, (1000 * numberOfThousands + 1):(1000 * numberOfThousands + extra))
  }
}
With this, I am able to insert 250k rows of data in 5 minutes or so, whereas it took more than 24 hours using sqlSave from the RODBC package.
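For reference, a minimal usage sketch (the connection string, table path, and data frame below are placeholders, not part of the original answer):
channel <- odbcDriverConnect("your connection string here")  # placeholder connection
sqlInsertAll(channel, "MyDatabase.dbo.MyTable", my_data_frame)  # hypothetical table and data frame
odbcClose(channel)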

Filter table from redshift database using R dplyr

I have a table saved in AWS Redshift that has lots of rows, and I want to collect only a subset of them using a "user_id" column. I am trying to use R with the dplyr library to accomplish this (see below).
conn_dplyr <- src_postgres('dev',
                           host = '****',
                           port = ****,
                           user = "****",
                           password = "****")
df <- tbl(conn_dplyr, "redshift_table")
However, when I try to subset over a collection of user ids it fails (see below). Can someone help me understand how I might be able to collect the data table over a collection of user id elements? The individual calls work, but when I combine them both it fails. In this case there are only 2 user ids, but in general it could be hundreds or thousands, so I don't want to do each one individually. Thanks for your help.
df_subset1 <- filter(df, user_id=="2239257806")
df_subset1 <- collect(df_subset1)
df_subset2 <- filter(df, user_id=="22159960")
df_subset2 <- collect(df_subset2)
df_subset_both <- filter(df, user_id==c("2239257806", "22159960"))
df_subset_both <- collect(df_subset_both)
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: operator does not exist: character varying = record
HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.
)
Try this:
df_subset_both <- filter(df, user_id %in% c("2239257806", "22159960"))
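If you want to double-check the SQL that dplyr generates before pulling the data, a quick sanity check could look like this (a sketch using dplyr's show_query(); the exact translation depends on the backend):
df_subset_both <- filter(df, user_id %in% c("2239257806", "22159960"))
show_query(df_subset_both)  # should translate to ... WHERE user_id IN ('2239257806', '22159960')
df_subset_both <- collect(df_subset_both)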
Alternatively, you can add the condition to the query you run against Redshift directly.
install.packages("RPostgreSQL")
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
conn <-dbConnect(drv,host='host link',port='5439',dbname='dbname',user='xxx',password='yyy')
df_subset_both <- dbSendQuery(conn,"select * from my_table where user_id in (2239257806,22159960)")

How do I write data from R to PostgreSQL tables with an autoincrementing primary key?

I have a table in a PostgreSQL database that has a BIGSERIAL auto-incrementing primary key. Recreate it using:
CREATE TABLE foo
(
"Id" bigserial PRIMARY KEY,
"SomeData" text NOT NULL
);
I want to append some data to this table from R via the RPostgreSQL package. In R, the data doesn't include the Id column because I want the database to generate those values.
dfr <- data.frame(SomeData = letters)
Here's the code I used to try and write the data:
library(RPostgreSQL)
conn <- dbConnect(
  "PostgreSQL",
  user = "yourname",
  password = "your password",
  dbname = "test"
)
dbWriteTable(conn, "foo", dfr, append = TRUE, row.names = FALSE)
dbDisconnect(conn)
Unfortunately, dbWriteTable throws an error:
## Error in postgresqlgetResult(new.con) :
## RS-DBI driver: (could not Retrieve the result : ERROR: invalid input syntax for integer: "a"
## CONTEXT: COPY foo, line 1, column Id: "a"
## )
The error message isn't completely clear, but I interpret this as R trying to pass the contents of the SomeData column to the first column in the database (which is Id).
How should I be passing the data to PostgreSQL so that the Id column is auto-generated?
From the thread in hrbrmstr's comment, I found a hack to make this work.
In the postgresqlWriteTable function in the RPostgreSQL package, you need to replace the line
sql4 <- paste("COPY", postgresqlTableRef(name), "FROM STDIN")
with
sql4 <- paste(
  "COPY ",
  postgresqlTableRef(name),
  "(",
  paste(postgresqlQuoteId(names(value)), collapse = ","),
  ") FROM STDIN"
)
Note that the quoting of variables (not included in the original hack) is necessary to pass case-sensitive column names.
Here's a script to do that:
body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
'postgresqlTableRef(name), "FROM STDIN")',
'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")',
body_lines,
fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")
I struggled with a very similar issue today and stumbled across this thread while trying out different approaches. As of this writing (02/12/2018), it looks like the patch recommended above has been implemented in the latest version of RPostgreSQL::postgresqlWriteTable, but I still kept getting an error indicating that the primary key R assigned to my new rows was duplicated in the source data table.
I ultimately implemented a workaround that generates an incrementing primary key in R and appends it to the inserted data before updating the source table in my PostgreSQL DB. For my purposes, I only needed to insert one record into my table at a time, and I can't imagine this is an optimal solution for inserting a batch of records that require a serially incremented primary key. Predictably, an error of "table my_table exists in database: aborting assignTable" was thrown when I omitted 'append=TRUE' from my script; however, this option did not automatically assign an incrementing primary key as I had hoped, even with the code patch described above.
drv <- dbDriver("PostgreSQL")
localdb <- dbConnect(drv, dbname= 'MyDatabase',
host= 'localhost',
port = 5432,
user = 'postgres',
password= 'MyPassword')
KeyPlusOne <- sum(dbGetQuery(localdb, "SELECT count(*) FROM my_table"),1)
NewRecord <- t(c(KeyPlusOne, 'Var1','Var2','Var3','Var4'))
NewRecord <- as.data.frame(NewRecord)
NewRecord <- setNames(KeyPlusOne, c("PK","VarName1","VarName2","VarName3","VarName4"))
postgresqlWriteTable(localdb, "my_table", NewRecord, append=TRUE, row.names=FALSE)
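A simpler route that needs neither the patch nor a manually computed key (a minimal sketch, assuming an open conn and the dfr data frame from the question) is to write the INSERT yourself and name only the data columns, so PostgreSQL fills "Id" from the bigserial sequence:
# Multi-row INSERT that lists only "SomeData"; "Id" is generated by the database
vals <- paste0("('", gsub("'", "''", dfr$SomeData), "')", collapse = ", ")
dbGetQuery(conn, paste0('INSERT INTO foo ("SomeData") VALUES ', vals))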
