Get all rows that DBI::dbWriteTable has just written - r

I want to use dbWriteTable() of R's DBI package to write data into a database. Usually, the respective tables are already present, so I use the argument append = TRUE. How do I find out which rows were added to the table by dbWriteTable()? Most of the tables have certain columns with UNIQUE values, so a SELECT will work (see below for a simple example). However, this is not true for all of them, or only several columns together are UNIQUE, which makes the SELECT more complicated. In addition, I would like to put the writing and querying into a function, so I would prefer a consistent approach for all cases.
I mainly need this to get the PRIMARY KEYs added by the database and to allow a user to quickly see what was added. In case it matters, my database is PostgreSQL and I would like to use the odbc package for the connection.
I have something like this in mind, however, I am looking for a more general solution:
library(DBI)
con <- dbConnect(odbc::odbc(), dsn = "database")
# append = TRUE belongs to dbWriteTable(), not to data.frame()
dbWriteTable(con,
  name = "site",
  value = data.frame(name = c("abcd", "efgh")),
  append = TRUE)
# read back the rows just written, using the UNIQUE column
dbGetQuery(con,
  "SELECT * FROM site WHERE name IN ('abcd', 'efgh');")

Related

dbplyr: delete row from a table in database

What combination of dbplyr verbs is equivalent to DBI::dbSendQuery(con, "DELETE FROM <table> WHERE <condition>")?
What I want is not querying data from database, but removing data from and updating a table in database.
I want to do it in a dplyr way, but I am not sure if it is possible. I could not find anything similar in the package reference.
dbplyr translates dplyr commands to query database tables. I am not aware of any inbuilt way to modify existing database tables using pure dbplyr.
This is likely a design choice.
Within R we do not need to distinguish between fetching data from a table (querying) and modifying a table. This is probably because in R we can reload the original data into memory if an error/mistake occurs.
But in databases querying and modifying a table are deliberately different things. When modifying a database, you are modifying the source so additional controls are used (because recovering deleted data is a lot harder).
The DBI package is probably your best choice for modifying the database.
This is the approach I use for all my dbplyr work, often through a custom function that takes the query produced by the dbplyr translation and inserts it into a DBI call (you can see examples of this in my dbplyr helpers GitHub repo).
Two approaches to consider for this: (1) an anti-join (on all columns) followed by writing a new table, (2) the DELETE FROM syntax.
Mock up of anti-join approach
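library(dplyr)
library(dbplyr)

# rows matching the conditions are the ones to drop
records_to_remove = remote_table %>%
  filter(conditions)
# keep every row that has no exact match among the rows to drop
desired_final_table = remote_table %>%
  anti_join(records_to_remove, by = colnames(remote_table))
# wrap the rendered query so its result is written to output_table
query = paste0("SELECT * INTO output_table FROM (",
  sql_render(desired_final_table),
  ") AS subquery")
DBI::dbExecute(db_con, as.character(query))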
Mock up of DELETE FROM syntax
records_to_remove = remote_table %>%
  filter(conditions)
# gsub() takes pattern/replacement; fixed = TRUE matches the * literally
query = sql_render(records_to_remove) %>%
  as.character() %>%
  gsub(pattern = "SELECT *", replacement = "DELETE", fixed = TRUE)
DBI::dbExecute(db_con, query)
If you plan to run these queries multiple times, I would recommend wrapping them in a function that includes validity checks.
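A minimal sketch of such a wrapper, assuming the rendered query has the simple form SELECT * FROM ... WHERE ... . The name select_to_delete and the two checks are my own illustration, not part of dbplyr or DBI. If your dbplyr version renders the column list differently, the first check stops with an error instead of producing a wrong statement.

```r
# Hypothetical helper: turn a rendered "SELECT * ..." query into a DELETE.
select_to_delete <- function(select_sql) {
  stopifnot(is.character(select_sql), length(select_sql) == 1)
  # Only accept queries that start with a plain "SELECT *"
  if (!grepl("^\\s*SELECT\\s+\\*", select_sql, perl = TRUE)) {
    stop("query does not start with 'SELECT *'")
  }
  # Refuse queries without a WHERE clause, which would empty the table
  if (!grepl("\\bWHERE\\b", select_sql, ignore.case = TRUE, perl = TRUE)) {
    stop("query has no WHERE clause")
  }
  # Replace only the leading "SELECT *"
  sub("^\\s*SELECT\\s+\\*", "DELETE", select_sql, perl = TRUE)
}

delete_sql <- select_to_delete("SELECT *\nFROM my_table\nWHERE (colX = 1)")
```

With a live connection this would be used as DBI::dbExecute(db_con, select_to_delete(as.character(sql_render(records_to_remove)))).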
For some use cases deleting rows will not be necessary.
You could think of the filter command in R as deleting rows from a table. For example in R we might run:
prepared_table = input_table %>%
  filter(colX == 1) %>%
  select(colA, colB, colZ)
And think of this as deleting the rows where colX is not 1 before producing output:
output = prepared_table %>%
  group_by(colA) %>%
  summarise(sumZ = sum(colZ))
(Or you could use an anti-join above instead of a filter.)
But for this type of deleting, you do not need to edit the source data, as you can just filter out the unwanted rows at runtime every time. Yes it will make your database query larger, but this is normal for working with databases.
So combining the preparation and output in SQL is normal (something like this):
SELECT colA, SUM(colZ) AS sumZ
FROM (
  SELECT colA, colB, colZ
  FROM input_table
  WHERE colX = 1
) AS prepared_table
GROUP BY colA
So unless you need to modify the database, I would recommend filtering instead of deleting.

Joining across databases with dbplyr

I am working with database tables with dbplyr.
I have a local table and want to join it with a large (150m rows) table on the database.
The database PRODUCTION is read only.
# Set up the connection and point to the table
library(DBI); library(odbc); library(dplyr); library(dbplyr)
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=",
  t2690_username, ";PWD=", t2690_password, sep = "")
t2690 <- dbConnect(odbc::odbc(), .connection_string = my_conn_string)
order_line <- tbl(t2690, "order_line") # 150m rows
I also have a local table, let's call it orders:
# fill df with random data
orders <- data.frame(matrix(rexp(500000), nrow = 100000, ncol = 5))
names(orders) <- c("customer_id", paste0("variable_", 1:4))
Let's say I want to join these two tables. I get the following error:
complete_orders <- orders %>% left_join(order_line)
> Error: `x` and `y` must share the same src, set `copy` = TRUE (may be slow)
The issue is, if I were to set copy = TRUE, it would try to download the whole of order_line and my computer would quickly run out of memory.
Another option could be to upload the orders table to the database. The issue here is that the PRODUCTION database is read only, so I would have to upload to a different database. Trying to copy across databases in dbplyr results in the same error.
The only solution I have found is to upload into the writable database and use SQL to join them, which is far from ideal.
I have found the answer: you can use the in_schema() function within the tbl pointer to work across schemas within the same connection.
# Connect without specifying a database
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;UID=",
  t2690_username, ";PWD=", t2690_password, sep = "")
t2690 <- dbConnect(odbc::odbc(), .connection_string = my_conn_string)
# After uploading the local table to the TEMP database, point to it
orders <- tbl(t2690, in_schema("TEMP", "orders"))
order_line <- tbl(t2690, in_schema("PRODUCTION", "order_line"))
complete_orders <- orders %>% left_join(order_line)
In your use case, it seems (based on the accepted answer) that your databases are on the same server and it's just a matter of using in_schema. If this were not the case, another approach would be that given here, which in effect gives a version of copy_to that works on a read-only connection.

Query MS SQL using R with criteria from an R data frame

I have a rather large table in MS SQL Server (120 million rows) which I would like to query. I also have a data frame in R that has unique IDs that I would like to use as part of my query criteria. I am familiar with the dplyr package, but I am not sure if it's possible to have the query execute on the MS SQL server rather than bringing all the data into my laptop's memory (which would likely crash my laptop).
Of course, the other option is to load the data frame into SQL as a table, which is what I am currently doing, but I would prefer not to do this.
Depending on what exactly you want to do, you may find value in the RODBCext package.
Let's say you want to pull columns from an MS SQL table where the IDs are in a vector that you have in R. You might try code like this:
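library(RODBC)
library(RODBCext)
# build the connection string in one piece so it contains no line break
dbconnect <- odbcDriverConnect(paste0("driver={SQL Server};",
  "server=servername;database=dbname;trusted_connection=true"))
v1 <- c(34, 23, 56, 87, 123, 45)
# one column per query parameter, one row per value to look up
qdf <- data.frame(idlist = v1)
# %in% is R syntax, not SQL; each row of qdf is bound to the ? placeholder
sqlq <- "SELECT * FROM tablename WHERE idcol = ?"
qr <- sqlExecute(dbconnect, sqlq, qdf, fetch = TRUE)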
Basically, you put everything you want to pass to the query into a data frame. Think of it as the variables or parameters of your query: each parameter gets its own column. Then you write the query as a character string, store it in a variable, and put it all together with the sqlExecute function.
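If the IDs are numeric, an alternative worth considering is to build a single IN list in R and send one query. The sketch below is only an illustration under that assumption: build_in_query is a hypothetical helper, and tablename and idcol are the placeholders from the example above. For character IDs, parameter binding as shown above is the safer route.

```r
# Hypothetical helper: build "SELECT ... WHERE idcol IN (...)" from numeric IDs.
build_in_query <- function(ids) {
  ids <- as.integer(ids)  # coerce so that only integers reach the SQL text
  stopifnot(length(ids) > 0, !anyNA(ids))
  sprintf("SELECT * FROM tablename WHERE idcol IN (%s)",
          paste(ids, collapse = ", "))
}

v1 <- c(34, 23, 56, 87, 123, 45)
sqlq <- build_in_query(v1)
# qr <- RODBC::sqlQuery(dbconnect, sqlq)  # needs a live connection
```

For a very long ID list the statement may exceed what the server accepts, so splitting the IDs into chunks would be a reasonable next step.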

Using r to Insert Records into a Database using apply

I have a table I wish to insert records into in a Teradata environment using R.
I have connected to the DB and created my table using JDBC.
From reading the documentation, there doesn't appear to be an easy way to insert records into the system except to create your own manual INSERT statements. I am trying to do this with a vectorized approach using apply (or anything similar).
Below is my code, but I'm clearly not using apply correctly. Can anyone help?
s <- seq(1:1000)
str_update_table <- sprintf("INSERT INTO foo VALUES (%s)", s)
# Set Up the Connections
myconn <- dbConnect(drv,service, username, password)
# Attempt to run each of the 1000 sql statements
apply(str_update_table,2,dbSendUpdate,myconn)
I do not have the infrastructure to test this, but you are passing a vector to apply, where apply expects an array. With your vector str_update_table, the 2 (the margin argument) in apply does not make much sense.
Try Map, as in:
Map(function(x) dbSendUpdate(myconn, x), str_update_table)
(untested)
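The difference can be seen without a database. In this sketch, fake_send_update is a hypothetical stand-in for dbSendUpdate that only records the statements it receives, so nothing here touches Teradata:

```r
# Hypothetical stand-in for dbSendUpdate(): records statements, runs nothing.
log <- new.env()
log$sent <- character(0)
fake_send_update <- function(conn, statement) {
  log$sent <- c(log$sent, statement)
  invisible(TRUE)
}

statements <- sprintf("INSERT INTO foo VALUES (%s)", 1:3)

# apply() needs an object with dimensions, so a plain vector raises an error
apply_result <- tryCatch(
  apply(statements, 2, fake_send_update, NULL),
  error = function(e) conditionMessage(e)
)

# Map() walks over the vector one element at a time
invisible(Map(function(x) fake_send_update(NULL, x), statements))
```

After this runs, log$sent holds the three statements in order, while apply_result holds the error message from apply.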

Persistence of data frames(R objects) into Database

I have a database table, let's say Table 1. Table 1 has 10 columns, let's assume:
column1,column2,column3,column4,column5,column6,column7,column8,column9,column10...
I have a data-frame as
sample_frame<-data.frame(column1=1,column2=2,column3=3,column4=4)
I wish to persist the data frame, i.e. sample_frame, into my database table, i.e. Table 1.
Presently I am using the ROracle package to write into the database. The code I am using is as follows:
library(ROracle)
dbWriteTable(con, name="Table 1", value=sample_frame, row.names = FALSE,
overwrite = FALSE,append = TRUE, schema ="sample_schema")
I have created the connection object using dbConnect(). As far as the integrity and null constraints of Table 1 are concerned, I have taken care of that. When I try to write into the table using dbWriteTable(), the following error is thrown:
"ORA-00947: not enough values"
Can someone correct the method I am using, or provide an alternative method of inserting selected columns (the non-nullable columns) into Table 1 while leaving the other columns empty? I am using R 2.15.3.
As I mentioned in my comment, you are getting this error because you are creating sample_frame with fewer columns than the table has. Try this (if your actual table in the database has the same column names):
sample_frame<-data.frame(column1=1,column2=2,column3=3,column4=4,
column5=5,column6=6,column7=7,column8=8,
column9=9,column10=10)
library(ROracle)
dbWriteTable(con, name="Table 1", value=sample_frame, row.names = FALSE,
overwrite = FALSE,append = TRUE, schema ="sample_schema")
Update
Considering your new requirement, I would suggest you prepare a query and use the following:
qry <- "<your update query>"
dbSendQuery(con, qry)
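One way to prepare such a query is to name the target columns explicitly, so that the database only expects values for those columns. The sketch below is an assumption-laden illustration: build_insert is a hypothetical helper, it only handles numeric columns with simple names, and the table name is passed in already quoted by the caller.

```r
# Hypothetical helper: one INSERT statement per row, with an explicit column list.
build_insert <- function(table, df) {
  # Numeric columns only, so values can be pasted into the SQL text as-is
  stopifnot(nrow(df) > 0, all(vapply(df, is.numeric, logical(1))))
  cols <- paste(names(df), collapse = ", ")
  vals <- do.call(paste, c(df, sep = ", "))  # pastes row-wise across columns
  sprintf("INSERT INTO %s (%s) VALUES (%s)", table, cols, vals)
}

sample_frame <- data.frame(column1 = 1, column2 = 2, column3 = 3, column4 = 4)
qry <- build_insert('sample_schema."Table 1"', sample_frame)
# dbSendQuery(con, qry)  # needs a live ROracle connection
```

The columns left out of the list then receive NULL or their default value, which only works if they are nullable or have a default.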
