Calling `odbc` connection within function does not display in RStudio Connection Pane - r

I'm working to streamline some database connections. Using the odbc package, I have successfully established a connection with one of my databases like so:
library(odbc)
con <- dbConnect(odbc::odbc(), "db_name",
                 UID = "username",
                 PWD = "password")
This works, and the database schema displays in the Connection Pane as expected (using RStudio Server 1.1.383).
However, I need to call this connection within a user-defined function that decrypts our users' credentials. A minimal example:
db_Connect_mod <- function(userid,
                           password,
                           ...){
  # Needed processes, omitted for simplicity of this question
  # ...
  con <- dbConnect(odbc::odbc(), "db_name",
                   UID = userid,
                   PWD = password)
  return(con)
}
So then I run:
con <- db_Connect_mod(userid, password, ...)
The actual database connection con is successful, but it no longer appears in the RStudio Connection Pane.
I know that odbc uses a Connections Contract, but it doesn't seem that it carries over to my new function. Is there a way to force the Connections Contract to carry over to the top-level function?
I have looked at using odbc:::on_connection_opened(con, code = "..."), which seems to work, but it is not as seamless as inheriting the Connections Contract from odbc within my new function, and I would rather not rely on a non-exported function.
I believe this behavior is due to changes from this odbc GitHub issue.

There didn't seem to be much interest, but here is the workaround I've been using:
I am using match.call() to gather the arguments and then passing that into the odbc:::on_connection_opened() function, as previously discussed. Probably not best practice, but it works.
I added the logical argument connection_pane to easily turn off or on this feature:
# exported from internal_package
db_Connect_mod <- function(userid,
                           password,
                           connection_pane,
                           ...){
  # Needed processes, omitted for simplicity of this question
  # ...
  con <- dbConnect(odbc::odbc(), "db_name",
                   UID = userid,
                   PWD = password)
  if(connection_pane){
    code <- c(match.call()) # This saves what was typed into R
    odbc:::on_connection_opened(
      con,
      paste(c("library(internal_package)",
              paste("con <-", gsub(", ", ",\n\t", code))),
            collapse = "\n"))
  }
  return(con)
}
I imagine this could look a lot prettier with glue or stringr; a rough sketch of that idea follows.
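For instance, the if(connection_pane) branch could be rebuilt with glue, assuming glue is installed (still relying on the same non-exported odbc:::on_connection_opened()):
if(connection_pane){
  code <- paste(deparse(match.call()), collapse = "\n") # what was typed into R
  odbc:::on_connection_opened(
    con,
    glue::glue("library(internal_package)\ncon <- {code}"))
}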

Related

How to save to pre-existing Snowflake table from R using pool

I am using pool to handle connections to my Snowflake warehouse. I have created a connection to my database and can read data in a pre-existing table with no issues e.g:
library(pool)
library(dplyr)
library(dbplyr)
my_pool <- dbPool(odbc::odbc(),
                  Driver = "Snowflake",
                  Server = Sys.getenv('WH_URL'),
                  UID = Sys.getenv('WH_USER'),
                  PWD = Sys.getenv('WH_PW'),
                  Warehouse = Sys.getenv('WH_WH'),
                  Database = "MY_DB")
my_data <- tbl(my_pool, in_schema(sql("schema_name"), sql("table_name"))) %>%
  collect()
I would like to save back to a table (table_name) and I believe the best way to do this is with pool::dbWriteTable:
# Create some data to save to db
data <- data.frame("user_email" = "tim@apple.com",
                   "query_run" = "arrivals_departures",
                   "data_downloaded" = FALSE,
                   "created_at" = as.character(Sys.time()))
# Define where to save the data
table_id <- Id(database="MY_DB", schema="MY_SCHEMA", table="TABLE_NAME")
# Write to database
pool::dbWriteTable(my_pool, table_id, data, append=TRUE)
However this returns the error:
Error in new_result(connection@ptr, statement, immediate) :
  nanodbc/nanodbc.cpp:1594: 00000: SQL compilation error:
  Object 'MY_DB.MY_SCHEMA.TABLE_NAME' already exists.
I have read/write/update permissions for this database for the user specified in my_pool.
I have explored the accepted answers here and here to create the above attempt and can't figure out what I'm doing wrong. It's probably something simple that I've forgotten to do - any thoughts?
EDIT: Wondering if my issue has anything to do with: https://github.com/r-dbi/odbc/issues/480
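One thing I might try next (untested): check the raw DBI connection out of the pool and append with dbAppendTable(), which never attempts to create the table; a sketch using the names from above:
conn <- pool::poolCheckout(my_pool)
DBI::dbAppendTable(conn,
                   Id(database = "MY_DB", schema = "MY_SCHEMA", table = "TABLE_NAME"),
                   data)
pool::poolReturn(conn)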

Reading Data from a SQL server in RStudio (dplyr)

I am sure this question is very basic, but this is the first time I am using R connected to a server, so a few things still confuse me.
I used ODBC Data Sources on Windows to create a DSN, and used
con <- dbConnect(odbc::odbc(), "TEST_SERVER")
This worked, and now under the Connections tab I can see the server; if I double-click it, I can see the databases and tables that exist on the server. How would I go about reading something inside one of those databases?
For Example, if the database name is db1, and the table name is t1, what is the code needed to read that table into local memory? I would prefer using dbplyr as I am familiar with the syntax. I am just unsure how to refer to a particular database and table after making the connection to the server.
I haven't used dbplyr before, but you can query the database using dbGetQuery.
test <- dbGetQuery(
con,
"SELECT *
FROM db1.t1
"
)
You can also pass the database into the connection string.
con <- dbConnect(
drv = odbc(),
dsn = "TEST_SERVER",
database = "db1"
)
And then your query would just be "SELECT * FROM t1".
EDIT: To query the table using dbplyr:
tbl1 <- tbl(con, "t1")
qry <- tbl1 %>% head() %>% collect()
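If the table sits in a database other than the connection's default, dbplyr can also qualify the name for you; a sketch assuming SQL Server's usual db1.dbo.t1 layout and a reasonably recent dbplyr (the dbo schema is an assumption):
library(dplyr)
library(dbplyr)
tbl1 <- tbl(con, in_catalog("db1", "dbo", "t1")) # database, schema, table
qry <- tbl1 %>% head() %>% collect()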
I like to use RODBC:
con <- RODBC::odbcConnect(dsn = 'your_dsn',
                          uid = 'userid',
                          pwd = 'password')
table_output <- RODBC::sqlQuery(con, 'SELECT * FROM Table')

Appending new data to a local Access data base file with r after a successful connection

So I am currently working with a connection to an Access database. I am able to connect to the Access DB, which is located on my local system and is actually connected to a SharePoint list. I would love to automate the process of handling this SharePoint list with an R and Access combo! What I want to do is pretty basic: introduce new data via a .csv, process it for the relevant content, compare it to the current Access DB, and finally upload the new information from R to Access.
I've learned that you need to match the bit versions of your Windows OS, Office, and R; I am x64 on all of the above. This allowed me to connect to the Access DB. You also need the 'Microsoft Access Database Engine 2016 Redistributable', which is essentially the driver for the connection.
So what I have so far is:
library(odbc)
library(DBI)
file_path <- "C:/user/Documents/R Projects/...pathtofile.../filename.accdb"
accdb_con <- dbConnect(drv = odbc(), .connection_string = paste0("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=",file_path,";"))
access.db <- dbReadTable(accdb_con, "sNPS Deep Dives")
That now connects!
I then read in a .csv of new information
new.df <- read.csv("C:/user/Documents/R projects/...pathtofile.csv", header=T, stringsAsFactors=FALSE, na.strings=c("","NA"))
An example of the data set might look something like this:
date <- c("15/10/2018","15/10/2018", "16/10/2018", "12/11/2018", "07/09/2018")
score <- c("6", "10", "7", "10", "9")
group <- c("a","b", "b", "a", "b")
CaseID <- c("301", "302", "303", "304", "305")
new.df <- data.frame(date,score,group,CaseID)
new.df$date <- as.character(new.df$date)
new.df$score <- as.numeric(new.df$score)
new.df$group <- as.character(new.df$group)
new.df$CaseID <- as.numeric(new.df$CaseID)
Notably there are more columns in the Access DB that people will fill in by hand with further information.
and I process it to be ready to go into the Access DB.
probably not that interesting...
Then I compare the new data against the Access DB, like so:
library(dplyr)
new <- anti_join(new.df, access.db, by= "Case.ID")
Now I've tried:
dbWriteTable(access.db.copy, new, append = TRUE)
dbAppendTable(access.db.copy, new)
I don't seem to be able to get either of these to work.
I am getting an error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘dbWriteTable’ for signature ‘"ACCESS", "data.frame", "missing"’
I've seen plenty of posts in which people are having trouble connecting to an Access DB but I haven't seen anything about writing new data into that database.
I know this isn't quite a reproducible example but it seems like a difficult problem to recreate since it's a connection problem between different tools. I would be happy to provide example sets that might make this easier
I would appreciate any direction you all can provide.
Thanks!
Edit:
It appears that Bing Sun was right; I was missing an argument. It seems we need something more like:
dbWriteTable(access.db.copy, "Name of table",new, append = TRUE)
Which produces the error:
Error in result_insert_dataframe(rs@ptr, values) :
  nanodbc/nanodbc.cpp:1944: HY104: [Microsoft][ODBC Microsoft Access Driver]Invalid precision value
I wonder if this may be an error from Access about a file type?
Now if I use dbAppendTable I don't get an error; I get 0 as output:
dbAppendTable(access.db.copy, "Name of table", new, append= TRUE)
With output:
[1] 0
But I don't see any of the new values when I check the Access file.
I know it's years later, but hopefully this will help someone else with this issue since you're right CrayCrayTown, there aren't very many posts covering this issue.
I've run into this problem repeatedly when dealing with R and MS Access. The solution that I've come up with is pretty "hacky", but it accomplishes what we're trying to do... just not very elegantly.
The way I do this is with a combo of RODBC and DBI packages.
First, I open a connection to the DB with RODBC, and use that connection to write my data to the DB as an intermediary table:
chan <- RODBC::odbcDriverConnect(
  connection = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb")
RODBC::sqlSave(channel = chan,
               dat = df,
               tablename = "tbl_intermediary",
               rownames = FALSE,
               append = FALSE)
RODBC::odbcClose(chan)
rm(chan)
Make sure to close the RODBC connection; I also remove it for good measure, because why not? I use RODBC for the intermediary table because it supports batch insert statements. I know that the same thing can, in theory, be done with DBI via DBI::dbAppendTable() (but we wouldn't be on this post if that worked how we had hoped). I tried this in a previous SO question here, but it didn't solve my problem. I also don't know how big my intermediary tables could get in the future; hopefully by the time they get too big we'll be on a different DBMS.
Next, I reopen the connection, this time with DBI, and send a statement to the DB to write those data from the intermediary table to the final resting place for those data, and then drop the intermediary table.
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=/path/to/database.accdb")
DBI::dbSendStatement(
  conn = con,
  statement = 'UPDATE
    tbl_intermediary INNER JOIN final_tbl ON tbl_intermediary.SampleID = final_tbl.sampleNumber
  SET
    final_tbl.field1 = [tbl_intermediary].[field1],
    final_tbl.notes = IIf(Nz([tbl_intermediary].[Notes],"")="",[final_tbl].[notes],[final_tbl].[notes] & "; Newest Notes: " & [tbl_intermediary].[Notes]);'
)
DBI::dbSendStatement(
  conn = con,
  statement = 'DROP TABLE tbl_intermediary;'
)
DBI::dbDisconnect(con)
rm(con)
The main reason why I chose this method is that some of the SQL I use with Access also has some VBA in it. When I send the SQL-VBA hybrid string with RODBC, I get assorted errors in the IIf() and Nz() functions (see the example above). Per the RODBC CRAN docs, the query argument for the sqlQuery() function is strictly assumed to be a valid SQL statement, so RODBC has no clue how to interpret the IIf() and Nz() MS Access functions. I think this also has to do with how the ODBC driver handles communication (please, someone correct me if I'm wrong about this).
As I understand it, DBI::dbSendStatement() lets the database engine you're working with interpret the statement argument you provide. In the situation above, the VBA is executed exactly how I would expect if it were run in Access directly. Per the DBI docs, for interactive use you'll generally want dbGetQuery() or dbExecute().
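For the drop step, that would look roughly like this with dbExecute():
# runs the statement immediately and reports the number of affected rows
DBI::dbExecute(con, "DROP TABLE tbl_intermediary;")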

R PostgreSQL - data is not getting updated in the table of the database

Hi, I am trying to update a PostgreSQL table using the RPostgreSQL package. The commands in R execute successfully, but the new data is not reflected in the database. Below are the commands I have executed in R:
for(i in new_data$FIPS) {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname="ip_platform", host="******", port="5432", user="data_loader", password="******")
  txt <- paste("UPDATE adminvectors.county SET attributes= hstore('usco#TP-TotPop#2010'::TEXT,",new_data$TP.TotPop[new_data$FIPS == i],"::TEXT) where geoid ='",i,"'")
  dbGetQuery(con, txt)
  dbCommit(con)
  dbDisconnect(con)
}
Can anyone let me know if I have done something wrong? Any help is highly appreciated.
Simplify, simplify, simplify -- the RPostgreSQL package has had unit tests for these types of operations since the very beginning (in 2008, no less) and this works (unless you have database setup issues).
See eg here in the GitHub repo for all the tests.
You are calling dbGetQuery instead of dbSendQuery and also disconnecting from the database in your for loop. You also are creating a new connection object for every loop iteration, which is not necessary. Try this:
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="ip_platform", host="******", port="5432", user="data_loader", password="******")
for(i in new_data$FIPS) {
  txt <- paste("UPDATE adminvectors.county SET attributes= hstore('usco#TP-TotPop#2010'::TEXT,",new_data$TP.TotPop[new_data$FIPS == i],"::TEXT) where geoid ='",i,"'")
  dbSendQuery(con, txt)
}
dbDisconnect(con)
You shouldn't call dbCommit(con) explicitly. The enclosing transaction will always be committed when dbSendQuery returns, exactly as when you run an UPDATE in plain SQL. You don't call COMMIT unless you have created a new transaction with BEGIN TRANSACTION.
The warning "there is no transaction in progress" is PostgreSQL's way of telling you that you have issued a COMMIT statement without having issued a BEGIN TRANSACTION statement which is exactly what you are doing in your function.
Thanks for all your inputs.
The issue was with the paste() function I used in the for loop. By default, paste() separates its arguments with a space, so extra spaces ended up inside the quotes around the geoid value and the WHERE condition never matched. I added sep="" to the paste() call, and the query is now sent to the database correctly and the rows are updated as expected.
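To illustrate the difference with a made-up FIPS value:
i <- "01001"
paste("where geoid ='", i, "'")           # "where geoid =' 01001 '"  (extra spaces inside the quotes)
paste("where geoid ='", i, "'", sep = "") # "where geoid ='01001'"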

Writing to specific schemas with RPostgreSQL

I'm using RPostgreSQL to read and write data. Reading from any schema works perfectly, but I'm not able to write to non-public schemas. For example, the following code places a table in the public schema, with the name myschema.tablex
# write dataframe to postgres
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host="localhost", user="postgres", password="zzzz", dbname="mydatabase", port="5436")
if(dbExistsTable(con,"myschema.tablex")) {
dbRemoveTable(con,"myschema.vkt_tablex")}
dbWriteTable(con,"myschema.tablex", dataframe, row.names=F)
What I want to do is place the table tablex in the schema myschema. I've also tried naming the schema in the connection, dbname="mydatabase.myschema", and using the argument schemaname, which I saw referred to in an earlier bug report.
None of these approaches work, so I'm wondering if there is another method that I can use.
Use this:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "db", host = "host", port = 5432,
                 user = "user", password = "pwd")
dbWriteTable(con, c("yourschema", "yourtable"), value = yourRdataframe)
dbDisconnect(con)
More details: https://stat.ethz.ch/pipermail/r-sig-db/2011q1/001043.html
The default schema where objects are created is defined by the search_path. One way would be to set it accordingly. For instance:
SET search_path = myschema, public;
I quote the manual:
When objects are created without specifying a particular target schema, they will be placed in the first schema listed in the search path. An error is reported if the search path is empty.
You can also make this the default for a role, so it is set automatically for every connection made by this role. More:
How does the search_path influence identifier resolution and the "current schema"
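From R, the same thing can be issued over the open connection before writing (a sketch using the names from the question):
dbGetQuery(con, "SET search_path = myschema, public;")
dbWriteTable(con, "tablex", dataframe, row.names = FALSE) # now lands in myschema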
In case a reader is using the newer package RPostgres to do this, the code to specify schemas is:
dbCreateTable(conn = con, name = Id(schema = "yourschema", table = "yourtable"), fields = yourRdataframe)
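And to write the data rather than just create an empty table, the same Id() form should work with dbWriteTable() (a sketch):
dbWriteTable(conn = con, name = Id(schema = "yourschema", table = "yourtable"),
             value = yourRdataframe, append = TRUE)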
