I have a PostgreSQL database connection and want to get a table from the database. Presumably it's good practice to keep the connection info in a separate file?
I have two files just now:
#getthetable.R
library(tidyverse)
library(dbplyr)
## connect to db
con <- src_postgres(dbname = "thedbname",
                    host = "blablabla.amazonaws.com",
                    port = NULL,
                    user = "myname",
                    password = "1234")
thetable <- tbl(con, "thetable") %>% select(id, apples, carrots) %>% collect()
And then:
#main.R
library(tidyverse)
## get data from getthetable script with connection
source("rscripts/getthetable.R")
This now makes both the con and thetable variables available in main.R. I only want the thetable variable from getthetable.R. How do I do that, leaving out the con variable?
Also, is there a best practice when working with db connections in R? Is my thinking logical? Are there drawbacks to what I'm doing, or do most people just put the connection code in together with the main scripts?
I also like to capture such things (like connections) in a separate file, and also in a designated environment, like this:
ConnectionManager <- local({
  con <- src_postgres(dbname = "thedbname",
                      host = "blablabla.amazonaws.com",
                      port = NULL,
                      user = "myname",
                      password = "1234")

  collectTable <- function() {
    tbl(con, "thetable") %>% select(id, apples, carrots) %>% collect()
  }

  list(collectTable = collectTable)
})
This way you have only one object, ConnectionManager, after sourcing the file, and you can get the table with ConnectionManager$collectTable(). Additionally, you can easily extend it to fetch other tables or to include connection utility functions.
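For example, after sourcing that file (a minimal sketch; the file name is an assumption):
source("rscripts/connection_manager.R")       # file name is an assumption
thetable <- ConnectionManager$collectTable()  # only ConnectionManager is exposed in main.R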
Related
I have an Oracle database which is refreshed once a day. I am a bit confused about how apps work in Shiny: what gets run once at app startup, and what gets run once per session.
My naive approach was to create a database connection and run a query outside of the UI and server code to create a data frame of around 600,000 records, which can then be filtered and sliced during the session. I am a bit concerned that, by doing it inside app.R in global scope, this connection and data frame will only be created once when the server starts the app and will never be run again (if that makes sense).
If I create the data frame in the server, then my UI code fails, as it depends on the results of a query to populate the select list; I do this at app.R scope at the moment so the UI can access it.
library(shiny)
library(DBI)
library(dplyr)
library(odbc)
library(stringdist)
library(reactable)
############################################################################
# business functions #
############################################################################
get_list_of_actives_from_db <- function() {
  con <- dbConnect(odbc::odbc(), Driver = "oracle", Host = "server.mycompany.net", Port = "1521", SVC = "service1", UID = "user_01", PWD = "hello", timeout = 10)
  ingredients_df = dbGetQuery(con,
    '
    select DISTINCT INGREDIENTS FROM AES
    '
  )
  return(ingredients_df)
}
get_adverse_events_from_db <- function() {
  con <- dbConnect(odbc::odbc(), Driver = "oracle", Host = "server.mycompany.net", Port = "1521", SVC = "service1", UID = "user_01", PWD = "hello", timeout = 10)
  cases_df = dbGetQuery(con,
    '
    select * FROM AES
    '
  )
  return(cases_df)
}
############################################################################
# load data sets for use in dashboard #
############################################################################
cases_df = get_adverse_events_from_db()        # main data to slice and filter
ingredients_df = get_list_of_actives_from_db() # drives the select list in the UI
############################################################################
# shiny UI #
############################################################################
ui <- fluidPage(
  "Adverse Event Fuzzy Search Tool",
  fluidRow(
    selectInput("ingredients", label = "Select one or more Active Ingredients:", choices = ingredients_df$PRIMARY_SUSPECT_KEY_INGREDIENT, multiple = TRUE),
    textInput("search_term", "AE Search Term:"),
    actionButton("do_search", "Perform Search")
  ),
  fluidRow(
    reactableOutput("search_results")
  )
)
############################################################################
# shiny server #
############################################################################
server <- function(input, output, session) {
  # do stuff here to filter the data frame based on the selected value and render a table
}
# Run the application
shinyApp(ui = ui, server = server)
My main concern is doing this at the root of app.R: both functions run Oracle queries that never need to be re-run during a session, as the data only changes overnight via ETL.
############################################################################
# load data sets for use in dashboard #
############################################################################
cases_df = get_adverse_events_from_db()
ingredients_df = get_list_of_actives_from_db()
When and how often is this called? Once when the app is initialized, so the data set is never updated and is shared across sessions by users? Or is the entire script run end to end whenever a new session is started?
Part of me thinks it should be in the server function, so it runs once per session. But being new to Shiny, I feel like the server function is called constantly whenever there is a change in the UI, and I don't want to be constantly loading 600,000 records from Oracle.
Ideally I would cache the results once a day and make them available to all users across all sessions. I'm not sure how to achieve that, so for now I just want to know the best way to do this so that each user runs the query once and has the data frame cached for the session.
Please check RStudio's article Scoping rules for Shiny apps in this context.
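In short, objects created at the top level of app.R are created once per R process and shared by every session served by that process, while the body of the server function runs once per user session. A minimal sketch (not your actual app):
library(shiny)

# runs once per R process; shared by all user sessions served by it
shared_df <- data.frame(x = 1:3)   # imagine the Oracle query here

ui <- fluidPage(textOutput("started"))

server <- function(input, output, session) {
  # runs once per user session
  session_started <- Sys.time()
  output$started <- renderText(paste("Session started:", session_started))
}

shinyApp(ui, server)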
If I understood you correctly, you are asking how to share a dataset across Shiny sessions and update it daily. (The title of the question didn't really fit your explanation of the problem, so I edited it.)
I'd suggest using a cross-session reactivePoll to avoid unnecessary DB queries (I once asked a similar question here; over there I gave an example showing that the same can be achieved via reactiveValues, but it's more complex).
Here is a simple pattern you can use. Please note that reactivePoll is defined outside the server function, so all sessions share the same data:
library(shiny)

ui <- fluidPage(textOutput("my_db_data"))

updated_db_data <- reactivePoll(
  intervalMillis = 1000L * 60L * 5L, # check for a new day every 5 minutes
  session = NULL,
  checkFunc = function() {
    print(paste("Running checkFunc:", Sys.time()))
    Sys.Date()
  },
  valueFunc = function() {
    # your db query goes here:
    paste("Latest DB update:", Sys.time())
  }
)

server <- function(input, output, session) {
  output$my_db_data <- renderText(updated_db_data())
}

shinyApp(ui, server)
Here, every 5 minutes checkFunc checks for a new day; valueFunc is executed only if the result of checkFunc has changed. As a (real-world) alternative for checkFunc, you could run a query that checks the number of rows of a certain DB table.
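Such a checkFunc might look like this (just a sketch; con and the table name AES are assumptions borrowed from the question):
row_count_check <- function() {
  # returns the current row count; valueFunc re-runs whenever this value changes
  DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM AES")$n
}
# then pass checkFunc = row_count_check to reactivePoll()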
PS: There is an example of a cross-session reactiveFileReader (which is based on reactivePoll) in ?reactiveFileReader.
PPS: When doing further filtering etc. on that dataset also check bindCache().
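For instance, a rough sketch of caching a filtered view inside the server function (this assumes valueFunc returns the actual data frame with an INGREDIENTS column, and that the UI has a selectInput named "ingredient"; both are assumptions, not part of the example above):
server <- function(input, output, session) {
  filtered_data <- bindCache(
    reactive({
      df <- updated_db_data()
      df[df$INGREDIENTS %in% input$ingredient, , drop = FALSE]
    }),
    input$ingredient, Sys.Date()  # cache key: current selection plus the day of the data
  )
  output$tbl_cached <- renderTable(filtered_data())
}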
While untested, perhaps this architecture will work:
server <- function(input, output, session) {
  dailydata_ <- reactiveValues(when = NULL, what = NULL)

  dailydata <- reactive({
    oldwhen <- dailydata_$when
    if (is.null(oldwhen) ||
        as.Date(oldwhen) < Sys.Date()) {
      newdata <- tryCatch(
        DBI::dbGetQuery(con, "..."),
        error = function(e) e)
      if (inherits(newdata, "error")) {
        warning("error retrieving new data: ", conditionMessage(newdata))
        warning("using stale data instead")
      } else {
        dailydata_$when <- Sys.time()
        dailydata_$what <- newdata
      }
    }
    dailydata_$what
  })

  # some consumer of the real data
  output$tbl <- renderTable(dailydata())
}
The advantage of this is that the re-query triggers when the current data was retrieved on a different day. Granted, when the new ETL becomes available might change how exactly this conditional is fashioned: if the data is updated at (say) 2am, then you may need some more time math to determine whether the current data is from before or after the most recent update.
This logic fails safe on "data available": if the database could not be queried, the current/stale data is re-used. If you prefer that it return no data instead, that is easy enough to change in the code.
(One thing you might want to do is to show the user when the data was last retrieved; this can be retrieved directly with dailydata_$when, accepting that it might be NULL.)
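For example (a sketch, assuming a textOutput("last_refresh") somewhere in the UI):
output$last_refresh <- renderText({
  when <- dailydata_$when
  if (is.null(when)) {
    "Data not retrieved yet"
  } else {
    paste("Data last retrieved:", format(when, "%Y-%m-%d %H:%M"))
  }
})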
I am using RStudio Server and ODBC to connect to a redshift database. I can connect easily using:
conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                  Server = SERVER_URL,
                  Port = "5439",
                  Database = DB_NAME,
                  PWD = PASSWORD,
                  UID = CREDENTIALS,
                  timeout = 10,
                  Trusted_Connection = "True")
When connected, it shows up in the "Connections" pane in the sidebar, where I have a UI to look through the database. That is exactly what I want.
The problem is that if I call the same code inside a function, I get the database connection but no UI. How do I get the UI to appear when calling this code from inside a function?
Connection_odbc_profile <- function(INPUT){
  conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                    Server = SERVER_URL,
                    Port = "5439",
                    Database = DB_NAME,
                    PWD = PASSWORD,
                    UID = CREDENTIALS,
                    timeout = 10,
                    Trusted_Connection = "True")
  return(conn)
}
I think the issue is that the Connections pane only gets updated when the code is run at top level. Is there any way to force a line of code in a function to run at top level (or directly in the console)?
I solved the problem by adding:
code <- c(match.call()) # This saves what was typed into R
odbc:::on_connection_opened(conn, paste(c(paste("con <-", gsub(", ", ",\n\t", code))), collapse = "\n"))
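Putting it together, a sketch of the function with those lines added (note that odbc:::on_connection_opened() is an unexported helper, so this could change in future odbc releases):
Connection_odbc_profile <- function(INPUT) {
  conn <- dbConnect(odbc::odbc(), Driver = "redshift",
                    Server = SERVER_URL,
                    Port = "5439",
                    Database = DB_NAME,
                    PWD = PASSWORD,
                    UID = CREDENTIALS,
                    timeout = 10,
                    Trusted_Connection = "True")
  code <- c(match.call())  # this saves what was typed into R
  odbc:::on_connection_opened(
    conn,
    paste(c(paste("con <-", gsub(", ", ",\n\t", code))), collapse = "\n")
  )
  return(conn)
}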
As I am new to developing with Shiny, I am interested in the best practices for automated database queries. At the time of writing there are a number of different sources with different information.
If I am to query my Postgres database every 10 minutes, as in the example below, I want to make sure that there are no issues with a) closing the connection on session exit and b) not being able to connect due to too many open connections. In the future my dashboard will have at most a dozen users at one time.
Having done some research, I am convinced that the best way to do this is not necessarily to use a pool but to use the "one connection per query" method documented by Shiny here.
Is using reactivePoll() as I have below the correct way to implement a query that will refresh the rendered table every 10 minutes? The database I will be querying will definitely return different data with every call. Does that mean that checkFunc and valueFunc should be the same, or can checkFunc be left as an empty function altogether?
library(shiny)
library(DBI)
args <- list(
  drv = dbDriver("PostgreSQL"),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)

ui <- fluidPage(
  textInput("ID", "Enter your ID:", "5"),
  tableOutput("tbl"),
  numericInput("nrows", "How many cities to show?", 10),
  plotOutput("popPlot")
)
server <- function(input, output, session) {
  output$tbl <- renderTable({
    conn <- do.call(DBI::dbConnect, args)
    on.exit(DBI::dbDisconnect(conn))
    sql <- "SELECT * FROM City WHERE ID = ?id;"
    query <- sqlInterpolate(conn, sql, id = input$ID)
    data <- reactivePoll(10000, session,
      checkFunc = function() {},
      valueFunc = function() {
        dbGetQuery(conn, query)
      })
  })
}
shinyApp(ui, server)
I recommend creating your db connection conn outside of any output objects.
args <- list(
  drv = dbDriver("PostgreSQL"),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)
conn <- do.call(DBI::dbConnect, args)
It could be a global environment object, like the args list in your sample code, or it could be created inside the server function; queries within rendered output objects will then all access the same conn db connection. In my experience it was not necessary to include a disconnect: after the R session with the Shiny app is closed, the database disconnects too.
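A minimal sketch of that layout, combining it with the question's reactivePoll() (it reuses the args list from above; the query is simplified, and checkFunc just returns the current time so that valueFunc re-runs on every 10-minute poll, since the data always changes):
library(shiny)
library(DBI)

conn <- do.call(DBI::dbConnect, args)   # created once, outside any output object

ui <- fluidPage(tableOutput("tbl"))

server <- function(input, output, session) {
  polled_data <- reactivePoll(
    intervalMillis = 10 * 60 * 1000,    # 10 minutes
    session = session,
    checkFunc = function() Sys.time(),  # always "new", so each poll re-queries
    valueFunc = function() dbGetQuery(conn, "SELECT * FROM City LIMIT 100")
  )
  output$tbl <- renderTable(polled_data())
}

shinyApp(ui, server)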
I'm trying to catalog the structure of an MSSQL 2008 R2 database using R/RODBC. I have set up a DSN, connected via R, and used the sqlTables() command, but this only gets the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing these as the catalog argument to sqlTables, e.g.
library(RODBC)
library(DBI)
##
tcon <- RODBC::odbcConnect(
  dsn = "my_dsn",
  uid = "my_uid",
  pwd = "my_pwd"
)
##
db_list <- RODBC::sqlQuery(
  channel = tcon,
  query = "SELECT name FROM master..sysdatabases")
##
RODBC::sqlTables(
  channel = tcon,
  catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
  RODBC::sqlTables(
    channel = tcon,
    catalog = DB
  )
})
# or some more efficient variant of data.table::rbindlist...
meta_df <- do.call("rbind", all_metadata)
I'm using a tbl_sql object in my Shiny app to access a database table. I've noticed that sometimes dplyr closes this connection. It might be because the garbage collector calls db_disconnector. Is there any way to stop this? I could close the connection on the Shiny close event.
It seems like, if you use d <- src_mysql(...) (I guess that's the backend you're using, and how you're connecting to the database?), then the garbage collector will only close the connection when d goes out of scope. Maybe it's the database that is timing out connections as a way to manage load?
One way to test this is to write your own wrapper (rather than src_mysql()) that does not disconnect:
src_yoursql <- function(dbname, host = NULL, port = 0L, user = "root",
                        password = "", ...) {
  if (!requireNamespace("RMySQL", quietly = TRUE)) {
    stop("RMySQL package required to connect to mysql/mariadb",
         call. = FALSE)
  }
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        port = port, username = user, password = password, ...)
  info <- DBI::dbGetInfo(con)
  src_sql("mysql", con, info = info)
}
d = src_yoursql(...)
Close it manually with
DBI::dbDisconnect(d$con)
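If you do want to tie that to the Shiny close event mentioned in the question, a sketch (assuming d is created at the top level of the app):
server <- function(input, output, session) {
  # close the db connection when this user's session ends
  session$onSessionEnded(function() {
    DBI::dbDisconnect(d$con)
  })
  # ... rest of the server logic
}
Note that if d is shared by several sessions, registering the callback with shiny::onStop() at the top level of app.R may be preferable, so the connection is only closed when the whole app shuts down.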