How to run a scheduled cross-session database query in Shiny

I have an Oracle database which is refreshed once a day. I am a bit confused about how apps work in Shiny: what gets run once on app startup, and what gets run once per session.
My naive approach was to create a database connection and run a query outside of the UI and server code to create a data frame of around 600,000 records, which can then be filtered and sliced during the session. My concern with doing this inside app.R in global scope is that the connection and data frame will only be created once when the server starts the app, and will never get refreshed (if that makes sense).
If I create the data frame in server, then my UI code fails, as it depends on the results of a query to populate the select list; at the moment I do this at app.R scope so the UI can access it.
library(shiny)
library(DBI)
library(dplyr)
library(odbc)
library(stringdist)
library(reactable)
############################################################################
# business functions #
############################################################################
get_list_of_actives_from_db <- function() {
  con <- dbConnect(odbc::odbc(), Driver = "oracle", Host = "server.mycompany.net",
                   Port = "1521", SVC = "service1", UID = "user_01", PWD = "hello",
                   timeout = 10)
  on.exit(dbDisconnect(con)) # close the connection when the function exits
  ingredients_df <- dbGetQuery(con, 'select DISTINCT INGREDIENTS FROM AES')
  return(ingredients_df)
}

get_adverse_events_from_db <- function() {
  con <- dbConnect(odbc::odbc(), Driver = "oracle", Host = "server.mycompany.net",
                   Port = "1521", SVC = "service1", UID = "user_01", PWD = "hello",
                   timeout = 10)
  on.exit(dbDisconnect(con)) # close the connection when the function exits
  cases_df <- dbGetQuery(con, 'select * FROM AES')
  return(cases_df)
}
############################################################################
# load data sets for use in dashboard #
############################################################################
cases_df = get_adverse_events_from_db()        # main data to slice and filter
ingredients_df = get_list_of_actives_from_db() # drives the select list in the UI
############################################################################
# shiny UI #
############################################################################
ui <- fluidPage(
  "Adverse Event Fuzzy Search Tool",
  fluidRow(
    selectInput("ingredients",
                label = "Select one or more Active Ingredients:",
                choices = ingredients_df$INGREDIENTS,
                multiple = TRUE),
    textInput("search_term", "AE Search Term:"),
    actionButton("do_search", "Perform Search")
  ),
  fluidRow(
    reactableOutput("search_results")
  )
)
############################################################################
# shiny server #
############################################################################
server <- function(input, output, session) {
  # do stuff here to filter the data frame based on the selected value and render a table
}

# Run the application
shinyApp(ui = ui, server = server)
My main concern is doing this at the top level of app.R: both functions run Oracle queries that never need to be re-run during a session, as the data only changes overnight via ETL.
############################################################################
# load data sets for use in dashboard #
############################################################################
cases_df = get_adverse_events_from_db()
ingredients_df = get_list_of_actives_from_db()
When and how often is this called? Once when the app is initialized, so the data set is never updated and is shared across sessions by users? Or is the entire script run end to end whenever a new session is started?
Part of me thinks it should be in the server function, so it runs once per session. But being new to Shiny I feel like server is called constantly whenever there is a change in the UI, and I don't want to be constantly loading 600,000 records from Oracle.
Ideally I would cache the results once a day and make them available to all users across all sessions. I'm not sure how to achieve that, so for now I just want to know the best way to do this so that each user runs the query once and has the data frame cached for the session.

Please check RStudio's article Scoping rules for Shiny apps in this context. In short: code at the top level of app.R runs once per R process when the app starts and is shared across sessions, while the body of the server function runs once per session.
If I got you right, you are asking to share a dataset across shiny sessions and update it daily. (The title of the question didn't really fit your explanation of the problem - I edited it.)
I'd suggest using a cross-session reactivePoll to avoid unnecessary DB queries (I once asked a similar question here - over there I gave an example showing that the same can be achieved via reactiveValues, but it's more complex).
Here is the simple pattern you can use - please note that reactivePoll is defined outside the server function so all sessions share the same data:
library(shiny)

ui <- fluidPage(textOutput("my_db_data"))

updated_db_data <- reactivePoll(
  intervalMillis = 1000L*60L*5L, # check for a new day every 5 minutes
  session = NULL,                # NULL makes the poll cross-session
  checkFunc = function() {
    print(paste("Running checkFunc:", Sys.time()))
    Sys.Date()
  },
  valueFunc = function() {
    # your db query goes here:
    paste("Latest DB update:", Sys.time())
  }
)

server <- function(input, output, session) {
  output$my_db_data <- renderText(updated_db_data())
}

shinyApp(ui, server)
Here, every 5 minutes the checkFunc checks for a new day; valueFunc is executed only if the result of checkFunc changed. As a (real-world) alternative for checkFunc, you could query the number of rows of a certain DB table, as sketched below.
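For example, a minimal sketch of such a row-count check (assuming an existing DBI connection con and the AES table from the question; adapt both to your schema):
library(shiny)
library(DBI)

updated_db_data <- reactivePoll(
  intervalMillis = 1000L*60L*5L,
  session = NULL, # cross-session: all users share the result
  checkFunc = function() {
    # cheap query: fetches only the row count
    dbGetQuery(con, "SELECT COUNT(*) AS N FROM AES")$N
  },
  valueFunc = function() {
    # expensive query: runs only when the row count above changes
    dbGetQuery(con, "SELECT * FROM AES")
  }
)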
PS: An example of a cross-session reactiveFileReader (which is based on reactivePoll) is given in ?reactiveFileReader.
PPS: When doing further filtering etc. on that dataset, also check bindCache(); a sketch follows.
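For instance, building on the row-count poll sketched above, the per-selection filtering could be cached like this (the INGREDIENTS column and the ingredients input are assumptions taken from the question's code; bindCache() requires shiny >= 1.6, and its default cache = "app" scope is shared across sessions):
server <- function(input, output, session) {
  # Cache the filtered slice per selected ingredient set
  filtered <- bindCache(
    reactive({
      dplyr::filter(updated_db_data(), INGREDIENTS %in% input$ingredients)
    }),
    input$ingredients
  )
  output$search_results <- reactable::renderReactable({
    reactable::reactable(filtered())
  })
}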

While untested, perhaps this architecture will work:
server <- function(input, output, session) {
  dailydata_ <- reactiveValues(when = NULL, what = NULL)

  dailydata <- reactive({
    oldwhen <- dailydata_$when
    if (is.null(oldwhen) ||
        as.Date(oldwhen) < Sys.Date()) {
      newdata <- tryCatch(
        DBI::dbGetQuery(con, "..."), # `con` is an existing DBI connection
        error = function(e) e)
      if (inherits(newdata, "error")) {
        warning("error retrieving new data: ", conditionMessage(newdata))
        warning("using stale data instead")
      } else {
        dailydata_$when <- Sys.time()
        dailydata_$what <- newdata
      }
    }
    dailydata_$what
  })

  # some consumer of the real data
  output$tbl <- renderTable(dailydata())
}
The advantage of this is that the re-query triggers when the current data was retrieved on a different day. Granted, when the new ETL lands might change how exactly this conditional is fashioned: if the data is updated at (say) 2am, then you may need some more time-math to determine whether the current data is from before or after the most recent update.
This logic fails safe on data availability: if the data could not be queried, the current (stale) data is reused. If you prefer that it return no data instead, that is easy enough to change in the code.
(One thing you might want to do is show the user when the data was last retrieved; this is available directly as dailydata_$when, accepting that it might be NULL.)
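For instance, a small sketch of such an indicator (the output name last_refresh is made up):
output$last_refresh <- renderText({
  if (is.null(dailydata_$when)) {
    "Data not loaded yet"
  } else {
    paste("Data last retrieved:", format(dailydata_$when, "%Y-%m-%d %H:%M"))
  }
})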

Related

Shiny App re-runs observeEvent when restoring a bookmarked state

I'm building a data-driven Shiny app that requires the user to bookmark the app state sometimes, so that she can save the processed data and revert to it later to continue working in the app. I have used the bookmarking method in Shiny, but I'm stuck with a problem that makes Shiny re-run the event whenever I restore the state, as if I hit the button again.
I attach a code sample representing the problem. It is a simple interface with month and year dropdown menus. The user is expected to select a month and a year, then hit the 'Set Billing Month' button to store the values. If the user elects to bookmark the state, she hits the Bookmark button and receives the modal box with the URL. The user can then plug the URL into the browser and restore the state as she left it. All of that works well, except that when the app restores the state, it re-runs the observeEvent function of the Set Month button. This is evident from the designed message "Billing Month has been set!" that appears when hitting the button.
I'm expecting Shiny to restore the bookmarked values without re-running the app to get them back. The bookmarking is done on the server as a store. This issue causes a problem for me, especially for other functions that require loading and processing big data.
I tried to replicate the problem with the sample code below.
Any assistance is highly appreciated!
library(shiny)
library(shinyjs)

enableBookmarking()

ui <- function(request) {
  fluidPage(
    useShinyjs(),
    bookmarkButton(),
    titlePanel("Billing App"),
    tabsetPanel(
      tabPanel("Invoicing Setup",
               selectInput("BillMo", label = "Month", choices = c(1,2,3,4,5,6,7,8,9,10,11,12)),
               selectInput("BillYr", label = "Year", choices = c(2022,2023,2024,2025,2026,2027,2028,2029,2030)),
               actionButton("btnSetInvMo", label = "Set Billing Month", icon = NULL)
      )
    )
  )
}

server <- function(input, output, session) {
  # Set up reactive values
  rVals <- reactiveValues(rBillMo = NULL,
                          rBillYr = NULL)

  observeEvent(input$btnSetInvMo, {
    BillMo <- as.integer(input$BillMo)
    BillYr <- as.integer(input$BillYr)
    # Assign reactive variables
    observe({rVals$rBillMo <- BillMo})
    observe({rVals$rPrvMo2 <- BillYr})
    shinyjs::disable("btnSetInvMo")
    showNotification("Billing Month has been set!", type = "message")
  })

  # BOOKMARKING
  onBookmark(function(state) {
    state$values$billMo <- rVals$rBillMo
    state$values$billYr <- rVals$rBillYr
  })
  onRestore(function(state) {
    rVals$rBillMo <- reactive(state$values$BillMo)
    rVals$rBillYr <- reactive(state$values$BillYr)
  })
}

shinyApp(ui, server, enableBookmarking = "server")
Add ignoreInit = TRUE to your observer:
observeEvent(input$btnSetInvMo, {
  BillMo <- as.integer(input$BillMo)
  BillYr <- as.integer(input$BillYr)
  # Assign reactive variables
  observe({rVals$rBillMo <- BillMo})
  observe({rVals$rPrvMo2 <- BillYr})
  shinyjs::disable("btnSetInvMo")
  print("doing")
  showNotification("Billing Month has been set!", type = "message")
}, ignoreInit = TRUE)
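With ignoreInit = TRUE the handler is skipped when the observer is first created, which includes the initial invalidation fired while a bookmarked state is being restored; the notification then only appears on real button clicks.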

Use reactivePoll in R: The checkFunc didn't execute

I am quite new to R. I tried to use reactivePoll to update my dashboard data. All my data is drawn from the database. The code shows no error, but the dashboard is not updated daily as I set it to be. Here is my code:
log_db <- reactivePoll(60000*60*24, session,
  # Check for maximum month
  checkFunc = function() {
    # connect to the database
    # check for the maximum month in the database; if there's a change, the value function will run
    maxmonth <- paste("SQL code")
    month <- dbGetQuery(maxmonth)
    return(month)
  },
  # Pull new table if value has changed
  valueFunc = function() {
    # connect to db
    # pull new dataframe
    return(oldnew_combined)
  }
)
I think the format is fine since no errors show. I also tried to see the maximum month in the console; however, it says object not found, which I took to mean the checkFunc didn't run. I wonder what goes wrong here. Thank you!
Steps:
1. Create the reactivePoll (log_db) inside the server function.
2. Create a rendering object inside the server (in your case renderTable) that calls the reactivePoll with parentheses: output$idforUi <- renderTable({ log_db() })
3. Create the output for your render object in the UI: ui = fluidPage(tableOutput("idforUi"))
library(shiny)    # good practice
library(RMariaDB) # good practice

server <- function(input, output, session) {
  # The connection to SQL does not need to be inside checkFunc or valueFunc;
  # if you put it inside checkFunc, it will reconnect every intervalMillis milliseconds.
  # If you place the connection inside the server but outside the reactivePoll,
  # it connects when you open the app and the poll reuses it on every check.
  localuserpassword <- "yourpassword"
  storiesDb <- dbConnect(RMariaDB::MariaDB(), user = 'YOUR_USER',
                         password = localuserpassword,
                         dbname = 'DBNAME', host = 'YOURHOST')

  # your database will be checked for changes every 60000 * 60 * 24 milliseconds (24 hours)
  log_db <- reactivePoll(60000*60*24, session, # reactivePoll inside the server
    # Check for maximum month
    checkFunc = function() {
      query2 <- "SELECT * FROM YOURTABLE"
      rs <- dbSendQuery(storiesDb, query2)
      dbFetch(rs)
    },
    # Pull new table if value has changed
    valueFunc = function() {
      query2 <- "SELECT * FROM YOURTABLE"
      rs <- dbSendQuery(storiesDb, query2)
      dbFetch(rs)
    }
  )

  # log_db is a function - don't forget the () inside renderTable()
  output$idforUi <- renderTable({ log_db() })
}

# Receive the result of your reactivePoll in the user interface
ui <- fluidPage(tableOutput("idforUi"))

shinyApp(ui, server)
Being unable to access it from the console does not mean that checkFunc did not run: you will not be able to access the month object from the console because it exists only inside the reactivePoll function (a local variable), not in the global environment. See this
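To see this for yourself, here is a minimal self-contained sketch (no database involved) that prints from inside checkFunc:
library(shiny)

ui <- fluidPage(textOutput("tick"))

server <- function(input, output, session) {
  poll <- reactivePoll(2000, session,
    checkFunc = function() {
      message("checkFunc ran at ", Sys.time()) # appears in the R console, not the app
      Sys.time() # always changes, so valueFunc runs on every check
    },
    valueFunc = function() format(Sys.time())
  )
  output$tick <- renderText(poll())
}

shinyApp(ui, server)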

DT Editing in Shiny application with client-side processing (server = F) throws JSON Error

I have a Shiny Server application in which the user can edit a datatable, after which some reactive summary statistics update accordingly. I am hosting this app on a fairly slow framework, which is why I want to use client-side processing for the DT rendering, i.e. server = F passed to DT::renderDataTable. Let me break down the main points of my problem:
The code is fully operational when server = T is passed.
When passing server = F, the browser throws the following error message when the user edits a cell in the DT:
DataTables warning: table id=DataTables_Table_5 - Invalid JSON
response. For more information about this error, please see
http://datatables.net/tn/1
An interesting thing is that when this error window is dismissed, the dependent summary statistics update correctly according to the edit, and the Shiny app carries on. Hence, everything works except for the error. I should note that I visited the site referred to in the error without becoming any wiser.
Reproducible example below:
library(shiny)
library(DT)
library(data.table) # needed for data.table()

dt = data.frame(V1 = c(1,2), V2 = c(3,4))

server <- function(input, output, session) {
  val = reactiveValues(mat = data.table(dt))

  output$testDT = renderDataTable({
    DT::datatable(val$mat, editable = TRUE)
  }, server = FALSE)

  proxy = dataTableProxy('testDT')

  observeEvent(input$testDT_cell_edit, {
    info = input$testDT_cell_edit
    str(info)
    i = info$row
    j = info$col
    v = info$val
    if (j == 1) {
      val$mat$V1[i] = DT::coerceValue(v, val$mat$V1[i])
      replaceData(proxy, val$mat, rownames = FALSE)
    }
  })
}

ui <- fluidPage(
  dataTableOutput('testDT')
)

shinyApp(ui, server)
Thanks!
This has been answered on the GitHub thread, and I'm sharing my answer here.
Probably it's not documented clearly. It has nothing to do with the editing. It's because replaceData() calls reloadData(), which requires the server-side processing mode. See ?reloadData().
reloadData() only works for tables in the server-side processing mode, e.g. tables rendered with renderDataTable(server = TRUE). The data to be reloaded (i.e. the one you pass to dataTableAjax()) must have exactly the same number of columns as the previous data object in the table.
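If client-side processing is a hard requirement, one way to sidestep this (my own suggestion, not from the GitHub thread) is to drop replaceData() entirely: apply the edit with DT::editData() and let the table re-render from the reactive value:
observeEvent(input$testDT_cell_edit, {
  info <- input$testDT_cell_edit
  # editData() applies the edit to the underlying data frame;
  # updating val$mat re-renders the client-side table, so no replaceData() is needed
  val$mat <- DT::editData(val$mat, info)
})
The trade-off is that a full re-render resets the table's paging and sorting state.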

Async process blocking R Shiny app

It should be possible to use the R packages future and promises to trigger asynchronous (long running) processing via Shiny apps without freezing the rest of the app while the async process is running in another R process.
See:
https://cran.r-project.org/web/packages/promises/vignettes/intro.html
https://cran.r-project.org/web/packages/promises/vignettes/overview.html
https://cran.r-project.org/web/packages/promises/vignettes/futures.html
https://cran.r-project.org/web/packages/promises/vignettes/shiny.html
I got this to work in an R-script-based environment but can't get it to work in a simple shiny app with 2 functions. The "not-async" function is always blocked while the async function is running, but that should not be the case.
I have posted the same question on the GitHub repo of the package promises: https://github.com/rstudio/promises/issues/23
I am posting it here as well hoping someone can help.
The question is:
Can you take a look at the shiny app example posted below and let me know why the async processing is blocking the app? (It should not block).
Ideally, can you provide a small example of an app with a non-blocking async and normal functionality (accessible while the async is running)?
Environment
Mac OS 10.12
$ R --version
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
remove.packages("future")
remove.packages("promises")
remove.packages("shiny")
install.packages("future")
install.packages("devtools")
devtools::install_github("rstudio/promises")
devtools::install_github("rstudio/shiny")
> packageVersion("future")
[1] ‘1.8.1’
> packageVersion("promises")
[1] ‘1.0.1’
> packageVersion("shiny")
[1] ‘1.0.5.9000’
One side question on the shiny package version: https://rstudio.github.io/promises/articles/intro.html says it should be >= 1.1, but even when installing with devtools the version remains 1.0.5... Is this an issue, or is there a typo in the doc?
First, you can use promises with Shiny outputs. If you’re using an async-compatible version of Shiny (version >=1.1), all of the built-in renderXXX functions can deal with either regular values or promises.
Example of issue
I have implemented this simple shiny app inspired from the example at the URLs mentioned above.
The shiny app has 2 "sections":
A button to trigger the "long running" async processing. This is simulated by a function read_csv_async, which sleeps for a few seconds and then reads a csv file into a data frame. The df is then rendered below the button.
A simple functionality which should work at any time (including when the async processing has been triggered): it includes a slider defining a number of random values to be generated. We then render a histogram of these values.
The issue is that the second functionality (histogram plot update) is blocked while the async processing is occurring.
global.R
library("shiny")
library("promises")
library("dplyr")
library("future")

# path containing all files, including ui.R and server.R
setwd("/path/to/my/shiny/app/dir")

plan(multiprocess)

# A function to simulate a long running process
read_csv_async = function(sleep, path) {
  log_path = "./mylog.log"
  pid = Sys.getpid()
  write(x = paste(format(Sys.time(), "%Y-%m-%d %H:%M:%OS"), "pid:", pid, "Async process started"),
        file = log_path, append = TRUE)
  Sys.sleep(sleep)
  df = read.csv(path)
  write(x = paste(format(Sys.time(), "%Y-%m-%d %H:%M:%OS"), "pid:", pid, "Async process work completed\n"),
        file = log_path, append = TRUE)
  df
}
ui.R
fluidPage(
  actionButton(inputId = "submit_and_retrieve", label = "Submit short async analysis"),
  br(),
  br(),
  tableOutput("user_content"),
  br(),
  br(),
  br(),
  hr(),
  sliderInput(inputId = "hist_slider_val",
              label = "Histogram slider",
              value = 25,
              min = 1,
              max = 100),
  plotOutput("userHist")
)
server.R
function(input, output) {
  # When the button is clicked,
  # load the csv asynchronously and render a table
  data_promise = eventReactive(input$submit_and_retrieve, {
    future({ read_csv_async(10, "./data.csv") })
  })
  output$user_content <- renderTable({
    data_promise() %...>% head(5)
  })

  # Render a new histogram
  # every time the slider is moved
  output$userHist = renderPlot({
    hist(rnorm(input$hist_slider_val))
  })
}
data.csv
Column1,Column2
foo,2
bar,5
baz,0
Thanks!
So this behavior is normal, see the response of the package developer at https://github.com/rstudio/promises/issues/23
Summary:
In shiny apps, one R process can be shared by multiple users.
If one user submits a long running task, then all the other users sharing the same underlying R process are blocked.
The goal of promises is to avoid this. So promises will prevent blocking between "user sessions" within one R process but not within a single "user session".
The author of the package mentioned that this feature is not supported yet and that it may be added if enough people ask for it. If you are looking for this, please go to the GitHub issue and upvote the original question - this is how interest in new features is measured.
Thanks!
As this and similar questions about Shiny intra-session responsiveness are frequently asked on Stack Overflow, I think it's worth mentioning the workaround Joe Cheng provides in the GitHub issue @Raphvanns created:
If you really must have this kind of behavior, there is a way to work around it. You can "hide" the async operation from the Shiny session (allowing the session to move on with its event loop) by not returning your promise chain from your observer/reactive code. Essentially the async operation becomes a "fire and forget". You need to hook up a promise handler to have some side effect; in the example below, I set a reactiveVal on successful completion.
Some caveats to this approach:
By doing this you are inherently opening yourself up to race conditions. Even in this very simple example, the user can click the Submit button multiple times; if the long-running task has very variable runtime you might end up with multiple results coming back, but out of order. Or if you reference input values in promise handlers, they might pick up values that were set after the submit button was clicked!
You also lose the automatic semi-transparent indication that an output has been invalidated (though below I at least null the reactiveVal out in the beginning of the observeEvent).
Accordingly the solution for the above example code can be something like this:
library("shiny")
library("promises")
library("dplyr")
library("future")
# path containing all files, including ui.R and server.R
# setwd("/path/to/my/shiny/app/dir")
write.csv(data.frame(stringsAsFactors=FALSE,
Column1 = c("foo", "bar", "baz"),
Column2 = c(2, 5, 0)
), file = "./data.csv")
onStop(function() {
file.remove("./data.csv")
})
plan(multiprocess)
# A function to simulate a long running process
read_csv_async = function(sleep, path){
log_path = "./mylog.log"
pid = Sys.getpid()
write(x = paste(format(Sys.time(), "%Y-%m-%d %H:%M:%OS"), "pid:", pid, "Async process started"), file = log_path, append = TRUE)
Sys.sleep(sleep)
df = read.csv(path)
write(x = paste(format(Sys.time(), "%Y-%m-%d %H:%M:%OS"), "pid:", pid, "Async process work completed\n"), file = log_path, append = TRUE)
df
}
ui <- fluidPage(
textOutput("parallel"),
sliderInput(inputId = "hist_slider_val",
label = "Histogram slider",
value = 25,
min = 1,
max = 100),
plotOutput("userHist"),
actionButton(inputId = "submit_and_retrieve", label = "Submit short async analysis"),
tableOutput("user_content")
)
server <- function(input, output, session) {
data_promise <- reactiveVal()
# When button is clicked
# load csv asynchronously and render table
observeEvent(input$submit_and_retrieve, {
future({ read_csv_async(10, "./data.csv") }) %...>% data_promise()
return(NULL) # hide future
})
output$user_content <- renderTable({
req(data_promise())
head(data_promise(), 5)
})
# Render a new histogram
# every time the slider is moved
output$userHist = renderPlot({
hist(rnorm(input$hist_slider_val))
})
output$parallel <- renderText({
invalidateLater(300)
paste("Something running in parallel:", Sys.time())
})
}
shinyApp(ui = ui, server = server)
Note the return(NULL) in the observeEvent call to hide the future. This way the long-running process no longer blocks the execution of the other reactives.

Best practices in automated database queries in Shiny

As I am new to developing with Shiny, I am interested in the best practices for automated database queries. At the time of writing there are a number of different sources with different information.
If I am to query my postgres database every 10 minutes as in the example below, I want to make sure that there are no issues with a) closing the connection on session exit and b) not being able to connect due to too many open connections. My dashboard will in the future have at most a dozen users at one time.
Having done some research, I am convinced that the best way to do this is not necessarily to use a pool but to use the "one connection per query" method documented by Shiny here
Is using reactivePoll() as I have below the correct way to implement a query that will refresh the rendered table every 10 minutes? The database I will be querying will definitely return different data with every call. Does that mean that checkFunc and valueFunc should be the same, or can checkFunc be left as an empty function altogether?
library(shiny)
library(DBI)

args <- list(
  drv = dbDriver("PostgreSQL"),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)

ui <- fluidPage(
  textInput("ID", "Enter your ID:", "5"),
  tableOutput("tbl"),
  numericInput("nrows", "How many cities to show?", 10),
  plotOutput("popPlot")
)

server <- function(input, output, session) {
  output$tbl <- renderTable({
    conn <- do.call(DBI::dbConnect, args)
    on.exit(DBI::dbDisconnect(conn))
    sql <- "SELECT * FROM City WHERE ID = ?id;"
    query <- sqlInterpolate(conn, sql, id = input$ID)
    data <- reactivePoll(10000, session,
      checkFunc = function() {},
      valueFunc = function() {
        dbGetQuery(conn, query)
      })
  })
}

shinyApp(ui, server)
I recommend creating your db connection conn outside of any output objects.
args <- list(
  drv = dbDriver("PostgreSQL"),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)
conn <- do.call(DBI::dbConnect, args)
It could be a global environment object, like the args list in your sample code, or created inside the server function; queries within rendered output objects will then all use the same conn db connection. In my experience, an explicit disconnect was not necessary: after the R session running the Shiny app is closed, the database connection is closed too.
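A minimal sketch of that layout, reusing the args list from above (the onStop() cleanup is optional and my own addition):
library(shiny)
library(DBI)

# created once per R process, shared by all sessions
conn <- do.call(DBI::dbConnect, args)

# optional: close the connection when the app shuts down
onStop(function() DBI::dbDisconnect(conn))

server <- function(input, output, session) {
  output$tbl <- renderTable({
    sql <- "SELECT * FROM City WHERE ID = ?id;"
    dbGetQuery(conn, sqlInterpolate(conn, sql, id = input$ID))
  })
}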
