In the following shiny app, the plotly package is used to create an interactive correlation heat map. When individual tiles are clicked, the corresponding scatter plot appears. One can then download the individual scatters by clicking download plot as png. But is there a way to download all the possible scatter plots at once without having to click each individual tile and save each individual one? Thank you
library(plotly)
library(shiny)
# compute a correlation matrix
correlation <- round(cor(mtcars), 3)
nms <- names(mtcars)
ui <- fluidPage(
mainPanel(
plotlyOutput("heat"),
plotlyOutput("scatterplot")
),
verbatimTextOutput("selection")
)
server <- function(input, output, session) {
output$heat <- renderPlotly({
plot_ly(x = nms, y = nms, z = correlation,
key = correlation, type = "heatmap", source = "heatplot") %>%
layout(xaxis = list(title = ""),
yaxis = list(title = ""))
})
output$selection <- renderPrint({
s <- event_data("plotly_click", source = "heatplot")
if (length(s) == 0) {
"Click on a cell in the heatmap to display a scatterplot"
} else {
cat("You selected: \n\n")
as.list(s)
}
})
output$scatterplot <- renderPlotly({
s <- event_data("plotly_click", source = "heatplot")
if (length(s)) {
vars <- c(s[["x"]], s[["y"]])
d <- setNames(mtcars[vars], c("x", "y"))
yhat <- fitted(lm(y ~ x, data = d))
plot_ly(d, x = ~x) %>%
add_markers(y = ~y) %>%
add_lines(y = ~yhat) %>%
layout(xaxis = list(title = s[["x"]]),
yaxis = list(title = s[["y"]]),
showlegend = FALSE)
} else {
plotly_empty()
}
})
}
shinyApp(ui, server)
You can use webshot to capture a static image of Plotly's HTML output using the instructions here: https://plot.ly/r/static-image-export/
An example for loop below generates random scatter plots from mtcars.
library(plotly)
library(webshot)
## You'll need to run this once if you don't have phantomjs installed
#webshot::install_phantomjs()
ColumnOptions <- colnames(mtcars)
for (i in seq_len(5)){
xCol <- sample(ColumnOptions,1)
yCol <- sample(ColumnOptions,1)
ThisFileName <- paste0("Scatter_",xCol,"_vs_",yCol,".png")
plot_ly(x = mtcars[[xCol]], y = mtcars[[yCol]], type = "scatter", mode = "markers") %>%
export(., file = ThisFileName)
}
However, if you're going to do this dozens of times, the computation required for each image really adds up. Every plot goes through the following steps:
1. Generate a JSON plotly object from R
2. Use htmlwidgets/htmltools to generate a self-contained HTML web page
3. Render that HTML as a browser would see it, using an external program (PhantomJS, via webshot)
4. Use webshot to capture an image of that rendered page and save it as a PNG
This isn't really a reflection of plotly being slow, but to make an analogy it's kind of like using an airplane to travel half a mile -- the plane gets you there, but if you need to make that trip more than a few times you should probably consider a car.
The plotly loop above takes 27 seconds to render 5 PNG images, but the alternative method below using ggplot2 takes 1.2 seconds.
library(ggplot2)
ColumnOptions <- colnames(mtcars)
for (i in seq_len(5)){
xCol <- sample(ColumnOptions,1)
yCol <- sample(ColumnOptions,1)
ThisFileName <- paste0("ggplot2_Scatter_",xCol,"_vs_",yCol,".png")
ggplot() +
geom_point(aes(x = mtcars[[xCol]], y = mtcars[[yCol]])) +
labs(x = xCol, y = yCol) -> ThisPlot
ggsave(plot = ThisPlot, filename = ThisFileName)
}
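Both loops above draw random column pairs, so they do not yet cover every scatter plot the heat map can produce. Below is a minimal sketch of one way to enumerate all pairs with combn and save one file per pair. all_pairs and save_pairs are hypothetical helper names, not part of plotly or ggplot2, and the saving step assumes ggplot2 is installed.

```r
# Hypothetical helper: one row per unordered pair of columns, plus a file name.
all_pairs <- function(cols) {
  pairs <- t(combn(cols, 2))
  data.frame(
    x = pairs[, 1],
    y = pairs[, 2],
    file = paste0("Scatter_", pairs[, 1], "_vs_", pairs[, 2], ".png"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: save one ggplot2 scatter per row of pairs_df.
save_pairs <- function(pairs_df, data, out_dir = tempdir()) {
  stopifnot(requireNamespace("ggplot2", quietly = TRUE))
  for (i in seq_len(nrow(pairs_df))) {
    x_col <- pairs_df$x[i]
    y_col <- pairs_df$y[i]
    p <- ggplot2::ggplot() +
      ggplot2::geom_point(ggplot2::aes(x = data[[x_col]], y = data[[y_col]])) +
      ggplot2::labs(x = x_col, y = y_col)
    ggplot2::ggsave(plot = p, filename = file.path(out_dir, pairs_df$file[i]))
  }
  invisible(file.path(out_dir, pairs_df$file))
}

pairs_df <- all_pairs(colnames(mtcars))
nrow(pairs_df)  # number of distinct column pairs in mtcars

# Uncomment to write every plot (here into the session's temporary folder):
# save_pairs(pairs_df, mtcars, out_dir = tempdir())
```

combn only yields each pair once, so x-vs-y and y-vs-x are not both saved. If the mirrored plots are wanted as well, expand.grid over the column names would be the analogous starting point.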
Related
I'm trying to include a plotly plot in a shiny app where the y variable is selected by the user. I initially used ggplot2 and plotly together, and the code I have works just fine for that. But because the number of data points is quite large, the plot takes several minutes to load, so I tried switching to plotly only because I read somewhere that that makes it faster. Unfortunately I cannot get the y variable selection to work.
I have tried the suggestions given here: Change plotly chart y variable based on selectInput and here: Error: invalid first argument with R Shiny plot and none of them work. At this point I have tried so many things I don't remember in detail, but basically I either get the error "invalid first argument" when using some variation of yvar <- get(input$yvariable1) and then including ~yvar in the plot function, or I get "Error: cannot set attribute on a symbol" when it's y = ~input$yvariable1. When I use y = newdata[ ,input$yvariable1] something gets plotted but it's completely wrong (the scale of the axis is up to 50k or something instead of 10 and the distribution is not right either - basically it looks nothing like when I plot it by simply entering the same y variable non-reactively).
My code looks as follows - in UI:
uiOutput("ySelection1")
in server:
function(input, output) {
output$ySelection1 <- renderUI({
varSelectInput("yvariable1", "Y Variable:", df[, c('PO_count_citing', 'cpc_3digits_count_citing', 'cpc_4digits_count_citing')], selected='PO_count_citing')
})
yvar1 <- eventReactive(input$yvariable1, {input$yvariable1})
output$plot1 <- renderPlotly({
newdata <- subset(df, Technology == input$type & appln_auth%in%input$PO)
validate(no_data(nrow(newdata)))
#yvar <- get(yvar1()) (failed attempt at making this work)
#yvar <- get(input$yvariable1) (another failed attempt)
scatterPlot <- plot_ly(newdata, x = ~appln_filing_year, y = ~input$yvariable1, type="scatter", mode="markers",
# Hover text:
text = ~paste(some text),
color = ~appln_auth)
})
}
But I can't get it to work. In the original ggplot2 version it was entered as aes(x = appln_filing_year, y = !!yvar1(), bla bla)
But the !! or even one ! or removing the brackets after yvar1 all throw up errors in plotly.
Does anyone have any suggestions?
Here is a simple example using get:
library(shiny)
library(plotly)
DF <- setNames(data.frame(rep(1:20, 5), mapply(runif, min = 1:5, max = 2:6, MoreArgs = list(n = 20))), c("x", paste0("y", 1:5)))
ui <- fluidPage(
plotlyOutput("myPlot"),
selectInput("yvariable", "Select the Y variable", paste0("y", 1:5))
)
server <- function(input, output, session) {
output$myPlot <- renderPlotly({
req(input$yvariable)
plot_ly(data = DF, x = ~x, y = ~get(input$yvariable), type = "scatter", mode = "markers")
})
}
shinyApp(ui, server)
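As an alternative to get(), the formula can be built from the selected name before it reaches plot_ly. The sketch below assumes the input holds a single column name; y_formula is a hypothetical helper. Note that varSelectInput returns a symbol rather than a string, which is why the helper coerces with as.character first. selectInput, as used above, returns a string.

```r
# Hypothetical helper: turn a column name (string or symbol) into a one-sided formula.
y_formula <- function(var) {
  # Backticks keep non-syntactic column names (e.g. with spaces) valid.
  stats::as.formula(paste0("~`", as.character(var), "`"))
}

DF <- data.frame(x = 1:20, y1 = runif(20), y2 = runif(20, 1, 2))

f <- y_formula("y2")           # what selectInput would supply
g <- y_formula(as.name("y2"))  # what varSelectInput would supply

# With plotly installed, the formula can be passed straight to plot_ly.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(data = DF, x = ~x, y = f, type = "scatter", mode = "markers")
}
```

Inside renderPlotly this would read y = y_formula(input$yvariable).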
I'm trying to make a flexdashboard using IMDb data with an interactive jitter plot where you can change the x and y variables to visualize a hierarchical clustering result. The code I have so far can only change x and the number of clusters k. I think I should use a reactive function, but I don't really understand how to use one. I've tried many approaches from YouTube and the documentation, but I still can't change y. Here is the layout of my dashboard; y stays stuck at the Runtime variable.
data=df %>%
select(Rating, Votes, Gross, Runtime, Metascore)
selectInput("x", label = "X : ",choices = names(data))
selectInput("y", label = "Y : ",choices = names(data))
sliderInput('k',"Cluster",min = 2,max = 10, value = 6)
selectedData=reactive({
data %>% select(input$x, input$y)
})
data_scaled=scale(data)
dist_data=dist(data_scaled, method='euclidean')
hc_data=hclust(dist_data, method = "average")
renderPlot({
ggplot(selectedData(),
aes(x=!!rlang::sym(input$x), y=!!rlang::sym(input$y),
col=factor(cutree(hc_data, k=input$k))))+
geom_jitter(size=5, alpha=0.5 )+
labs(col="Cluster")
})
Here is an alternative example that seems to work, using the diamonds dataset from ggplot2. My guess is that the scaling and clustering steps take so long to run that the y reactive only appears not to work. I would suggest pre-processing your data if app run times are a problem.
data=diamonds[1:1e3,] %>%
dplyr::select(where(is.numeric))
selectInput("x", label = "X : ",choices = names(data))
selectInput("y", label = "Y : ",choices = names(data))
sliderInput('k',"Cluster",min = 2,max = 10, value = 6)
data_scaled=scale(data)
dist_data=dist(data_scaled, method='euclidean')
hc_data=hclust(dist_data, method = "average")
renderPlot({
ggplot(data,
aes(x=!!rlang::sym(input$x), y=!!rlang::sym(input$y),
col=factor(cutree(hc_data, k=input$k))))+
geom_jitter(size=5, alpha=0.5 )+
labs(col="Cluster")
})
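The pre-processing suggested above could look like the following sketch: the scaling, distance matrix and hierarchical clustering are computed once and stored, so the dashboard only has to read the result and call cutree. The file location and the use of mtcars as stand-in data are assumptions for illustration.

```r
# One-off script, run outside the dashboard.
num_data <- mtcars[, c("mpg", "hp", "wt", "qsec")]  # stand-in for the numeric columns
hc_data <- hclust(dist(scale(num_data), method = "euclidean"), method = "average")

cache_file <- file.path(tempdir(), "hc_data.rds")   # hypothetical location
saveRDS(hc_data, cache_file)

# Inside the dashboard: load the cached tree; only cutree() depends on the slider.
hc_cached <- readRDS(cache_file)
clusters <- cutree(hc_cached, k = 6)
table(clusters)
```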
I am trying to make an interactive stock performance plot from R. It is to compare the relative performance of several stocks. Each stock's performance line should start at 0%.
For static plots I would use dplyr group_by and mutate to calculate performance (see my code).
With ggplot2 and plotly/ggplotly, rangeslider() lets you select the x-axis range interactively. Now I'd like performance to start at 0 at the beginning of whatever range is selected.
How can I either move the dplyr calculation into the plotting or have a feedback loop to recalculate as the range is changed?
Ideally it should be usable in static RMarkdown HTML. Alternatively I'd also switch to Shiny.
I tried several options for rangeslider. I also tried ggplot's stat_function but could not achieve the desired result. I also found dygraphs, which has dyRangeSelector, but I face the same problem there.
This is my code:
library(plotly)
library(tidyquant)
stocks <- tq_get(c("AAPL", "MSFT"), from = "2019-01-01")
range_from <- as.Date("2019-02-01")
stocks_range <- stocks %>%
filter(date >= range_from) %>%
group_by(symbol) %>%
mutate(performance = adjusted/first(adjusted)-1)
p <- stocks_range %>%
ggplot(aes(x = date, y = performance, color = symbol)) +
geom_line()
ggplotly(p, dynamicTicks = T) %>%
rangeslider(borderwidth = 1) %>%
layout(hovermode = "x", yaxis = list(tickformat = "%"))
If you do not want to use shiny, you can either use the dyRebase option in dygraphs, or you have to insert custom javascript code in plotly. In both examples, I rebase to one, not zero.
Option 1: with dygraphs
library(dygraphs)
library(tidyquant)
library(timetk)
library(tidyr)
stocks <- tq_get(c("AAPL", "MSFT"), from = "2019-01-01")
stocks %>%
dplyr::select(symbol, date, adjusted) %>%
tidyr::spread(key = symbol, value = adjusted) %>%
timetk::tk_xts() %>%
dygraph() %>%
dyRebase(value = 1) %>%
dyRangeSelector()
Note that `dyRebase(value = 0)` does not work.
Option 2: with plotly using event handlers. I try to avoid ggplotly, hence my plot_ly solution. Here the time selection is just by zooming, but I think it can be done by a range selector as well. The javascript code in onRenderRebaseTxt rebases every trace to the first visible data point (taking care of possible missing values). It is only called with the relayout event, hence the first rebasing must be done before the plot.
library(tidyquant)
library(plotly)
library(htmlwidgets)
library(dplyr)
stocks <- tq_get(c("AAPL", "MSFT"), from = "2019-01-01")
pltly <-
stocks %>%
dplyr::group_by(symbol) %>%
dplyr::mutate(adjusted = adjusted / adjusted[1L]) %>%
plotly::plot_ly(x = ~date, y = ~adjusted, color = ~symbol,
type = "scatter", mode = "lines") %>%
plotly::layout(dragmode = "zoom",
datarevision = 0)
onRenderRebaseTxt <- "
function(el, x) {
el.on('plotly_relayout', function(rlyt) {
var nrTrcs = el.data.length;
// array of x index to rebase to; defaults to zero when all x are shown, needs to be one per trace
var baseX = Array.from({length: nrTrcs}, (v, i) => 0);
// if x zoomed, increase baseX until first x point larger than x-range start
if (el.layout.xaxis.autorange == false) {
for (var trc = 0; trc < nrTrcs; trc++) {
while (el.data[trc].x[baseX[trc]] < el.layout.xaxis.range[0]) {baseX[trc]++;}
}
}
// rebase each trace
for (var trc = 0; trc < nrTrcs; trc++) {
el.data[trc].y = el.data[trc].y.map(x => x / el.data[trc].y[baseX[trc]]);
}
el.layout.yaxis.autorange = true; // to show all traces if y was zoomed as well
el.layout.datarevision++; // needs to change for react method to show data changes
Plotly.react(el, el.data, el.layout);
});
}
"
htmlwidgets::onRender(pltly, onRenderRebaseTxt)
I found a solution with plotly_relayout which reads out the visible x-axis range. This is used to recompute the performance. It works as a Shiny app. Here's my code:
library(shiny)
library(plotly)
library(tidyquant)
library(lubridate)
stocks <- tq_get(c("AAPL", "MSFT"), from = "2019-01-01")
ui <- fluidPage(
titlePanel("Rangesliding performance"),
mainPanel(
plotlyOutput("plot")
)
)
server <- function(input, output) {
d <- reactive({ e <- event_data("plotly_relayout")
if (is.null(e)) {
e$xaxis.range <- c(min(stocks$date), max(stocks$date))
}
e })
stocks_range_dyn <- reactive({
s <- stocks %>%
group_by(symbol) %>%
mutate(performance = adjusted/first(adjusted)-1)
if (!is.null(d())) {
s <- s %>%
mutate(performance = adjusted/nth(adjusted, which.min(abs(date - date(d()$xaxis.range[[1]]))))-1)
}
s
})
output$plot <- renderPlotly({
plot_ly(stocks_range_dyn(), x = ~date, y = ~performance, color = ~symbol) %>%
add_lines() %>%
rangeslider(start = d()$xaxis.range[[1]], end = d()$xaxis.range[[2]], borderwidth = 1)
})
}
shinyApp(ui = ui, server = server)
Defining the start/end of the rangeslider only works with plot_ly, not with a ggplot object converted by ggplotly. I am unsure whether this is a bug, so I opened an issue on GitHub.
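The rebasing step at the heart of this approach can be isolated from Shiny. Below is a minimal sketch using made-up prices rather than real quotes; rebase_performance is a hypothetical helper that mirrors the which.min(abs(...)) lookup in the app above.

```r
# Hypothetical helper: performance relative to the observation closest to range_start.
rebase_performance <- function(price, date, range_start) {
  base_idx <- which.min(abs(as.numeric(date - as.Date(range_start))))
  price / price[base_idx] - 1
}

# Made-up example data, not real stock prices.
toy <- data.frame(
  date = as.Date("2019-01-01") + 0:4,
  adjusted = c(100, 110, 121, 133.1, 146.41)
)

toy$performance <- rebase_performance(toy$adjusted, toy$date, "2019-01-03")
toy
```

Applied per symbol (for example inside group_by(symbol) followed by mutate), this gives each line a value of 0 at the start of the visible range.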
I am attempting to use some public information to produce a heat map of Canada for some labor statistics, using the spatial files from the census and data from Statistics Canada (these are large zip files that are not necessary to dig into). Below is a working example that illustrates both problems I am having: little relative change between regions (though there may be a big absolute change between periods), and the slow draw time. To get this to work, you need to download the .zip file from the census link and unzip the files to a data folder.
library(shiny)
library(maptools)
library(ggplot2)
require(reshape2)
library(tidyr)
library(RColorBrewer)
ui <- fluidPage(
titlePanel("heatmap"),
# Sidebar with a slider input for year of interest
sidebarLayout(
sidebarPanel(
sliderInput("year",h3("Select year or push play button"),
min = 2000, max = 2002, step = 1, value = 2000,
animate = TRUE)
),
# Output of the map
mainPanel(
plotOutput("unemployment")
)
)
)
server <- function(input, output) {
#to get the spatial data: from file in link above
provinces<-maptools::readShapeSpatial("data/gpr_000a11a_e.shp")
data.p<- ggplot2::fortify(provinces, region = "PRUID")
data.p<-data.p[which(data.p$id<60),]
#dataframe with same structure as statscan csv after processing
unem <- runif(10,min=0,max=100)
unem1 <- unem+runif(1,-10,10)
unem2 <- unem1+runif(1,-10,10)
unemployment <- c(unem,unem1,unem2)
#dataframe with same structure as statscan csv after processing
X <- data.frame("id" = c(10,11,12,13,24,35,46,47,48,59,
10,11,12,13,24,35,46,47,48,59,
10,11,12,13,24,35,46,47,48,59),
"Unemployment" = unemployment,
"year" = c(rep(2000,10),rep(2001,10),rep(2002,10))
)
plot.data<- reactive({
a<- X[which(X$year == input$year),]
return(merge(data.p,a,by = "id"))
})
output$unemployment <- renderPlot({
ggplot(plot.data(),
aes(x = long, y = lat,
group = group , fill =Unemployment)) +
geom_polygon() +
coord_equal()
})
}
# Run the application
shinyApp(ui = ui, server = server)
Any help with either of the issues would be greatly appreciated
For this type of animation it is much faster to use leaflet instead of ggplot as leaflet allows you to only re-render the polygons, not the entire map.
I use two other tricks to speed up the animation:
I join the data outside of the reactive. Within the reactive it is just a simple subset. Note, the join could be done outside of the app and read in as a pre-processed .rds file.
I simplify the polygons with the rmapshaper package to reduce drawing time by leaflet. Again, this could be done outside the app to reduce loading time at the start.
The animation could likely be even more seamless if you use circles (i.e. centroid of each province) instead of polygons. Circle size could vary with Unemployment value.
Note, you need the leaflet, sf, dplyr and rmapshaper packages for this approach.
library(shiny)
library(dplyr)
library(leaflet)
library(sf)
library(rmapshaper)
ui <- fluidPage(
titlePanel("heatmap"),
# Sidebar with a slider input for year of interest
sidebarLayout(
sidebarPanel(
sliderInput("year",h3("Select year or push play button"),
min = 2000, max = 2002, step = 1, value = 2000,
animate = TRUE)
),
# Output of the map
mainPanel(
leafletOutput("unemployment")
)
)
)
server <- function(input, output) {
#to get the spatial data: from file in link above
data.p <- sf::st_read("input/gpr_000a11a_e.shp") %>%
st_transform(4326) %>%
rmapshaper::ms_simplify()
data.p$PRUID <- as.character(data.p$PRUID) %>% as.numeric
data.p <- data.p[which(data.p$PRUID < 60),]
lng.center <- -99
lat.center <- 60
zoom.def <- 3
#dataframe with same structure as statscan csv after processing
unem <- runif(10,min=0,max=100)
unem1 <- unem+runif(1,-10,10)
unem2 <- unem1+runif(1,-10,10)
unemployment <- c(unem,unem1,unem2)
#dataframe with same structure as statscan csv after processing
X <- data.frame("id" = c(10,11,12,13,24,35,46,47,48,59,
10,11,12,13,24,35,46,47,48,59,
10,11,12,13,24,35,46,47,48,59),
"Unemployment" = unemployment,
"year" = c(rep(2000,10),rep(2001,10),rep(2002,10))
)
data <- left_join(data.p, X, by = c("PRUID"= "id"))
output$unemployment <- renderLeaflet({
leaflet(data = data.p) %>%
addProviderTiles("OpenStreetMap.Mapnik", options = providerTileOptions(opacity = 1), group = "Open Street Map") %>%
setView(lng = lng.center, lat = lat.center, zoom = zoom.def) %>%
addPolygons(group = 'base',
fillColor = 'transparent',
color = 'black',
weight = 1.5) %>%
addLegend(pal = pal(), values = X$Unemployment, opacity = 0.7, title = NULL,
position = "topright")
})
get_data <- reactive({
data[which(data$year == input$year),]
})
pal <- reactive({
colorNumeric("viridis", domain = X$Unemployment)
})
observe({
data <- get_data()
leafletProxy('unemployment', data = data) %>%
clearGroup('polygons') %>%
addPolygons(group = 'polygons',
fillColor = ~pal()(Unemployment),
fillOpacity = 0.9,
color = 'black',
weight = 1.5)
})
}
# Run the application
shinyApp(ui = ui, server = server)
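The first trick (doing the expensive work once, outside the reactive) can be taken one step further by splitting the joined data by year up front, so the reactive becomes a simple list lookup. Below is a sketch with a plain data frame standing in for the joined sf object. by_year and get_year are hypothetical names, and it is an assumption that split behaves the same way on the sf object used in the app.

```r
set.seed(1)
ids <- c(10, 11, 12, 13, 24, 35, 46, 47, 48, 59)
X <- data.frame(
  id = rep(ids, times = 3),
  Unemployment = runif(30, min = 0, max = 100),
  year = rep(2000:2002, each = 10)
)

# Done once at startup: one data frame per year, keyed by the year as a string.
by_year <- split(X, X$year)

# What the reactive would reduce to (input_year stands in for input$year).
get_year <- function(input_year) by_year[[as.character(input_year)]]

head(get_year(2001))
```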
I didn't find the drawing time to be unreasonably long at ~2-3 seconds, which for a 2.4 MB shapefile seems about right. It takes just as long outside shiny as it does in the app on my machine, anyway.
To hold a constant colour gradient you can specify limits in scale_fill_gradient which will hold the same gradient despite changes to your maps:
output$unemployment <- renderPlot({
ggplot(plot.data(),
aes(x = long, y = lat,
group = group , fill =Unemployment)) +
geom_polygon() +
scale_fill_gradient(limits=c(0,100)) +
coord_equal()
})
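Instead of hard-coding c(0, 100), the limits can be derived once from the full data set so that every year shares the same scale. Below is a sketch with simulated values; the data frame mirrors the structure of X from the question.

```r
set.seed(42)
X <- data.frame(
  Unemployment = runif(30, min = 0, max = 100),
  year = rep(2000:2002, each = 10)
)

# Limits taken over all years, not just the year currently shown.
fill_limits <- range(X$Unemployment)

# With ggplot2 installed, the same limits feed the fill scale in every redraw.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  fill_scale <- ggplot2::scale_fill_gradient(limits = fill_limits)
}
```

Data-driven limits use the full colour range, which may also help with the small relative differences between regions, at the cost of a legend whose end points are not round numbers.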
Goal: I am trying to create a shiny app that displays (1) the stressplot of a non-metric multidimensional scaling solution, (2) a ggplot of the point configuration, and (3) the results of clustering the point configuration by plotting the point configuration and superimposing chulls of the clustering.
Problem: The first two plots work without difficulty. Instead of a third plot, I get the error: 'data' must be of a vector type, was 'NULL'
I would appreciate any advice on how to resolve the specific problem, i.e. "error in array: 'data' must be of a vector type, was 'NULL'"
I would also appreciate any general advice on how to debug shiny. My only strategy is to treat the code like it isn't reactive code, and I suspect that this strategy isn't terribly effective.
My attempt to solve: I've searched the error on rseek and stack overflow and reviewed the posts. In some of the cases with similar errors the problem was that necessary data wasn't being calculated. I went through the code, treated it as normal (non-reactive) code, and used fake data. When I did this I didn't have any problem, so I assume it is something about the reactivity? Question 2 about how to debug is a reaction to the fact that trying to debug like the code wasn't dynamic didn't identify the problem.
Reproducible Example: I put together a shiny app that has randomly generated data. Before doing the testing I updated R and all the packages I use.
# Packages and options
library(shiny)
library(vegan)
library(cluster)
library(tidyverse)
options(digits = 3)
# Create dissimilarity matrix
d <- rnorm(1000)
mat <- matrix(d, ncol = 10)
diss_m <- daisy(mat) %>% as.matrix()
# Function
find_chulls <- function(df, x, y) {
ch <- chull(df[[x]], df[[y]])
df[ch,] %>% as.data.frame()
}
ui <- fluidPage(
titlePanel("Research"),
sidebarLayout(
sidebarPanel(
numericInput('dim', 'Dimensions', 2, min = 2, max = 15)
),
mainPanel(
h3('Stressplot'),
plotOutput('plot0'),
h3('Non-Metric Multidimensional Scaling'),
plotOutput('plot1'),
h3('2d Density Plot'),
plotOutput('plot2'),
h3('Cluster Analysis'),
plotOutput('plot3')
)
)
)
server <- function(input, output, session) {
nmds <- reactive({
metaMDS(diss_m,
distance = "euclidean",
k = input$dim,
trymax = 200,
autotransform = FALSE,
noshare = FALSE,
wascores = FALSE)
})
output$plot0 <- renderPlot({
stressplot(nmds())
})
pts <- reactive({
nmds()$points %>% as.data.frame()
})
output$plot1 <- renderPlot({
ggplot(pts(), aes(x = MDS1, y = MDS2)) +
geom_point()
})
output$plot2 <- renderPlot({
ggplot(pts(), aes(x = MDS1, y = MDS2)) +
geom_point() +
geom_density2d()
})
df_cl <- reactive({
km <- kmeans(x = pts(), centers = input$clust)
cl <- km$cluster
data.frame(pts(), clust = cl)
})
df_ch <- reactive({
df_ch_temp <- df_cl() %>% group_by(clust) %>% do(find_chulls(., 1, 2))
df_ch_temp %>% as.data.frame()
})
# The plot below is the one that doesn't work
output$plot3 <- renderPlot({
ggplot(df_ch(), aes(x = MDS1, y = MDS2, fill = as.factor(clust))) + geom_polygon(alpha = 0.10)
})
}
# Run the application
shinyApp(ui = ui, server = server)
Your input$clust is undefined in:
df_cl <- reactive({
km <- kmeans(x = pts(), centers = input$clust)
cl <- km$cluster
data.frame(pts(), clust = cl)
})
You need to add an input binding for clust, e.g.:
numericInput('clust', 'Clusters', 2, min = 2, max = 15)
As for debugging: I added browser() at the top of df_cl; execution then stops there and you can inspect variables and run code in the console (e.g. in RStudio). When I ran km <- kmeans(x = pts(), centers = input$clust) I got the error you described and could then see that input contains no clust element.
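The error can also be reproduced outside Shiny, which makes the cause easier to see: an input that has no matching control is NULL, and kmeans fails when centers is NULL. The sketch below uses random points as a stand-in for pts(). safe_kmeans is a hypothetical helper, a plain-R analogue of what req(input$clust) would do in the app.

```r
set.seed(1)
pts <- data.frame(MDS1 = rnorm(50), MDS2 = rnorm(50))
clust <- NULL  # what input$clust evaluates to when no such input exists

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    kmeans(x = pts, centers = clust)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Guard: only cluster once a value is available.
safe_kmeans <- function(x, centers) {
  if (is.null(centers)) return(NULL)
  kmeans(x = x, centers = centers)
}
is.null(safe_kmeans(pts, clust))
```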