R/Shiny: Group by and Summarise on column selected in dropdown?

I am very new to R and Shiny, so I apologize in advance if this is too basic.
I am trying to render a bar graph where I select the X and Y variables from dropdowns. I need to aggregate Y to its average for each value of the selected X. When I ran the code below, I got the error "Column 'y' not found". I cannot hard-code the actual column name, because the selection can change. How do I solve this?
output$MultivariatePlot <- renderPlotly({
  if (input$Apply > 0) {
    isolate({
      req(data$Policies)
      x <- input$MultivariateX
      y <- input$MultivariateY
      rv$g <- data$Policies %>%
        group_by(y) %>% summarise(y = mean(y)) %>%
        ggplot2::ggplot(aes_(x = sym(x), y = sym(y))) +
        # ggplot2::stat_summary(fun.y = "mean", geom = "bar") +
        ggplot2::geom_bar(stat='identity') +
        ggplot2::scale_fill_manual(values = rara::ColorSelect(2)) +
        ggplot2::theme_classic() +
        ggplot2::theme(panel.grid.major = element_line(color = 'gray80', linetype = 'longdash', size = 0.3)) +
        ggplot2::labs(title = paste('Comparison Between', x, 'and', y))
    })
  }
})

You can use !!sym(x) wherever you would normally write the bare column name for x, and !!sym(y) wherever you would normally write the bare column name for y.
This turns your code into the following:
rv$g <- data$Policies %>%
group_by(!!sym(x)) %>%
summarise(!!sym(y) := mean(!!sym(y))) %>%
ggplot2::ggplot(aes(x = !!sym(x), y = !!sym(y))) +
ggplot2::geom_bar(stat='identity') +
# etc
There's one last complication: when !!sym(y) is used as the name of the new column (that is, on the left-hand side of the argument), you have to use := instead of =. (Also note that aes_() is deprecated, so I used x = !!sym(x) inside a regular aes(), and geom_col() is a shortcut for ggplot2::geom_bar(stat = 'identity').)
Here's a reproducible example of the above, which takes x and y as strings, then aggregates and plots them:
library(dplyr)
library(ggplot2)
x <- "cyl"
y <- "mpg"
mtcars %>%
  group_by(!!sym(x)) %>%
  summarise(!!sym(y) := mean(!!sym(y))) %>%
  ggplot2::ggplot(aes(x = !!sym(x), y = !!sym(y))) +
  ggplot2::geom_col()
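The same idea can be wrapped in reusable functions. This sketch swaps !!sym() for two other documented tidyverse idioms for string column names: across(all_of(x)) for grouping and the .data[[y]] pronoun for reading a column. The names mean_by() and plot_mean_by() are hypothetical. In the Shiny app, x and y would be input$MultivariateX and input$MultivariateY. Because the summarised column is named after the string in y, the := form is still needed there.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: average of column `y` for each value of column `x`,
# where both column names arrive as strings (e.g. from a dropdown).
mean_by <- function(df, x, y) {
  df %>%
    group_by(across(all_of(x))) %>%     # group by the column whose name is in x
    summarise(!!y := mean(.data[[y]]),  # output column named after y, hence :=
              .groups = "drop")
}

# Hypothetical helper: bar chart of the aggregated values.
plot_mean_by <- function(df, x, y) {
  ggplot(mean_by(df, x, y), aes(x = .data[[x]], y = .data[[y]])) +
    geom_col() +
    labs(title = paste("Comparison Between", x, "and", y))
}

res <- mean_by(mtcars, "cyl", "mpg")  # one row per cyl, mean mpg per row
p <- plot_mean_by(mtcars, "cyl", "mpg")
```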

Related

How to return either a vector or string based on condition in ifelse statement?

I am trying to write a function that creates a scatterplot whose points may or may not need to be colored by a grouping variable.
I tried the following approach, but it doesn't color the points by group, although the code runs fine without the ifelse statement.
library(ggplot2)
library(tidyr)
x <- rnorm(100, sd = 2)
y1 <- x * 0.5 + rnorm(100, sd = 1)
data <- data.frame(x = x,
                   y1 = y1,
                   y2 = fitted(lm(y1 ~ x))) %>%
  pivot_longer(cols = -x,
               names_to = "Group",
               values_to = "yy")
group <- "Group"
ygroups <- 2
defaultcol = "black"
ggplot(data = data, mapping = aes(x = x , y = yy,
color = ifelse(ygroups > 1, get(group), defaultcol))) +
geom_point()
# runs fine
ggplot(data = data, mapping = aes(x = x , y = yy, color = get(group))) +
geom_point()
You don't want to use ifelse() here: it returns a result the same length as its test, so with the single condition ygroups > 1 you get only the first element of get(group) instead of the whole column. Just use a regular if/else:
ggplot(data = data) +
aes(x = x , y = yy, color = if(ygroups > 1) get(group) else defaultcol) +
geom_point() +
labs(color="Color")
But you can't set a specific default color inside aes(color = ...): the color name is treated as data and remapped through your color scale. If you just want to add the color mapping conditionally, do:
ggplot(data = data) +
aes(x = x , y = yy) +
{if( ygroups > 1) aes(color=.data[[group]])} +
geom_point()
(using .data[[ ]] is recommended over using get())
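To get an actual fixed default color when there are no groups, the constant has to be set outside aes(), as a plain geom_point() argument. A minimal sketch of the function the question describes, assuming the long data built above; scatter_by() and its arguments are hypothetical names.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
x <- rnorm(100, sd = 2)
y1 <- x * 0.5 + rnorm(100, sd = 1)
data <- data.frame(x = x, y1 = y1, y2 = fitted(lm(y1 ~ x))) %>%
  pivot_longer(cols = -x, names_to = "Group", values_to = "yy")

# Hypothetical wrapper: map color to the column named in `group` when given,
# otherwise draw every point in `defaultcol`, set OUTSIDE aes() so no scale is used.
scatter_by <- function(data, group = NULL, defaultcol = "black") {
  p <- ggplot(data, aes(x = x, y = yy))
  if (!is.null(group)) {
    p + geom_point(aes(color = .data[[group]]))
  } else {
    p + geom_point(color = defaultcol)
  }
}

p_grouped <- scatter_by(data, group = "Group")  # two colors, with a legend
p_plain <- scatter_by(data)                     # all points black, no legend
```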

Use scale_x_continuous with labeller function that also takes a data frame as an argument as well as default breaks

Here's a code block:
library(tidyverse)
# scale the log of price per group (cut)
my_diamonds <- diamonds %>%
  mutate(log_price = log(price)) %>%
  group_by(cut) %>%
  mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>% # scale within each group as opposed to overall
  nest() %>%
  mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>%
  mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>%
  unnest(cols = data) %>%
  select(cut, price, scaled_log_price:sd_log_price) %>%
  ungroup
# for each cut, find the back transformed actual values (exp) of each unit of zscore between -3:3
for (i in -3:3) {
my_diamonds <- my_diamonds %>%
mutate(!! paste0('mean_', ifelse(i < 0 , 'less_', 'plus_'), abs(i), 'z') := map2(.x = mean_log_price, .y = sd_log_price, ~ (.x + (i * .y)) %>% exp) %>% unlist)
}
my_diamonds_split <- my_diamonds %>% group_split(cut)
split_names <- my_diamonds %>% group_by(cut) %>% group_keys() %>% pull(cut) %>% as.character() # same (factor level) order as group_split(cut)
names(my_diamonds_split) <- split_names
I now have a variable my_diamonds_split that is a list of data frames. I would like to loop over these data frames and each time create a new ggplot.
I can use a custom labeller function with a single df, but I don't know how to do this within a loop:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(ex_df$price) * x + mean(ex_df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
This creates a plot for the 'Ideal' cut of diamonds. Each x-axis label has two lines: the z-score (-2, 0 and 2) and the corresponding raw dollar value (3.8K, 3.9K and 11.8K).
When I define the labeller function, I have to hard-code the data frame to scale with. I tried using the dot (.) instead of ex_df, hoping that ggplot would pick up the current data frame on each iteration:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(.$price) * x + mean(.$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
Returns:
Error in is.data.frame(x) : object '.' not found
I then tried writing the function to accept an argument for the df to scale with:
labeller <- function(x, df) {
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller(df = ex_df), limits = c(-3, 3)) # because when it comes to running in real life, I will try something like labeller(df = my_diamonds_split[[i]])
Error in paste0(x, "\n", scales::dollar(sd(df$price) * x + mean(df$price))) :
argument "x" is missing, with no default
Bearing in mind that the scaling must be done per iteration, how could I loop over my_diamonds_split, and on each iteration generate a ggplot per above?
labeller <- function(x) {
# how can I make df variable
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
for (i in split_names) {
my_diamonds_split[[i]] %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, # <--- here, labeller must be defined with df$price, except that df will differ on each iteration
limits = c(-3, 3))
}
There's a hacky way to get this result in facets. Basically, after converting to z scores, you add different amounts (say, multiples of 1000) to each group's z scores. Then you set all the breaks to this collection of points and label them with pre-calculated labels.
library(ggplot2)
library(dplyr)
f <- function(x) {
y <- diamonds$price[diamonds$cut == x]
paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}
breaks <- as.vector(sapply(levels(diamonds$cut), f))
diamonds %>%
group_by(cut) %>%
mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
ggplot(aes(z)) +
geom_point(aes(x = z - 2, y = 1), alpha = 0) +
geom_density() +
scale_x_continuous(breaks = as.vector(sapply(1:5 * 1000, "+", 0:6)),
labels = breaks) +
facet_wrap(vars(cut), scales = "free_x") +
theme(text = element_text(size = 16),
axis.text.x = element_text(size = 6))
You would have to increase the plot size to make the dollar values more visible of course.
Created on 2020-08-04 by the reprex package (v0.3.0)
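If you want one plot per data frame in a loop, as asked, a different technique from the facet hack above is a function factory: a function that takes the data frame and returns the one-argument labeller that scale_x_continuous() expects, with df captured in its enclosing environment. A minimal sketch, assuming z-scores of price (as in the facet answer) rather than of log price; make_price_labeller(), scaled_price and diamonds_split are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Function factory: returns a labeller(x) that remembers this particular `df`.
make_price_labeller <- function(df) {
  force(df)  # capture the data frame now rather than lazily later
  function(x) {
    paste0(x, "\n", scales::dollar(sd(df$price) * x + mean(df$price)))
  }
}

# z-score of price within each cut, then one data frame per cut.
grouped <- diamonds %>%
  group_by(cut) %>%
  mutate(scaled_price = as.numeric(scale(price)))
diamonds_split <- group_split(grouped)
names(diamonds_split) <- as.character(group_keys(grouped)$cut)

# One ggplot per data frame, each with a labeller built from that data frame.
plots <- lapply(diamonds_split, function(df) {
  ggplot(df, aes(x = scaled_price)) +
    geom_density() +
    scale_x_continuous(labels = make_price_labeller(df), limits = c(-3, 3))
})
```

Note that inside a for loop ggplot objects are not auto-printed, so each plot has to be drawn explicitly, e.g. print(plots$Ideal).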

How to plot a(n unknown) number of data series as geom_line in same chart

My first Q here, so please go lightly if I'm out of step anywhere.
I'm trying to write R code that produces a single chart containing several data series as lines. The number of series may vary, but it is given by the columns of the data frame. I tried to adapt another thread's code to print the geom_line layers, but without success.
The logic is:
#desire to replace loop of 1:5 with ncol(df)
print(ggplot(df,aes(x=time))
for (i in 1:5) {
print (+ geom_line(aes(y=df[,i]))
}
#functioning geom point loops ggplot production:
for (i in 1:5) {
print(ggplot(df,aes(x=time,y=df[,i]))+geom_point())
}
#functioning multi-line ggplot where n is explicit:
ggplot(data=df, aes(x=time), group=1) +
geom_line(aes(y=df$`3`))+
geom_line(aes(y=df$`4`))
The functioning example code produces n number of point charts, 5 in this case. I would like just one chart to contain n line series.
This may be similar to How to plot n dimensional matrix? for which there are currently no relevant answers
Any contributions much appreciated, thanks
You can use gather() from the tidyverse (the tidyr package) to reshape the data into long format, then map the variable name to group and color.
As you didn't supply sample data, I used mtcars.
I created two data frames, one with 3 variables besides mpg and one with 9. In each one, I plotted all of the other variables against mpg.
library(tidyverse)
df3Columns <- mtcars[, 1:4]
df9Columns <- mtcars[, 1:10]
df3Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
df9Columns %>%
gather(var, value, -mpg) %>%
ggplot(aes(mpg, value, group = var, color = var)) +
geom_line()
Edit - using the sample data in comments.
library(tidyverse)
df %>%
rownames_to_column("time") %>%
gather(var, value, -time) %>%
ggplot(aes(time, value, group = var, color = var)) +
geom_line()
Sample data:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
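In current tidyr, gather() still works but is superseded by pivot_longer(). A sketch of the same edit with pivot_longer(), assuming the row names of the sample df are the time values (converted to numbers so the x axis is continuous); df_long is a hypothetical name.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100),
                     "39097" = c(99, 100, 100)),
                row.names = 3:5, class = "data.frame")

# One row per (time, series) pair, however many series columns there are.
df_long <- df %>%
  rownames_to_column("time") %>%
  mutate(time = as.numeric(time)) %>%
  pivot_longer(-time, names_to = "var", values_to = "value")

# A single geom_line() draws one line per series through the colour mapping.
p <- ggplot(df_long, aes(x = time, y = value, colour = var)) +
  geom_line()
```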
To strictly answer your question, you can simply store your ggplot in a variable and add the geom_line one by one:
df <- structure(list("39083" = c(96, 100, 100), "39090" = c(99, 100, 100), "39097" = c(99, 100, 100)), row.names = 3:5, class = "data.frame")
g <- ggplot(df, aes(x = 1:nrow(df)))
for (i in colnames(df))
{
g <- g + geom_line(y = df[,i])
}
g <- g + scale_y_continuous(limits = c(min(df), max(df)))
print(g)
However, this is not a very convenient solution. I would highly recommend reshaping your data frame into long format (one row per time point and series), which is how ggplot expects grouped data:
df.ultimate <- data.frame(time = numeric(), value = numeric(), group = character())
for (i in colnames(df))
{
df.ultimate <- rbind(df.ultimate, data.frame(time = 1:nrow(df), value = df[, i], group = i))
}
g <- ggplot(df.ultimate, aes(x = time, y = value, color = group))
g <- g + geom_line()
print(g)
A one-line solution:
ggplot(data.frame(time = rep(1:nrow(df), ncol(df)),
value = as.vector(as.matrix(df)),
group = rep(colnames(df), each = nrow(df))),
aes(x = time, y = value, color = group)) + geom_line()

How can I loop colnames as plot titles along with data using lapply in R?

I have this function that works close to what I need -- it creates a clean table from my original raw data, makes it a ggplot, and uses lapply to run it through all the variables I want from the original table, data:
#Get colnames of all numeric variables
nlist <- names(data[,sapply(data,is.numeric)])
#Create function
varviz_n <- function(dat, var){
var <- dat[,which(names(dat) == var)]
title<-var
tab <- dat %>%
group_by(group = cut(var, breaks = seq(0, max(var), 10)),
groupedsupport) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(!is.na(group),n>10)
tab2 <- tab %>%
group_by(groupedsupport) %>%
summarise(mean = mean(freq),
median = median(freq))
finaltab <- tab %>% left_join(tab2, by = "groupedsupport")
fplot <- finaltab %>%
ggplot(aes(fill=group,x=groupedsupport,y=freq)) +
geom_col(position="dodge") +
geom_text(aes(label = paste("n =",n), n = (n + 0.05)), position = position_dodge(0.9), vjust = 0, size=2) +
geom_errorbar(aes(groupedsupport, ymax = median, ymin = mean),
size=0.5, linetype = "longdash", inherit.aes = F, width = 1) +
scale_y_continuous(labels = scales::percent) +
xlab("") + ylab("") +
ggtitle(title) +
scale_fill_discrete("")
filename = filename <- paste0(finaltab$var)
ggsave(paste("Plots/",filename,".png"), width = 10, height = 7)
return(fplot)
}
#Run function
lapply(nlist, varviz_n, dat = data)
This does almost exactly what I want -- the problem is that all of the variables it's running through are 0-100 numeric and it's creating the plots but I can't at all figure out how to get the column name as the title of the plot or of the key. So I have no idea which graph is getting returned.
Can someone please help me figure out a way to get the column name from nlist to be the title of my plot? As it is now, the title shows the first value of the column instead of the actual column name.
The final piece of code to save it in the 'Plots' folder doesn't work either since the title/var isn't populating correctly.
You can use something like this to create data to test out the code: data <- data.frame(v1 = sample(1:100,1000,replace=T),v2 = sample(1:100,1000,replace=T),v3 = sample(1:100,1000,replace=T),groupedsupport = sample(LETTERS[1:3],1000,replace = TRUE))
Thanks!
I think you just need to swap these steps:
var <- dat[,which(names(dat) == var)]
title <- var
should be
title <- var
var <- dat[,which(names(dat) == var)]
var is reassigned to the selected column's data, so by the time title is assigned from it, it holds that vector rather than the column name.
If this doesn't resolve it, please give us some code to mimic the contents of data.
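Putting the fix together, here is a trimmed sketch of varviz_n() (error bars and count labels left out for brevity) that keeps the column name in title, uses it for both the plot title and the file name, and reads the column with .data[[var]] instead of overwriting var. The out_dir argument is a hypothetical addition; passing NULL skips saving. The original paste("Plots/", filename, ".png") would also put spaces into the path, which file.path() and paste0() avoid.

```r
library(dplyr)
library(ggplot2)

varviz_n <- function(dat, var, out_dir = "Plots") {
  title <- var  # keep the column NAME; the data is read via .data[[var]] below
  tab <- dat %>%
    group_by(group = cut(.data[[var]], breaks = seq(0, max(.data[[var]]), 10)),
             groupedsupport) %>%
    summarise(n = n(), .groups = "drop_last") %>%  # still grouped by `group`, as in the original
    mutate(freq = n / sum(n)) %>%
    filter(!is.na(group), n > 10)
  fplot <- ggplot(tab, aes(x = groupedsupport, y = freq, fill = group)) +
    geom_col(position = "dodge") +
    scale_y_continuous(labels = scales::percent) +
    labs(title = title, x = NULL, y = NULL, fill = NULL)
  if (!is.null(out_dir)) {
    dir.create(out_dir, showWarnings = FALSE)
    ggsave(file.path(out_dir, paste0(title, ".png")), plot = fplot,
           width = 10, height = 7)
  }
  fplot
}

data <- data.frame(v1 = sample(1:100, 1000, replace = TRUE),
                   v2 = sample(1:100, 1000, replace = TRUE),
                   v3 = sample(1:100, 1000, replace = TRUE),
                   groupedsupport = sample(LETTERS[1:3], 1000, replace = TRUE))
nlist <- names(data)[sapply(data, is.numeric)]
plots <- lapply(nlist, varviz_n, dat = data, out_dir = NULL)  # NULL: don't write files
```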

How to include an object with saved text in expression() to be used in ggplot2 graph?

I'm trying to combine mathematical symbols and objects with values saved to them to be displayed in a ggplot graph with geom_text(). Here's example code related to my problem:
# values
diff <- "0.81"
p <- "p < .01"
# approach 1) pasting in values
temp <- data.frame(condition = c("first"), value = c(2)) %>%
mutate(test = as.character(expression(atop(beta["2"] - beta["1"] == "-0.80", "p < 0.01"))))
ggplot() +
geom_bar(data = temp, aes(x = condition, y = value), stat = "identity") +
ylim(0, 5) +
geom_text(data = temp, x = 1, y = 4, aes(label = test), size = 7, parse = TRUE)
# approach 2) referring to objects with values
temp <- data.frame(condition = c("first"), value = c(2)) %>%
mutate(test = as.character(expression(atop(beta["2"] - beta["1"] == diff, p))))
ggplot() +
geom_bar(data = temp, aes(x = condition, y = value), stat = "identity") +
ylim(0, 5) +
geom_text(data = temp, x = 1, y = 4, aes(label = test), size = 7, parse = TRUE)
Approach 1 creates the graph I'm aiming for, but I want to be able to easily refer to objects to supply values to appear following the betas. If I take the current approach 2, it doesn't use the values saved to the objects, but instead just the text "diff" and "p". Is there a way to maintain the basic structure of approach 1 but using objects to create the graph I want?
I'm not sure exactly what you want to happen when you have more rows, but if you want to expand only some of the variables, I think that's easiest to do with bquote(). I pulled it into a function because getting it to vectorize properly can be a bit tricky:
mylabs <- function(diff, p) {
sapply(mapply(function(diff, p) bquote(atop(beta["2"] - beta["1"] == .(diff), .(p))), diff, p), deparse)
}
temp <- data.frame(condition = c("first"), value = c(2)) %>%
mutate(test = mylabs(diff, p))
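For a single label (one row, as in the example), the bquote() idea reduces to one call. A sketch assuming the same diff and p objects as the question: .( ) marks the parts to replace with their values, and deparse() turns the resulting expression back into the string that geom_text(parse = TRUE) expects. The names lab and g are hypothetical.

```r
library(dplyr)
library(ggplot2)

diff <- "0.81"
p <- "p < .01"

# bquote() substitutes the VALUES of diff and p wherever .( ) appears;
# deparse() gives back a string that geom_text(parse = TRUE) re-parses.
lab <- paste(deparse(bquote(atop(beta["2"] - beta["1"] == .(diff), .(p)))),
             collapse = "")

temp <- data.frame(condition = "first", value = 2) %>%
  mutate(test = lab)

g <- ggplot(temp, aes(x = condition, y = value)) +
  geom_col() +
  ylim(0, 5) +
  geom_text(x = 1, y = 4, aes(label = test), size = 7, parse = TRUE)
```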
