r dplyr non standard evaluation - ordering bar plot in a function - r

I have read http://dplyr.tidyverse.org/articles/programming.html about non standard evaluation in dplyr but still can't get things to work.
plot_column <- "columnA"
raw_data %>%
group_by(.dots = plot_column) %>%
summarise (percentage = mean(columnB)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
# mutate(!!plot_column := factor(!!plot_column, !!plot_column))%>%
ggplot() + aes_string(x=plot_column, y="percentage") +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
This works fine when the mutate statement is disabled. However, when I enable it to order the bars by height, only a single bar is returned.
How can I convert the statement above into a function, or make it use a variable, and still plot multiple bars ordered by their size?
An example Dataset could be:
columnA,columnB
a, 1
a, 0.4
a, 0.3
b, 0.5
edit
a sample:
mtcars %>%
group_by(mpg) %>%
summarise (mean_col = mean(cyl)) %>%
filter(mean_col > 0) %>%
arrange(mean_col) %>%
mutate(mpg := factor(mpg, mpg))%>%
ggplot() + aes(x=mpg, y=mean_col) +
geom_bar(stat="identity") +
coord_flip()
will output an ordered bar chart.
How can I wrap this into a function where the column can be replaced and I get multiple bars?

This works with dplyr 0.7.0 and ggplot2 2.2.1:
rm(list = ls())
library(ggplot2)
library(dplyr)
raw_data <- tibble(columnA = c("a", "a", "b", "b"), columnB = c(1, 0.4, 0.3, 0.5))
plot_col <- function(df, plot_column, val_column){
pc <- enquo(plot_column)
vc <- enquo(val_column)
pc_name <- quo_name(pc) # generate a name from the enquoted statement!
df <- df %>%
group_by(!!pc) %>%
summarise (percentage = mean(!!vc)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
mutate(!!pc_name := factor(!!pc, !!pc)) # insert pc_name here!
ggplot(df) + aes_(y = ~percentage, x = substitute(plot_column)) +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
}
plot_col(raw_data, columnA, columnB)
plot_col(mtcars, mpg, cyl)
The problem I ran into was that ggplot2 and dplyr use different kinds of non-standard evaluation. I found the answer in this question: Creating a function using ggplot2.
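With newer releases the same idea can be written more compactly. The following is a minimal sketch, assuming dplyr >= 1.0.0 and ggplot2 >= 3.3.0, where the embrace operator `{{ }}` is understood by both packages. The name `plot_col_embrace` is hypothetical, and `reorder()` stands in for the `arrange()` plus `factor()` step:

```r
library(dplyr)
library(ggplot2)

# Hypothetical rewrite of plot_col(); assumes dplyr >= 1.0.0 and ggplot2 >= 3.3.0.
plot_col_embrace <- function(df, plot_column, val_column) {
  summary_df <- df %>%
    group_by({{ plot_column }}) %>%
    summarise(percentage = mean({{ val_column }}), .groups = "drop") %>%
    filter(percentage > 0)

  # reorder() sorts the bars by height, so no separate factor() call is needed.
  ggplot(summary_df,
         aes(x = reorder({{ plot_column }}, percentage), y = percentage)) +
    geom_col(width = 0.5) +
    coord_flip() +
    labs(x = NULL)
}

raw_data <- tibble(columnA = c("a", "a", "b", "b"),
                   columnB = c(1, 0.4, 0.3, 0.5))
p <- plot_col_embrace(raw_data, columnA, columnB)
```

Printing `p` draws the chart; `plot_col_embrace(mtcars, mpg, cyl)` should work the same way.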
EDIT: parameterized the value column (e.g. columnB/cyl) and added mtcars example.
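In the question, `plot_column` holds the string "columnA", so `!!plot_column` inside `factor()` inserts that string rather than the column, which produces a one-level factor and therefore a single bar. If the column names have to stay strings, one possible sketch uses `across(all_of())` and the `.data` pronoun. It assumes dplyr >= 1.0.0 and ggplot2 >= 3.0.0, and `plot_col_chr` is a hypothetical name:

```r
library(dplyr)
library(ggplot2)

# Hypothetical string-based variant; column names are passed as character values.
plot_col_chr <- function(df, plot_column, val_column) {
  summary_df <- df %>%
    group_by(across(all_of(plot_column))) %>%
    summarise(percentage = mean(.data[[val_column]]), .groups = "drop") %>%
    filter(percentage > 0)

  # .data[[plot_column]] looks the column up by name when the plot is built.
  ggplot(summary_df,
         aes(x = reorder(.data[[plot_column]], percentage), y = percentage)) +
    geom_col(width = 0.5) +
    coord_flip() +
    labs(x = plot_column)
}

raw_data <- tibble(columnA = c("a", "a", "b", "b"),
                   columnB = c(1, 0.4, 0.3, 0.5))
p_chr <- plot_col_chr(raw_data, "columnA", "columnB")
```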

Related

Cannot conditionally make axis labels bold in ggplot

I am trying to make select axis labels bold, based on a conditional from a different column. In the code below, if Signif equals 1, then the Predictor axis text should be bold. In addition, the segments should appear in the order of Values increasing value.
However, this code is not changing any of the axis texts to bold.
library(tidyverse)
library(ggtext)
library(glue)
df <- tibble(Predictor = c("Blue","Red","Green","Yellow"),
Value = c(1,3,2,4),
Signif = c(0,1,0,1))
df <- df %>% mutate(Predictor = ifelse(Signif==1,
glue("<span style = 'font-weight:bold'>{Predictor}</span>"),
glue("<span style = 'font-weight:plain'>{Predictor}</span>"))
)
df %>%
arrange(desc(Value)) %>%
mutate(Predictor=factor(Predictor,
levels=df$Predictor[order(df$Value)])) %>%
ggplot(aes(x=Predictor, y=Value)) +
geom_segment(aes(xend=Predictor, yend=0)) +
theme(axis.text.x = element_markdown())
If instead I use element_text() in the last line, and skip the mutate to markdown step above:
df %>%
arrange(desc(Value)) %>%
mutate(Predictor=factor(Predictor,
levels=df$Predictor[order(df$Value)])) %>%
ggplot(aes(x=Predictor, y=Value)) +
geom_segment(aes(xend=Predictor, yend=0)) +
theme(axis.text.x = element_text(face = if_else(df$Signif==1, "bold", "plain")))
It bolds the 2nd and 4th axis label, which corresponds to the Signif equals 1 in the original df.
How can I get the correct axis text labels to appear in bold?
I would’ve expected your code to work honestly, but you can use <b> instead of <span style...>:
library(tidyverse)
library(ggtext)
library(glue)
df <- df %>%
mutate(Predictor = ifelse(Signif==1,
glue("<b>{Predictor}</b>"),
Predictor))
df %>%
arrange(desc(Value)) %>%
mutate(Predictor=factor(Predictor,
levels=df$Predictor[order(df$Value)])) %>%
ggplot(aes(x=Predictor, y=Value)) +
geom_segment(aes(xend=Predictor, yend=0)) +
theme(axis.text.x = element_markdown())
The issue is that ggtext currently supports only a limited set of CSS properties. From the docs
The CSS properties color, font-size, and font-family are currently supported.
But if you just want bold text, the answer by @zephryl is the way to go. As a second option, use markdown, i.e. wrap the text inside **:
library(ggplot2)
library(dplyr)
library(ggtext)
library(glue)
df <- df %>%
mutate(Predictor = ifelse(Signif == 1,
glue("**{Predictor}**"),
Predictor
))
df %>%
arrange(desc(Value)) %>%
mutate(Predictor = factor(Predictor,
levels = df$Predictor[order(df$Value)]
)) %>%
ggplot(aes(x = Predictor, y = Value)) +
geom_segment(aes(xend = Predictor, yend = 0)) +
theme(axis.text.x = element_markdown())
All nice answers. The deeper underlying issue however is (not exactly) "hidden" in the warning that comes when you use a vector in a theme element (see plot below).
Your original plot code would work if you would first rearrange your data frame (and re-assign it!) - see code below. I do not encourage that - ggtext::element_markdown was designed exactly with the idea in mind to avoid the use of vectors in theme.
library(tidyverse)
df <- tibble(Predictor = c("Blue","Red","Green","Yellow"),
Value = c(1,3,2,4),
Signif = c(0,1,0,1))
df <- arrange(df, Predictor)
df %>%
arrange(desc(Value)) %>%
mutate(Predictor=factor(Predictor,
levels=df$Predictor[order(df$Value)])) %>%
ggplot(aes(x=Predictor, y=Value)) +
geom_segment(aes(xend=Predictor, yend=0)) +
theme(axis.text.x = element_text(face = if_else(df$Signif==1, "bold", "plain")))
#> Warning: Vectorized input to `element_text()` is not officially supported.
#> ℹ Results may be unexpected or may change in future versions of ggplot2.
Created on 2023-01-29 with reprex v2.0.2
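Another way to avoid a vector in the theme is to leave `Predictor` untouched and attach the markdown only as axis labels. This is a sketch rather than anything taken from the answers above: `labels_md` is a hypothetical helper vector, and `reorder()` replaces the manual `factor()` call:

```r
library(ggplot2)
library(ggtext)

df <- data.frame(Predictor = c("Blue", "Red", "Green", "Yellow"),
                 Value = c(1, 3, 2, 4),
                 Signif = c(0, 1, 0, 1))

# Named vector: names are the original values, entries are the markdown labels.
labels_md <- setNames(
  ifelse(df$Signif == 1, paste0("**", df$Predictor, "**"), df$Predictor),
  df$Predictor
)

p_bold <- ggplot(df, aes(x = reorder(Predictor, Value), y = Value)) +
  geom_segment(aes(xend = reorder(Predictor, Value), yend = 0)) +
  scale_x_discrete(labels = labels_md) +  # labels are matched to breaks by name
  labs(x = "Predictor") +
  theme(axis.text.x = element_markdown())
```

Because the labels are matched by name, they stay attached to the right predictor whatever order the axis ends up in.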

Timeseries graphs of mean values of group in R

I am learning R and working with a data set in which the same group of columns is repeated 200 times.
I want to take the mean of each column and then group the means by variable, so that there are 200 mean values for each variable. I want to make a line chart like this of the mean values of each variable.
I am trying this code:
library(data.table)
library(tidyverse)
library(ggplot2)
library(viridisLite)
df <- read.table("H-W.csv", sep = ",")
df
dat %>% filter(Scenario != 'NULL') %>%
mutate("Scenario" = ifelse(Scenario == 'NULL2', "BASELINE", Scenario)) %>%
group_by(.dots = c("X.step.", "Scenario")) %>%
summarise('height.people' = mean(height),
'weight.people' = mean(weight),
"wealth.people" = mean(wealth)) %>%
pivot_longer(c('height.people', 'weight.people', 'wealth.people')) %>%
ggplot(aes(x = X.step., y = value, colour = Scenario)) +
geom_line(size = 1) + facet_grid(name~., scales = "free_y") + theme_classic() +
scale_colour_viridis_d() + scale_y_log10()
I found this error
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "NULL"
I think you might have the same problem as this...
Is your data in a data.frame or tibble?
Otherwise, if that doesn't work, try this...
filter is a function in stats and dplyr,
so you could try changing
dat %>% filter(Scenario != 'NULL') %>%
to
dat %>% dplyr::filter(Scenario != "NULL") %>%
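Note also that the question reads the file into `df` but pipes `dat`, so it is worth checking that the piped object really is a data frame. A small sketch, with a made-up data frame standing in for H-W.csv (its values are purely illustrative, and `.groups` assumes dplyr >= 1.0.0):

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the CSV file; column names follow the question.
dat <- data.frame(
  X.step.  = rep(1:2, each = 2),
  Scenario = c("NULL", "NULL2", "NULL", "NULL2"),
  height   = c(1, 2, 3, 4),
  weight   = c(10, 20, 30, 40)
)

# Fails early with a clear message if dat is NULL or not a data frame.
stopifnot(is.data.frame(dat))

means_long <- dat %>%
  dplyr::filter(Scenario != "NULL") %>%
  mutate(Scenario = ifelse(Scenario == "NULL2", "BASELINE", Scenario)) %>%
  group_by(X.step., Scenario) %>%
  summarise(height.people = mean(height),
            weight.people = mean(weight),
            .groups = "drop") %>%
  pivot_longer(c(height.people, weight.people))
```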

Plot percentages in R as blocks

I have the table to the left
table <- cbind(c("x1","x2", "x3"), c("0.4173","0.9211","0.0109"))
and am trying to make the plot to the right.
Are there any packages in R that can do what I'm trying to achieve?
A base R option would be to use barplot applied on a named vector
barplot(v1)
Or convert to two column data.frame with stack and use the formula method
barplot(values ~ ind, stack(v1))
Or we can use tidyverse with ggplot
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
enframe(v1, name = "id", value = 'block') %>%
mutate(non_block = 1 - block) %>%
pivot_longer(cols = -id) %>%
ggplot(aes(x = id, y = value, fill = name)) +
geom_col() +
coord_flip() +
theme_bw()
-output
data
v1 <- setNames(c(0.4173, 0.9211, 0.0109), paste0("x", 1:3))
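The `table` object in the question stores the percentages as strings, so they need converting before plotting. One way to get from that matrix to the named vector `v1` (the name `tbl` is used here only to avoid masking `base::table`):

```r
# The character matrix from the question, under a different name.
tbl <- cbind(c("x1", "x2", "x3"), c("0.4173", "0.9211", "0.0109"))

# Second column becomes the numeric values, first column the names.
v1 <- setNames(as.numeric(tbl[, 2]), tbl[, 1])
```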

Put dplyr & ggplot in Loop/Apply

I'm newish to R programming and am trying to standardise, or generalise, a piece of code so that I can apply it to different data exports of the same structure. The code is trivial, but I am having trouble getting it to loop:
Here is my code:
plot <- data %>%
group_by(Age, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
I want to generalise the code so that I can switch out 'Age' w/ variables I put into a list. Here is my amateur code:
cols <- c(data$Col1, data$Col2) #Im pretty sure this is wrong
for (i in cols) {
plot <- data %>%
group_by(i, ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
ggplot(aes(
x = AgeGroup,
y = Rev,
fill = AgeGroup
)) +
geom_col(alpha = 0.9) +
theme_minimal()
}
And this doesn't work. The datasets I will be receiving will have the same variables, just different observations and so standardising this process will be a lifesaver.
Thanks in advance.
You were probably trying to do :
library(dplyr)
library(ggplot2)
library(rlang)
cols <- c('col1', 'col2')
plot_list <- lapply(cols, function(i)
data %>%
group_by(!!sym(i), ID) %>%
summarise(Rev = sum(TotalRevenue)) %>%
# plot the column being looped over; AgeGroup is not kept by summarise()
ggplot(aes(x = !!sym(i), y = Rev, fill = !!sym(i))) +
geom_col(alpha = 0.9) + theme_minimal())
This returns a list of plots, which can be accessed as plot_list[[1]], plot_list[[2]], etc. Also look into facets to combine multiple plots.
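For a version that can be run end to end, here is a sketch with made-up data. The columns `AgeGroup` and `Region` and all values are illustrative assumptions, and the `.data` pronoun is used in place of `!!sym()` (assumes dplyr >= 1.0.0 and ggplot2 >= 3.0.0):

```r
library(dplyr)
library(ggplot2)

# Toy export; column names and values are invented for illustration.
data <- tibble(
  AgeGroup     = c("18-30", "18-30", "31-50", "31-50"),
  Region       = c("N", "S", "N", "S"),
  ID           = 1:4,
  TotalRevenue = c(10, 20, 30, 40)
)

cols <- c("AgeGroup", "Region")

plot_list <- lapply(cols, function(col) {
  data %>%
    group_by(across(all_of(c(col, "ID")))) %>%
    summarise(Rev = sum(TotalRevenue), .groups = "drop") %>%
    ggplot(aes(x = .data[[col]], y = Rev, fill = .data[[col]])) +
    geom_col(alpha = 0.9) +
    labs(x = col, fill = col) +
    theme_minimal()
})
names(plot_list) <- cols  # allows plot_list$Region as well as plot_list[[2]]
```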

Pass column names to a function

How can I turn this ggplot() call into a function? I can't figure out how to get R to recognize the column names I want to pass to the function. I've come across several similar sounding questions, but I've not had success adapting ideas. See here for substitute().
# setup
library(dplyr)
library(ggplot2)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# working example
ggplot(dat %>% group_by(pairs) %>%
mutate(slope = (value[t==2] - value[t==1])/(2-1)),
aes(t, value, group=pairs, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
# attempt at turning into a function
plotFun <- function(df, groupBy, dv, time) {
groupBy2 <- substitute(groupBy)
dv2 <- substitute(dv)
time2 <- substitute(time)
ggplot(df %>% group_by(groupBy2) %>%
mutate(slope = (dv2[time2==2] - dv2[time2==1])/(2-1)),
aes(time2, dv2, group=groupBy2, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# error time
plotFun(dat, pairs, value, t)
Update
I took @joran's advice to look at this answer, and here's what I came up with:
library(dplyr)
library(ggplot2)
library(lazyeval)
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes(time, dv, group=groupBy, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
plotFun(dat, "pairs", "value", "t")
The code runs but the plot is not correct:
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Here's the working solution informed by all of the commenters:
# setup
library(dplyr)
library(ggplot2)
library(lazyeval)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# function
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes_string(time, dv, group = groupBy,
colour = 'slope > 0')) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# plot
plotFun(dat, "pairs", "value", "t")
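The underscore verbs (`group_by_`, `mutate_`), `aes_string()` and the `fun.y` argument have since been deprecated. A sketch of how the same function might look with current tidy evaluation, assuming dplyr >= 1.0.0 and ggplot2 >= 3.3.0; `plot_fun` is a hypothetical name and the columns are passed unquoted:

```r
library(dplyr)
library(ggplot2)

set.seed(205)
dat <- data.frame(t = rep(1:2, each = 10),
                  pairs = rep(1:10, 2),
                  value = rnorm(20))

# Hypothetical modern counterpart of plotFun(); columns are passed unquoted.
plot_fun <- function(df, group_col, dv, time) {
  df %>%
    group_by({{ group_col }}) %>%
    mutate(slope = ({{ dv }}[{{ time }} == 2] - {{ dv }}[{{ time }} == 1]) / (2 - 1)) %>%
    ungroup() %>%
    ggplot(aes({{ time }}, {{ dv }}, group = {{ group_col }}, colour = slope > 0)) +
    geom_point() +
    geom_line() +
    stat_summary(fun = mean, geom = "line", lwd = 2, aes(group = 1))
}

p_slope <- plot_fun(dat, pairs, value, t)
```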
