How can I turn this ggplot() call into a function? I can't figure out how to get R to recognize the column names I want to pass to the function. I've come across several similar sounding questions, but I've not had success adapting ideas. See here for substitute().
# setup
library(dplyr)
library(ggplot2)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# working example
ggplot(dat %>% group_by(pairs) %>%
mutate(slope = (value[t==2] - value[t==1])/(2-1)),
aes(t, value, group=pairs, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
# attempt at turning into a function
plotFun <- function(df, groupBy, dv, time) {
groupBy2 <- substitute(groupBy)
dv2 <- substitute(dv)
time2 <- substitute(time)
ggplot(df %>% group_by(groupBy2) %>%
mutate(slope = (dv2[time2==2] - dv2[time2==1])/(2-1)),
aes(time2, dv2, group=groupBy2, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# error time
plotFun(dat, pairs, value, t)
Update
I took @joran's advice to look at this answer, and here's what I came up with:
library(dplyr)
library(ggplot2)
library(lazyeval)
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes(time, dv, group=groupBy, colour=slope > 0)) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
plotFun(dat, "pairs", "value", "t")
The code runs but the plot is not correct:
geom_path: Each group consists of only one observation. Do you need to
adjust the group aesthetic?
Here's the working solution informed by all of the commenters:
# setup
library(dplyr)
library(ggplot2)
library(lazyeval)
set.seed(205)
dat = data.frame(t=rep(1:2, each=10),
pairs=rep(1:10,2),
value=rnorm(20))
# function
plotFun <- function(df, groupBy, dv, time) {
ggplot(df %>% group_by_(groupBy) %>%
mutate_(slope = interp(~(dv2[time2==2] - dv2[time2==1])/(2-1),
dv2=as.name(dv),
time2=as.name(time))),
aes_string(time, dv, group = groupBy,
colour = 'slope > 0')) +
geom_point() +
geom_line() +
stat_summary(fun.y=mean,geom="line",lwd=2,aes(group=1))
}
# plot
plotFun(dat, "pairs", "value", "t")
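On current package versions, group_by_(), mutate_() and aes_string() are deprecated. Below is a minimal sketch of the same idea using the embrace operator {{ }}, assuming rlang >= 0.4.0 and ggplot2 >= 3.4.0 (needed for the fun and linewidth arguments). The name plot_fun_tidy is made up for illustration and is not part of the original solution; the columns are passed unquoted.

```r
library(dplyr)
library(ggplot2)

set.seed(205)
dat <- data.frame(t = rep(1:2, each = 10),
                  pairs = rep(1:10, 2),
                  value = rnorm(20))

# Hypothetical rewrite: columns are passed unquoted and forwarded with {{ }}
plot_fun_tidy <- function(df, groupBy, dv, time) {
  df %>%
    group_by({{ groupBy }}) %>%
    # per-group slope between time 1 and time 2
    mutate(slope = ({{ dv }}[{{ time }} == 2] - {{ dv }}[{{ time }} == 1]) / (2 - 1)) %>%
    ungroup() %>%
    ggplot(aes({{ time }}, {{ dv }}, group = {{ groupBy }}, colour = slope > 0)) +
    geom_point() +
    geom_line() +
    stat_summary(fun = mean, geom = "line", linewidth = 2, aes(group = 1))
}

p <- plot_fun_tidy(dat, pairs, value, t)
```

Calling print(p) draws the plot. Because {{ }} is understood by both dplyr verbs and aes(), no string-to-symbol conversion is needed in this version.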
Related
Here's a reproducible example that illustrates the obstacle I'm facing.
library(tidyverse)
co2_list <- CO2 %>%
group_split(Type)
reprex_fun <- function(x){
x %>%
ggplot(aes(conc, uptake)) +
geom_point() +
facet_wrap(~Plant, ncol = 2)
}
lapply(co2_list, reprex_fun)
Since the data frames in the list are split by the Type value, how can I add the corresponding type as a title to each of the plots I just made?
You can also try labs, similar to ggtitle:
#Data
data("CO2")
#Plot
co2_list <- CO2 %>%
group_split(Type)
#Function
reprex_fun <- function(x){
x %>%
ggplot(aes(conc, uptake)) +
geom_point() +
labs(title = unique(x$Type))+
facet_wrap(~Plant, ncol = 2)
}
#Plots
lapply(co2_list, reprex_fun)
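Another option, sketched here as an assumption rather than as part of the answer above, is to split with base split() so that the list carries the Type levels as names, and then pass each name in as the title. The helper name titled_plot is hypothetical.

```r
library(ggplot2)

# split() keeps the Type levels as list names, so each title is available by name
co2_by_type <- split(as.data.frame(CO2), CO2$Type)

# Hypothetical helper: takes one data frame plus the title to show
titled_plot <- function(d, type_name) {
  ggplot(d, aes(conc, uptake)) +
    geom_point() +
    facet_wrap(~Plant, ncol = 2) +
    ggtitle(type_name)
}

# Map() walks the data frames and their names in parallel
plots <- Map(titled_plot, co2_by_type, names(co2_by_type))
```

The resulting list is named by Type, so plots$Quebec and plots$Mississippi can be printed individually.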
I would like to sort the panels of a ggplot facet_wrap by color.
For example, in this demo code, the color corresponds to groups A, B, C. I am looking to have all the red plots next to each other, and same for the blue and green plots.
I tried sorting my data by group but ggplot seems to switch the order when plotting.
library(tidyverse)
set.seed(42)
# Generate example data frame
id <- 1:15
data <- map(id, ~rnorm(10))
date <- map(id, ~1:10)
group <- map_chr(id, ~sample(c('a','b','c'), size=1))
df <- tibble(id=id, data=data, date=date, group=group) %>% unnest(cols = c(data, date))
# Generate plot
df %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ id)
This could help:
library(tidyverse)
set.seed(42)
# Generate example data frame
id <- 1:15
data <- map(id, ~rnorm(10))
date <- map(id, ~1:10)
group <- map_chr(id, ~sample(c('a','b','c'), size=1))
df <- tibble(id=id, data=data, date=date, group=group) %>% unnest(cols = c(data, date))
df2 <- df %>% mutate(id=factor(id))%>%
group_by(group) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(id = fct_reorder(id, N))
# Generate plot
df2 %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ id)
This is one way, although each panel strip then shows both the group and the id, so the double title would still need to be removed:
df %>%
arrange(group) %>%
ggplot(mapping = aes(x=date, y=data, color=group)) +
geom_line() +
geom_point() +
facet_wrap(~ group + id)
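facet_wrap() lays panels out in the order of the factor levels of the faceting variable, which is why sorting the rows with arrange() has no effect. A minimal sketch of setting the levels of id explicitly by group follows. It uses a small stand-in data frame built without purrr, and the names id_groups, panel_order and df_ordered are made up for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Small stand-in for the question's data: 15 ids, each assigned one group
id_groups <- data.frame(id = 1:15,
                        group = sample(c("a", "b", "c"), 15, replace = TRUE))
df <- id_groups[rep(seq_len(nrow(id_groups)), each = 10), ]
df$date <- rep(1:10, times = 15)
df$data <- rnorm(nrow(df))

# Order the ids by group first, then by id, and use that as the factor levels
panel_order <- id_groups %>% arrange(group, id) %>% pull(id)
df_ordered <- df %>% mutate(id = factor(id, levels = panel_order))

p <- ggplot(df_ordered, aes(x = date, y = data, color = group)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ id)
```

Unlike ordering by group size, this keeps groups contiguous even when two groups contain the same number of rows, and the strips show only the id.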
I have a dataset similar to this:
library(ggplot2)
data(economics_long)
economics_long$date2 <- as.numeric(economics_long$date) + 915
ggplot(economics_long, aes(date2, value01, colour = variable)) +
geom_line()
Which gives the following plot:
Now I would like to normalize it to the start value of the green line (or the mean), so all variables start at the same point of the Y axes. Similar to this:
Thanks for any help.
You could subtract the starting value of each vector depending on variable-value using by().
library(ggplot2)
l <- by(economics_long, economics_long$variable, function(x)
within(x, value01.n <- value01 - value01[1]))
dat <- do.call(rbind, l)
ggplot(dat, aes(date2, value01.n, colour = variable)) +
geom_line()
Use group_by() and mutate() to shift each variable by its initial y-value. Because data(economics_long) reloads the original dataset, date2 is recreated inside the pipeline.
library(tidyverse)
data(economics_long)
economics_long %>%
mutate(date2 = as.numeric(date) + 915) %>%
group_by(variable) %>%
mutate(value_shifted = value01 - value01[1]) %>%
ungroup() %>%
ggplot(aes(date2, value_shifted, colour = variable)) +
geom_line()
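The question also mentions normalizing to the mean instead of the start value. A minimal sketch of that variant follows; the column name value_centered is made up for illustration, and date2 is rebuilt from date as in the question.

```r
library(dplyr)
library(ggplot2)

centered <- economics_long %>%
  mutate(date2 = as.numeric(date) + 915) %>%
  group_by(variable) %>%
  # subtract each variable's own mean so every series is centered on zero
  mutate(value_centered = value01 - mean(value01)) %>%
  ungroup()

p <- ggplot(centered, aes(date2, value_centered, colour = variable)) +
  geom_line()
```

With this version the lines no longer share a starting point; they share a mean of zero, which may or may not be what the target figure shows.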
How would I be able to do a line showing the median of the sum?
What I currently have
dataset$`Created Date`<- gsub("T.*","",dataset$`Created Date`)
dataset$`Created Date`<- ymd(strptime(dataset$`Created Date`, format="%Y-%m-%d"))
names(dataset) <- gsub(" ","_",names(dataset)) #rename column to remove space
dfcount <- data.frame(count(dataset, `Created_Date`)) #create dataframe
dfcount$Created_Date <- as.POSIXlt(dfcount$Created_Date) #Convert to POSIX for weekdayfilter
Monthlywithavg <- ggplot(dfcount,aes(Month, n))+
stat_summary(fun.y = sum, geom = "line") +
scale_x_date(labels = date_format("%Y-%m"))+
stat_summary(fun.y = mean, geom = "line")
Monthlywithavg
Let me know if there's anything else I should change too.
Thanks!
Here is an alternative to CPak's answer; it is essentially the same.
library(ggplot2)
library(plyr)
agg = plyr::ddply(mtcars,'cyl',summarize,mpg = sum(mpg) )
ggplot(mtcars, aes(x=cyl, y=mpg)) +
stat_summary(fun.y=sum, geom="line") +
geom_hline(data = agg,aes(yintercept = median(mpg)),color="red")
Hopefully someone has a better answer but I think you'll have to calculate the median of the sum yourself. See this reproducible example.
library(ggplot2)
library(dplyr)
library(magrittr)
median_of_sum <- mtcars %>%
group_by(cyl) %>%
summarise(sum = sum(mpg)) %>%
ungroup() %>%
summarise(median = median(sum))
ggplot(mtcars, aes(x=cyl, y=mpg)) +
stat_summary(fun.y=sum, geom="line") +
geom_hline(data=median_of_sum, aes(yintercept=median), color="red")
I have read http://dplyr.tidyverse.org/articles/programming.html about non standard evaluation in dplyr but still can't get things to work.
plot_column <- "columnA"
raw_data %>%
group_by(.dots = plot_column) %>%
summarise (percentage = mean(columnB)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
# mutate(!!plot_column := factor(!!plot_column, !!plot_column))%>%
ggplot() + aes_string(x=plot_column, y="percentage") +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
This works fine when the mutate statement is disabled. However, when I enable it to order the bars by height, only a single bar is returned.
How can I convert the statement above into a function (or make it use a variable) and still plot multiple bars ordered by their size?
An example Dataset could be:
columnA,columnB
a, 1
a, 0.4
a, 0.3
b, 0.5
edit
a sample:
mtcars %>%
group_by(mpg) %>%
summarise (mean_col = mean(cyl)) %>%
filter(mean_col > 0) %>%
arrange(mean_col) %>%
mutate(mpg := factor(mpg, mpg))%>%
ggplot() + aes(x=mpg, y=mean_col) +
geom_bar(stat="identity") +
coord_flip()
will output an ordered bar chart.
How can I wrap this into a function where the column can be replaced and I get multiple bars?
This works with dplyr 0.7.0 and ggplot 2.2.1:
rm(list = ls())
library(ggplot2)
library(dplyr)
raw_data <- tibble(columnA = c("a", "a", "b", "b"), columnB = c(1, 0.4, 0.3, 0.5))
plot_col <- function(df, plot_column, val_column){
pc <- enquo(plot_column)
vc <- enquo(val_column)
pc_name <- quo_name(pc) # generate a name from the enquoted statement!
df <- df %>%
group_by(!!pc) %>%
summarise (percentage = mean(!!vc)) %>%
filter(percentage > 0) %>%
arrange(percentage) %>%
mutate(!!pc_name := factor(!!pc, !!pc)) # insert pc_name here!
ggplot(df) + aes_(y = ~percentage, x = substitute(plot_column)) +
geom_bar(stat="identity", width = 0.5) +
coord_flip()
}
plot_col(raw_data, columnA, columnB)
plot_col(mtcars, mpg, cyl)
The problem I ran into was that ggplot2 and dplyr use different kinds of non-standard evaluation. I found the answer in this question: Creating a function using ggplot2.
EDIT: parameterized the value column (e.g. columnB/cyl) and added mtcars example.
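On newer versions than the ones named above (an assumption: rlang >= 0.4.0 and ggplot2 >= 3.0.0), the same function can be sketched with the embrace operator {{ }}, which works in both dplyr verbs and aes(). Ordering the bars is done here with base reorder() inside aes() rather than by rebuilding the factor in mutate(). The name plot_col_tidy is made up for illustration.

```r
library(dplyr)
library(ggplot2)

raw_data <- tibble(columnA = c("a", "a", "b", "b"), columnB = c(1, 0.4, 0.3, 0.5))

# Hypothetical rewrite of plot_col(): columns are passed unquoted
plot_col_tidy <- function(df, plot_column, val_column) {
  df %>%
    group_by({{ plot_column }}) %>%
    summarise(percentage = mean({{ val_column }})) %>%
    filter(percentage > 0) %>%
    # reorder() sorts the categories by their summarised value
    ggplot(aes(x = reorder({{ plot_column }}, percentage), y = percentage)) +
    geom_col(width = 0.5) +
    # restore a readable axis title from the column name that was passed in
    labs(x = rlang::as_label(rlang::enquo(plot_column))) +
    coord_flip()
}

p1 <- plot_col_tidy(raw_data, columnA, columnB)
p2 <- plot_col_tidy(mtcars, mpg, cyl)
```

Printing p1 or p2 draws one bar per category, ordered by the summarised value.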