How to add error bars to barplot in R [closed] - r

I am new to R and have produced a graph, but I want to add error bars as simply as possible and I do not know how.
ana <- read.table(text="Infiltration Grazing Burn
3301.145496 G S
8165.771889 U S
9937.833576 G L
11576.5892 U L
32739.07643 G N
25923.84328 U N", header=TRUE)
That is my data and below is the code I have used.
barplot(xtabs(ana$Infiltration ~ ana$Grazing + ana$Burn),
        beside = TRUE,
        col = c("tan4", "darkgreen"),
        xlab = "Burn Treatment",
        names = c("Long Rotation", "Burned 1954", "Short Rotation"),
        ylab = "Mean Infiltration Rate (mm/h)",
        legend = c("Grazed", "Ungrazed"),
        args.legend = list(title = "Graze Treatment", x = "topright", cex = .7),
        ylim = c(0, 35000))
As I am new to R, please explain as simply as possible!

This is a basic ggplot2 implementation of what you are after:
library(dplyr)
library(ggplot2)
library(magrittr)
## Read in the data from the question
df <- read.table(text = "Infiltration Grazing Burn
3301.145496 G S
8165.771889 U S
9937.833576 G L
11576.5892 U L
32739.07643 G N
25923.84328 U N",
header = TRUE)
## Add placeholder lower and upper bounds (+/- 10%) and recode the variable names
df <- df %>%
mutate(ll = Infiltration * 0.9,
hh = Infiltration * 1.1) %>%
mutate(Grazing = Grazing %>%
recode(G = "Grazed", U = "Ungrazed"),
Burn = Burn %>%
recode(S = "Short Rotation", L = "Long Rotation", N = "Burned 1954")) %>%
rename(`Graze Treatment` = Grazing)
## Basic bar plot with error bars
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`)) +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(aes(ymin = ll, ymax = hh), position = "dodge") +
theme_minimal() +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
In general, bar plots with error bars are a bit hard to interpret. It might be better to use something like this:
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_point(stat = "identity", position = position_dodge(width = 1), size = 3) +
geom_linerange(aes(ymin = ll, ymax = hh), position = position_dodge(width = 1),
alpha = 0.4, size = 3) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
Note: If you have the raw data from which you generated your confidence intervals, you might be better served by a boxplot (with geom_boxplot), a violin plot (with geom_violin) or even a ridge plot (ggridges::geom_density_ridges).
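For the base-graphics barplot() used in the question, the same idea can be sketched as follows. barplot() returns the x positions of the bar midpoints, and arrows() with angle = 90 and code = 3 draws flat-capped error bars at those positions. The +/- 10% bounds are the same placeholder values as above, not real uncertainty estimates, and the names heights, lower, upper and mids are only illustrative.

```r
ana <- read.table(text = "Infiltration Grazing Burn
3301.145496 G S
8165.771889 U S
9937.833576 G L
11576.5892 U L
32739.07643 G N
25923.84328 U N", header = TRUE)

# 2 x 3 table of bar heights: rows = Grazing (G, U), columns = Burn (L, N, S)
heights <- xtabs(Infiltration ~ Grazing + Burn, data = ana)

# Placeholder bounds (assumption): replace with your own standard errors or CIs
lower <- heights * 0.9
upper <- heights * 1.1

# barplot() returns the midpoints of the bars, in the same layout as `heights`
mids <- barplot(heights,
                beside = TRUE,
                col = c("tan4", "darkgreen"),
                names.arg = c("Long Rotation", "Burned 1954", "Short Rotation"),
                xlab = "Burn Treatment",
                ylab = "Mean Infiltration Rate (mm/h)",
                legend.text = c("Grazed", "Ungrazed"),
                ylim = c(0, 40000))

# Vertical segments with flat caps at both ends act as error bars
arrows(x0 = as.vector(mids), y0 = as.vector(lower),
       x1 = as.vector(mids), y1 = as.vector(upper),
       angle = 90, code = 3, length = 0.05)
```

The ylim is set a little above the largest upper bound so that no error bar is clipped.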
Some possible extensions
If the underlying data is available we can do much better. There are several options; which one you pick comes down to your use case and the size of your data.
First, let's generate some sample data.
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
library(purrr) # provides map(), used below
## Read in the data from the question
df <- read.table(text = "Infiltration Grazing Burn
3301.145496 G S
8165.771889 U S
9937.833576 G L
11576.5892 U L
32739.07643 G N
25923.84328 U N",
header = TRUE)
## Generate and clean some sample data
df <- df %>%
as_tibble %>%
mutate(Infiltration = map(Infiltration, function(x) {
tibble(Infiltration = rnorm(n = 1000,
mean = x,
sd = 0.1 * x),
id = 1:1000)
})) %>%
unnest(cols = Infiltration) %>%
mutate(Grazing = Grazing %>%
recode(G = "Grazed", U = "Ungrazed"),
Burn = Burn %>%
recode(S = "Short Rotation", L = "Long Rotation", N = "Burned 1954")) %>%
rename(`Graze Treatment` = Grazing)
Now let's make some plots.
The underlying data with jitter:
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_jitter(position = position_jitterdodge(), alpha = 0.1) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
Boxplots
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_boxplot(alpha = 0.4) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
Violin plots
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_violin(draw_quantiles = c(0.25, 0.5, 0.75), alpha = 0.4) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
Points with mean, 1 and 2 standard deviations
df %>%
group_by(`Graze Treatment`, Burn) %>%
summarise(
mean = mean(Infiltration),
sd = sd(Infiltration),
lll = mean - 2 * sd,
ll = mean - sd,
hh = mean + sd,
hhh = mean + 2*sd) %>%
ggplot(aes(x = Burn, y = mean, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_point(stat = "identity", position = position_dodge(width = 1), size = 3) +
geom_linerange(aes(ymin = lll, ymax = hhh), position = position_dodge(width = 1),
alpha = 0.4, size = 3) +
geom_linerange(aes(ymin = ll, ymax = hh), position = position_dodge(width = 1),
alpha = 0.6, size = 3) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
x = "Burn Treatment",
caption = "Errorbars represent ....")
Both jittered points and violin plots
df %>%
ggplot(aes(x = Burn, y = Infiltration, fill = `Graze Treatment`, col = `Graze Treatment`)) +
geom_violin(draw_quantiles = c(0.25, 0.5, 0.75),
position = position_dodge(width = 1),
aes(fill = NULL)) +
geom_jitter(position = position_jitterdodge(dodge.width = 1), alpha = 0.01) +
theme_minimal() +
expand_limits(y = 0) +
labs(y = "Mean Infiltration Rate (mm/h)",
caption = "Errorbars represent ....")
The same works for any other summary plot overlaid with the raw data. This approach breaks down when you have a lot of data, in which case one of the summary plots by itself will be better.
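If a bar plot with error bars is still what you want once the raw data is available, ggplot2 can compute the summary for you. The sketch below uses stat_summary() with mean_se(), which returns the mean plus and minus one standard error. The data frame raw is simulated purely for illustration and does not come from the question.

```r
library(ggplot2)

set.seed(1)

# Hypothetical raw data: 20 simulated measurements per treatment combination
raw <- expand.grid(rep = 1:20,
                   Grazing = c("Grazed", "Ungrazed"),
                   Burn = c("Short Rotation", "Long Rotation", "Burned 1954"))
raw$Infiltration <- rnorm(nrow(raw), mean = 10000, sd = 2000)

p <- ggplot(raw, aes(x = Burn, y = Infiltration, fill = Grazing)) +
  # Bar height = group mean
  stat_summary(fun = mean, geom = "col",
               position = position_dodge(width = 0.9)) +
  # Error bars = mean +/- 1 standard error, computed per group by mean_se()
  stat_summary(fun.data = mean_se, geom = "errorbar",
               position = position_dodge(width = 0.9), width = 0.2) +
  theme_minimal() +
  labs(y = "Mean Infiltration Rate (mm/h)",
       caption = "Error bars: mean +/- 1 standard error")
p
```

Using the same dodge width in both layers keeps each error bar centered on its bar.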

Related

geom_point shapes not the same size when aes size are the same value

I have a population and two samples. One sample is a random sample and the other is proportional to the population. I want to create plots that show how proportional the samples are by some factors, in this case year and gear. I want a plot in which the circle and the cross are the same size when the sample is proportional to the population for the given factors. If the cross is larger than the circle, the sample over-represents that combination relative to the population, and vice versa. See below for my code.
library(ggplot2)
library(dplyr)
library(tidyr)
n_y = 5
min_year = 1900
years = min_year:(min_year + n_y - 1)
fixed_gear_catch = rlnorm(n = n_y, meanlog = log(2500), 0.5)
trawl_gear_catch = rlnorm(n = n_y, meanlog = log(1200), 0.3)
fixed_gear_obs = rlnorm(n = n_y, meanlog = log(250), 0.5)
trawl_gear_obs = rlnorm(n = n_y, meanlog = log(120), 0.3)
population_df = data.frame(fixed_gear = fixed_gear_catch,
trawl_gear = trawl_gear_catch,
years = years)
sample_data = data.frame(fixed_gear = fixed_gear_obs,
trawl_gear = trawl_gear_obs,
years = years)
proportional_sample_data = data.frame(fixed_gear = fixed_gear_catch * 0.2,
trawl_gear = trawl_gear_catch * 0.2,
years = years)
population_df = population_df %>% gather("gear", "catch", -years)
sample_data = sample_data %>% gather("gear", "catch", -years)
proportional_sample_data = proportional_sample_data %>% gather("gear", "catch", -years)
## give type and merge
population_df$type = "Catch"
sample_data$type = "Observed"
proportional_sample_data$type = "Observed"
full_df = rbind(population_df, sample_data)
full_proportional_df = rbind(population_df, proportional_sample_data)
## convert to proportions
full_df = full_df %>% group_by(years, type) %>% mutate(percentage = catch / sum(catch) * 100)
full_proportional_df = full_proportional_df %>% group_by(years, type) %>% mutate(percentage = catch / sum(catch) * 100)
## check the perfect proportions are what we think they should be
full_proportional_df %>% pivot_wider(id_cols = years, values_from = percentage, names_from = c(gear, type))
full_df %>% pivot_wider(id_cols = years, values_from = percentage, names_from = c(gear, type))
## plot with circle and crosses
shpe_manual = c("Catch" = 1, "Observed" = 3)
col_manual = c("Catch" = "red", "Observed" = "blue")
ggplot(full_df, aes(x = gear, y = years)) +
geom_point(aes(shape = type, col = type, size = percentage)) +
scale_shape_manual(values=shpe_manual) +
scale_size_continuous(limits = c(0,100), range = c(0,15)) +
scale_color_manual(values = col_manual)
## this should have circles and crosses of exactly the same size, but doesn't
ggplot(full_proportional_df, aes(x = gear, y = years)) +
geom_point(aes(shape = type, col = type, size = percentage)) +
scale_shape_manual(values=shpe_manual) +
scale_size_continuous(limits = c(0,100), range = c(0,15)) +
scale_color_manual(values = col_manual)
The cross is naturally 2x as tall/wide, so I think this fixes it visually:
ggplot(full_df, aes(x = gear, y = years)) +
geom_point(aes(shape = type, col = type, size = percentage * if_else(type == "Observed", 0.5, 1))) +
scale_shape_manual(values=shpe_manual) +
# scale_size_continuous(limits = c(0,100), range = c(0,15)) +
scale_color_manual(values = col_manual) +
scale_size_area(max_size = 15)
As a check:
ggplot(full_proportional_df, aes(x = gear, y = years)) +
geom_point(aes(shape = type, col = type,
size = percentage * if_else(type == "Observed", 0.5, 1))) +
scale_shape_manual(values=shpe_manual) +
scale_size_continuous(limits = c(0,100), range = c(0,15)) +
scale_color_manual(values = col_manual)

Missing Confidence Intervals on a Geom_smooth function with double y graph

I am struggling to make a graph with a double y axis. With loess, it comes out without confidence intervals, and I cannot work out why.
My code is below:
library(ggplot2)
library(readxl)
Dati <- data.frame("r" = c(0.99, 1.42, 2.10, 3.32, 6.09), "Vix" = c(16500, 19200, 22500, 24000, 26000), "OT" = c(23.5, 19, 11, 9, 7), "ref" = c("PU 178", "PU 178", "PU 178", "PU 178", "PU 178"))
attach(Dati)
scaleFactor <- max(Vix) / max(OT)
Graph <- ggplot(Dati, aes(x= r)) +
geom_point(aes(y= Vix, col=paste0("Vix ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix, col = paste0("Vix ", ref)), method="loess", level=0.55, se = TRUE) +
geom_point(aes(y= OT * scaleFactor, col=paste0("OT ", ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y=OT * scaleFactor, col = paste0("OT ", ref)), method="loess", level=0.55, se = TRUE) +
scale_color_manual(values=c('#644196', '#f92410', '#bba6d9', '#fca49c'),
name = "") +
theme(legend.justification = "top") +
scale_y_continuous(name="Viscosity at 10rpm (mPa s)", sec.axis=sec_axis(~./scaleFactor, name="open time (sec)")) +
theme(
axis.title.y.left=element_text(color='#f92410'),
axis.text.y.left=element_text(color='#f92410'),
axis.title.y.right=element_text(color='#644196'),
axis.text.y.right=element_text(color='#644196'),
legend.position = "none"
) +
scale_x_continuous(name="ratio A2333/AD5027")
Graph
The result has no CI at all for either line. I thought the specified level might be too big or too small, but changing it still gives no CIs. I also thought 5 values might be too few, but I have made graphs with 5 values in the past without problems.
Does anybody know if I made a mistake?
Your span is too small, so there are too few points to estimate the confidence interval. For example, if you set span = 1:
ggplot(Dati, aes(x= r)) +
geom_point(aes(y= Vix, col=paste0("Vix ",ref)),shape = 1, size = 3.5) +
geom_smooth(aes(y= Vix, col =paste0("Vix ",ref)), method="loess" ,span=1) +
geom_point(aes(y= OT * scaleFactor, col=paste0("OT ",ref)), shape = 1, size = 3.5) +
geom_smooth(aes(y=OT * scaleFactor, col =paste0("OT ",ref) ), method="loess",span=1) +
scale_color_manual(values=c('#644196', '#f92410', '#bba6d9', '#fca49c'),
name = "") +
theme(legend.justification = "top")
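One way to check this outside ggplot2 is to fit the loess model directly and look at the standard errors it returns, since the confidence band is drawn from them. The sketch below compares a span of 0.75 (the default used by geom_smooth) with span = 1 on the Vix values from the question. The helper loess_se is hypothetical and warnings are suppressed only to keep the output short. If the explanation above holds, the standard errors for the smaller span should not be usable while those for span = 1 should be.

```r
Dati <- data.frame(r = c(0.99, 1.42, 2.10, 3.32, 6.09),
                   Vix = c(16500, 19200, 22500, 24000, 26000))

# Illustrative helper: standard errors of the loess fit at the observed x values
loess_se <- function(span) {
  suppressWarnings({
    fit <- loess(Vix ~ r, data = Dati, span = span)
    predict(fit, se = TRUE)$se.fit
  })
}

se_default <- loess_se(0.75) # span used by geom_smooth(method = "loess") by default
se_wide <- loess_se(1)       # span suggested above

print(se_default)
print(se_wide)
```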
Loess is a bit of overkill here. You could consider other smoothers, and also pivot your data to long format to make the plot easier to code:
library(tidyr)
library(dplyr)
Dati %>% mutate(OT = OT*scaleFactor) %>%
pivot_longer(-c(r,ref)) %>%
mutate(name = paste0(name,ref)) %>%
ggplot(aes(x = r,y = value,col = name,fill = name)) +
geom_point(shape = 1, size = 3.5) +
geom_smooth(method="gam",formula = y ~ s(x,k=3),alpha=0.1) +
theme_bw()
Or polynomial of degree 2:
Dati %>% mutate(OT = OT*scaleFactor) %>%
pivot_longer(-c(r,ref)) %>%
mutate(name = paste0(name,ref)) %>%
ggplot(aes(x = r,y = value,col = name,fill = name)) +
geom_point(shape = 1, size = 3.5) +
geom_smooth(method="lm",formula = y ~ poly(x, 2),alpha=0.1) +
theme_bw()

How to adapt R code to make timescale on the X axis the same for each graph on this single output

I have created three change point graphs into a combined single output (as shown below), but X axis datetime points are different for each graph, which makes it difficult to draw a comparison. Please can you help adapt the R code (below the image) to make timescale on the X axis the same for each graph on this single output?
Graph
R code
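library(dplyr)       # %>% and filter()
library(lubridate)   # dmy_hm()
library(ggplot2)
library(ggrepel)     # geom_text_repel()
library(scales)      # date_breaks(), date_format()
library(changepoint) # cpt.mean(), cpts()
library(gridExtra)   # grid.arrange()
setwd("directory/path")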
sentimentposts <- read.csv("sentimentposts.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
sentimentposts$Datetime <- dmy_hm(sentimentposts$Datetime)
## 1st change point graph for Negative
r7 <- sentimentposts %>%
filter(Day == c(4,5,6), Cat == "Negative")
plotpos2 <- r7
plotpos2$label <- c(paste(round(plotpos2$Count,digits=2)))
binseg.meancpt = cpt.mean(rev(r7$Percent), method="BinSeg",Q=3)
meancpt.point = cpts(binseg.meancpt)
p5 <- ggplot(data=r7, aes(x=Datetime, y = Percent)) +
geom_line(col="blue") +
geom_vline(xintercept = rev(r7$Datetime)[meancpt.point], col="red",lwd=0.9,linetype="dotted") +
geom_text_repel(data = plotpos2, aes(label = label),
vjust = 0.2, hjust = 0.4, size = 2) +
scale_x_datetime(breaks = date_breaks("6 hours"), labels = date_format("%D %H:%M")) +
ylab("Posts")+
ggtitle("ChangePoint of mean with BinSeg: Negative")
## 2nd change point graph for Neutral
r8 <- sentimentposts %>%
filter(Day == c(4,5,6), Cat == "Neutral")
plotpos2 <- r8
plotpos2$label <- c(paste(round(plotpos2$Count,digits=2)))
binseg.meancpt = cpt.mean(r8$Percent, method="BinSeg",Q=3)
meancpt.point = cpts(binseg.meancpt)
p6 <- ggplot(data=r8, aes(x=Datetime, y = Percent)) +
geom_line(col="blue") +
geom_vline(xintercept = rev(r8$Datetime)[meancpt.point], col="red",lwd=0.9,linetype="dotted") +
geom_text_repel(data = plotpos2, aes(label = label),
vjust = 0.2, hjust = 0.4, size = 2) +
scale_x_datetime(breaks = date_breaks("6 hours"), labels = date_format("%D %H:%M")) +
ylab("Posts")+
ggtitle("ChangePoint of mean with BinSeg: Neutral")
## 3rd change point graph for Positive
r9 <- sentimentposts %>%
filter(Day == c(4,5,6), Cat == "Positive")
plotpos2 <- r9
plotpos2$label <- c(paste(round(plotpos2$Count,digits=2)))
binseg.meancpt = cpt.mean(r9$Percent, method="BinSeg",Q=2)
meancpt.point = cpts(binseg.meancpt)
p7 <- ggplot(data=r9, aes(x=Datetime, y = Percent)) +
geom_line(col="blue") +
geom_vline(xintercept = rev(r9$Datetime)[meancpt.point], col="red",lwd=0.9,linetype="dotted") +
geom_text_repel(data = plotpos2, aes(label = label),
vjust = 0.2, hjust = 0.4, size = 2) +
scale_x_datetime(breaks = date_breaks("6 hours"), labels = date_format("%D %H:%M")) +
ylab("Posts")+
ggtitle("ChangePoint of mean with BinSeg: Positive")
#Combine graphs into one single output
grid.arrange(p5,p6,p7, ncol = 1, nrow = 3)
Data-set is available via Github:
https://raw.githubusercontent.com/jcool12/RCode/master/changepointData.csv
If you add the line
+ coord_cartesian(xlim = as.POSIXct(c("2015-11-05 09:00", "2015-11-06 20:00")))
to each plot, all three plots will share the same x-axis range.
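Rather than hard-coding the limits, they can also be derived from the data. The following is a minimal sketch with two simulated series. The helper make_series and the data frames a and b are made up for illustration and stand in for r7, r8 and r9.

```r
library(ggplot2)

set.seed(1)

# Illustrative helper: n hourly observations following `start`
make_series <- function(start, n) {
  data.frame(Datetime = as.POSIXct(start, tz = "UTC") + 3600 * seq_len(n),
             Percent = runif(n))
}

a <- make_series("2015-11-05 09:00", 20)
b <- make_series("2015-11-05 15:00", 25)

# One common range covering every series
shared_xlim <- range(c(a$Datetime, b$Datetime))

# Apply the same limits to each plot so the time axes line up
p_a <- ggplot(a, aes(x = Datetime, y = Percent)) +
  geom_line(col = "blue") +
  coord_cartesian(xlim = shared_xlim)
p_b <- ggplot(b, aes(x = Datetime, y = Percent)) +
  geom_line(col = "blue") +
  coord_cartesian(xlim = shared_xlim)
```

coord_cartesian() only zooms the view, so no observations are dropped before the lines are drawn.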

Labelling outliers with ggplot

I am trying to label outliers with ggplot. Regarding my code, I have two questions:
Why does it not label outliers below 1.5*IQR?
Why does it not label outliers based on the group they are in but instead apparently refers to the overall mean of the data? I would like to label outliers for each box plot individually. I.e. the outliers for Country A in Wave 1 (of a survey), etc.
A sample of my code:
PERCENT <- rnorm(50, sd = 3)
WAVE <- sample(6, 50, replace = TRUE)
AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
COUNTRY <- rep(c("Country A", "Country B"), 25)
N <- rnorm(50, mean = 200, sd = 2)
df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)
ggplot(df, aes(x = factor(WAVE), y = PERCENT, fill = factor(COUNTRY))) +
geom_boxplot(alpha = 0.3) +
geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = position_dodge(width=0.75)) +
geom_text(aes(label = ifelse(PERCENT > 1.5*IQR(PERCENT)|PERCENT < -1.5*IQR(PERCENT), paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),'')), hjust = -.3, size = 3)
A picture of what I have so far:
I appreciate your help!
If you want the IQR to be calculated by country, you need to group the data. You could do it globally (i.e. before you send the data to ggplot) or locally in the layer.
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = as.factor(WAVE), y = PERCENT, fill = COUNTRY)) +
geom_boxplot(alpha = 0.3) +
geom_point(aes(color = AGE_GROUP, group = COUNTRY), position = position_dodge(width=0.75)) +
geom_text(aes(group = COUNTRY, label = ifelse(!between(PERCENT,-1.3*IQR(PERCENT), 1.3*IQR(PERCENT)),
paste(" ",COUNTRY, ",", AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),'')),
position = position_dodge(width=0.75),
hjust = "left", size = 3)
Adding the group aesthetic to geom_text and modifying the ifelse test should do what you want.
Setting group = interaction(WAVE, COUNTRY) will restrict the calculations to within each boxplot, and the outlier test needs to include a call to median(PERCENT).
library(ggplot2)
set.seed(42)
PERCENT <- rnorm(50, sd = 3)
WAVE <- sample(6, 50, replace = TRUE)
AGE_GROUP <- rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10)
COUNTRY <- rep(c("Country A", "Country B"), 25)
N <- rnorm(50, mean = 200, sd = 2)
df <- data.frame(PERCENT, WAVE, AGE_GROUP, COUNTRY, N)
ggplot(df) +
aes(x = factor(WAVE),
y = PERCENT,
fill = factor(COUNTRY)) +
geom_boxplot(alpha = 0.3) +
geom_point(aes(color = AGE_GROUP, group = factor(COUNTRY)), position = position_dodge(width=0.75)) +
geom_text(aes(group = interaction(WAVE, COUNTRY),
label = ifelse(test = PERCENT > median(PERCENT) + 1.5*IQR(PERCENT)|PERCENT < median(PERCENT) -1.5*IQR(PERCENT),
yes = paste(AGE_GROUP, ",", round(PERCENT, 1), "%, n =", round(N, 0)),
no = '')),
position = position_dodge(width = 0.75),
hjust = -.2,
size = 3)
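Note that the test above measures distance from the median, whereas geom_boxplot draws a point as an outlier when it lies more than 1.5 * IQR below the first quartile or above the third quartile. A sketch that flags points with that quartile-based rule, computed per WAVE and COUNTRY before plotting, might look like this. The column names lower_fence, upper_fence, is_outlier and label are made up for the example.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
df <- data.frame(
  PERCENT = rnorm(50, sd = 3),
  WAVE = sample(6, 50, replace = TRUE),
  AGE_GROUP = rep(c("21-30", "31-40", "41-50", "51-60", "61-70"), 10),
  COUNTRY = rep(c("Country A", "Country B"), 25),
  N = rnorm(50, mean = 200, sd = 2)
)

# Flag outliers within each box (one box per WAVE x COUNTRY combination)
labelled <- df %>%
  group_by(WAVE, COUNTRY) %>%
  mutate(
    lower_fence = quantile(PERCENT, 0.25) - 1.5 * IQR(PERCENT),
    upper_fence = quantile(PERCENT, 0.75) + 1.5 * IQR(PERCENT),
    is_outlier = PERCENT < lower_fence | PERCENT > upper_fence,
    # Empty label for ordinary points, so only outliers get text
    label = ifelse(is_outlier,
                   paste0(AGE_GROUP, ", ", round(PERCENT, 1), "%, n = ", round(N)),
                   "")
  ) %>%
  ungroup()

ggplot(labelled, aes(x = factor(WAVE), y = PERCENT, fill = COUNTRY)) +
  geom_boxplot(alpha = 0.3) +
  geom_text(aes(label = label, group = COUNTRY),
            position = position_dodge(width = 0.75),
            hjust = -0.2, size = 3)
```

Computing the flag in the data also makes it easy to inspect which rows were labelled, for example with filter(labelled, is_outlier).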

Create dynamic labels for geom_smooth lines

I have a changing df and I am grouping different values c.
With ggplot2 I plot them with the following code to get a scatterplot with multiple linear regression lines (geom_smooth):
ggplot(aes(x = a, y = b, group = c)) +
geom_point(shape = 1, aes(color = c), alpha = alpha) +
geom_smooth(method = "lm", aes(group = c, color = c), se = F)
Now I want to display on each geom_smooth line in the plot a label with the value of the group c.
This has to be dynamic, because I can not write new code when my df changes.
Example: my df looks like this
a b c
----------------
1.6 24 100
-1.4 43 50
1 28 100
4.3 11 50
-3.45 5.2 50
So in this case I would get one geom_smooth line per distinct value of c, each in a different color.
Now I simply want to add a text label "100" next to the geom_smooth line for the group c = 100, a label "50" next to the line for the group c = 50, and so on. As new groups are introduced in the df, new geom_smooth lines are plotted and need to be labeled.
The whole code for the plot:
ggplot(aes(x = a, y = b, group = c), data = df, na.rm = TRUE) +
geom_point(aes(color = GG, size = factor(c)), alpha=0.3) +
scale_x_continuous(limits = c(-200,2300))+
scale_y_continuous(limits = c(-1.8,1.5))+
geom_hline(yintercept=0, size=0.4, color="black") +
scale_color_distiller(palette="YlGnBu", na.value="white") +
geom_smooth(method = "lm", aes(group = factor(GG), color = GG), se = F) +
geom_label_repel(data = labelInfo, aes(x= max, y = predAtMax, label = label, color = label))
You can probably do it if you pick the location you want the lines labelled. Below, I set them to label at the far right end of each line, and used ggrepel to avoid overlapping labels:
library(ggplot2)
library(ggrepel)
library(dplyr)
set.seed(12345)
df <-
data.frame(
a = rnorm(100,2,0.5)
, b = rnorm(100, 20, 5)
, c = factor(sample(c(50,100,150), 100, TRUE))
)
labelInfo <-
split(df, df$c) %>%
lapply(function(x){
data.frame(
predAtMax = lm(b~a, data=x) %>%
predict(newdata = data.frame(a = max(x$a)))
, max = max(x$a)
)}) %>%
bind_rows
labelInfo$label = levels(df$c)
ggplot(
df
, aes(x = a, y = b, color = c)
) +
geom_point(shape = 1) +
geom_smooth(method = "lm", se = F) +
geom_label_repel(data = labelInfo
, aes(x= max
, y = predAtMax
, label = label
, color = label))
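The assignment labelInfo$label = levels(df$c) relies on the rows of labelInfo being in the same order as the factor levels. A slightly more defensive sketch, assuming the same simulated df as above, lets bind_rows() take each label from the name of the list element that split() created:

```r
library(dplyr)

set.seed(12345)
df <- data.frame(
  a = rnorm(100, 2, 0.5),
  b = rnorm(100, 20, 5),
  c = factor(sample(c(50, 100, 150), 100, TRUE))
)

labelInfo <- split(df, df$c) %>%
  lapply(function(x) {
    # Fit one line per group and predict at that group's largest x value
    fit <- lm(b ~ a, data = x)
    data.frame(
      max = max(x$a),
      predAtMax = unname(predict(fit, newdata = data.frame(a = max(x$a))))
    )
  }) %>%
  # .id stores the list names ("50", "100", "150") in a `label` column
  bind_rows(.id = "label")

labelInfo
```

Because the label travels with each row, new groups in df are picked up without any manual bookkeeping.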
This method might work for you. It uses ggplot_build to access the rightmost point in the actual geom_smooth lines to add a label by it. Below is an adaptation that uses Mark Peterson's example.
library(ggplot2)
library(ggrepel)
library(dplyr)
set.seed(12345)
df <-
data.frame(
a = rnorm(100,2,0.5)
, b = rnorm(100, 20, 5)
, c = factor(sample(c(50,100,150), 100, TRUE))
)
p <-
ggplot(df, aes(x = a, y = b, color = c)) +
geom_point(shape = 1) +
geom_smooth(method = "lm", se = F)
p.smoothedmaxes <-
ggplot_build(p)$data[[2]] %>%
group_by( group) %>%
filter( x == max(x))
p +
geom_text_repel( data = p.smoothedmaxes,
mapping = aes(x = x, y = y, label = round(y,2)),
col = p.smoothedmaxes$colour,
inherit.aes = FALSE)
This came up for me today and I landed on this solution, which passes a function to the layer via data = ~fn():
library(tidyverse)
library(broom)
mpg |>
ggplot(aes(x = displ, y = hwy, colour = class, label = class)) +
geom_count(alpha = 0.1) +
stat_smooth(alpha = 0.6, method = lm, geom = "line", se = FALSE) +
geom_text(
aes(y = .fitted), size = 3, hjust = 0, nudge_x = 0.1,
data = ~{
nest_by(.x, class) |>
summarize(broom::augment(lm(hwy ~ displ, data = data))) |>
slice_max(order_by = displ, n = 1)
}
) +
scale_x_continuous(expand = expansion(add = c(0, 1))) +
theme_minimal()
Or do it with a function:
#' #examples
#' last_lm_points(df = mpg, formula = hwy~displ, group = class)
last_lm_points <- function(df, formula, group) {
# df <- mpg; formula <- as.formula(hwy~displ); group <- sym("class");
x_arg <- formula[[3]]
df |>
nest_by({{group}}) |>
summarize(broom::augment(lm(formula, data = data))) |>
slice_max(order_by = get(x_arg), n = 1)
}
mpg |>
ggplot(aes(displ, hwy, colour = class, label = class)) +
geom_count(alpha = 0.1) +
stat_smooth(alpha = 0.6, method = lm, geom = "line", se = FALSE) +
geom_text(
aes(y = .fitted), size = 3, hjust = 0, nudge_x = 0.1,
data = ~last_lm_points(.x, hwy~displ, class)
) +
scale_x_continuous(expand = expansion(add = c(0, 1))) +
theme_minimal()
