I'd like to add the significance letters to a plot using ggeffects. In my case:
# Packages
library(ggeffects)
library(dplyr)
library(glmmTMB)
library(multcomp)
library(lsmeans)
# My data set
ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/temp_ger_ds.csv")
str(ds)
#'data.frame': 140 obs. of 4 variables:
# $ temp : chr "constante" "constante" "constante" "constante" ...
# $ generation : chr "G0" "G0" "G0" "G0" ...
# $ development: int 22 24 22 27 27 24 25 26 27 18 ...
First fit the ziGamma model:
mTCFd <- glmmTMB(development ~ temp * generation, data = ds,
family = ziGamma(link = "log"))
Pairwise Comparison Post Hoc Tests:
lsm.TCFd.temp <- lsmeans(mTCFd, c("temp","generation"))
cld(lsm.TCFd.temp, Letters=letters)
temp generation lsmean SE df lower.CL upper.CL .group
constante G3 3.13 0.0180 129 3.09 3.16 a
constante G2 3.14 0.0180 129 3.11 3.18 ab
constante G0 3.19 0.0191 129 3.15 3.23 abc
constante G1 3.22 0.0180 129 3.18 3.25 bc
constante G4 3.23 0.0185 129 3.19 3.27 cd
flutuante G1 3.32 0.0352 129 3.25 3.39 cde
flutuante G3 3.34 0.0262 129 3.28 3.39 e
flutuante G0 3.36 0.0191 129 3.32 3.39 e
flutuante G4 3.36 0.0393 129 3.28 3.44 def
flutuante G2 3.47 0.0218 129 3.43 3.52 f
Now I want to display these letters on the plot:
ggpredict(mTCFd, terms = c("temp","generation")) %>% plot(add.data = TRUE)
But if I try:
lt<-cld(lsm.TCFd.temp, Letters=letters)
ggpredict(mTCFd, terms = c("temp","generation")) %>% plot(add.data = TRUE) %>% geom_text(aes(label = lt[,8]), vjust = -0.5)
Error in geom_text(., aes(label = lt[, 8]), vjust = -0.5) :
could not find function "geom_text"
This doesn't work. Can anyone help with it?
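geom_text() is a ggplot2 function, so ggplot2 has to be loaded with library(ggplot2), and layers are added with +, not %>%. You also need to set up the data for ggplot yourself: instead of plotting the ggpredict object directly, use ggpredict() to get a data frame and build the plot from that.
I created a new variable x_1 to use as the x axis; it offsets each generation slightly within its temp level. You can construct this any way you like. This only shows the rough idea.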
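library(ggplot2)
# x_1 puts constante near 1 and flutuante near 2, with a small offset per generation
ds <- ds %>% mutate(x_1= 1+(readr::parse_number(generation)-2)*0.05 + as.integer(temp =="flutuante"),
group = generation)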
df_gg <- ggpredict(mTCFd, terms = c("temp","generation")) %>%
mutate(x_1= 1+(readr::parse_number(as.character(group))-2)*0.05 + as.integer(x =="flutuante"))
df_gg %>% ggplot(aes(x = x_1, y = predicted, color = group)) +
geom_jitter(aes(y = development), data = ds, alpha = 0.25)+
geom_point()+
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.02)+
geom_text(aes(x = x_1, label = lt[, 8]), vjust = -0.5, show.legend = FALSE)+
scale_x_continuous(breaks = c(1, 2), labels = c("constante", "flutuante"))
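One thing to watch in the plot above: cld() prints its rows sorted by the estimate, which is usually not the row order of the prediction table, so labelling by position (lt[, 8]) can attach letters to the wrong points. A minimal sketch of matching by key instead, using small made-up tables (preds and letters_tbl are hypothetical stand-ins, not the real ggpredict or cld output):

```r
library(dplyr)

# Hypothetical stand-in for the prediction table, in plotting order
preds <- data.frame(
  x = c("constante", "constante", "flutuante", "flutuante"),
  group = c("G0", "G1", "G0", "G1"),
  predicted = c(24.3, 25.0, 28.8, 27.7)
)

# Hypothetical stand-in for the letter table, in a different row order
letters_tbl <- data.frame(
  temp = c("constante", "constante", "flutuante", "flutuante"),
  generation = c("G0", "G1", "G1", "G0"),
  .group = c(" a ", " ab", "  b", "  b")
)

# Match letters to predictions by factor levels, not by row position
labelled <- preds %>%
  left_join(letters_tbl, by = c("x" = "temp", "group" = "generation")) %>%
  mutate(.group = trimws(.group))

labelled
```

With the real objects, the same join would presumably be applied to as.data.frame(lt) and the ggpredict data frame, after converting the key columns to character so that the factor levels do not get in the way.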
I want to achieve a GAM plot that looks like this
Image from https://stats.stackexchange.com/questions/179947/statistical-differences-between-two-hourly-patterns/446048#446048
How can I accomplish this?
The model is
model = gam(y ~ s(t) + g, data = d)
The general way to do this is to compute model estimates (fitted values) over the range of the covariate(s) of interest for each group. The reproducible example below illustrates one way to do this, using {mgcv} to fit the GAM and my {gratia} package for some helper functions to facilitate the process.
library("gratia")
library("mgcv")
library("ggplot2")
eg_data <- data_sim("eg4", n = 400, dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ s(x2) + fac, data = eg_data, method = "REML")
ds <- data_slice(m, x2 = evenly(x2, n = 100), fac = evenly(fac))
fv <- fitted_values(m, data = ds)
The last line gets you fitted values from the model at the covariate combinations specified in the data slice:
> fv
# A tibble: 300 × 6
x2 fac fitted se lower upper
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 0.00131 1 -1.05 0.559 -2.15 0.0412
2 0.00131 2 -3.35 0.563 -4.45 -2.25
3 0.00131 3 1.13 0.557 0.0395 2.22
4 0.0114 1 -0.849 0.515 -1.86 0.160
5 0.0114 2 -3.14 0.519 -4.16 -2.13
6 0.0114 3 1.34 0.513 0.332 2.34
7 0.0215 1 -0.642 0.474 -1.57 0.287
8 0.0215 2 -2.94 0.480 -3.88 -2.00
9 0.0215 3 1.54 0.473 0.616 2.47
10 0.0316 1 -0.437 0.439 -1.30 0.424
# … with 290 more rows
# ℹ Use `print(n = ...)` to see more rows
This object is in a form suitable for plotting with ggplot():
fv |>
ggplot(aes(x = x2, y = fitted, colour = fac)) +
geom_point(data = eg_data, mapping = aes(y = y), size = 0.5) +
geom_ribbon(aes(x = x2, ymin = lower, ymax = upper, fill = fac,
colour = NULL),
alpha = 0.2) +
geom_line()
which produces
You can enhance and/or modify this using your ggplot skills.
The basic point with this model is that you have a common smooth effect of a covariate (here x2) plus group means (for the factor fac). Hence the curves are "parallel".
Note that there's a lot of variation around the estimated curves in this model because the simulated data are from a richer model with group-specific smooths and smooth effects of other covariates.
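The "parallel curves" point can be checked numerically. The sketch below uses only {mgcv} and a small simulated data set (dat, its column names, and the true offsets are all made up for illustration): with a shared smooth plus a factor term, the difference between two groups' fitted values is the same at every x and equals the factor coefficient.

```r
library(mgcv)

set.seed(1)
n <- 300
dat <- data.frame(
  x = runif(n),
  fac = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Hypothetical truth: one shared curve plus a constant offset per group
offset <- unname(c(a = 0, b = 1, c = -1)[as.character(dat$fac)])
dat$y <- sin(2 * pi * dat$x) + offset + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x) + fac, data = dat, method = "REML")

# Predict groups a and b over the same grid of x values
grid <- seq(0, 1, length.out = 50)
new_a <- data.frame(x = grid, fac = factor("a", levels = levels(dat$fac)))
new_b <- data.frame(x = grid, fac = factor("b", levels = levels(dat$fac)))
gap <- as.numeric(predict(m, newdata = new_b) - predict(m, newdata = new_a))

# The gap does not depend on x; it is the estimated group effect
range(gap)
coef(m)["facb"]
```

If the groups should be allowed different shapes rather than just different levels, a model with group-specific smooths such as s(x, by = fac) + fac is the usual alternative.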
gg.bs30 <- ggplot(data,aes(x=Predictor,y=Output,col=class))+geom_point()+
geom_smooth(method='gam',formula=y ~ splines::bs(x, 30)) + facet_grid(class ~.)
print(gg.bs30)
Code from https://github.com/mariocastro73/ML2020-2021/blob/master/scripts/gams-with-ggplot-classes.R
I'm having trouble combining color and linetype guides into a single legend in a plot produced with ggplot2. Either the linetype shows up with all of the linetypes keyed the same way, or it does not show up at all.
My plot includes both a ribbon to show the bulk of the observations, along with lines showing minimum, median, maximum, and sometimes the observations from a single year.
Example code using built in CO2 data set:
library(tidyverse)
myExample <- CO2 %>%
group_by(conc) %>%
summarise(d.min = min(uptake, na.rm= TRUE),
d.ten = quantile(uptake,probs = .1, na.rm = TRUE),
d.median = median(uptake, na.rm = TRUE),
d.ninty = quantile(uptake, probs = .9, na.rm= TRUE),
d.max = max(uptake, na.rm = TRUE))
myExample <- cbind(myExample, "Qn1"= filter(CO2, Plant == "Qn1")[,5])
plot_plant <- TRUE # Switch to plot single observation series
myExample %>%
ggplot(aes(x=conc))+
geom_ribbon(aes(ymin=d.ten, ymax= d.ninty, fill = "80% of observations"), alpha = .2)+
geom_line(aes(y=d.min, colour = "c"), linetype = 3, size = .5)+
geom_line(aes(y=d.median, colour = "e"),linetype = 2, size = .5)+
geom_line(aes(y=d.max, colour = "a"),linetype = 3, size = .5)+
{if(plot_plant)geom_line(aes(y=Qn1, color = "f"), linetype = 1,size =.5)}+
scale_fill_manual("Statistic", values = "blue")+
scale_color_brewer(palette = "Dark2",name = "",
labels = c(
a= "Maximum",
e= "Median",
c= "Minimum",
f = current_year
), breaks = c("a","e","c","f"))+
scale_linetype_manual(name = "")+
guides(fill= guide_legend(order = 1), color = guide_legend(order = 2), linetype = guide_legend(order = 2))
With plot_plant set to TRUE, the code plots a single observation series, but linetype does not show up at all in the legend:
With plot_plant set to FALSE, linetype shows up in the legend, but I cannot see the distinction between the dotted and dashed legend entries:
The plot is working as desired, but I would like the linetype distinctions to show up in the legend. Visually, it is more important when I'm plotting the single observation series because the distinction between solid and dashed or dotted is stronger.
Searching for answers, I've seen suggestions to combine the different stats (min, median, max, and the single series) into a single variable and let ggplot determine the linetypes (e.g. the post "ggplot2 manually specifying color & linetype - duplicate legend") or to make a hash that describes the linetype (for example, "How to rename a (combined) legend in ggplot2?"), but neither of these approaches seems to play well in combination with the ribbon plot.
I tried formatting my data into a long format, which usually works well for ggplot. This worked if I plotted all of the statistics as line geometry, but couldn't get the ribbon to work like I wanted, and overlaying a single observation series seemed like it needed to be stored in a different data table.
As you noted, ggplot loves long format data. So I recommend sticking with that.
Here I generate some made up data:
library(tibble)
library(dplyr)
library(ggplot2)
library(tidyr)
set.seed(42)
tibble(x = rep(1:10, each = 10),
y = unlist(lapply(1:10, function(x) rnorm(10, x)))) -> tbl_long
which looks like this:
# A tibble: 100 x 2
x y
<int> <dbl>
1 1 2.37
2 1 0.435
3 1 1.36
4 1 1.63
5 1 1.40
6 1 0.894
7 1 2.51
8 1 0.905
9 1 3.02
10 1 0.937
# ... with 90 more rows
Then I group_by(x) and calculate quantiles of interest for y in each group:
tbl_long %>%
group_by(x) %>%
mutate(q_0.0 = quantile(y, probs = 0.0),
q_0.1 = quantile(y, probs = 0.1),
q_0.5 = quantile(y, probs = 0.5),
q_0.9 = quantile(y, probs = 0.9),
q_1.0 = quantile(y, probs = 1.0)) -> tbl_long_and_wide
and that looks like:
# A tibble: 100 x 7
# Groups: x [10]
x y q_0.0 q_0.1 q_0.5 q_0.9 q_1.0
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2.37 0.435 0.848 1.38 2.56 3.02
2 1 0.435 0.435 0.848 1.38 2.56 3.02
3 1 1.36 0.435 0.848 1.38 2.56 3.02
4 1 1.63 0.435 0.848 1.38 2.56 3.02
5 1 1.40 0.435 0.848 1.38 2.56 3.02
6 1 0.894 0.435 0.848 1.38 2.56 3.02
7 1 2.51 0.435 0.848 1.38 2.56 3.02
8 1 0.905 0.435 0.848 1.38 2.56 3.02
9 1 3.02 0.435 0.848 1.38 2.56 3.02
10 1 0.937 0.435 0.848 1.38 2.56 3.02
# ... with 90 more rows
Then I gather up all the columns except for x, y, and the 10th- and 90th-percentile variables into two variables: key and value. The new key variable takes on the names of the old variables from which each value came. The other variables are just copied down as needed.
tbl_long_and_wide %>%
gather(key, value, -x, -y, -q_0.1, -q_0.9) -> tbl_super_long
and that looks like:
# A tibble: 300 x 6
# Groups: x [10]
x y q_0.1 q_0.9 key value
<int> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 2.37 0.848 2.56 q_0.0 0.435
2 1 0.435 0.848 2.56 q_0.0 0.435
3 1 1.36 0.848 2.56 q_0.0 0.435
4 1 1.63 0.848 2.56 q_0.0 0.435
5 1 1.40 0.848 2.56 q_0.0 0.435
6 1 0.894 0.848 2.56 q_0.0 0.435
7 1 2.51 0.848 2.56 q_0.0 0.435
8 1 0.905 0.848 2.56 q_0.0 0.435
9 1 3.02 0.848 2.56 q_0.0 0.435
10 1 0.937 0.848 2.56 q_0.0 0.435
# ... with 290 more rows
This format will allow you to use both geom_ribbon() and geom_line() like you want to do, because the variables for the lines are contained in value and grouped by key, whereas the variables to be mapped to ymin and ymax are separate from value and are all the same within each x group.
tbl_super_long %>%
ggplot() +
geom_ribbon(aes(x = x,
ymin = q_0.1,
ymax = q_0.9,
fill = "80% of observations"),
alpha = 0.2) +
geom_line(aes(x = x,
y = value,
color = key,
linetype = key)) +
# Scale names are plain strings (or NULL for no title), not theme elements.
# The colour and linetype scales share name, labels and guide so their legends merge.
scale_fill_manual(name = "Statistic",
guide = guide_legend(order = 1),
values = viridisLite::viridis(1)) +
scale_color_manual(name = NULL,
labels = c("Minimum", "Median", "Maximum"),
guide = guide_legend(reverse = TRUE, order = 2),
values = viridisLite::viridis(3)) +
scale_linetype_manual(name = NULL,
labels = c("Minimum", "Median", "Maximum"),
guide = guide_legend(reverse = TRUE, order = 2),
values = c("dotted", "dashed", "solid")) +
labs(x = "x", y = "y")
This data format, with the long but grouped x and y variables plus the independent but repeated ymin and ymax variables, will allow you to use both geom_ribbon() and geom_line() and allow the linetypes to show up properly in the legend.
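The same idea can be sketched on the built-in CO2 data from the question. This is only one possible arrangement, not the only way to do it: the line statistics are reshaped to long format and mapped to both colour and linetype, while the ribbon limits stay in the wide summary table. The names stats, lines_long, p10 and p90 are made up for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Per-concentration summary of the built-in CO2 data
stats <- as.data.frame(CO2) %>%
  group_by(conc) %>%
  summarise(Minimum = min(uptake),
            Median  = median(uptake),
            Maximum = max(uptake),
            p10 = unname(quantile(uptake, 0.1)),
            p90 = unname(quantile(uptake, 0.9)),
            .groups = "drop")

# Only the line statistics go to long format; ribbon limits stay wide
lines_long <- stats %>%
  select(conc, Minimum, Median, Maximum) %>%
  pivot_longer(-conc, names_to = "statistic", values_to = "uptake") %>%
  mutate(statistic = factor(statistic,
                            levels = c("Maximum", "Median", "Minimum")))

p <- ggplot() +
  geom_ribbon(data = stats,
              aes(x = conc, ymin = p10, ymax = p90,
                  fill = "80% of observations"),
              alpha = 0.2) +
  geom_line(data = lines_long,
            aes(x = conc, y = uptake,
                colour = statistic, linetype = statistic)) +
  scale_fill_manual(name = "Statistic", values = "blue") +
  # Same (empty) name on both scales so the two legends are merged
  scale_colour_brewer(name = NULL, palette = "Dark2") +
  scale_linetype_manual(name = NULL,
                        values = c(Maximum = "dotted",
                                   Median = "dashed",
                                   Minimum = "dotted")) +
  guides(fill = guide_legend(order = 1),
         colour = guide_legend(order = 2),
         linetype = guide_legend(order = 2))

p
```

A single observation series could be added as a fourth level of statistic (with a "solid" linetype) by binding its rows onto lines_long before plotting.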
I have data that looks something like this:
time level strain
<dbl> <dbl> <chr>
1 0.0 0.000 M12-611020
2 1.0 0.088 M12-611020
3 3.0 0.211 M12-611020
4 4.0 0.278 M12-611020
5 4.5 0.404 M12-611020
6 5.0 0.606 M12-611020
7 5.5 0.778 M12-611020
8 6.0 0.902 M12-611020
9 6.5 1.024 M12-611020
10 8.0 1.100 M12-611020
11 0.0 0.000 M12-611025
12 1.0 0.077 M12-611025
13 3.0 0.088 M12-611025
14 4.0 0.125 M12-611025
15 5.0 0.304 M12-611025
16 5.5 0.421 M12-611025
17 6.0 0.518 M12-611025
18 6.5 0.616 M12-611025
19 7.0 0.718 M12-611025
I can easily graph it using ggplot, asking ggplot to look at the strains separately and using stat_smooth to fit a curve:
ggplot(data = data, aes(x = time, y = level), group = strain) + stat_smooth(aes(group=strain,fill=strain, colour = strain) ,method = "loess", se = F, span = 0.8) +
theme_gray()+xlab("Time(h)") +
geom_point(aes(fill=factor(strain)),alpha=0.5 , size=3,shape = 21,colour = "black", stroke = 1)+
theme(legend.position="right")
I would then like to predict from the loess curve that was fitted. I do so as follows:
# define the model
model <- loess(time ~ strain,span = 0.8, data = data)
# Predict the time (y) for a given level (x)
predict(model, newdata = 0.3, se = FALSE)
However, I do not know how to predict for one or the other of my "strains" set out above (i.e. the red or blue lines in the plot).
Additionally, is there a simple way to plot this prediction on the graph, for example as a dotted line going across at 0.3 and down to the predicted time as above?
Do you mean something like this?
p <- ggplot(data = dat, aes(x = time, y = level, fill = strain)) +
geom_point(alpha=0.5 , size=3,shape = 21, colour = "black", stroke = 1) +
stat_smooth(aes(group=strain, colour=strain) ,method = "loess", se = F, span = 0.8)
newdat <- split(dat, dat$strain)
mod <- lapply(newdat, function(x)loess(level ~ time,span = 0.8, data = x))
# Predicted level at time = 2 for one strain (about 0.097)
pred <- predict(mod[["M12-611020"]], newdata = data.frame(time = 2), se = FALSE)
p +
geom_segment(aes(x=2, xend=2, y=0, yend=pred), linetype="dashed") +
geom_segment(aes(x=0, xend=2, y=pred, yend=pred), linetype="dashed")
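The code above predicts the level at a given time. The question asks for the reverse, the time at which a strain reaches a given level (0.3). One possible approach, sketched here for a single strain with the data copied from the question, is to keep the level ~ time fit and solve for the time numerically with uniroot(). The names one_strain, gap and t_hat are made up for this example, and the approach assumes the fitted curve crosses the target level within the observed time range.

```r
# Data for strain M12-611020, copied from the question
one_strain <- data.frame(
  time  = c(0, 1, 3, 4, 4.5, 5, 5.5, 6, 6.5, 8),
  level = c(0, 0.088, 0.211, 0.278, 0.404, 0.606, 0.778, 0.902, 1.024, 1.100)
)

fit <- loess(level ~ time, span = 0.8, data = one_strain)

# Difference between the fitted level at time t and the target level
target <- 0.3
gap <- function(t) predict(fit, newdata = data.frame(time = t)) - target

# Search the observed time range for the point where the curve hits the target
t_hat <- uniroot(gap, interval = range(one_strain$time), tol = 1e-8)$root
t_hat
```

The resulting t_hat can then be used in the two geom_segment() calls in place of the fixed x value, with the target level as the y value.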
I have the following data frame:
date DGS1MO DGS3MO DGS6MO DGS1 DGS2 DGS3 DGS5 DGS7 DGS10 DGS20 DGS30
1 2006-02-28 4.47 4.62 4.74 4.73 4.69 4.67 4.61 4.57 4.55 4.70 4.51
2 2006-03-31 4.65 4.63 4.81 4.82 4.82 4.83 4.82 4.83 4.86 5.07 4.90
3 2006-04-28 4.60 4.77 4.91 4.90 4.87 4.87 4.92 4.98 5.07 5.31 5.17
4 2006-05-31 4.75 4.86 5.08 5.07 5.04 5.03 5.04 5.06 5.12 5.35 5.21
5 2006-06-30 4.54 5.01 5.24 5.21 5.16 5.13 5.10 5.11 5.15 5.31 5.19
6 2006-07-31 5.02 5.10 5.18 5.11 4.97 4.93 4.91 4.93 4.99 5.17 5.07
Using melt (from reshape2) I got this data frame:
date variable value
1 2006-02-28 DGS1MO 4.47
2 2006-03-31 DGS1MO 4.65
3 2006-04-28 DGS1MO 4.60
4 2006-05-31 DGS1MO 4.75
5 2006-06-30 DGS1MO 4.54
6 2006-07-31 DGS1MO 5.02
As you can see I have 1, 3, 6 month, along with 10, 20, 30 year time horizons. I would like to plot box-and-whisker plot for each of these columns and have the following code:
bwplot <- ggplot(df, aes(x = variable, y = value, color = variable)) +
stat_boxplot(geom = "errorbar") +
geom_boxplot()
bwplot
However, the issue is the distance (space) between the boxplots for each variable is the same. Ideally, there should be very small distance between the boxplots for 1 month and 3 month. And the gap between the boxplots for 10 year and 20 year should be wide. To remedy, I have tried to convert the variables into numbers (1/12, 3/12, 6/12, 1, 2, etc.) and then tried this code:
levels(df$variable) <- c(0.083, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30)
bwplot <- ggplot(df, aes(x = as.numeric(as.character(df$variable)), y = value, color = variable)) +
stat_boxplot(geom = "errorbar") +
geom_boxplot()
bwplot
But what I am getting is only one huge boxplot for the entire time horizon followed by this warning msg:
Warning messages:
1: Continuous x aesthetic -- did you forget aes(group=...)?
If I try
group = variable
I get
Error: Continuous value supplied to discrete scale
What is the right way of doing this?
Thanks.
s <- data.frame(date = seq(as.Date("2006-02-01"), by = "month", length.out = 6),
M1 = rnorm(6, 5, 0.5), M3 = rnorm(6, 5, 0.5), M6 = rnorm(6, 5, 0.5),
Y1 = rnorm(6, 5, 0.5), Y2 = rnorm(6, 5, 0.5), Y3 = rnorm(6, 5, 0.5),
Y10 = rnorm(6, 5, 0.5), Y20 = rnorm(6, 5, 0.5), Y30 = rnorm(6, 5, 0.5))
library(ggplot2)
library(reshape2)
s.melted <- melt(s, id.var = "date")
#Create an axis where the numbers represent the number of months elapsed
s.melted$xaxis <-c("M"=1, "Y"=12)[sub("(M|Y)([0-9]+)","\\1",s.melted$variable)] * as.numeric(sub("(M|Y)([0-9]+)","\\2",s.melted$variable))
s.melted[sample(1:nrow(s.melted),6),]
date variable value xaxis
23 2006-06-01 Y1 4.645595 12
38 2006-03-01 Y10 5.190710 120
25 2006-02-01 Y2 4.831788 24
50 2006-03-01 Y30 3.892580 360
39 2006-04-01 Y10 4.513831 120
31 2006-02-01 Y3 4.357127 36
# Only show the ticks for variable
bwplot <- ggplot(s.melted, aes(x = xaxis, y = value, color = variable)) +
stat_boxplot(geom = "errorbar") +
geom_boxplot() + scale_x_continuous(breaks = unique(s.melted$xaxis),
labels = unique(as.character(s.melted$variable)))
bwplot
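The code above spaces the boxes linearly in months, which squeezes the short maturities together. A possible variation, assuming a log-scaled axis is acceptable for the purpose, is to keep x numeric, set the group aesthetic explicitly so each maturity gets its own box, and transform the axis. The data here are simulated and the names yields, maturity and tenor are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical yields; maturities are expressed in years
maturity <- c(1/12, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30)
tenor    <- c("1M", "3M", "6M", "1Y", "2Y", "3Y", "5Y", "7Y", "10Y", "20Y", "30Y")
yields <- data.frame(
  years = rep(maturity, each = 6),
  value = rnorm(length(maturity) * 6, mean = 5, sd = 0.5)
)

# group = years gives one box per maturity even though x is continuous;
# width is in log10 units because the stat runs on the transformed scale
p <- ggplot(yields, aes(x = years, y = value, group = years)) +
  geom_boxplot(width = 0.1) +
  scale_x_log10(breaks = maturity, labels = tenor) +
  labs(x = "Maturity (log scale)", y = "Yield")

p
```

On a linear axis the same group = years mapping works too, but the 1-, 3- and 6-month boxes will overlap unless the width is made very small.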
I have a problem trying to use different colors in my plot for two groups. I created a plot with odds ratios (including 95% CI) over a period of several years for 2 groups (mfin and ffin). When using the syntax below, all points and lines are black, and my attempts to adjust them, e.g. geom_linerange(colour=c("red","blue")), have failed (Error: Incompatible lengths for set aesthetics: colour).
Can anyone help me with this?
ggplot(rbind(data.frame(mfin, group=mfin), data.frame(ffin, group=ffin)),
aes(x = JAAR, y = ror, ymin = llror, ymax = ulror)) +
geom_linerange() +
geom_point() +
geom_hline(yintercept = 1) +
ylab("Odds ratio & 95% CI") +
xlab("") +
geom_errorbar(width=0.2)
Below are some sample data (1st group = mfin, 2nd group = ffin):
JAAR ror llror ulror
2008 2.00 1.49 2.51
2009 2.01 1.57 2.59
2010 2.06 1.55 2.56
2011 2.07 1.56 2.58
2012 2.19 1.70 2.69
2013 2.23 1.73 2.72
2014 2.20 1.71 2.69
2015 2.31 1.84 2.78
2016 2.30 1.83 2.76
JAAR ror llror ulror
2008 1.36 0.88 1.84
2009 1.20 0.73 1.68
2010 1.16 0.68 1.64
2011 1.23 0.77 1.69
2012 1.43 1.00 1.86
2013 1.46 1.04 1.88
2014 1.49 1.07 1.90
2015 1.30 0.89 1.70
2016 1.29 0.89 1.70
You need to map the group membership variable to the color aesthetic (in the long version of the data):
library(readr)
library(dplyr)
library(ggplot2)
# simulate some data
year_min = 1985
year_max = 2016
num_years = year_max - year_min + 1
num_groups = 2
num_estimates = num_years*num_groups
df_foo = tibble(  # data_frame() is deprecated; tibble() is re-exported by dplyr
upper_limit = runif(n = num_estimates, min = -20, max = 20),
lower_limit = upper_limit - runif(n = num_estimates, min = 0, max = 5),
point_estimate = runif(num_estimates, min = lower_limit, max = upper_limit),
year = rep(seq(year_min, year_max), num_groups),
group = rep(c("mfin", "ffin"), each = num_years)
)
# plot the confidence intervals
df_foo %>%
ggplot(aes(x = year, y = point_estimate,
ymin = lower_limit, ymax = upper_limit,
color = group)) +
geom_point() +
geom_errorbar() +
theme_bw() +
ylab("Odds Ratio & 95% CI") +
xlab("Year") +
scale_color_discrete(name = "Group")
This produces what I think you are looking for, except the simulated data makes it look somewhat messy:
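Applied to the sample data from the question, the same idea might look like the sketch below. The two tables are stacked with a group column and that column is mapped to colour. The 2016 value for mfin is entered as 2.30 on the assumption that ".230" in the question is a typo, and the dodge width and colours are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# Sample data copied from the question
mfin <- data.frame(
  JAAR  = 2008:2016,
  ror   = c(2.00, 2.01, 2.06, 2.07, 2.19, 2.23, 2.20, 2.31, 2.30),  # 2016: assumed typo fix
  llror = c(1.49, 1.57, 1.55, 1.56, 1.70, 1.73, 1.71, 1.84, 1.83),
  ulror = c(2.51, 2.59, 2.56, 2.58, 2.69, 2.72, 2.69, 2.78, 2.76)
)
ffin <- data.frame(
  JAAR  = 2008:2016,
  ror   = c(1.36, 1.20, 1.16, 1.23, 1.43, 1.46, 1.49, 1.30, 1.29),
  llror = c(0.88, 0.73, 0.68, 0.77, 1.00, 1.04, 1.07, 0.89, 0.89),
  ulror = c(1.84, 1.68, 1.64, 1.69, 1.86, 1.88, 1.90, 1.70, 1.70)
)

# Stack the two tables; the list names become the values of the group column
both <- bind_rows(mfin = mfin, ffin = ffin, .id = "group")

# Map group to colour inside aes(); dodge so the two groups do not overlap
p <- ggplot(both, aes(x = JAAR, y = ror, ymin = llror, ymax = ulror,
                      colour = group)) +
  geom_hline(yintercept = 1) +
  geom_errorbar(width = 0.2, position = position_dodge(width = 0.4)) +
  geom_point(position = position_dodge(width = 0.4)) +
  scale_colour_manual(name = "Group",
                      values = c(mfin = "blue", ffin = "red")) +
  labs(x = NULL, y = "Odds ratio & 95% CI")

p
```

Setting colour = c("red", "blue") directly on a geom fails because a set (non-mapped) aesthetic must have length 1 or match the number of rows, which is why the mapping goes through aes() and a manual scale instead.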