Why do I not get two legends using ggplot2?

I am plotting different models' prediction lines over some data points. I would like one legend indicating which individual each point colour belongs to, and another legend indicating which model each line colour belongs to. Below I share a fake example for reproducibility:
library(lme4)
library(ggplot2)
set.seed(123)
df <- data.frame(Height =rnorm(500, mean=175, sd=15),
Weight =rnorm(500, mean=70, sd=20),
ID = rep(c("A","B","C","D"), (500/4)))
mod1 <- lmer(Height ~ Weight + (1|ID), df)
mod2 <- lmer(Height ~ poly(Weight,2) + (1|ID), df)
y.mod1 <- predict(mod1, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 1
y.mod2 <- predict(mod2, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 2
df <- cbind(df, y.mod1,y.mod2)
df <- as.data.frame(df)
head(df)
Height Weight ID y.mod1 y.mod2
1 166.5929 57.96214 A 175.9819 175.4918
2 171.5473 50.12603 B 176.2844 176.3003
3 198.3806 90.53570 C 174.7241 174.7082
4 176.0576 85.02123 D 174.9371 174.5487
5 176.9393 39.81667 A 176.6825 177.7303
6 200.7260 68.09705 B 175.5905 174.8027
First I plot my data points:
Plot_a <- ggplot(df,aes(x=Weight, y=Height,colour=ID)) +
geom_point() +
theme_bw() +
guides(color=guide_legend(override.aes=list(fill=NA)))
Plot_a
Then, I add lines relative to the prediction models:
Plot_b <- Plot_a +
geom_line(data = df, aes(x=Weight, y=y.mod1,color='mod1'),show.legend = TRUE) +
geom_line(data = df, aes(x=Weight, y=y.mod2,color='mod2'),show.legend = TRUE) +
guides(fill = guide_legend(override.aes = list(linetype = 0)),
color=guide_legend(title="Model"))
Plot_b
Does anyone know why I am not getting two different legends, one titled Model and the other ID?
I would like to get a plot with two separate legends, one for ID and one for Model.

This type of problem generally has to do with reshaping the data. ggplot2 works best with data in long format, and your data is in wide format. See this post on how to reshape data from wide to long format.
The plot layers become simpler: one geom_line is enough, and there is no need for guides() to override the aesthetics.
To customize the models' legend text, create a vector of labels, in this case with plotmath expressions to get math notation. The colours are set manually, too.
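library(dplyr)
library(tidyr)
library(ggplot2)
# plotmath labels for the two models' legend entries
model_labels <- c(expression(X^1), expression(X^2))
df %>%
# one row per observation and model: Model holds "y.mod1"/"y.mod2", Value the prediction
pivot_longer(
cols = c(y.mod1, y.mod2),
names_to = "Model",
values_to = "Value"
) %>%
ggplot(aes(Weight, Height)) +
# points use fill (shape 21) and lines use colour, so each gets its own legend
geom_point(aes(fill = ID), shape = 21) +
geom_line(aes(y = Value, color = Model)) +
scale_color_manual(labels = model_labels,
values = c("coral", "coral4")) +
theme_bw()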
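A possible variation, sketched under assumptions: because pivot_longer() duplicates every row once per model, each point in the plot above is drawn twice. Passing the original wide data to geom_point() and the long data only to geom_line() avoids that. The toy df below is hypothetical, and lm() stands in for lmer() only so that the sketch runs without lme4.

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data standing in for the question's df; lm() replaces lmer()
set.seed(123)
df <- data.frame(Height = rnorm(100, mean = 175, sd = 15),
                 Weight = rnorm(100, mean = 70, sd = 20),
                 ID     = rep(c("A", "B", "C", "D"), 25))
df$y.mod1 <- predict(lm(Height ~ Weight, data = df))
df$y.mod2 <- predict(lm(Height ~ poly(Weight, 2), data = df))

# Long format only for the predictions: one row per observation and model
lines_long <- pivot_longer(df, cols = c(y.mod1, y.mod2),
                           names_to = "Model", values_to = "Value")

p <- ggplot(df, aes(Weight, Height)) +
  geom_point(aes(fill = ID), shape = 21) +                       # wide data: one point per row
  geom_line(data = lines_long, aes(y = Value, colour = Model)) + # long data: one line per model
  theme_bw()

b <- ggplot_build(p)
```

The point layer now has one row per observation, while the line layer keeps one row per observation and model.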

The issue is that in ggplot2 each aesthetic can only have one scale and therefore only one legend. As you are mapping both the points and the lines to the colour aesthetic, you get a single legend. If you want multiple legends for the same aesthetic, have a look at the ggnewscale package. Otherwise, you have to map the lines to a second aesthetic.
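For the ggnewscale route, a minimal sketch might look like the following. It assumes the ggnewscale package is installed and uses its new_scale_colour() function, which starts a fresh colour scale for the layers added after it. The toy df and the lm() fits are hypothetical stand-ins for the question's data and lmer() models.

```r
library(ggplot2)
library(ggnewscale)

# Hypothetical toy data; lm() fits stand in for the lmer() models
set.seed(123)
df <- data.frame(Height = rnorm(100, mean = 175, sd = 15),
                 Weight = rnorm(100, mean = 70, sd = 20),
                 ID     = rep(c("A", "B", "C", "D"), 25))
df$y.mod1 <- predict(lm(Height ~ Weight, data = df))
df$y.mod2 <- predict(lm(Height ~ poly(Weight, 2), data = df))

p <- ggplot(df, aes(x = Weight)) +
  geom_point(aes(y = Height, colour = ID)) +   # first colour scale -> "ID" legend
  new_scale_colour() +                         # layers added after this use a new colour scale
  geom_line(aes(y = y.mod1, colour = "mod1")) +
  geom_line(aes(y = y.mod2, colour = "mod2")) +
  scale_colour_manual(name = "Model", values = c(mod1 = "black", mod2 = "red")) +
  theme_bw()

b <- ggplot_build(p)
```

The order matters: the scale_colour_manual() call placed after new_scale_colour() only affects the line layers.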
My preferred approach would be similar to the one proposed by @RuiBarradas. However, to stay close to your approach, this could be achieved like so:
1. Instead of colour, map linetype in your calls to geom_line.
2. Set the colours for the lines as arguments, i.e. not inside aes().
3. Use scale_linetype_manual to get solid lines for both models.
4. Use guide_legend to fix the colours appearing in the legend.
library(ggplot2)
library(lme4)
#> Loading required package: Matrix
set.seed(123)
df <- data.frame(Height =rnorm(500, mean=175, sd=15),
Weight =rnorm(500, mean=70, sd=20),
ID = rep(c("A","B","C","D"), (500/4)))
mod1 <- lmer(Height ~ Weight + (1|ID), df)
mod2 <- lmer(Height ~ poly(Weight,2) + (1|ID), df)
y.mod1 <- predict(mod1, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 1
y.mod2 <- predict(mod2, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 2
df <- cbind(df, y.mod1,y.mod2)
df <- as.data.frame(df)
Plot_a <- ggplot(df) +
geom_point(aes(x=Weight, y=Height, colour=ID)) +
theme_bw() +
guides(color=guide_legend(override.aes=list(fill=NA)))
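# two default ggplot2 hues, set as fixed colours (outside aes) for the lines
line_colors <- scales::hue_pal()(2)
Plot_b <- Plot_a +
geom_line(aes(x=Weight, y=y.mod1, linetype = "mod1"), color = line_colors[1]) +
geom_line(aes(x=Weight, y=y.mod2, linetype = "mod2"), color = line_colors[2]) +
# both models drawn solid; linetype only serves to create the second legend
scale_linetype_manual(values = c(mod1 = "solid", mod2 = "solid")) +
labs(color = "ID", linetype = "Model") +
# colour the linetype legend keys to match the lines
guides(linetype = guide_legend(override.aes = list(color = line_colors)))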
Plot_b

Related

How to add "prediction" lines to an existing ggplot without colouring by group of the previous ggplot?

I have some data for which I run linear mixed-effects models with different polynomial degrees. I want to plot my data points coloured by ID, together with the prediction lines for the different polynomial degrees. However, the prediction lines should not take ID into account. Below is a reproducible example:
library(ggplot2)
library(lme4)
set.seed(123)
df <- data.frame(Height =rnorm(500, mean=175, sd=15),
Weight =rnorm(500, mean=70, sd=20),
ID = rep(c("A","B","C","D"), (500/4)))
mod1 <- lmer(Height ~ Weight + (1|ID), df)
mod2 <- lmer(Height ~ poly(Weight,2) + (1|ID), df)
y.mod1 <- predict(mod1, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 1
y.mod2 <- predict(mod2, data.frame(Weight=df$Weight),re.form=NA) # Prediction of y according to model 2
df <- cbind(df, y.mod1,y.mod2)
df <- as.data.frame(df)
head(df)
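Height Weight ID y.mod1 y.mod2
1 166.5929 57.96214 A 175.9819 175.4918
2 171.5473 50.12603 B 176.2844 176.3003
3 198.3806 90.53570 C 174.7241 174.7082
4 176.0576 85.02123 D 174.9371 174.5487
5 176.9393 39.81667 A 176.6825 177.7303
6 200.7260 68.09705 B 175.5905 174.8027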
First, I plotted my data points:
Plot_a <- ggplot(df,aes(x=Weight, y=Height,colour=ID)) +
geom_point() +
theme_bw() +
guides(color=guide_legend(override.aes=list(fill=NA)))
Plot_a
Then, I plotted the "prediction lines" in my plot:
Plot_b <- Plot_a +
geom_line(data = df, aes(x=Weight, y=y.mod1)) +
geom_line(data = df, aes(x=Weight, y=y.mod2))
Plot_b
However, as you can see, the lines are drawn in several different colours. I guess this is due to the ID factor. Does anyone know how to plot the lines without colouring them by ID?
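The lines inherit colour = ID from the aes() in the ggplot() call, so each line is split by ID. Setting a fixed colour outside aes() overrides that inherited mapping, so you can give the lines independent colours like this: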
library(ggplot2)
#Code
#Plot 2
Plot_b <- Plot_a +
geom_line(data = df, aes(x=Weight, y=y.mod1),color='black') +
geom_line(data = df, aes(x=Weight, y=y.mod2),color='red')
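To check the inheritance behaviour, a small sketch can count the line groups that ggplot_build() produces. It uses hypothetical toy data (lm() instead of lmer()) and compares three variants: inheriting colour = ID, setting colour as a fixed parameter, and inherit.aes = FALSE, which is another way to stop the line layer from picking up the global mapping.

```r
library(ggplot2)

# Hypothetical toy data; lm() stands in for lmer()
set.seed(123)
df <- data.frame(Height = rnorm(100, mean = 175, sd = 15),
                 Weight = rnorm(100, mean = 70, sd = 20),
                 ID     = rep(c("A", "B", "C", "D"), 25))
df$y.mod1 <- predict(lm(Height ~ Weight, data = df))

base <- ggplot(df, aes(x = Weight, y = Height, colour = ID)) + geom_point()

# Inherits colour = ID, so the line is split into one group per ID
p_inherit <- base + geom_line(aes(y = y.mod1))
# Fixed colour outside aes(): the inherited colour mapping is dropped for this layer
p_fixed <- base + geom_line(aes(y = y.mod1), colour = "black")
# inherit.aes = FALSE: the layer ignores the global mapping altogether
p_own <- base + geom_line(aes(x = Weight, y = y.mod1), inherit.aes = FALSE)

# Number of distinct groups in the line layer (layer 2) after building
n_line_groups <- function(p) length(unique(ggplot_build(p)$data[[2]]$group))
```

The first variant yields four line groups (one per ID); the other two draw a single line.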
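Additionally, if you want the lines to appear in a legend, you can map a label to colour inside aes(). Note that the lines then share the single colour legend with ID rather than getting a legend of their own. Here is the code for that approach: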
#Plot3
Plot_b <- Plot_a +
geom_line(data = df, aes(x=Weight, y=y.mod1,color='mod1'),show.legend = T) +
geom_line(data = df, aes(x=Weight, y=y.mod2,color='mod2'),show.legend = T) +
guides(color=guide_legend(title="Model"))
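A quick way to see why this approach gives one combined legend rather than two is to list the keys of the single colour guide. This is a sketch on hypothetical toy data, with lm() fits standing in for the mixed models, and it assumes ggplot2 3.5.0 or later, where get_guide_data() is available.

```r
library(ggplot2)

# Hypothetical toy data; lm() fits stand in for the lmer() models
set.seed(123)
df <- data.frame(Height = rnorm(100, mean = 175, sd = 15),
                 Weight = rnorm(100, mean = 70, sd = 20),
                 ID     = rep(c("A", "B", "C", "D"), 25))
df$y.mod1 <- predict(lm(Height ~ Weight, data = df))
df$y.mod2 <- predict(lm(Height ~ poly(Weight, 2), data = df))

p <- ggplot(df, aes(x = Weight, y = Height, colour = ID)) +
  geom_point() +
  geom_line(aes(y = y.mod1, colour = "mod1")) +
  geom_line(aes(y = y.mod2, colour = "mod2"))

# There is only one colour scale, so its legend mixes the IDs and the model labels
keys <- get_guide_data(p, "colour")
keys$.label
```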
If you want to change colors you can check the options in scale_color_manual().
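Because the points and the lines share one colour scale in this approach, a scale_colour_manual() call has to supply a colour for every value, i.e. the four IDs and both models. A sketch on hypothetical toy data; the hex codes are arbitrary choices:

```r
library(ggplot2)

# Hypothetical toy data; lm() fits stand in for the lmer() models
set.seed(123)
df <- data.frame(Height = rnorm(100, mean = 175, sd = 15),
                 Weight = rnorm(100, mean = 70, sd = 20),
                 ID     = rep(c("A", "B", "C", "D"), 25))
df$y.mod1 <- predict(lm(Height ~ Weight, data = df))
df$y.mod2 <- predict(lm(Height ~ poly(Weight, 2), data = df))

p <- ggplot(df, aes(x = Weight, y = Height, colour = ID)) +
  geom_point() +
  geom_line(aes(y = y.mod1, colour = "mod1")) +
  geom_line(aes(y = y.mod2, colour = "mod2")) +
  # every value of the shared colour scale needs an entry: four IDs plus two models
  scale_colour_manual(values = c(A = "#1B9E77", B = "#D95F02", C = "#7570B3",
                                 D = "#E7298A", mod1 = "black", mod2 = "red")) +
  theme_bw()

b <- ggplot_build(p)
```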
And for further customization:
#Plot 4
Plot_b <- Plot_a +
geom_line(data = df, aes(x=Weight, y=y.mod1,color='mod1'),show.legend = T) +
geom_line(data = df, aes(x=Weight, y=y.mod2,color='mod2'),show.legend = T) +
guides(fill = guide_legend(override.aes = list(linetype = 0)),
color=guide_legend(title="Model"))

annotate r squared to ggplot by using facet_wrap

I just joined the community and am looking forward to getting some help with the data analysis for my master's thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot by using facet_wrap:
ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)
That works very well, but I would like to annotate the R-squared values of the regression lines. I have two treatments and 42 varieties, and therefore 84 regression lines.
Is there any way to calculate all the R-squared values and integrate them into the ggplot? I already found the function
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that only works for one variety and one treatment. Could a loop over the lm() function be an option?
Here is an example with the ggpmisc package:
library(ggplot2)
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
# first-degree polynomial, i.e. the same straight line that geom_smooth fits
my.formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = my.formula) +
# fits the model per panel (and per group, e.g. a mapped colour) and labels its R^2
stat_poly_eq(formula = my.formula, parse = TRUE,
mapping = aes(label = after_stat(rr.label)))
You can't apply different labels to different facets unless you add an R^2 column to your data. One way is to use geom_text, but you need to calculate the stats first. Below I show an example with iris, in which Species is renamed to Variety; for your case, just substitute your own columns (e.g. TemperaturCmean and Fv.Fm).
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot as in the question
# note: scales = "free" is used because, if the ranges differ a lot between
# varieties, fixed scales would squish some of the facet plots
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width), .groups = "drop") %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
In the final geom_text() call, the labels of the two treatments are placed on different lines by giving them different vjust values (VJUST depends on Treatment). You might need to adjust these values depending on your plot.
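If you prefer base R over dplyr, the same per-group statistics could be computed with split() and lapply(). The sketch below reuses the simulated iris set-up from the answer. As a small change, it stores vjust as a column of the label data and maps it inside aes(), so each offset stays attached to its own row instead of relying on the row order of a separate vector. Like RSQ() in the answer, it reports the adjusted R-squared.

```r
library(ggplot2)

# Simulated two-treatment data, built from iris as in the answer above
set.seed(1)
d1 <- data.frame(iris, Treatment = "A")
d2 <- data.frame(iris, Treatment = "B")
d2$Sepal.Length <- d2$Sepal.Length + rnorm(nrow(iris), 1, 0.5)
DF <- rbind(d1, d2)
names(DF)[names(DF) == "Species"] <- "Variety"

# One adjusted R-squared per Variety x Treatment combination
groups <- split(DF, list(DF$Variety, DF$Treatment), drop = TRUE)
STATS <- do.call(rbind, lapply(groups, function(d) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = d)
  data.frame(Variety   = d$Variety[1],
             Treatment = d$Treatment[1],
             Rsq       = signif(summary(fit)$adj.r.squared, 3))
}))
STATS$Label <- paste0("Treat ", STATS$Treatment, ", adj. Rsq = ", STATS$Rsq)
# Vertical offset stored per row, so it cannot get out of step with the labels
STATS$vjust <- ifelse(STATS$Treatment == "A", 1.5, 3)

g <- ggplot(DF, aes(x = Sepal.Width, y = Sepal.Length, colour = Treatment)) +
  geom_point(shape = 1, size = 1) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ Variety, scales = "free") +
  geom_text(data = STATS,
            aes(x = -Inf, y = Inf, label = Label, vjust = vjust),
            hjust = -0.1, size = 3)
```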

How do I facet by geom / layer in ggplot2?

I'm hoping to recreate the gridExtra output below with ggplot's facet_grid, but I'm unsure of what variable ggplot identifies with the layers in the plot. In this example, there are two geoms...
require(tidyverse)
a <- ggplot(mpg)
b <- geom_point(aes(displ, cyl, color = drv))
c <- geom_smooth(aes(displ, cyl, color = drv))
d <- a + b + c
# output below
gridExtra::grid.arrange(
a + b,
a + c,
ncol = 2
)
# Equivalent with gg's facet_grid
# needs a categorical var to iter over...
d$layers
#d + facet_grid(. ~ d$layers??)
The gridExtra output that I'm hoping to recreate is:
A hacky way of doing this is to make as many copies of the existing data frame as you need panels, each tagged with a value that is used for faceting and, later, for filtering. Union (rbind) the copies into one data frame. Then set up the ggplot and the geoms, filtering the data of each geom to the desired value, and facet on that same value to split the plots.
This can be seen below:
library(ggplot2)
library(dplyr) # filter() below must be dplyr::filter, not stats::filter
df1 <- data.frame(
graph = "point_plot",
mpg
)
df2 <- data.frame(
graph = "spline_plot",
mpg
)
df <- rbind(df1, df2)
ggplot(df, mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(data = filter(df, graph == "point_plot")) +
geom_smooth(data = filter(df, graph == "spline_plot"), se=FALSE) +
facet_grid(. ~ graph)
If you really want to show different plots on different facets, one hacky way would be to make separate copies of the data and subset those...
library(ggplot2)
library(dplyr)
mpg2 <- mpg %>% mutate(facet = 1) %>%
bind_rows(mpg %>% mutate(facet = 2))
ggplot(mpg2, aes(displ, cyl, color = drv)) +
geom_point(data = subset(mpg2, facet == 1)) +
geom_smooth(data = subset(mpg2, facet == 2)) +
facet_wrap(~facet)
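A variant of the same idea, sketched here without the rbind/filter step: each layer can receive its own copy of the data, tagged with a panel column (a hypothetical name), and the facet is then defined on that column. A layer only appears in the panels that its own data maps to. The lm smoother and se = FALSE are arbitrary choices for the sketch.

```r
library(ggplot2)

# Each layer gets its own copy of mpg, tagged with the panel it should appear in
p <- ggplot(mapping = aes(displ, cyl, colour = drv)) +
  geom_point(data = transform(mpg, panel = "point")) +
  geom_smooth(data = transform(mpg, panel = "smooth"),
              method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ panel)

b <- ggplot_build(p)
```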

Group alluvia in the R alluvial diagram

In the alluvial package, is it possible to combine alluvia that have the same source and target nodes? For example, the two dark alluvia in the image below both go through AB and 3.
Edit: Here is an example using the Titanic dataset, which shows the same behaviour:
library(alluvial)
# Titanic data
tit <- as.data.frame(Titanic)
tit3d <- aggregate( Freq ~ Class + Sex + Survived, data=tit, sum)
ord <- list(NULL, with(tit3d, order(Sex, Survived)), NULL)
alluvial(tit3d[,1:3], freq=tit3d$Freq, alpha=1, xw=0.2,
col=ifelse( tit3d$Survived == "No", "red", "gray"),
layer = tit3d$Sex != "Female",
border="white", ordering=ord)
It looks like the ggalluvial package has a geom_flow, which resets at each category break. That might be closer to what you want. For example:
# reshape data
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggalluvial)
dd <- tit3d %>% mutate(id=1:n(), sc=Survived) %>%
gather("category", "value", -c(id, Freq, sc))
# draw plot; y = Freq makes flow widths proportional to the counts, like freq in alluvial()
ggplot(dd, aes(x=category, stratum=value, alluvium = id,
y = Freq, label=value))+
geom_flow(aes(fill=sc)) +
geom_stratum(alpha = .5) + geom_text(stat = "stratum", size = 3) +
theme_minimal()

Insert line on facet_grid margin facets only

I want to put a regression line on the marginal facets of a facet_grid, but I run into a strange quirk: redundant lines also appear on non-marginal facets.
library(ggplot2)
library(plyr)
data(diamonds)
# Use plyr to build a data frame with the slopes and intercepts
regdf <- ddply(diamonds, .(cut), function(i)
lm(price ~ carat, data = i)$coefficients[1:2])
# resolve some naming issues
regdf$color <- "(all)"
names(regdf)[2] <- "intercept"
p1 <- ggplot() + geom_point(aes(carat, price), data = diamonds, alpha = .4) +
facet_grid(color ~ cut, margins = T) +
geom_abline(aes(intercept = intercept, slope = carat), color = "red", data = regdf)
Why do I get those superfluous lines on the D color row, and why are there numerous lines on some of those facets?
