How to add legend of boxplot and points in ggplot2?

I have the following to plot a boxplot of some data "Samples" and add points of the "Baseline" and "Theoretical" data.
library(reshape2)
library(ggplot2)
meltshear <- melt(Shear)
samples <- rep(c("Samples"), each = 10)
baseline <- c("Baseline",samples)
method <- rep(baseline, 4)
xlab <- rep(c("EXT.Single","EXT.Multi","INT.Single","INT.Multi"), each = 11)
plotshear <- data.frame(Source = c(method,"theoretical","theoretical","theoretical"),
Shear = c(xlab,"EXT.Multi","INT.Single","INT.Multi"),
LLDF = c(meltshear[,2],0.825,0.720,0.884))
data <- subset(plotshear, Source %in% c("Samples"))
baseline <- subset(plotshear, Source %in% c("Baseline"))
theoretical <- subset(plotshear, Source %in% c("theoretical"))
ggplot(data = data, aes(x = Shear, y = LLDF)) + geom_boxplot(outlier.shape = NA) +
stat_summary(fun = mean, geom="point", shape=23, size=3) +
stat_boxplot(geom='errorbar', linetype=1, width=0.5) +
geom_jitter(data = baseline, colour = "green4") +
geom_jitter(data = theoretical, colour = "red")
The plot renders, but I cannot add a legend to it. I want the legend to show labels = c("Samples","Baseline","Theoretical") for the boxplot shape, green dot, and red dot respectively.

You could try adding fill inside aes(), which makes ggplot2 draw a legend for the boxplots.
ggplot(data = data, aes(x = Shear, y = LLDF, fill = Shear))
This resource may also be useful: http://www.cookbook-r.com/Graphs/
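A minimal sketch of one way to get the three requested legend entries: map constant labels inside aes() and assign their colours with manual scales. The data frames samples_df, baseline_df and theoretical_df and their values are made up for illustration (the original Shear data is not shown), and geom_point is used in place of geom_jitter so the points sit exactly on their values.

```r
library(ggplot2)

# Hypothetical stand-in data; the original Shear values are not available.
set.seed(1)
shear_levels <- c("EXT.Single", "EXT.Multi", "INT.Single", "INT.Multi")
samples_df <- data.frame(
  Shear = rep(shear_levels, each = 10),
  LLDF  = runif(40, 0.6, 0.9)
)
baseline_df <- data.frame(Shear = shear_levels,
                          LLDF  = c(0.70, 0.75, 0.72, 0.80))
theoretical_df <- data.frame(Shear = shear_levels[-1],
                             LLDF  = c(0.825, 0.720, 0.884))

# Constant strings inside aes() become legend entries;
# the manual scales decide which colour each entry gets.
p_legend <- ggplot(samples_df, aes(x = Shear, y = LLDF)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot(aes(fill = "Samples"), outlier.shape = NA) +
  geom_point(data = baseline_df, aes(colour = "Baseline"), size = 3) +
  geom_point(data = theoretical_df, aes(colour = "Theoretical"), size = 3) +
  scale_fill_manual(name = NULL, values = c(Samples = "white")) +
  scale_colour_manual(name = NULL,
                      values = c(Baseline = "green4", Theoretical = "red"))
p_legend
```

The boxplot entry comes from the fill scale and the two point entries from the colour scale, so they appear as two stacked legends without titles.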

Related

How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Problem
I have some data points stored in data.frame with three variables, x, y, and gender. My goal is to draw several generally fitted lines and also lines specifically fitted for male/female over the scatter plot, with points coloured by gender. It sounds easy but some issues just persist.
What I currently do is use a new set of x's and predict y's for every model, combine the fitted lines in a data.frame, and then convert wide to long, with the model name as the third variable. From this post: ggplot2: how to add the legend for a line added to a scatter plot? and this one: Add legend to ggplot2 line plot, I learnt that mapping should be used instead of setting colours/legends separately. However, while I can get a multicolour line plot, the points are not coloured by gender (already a factor) as I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried colouring the points directly in aes and assigning colours separately for each line, but then the legend is not generated unless I use lty, which draws every legend entry in the same colour.
I would appreciate any ideas, and I am open to changing the whole method.
Code
Note that two pairs of lines overlap, so the plot appears to show only two lines. I guess adding some jitter to the data might make them distinguishable.
slrmen<-lm(tc~x+I(x^2),data=data[data['gender']==0,])
slrwomen<-lm(tc~x+I(x^2),data=data[data['gender']==1,])
prdf <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(1,100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
to = range(data$x)[2], length.out = 100),
gender = as.factor(rep(0,100)))
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)
rawplotdata<-data.frame(x=prdf$x, fullf=prdf$fit, fullm=prdm$fit,
linf=predict(slrwomen, newdata = prdf),
linm=predict(slrmen, newdata = prdm))
plotdata<-reshape2::melt(rawplotdata,id.vars="x",
measure.vars=c("fullf","fullm","linf","linm"),
variable.name="fitmethod", value.name="y")
plotdata$fitmethod<-as.factor(plotdata$fitmethod)
plt <- ggplot() +
geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
colour=fitmethod)) +
scale_colour_manual(name = "Fit Methods",
values = c("fullf" = "lightskyblue",
"linf" = "cornflowerblue",
"fullm"="darkseagreen", "linm" = "olivedrab")) +
geom_point(data = data, aes(x = x, y = y, fill = gender)) +
scale_fill_manual(values=c("blue","green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
ggplot(data = prdf, aes(x = x, y = fit)) + # prdf and prdm are just data frames containing the x's and fitted values for different models
geom_line(aes(lty="Female"),colour = "chocolate") +
geom_line(data = prdm, aes(x = x, y = fit, lty="Male"), colour = "darkblue") +
geom_point(data = data, aes(x = x, y = y, colour = gender)) +
scale_colour_discrete(name="Gender", breaks=c(0,1),
labels=c("Male","Female"))
This comes down to using the colour aesthetic for lines and the fill aesthetic for points in your first example. The second example works because the colour aesthetic is used for both lines and points.
By default, geom_point cannot show a variable mapped to fill, because the default point shape (19) has no fill.
For fill to work on points, you have to set a fillable shape (one of 21 to 25) in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
tc = c(x1, x2),
gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <-lm(tc~x+I(x^2), data = data[data["gender"]=="Male",])
slrwomen <-lm(tc~x+I(x^2),data = data[data["gender"]=="Female",])
newdat <- data.frame(x = seq(20,80,length.out = 200))
fitted.male <- data.frame(x = newdat,
gender = "Male",
tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
gender = "Female",
tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetics for both points and lines (specify in ggplot such that it gets inherited throughout). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
geom_point() +
geom_line(data = fitted.male) +
geom_line(data = fitted.female) +
scale_colour_manual(values = c("tomato","blue")) +
theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetics for points and the colour aesthetics for lines (specify aesthetics in geom_* to prevent them being inherited). This will reproduce the problem.
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender)) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
To fix this, change the shape argument in geom_point to a point shape that can be filled (21:25).
ggplot(data, aes(x = x, y = tc)) +
geom_point(aes(fill = gender), shape = 21) +
geom_line(data = fitted.male, aes(colour = gender)) +
geom_line(data = fitted.female, aes(colour = gender)) +
scale_colour_manual(values = c("tomato","blue")) +
scale_fill_manual(values = c("tomato","blue")) +
theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
geom_point() +
stat_smooth(aes(linetype = "X^2"), method = 'lm',formula = y~x + I(x^2)) +
stat_smooth(aes(linetype = "X^3"), method = 'lm',formula = y~x + I(x^2) + I(x^3)) +
scale_color_manual(values = c("darkseagreen","lightskyblue"))

How to add line to point shapes in ggplot2 legend

I want to create a black and white plot using ggplot2, where the data is plotted by category using a combination of lines and points. However, the legend only shows the point shape, with no line running through it, unless I add color to the plot.
Here is some example data to illustrate the problem with:
## Create example data
set.seed(123)
dat <- data.frame(
time_period = rep(1:4, each = 3),
category = rep(LETTERS[1:3], 4),
y = rnorm(12)
)
Here is an example of a color plot, so you can see how I want the legend to look:
library(ggplot2)
## Generate plot with color
ggplot(data = dat, mapping = aes(x = time_period, y = y, color = category)) +
geom_line(aes(group = category)) +
geom_point(aes(shape = category), size = 2) +
theme_bw()
However, if I move to grayscale (which I need to be able to do), the line running through the point in the legend disappears, which I'd like to avoid:
## Generate plot without color
ggplot(data = dat, mapping = aes(x = time_period, y = y)) +
geom_line(aes(group = category)) +
geom_point(aes(shape = category), size = 2) +
theme_bw()
How can I add a line through the point symbols in the legend with a grayscale plot?
I would suggest this approach:
#Plot
ggplot(data = dat, mapping = aes(x = time_period, y = y,group = category,shape = category)) +
geom_line(color = 'gray', show.legend = TRUE) +
geom_point(size = 2) +
theme_bw()

How do I manually add a legend to a ggplot and geom_point?

I am plotting 2 sets of data on the same plot using ggplot. I have specified the colour for each data set, but no legend appears when the dot plot is generated.
What can I do to manually add a legend?
# Create an index to hold values of m from 1 to 100
m_index <- (1:100)
data_frame_50 <- data.frame(prob_max_abs_cor_50)
data_frame_20 <- data.frame(prob_max_abs_cor_20)
library(ggplot2)
plot1 <- ggplot(data_frame_50, mapping = aes(x = m_index,
y = prob_max_abs_cor_50),
colour = 'red') +
geom_point() +
ggplot(data_frame_20, mapping = aes(x = m_index,
y = prob_max_abs_cor_20),
colour = 'blue') +
geom_point()
plot1 + labs(x = " Values of m ",
y = " Maximum Absolute Correlation ",
title = "Dot plot of probability")
First, I would suggest tidying your ggplot code a little. ggplot objects cannot be added to one another, so the intent of your posted code is better written with one geom_point() layer per dataset:
ggplot() +
geom_point(data = data_frame_50, aes(x = m_index, y = prob_max_abs_cor_50),
colour = 'red') +
geom_point(data = data_frame_20, aes(x = m_index, y = prob_max_abs_cor_20),
colour = 'blue') +
labs(x = " Values of m ", y = " Maximum Absolute Correlation ",
title = "Dot plot of probability")
You won't get a legend here, because the colours are set as constants outside aes(), and ggplot2 only draws a legend for aesthetics that are mapped to data. The usual fix is a single dataset with a column that identifies the group (i.e. 20 or 50), mapped to colour. Using some example data, this is the equivalent of what you are plotting, and ggplot won't provide a legend:
ggplot() +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Width), colour = 'red') +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Length), colour = 'blue')
If you want to colour by category, include a colour argument inside the aes call;
ggplot() +
geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Width,
colour = factor(Species)))
Have a look at the iris dataset to get a sense of how you need to shape your data. It's hard to give precise advice, because you haven't provided an idea of what your data look like, but something like this might work;
df.20 <- data.frame("m" = 1:100, "Group" = 20, "Numbers" = prob_max_abs_cor_20)
df.50 <- data.frame("m" = 1:100, "Group" = 50, "Numbers" = prob_max_abs_cor_50)
df.All <- rbind(df.20, df.50)
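As a sketch of the remaining step, the combined data frame can be plotted with Group mapped to colour, which is what produces the legend. The two prob_max_abs_cor_* vectors below are random placeholders, since the real values were not posted; the colours are chosen to match the question.

```r
library(ggplot2)

# Placeholder values; the asker's real vectors are not shown in the question.
set.seed(42)
prob_max_abs_cor_20 <- runif(100)
prob_max_abs_cor_50 <- runif(100)

df.20 <- data.frame(m = 1:100, Group = 20, Numbers = prob_max_abs_cor_20)
df.50 <- data.frame(m = 1:100, Group = 50, Numbers = prob_max_abs_cor_50)
df.All <- rbind(df.20, df.50)

# Mapping factor(Group) to colour inside aes() is what creates the legend.
p_groups <- ggplot(df.All, aes(x = m, y = Numbers, colour = factor(Group))) +
  geom_point() +
  scale_colour_manual(name = "Group",
                      values = c("20" = "blue", "50" = "red")) +
  labs(x = "Values of m", y = "Maximum Absolute Correlation",
       title = "Dot plot of probability")
p_groups
```

Group is wrapped in factor() so that ggplot2 treats it as two discrete categories rather than a continuous colour gradient.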

ggplot2 confusion matrix geom_text labeling

I've plotted a confusion matrix (predicting 5 outcomes) in R using ggplot and scales for geom_text labeling.
As written, geom_text(aes(label = percent(Freq/sum(Freq)))) shows the frequency of each box divided by the sum of all observations, but what I want is the frequency of each box divided by the sum of the frequencies for its Reference.
In other words, instead of A,A = 15.8%,
it should be A,A = 15.8%/(0.0%+0.0%+0.0%+0.0%+15.8%) = 100.0%
library(ggplot2)
library(scales)
library(caret) # provides confusionMatrix()
valid_actual <- as.factor(c("A","B","B","C","C","C","E","E","D","D","A","A","A","E","E","D","D","C","B"))
valid_pred <- as.factor(c("A","B","C","C","E","C","E","E","D","B","A","B","A","E","D","E","D","C","B"))
cfm <- confusionMatrix(valid_actual, valid_pred)
ggplotConfusionMatrix <- function(m){
mytitle <- paste("Accuracy", percent_format()(m$overall[1]),
"Kappa", percent_format()(m$overall[2]))
p <-
ggplot(data = as.data.frame(m$table) ,
aes(x = Reference, y = Prediction)) +
geom_tile(aes(fill = log(Freq)), colour = "white") +
scale_fill_gradient(low = "white", high = "green") +
geom_text(aes(x = Reference, y = Prediction, label = percent(Freq/sum(Freq)))) +
theme(legend.position = "none") +
ggtitle(mytitle)
return(p)
}
ggplotConfusionMatrix(cfm)
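The problem is that, as far as I know, ggplot cannot do this per-group calculation inside aes(): sum(Freq) there is taken over all rows rather than per Reference. See this recent post for a similar question.
To solve your problem, compute the percentages per Reference beforehand, for example with the dplyr package.
This should work: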
library(ggplot2)
library(scales)
library(caret)
library(dplyr)
valid_actual <- as.factor(c("A","B","B","C","C","C","E","E","D","D","A","A","A","E","E","D","D","C","B"))
valid_pred <- as.factor(c("A","B","C","C","E","C","E","E","D","B","A","B","A","E","D","E","D","C","B"))
cfm <- confusionMatrix(valid_actual, valid_pred)
ggplotConfusionMatrix <- function(m){
mytitle <- paste("Accuracy", percent_format()(m$overall[1]),
"Kappa", percent_format()(m$overall[2]))
data_c <- mutate(group_by(as.data.frame(m$table), Reference ), percentage =
percent(Freq/sum(Freq)))
p <-
ggplot(data = data_c,
aes(x = Reference, y = Prediction)) +
geom_tile(aes(fill = log(Freq)), colour = "white") +
scale_fill_gradient(low = "white", high = "green") +
geom_text(aes(x = Reference, y = Prediction, label = percentage)) +
theme(legend.position = "none") +
ggtitle(mytitle)
return(p)
}
ggplotConfusionMatrix(cfm)
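The per-Reference normalisation can also be checked on its own, without caret or dplyr. The sketch below rebuilds the frequency table with base R table() from the same two vectors and divides each cell by its Reference total using ave(); the names cm and share are introduced here for illustration only. The argument order mirrors the confusionMatrix(valid_actual, valid_pred) call above, where the first argument fills the Prediction dimension.

```r
valid_actual <- factor(c("A","B","B","C","C","C","E","E","D","D",
                         "A","A","A","E","E","D","D","C","B"))
valid_pred   <- factor(c("A","B","C","C","E","C","E","E","D","B",
                         "A","B","A","E","D","E","D","C","B"))

# Long-format frequency table with columns Prediction, Reference, Freq.
cm <- as.data.frame(table(Prediction = valid_actual, Reference = valid_pred))

# Divide each cell by the total of its Reference group.
cm$share <- ave(as.numeric(cm$Freq), cm$Reference,
                FUN = function(f) f / sum(f))

# Shares within each Reference group add up to 1.
tapply(cm$share, cm$Reference, sum)
```

With these vectors the A,A cell holds all three observations of Reference A, so its share is 1, which is the 100.0% the question asks for.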

Have separate legends for a set of point-line plots, and a vertical line plot

Example data frame (if there's a better/more idiomatic way to do this, let me know):
n <- 10
group <- rep(c("A","B","C"),each = n)
x <- rep(seq(0,1,length = n),3)
y <- ifelse(group == "A",1+x,ifelse(group == "B",2+2*x,3+3*x))
df <- data.frame(group,x,y)
xd <- 0.5
des <- data.frame(xd)
I want to create point-line plots for the data in df, add a vertical line at the x location indicated by xd, and get readable legends for both. I tried the following:
p <- ggplot(data = df, aes(x = x, y = y, color = group)) + geom_point() + geom_line(aes(linetype=group))
p <- p + geom_vline(data = des, aes(xintercept = xd), color = "blue")
p
Not quite what I had in mind, there's no legend for the vertical line.
A small modification (I don't understand why geom_vline is one of the few geometries with a show.legend parameter, which moreover defaults to FALSE!):
p <- ggplot(data = df, aes(x = x, y = y, color = group)) + geom_point() + geom_line(aes(linetype=group))
p <- p + geom_vline(data = des, aes(xintercept = xd), color = "blue", show.legend = TRUE)
p
At least now the vertical bar is showing in the legend, but I don't want it to go in the same "category" (?) as group. I would like another legend entry, titled Design, and containing only the vertical line. How can I achieve this?
A possible approach is to add an extra dummy aesthetic like fill =, which we'll subsequently use to create the second legend in combination with scale_fill_manual() :
ggplot(data = df, aes(x = x, y = y, color = group)) +
geom_point() +
geom_line(aes(linetype=group), show.legend = TRUE) +
geom_vline(data = des,
aes(xintercept = xd, fill = "Vertical Line"), # add dummy fill
colour = "blue") +
scale_fill_manual(name = "Design", values = 1, # customize second legend
guide = guide_legend(override.aes = list(colour = c("blue"))))
