I need to add a legend of the two lines (best fit line and 45 degree line) on TOP of my two plots. Sorry I don't know how to add plots! Please please please help me, I really appreciate it!!!!
Here is an example
type=factor(rep(c("A","B","C"),5))
xvariable=seq(1,15)
yvariable=2*xvariable+rnorm(15,0,2)
newdata=data.frame(type,xvariable,yvariable)
p = ggplot(newdata,aes(x=xvariable,y=yvariable))
p+geom_point(size=3)+ facet_wrap(~ type) +
geom_abline(intercept =0, slope =1,color="red",size=1)+
stat_smooth(method="lm", se=FALSE,size=1)
Here is another approach which uses aesthetic mapping to string constants to identify different groups and create a legend.
First an alternate way to create your test data (and naming it DF instead of newdata)
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
xvariable = 1:15,
yvariable = 2 * (1:15) + rnorm(15, 0, 2))
Now the ggplot code. Note that for both geom_abline and stat_smooth, the colour is set inside an aes call, which means each of the two values used will be mapped to a different colour and a guide (legend) will be created for that mapping.
ggplot(DF, aes(x = xvariable, y = yvariable)) +
geom_point(size = 3) +
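# intercept and slope go inside aes(): recent ggplot2 versions ignore the mapping (and drop the legend key) when they are passed as plain arguments
geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one"), size = 1) +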
stat_smooth(aes(colour="best fit"), method = "lm", se = FALSE, size = 1) +
facet_wrap(~ type) +
scale_colour_discrete("")
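The question also asks for the legend to sit on top of the panels and for specific line colours. A minimal sketch of that, assuming the same DF as above; the names in line_colours and the seed are illustrative choices, not part of the original answer:

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
DF <- data.frame(type = factor(rep(c("A", "B", "C"), 5)),
                 xvariable = 1:15,
                 yvariable = 2 * (1:15) + rnorm(15, 0, 2))

# Named vector: each legend label is tied to its colour by name, not by position
line_colours <- c("best fit" = "blue", "one-to-one" = "red")

p <- ggplot(DF, aes(x = xvariable, y = yvariable)) +
  geom_point(size = 3) +
  geom_abline(aes(intercept = 0, slope = 1, colour = "one-to-one")) +
  stat_smooth(aes(colour = "best fit"), method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ type) +
  scale_colour_manual(name = NULL, values = line_colours) +
  theme(legend.position = "top")

# print(p) draws the plot
```

Naming the values means the colours stay attached to the right label even if the labels are later renamed or reordered.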
Try this:
# original data
type <- factor(rep(c("A", "B", "C"), 5))
x <- 1:15
y <- 2 * x + rnorm(15, 0, 2)
df <- data.frame(type, x, y)
# create a copy of original data, but set y = x
# this data will be used for the one-to-one line
df2 <- data.frame(type, x, y = x)
# bind original and 'one-to-one data' together
df3 <- rbind.data.frame(df, df2)
# create a grouping variable to separate stat_smoothers based on original and one-to-one data
df3$grp <- as.factor(rep(1:2, each = nrow(df)))
# plot
# use original data for points
# use 'double data' for the best-fit and one-to-one lines, set colours by group
ggplot(df, aes(x = x, y = y)) +
geom_point(size = 3) +
facet_wrap(~ type) +
stat_smooth(data = df3, aes(colour = grp), method = "lm", se = FALSE, size = 1) +
scale_colour_manual(values = c("red","blue"),
labels = c("best fit", "one-to-one"),
name = "") +
theme(legend.position = "top")
# If you would rather stack the two keys in the legend, you can add:
# guide = guide_legend(direction = "vertical")
# ...as an argument in scale_colour_manual
Please note that this solution does not extrapolate the one-to-one line outside the range of your data, which seemed to be the case for the original geom_abline.
I am trying to plot my data in ggplot with the line type determined by the p-values of the smooth lines (i.e., a dashed line if the regression is not significant, and a solid line when it is). Before posting this question, I tried an answer from this forum, but those answers normally deal with labels, not the line itself.
Below is my failing code with sample data. Thanks in advance for your kind help.
library(plyr)
library(ggplot2)
dat <- data.frame(id = 1: 100,
x = rnorm(100,2,0.5),
y = rnorm(100, 20, 5),
varA = rep(letters[1:4], 25),
varB = factor(sample(c(50,100,150), 100, TRUE)))
pvdat <- ddply(dat,.(varA,varB), function(df) data.frame(pvalue=format(signif(summary(lm(y~x,data=df))[[4]][2, 4], 2),scientific=-2),
lty = ifelse(summary(lm(y~x,data=df))[[4]][2, 4] > 0.05, 0, 1)))
ggplot(data= dat, aes(x = x, y = y, col = as.factor(varB))) + geom_smooth(method = "lm", aes(linetype = pvdat$lty)) + facet_grid(. ~ as.factor(varA), scale = "free_x")
There are two problems here:
pvdat$lty is continuous, but linetype requires a discrete variable such as a factor
pvdat has one row per varA/varB combination (at most 12), but dat has 100 rows, so ggplot does not know how to make a mapping between the two
To change your numeric column to a factor, you need as.factor(), and to make the mapping you can use the merge() function to make a single data frame with the values from pvdat mapped for each element of dat. Putting these together:
ggplot(data = merge(dat, pvdat, by = c("varA", "varB")),
       aes(x = x, y = y, col = as.factor(varB))) +
  geom_smooth(method = "lm", aes(linetype = as.factor(lty))) +
  facet_grid(. ~ as.factor(varA), scales = "free_x")
will solve your problem.
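One caveat: with as.factor(lty) the default linetype scale assigns line types by level order, so it does not by itself guarantee that non-significant fits are the dashed ones. A hedged sketch that pins this down with a labelled factor and named values in scale_linetype_manual. The column name signif, the helper slope_p, and the use of base split() in place of ddply are illustrative choices, not part of the original code:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for a reproducible example
dat <- data.frame(x = rnorm(100, 2, 0.5),
                  y = rnorm(100, 20, 5),
                  varA = rep(letters[1:4], 25),
                  varB = factor(sample(c(50, 100, 150), 100, TRUE)))

# p-value of the slope for one varA/varB subset (hypothetical helper)
slope_p <- function(d) summary(lm(y ~ x, data = d))$coefficients[2, 4]

# One row per varA/varB combination, with a labelled significance flag
groups <- split(dat, list(dat$varA, dat$varB), drop = TRUE)
pvdat <- do.call(rbind, lapply(groups, function(d) {
  data.frame(varA = d$varA[1], varB = d$varB[1],
             signif = ifelse(slope_p(d) > 0.05, "p > 0.05", "p <= 0.05"))
}))
pvdat$signif <- factor(pvdat$signif, levels = c("p <= 0.05", "p > 0.05"))

# Attach the flag to every observation so ggplot can map it
plot_dat <- merge(dat, pvdat, by = c("varA", "varB"))

p <- ggplot(plot_dat, aes(x = x, y = y, colour = varB)) +
  geom_smooth(aes(linetype = signif), method = "lm", formula = y ~ x) +
  facet_grid(. ~ varA, scales = "free_x") +
  # Named values: the line type follows the label, not the level order
  scale_linetype_manual(values = c("p <= 0.05" = "solid", "p > 0.05" = "dashed"))

# print(p) draws the plot
```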
Suppose I have the following plot in ggplot:
It was generated using the code below:
x <- seq(0, 10, by = 0.2)
y1 <- sin(x)
y2 <- cos(x)
y3 <- cos(x + pi / 4)
y4 <- sin(x + pi / 4)
df1 <- data.frame(x, y = y1, Type = as.factor("sin"), Method = as.factor("method1"))
df2 <- data.frame(x, y = y2, Type = as.factor("cos"), Method = as.factor("method1"))
df3 <- data.frame(x, y = y3, Type = as.factor("cos"), Method = as.factor("method2"))
df4 <- data.frame(x, y = y4, Type = as.factor("sin"), Method = as.factor("method2"))
df.merged <- rbind(df1, df2, df3, df4)
ggplot(df.merged, aes(x, y, colour = interaction(Type, Method), linetype = Method, shape = Type)) + geom_line() + geom_point()
I would like to have only one legend that correctly displays the shapes, the colors and the line types (the interaction(Type, Method) legend is the closest to what I would like, but it does not have the correct shapes/line types).
I know that if I use scale_xxx_manual and I specify the same labels for all legends they will be merged, but I don't want to have to set the labels manually: if there are new Methods or Types, I don't want to have to modify my code. I want something generic.
Edit
As pointed out in the answers below, there are several ways to get the job done in this particular case. All proposed solutions require setting the legend line types and shapes manually, either with the scale_xxx_manual functions or with the guides function.
However, the proposed solutions still don't work in the general case: for instance, if I add a new data frame to the data set with a new "method3" Method, it does not work anymore, we have to manually add the new legend shapes and line types:
y5 <- sin(x - pi / 4)
df5 <- data.frame(x, y = y5, Type = as.factor("sin"), Method = as.factor("method3"))
df.merged <- rbind(df1, df2, df3, df4, df5)
override.shape <- c(16, 17, 16, 17, 16)
override.linetype <- c(1, 1, 3, 3, 4)
g <- ggplot(df.merged, aes(x, y, colour = interaction(Type, Method), linetype = Method, shape = Type)) + geom_line() + geom_point()
g <- g + guides(colour = guide_legend(override.aes = list(shape = override.shape, linetype = override.linetype)))
g <- g + scale_shape(guide = "none")
g <- g + scale_linetype(guide = "none")
print(g)
This gives:
Now the question is: how to automatically generate the override.shape and override.linetype vectors?
Note that the vector size is 5 because we have 5 curves, while the interaction(Type, Method) factor has size 6 (I don't have data for the cos/method3 combination)
Use labs() and set the same value for all aesthetics defining the appearance of geoms.
library('ggplot2')
ggplot(iris) +
aes(x = Sepal.Length, y = Sepal.Width,
color = Species, linetype = Species, shape = Species) +
geom_line() +
geom_point() +
labs(color = "Guide name", linetype = "Guide name", shape = "Guide name")
The R Cookbook section on Legends explains:
If you use both colour and shape, they both need to be given scale
specifications. Otherwise there will be two two separate legends.
In your case you need specifications for shape and linetype.
Edit
It was important to have the same data creating the shapes, colours and lines, so I replaced your interaction() call with an explicitly defined column. Instead of scale_linetype_discrete, I used scale_linetype_manual to create the legend and specify the values, since by default each of the four levels would get a different line type.
If you would like a detailed layout of all possible shapes and line types, check this R Graphics site to see all of the number identifiers:
df.merged$int <- paste(df.merged$Type, df.merged$Method, sep=".")
ggplot(df.merged, aes(x, y, colour = int, linetype=int, shape=int)) +
geom_line() +
geom_point() +
scale_colour_discrete("") +
scale_linetype_manual("", values=c(1,2,1,2)) +
scale_shape_manual("", values=c(17,17,16,16))
Here is the solution in the general case:
# Create the data frames
x <- seq(0, 10, by = 0.2)
y1 <- sin(x)
y2 <- cos(x)
y3 <- cos(x + pi / 4)
y4 <- sin(x + pi / 4)
y5 <- sin(x - pi / 4)
df1 <- data.frame(x, y = y1, Type = as.factor("sin"), Method = as.factor("method1"))
df2 <- data.frame(x, y = y2, Type = as.factor("cos"), Method = as.factor("method1"))
df3 <- data.frame(x, y = y3, Type = as.factor("cos"), Method = as.factor("method2"))
df4 <- data.frame(x, y = y4, Type = as.factor("sin"), Method = as.factor("method2"))
df5 <- data.frame(x, y = y5, Type = as.factor("sin"), Method = as.factor("method3"))
# Merge the data frames
df.merged <- rbind(df1, df2, df3, df4, df5)
# Create the interaction
type.method.interaction <- interaction(df.merged$Type, df.merged$Method)
# Compute the number of types and methods
nb.types <- nlevels(df.merged$Type)
nb.methods <- nlevels(df.merged$Method)
# Set the legend title
legend.title <- "My title"
# Initialize the plot
g <- ggplot(df.merged, aes(x,
y,
colour = type.method.interaction,
linetype = type.method.interaction,
shape = type.method.interaction)) + geom_line() + geom_point()
# Here is the magic
g <- g + scale_color_discrete(legend.title)
g <- g + scale_linetype_manual(legend.title,
values = rep(1:nb.types, nb.methods))
g <- g + scale_shape_manual(legend.title,
values = 15 + rep(1:nb.methods, each = nb.types))
# Display the plot
print(g)
The result is the following:
Sine curves are drawn as solid lines and cosine curves as dashed lines.
"method1" data use filled circles for the shape.
"method2" data use filled triangles for the shape.
"method3" data use filled diamonds for the shape.
The legend matches the curves.
To summarize, the tricks are :
Use the Type/Method interaction for all data representations (colour, shape,
linetype, etc.)
Then manually set both the curve styles and the legends styles with
scale_xxx_manual.
scale_xxx_manual allows you to provide a values vector that is longer than the actual number of curves, so it's easy to compute the style vector values from the sizes of the Type and Method factors
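Unnamed values are matched to the levels present in the data by position, so the approach above relies on any missing Type/Method combination being the last level. A hedged sketch of a variant that names each value after its interaction level, so gaps anywhere in the grid do not shift the styles. The helper make_df and the column name int are illustrative, and the data deliberately omit sin/method2 and cos/method3:

```r
library(ggplot2)

x <- seq(0, 10, by = 0.2)
# Hypothetical helper to build one curve
make_df <- function(y, type, method) data.frame(x, y, Type = type, Method = method)

df.merged <- rbind(make_df(sin(x), "sin", "method1"),
                   make_df(cos(x), "cos", "method1"),
                   make_df(cos(x + pi / 4), "cos", "method2"),
                   make_df(sin(x - pi / 4), "sin", "method3"))
df.merged$Type <- factor(df.merged$Type, levels = c("sin", "cos"))
df.merged$Method <- factor(df.merged$Method)
df.merged$int <- interaction(df.merged$Type, df.merged$Method)

# Every possible combination, in the same order as levels(int)
# (both expand.grid and interaction vary the first factor fastest)
combos <- expand.grid(Type = levels(df.merged$Type),
                      Method = levels(df.merged$Method))
keys <- levels(df.merged$int)

# Line type follows Type, shape follows Method, each named by its level
linetypes <- setNames(match(combos$Type, levels(df.merged$Type)), keys)
shapes <- setNames(15 + match(combos$Method, levels(df.merged$Method)), keys)

g <- ggplot(df.merged, aes(x, y, colour = int, linetype = int, shape = int)) +
  geom_line() +
  geom_point() +
  scale_colour_discrete("My title") +
  scale_linetype_manual("My title", values = linetypes) +
  scale_shape_manual("My title", values = shapes)

# print(g) draws the plot
```

Because the vectors are named, cos.method2 keeps the cosine line type even though sin.method2 is absent.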
One just needs to give both guides the same name. For example:
g+ scale_linetype_manual(name="Guide1",values= c('solid', 'solid', 'dotdash'))+
scale_colour_manual(name="Guide1", values = c("blue", "green","red"))
The code below results in the desired legend, if I understand your question, but I'm not sure I understand the label issue, so let me know if this isn't what you were looking for.
p = ggplot(df.merged, aes(x, y, colour=interaction(Type, Method),
linetype=interaction(Type, Method),
shape=interaction(Type, Method))) +
geom_line() +
geom_point()
p + scale_shape_manual(values=rep(16:17, 2)) +
scale_linetype_manual(values=rep(c(1,3),each=2))
I'm trying to plot 2 sets of data points and a single line in R using ggplot.
The issue I'm having is with the legend.
As can be seen in the attached image, the legend applies the lines to all 3 data sets even though only one of them is plotted with a line.
I have melted the data into one long frame, but this still requires me to filter the data sets for each individual call to geom_line() and geom_path().
I want to graph the melted data, plotting a line based on one data set, and points on the remaining two, with a complete legend.
Here is the sample script I wrote to produce the plot:
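library(ggplot2)
library(reshape2)  # melt()
library(dplyr)     # filter()
library(grid)      # pushViewport(), viewport()
xseq <- 1:100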
x <- rnorm(n = 100, mean = 0.5, sd = 2)
x2 <- rnorm(n = 100, mean = 1, sd = 0.5)
x.lm <- lm(formula = x ~ xseq)
x.fit <- predict(x.lm, newdata = data.frame(xseq = 1:100), type = "response", se.fit = TRUE)
my_data <- data.frame(x = xseq, ypoints = x, ylines = x.fit$fit, ypoints2 = x2)
## Now try and plot it
melted_data <- melt(data = my_data, id.vars = "x")
p <- ggplot(data = melted_data, aes(x = x, y = value, color = variable, shape = variable, linetype = variable)) +
geom_point(data = filter(melted_data, variable == "ypoints")) +
geom_point(data = filter(melted_data, variable == "ypoints2")) +
geom_path(data = filter(melted_data, variable == "ylines"))
pushViewport(viewport(layout = grid.layout(1, 1))) # One on top of the other
print(p, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
You can set them manually like this:
The legend keys follow the factor levels of variable, which melt() takes from the column order of my_data: ypoints, ylines, ypoints2. So we set linetype = "solid" for the second item (ylines) and "blank" for the others (no line).
Similarly, the second item gets no shape (NA), and for the others we set whatever shapes we need (7 and 8 here are just examples). See e.g. http://www.r-bloggers.com/how-to-remember-point-shape-codes-in-r/ to help you choose the right shapes for your needs.
If you are happy with plain dots, you can drop shape = variable from aes(), use shape = c(19, NA, 19) in override.aes, and leave out scale_shape_manual(...).
my_shapes = c(7, NA, 8)
ggplot(data = melted_data, aes(x = x, y = value, color=variable, shape=variable )) +
geom_path(data = filter(melted_data, variable == "ylines") ) +
geom_point(data = filter(melted_data, variable %in% c("ypoints", "ypoints2"))) +
scale_colour_manual(values = c("red", "green", "blue"),
guide = guide_legend(override.aes = list(
linetype = c("blank", "solid", "blank"),
shape = my_shapes))) +
scale_shape_manual(values = my_shapes)
But I am very curious whether there is a more automated way. Hopefully someone can post a better answer.
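One possible way to automate this is to derive the override vectors from the factor levels instead of typing them by position. This is a sketch under assumptions: line_vars is a hypothetical vector naming the series to draw as lines, and the long data frame below is a stand-in built by hand with the same level order that melt() would produce:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for a reproducible example
n <- 20
# Long-format stand-in for melted_data
melted_data <- data.frame(
  x = rep(seq_len(n), times = 3),
  variable = factor(rep(c("ypoints", "ylines", "ypoints2"), each = n),
                    levels = c("ypoints", "ylines", "ypoints2")),
  value = c(rnorm(n, 0.5, 2), seq(0, 1, length.out = n), rnorm(n, 1, 0.5))
)

line_vars <- "ylines"  # series drawn as lines; all others are drawn as points
lvls <- levels(melted_data$variable)
is_line <- lvls %in% line_vars

# Legend styles computed per level, in level order
legend_linetype <- ifelse(is_line, "solid", "blank")
point_shapes <- setNames(rep(NA_real_, length(lvls)), lvls)
point_shapes[!is_line] <- 15 + seq_len(sum(!is_line))  # 16, 17, ...

p <- ggplot(melted_data, aes(x = x, y = value, colour = variable, shape = variable)) +
  geom_path(data = subset(melted_data, variable %in% line_vars)) +
  geom_point(data = subset(melted_data, !(variable %in% line_vars))) +
  scale_shape_manual(values = point_shapes) +
  guides(colour = guide_legend(override.aes = list(
    linetype = legend_linetype,
    shape = unname(point_shapes))))

# print(p) draws the plot
```

Adding another point series only requires a new level in variable; adding another line series only requires listing it in line_vars.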
This post relied quite heavily on this answer: ggplot2: Different legend symbols for points and lines
I want to plot a line graph, with multiple lines, coloured depending on a grouping variable. Now I want to set the legend labels via scale-command:
scale_color_manual(values = colors_values, labels = ...)
The legend labels are as follows: "x^2", "x^3", "x^4" etc., where the range is dynamically created. I would now like to dynamically create the expression as label text, i.e.
"x^2" should become x² (x with a superscript 2)
"x^3" should become x³
etc.
The amount of legend labels varies, so I thought about something like as.expression(sprintf("x^%i", number)), which does of course not work as label parameter for the scale function.
I have searched google and stack overflow, however, I haven't found a working solution yet, so I hope someone can help me here.
Here's a reproducible example:
poly.term <- runif(100, 1, 60)
resp <- rnorm(100, 40, 5)
poly.degree <- 2:4
geom.colors <- scales::brewer_pal(palette = "Set1")(length(poly.degree))
plot.df <- data.frame()
for (i in poly.degree) {
mydat <- na.omit(data.frame(x = poly.term, y = resp))
fit <- lm(mydat$y ~ poly(mydat$x, i, raw = TRUE))
plot.df <- rbind(plot.df, cbind(mydat, predict(fit), sprintf("x^%i", i)))
}
colnames(plot.df) <- c("x","y", "pred", "grp")
ggplot(plot.df, aes(x, y, colour = grp)) +
stat_smooth(method = "loess", se = FALSE) +
geom_line(aes(y = pred)) +
scale_color_manual(values = geom.colors
# here I want to change legend labels
# labels = expression???
)
I would like the legend labels to be x², x³ and x⁴.
ggplot(plot.df, aes(x, y, colour = grp)) +
stat_smooth(method = "loess", se = F) +
geom_line(aes(y = pred)) +
scale_color_manual(values = setNames(geom.colors,
paste0("x^",poly.degree)),
labels = setNames(lapply(poly.degree, function(i) bquote(x^.(i))),
paste0("x^",poly.degree)))
It's important to ensure correct mapping if you change values or labels in the scale. Thus, you should always use named vectors.
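As an alternative sketch: as.expression(sprintf(...)) fails because it wraps the string without parsing it, whereas parse(text = ...) turns plotmath strings into real expressions. Assuming the group names are already valid plotmath (as "x^2" is), a label function can do the conversion for whatever breaks the scale ends up with. The function name parse_labels, the toy data and the colours are illustrative:

```r
library(ggplot2)

poly.degree <- 2:4
grp.names <- sprintf("x^%i", poly.degree)

# Hypothetical helper: turn plotmath strings into an expression vector
parse_labels <- function(breaks) parse(text = breaks)

# Toy data standing in for plot.df
plot.df <- data.frame(x = rep(1:10, times = length(poly.degree)),
                      grp = rep(grp.names, each = 10))
plot.df$y <- plot.df$x ^ rep(poly.degree, each = 10)

p <- ggplot(plot.df, aes(x, y, colour = grp)) +
  geom_line() +
  scale_colour_manual(values = setNames(c("red", "blue", "darkgreen"), grp.names),
                      labels = parse_labels)

# print(p) draws the plot
```

Since the labels are computed from the breaks themselves, they cannot get out of step with the values when the range of degrees changes.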
this is my first stack overflow post and I am a relatively new R user, so please go gently!
I have a data frame with three columns, a participant identifier, a condition (factor with 2 levels either Placebo or Experimental), and an outcome score.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
I would like to construct a bar plot with two bars showing the mean outcome score for each condition, with the standard deviation as an error bar. I would then like to overlay lines connecting the points for each participant's score in each condition, so that the plot displays the individual responses as well as the group mean. If possible, I would also like to include an axis break.
I don't seem to be able to find any advice in other threads, apologies if I am repeating a question.
Many Thanks.
p.s. I realise that presenting data in this way will not be to everyone's taste. It is for a specific requirement!
This ought to work:
library(ggplot2)
library(dplyr)
dat.summ <- dat %>% group_by(Condition) %>%
summarize(mean.outcome = mean(Outcome),
sd.outcome = sd(Outcome))
ggplot(dat.summ, aes(x = Condition, y = mean.outcome)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean.outcome - sd.outcome,
ymax = mean.outcome + sd.outcome),
color = "dodgerblue", width = 0.3) +
geom_point(data = dat, aes(x = Condition, y = Outcome),
color = "firebrick", size = 1.2) +
geom_line(data = dat, aes(x = Condition, y = Outcome, group = ID),
color = "firebrick", size = 1.2, alpha = 0.5) +
scale_y_continuous(limits = c(0, max(dat$Outcome)))
Some people are better with ggplot's stat functions and arguments than I am and might do it differently. I prefer to just transform my data first.
set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo",10),rep("Experimental",10)),
Outcome = rnorm(20,15,2),
ID = factor(rep(1:10,2)))
dat.w <- reshape(dat, direction = 'wide', idvar = 'ID', timevar = 'Condition')
means <- colMeans(dat.w[, 2:3])
sds <- apply(dat.w[, 2:3], 2, sd)
ci.l <- means - sds
ci.u <- means + sds
ci.width <- .25
bp <- barplot(means, ylim = c(0,20))
segments(bp, ci.l, bp, ci.u)
segments(bp - ci.width, ci.u, bp + ci.width, ci.u)
segments(bp - ci.width, ci.l, bp + ci.width, ci.l)
segments(x0 = bp[1], x1 = bp[2], y0 = dat.w[, 2], y1 = dat.w[, 3], col = 1:10)
points(c(rep(bp[1], 10), rep(bp[2], 10)), dat$Outcome, col = 1:10, pch = 19)
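Here is a method using the transformations inside ggplot2. Note that mean_se draws the standard error of the mean rather than the standard deviation.
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
ggplot(dat) +
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun="mean", geom="bar") +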
stat_summary(aes(x=Condition, y=Outcome, group=Condition), fun.data="mean_se", geom="errorbar", col="green", width=.8, size=2) +
geom_line(aes(x=Condition, y=Outcome, group=ID), col="red")
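Since the question asks for standard deviation error bars, here is a hedged sketch of the same in-ggplot approach with a small summary function passed to fun.data. The name mean_sd_range is hypothetical; it only needs to return y, ymin and ymax. The fun argument assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Hypothetical helper: mean plus/minus one standard deviation,
# in the y / ymin / ymax layout that fun.data expects
mean_sd_range <- function(v) {
  m <- mean(v)
  s <- sd(v)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

set.seed(1)
dat <- data.frame(Condition = c(rep("Placebo", 10), rep("Experimental", 10)),
                  Outcome = rnorm(20, 15, 2),
                  ID = factor(rep(1:10, 2)))

p <- ggplot(dat, aes(x = Condition, y = Outcome)) +
  stat_summary(fun = "mean", geom = "bar") +
  stat_summary(fun.data = mean_sd_range, geom = "errorbar", width = 0.3) +
  geom_line(aes(group = ID), colour = "red") +   # one line per participant
  geom_point(colour = "red")

# print(p) draws the plot
```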