Overlapping Trend Lines in scatterplots, R

I am trying to overlay multiple trend lines using geom_smooth() in R. I currently have this code.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
  geom_point(aes(x=mpg, y = hp, col = "Power")) +
  geom_point(aes(x=mpg, y = drat, col = "Drag Coef."))
(mtcars2 is the normalized form of mtcars)
Which gives me this graph.
I am trying to use geom_smooth(method='lm') to draw two trend lines for the two variables. Any ideas?
(Bonus: I would also like to implement the 'shape=1' parameter to differentiate the variables if possible. The following method does not work)
geom_point(aes(x=mpg, y = hp, col = "Power", shape=2))
Update
I managed to do this.
ggplot(mtcars2, aes(x=Displacement, y = Variable, color = Variable)) +
  geom_point(aes(x=disp, y = hp, col = "Power")) +
  geom_point(aes(x=disp, y = mpg, col = "MPG")) +
  geom_smooth(method= 'lm',aes(x=disp, y = hp, col = "Power")) +
  geom_smooth(method= 'lm',aes(x=disp, y = mpg, col = "MPG"))
It looks like this.
But this is an ugly piece of code. If anybody can make this code look prettier, it'd be great. Also, I have not yet been able to implement the 'shape=2' parameter.

It seems like you're making your life harder than it needs to be...you can pass in additional parameters into aes() such as group and shape.
I don't know if I got your normalization right, but this should give you enough to get going in the right direction:
library(ggplot2)
library(reshape2)
#Do some normalization
mtcars$disp_norm <- with(mtcars, (disp - min(disp)) / (max(disp) - min(disp)))
mtcars$hp_norm <- with(mtcars, (hp - min(hp)) / (max(hp) - min(hp)))
mtcars$drat_norm <- with(mtcars, (drat - min(drat)) / (max(drat) - min(drat)))
#Melt into long form
mtcars.m <- melt(mtcars, id.vars = "disp_norm", measure.vars = c("hp_norm", "drat_norm"))
#plot
ggplot(mtcars.m, aes(disp_norm, value, group = variable, colour = variable, shape = variable)) +
geom_point() +
geom_smooth(method = "lm")
Yielding:
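For readers who would rather avoid reshape2, the same idea can be sketched with base R only. This is a minimal sketch, not the answerer's method: rescale01 is a hypothetical helper name, and the min-max normalization is assumed to match what the question calls "normalized". Mapping both colour and shape to the same variable also covers the bonus request.

```r
library(ggplot2)

# Hypothetical helper (not from the original answer): min-max rescaling to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Normalize the three columns of interest in one step
norm <- as.data.frame(lapply(mtcars[c("disp", "hp", "drat")], rescale01))

# Build the long form by hand: one row per (car, variable) pair
long <- data.frame(
  disp     = rep(norm$disp, times = 2),
  variable = rep(c("hp", "drat"), each = nrow(norm)),
  value    = c(norm$hp, norm$drat)
)

# One set of points and one fitted line per variable;
# colour and shape are both driven by the same column
p <- ggplot(long, aes(disp, value, colour = variable, shape = variable)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Printing p draws the plot. Because the grouping lives in the data rather than in repeated layers, adding a third variable only means extending the long data frame.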

Related

Group geom_vline for a conditional

I believe I'm going about this incorrectly.
I have a ggplot that has several lines graphed into it. Each line is categorized under a 'group.' (ie. predator lines include lines for bear frequency, lion_frequency; prey lines include lines for fish frequency, rabbit_frequency; etc.)
Here's a reproducible example using dummy data
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
p()
Ideally, it would work like:
p <- function(black_lines, green_lines){
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
if (black_lines){
geom_vline(xintercept = 5) +
geom_vline(xintercept = 10) +
}
if(green_lines){
geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
}
}
p(T, T)
This method won't work, of course, since R complains:
Error in ggplot_add():
! Cannot add ggproto objects together. Did you forget to add this object to a ggplot object?
But I'm wondering if this is possible? I couldn't find any similar questions, so I feel like I'm going about it the wrong way.
For those who believe more context is needed. This is for a reactive Shiny app and I want the user to be able to select how the graph will be generated (as such with specific lines or not).
Thank you for your guidance in advance!
You could create your conditional layers using an if and assign them to a variable which could then be added to your ggplot like any other layer:
Note: In case you want to include multiple layers then you could put them in a list, e.g. list(geom_vline(...), geom_vline(...)).
library(ggplot2)
p <- function(black_lines, green_lines){
vline_black <- if (black_lines) geom_vline(xintercept = c(5, 10))
vline_green <- if (green_lines) geom_vline(xintercept = 1:5,
colour = "green",
linetype = "longdash")
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
vline_black +
vline_green
}
p(T, T)
p(T, F)
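The list variant mentioned in the note can be sketched as follows. This is an illustrative sketch under the assumption that the two black lines should stay separate layers (here given different linetypes, which is not in the original); the default argument values are also an assumption. It relies on ggplot2 accepting both NULL and a list of layers on the right-hand side of +.

```r
library(ggplot2)

p <- function(black_lines = TRUE, green_lines = TRUE) {
  # Each group is either a list of layers or NULL (when the condition is FALSE)
  vlines_black <- if (black_lines) list(
    geom_vline(xintercept = 5),
    geom_vline(xintercept = 10, linetype = "dotted")
  )
  vlines_green <- if (green_lines) list(
    geom_vline(xintercept = 1:5, colour = "green", linetype = "longdash")
  )

  # Adding NULL leaves the plot unchanged; adding a list adds every element
  ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    vlines_black +
    vlines_green
}

p(TRUE, FALSE)
```

In a Shiny app the two arguments would typically come from inputs such as checkboxes, so the plot function itself stays free of reactive code.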

Adding the same component to a list of ggplots

I have a list of ggplot2 plots, and I want to add the same title ("Cars") to each element of the list.
library(ggplot2)
l <- list(
ggplot(data = mtcars, aes(x = mpg, y = cyl, col = am)) + geom_point(),
ggplot(data = mtcars, aes(x = mpg, y = disp, col = am)) + geom_point(),
ggplot(data = mtcars, aes(x = mpg, y = hp, col = am)) + geom_point()
)
Now I can refer to each element and add the title as follows
l[[1]] + ggtitle("Cars")
l[[2]] + ggtitle("Cars")
l[[3]] + ggtitle("Cars")
But is there a way to add the title to all elements in the list at once?
(Note: For one layer, this is rather silly, but I can extend such an example to multiple layers.)
User H 1 answered the question. Because ggplot2 builds plots by adding layers, I was unsure whether lapply() would work in this case. I have now learned that + is not a pipe but an ordinary function, so it can be applied to each element of the list.
But adding a title and positioning the legend at the bottom has the desired effect:
q <- lapply(l, function(x) x + ggtitle("Cars") + theme(legend.position = "bottom"))
multiplot( plotlist = q, cols = 2)
where the code for multiplot() is found here.
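When several components are shared, one option is to collect them in a list and add that list to every plot. The sketch below assumes nothing beyond ggplot2 itself; the names plots, shared and styled are made up for illustration.

```r
library(ggplot2)

plots <- list(
  ggplot(mtcars, aes(x = mpg, y = cyl, colour = am)) + geom_point(),
  ggplot(mtcars, aes(x = mpg, y = disp, colour = am)) + geom_point()
)

# Components to apply everywhere, defined once
shared <- list(
  ggtitle("Cars"),
  theme(legend.position = "bottom")
)

# A list on the right-hand side of + adds each of its elements in turn
styled <- lapply(plots, function(p) p + shared)
```

Keeping the shared components in one object means a later change (say, a different theme) is made in a single place.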

Animate the process of adding layers to a ggplot2 plot

I am starting to get familiar with gganimate, but I want to extend my gifs further.
For instance, I can throw a frame on one variable in gganimate but what if I want to animate the process of adding entirely new layers/geoms/variables?
Here's a standard gganimate example:
library(tidyverse)
library(gganimate)
p <- ggplot(mtcars, aes(x = hp, y = mpg, frame = cyl)) +
geom_point()
gg_animate(p)
But what if I want the gif to animate:
# frame 1
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point()
# frame 2
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl)))
# frame 3
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl), size = wt))
# frame 4
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(aes(color = factor(cyl), size = wt)) +
labs(title = "MTCARS")
How might this be accomplished?
You can manually add a frame aesthetic to each layer, though it will include the legends for all of the frames immediately (intentionally, I believe, to keep ratios, margins, etc. correct):
saveAnimate <-
ggplot(mtcars, aes(x = hp, y = mpg)) +
# frame 1
geom_point(aes(frame = 1)) +
# frame 2
geom_point(aes(color = factor(cyl)
, frame = 2)
) +
# frame 3
geom_point(aes(color = factor(cyl), size = wt
, frame = 3)) +
# frame 4
geom_point(aes(color = factor(cyl), size = wt
, frame = 4)) +
# I don't think I can add this one
labs(title = "MTCARS")
gg_animate(saveAnimate)
If you want to be able to add things yourself, and even see how legends, titles, etc. move things around, you may need to step back to a lower-level package and construct the images yourself. Here, I am using the animation package, which allows you to loop through a series of plots with no limitations (they need not be related at all, so it can certainly show things moving the plot area around). Note that I believe this requires ImageMagick to be installed on your computer.
p <- ggplot(mtcars, aes(x = hp, y = mpg))
toSave <- list(
p + geom_point()
, p + geom_point(aes(color = factor(cyl)))
, p + geom_point(aes(color = factor(cyl), size = wt))
, p + geom_point(aes(color = factor(cyl), size = wt)) +
labs(title = "MTCARS")
)
library(animation)
saveGIF(
{lapply(toSave, print)}
, "animationTest.gif"
)
The gganimate commands in the earlier answers are deprecated as of 2021 and won't accomplish OP's task.
Building on Mark's code, you can now simply create a static ggplot object with multiple layered geoms and then add the gganimate::transition_layers function to create an animation that transitions from layer to layer within the static plot. Tweening functions like enter_fade() and enter_grow() control how elements change into and out of frames.
library(tidyverse)
library(gganimate)
anim <- ggplot(mtcars, aes(x = hp, y = mpg)) +
# Title
labs(title = "MTCARS") +
# Frame 1
geom_point() +
# Frame 2
geom_point(aes(color = factor(cyl))) +
# Frame 3
geom_point(aes(color = factor(cyl), size = wt)) +
# gganimate functions
transition_layers() + enter_fade() + enter_grow()
# Render animation
animate(anim)
The animation package doesn't force you to specify frames in the data. See the example at the bottom of this page here, where an animation is wrapped in a big saveGIF() function. You can specify the duration of individual frames and everything.
The drawback to this is that, unlike the nice gganimate functions, the basic frame-by-frame animation won't hold the plot dimensions or legend constant. But if you can hack your way into displaying exactly what you want for each frame, the basic animation package will serve you well.

R: ggplot2, how to annotate summary statistics on each panel of a panel plot

How would I add a text annotation (eg. sd = sd_value) of the standard deviation in each panel of the following plot using ggplot2 in R?
library(ggplot2)
library(datasets)
data(mtcars)
ggplot(data = mtcars, aes(x = hp)) +
geom_dotplot(binwidth = 1) +
geom_density() +
facet_grid(. ~ cyl) +
theme_bw()
I'd post an image of the plot, but I don't have enough rep.
I think "geom_text" or "annotate" might be useful, but I'm not quite sure how.
If you want to vary the text label in each facet, you will want to use geom_text. If you want the same text to appear in each facet, you can use annotate.
p <- ggplot(data = mtcars, aes(x = hp)) +
geom_dotplot(binwidth = 1) +
geom_density() +
facet_grid(. ~ cyl)
mylabels <- data.frame(cyl = c(4, 6, 8),
  label = c("first label", "second label different", "and another"))
p + geom_text(x = 200, y = 0.75, aes(label = label), data = mylabels)
### compare that to this way with annotate
p + annotate("text", x = 200, y = 0.75, label = "same label everywhere")
Now, if you really want standard deviation by cyl in this example, I'd probably use dplyr to do the calculation first and then complete this with geom_text like so:
library(ggplot2)
library(dplyr)
df.sd.hp <- mtcars %>%
group_by(cyl) %>%
summarise(hp.sd = round(sd(hp), 2))
ggplot(data = mtcars, aes(x = hp)) +
geom_dotplot(binwidth = 1) +
geom_density() +
facet_grid(. ~ cyl) +
geom_text(x = 200, y = 0.75,
aes(label = paste0("SD: ", hp.sd)),
data = df.sd.hp)
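The fixed position x = 200, y = 0.75 depends on the data range, so the label can land outside a panel. A possible variation, sketched here with base R's aggregate() instead of dplyr, pins the label to the top-right corner of each panel by using infinite coordinates. The column name hp_sd and the hjust/vjust offsets are arbitrary choices for illustration.

```r
library(ggplot2)

# Standard deviation of hp per cyl group, computed with base R
sd_by_cyl <- aggregate(hp ~ cyl, data = mtcars, FUN = sd)
names(sd_by_cyl)[2] <- "hp_sd"
sd_by_cyl$label <- paste0("SD: ", round(sd_by_cyl$hp_sd, 2))

p <- ggplot(mtcars, aes(x = hp)) +
  geom_density() +
  facet_grid(. ~ cyl) +
  # x = Inf, y = Inf places the text at the panel's top-right corner;
  # hjust/vjust nudge it back inside the panel
  geom_text(data = sd_by_cyl, aes(label = label),
            x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
            inherit.aes = FALSE)
```

Because sd_by_cyl contains the faceting column cyl, each row is drawn only in its matching panel.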
I prefer the appearance of the graph when the statistic appears within the facet label itself. I made the following script, which allows the choice of displaying the standard deviation, mean or count. Essentially it calculates the summary statistic then merges this with the name so that you have the format CATEGORY (SUMMARY STAT = VALUE).
#' Function will update the name with the statistic of your choice
AddNameStat <- function(df, category, count_col, stat = c("sd","mean","count"), dp= 0){
  # Resolve the default so that exactly one statistic is selected
  stat <- match.arg(stat)
  # Create temporary data frame for analysis
  temp <- data.frame(ref = df[[category]], comp = df[[count_col]])
  # Aggregate the variables and calculate statistics
  # (plyr functions are namespaced so that plyr does not need to be attached)
  agg_stats <- plyr::ddply(temp, plyr::.(ref), plyr::summarize,
                           sd = sd(comp),
                           mean = mean(comp),
                           count = length(comp))
  # Dictionary used to replace stat name with correct symbol for plot
  labelName <- plyr::mapvalues(stat, from=c("sd","mean","count"), to=c("\u03C3", "x", "n"),
                               warn_missing = FALSE)
  # Updates the name based on the selected variable
  agg_stats$join <- paste0(agg_stats$ref, " \n (", labelName," = ",
                           round(agg_stats[[stat]], dp), ")")
  # Map the names
  name_map <- setNames(agg_stats$join, as.factor(agg_stats$ref))
  return(name_map[as.character(df[[category]])])
}
Using this script with your original question:
library(ggplot2)
library(datasets)
data(mtcars)
# Update the variable name
mtcars$cyl <- AddNameStat(mtcars, "cyl", "hp", stat = "sd")
ggplot(data = mtcars, aes(x = hp)) +
geom_dotplot(binwidth = 1) +
geom_density() +
facet_grid(. ~ cyl) +
theme_bw()
The script should be easy to alter to include other summary statistics. I am also sure it could be rewritten in parts to make it a bit cleaner!

Adding a line to a ggplot2 plot and tweaking legend

I'm using ggplot2 to show points colored by value. In addition, I want to show a regression line on this data.
This is an example of the data that I am using:
structure(list(a = c(63.635707116462, 59.7200565823145, 56.0311239027684,
53.1573088984712, 51.0192317467653, 48.0727441921859, 47.1516684444444,
45.5081981068289, 43.5874967485549, 43.3163255512322), b = c(278.983796321269,
254.833332215134, 234.812503036992, 221.519477352253, 212.013474843663,
199.926648466351, 194.577007436116, 186.506133515809, 179.411968705754,
172.056487287103), col = c(18.36245, 22.03494, 25.70743, 29.37992,
33.05241, 36.7249, 40.39739, 44.06988, 47.74237, 51.41486), predict = c(275.438415187452,
256.049214397717, 237.782656695549, 223.552332598712, 212.965175538386,
198.374997400175, 193.814089203754, 185.676086057123, 176.165312823424,
174.82254927815)), .Names = c("a", "b", "col", "predict"), row.names = c(NA,
-10L), class = "data.frame")
And the code I am using so far is as follows:
p <- ggplot(data = df, aes(x = a, y = b, colour=col)) + geom_point()
p + stat_smooth(method = "lm", formula = y ~ x, se = FALSE)
However, this does not produce a straight line (as it is smoothed) so instead I tried to follow one of the examples on ggplot2 (which is using qplot) and did the following:
model <- lm(b ~ a, data = df)
df$predict <- stats::predict(model, newdata=df)
p <- ggplot(data = df, aes(x = a, y = b, colour=col) ) + geom_point()
p + geom_line(aes(x = a, y = predict))
In the example, a line is added using + geom_line(data=grid), which in my case would be + geom_line(data=df). This just joins the points together, instead of drawing a straight line on the plot. How can I plot a line on this plot that is perfectly straight?
The other problem I was having with the plot, is renaming the legend. I want to have a two word title for the data (e.g. 'Z Density'), but I don't know how to change it. I've tried using + scale_colour_discrete(name = "Fancy Title") and + scale_linetype_discrete(name = "Fancy Title") using advice from this question but they do not work as my data is colored by a value.
As @Andrie says, using method = "lm" gives a linear model. As for your second question, use scale_color_continuous():
p <- ggplot(data = df, aes(x = a, y = b, colour=col)) + geom_point()
p + stat_smooth(method = "lm", se = FALSE) +
scale_colour_continuous(name = "My Legend")
You also don't need to do all of the predicting. ggplot() will do this for you, which is one of the great benefits.
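If an explicit line from a fitted model is still wanted, one alternative (not what the answer above uses) is to pass the model coefficients to geom_abline(). The sketch below uses the first five rows of the question's data, rounded to one decimal, as a stand-in for df; the legend title "Z Density" is the one the question asks for.

```r
library(ggplot2)

# Stand-in for the question's data: first five rows, rounded to one decimal
df <- data.frame(
  a   = c(63.6, 59.7, 56.0, 53.2, 51.0),
  b   = c(279.0, 254.8, 234.8, 221.5, 212.0),
  col = c(18.4, 22.0, 25.7, 29.4, 33.1)
)

# Fit the model once and reuse its intercept and slope
model <- lm(b ~ a, data = df)

p <- ggplot(df, aes(x = a, y = b, colour = col)) +
  geom_point() +
  # geom_abline draws a straight line across the whole panel
  geom_abline(intercept = coef(model)[["(Intercept)"]],
              slope = coef(model)[["a"]]) +
  # col is numeric, so the continuous colour scale carries the legend title
  scale_colour_continuous(name = "Z Density")
```

Unlike stat_smooth(), geom_abline() extends to the panel edges rather than stopping at the range of the data, which may or may not be what is wanted.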
