unable to set xlim and ylim using min() and max() in ggplot - r

I am missing something crucial here and can't see it.
Why do min() and max() not work to set the axis limits?
mtcars %>%
select(mpg, cyl, disp, wt) %>%
filter(complete.cases(disp)) %>%
ggplot() +
geom_point(aes(x=mpg, y=disp, colour=cyl), size=3) +
xlim(min(mpg, na.rm=TRUE),max(mpg, na.rm=TRUE)) +
ylim(min(disp, na.rm=TRUE),max(disp, na.rm=TRUE)) +
scale_colour_gradient(low="red",high="green", name = "cyl")
This works:
mtcars %>%
select(mpg, cyl, disp, wt) %>%
filter(complete.cases(disp)) %>%
ggplot() +
geom_point(aes(x=mpg, y=disp, colour=cyl), size=3) +
# xlim(min(mpg, na.rm=TRUE),max(mpg, na.rm=TRUE)) +
# ylim(min(disp, na.rm=TRUE),max(disp, na.rm=TRUE)) +
scale_colour_gradient(low="red",high="green", name = "cyl")

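ggplot cannot access the column values in the way that dplyr can. xlim() and ylim() are ordinary function calls evaluated outside aes(), so mpg and disp are looked up in the calling environment rather than in the piped data.
You need to reference the data explicitly: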
mtcars %>%
select(mpg, cyl, disp, wt) %>%
filter(complete.cases(disp)) %>%
ggplot() +
geom_point(aes(x=mpg, y=disp, colour=cyl), size=3) +
xlim(min(mtcars$mpg, na.rm=TRUE),max(mtcars$mpg, na.rm=TRUE)) +
ylim(min(mtcars$disp, na.rm=TRUE),max(mtcars$disp, na.rm=TRUE)) +
scale_colour_gradient(low="red",high="green", name = "cyl")
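Note that mtcars$mpg above refers to the original, unfiltered data. If the limits should instead come from the filtered rows, one possible sketch is to store the filtered data frame first and pass its range to the scales. The name plot_data is made up here for illustration and is not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# plot_data is a hypothetical name for the filtered data frame
plot_data <- mtcars %>%
  select(mpg, cyl, disp, wt) %>%
  filter(complete.cases(disp))

# range() returns c(min, max), which is the form `limits` expects
p <- ggplot(plot_data) +
  geom_point(aes(x = mpg, y = disp, colour = cyl), size = 3) +
  scale_x_continuous(limits = range(plot_data$mpg, na.rm = TRUE)) +
  scale_y_continuous(limits = range(plot_data$disp, na.rm = TRUE)) +
  scale_colour_gradient(low = "red", high = "green", name = "cyl")
p
```

With mtcars the result is the same either way, because disp has no missing values; the difference only matters when the filter actually drops rows.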

You can't reference column names in ggplot objects except inside aes() and in a formula or vars() in a facet_* function. If the goal is only to control the padding around the data, the helper function expansion() (named expand_scale() before ggplot2 3.3.0, where the old name was deprecated) lets you expand the scales in a more controlled way.
For example:
# add 1 unit to the x-scale in each direction
scale_x_continuous(expand = expansion(add = 1))
# have the scale exactly fit the data, no padding
scale_x_continuous(expand = expansion(0, 0))
# extend the scale by 10% in each direction
scale_x_continuous(expand = expansion(mult = .1))
See ?scale_x_continuous and especially ?expansion for details. It's also possible to selectively pad just the top or just the bottom of each scale; there are examples in ?expansion.
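As a rough illustration of one-sided padding, the sketch below pads only the upper end of the y-axis. It assumes ggplot2 3.3.0 or later, where the helper is called expansion(); in older versions expand_scale(mult = c(0, 0.1)) should behave the same way.

```r
library(ggplot2)

# a length-2 `mult` is read as c(lower, upper):
# no padding at the bottom of the y-axis, 10% padding at the top
p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
p
```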

Related

How to connect means per group in ggplot?

I can do a scatterplot of two continuous variables like this:
mtcars %>%
ggplot(aes(x=mpg, y = disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I use cut to create 5 groups of mpg intervals for cars (any better command would do as well). I would like to see the intervals in the graph so that they are easy to understand.
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point()
mpg_groups is a factor variable and can no longer be connected via geom_smooth().
# not working
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
What can I do with easy (tidyverse) code in order to create the mean values per group and connect them via line?
As a more or less general rule, when drawing a line with ggplot2 you have to set the group aesthetic explicitly whenever the variable mapped to x isn't numeric. For example, group = 1 assigns all observations to a single group (called 1 here for simplicity):
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x = mpg_groups, y = mean_disp, group = 1)) +
geom_point() +
geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)
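If a plain connecting line is enough and no smoother is needed, a possible variant is to collapse the data to one row per interval with summarise() and join the points with geom_line(). This is only a sketch; the name mean_by_group is made up for illustration.

```r
library(dplyr)
library(ggplot2)

# one row per mpg interval, holding the mean displacement of that interval
mean_by_group <- mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  summarise(mean_disp = mean(disp), .groups = "drop")

# group = 1 puts all rows into a single group so geom_line() can join them
p <- ggplot(mean_by_group, aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_line()
p
```

Using summarise() instead of mutate() avoids drawing the same mean several times on top of itself, once for every car in the interval.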

Same y-axis range with ggarrange if I am already using facets and cannot use them again

My question is basically a follow-up to this question. However, the answer to that question completely bypasses the fact that ggarrange is used and instead hands the whole issue over to the facets functionality of ggplot.
This doesn't work for me since I already am using facets in the sub-plots and I cannot use them again.
Here is some example code. I am wondering how to achieve that the two plots which are joined with ggarrange have the same range of y-axis (of course, not setting the limits manually).
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)
As you can see, the left image's y-axis ranges from 2 to 5, while the right plot's y-axis ranges from 1.5 to 3.5. How can I make them be the same?
I'm once again arguing for abandoning the 'ggarrange' approach, this time in favour of the {patchwork} package, which allows you to apply an operation to all previous plots. In this case, we can use & scale_y_continuous(limits = ...) to set the limits for all plots.
library(ggplot2)
library(dplyr)
library(purrr)
library(patchwork)
mtcars %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
wrap_plots() &
scale_y_continuous(limits = range(mtcars$wt))
Created on 2022-12-08 by the reprex package (v2.0.0)
One option would be to compute and add the range of your x and y variables to your dataset before splitting, which could then be used to set the limits.
library(dplyr)
library(ggplot2)
library(ggpubr)
library(purrr)
mtcars %>%
mutate(across(c(mpg, wt), list(range = ~list(range(.x))))) %>%
group_split(vs) %>%
map(~ggplot(., aes(x = mpg, y = wt)) +
geom_point() +
scale_x_continuous(limits = .$mpg_range[[1]]) +
scale_y_continuous(limits = .$wt_range[[1]]) +
facet_grid(rows = vars(am), cols = vars(gear))) %>%
ggarrange(plotlist = .)

Annotate several regression lines produced with geom_smooth

I have a figure with 16 regression lines and I need to be able to identify them. Using a color gradient or symbols or different line types do not really help.
My idea therefore is, to just (haha) annotate every line.
Therefore, I build a dataset (hpAnnotatedLines) with the different maximum x values. This is the position the text should start. However, I have no idea how to automatically extract the respective y values of the predicted regression lines at the maximum x-axis values, which is different for each line.
Please find a smaller data set using mtcars as an example
library(ggplot2)
library(dplyr)
library(ggrepel)
#just select the data I need
mtcars1 <- select(mtcars, disp,cyl,hp)
mtcars1$cyl <- as.factor(mtcars1$cyl)
#extract max values
mtcars2 <- mtcars1 %>%
group_by(cyl) %>%
summarise(Max.disp= max(disp))
#build dataset for the annotation layer
#note that hp was done by hand. Here I need help
hpAnnotatedLines <- data.frame(cyl=levels(mtcars2$cyl),
disp=mtcars2$Max.disp,
hp=c(90,100,210))
#example plot
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50)) +
geom_text_repel(
data = hpAnnotatedLines,
aes(label = cyl),
size = 3,
nudge_x = 1)
Instead of extracting the fitted values you could add the labels via geom_text by switching the stat to smooth and setting the label aesthetic via after_stat such that only the last point of each regression line gets labelled:
library(ggplot2)
library(dplyr)
myfun <- function(x, color) {
data.frame(x = x, color = color) %>%
group_by(color) %>%
mutate(label = ifelse(x %in% max(x), as.character(color), "")) %>%
pull(label)
}
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm) +
geom_text(aes(label = after_stat(myfun(x, color))),
stat = "smooth", method = "lm", hjust = 0, size = 3, nudge_x = 1, show.legend = FALSE) +
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))
It's a bit of a hack, but you can extract the data from the compiled plot object. For example first make the plot without the labels,
myplot <- ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))
Then use ggplot_build to get the data from the second layer (the geom_smooth layer) and transform it back into the names used by your data. Here we find the largest x value per group, and then take that y value.
pobj <- ggplot_build(myplot)
hpAnnotatedLines <- pobj$data[[2]] %>% group_by(group) %>%
top_n(1, x) %>%
# mtcars$cyl is numeric, so wrap it in factor() to recover the level labels
transmute(disp=x, hp=y, cyl=levels(factor(mtcars$cyl))[group])
Then add an additional layer to your plot
myplot +
geom_text_repel(
data = hpAnnotatedLines,
aes(label = cyl),
size = 3,
nudge_x = 1)
If your data is not that huge, you can extract the predictions out using augment() from broom and take that with the largest value:
library(broom)
library(dplyr)
library(ggplot2)
library(ggrepel) # needed for geom_text_repel() in the plot below
hpAnn = mtcars %>% group_by(cyl) %>%
do(augment(lm(hp ~ disp,data=.))) %>%
top_n(1,disp) %>%
select(cyl,disp,.fitted) %>%
rename(hp = .fitted)
# A tibble: 3 x 3
# Groups:   cyl [3]
    cyl  disp    hp
  <dbl> <dbl> <dbl>
1     4  147.  96.7
2     6  258   99.9
3     8  472  220.
Then plot:
ggplot(mtcars, aes(x=disp, y=hp, color = factor(cyl))) +
geom_point() +
geom_smooth(method=lm)+
coord_cartesian(xlim = c(min(mtcars$disp), max(mtcars$disp) + 50))+
geom_text_repel(
data = hpAnn,
aes(label = cyl),
size = 3,
nudge_x = 1)

ggplot and dplyr filter reference

Here is my code:
mtcars %>% filter(cyl == 4) %>%
ggplot(., aes(mpg, hp, color=hp)) +
geom_point() +
scale_color_gradient(low = "darkorange2", high = "darkred",
breaks=c(min(mtcars$hp), max(mtcars$hp)),
labels=c("Min","Max"))
What I would like to do is base the breaks in the scale_color_gradient function on the filtered data from the preceding filter() call. I know that .$hp works in base R and that dplyr accepts the bare variable name, but how do I use either in this case?
You can put all the plotting code in braces so that the dot (.) keeps referring to the "right" object, i.e. the filtered data. Also, if you want to go from min to max, you can use range(). For example:
mtcars %>% filter(cyl == 4) %>%
{ggplot(., aes(mpg, hp, color=hp)) +
geom_point() +
scale_color_gradient(low = "darkorange2", high = "darkred",
breaks=range(.$hp),
labels=c("Min","Max"))}
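If the braces feel awkward, another possible approach is to skip the dot entirely and store the filtered data in a variable first. This is a sketch only; four_cyl is a made-up name.

```r
library(dplyr)
library(ggplot2)

# four_cyl is a hypothetical name for the filtered data
four_cyl <- mtcars %>% filter(cyl == 4)

# the breaks are now computed from the filtered rows, not from all of mtcars
p <- ggplot(four_cyl, aes(mpg, hp, color = hp)) +
  geom_point() +
  scale_color_gradient(low = "darkorange2", high = "darkred",
                       breaks = range(four_cyl$hp),
                       labels = c("Min", "Max"))
p
```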

Add titles to ggplots created with map()

What's the easiest way to add titles to each ggplot that I've created below using the map function? I want the titles to reflect the name of each data frame - i.e. 4, 6, 8 (cylinders).
Thanks :)
mtcars_split <-
mtcars %>%
split(mtcars$cyl)
plots <-
mtcars_split %>%
map(~ ggplot(data=.,mapping = aes(y=mpg,x=wt)) +
geom_jitter()
# + ggtitle(....)
)
plots
Use map2 with names.
plots <- map2(
mtcars_split,
names(mtcars_split),
~ggplot(data = .x, mapping = aes(y = mpg, x = wt)) +
geom_jitter() +
ggtitle(.y)
)
Edit: alistaire pointed out this is the same as imap
plots <- imap(
mtcars_split,
~ggplot(data = .x, mapping = aes(y = mpg, x = wt)) +
geom_jitter() +
ggtitle(.y)
)
Perhaps you'd be interested in using facet_wrap instead
ggplot(mtcars, aes(y=mpg, x=wt)) + geom_jitter() + facet_wrap(~cyl)
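You can use purrr::map2():
mtcars_split <- mtcars %>% split(mtcars$cyl)
# assumption: titles was not defined in the original snippet; here it is taken from the names of the split list ("4", "6", "8")
titles <- names(mtcars_split)
plots <- map2(mtcars_split, titles,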
~ ggplot(data=.x, aes(mpg,wt)) + geom_jitter() + ggtitle(.y)
)
EDIT
Sorry duplicated with Paul's answer.
