Add 2 additional lines to a ggplot - r

I have successfully plotted my data (dat) below. However, I want to apply the following transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then add 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)

I'm not sure the second part of your question makes sense to me. But going by the description, one way is simply to add a layer in which you alter the data and aes arguments, as I do in the example below (using mtcars as example data):
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
  geom_point(aes(col = factor(cyl))) +
  stat_ellipse(aes(col = factor(cyl))) +
  # Add a line connecting the ellipse centers (the group means)
  geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
                                                          mean_y = mean(mpg),
                                                          .groups = 'drop'),
            mapping = aes(x = mean_x, y = mean_y)) +
  # Mark each ellipse center with a point
  geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
                                                           mean_y = mean(mpg),
                                                           .groups = 'drop'),
             mapping = aes(x = mean_x, y = mean_y)) +
  # Add smooth for.. what? I don't understand this part of the question.
  geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
              mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. What do you mean by the second path (geom_smooth), and why do you want it? Shifting the x-axis for the smoother makes no sense to me. I also took the liberty of changing the definition of the first part: instead, I add the group means (the centers of the ellipses) to the plot as single points and connect them using geom_line.
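For what it's worth, here is a minimal sketch of what the question seems to be asking for, under some loud assumptions: mtcars stands in for the asker's dat (hp plays the role of x, mpg of y, cyl of groups), and the names dat2, centers and p are made up for illustration. It adds one straight-line smooth on the group means (x.cm), one on the group-centered values (x.cwc), and marks the three group means with stars (shape = 8). Because x.cwc is centered at 0, its line lands far to the left of the raw data on the shared x-axis, which is probably why the request looks odd.

```r
library(dplyr)
library(ggplot2)

# Stand-in data (assumption): hp ~ x, mpg ~ y, cyl ~ groups
dat2 <- mtcars %>%
  mutate(groups = factor(cyl)) %>%
  group_by(groups) %>%
  mutate(x.cm = mean(hp),      # group mean of x
         x.cwc = hp - x.cm) %>% # x centered within its group
  ungroup()

# One row per group: the coordinates of the group means
centers <- dat2 %>%
  group_by(groups) %>%
  summarize(x.cm = mean(hp), y.m = mean(mpg), .groups = "drop")

p <- ggplot(dat2, aes(hp, mpg)) +
  geom_point(aes(color = groups, shape = groups), size = 2) +
  stat_ellipse(aes(color = groups)) +
  # Line 1: y against the group means (only 3 distinct x values, so a straight line)
  geom_smooth(aes(x = x.cm), method = "lm", formula = y ~ x,
              se = FALSE, color = "black") +
  # Line 2: y against the within-group centered x
  geom_smooth(aes(x = x.cwc), method = "lm", formula = y ~ x,
              se = FALSE, color = "grey40", linetype = "dashed") +
  # Stars at the three group means
  geom_point(data = centers, aes(x.cm, y.m), shape = 8, size = 4) +
  theme_classic()
```

Printing p draws the plot. Using method = "lm" is a choice made here, not something the question specifies; the default loess smoother would struggle with only three distinct x.cm values.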

Related

How to connect means per group in ggplot?

I can do a scatterplot of two continuous variables like this:
mtcars %>%
ggplot(aes(x=mpg, y = disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I use cut to create 5 groups of mpg intervals for cars (any better command would do as well). I like to see the intervals in the graph, thus they are easy to understand.
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point()
mpg_groups is a factor variable and can no longer be connected via geom_smooth().
# not working
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
What can I do with easy (tidyverse) code in order to create the mean values per group and connect them via line?
As a general rule, when drawing a line with ggplot2 you have to set the group aesthetic explicitly whenever the variable mapped to x isn't numeric. For example, use group = 1 to assign all observations to a single group (the value 1 is arbitrary):
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  mutate(mean_disp = mean(disp)) %>%
  # group = 1 puts all observations in one group so they can be connected
  ggplot(aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)
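If the goal is only to connect the group means, a possible variant is to collapse the data to one row per interval with summarise() and draw a plain geom_line(). This is a sketch rather than the answer's exact approach; the names group_means and p_means are made up here.

```r
library(dplyr)
library(ggplot2)

# One row per mpg interval, holding the mean of disp in that interval
group_means <- mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  summarise(mean_disp = mean(disp), .groups = "drop")

# group = 1 is still required because x is a factor
p_means <- ggplot(group_means, aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_line()
```

Compared with mutate(), summarise() avoids plotting the same mean once per car, and geom_line() joins the five points directly instead of fitting a smoother through them.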

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to pass position_dodge() to geom_text() so the labels match the position of the boxes. Also supply a data argument containing only the distinct values of N, so that each label is drawn once:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
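The answers above all come down to the same two steps: build a label table with one row per cyl/am/gear combination, then dodge the text the same way the boxes are dodged. A compact sketch of that idea follows, using dplyr::count() to build the table; the names cars, n_labels and p_n are made up here, and the dodge width of 0.75 is an assumption chosen to match the default boxplot width, so the labels should line up only approximately.

```r
library(dplyr)
library(ggplot2)

# Convert the grouping variables to factors so they dodge and fill as categories
cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per combination, with the number of observations in N
n_labels <- cars %>%
  count(cyl, am, gear, name = "N")

p_n <- ggplot(cars, aes(x = am, y = wt, fill = gear)) +
  geom_boxplot() +
  # group = gear makes the text dodge by the same variable as the boxes
  geom_text(data = n_labels,
            aes(y = 6, label = N, group = gear),
            position = position_dodge(width = 0.75)) +
  facet_grid(. ~ cyl)
```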

GGplot order legend using last values on x-axis

I have some time series data plotted using ggplot. I'd like the legend, which appears to the right of the plot, to be in the same order as the line on the most recent date/value on the plot's x-axis. I tried using the case_when function, but I'm obviously using it wrong. Here is an example.
df <- tibble(
x = runif(100),
y = runif(100),
z = runif(100),
year = sample(seq(1900, 2010, 10), 100, T)
) %>%
gather(variable, value,-year) %>%
group_by(year, variable) %>%
summarise(mean = mean(value))
df %>%
ggplot(aes(year, mean, color = variable)) +
geom_line()
## does not work
df %>%
mutate(variable = fct_reorder(variable, case_when(mean ~ year == 2010)))
ggplot(aes(year, mean, color = variable)) +
geom_line()
We may add one extra line
ungroup() %>% mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE))
before plotting, which gives
df %>%
  # summarise() leaves df grouped by year; ungroup so the levels are ordered over the whole data
  ungroup() %>%
  mutate(variable = fct_reorder(variable, mean, tail, n = 1, .desc = TRUE)) %>%
  ggplot(aes(year, mean, color = variable)) +
  geom_line()
In this way we look at the last values of mean and reorder variable accordingly.
There's another way without adding a new column using fct_reorder2():
library(tidyverse)
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean))) +
geom_line() +
labs(color = "variable")
Although it's not advisable in your case, you can order the legend by the first (earliest) values in your plot by setting
df %>%
ggplot(aes(year, mean, color = fct_reorder2(variable, year, mean, .fun = first2))) +
geom_line() +
labs(color = "variable")
The default is .fun = last2 (see also https://forcats.tidyverse.org/reference/fct_reorder.html)
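To see what these reorderings do, here is a small sketch on a deterministic toy table (the data and the names toy, by_tail, by_last and by_first are made up for illustration; the random df from the question would give different levels on every run).

```r
library(dplyr)
library(forcats)

toy <- tibble(
  year     = rep(c(2000, 2010), each = 3),
  variable = rep(c("x", "y", "z"), times = 2),
  mean     = c(0.9, 0.5, 0.1,  # values in 2000
               0.2, 0.8, 0.4)  # values in 2010
)

# Last value of mean per variable (relies on rows being sorted by year)
by_tail  <- fct_reorder(toy$variable, toy$mean, tail, n = 1, .desc = TRUE)
# Value of mean at the largest year (does not rely on row order)
by_last  <- fct_reorder2(toy$variable, toy$year, toy$mean)
# Value of mean at the smallest year
by_first <- fct_reorder2(toy$variable, toy$year, toy$mean, .fun = first2)

levels(by_tail)   # "y" "z" "x"
levels(by_last)   # "y" "z" "x"
levels(by_first)  # "x" "y" "z"
```

The first two agree here because the rows are already sorted by year; fct_reorder2() looks up the value at the largest year explicitly, so it is the safer choice when the row order is not guaranteed.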

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in :
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and we'll map it to color. We also add rows (using complete) for missing combinations of am,vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarize(lo = mpg %>% min,
            hi = mpg %>% max) %>%
  ungroup() %>%
  complete(am, cyl, nesting(vs)) %>%
  ggplot() +
  geom_linerangeh(aes(y = am, colour = vs, xmin = lo, xmax = hi),
                  position = position_dodgev(height = 0.5)) +
  facet_wrap(~cyl, ncol = 1) +
  theme_bw()
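As a possible alternative that does not need ggstance, the same plot can be sketched with ggplot2's own geom_linerange() and position_dodge(). This assumes ggplot2 3.3.0 or later, where geoms and positions can run in either direction; the names ranges and p_dodge are made up here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Range of mpg per vs/cyl/am combination; complete() adds NA rows for
# missing combinations so that every am level reserves space for both vs levels
ranges <- mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarize(lo = min(mpg), hi = max(mpg), .groups = "drop") %>%
  complete(am, cyl, vs)

# Mapping y with xmin/xmax makes geom_linerange() horizontal,
# and position_dodge() then dodges along the y-axis
p_dodge <- ggplot(ranges) +
  geom_linerange(aes(y = am, xmin = lo, xmax = hi, colour = vs),
                 position = position_dodge(width = 0.5)) +
  facet_wrap(~cyl, ncol = 1) +
  theme_bw()
```

Printing p_dodge warns about removed rows with missing values; those are the placeholder rows added by complete().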

Using purrr to plot a mean line on a list of ggplot2 objects

I am playing around with purrr and the methods outlined in this post.
My goal here is to use purrr::map2 (or some variant) to apply a function (mean in this case) by group (cyl), then create plots that use the result of that function. Or, put another way, I want to add a vertical line for the mean on each of these plots using the mean_list list column, all within a dplyr chain. Is this possible?
library(tidyverse)
mt_list <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(mean_list = map2(data, cyl, ~mean(.$disp))) %>%
mutate(plot = map2(data, cyl, ~ggplot(data = .x) +
geom_point(aes(y = drat, x = disp)) #+
#geom_vline(data = mean_list ,aes(xintercept) Unsure about this step
))
This is an example of one type of plot I'm after but this seems like a silly way to do this when the whole point is to have everything contained within a nice tibble like mt_list
mt_list$plot[[1]] +
geom_vline(aes(xintercept = mt_list$mean_list[[1]]))
This can be done by passing mean_list as the second argument to map2 rather than cyl, then using xintercept = .y in your geom_vline.
mt_list <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(mean_list = map(data, ~mean(.$disp))) %>%
  mutate(plot = map2(data, mean_list, ~ ggplot(data = .x) +
                       geom_point(aes(y = drat, x = disp)) +
                       geom_vline(xintercept = .y)
  ))
Note that for this particular use case, you can also avoid having to create mean_list at all by using aes(xintercept = mean(disp)):
mt_list <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(plot = map(data, ~ ggplot(data = .) +
                      geom_point(aes(y = drat, x = disp)) +
                      geom_vline(aes(xintercept = mean(disp)))))
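A slightly different sketch of the same pattern stores the mean as a plain numeric column with map_dbl() instead of a list column, which makes it easier to inspect alongside the plots. The column name mean_disp and the dashed line style are choices made here, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

mt_list <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    # One number per group rather than a list of length-1 vectors
    mean_disp = map_dbl(data, ~ mean(.x$disp)),
    # .x is the nested data frame, .y is that group's mean
    plot = map2(data, mean_disp, ~ ggplot(.x, aes(disp, drat)) +
                  geom_point() +
                  geom_vline(xintercept = .y, linetype = "dashed"))
  )

# Draw every plot in turn:
# walk(mt_list$plot, print)
```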
