How to connect means per group in ggplot? - r

I can do a scatterplot of two continuous variables like this:
mtcars %>%
ggplot(aes(x=mpg, y = disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
I use cut() to create 5 groups of mpg intervals for the cars (any better command would do as well). I like to see the intervals in the graph because they are easy to understand.
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point()
mpg_groups is a factor variable and can no longer be connected via geom_smooth().
# not working
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x=mpg_groups, y = mean_disp)) + geom_point() +
geom_smooth(method="auto", se=TRUE, fullrange=FALSE, level=0.95)
What simple (tidyverse) code can I use to compute the mean values per group and connect them with a line?

As a general rule, when drawing a line with ggplot2 you have to set the group aesthetic explicitly whenever the variable mapped to x is not numeric. For example, use group = 1 to assign all observations to a single group (which I call 1 for simplicity):
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
mtcars %>%
mutate(mpg_groups = cut(mpg, 5)) %>%
group_by(mpg_groups) %>%
mutate(mean_disp = mean(disp)) %>%
ggplot(aes(x = mpg_groups, y = mean_disp, group = 1)) +
geom_point() +
geom_smooth(method = "auto", se = TRUE, fullrange = FALSE, level = 0.95)
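If the goal is simply one point per interval joined by a straight line, a minimal sketch could look like the following. It assumes summarise() in place of mutate(), so that each group contributes a single row, and geom_line() in place of geom_smooth(). Both are substitutions for illustration, not part of the answer above, and group_means is a made-up name.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# One row per mpg interval: summarise() collapses each group to its mean
group_means <- mtcars %>%
  mutate(mpg_groups = cut(mpg, 5)) %>%
  group_by(mpg_groups) %>%
  summarise(mean_disp = mean(disp), .groups = "drop")

# group = 1 puts all rows in one group so geom_line() can join the factor levels
p <- ggplot(group_means, aes(x = mpg_groups, y = mean_disp, group = 1)) +
  geom_point() +
  geom_line()
p
```

Without group = 1, geom_line() would treat each factor level as its own group and would have nothing to connect.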

Related

Multi-row labels in ggplot2

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45=7290 points along the x-axis, and each set of 162 corresponds to one of the variables of interest. How do I get geom_point to plot only one label for each of these sets of 162, given a list of the variable names c("var1","var2",....,"var45")?
A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
group_by(Group) %>%
summarize(Domains = mean(Domains),
`-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
geom_text(data = label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(mpg = mean(mpg),
disp = mean(disp)) %>%
mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
ggplot(aes(x = mpg, y = disp)) +
geom_point(aes(col = factor(cyl)), show.legend = FALSE) +
geom_text(data = label_df, aes(label = cyl_label))
produces

Add 2 additional lines to a ggplot

I have successfully plotted my dat below. However, I want to do the following data transformation: dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm) and then ADD 2 lines to my current plot:
geom_smooth() using x.cm as x
geom_smooth() using x.cwc as x
Is there a way to do this?
p.s: Is it also possible to display the 3 unique(x.cm) values as 3 stars on the plot? (see pic below)
library(tidyverse)
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/cw.csv')
dat %>% group_by(groups) %>% ggplot() +
aes(x, y, color = groups, shape = groups)+
geom_point(size = 2) + theme_classic()+
stat_ellipse()
# Now do the transformation:
dat %>% group_by(groups) %>% mutate(x.cm = mean(x), x.cwc = x-x.cm)
I'm not sure the second part of your question makes sense to me. But going by the description, one way is simply to add a layer in which you alter the data and aes arguments, as I do in the example below (using mtcars as example data):
# Load libraries and data
library(tidyverse)
library(ggplot2)
data(mtcars)
mtcars %>% ggplot(aes(x = hp, y = mpg)) +
geom_point(aes(col = factor(cyl))) +
stat_ellipse(aes(col = factor(cyl))) +
# Add a line connecting the ellipse centers (group means)
geom_line(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
geom_point(data = mtcars %>% group_by(cyl) %>% summarize(mean_x = mean(hp),
mean_y = mean(mpg),
.groups = 'drop'),
mapping = aes(x = mean_x, y = mean_y)) +
# Add smooth for.. what? I don't understand this part of the question.
geom_smooth(data = mtcars %>% group_by(cyl) %>% mutate(x_val = hp - mean(hp)) %>% ungroup(),
mapping = aes(x = x_val, y = mpg))
Now it should be quite clear which part does not make sense to me. What do you mean by the second part (geom_smooth), and why do you want it? Shifting the x values for the smoother makes no sense to me. Also, I took the liberty of changing the first part: instead, I add the group means (the centers of the ellipses) to the plot as single points and connect them using geom_line.
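For reference, the transformation from the question and the requested star markers can be sketched with mtcars as follows. The column names x_cm and x_cwc mirror the question's x.cm and x.cwc, the data frames centred and stars are hypothetical names, and placing each star at the group's mean mpg is an assumption, since the question does not say where on the y-axis the stars should go.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

# Group mean (x_cm) and group-mean-centred value (x_cwc) of hp within each cyl
centred <- mtcars %>%
  group_by(cyl) %>%
  mutate(x_cm = mean(hp), x_cwc = hp - x_cm) %>%
  ungroup()

# One row per cyl for the star markers; the y position (mean mpg) is an assumption
stars <- centred %>%
  group_by(cyl) %>%
  summarise(x_cm = first(x_cm), y_cm = mean(mpg), .groups = "drop")

# shape = 8 draws an asterisk-like star at each group mean
p <- ggplot(centred, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_point(data = stars, aes(x = x_cm, y = y_cm), shape = 8, size = 4)
p
```

By construction x_cwc averages to zero within each group, which is why a smoother fitted on it sits on a different x scale than the raw points.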

Display `dplyr` groups ordered by their corresponding value (not name) in `geom_point`

Using the mpg dataset I want to produce a scatterplot that shows for every manufacturer one point with the grouped (by manufacturer) mean of displ.
The following works so far:
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = manufacturer, y = mean_displ)) +
guides(x = guide_axis(angle = 90))
Now I want to show the points in ascending order according to their displ value. Or: I want to sort the manufacturer variable on the x-axis according to the corresponding mean_displ value.
I tried inserting an arrange(mean_displ) statement into my dplyr chain, without success.
So I introduced a dummy variable x that produces the plot I want, but now the labeling is gone.
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ)) %>%
arrange(mean_displ) %>%
mutate(x = 1:15)) +
geom_point(aes(x = x, y = mean_displ))
How can I get the latter plot but with the labeling from above?
fct_reorder from the forcats package can order the levels of a factor.
library(tidyverse)
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = fct_reorder(manufacturer, mean_displ), y = mean_displ)) +
guides(x = guide_axis(angle = 90))
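To see why this works while arrange() does not, it may help to inspect the factor levels directly: ggplot2 orders a discrete axis by factor levels, not by row order. A small sketch, where means is a made-up name and the reordering is done in the data rather than inside aes():

```r
library(dplyr, warn.conflicts = FALSE)
library(forcats)
library(ggplot2)  # provides the mpg dataset

means <- mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_displ = mean(displ), .groups = "drop") %>%
  # Reorder the levels of manufacturer by ascending mean_displ
  mutate(manufacturer = fct_reorder(manufacturer, mean_displ))

# Levels now follow mean_displ rather than alphabetical order
levels(means$manufacturer)

p <- ggplot(means, aes(x = manufacturer, y = mean_displ)) +
  geom_point() +
  guides(x = guide_axis(angle = 90))
p
```

Base R's reorder(manufacturer, mean_displ) should give a similar result if you prefer not to load forcats.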

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to pass position_dodge() to geom_text() to match the positions of the boxes, and also set the data argument so that each distinct count is drawn only once:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
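The one-label-per-box idea from the answers above can also be sketched with dplyr::count(), which builds the label data frame directly instead of going through mutate(N = n()) and distinct(). The names cars and n_labels are made up for this example, and the dodge width of 0.75 is an assumption chosen to roughly match the default box width, so it may need tuning.

```r
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per cyl/am/gear combination, with the number of cars in column N
n_labels <- cars %>% count(cyl, am, gear, name = "N")

p <- ggplot(cars, aes(x = am, y = wt, fill = gear)) +
  geom_boxplot() +
  # The inherited fill mapping gives geom_text() the groups it needs to dodge
  geom_text(data = n_labels, aes(y = 6, label = N),
            position = position_dodge(width = 0.75)) +
  facet_grid(. ~ cyl)
p
```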

How do you dodge a horizontal linerange in ggplot?

How do you dodge a ggstance::geom_linerangeh in ggplot2?
library(tidyverse)
library(ggstance)
mtcars %>%
group_by(cyl, am) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ggplot() +
geom_linerangeh(aes(y = am %>%
factor,
xmin = lo,
xmax = hi,
group = am %>%
factor),
position = position_dodgev(height = .25)) +
facet_wrap(~cyl, ncol = 1)
results in:
whereas I would like to see the lines sitting slightly below the horizontals, consistent with the standard behaviour of position_dodge elsewhere.
To get dodging, you need to map colour or linetype to another variable that splits am into sub-categories based on that third variable; otherwise there's only one category for each level of am and therefore nothing to dodge.
For example, let's use vs as that other variable and map it to colour. We also add rows (using complete) for missing combinations of am, vs, and cyl to ensure that dodging occurs even for combinations of cyl and am where only one level of vs is present in the data.
library(tidyr)
mtcars %>%
group_by(vs=factor(vs), cyl=factor(cyl), am=factor(am)) %>%
summarize(lo = mpg %>% min,
hi = mpg %>% max) %>%
ungroup() %>%
complete(am, cyl, nesting(vs)) %>%
ggplot() +
geom_linerangeh(aes(y = am, colour=vs, xmin = lo, xmax = hi),
position = position_dodgev(height = 0.5)) +
facet_wrap(~cyl, ncol = 1) +
theme_bw()
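The effect of the complete() step is easiest to see without the plot. In the sketch below (ranges and filled are made-up names, and complete(am, cyl, vs) is used in place of the nesting(vs) form above), every missing combination becomes a row with NA for lo and hi, which is what reserves a dodge slot for it:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

ranges <- mtcars %>%
  group_by(vs = factor(vs), cyl = factor(cyl), am = factor(am)) %>%
  summarise(lo = min(mpg), hi = max(mpg), .groups = "drop")

# complete() adds one row (with NA for lo and hi) per missing combination
filled <- ranges %>% complete(am, cyl, vs)

nrow(ranges)  # combinations actually present in mtcars
nrow(filled)  # all am x cyl x vs combinations
```

ggplot2 drops the NA rows when drawing (with a warning), but they still count as groups when the dodge positions are computed.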
