Multi-row labels in ggplot2 - r

I have a plot which contains multiple entries of the same items along the x-axis. I have a total of 45 items grouped according to the groups below.
pvalall$Group<-c(rep("Physical",5*162),rep("Perinatal",11*162),rep("Developmental",3*162),
rep("Lifestyle-Life Events",5*162),rep("Parental-Family",13*162),rep("School",3*162),
rep("Neighborhood",5*162))
pvalall$Group <- factor(pvalall$Group,
levels = c("Physical", "Perinatal", "Developmental",
"Lifestyle-Life Events", "Parental-Family",
"School","Neighborhood"))
So essentially there are 162*45=7290 points along the x-axis, and each set of 162 corresponds to one of the variables of interest. How do I get geom_point to plot only one label for each of these sets of 162 points, given a list of the variable names c("var1","var2",....,"var45")?

A reprex would be nice, but generally the solution is to create a separate dataframe with one row per group indicating where the labels should go, and to add a geom_text() layer to your plot that uses this dataframe.
My guess is that the code should look like this:
# create a dataframe for the labels
pvalall %>%
  group_by(Group) %>%
  summarize(Domains = mean(Domains),
            `-log10(P-Values)` = mean(`-log10(P-Values)`)) -> label_df
# now make the plot
pvalall %>%
  ggplot(aes(x = Domains, y = `-log10(P-Values)`)) +
  geom_point(aes(col = Group)) + # putting col aesthetic in here so that the labels are not colored
  geom_text(data = label_df, aes(label = Group))
Here is an example with mtcars:
library(tidyverse)
mtcars %>%
  group_by(cyl) %>%
  summarize(mpg = mean(mpg),
            disp = mean(disp)) %>%
  mutate(cyl_label = str_c(cyl, "\ncylinders")) -> label_df
mtcars %>%
  ggplot(aes(x = mpg, y = disp)) +
  geom_point(aes(col = factor(cyl)), show.legend = FALSE) +
  geom_text(data = label_df, aes(label = cyl_label))
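This produces a scatterplot with one text label per cylinder group, placed at that group's mean mpg and disp.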
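If the labels should sit on the x-axis itself rather than inside the panel, another option is to set the axis breaks by hand. The following is only a sketch under assumptions: it supposes that Domains is a numeric index running from 1 to 7290 with the 45 variables laid out in consecutive blocks of 162, and `toy`, `neg_log10_p` and `var_names` are made-up stand-ins for the real data and names.

```r
library(ggplot2)

# Hypothetical stand-in for pvalall: 45 variables, 162 points each
n_vars <- 45
n_per_var <- 162
var_names <- paste0("var", seq_len(n_vars))  # placeholder names

set.seed(1)
toy <- data.frame(
  Domains = seq_len(n_vars * n_per_var),
  neg_log10_p = rexp(n_vars * n_per_var)
)

# One break per variable, at the midpoint of its block of 162 points
label_breaks <- (seq_len(n_vars) - 1) * n_per_var + (n_per_var + 1) / 2

p <- ggplot(toy, aes(x = Domains, y = neg_log10_p)) +
  geom_point(size = 0.3) +
  scale_x_continuous(breaks = label_breaks, labels = var_names) +
  guides(x = guide_axis(angle = 90))  # rotate so 45 labels stay readable
```

Printing `p` draws the plot. If the x positions are not a simple running index, the breaks would have to be computed from the data instead, for example as the mean x per variable.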

Related

Creating a Stacked Percentage Bar Chart in R with ggplot with labels

I have a dataset that has the variables "SEXO" (M or F) and "Class" (0 or 1). I want to create a bar plot using ggplot2 that shows, for each sex, the distribution of Class as a percentage. I was able to get the plot, but I can't seem to get the labels working on the bars itself. I don't want to change the labels on the axis, I just want to get the % shown on the plot for each SEXO.
This is the code I have been using:
ggplot(data = df, aes(x = SEXO, fill = Class)) + geom_bar(position = 'fill')
I also attach an image of the plot produced by the code:
This would be the ideal outcome:
Here is an example using the mtcars dataset: calculate the percentage per group, then place these values in the bars via the label aesthetic of geom_text, like this:
library(ggplot2)
library(dplyr)
mtcars %>%
  group_by(vs, am) %>% # vs first, so percentages are computed within each bar
  summarise(cnt = n()) %>%
  mutate(perc = round(cnt/sum(cnt), 2)) %>%
  ggplot(aes(x = factor(vs), fill = factor(am), y = perc)) +
  geom_col(position = 'fill') +
  geom_text(aes(label = paste0(perc*100,"%")), position = position_fill(vjust = 0.5), size = 3) +
  labs(fill = 'Class', x = 'vs') +
  scale_y_continuous(limits = c(0,1))
#> `summarise()` has grouped output by 'vs'. You can override using the `.groups`
#> argument.
Created on 2022-11-02 with reprex v2.0.2
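Translated to the variable names in the question, the same idea might look like the sketch below. The data frame here is invented purely for illustration (the real df was not posted), and `pct` and `perc` are hypothetical names.

```r
library(ggplot2)
library(dplyr)

# Made-up data standing in for the asker's df
df <- data.frame(
  SEXO = c("M", "M", "M", "F", "F", "F", "F", "M"),
  Class = factor(c(0, 1, 1, 0, 0, 1, 0, 0))
)

# Share of each Class within each SEXO
pct <- df %>%
  count(SEXO, Class) %>%
  group_by(SEXO) %>%
  mutate(perc = n / sum(n)) %>%
  ungroup()

p <- ggplot(pct, aes(x = SEXO, y = perc, fill = Class)) +
  geom_col(position = "fill") +
  geom_text(aes(label = scales::percent(perc, accuracy = 1)),
            position = position_fill(vjust = 0.5), size = 3)
```

Using position_fill() for the text as well keeps each label centred in its own bar segment, since both layers are then rescaled the same way.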

Method of ordering groups in ggplot line plot

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.
The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order by which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and making use of coord_flip gives a nice ordered line plot as the order by which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
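As a possible alternative to coord_flip, geom_line has an orientation argument in ggplot2 3.3.0 and later, which lets the points be connected in order of y while the original x/y mapping is kept. A sketch, rebuilding the same example data so that it runs on its own:

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)

example_data <- mtcars %>%
  mutate(make = row.names(mtcars)) %>%
  select(make, hp, mpg) %>%
  mutate(make = fct_reorder(make, hp)) %>%
  pivot_longer(-make)

# orientation = "y" asks geom_line to connect points in order of y (make)
p <- ggplot(example_data, aes(x = value, y = make, color = name, group = name)) +
  geom_line(orientation = "y") +
  geom_point() +
  xlab("EI1 (Expected Influence with Neighbor)") +
  ylab("Variables")
```

With this variant the axis labels do not need to be swapped. On older ggplot2 versions the coord_flip approach is the one to use.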

Display `dplyr` groups ordered by their corresponding value (not name) in `geom_point`

Using the mpg dataset I want to produce a scatterplot that shows for every manufacturer one point with the grouped (by manufacturer) mean of displ.
The following works so far:
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = manufacturer, y = mean_displ)) +
guides(x = guide_axis(angle = 90))
Now I want to show the points in ascending order according to their displ value. Or: I want to sort the manufacturer variable on the x-axis according to the corresponding mean_displ value.
I tried to insert an arrange(mean_displ) statement in my dplyr chain, without success.
So I introduced a dummy variable x that produces the plot I want, but now the labeling is gone.
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ)) %>%
arrange(mean_displ) %>%
mutate(x = 1:15)) +
geom_point(aes(x = x, y = mean_displ))
How can I get the latter plot but with the labeling from above?
fct_reorder from the forcats package can order the levels of a factor.
library(tidyverse)
ggplot(mpg %>%
group_by(manufacturer) %>%
summarise(mean_displ = mean(displ))) +
geom_point(aes(x = fct_reorder(manufacturer, mean_displ), y = mean_displ)) +
guides(x = guide_axis(angle = 90))
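The reason arrange() has no visible effect is that a discrete axis follows the factor levels (alphabetical for a character column), not the row order. A small sketch showing this with base R's reorder(), which does much the same job as fct_reorder here; `means` is just a name chosen for this example:

```r
library(ggplot2)
library(dplyr)

means <- mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_displ = mean(displ)) %>%
  # turn manufacturer into a factor whose levels are sorted by mean_displ
  mutate(manufacturer = reorder(manufacturer, mean_displ))

levels(means$manufacturer)  # manufacturers from smallest to largest mean_displ

p <- ggplot(means) +
  geom_point(aes(x = manufacturer, y = mean_displ)) +
  guides(x = guide_axis(angle = 90))
```

Doing the reordering in the data rather than inside aes() also keeps the axis title as "manufacturer" instead of the full fct_reorder(...) expression.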

Annotate facet plot with grouped variables

I would like to place the numbers of observations above a facet boxplot. Here is an example:
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n())
ggplot(exmp, aes(x = am, fill = gear, y = wt)) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(aes(y = 6, label = N))
So, I already created column N to get the label over each box in the boxplot (combination of cyl, am and gear). How do I plot these labels so that they are over the respective box? Please note that the number of levels of gear for each level of am differs on purpose.
I really looked at a lot of ggplot tutorials and there are tons of questions dealing with annotating in facet plots. But none addressed this fairly common problem...
You need to give position_dodge() inside geom_text to match the position of the boxes, and also define the data argument to get the distinct values of the observation counts:
ggplot(exmp, aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
facet_grid(.~cyl) +
geom_text(data = dplyr::distinct(exmp, N),
aes(y = 6, label = N), position = position_dodge(0.9))
One minor issue here is that you are printing the N value once for every data point, not once for every cyl/am/gear combination. So you might want to add a filtering step to avoid overplotting that text, which can look messy on screen, reduce your control over alpha, and slow down plotting in cases with larger data.
library(tidyverse)
exmp = mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(am = as.factor(am),
gear = as.factor(gear))
(The data prep above was necessary for me to get the plot to look like your example. I'm using tidyverse 1.2.1 and ggplot2 3.2.1)
ggplot(exmp, aes(x = am, fill = gear, y = wt,
group = interaction(gear, am))) +
facet_grid(.~cyl) +
geom_boxplot() +
geom_text(data = exmp %>% distinct(cyl, gear, am, N),
aes(y = 6, label = N),
position = position_dodge(width = 0.8))
Here's the same chart with overplotting:
Perhaps using position_dodge() in your geom_text() will get you what you want?
mtcars %>% as_tibble() %>%
mutate(cartype = as.factor(row.names(mtcars))) %>%
group_by(cyl, am, gear) %>%
mutate(N = n()) %>%
ggplot(aes(x = as.factor(am), fill = as.factor(gear), y = wt)) +
geom_boxplot() +
geom_text(aes(y = 6, label = N), position = position_dodge(width = 0.7)) +
facet_grid(.~cyl)
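A variation on the answers above is to build the label data frame directly with count(), so there is exactly one row (and one label) per cyl/am/gear combination. This is only a sketch; `cars` and `label_df` are names made up for the example, and the dodge width of 0.75 is an assumption that roughly matches the default box width rather than a value taken from the answers.

```r
library(ggplot2)
library(dplyr)

cars <- mtcars %>%
  mutate(am = factor(am), gear = factor(gear))

# One row per cyl/am/gear combination, with N = number of cars in it
label_df <- cars %>%
  count(cyl, am, gear, name = "N")

p <- ggplot(cars, aes(x = am, y = wt, fill = gear)) +
  facet_grid(. ~ cyl) +
  geom_boxplot() +
  geom_text(data = label_df,
            aes(y = 6, label = N, group = gear),
            position = position_dodge(width = 0.75))
```

Because the text layer has its own small data frame, nothing is overplotted, and the label rows can be inspected or edited before plotting.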

is it possible to ggplot grouped partial boxplots w/o facets w/ a single `geom_boxplot()`?

I needed to add some partial boxplots to the following plot:
library(tidyverse)
foo <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(value = rnorm(n()) + 10 * as.integer(group)) %>%
ungroup()
foo %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE)
I would like to add a grid of (2 x 4 = 8) boxplots (4 per group) to the plot above. Each boxplot should cover a consecutive selection of 25 (or n) points (in each group). That is, the first two boxplots represent the points between the 1st and the 25th (one boxplot below for group a, and one boxplot above for group b). Next to them come two more boxplots for the points between the 26th and 50th, and so on. They do not need to sit in a perfect grid (which I suppose would be both more challenging to obtain and uglier); I would prefer them to "follow" their corresponding smooth line!
All of this without using facets (because I have to insert them in a plot which is already facetted :-))
I tried:
bar <- foo %>%
group_by(group) %>%
mutate(cut = 12.5 * (time %/% 25)) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(x = cut))
but it doesn't work.
I tried to call geom_boxplot() using group instead of x
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = cut))
But it draws the boxplots without considering the groups, even losing the colors (and adding a redundant call including color = group doesn't help).
Finally, I decided to try it roughly:
bar %>%
ggplot(aes(x = time, y = value, color = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(data = filter(bar, group == "a"), aes(group = cut)) +
geom_boxplot(data = filter(bar, group == "b"), aes(group = cut))
And it works (maintaining even the correct colors from the main aes)!
Does someone know if it is possible to obtain it using a single call to geom_boxplot()?
Thanks!
This was interesting! I haven't tried to use geom_boxplot with a continuous x before and didn't know how it behaved. I think what is happening is that setting group overrides colour in geom_boxplot, so it doesn't respect either the inherited or repeated colour aesthetic. I think this workaround does the trick; we combine the group and cut variables into group_cut, which takes 8 different values (one for each desired boxplot). Now we can map aes(group = group_cut) and get the desired output. I don't think this is particularly intuitive and it might be worth raising it on GitHub, since usually we expect aesthetics to combine nicely (e.g. combining colour and linetype works fine).
library(tidyverse)
bar <- tibble(
time = 1:100,
group = sample(c("a", "b"), 100, replace = TRUE) %>% as.factor()
) %>%
group_by(group) %>%
mutate(
value = rnorm(n()) + 10 * as.integer(group),
cut = 12.5 * ((time - 1) %/% 25), # modified this to prevent an extra boxplot
group_cut = str_c(group, cut)
) %>%
ungroup()
bar %>%
ggplot(aes(x = time, y = value, colour = group)) +
geom_point() +
geom_smooth(se = FALSE) +
geom_boxplot(aes(group = group_cut), position = "identity")
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-08-13 by the reprex package (v0.3.0)
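The combined grouping variable does not have to be stored as a column. As a sketch, interaction() can build it inside aes(). The toy data below assigns the groups in strict alternation (an assumption made here only so that every 25-point window is guaranteed to contain both groups); with real data the windows may be less balanced.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # only so the toy values are reproducible
bar <- data.frame(
  time = 1:100,
  group = factor(rep(c("a", "b"), times = 50))  # alternating groups
) %>%
  mutate(
    value = rnorm(n()) + 10 * as.integer(group),
    cut = 12.5 * ((time - 1) %/% 25)
  )

# interaction(group, cut) yields one grouping level per desired boxplot
p <- ggplot(bar, aes(x = time, y = value, colour = group)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  geom_boxplot(aes(group = interaction(group, cut)), position = "identity")
```

This should give the same eight boxes as the group_cut column, with a single geom_boxplot() call and the colours still taken from the main aes().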
