ggplot: add mean value to a stacked barplot (secondary axis) - r

I am dealing with survey data and now I am trying to add text-labels to a stacked bar plot. What am I doing wrong?
# Sample Data
library(tidyverse)
n <- 100
df <- data.frame(item = sample(paste("Item", 1:4), size = n, replace = TRUE),
                 value = sample(1:5, size = n, replace = TRUE))
# Create stacked barplot
df %>% group_by(item) %>%
  count(value) %>%
  mutate(percent = 1 / sum(n) * n,
         answer = factor(value, ordered = TRUE)) %>%
  ggplot(aes(x = item, y = percent, fill = fct_rev(answer))) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = round(percent, 1))) +
  labs(fill = "Answer")
I am also supposed to add the mean value for every item. Is there a way to add a secondary axis ranging from 1 to 5 and add the mean values for each item as points to the plot? (I know that this is statistically somewhat questionable, as 100% does not really correspond to the maximum value of 5.)

You need to specify the position of the labels. At the moment, your code places each label at its own value on the y axis (i.e. 0.2 is placed at 0.2, and 0.3 is placed at 0.3) rather than inside its bar segment. Adding position = position_stack(vjust = 0.5) to geom_text() stacks the labels the same way as the bars and centres each one in its segment, which should solve your first problem, e.g.
library(tidyverse)
n <- 100
df <- data.frame(item = sample(paste("Item", 1:4), size = n, replace = TRUE),
                 value = sample(1:5, size = n, replace = TRUE))
# Create stacked barplot
df %>% group_by(item) %>%
  count(value) %>%
  mutate(percent = 1 / sum(n) * n,
         answer = factor(value, ordered = TRUE)) %>%
  ggplot(aes(x = item, y = percent, fill = fct_rev(answer))) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = round(percent, 1)),
            position = position_stack(vjust = 0.5)) +
  labs(fill = "Answer")
Created on 2022-11-30 with reprex v2.0.2
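The answer above only covers the label positions. For the second part of the question (mean values on a secondary 1 to 5 axis), one possible sketch is to map each item's mean linearly from [1, 5] onto [0, 1], draw it with geom_point(), and let sec_axis() undo that mapping for the right-hand axis. The names bars, means and mean_scaled, as well as the linear mapping itself, are assumptions made for illustration and are not part of the original answer.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
n <- 100
df <- data.frame(item = sample(paste("Item", 1:4), size = n, replace = TRUE),
                 value = sample(1:5, size = n, replace = TRUE))

# Per-item shares for the stacked bars
bars <- df %>%
  count(item, value) %>%
  group_by(item) %>%
  mutate(percent = n / sum(n),
         answer = factor(value, ordered = TRUE)) %>%
  ungroup()

# Per-item means, mapped linearly from [1, 5] onto [0, 1] (assumed mapping)
means <- df %>%
  group_by(item) %>%
  summarise(mean_value = mean(value)) %>%
  mutate(mean_scaled = (mean_value - 1) / 4)

p <- ggplot(bars, aes(x = item, y = percent)) +
  geom_col(aes(fill = forcats::fct_rev(answer))) +
  geom_point(data = means, aes(y = mean_scaled), size = 3) +
  # The secondary axis inverts the mapping: 0 -> 1 and 1 -> 5
  scale_y_continuous(labels = scales::percent,
                     sec.axis = sec_axis(~ . * 4 + 1, name = "Mean value")) +
  labs(fill = "Answer")
p
```

The fill aesthetic is set only inside geom_col() so that the point layer does not need an answer column. As the question already notes, overlaying a 1 to 5 scale on a percentage scale is a presentational choice rather than a statistically meaningful correspondence.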

Related

Gradient alpha centered around 0 in ggplot2?

I would like to plot densities by groups such that the alpha value decreases (more transparent) as the x axis value gets closer to 0.
Based on the data dataset, I generate the alpha column by rescaling the x axis values around 0.
I thought that adding the alpha inside the aes() would work, but this throws an error.
library(tidyverse)
library(purrr)
library(scales)
set.seed(123)
data <- tibble(A = rnorm(100),
B = rnorm(100, mean = -0.7),
C = rnorm(100, mean = 1)) %>%
pivot_longer(cols = everything(),
names_to = "model") %>%
group_by(model) %>%
summarise(value = list(value)) %>%
mutate( xval = map(value, ~density(.x)$x),
yval = map(value, ~density(.x)$y)) %>%
select(-value) %>%
unnest(ends_with("val"))
#create alpha column
df <- data %>%
group_by(model) %>%
mutate(myalpha = abs(scale(xval, center = 0)), #scale to center around 0
myalpha2 = scales::rescale_mid(myalpha, mid = 0) #rescale 0-1, 0 for values around 0
) %>%
as_tibble()
df %>%
ggplot(aes(x = xval, y = yval,
fill = model, col = model))+
geom_line()+
geom_vline(xintercept = 0)+
geom_density(aes(alpha = myalpha2), #alpha white around 0
stat = "identity")+
scale_fill_manual(values = c("red", "pink", "orange"))+
scale_alpha_identity()
#> Error in `f()`:
#> ! Aesthetics can not vary with a ribbon
Created on 2022-09-11 by the reprex package (v2.0.1)
You cannot yet have a gradient fill in native ggplot (this includes gradients on the alpha channel). You can give the appearance of gradient fills using vertical line segments whose individual alpha values change along the x axis though.
Note that your alpha calculation isn't quite right here. myalpha2 has a minimum of 0.5 at the 0 point, as you can easily check with min(df$myalpha2).
To fix this, and implement the vertical line segment hack, you can do:
df %>%
  mutate(myalpha2 = 2 * (as.vector(myalpha2) - 0.5)) %>%
  ggplot(aes(x = xval, y = yval)) +
  geom_line() +
  geom_vline(xintercept = 0) +
  geom_segment(aes(alpha = myalpha2, xend = xval, yend = 0, color = model),
               size = 1) +
  scale_color_manual(values = c("red", "pink", "orange")) +
  scale_alpha_identity()
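To see why the alpha values need the 2 * (x - 0.5) correction, here is a small sketch on a toy vector. The vector myalpha below is a hypothetical stand-in for the myalpha column, not the real data.

```r
library(scales)

# Toy non-negative distances from 0, standing in for the myalpha column
myalpha <- c(0, 1, 2)

# rescale_mid() pins `mid` to the middle of the output range,
# so a distance of 0 lands on 0.5 rather than on 0
raw <- rescale_mid(myalpha, mid = 0)
raw
#> [1] 0.50 0.75 1.00

# Stretch [0.5, 1] back onto [0, 1], as in the mutate() step above
fixed <- 2 * (raw - 0.5)
fixed
#> [1] 0.0 0.5 1.0
```

Because the input is an absolute value, it never falls below the midpoint, so only the upper half of the rescaled range is ever used until it is stretched back out.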

Plot a line on a barchart in ggplot2

I have built a stacked bar chart showing the relative proportions of responses to different questions. Now I want to show a particular response on top of that bar chart, to show how an individual's response relates to the overall proportions of responses.
I created a toy example here:
library(ggplot2)
n = 1000
n_groups = 5
overall_df = data.frame(
  state = sample(letters[1:8], n, replace = TRUE),
  frequency = runif(n, min = 0, max = 1),
  var_id = rep(LETTERS[1:n_groups], each = n / n_groups)
)
row = data.frame(
  A = "a", B = "b", C = "c", D = "h", E = "b"
)
ggplot(overall_df,
       aes(fill = state, y = frequency, x = var_id)) +
  geom_bar(position = "fill", stat = "identity")
The goal here is to have the responses in the object row plotted as a point in the corresponding barchart box, with a line connecting the points.
Here is a (poorly drawn) example of the desired result. Thanks for your help.
This was trickier than I thought. I'm not sure there's any way round manually calculating the x/y co-ordinates of the line.
library(dplyr)
library(ggplot2)
df <- overall_df %>% group_by(state, var_id) %>%
  summarize(frequency = sum(frequency))
freq <- unlist(Map(function(d, val) {
  (sum(d$frequency[d$state > val]) + 0.5 * d$frequency[d$state == val]) /
    sum(d$frequency)
}, d = split(df, df$var_id), val = row))
line_df <- data.frame(state = unlist(row),
                      frequency = freq,
                      var_id = names(row))
ggplot(df, aes(fill = state, y = frequency, x = var_id)) +
  geom_col(position = "fill") +
  geom_line(data = line_df, aes(group = 1)) +
  geom_point(data = line_df, aes(group = 1))
Created on 2022-03-08 by the reprex package (v2.0.1)
Here's an automated approach using dplyr. I prepare the summary by joining the label data to the original data, and then use group_by + summarize to compute the y position of each point within its bar.
library(dplyr)
row_df <- data.frame(state = letters[1:n_groups], var_id = LETTERS[1:n_groups])
line_df <- row_df %>%
  left_join(overall_df, by = "var_id") %>%
  group_by(var_id) %>%
  summarize(state = last(state.x),
            frequency = (sum(frequency[state.x < state.y]) +
                           sum(frequency[state.x == state.y]) / 2) / sum(frequency))
ggplot(overall_df, aes(fill = state, y = frequency, x = var_id)) +
  geom_bar(position = "fill", stat = "identity") +
  geom_point(data = line_df) +
  geom_line(data = line_df, aes(group = 1))
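Both answers rely on the same arithmetic: the point goes at the midpoint of the selected segment, which is everything stacked below it plus half of its own height, divided by the bar total. A minimal base R sketch of that calculation, using made-up frequencies and a hypothetical helper named segment_midpoint:

```r
# Hypothetical summed frequencies for one bar, in factor-level order
freq <- c(a = 1, b = 2, c = 1)

# Midpoint (on the 0-1 "fill" scale) of the segment belonging to `selected`.
# With position = "fill" the first level sits on top, so every level that
# sorts after `selected` is stacked underneath it.
segment_midpoint <- function(freq, selected) {
  below <- sum(freq[names(freq) > selected])
  (below + 0.5 * freq[[selected]]) / sum(freq)
}

segment_midpoint(freq, "a")  # 0.875
segment_midpoint(freq, "b")  # 0.5
segment_midpoint(freq, "c")  # 0.125
```

The string comparison names(freq) > selected mirrors the d$state > val test in the first answer and assumes the states are stacked in alphabetical order.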

How do I get a single percentage/proportion plot using ggplot for separate groups?

df <- data.frame(k = sample(1:3, 100, replace = TRUE),
g = sample(1:2, 100, replace = TRUE, prob = c(0.3, 0.7)))
In this data frame I have two groups g whose members are each in one of three conditions k.
Now, I want to see the proportions of the conditions k in both groups.
ggplot(df, aes(x = k, fill = as.factor(g), y = (..count..)/sum(..count..))) +
geom_bar(position=position_dodge())
That looks nice at first, but there is a problem. Group 2 is larger than group 1, so the proportions are not right: it looks as if all conditions were more likely in group 2 than in group 1. I need to calculate y = (..count..)/sum(..count..) for both groups separately. How do I do this?
Here's how you can do it:
library(tidyverse)
df %>%
group_by(g) %>%
count(k) %>%
mutate(share = n / sum(n)) %>%
ggplot(aes(x = k, fill = as.factor(g), y = share)) +
geom_col(position = position_dodge())
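To check that grouping by g before dividing really normalises each group separately, here is a small sketch on a hand-made data frame. The toy data and the name shares are assumptions for illustration only.

```r
library(dplyr)

# Group 1 has 3 members, group 2 has 6
toy <- data.frame(g = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
                  k = c(1, 2, 2, 1, 1, 1, 2, 3, 3))

shares <- toy %>%
  count(g, k) %>%
  group_by(g) %>%
  mutate(share = n / sum(n)) %>%  # sum(n) is taken within each g
  ungroup()

# Each group's shares add up to 1, regardless of group size
totals <- shares %>%
  group_by(g) %>%
  summarise(total = sum(share))
totals
```

Here count(g, k) followed by group_by(g) is intended to be equivalent to the group_by(g) %>% count(k) used in the answer. Without the grouping, sum(n) would be the overall total of 9 and the larger group would dominate.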

Adding group-specific text/data to faceted plot in R/ggplot2

I am comparing the intra-group correlation between duplicate samples within a large gene expression experiment, where I have multiple separate biological groups - the idea being to see if any of the groups is much less well-correlated than the others, indicating a potential sample mixup or other error.
I am using ggplot to plot the expression values of each duplicate pair against each other. I would like to also be able to add the correlation coefficient and p-value to each panel of the plot, which I obtain through summarize and cor.test. You can use this code to get the general idea: in exp1, the duplicates are correlated, but not in exp2.
library(tidyverse)
df <- data.frame(exp=c(rep('exp1', 100), rep('exp2', 100)), a=rnorm(200, 1000, 200))
df <- mutate(df, b=ifelse(exp=='exp1', a*rnorm(100,1,0.05), rnorm(100, 1000, 200)))
head(df)
tail(df)
df %>% ggplot(aes(x=a, y=b))+
geom_point() +
facet_wrap(~exp)
group_by(df, exp) %>%
summarize(corr=cor.test(a,b)$estimate, pval=cor.test(a,b)$p.value)
This is the plot I generated via ggplot, and I've manually added the R and p-values that I got at the end. But of course, if I have a lot of sample pairs to analyze, it would be nice to be able to add these automatically from within the ggplot call. I'm just not sure how to do it.
If, for whatever reason, you want to build this yourself instead of using the ggpubr functions, you can create your summary data, format labels, and place the labels with geom_text.
I'm formatting the stats so that R has a fixed 3 significant digits and p has 3 digits, falling back on scientific notation. I changed the names of those columns in summarise to R and p to make the labels below. Reshaping to long data and creating a new column with unite gets this:
library(tidyverse)
...
group_by(df, exp) %>%
summarize(R = cor.test(a, b)$estimate, p = cor.test(a, b)$p.value) %>%
mutate(R = formatC(R, format = "fg", digits = 3),
p = formatC(p, format = "g", digits = 3)) %>%
gather(key = measure, value = value, -exp) %>%
unite("stat", measure, value, sep = " = ")
#> # A tibble: 4 x 2
#> exp stat
#> <chr> <chr>
#> 1 exp1 R = 0.965
#> 2 exp2 R = 0.0438
#> 3 exp1 p = 1.14e-58
#> 4 exp2 p = 0.665
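As a quick illustration of the two formatC() calls in isolation, the sketch below applies them to a few hand-picked numbers (hypothetical values chosen to resemble the output above, not the actual test results):

```r
# Hypothetical correlation estimates and p-values
r_values <- c(0.9649, 0.04381)
p_values <- c(1.14e-58, 0.665)

# "fg": fixed notation, with `digits` read as significant digits
r_labels <- formatC(r_values, format = "fg", digits = 3)
r_labels
#> [1] "0.965"  "0.0438"

# "g": 3 significant digits, switching to scientific notation when shorter
p_labels <- formatC(p_values, format = "g", digits = 3)
p_labels
#> [1] "1.14e-58" "0.665"
```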
Then for each of the groups, I want to collapse both labels, separated by a newline \n. This is a place that will scale well—you might have more summary stats to display, but this should still work.
summ <- group_by(df, exp) %>%
summarize(R = cor.test(a, b)$estimate, p = cor.test(a, b)$p.value) %>%
mutate(R = formatC(R, format = "fg", digits = 3),
p = formatC(p, format = "g", digits = 3)) %>%
gather(key = measure, value = value, -exp) %>%
unite("stat", measure, value, sep = " = ") %>%
group_by(exp) %>%
summarise(both_stats = paste(stat, collapse = "\n"))
summ
#> # A tibble: 2 x 2
#> exp both_stats
#> <chr> <chr>
#> 1 exp1 "R = 0.965\np = 1.14e-58"
#> 2 exp2 "R = 0.0438\np = 0.665"
In geom_text, I'm setting the x coordinate to -Inf, which places the label at the left edge of the panel, and the y coordinate to Inf, which places it at the top edge. That puts the label in the top-left corner, regardless of the values in the data.
The one thing I don't like here is then hacking the hjust and vjust outside their intended ranges of 0 to 1. But nudge_x/nudge_y won't do anything because of the values being set to infinity.
df %>%
ggplot(aes(x = a, y = b)) +
geom_point() +
geom_text(aes(x = -Inf, y = Inf, label = both_stats), data = summ,
hjust = -0.1, vjust = 1.1, lineheight = 1) +
facet_wrap(~ exp)
Created on 2018-11-14 by the reprex package (v0.2.1)
We can use the stat_cor function from the ggpubr package.
set.seed(123)
library(dplyr)
library(ggplot2)
library(ggpubr)
df <- data.frame(exp=c(rep('exp1', 100), rep('exp2', 100)), a=rnorm(200, 1000, 200))
df <- mutate(df, b=ifelse(exp=='exp1', a*rnorm(100,1,0.05), rnorm(100, 1000, 200)))
ggplot(df, aes(x=a, y=b))+
geom_point() +
facet_wrap(~exp) +
stat_cor(method = "pearson")
Similar to camille's answer, but you can do it all in one pipeline:
library(tidyverse)
set.seed(123)
df %>%
group_by(exp) %>%
mutate(p = cor.test(a, b)$p.value,
rho = cor.test(a, b)$estimate) %>%
mutate_at(vars(p, rho), signif, 2) %>%
ggplot(aes(x=a, y=b)) +
geom_point() +
geom_text(data = . %>% distinct(p, rho, exp),
aes(x = -Inf, y = Inf,label = paste("p=",p,"\nrho=",rho)),
hjust = -0.1, vjust = 1.1, lineheight = 1) +
facet_wrap(~exp)

ggplot/GGally - Parallel Coordinates - y-axis labels

Does anyone know if there is a way to add variable labels to the ggparcoord function in GGally? I've tried numerous ways with geom_text, but nothing is yielding results.
To be more explicit, I am looking to pass the row.names(mtcars) through geom_text. The only way that I can distinguish the car is passing row.names(mtcars) through the groupColumn argument, but I don't like the way this looks.
Doesn't work:
mtcars$carName <- row.names(mtcars) # This becomes column 12
library(GGally)
# Attempt 1
ggparcoord(mtcars,
columns = c(12, 1, 6),
groupColumn = 1) +
geom_text(aes(label = carName))
# Attempt 2
ggparcoord(mtcars,
columns = c(12, 1, 6),
groupColumn = 1,
mapping = aes(label = carName))
Any ideas would be appreciated!
Solution 1: If you want to stick close to your original attempt, you can calculate the appropriate y coordinates for the car names, & add that as a separate data source. Use inherit.aes = FALSE so that this geom_text layer doesn't inherit anything from the ggplot object created using ggparcoord():
library(dplyr)
p1 <- ggparcoord(mtcars,
columns = c(12, 1, 6),
groupColumn = 1) +
geom_text(data = mtcars %>%
select(carName) %>%
mutate(x = 1,
y = scale(as.integer(factor(carName)))),
aes(x = x, y = y, label = carName),
hjust = 1.1,
inherit.aes = FALSE) +
# optional: remove "carName" from x-axis labels
scale_x_discrete(labels = function(x) c("", x[-1])) +
# also optional: hide legend, which doesn't really seem relevant here
theme(legend.position = "none")
p1
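The y coordinates in Solution 1 come from scale(), which standardises a vector by subtracting its mean and dividing by its standard deviation. The sketch below only checks that arithmetic on the car names. That this matches what ggparcoord() does to the carName column is an assumption (it relies on the default scale = "std" and on the names being converted to integer codes in factor order).

```r
# Integer codes of the car names, in alphabetical factor order
car_names <- row.names(mtcars)
rank_index <- as.integer(factor(car_names))

# scale() returns a one-column matrix; as.vector() drops the attributes
y_scaled <- as.vector(scale(rank_index))

# Same result computed by hand
y_manual <- (rank_index - mean(rank_index)) / sd(rank_index)
all.equal(y_scaled, y_manual)
```

Because the codes are evenly spaced integers, the resulting label positions are evenly spaced as well, which is why the labels in p1 line up neatly.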
Solution 2: This alternative uses carName as the group column, & doesn't pass it as one of the parallel coordinate columns (which I think might be closer to the use cases intended by this function). Specifying carName as the group column allows the car name values to be captured in the data slot of the ggplot object created by ggparcoord() this time, so our geom_text label can inherit it directly, & even filter only for rows corresponding to variable == "mpg" (or whatever the first of the parallel coordinate columns is named, in the actual use case). The y coordinates are not as evenly spread out as above, but geom_text_repel from the ggrepel package does a decent job at shifting overlapping text labels away from one another.
library(dplyr)
library(ggrepel)
p2 <- ggparcoord(mtcars,
columns = c(1, 6),
groupColumn = "carName") +
geom_text_repel(data = . %>%
filter(variable == "mpg"),
aes(x = variable, y = value, label = carName),
xlim = c(NA, 1)) + # limit repel region to the left of the 1st column
theme(legend.position = "none") # as before, hide legend since the labels
# are already in the plot
p2
Solution 3 / 4: You can actually plot the same with ggplot(), without relying on extensions that may do unexpected stuff behind the scenes:
library(dplyr)
library(tidyr)
library(ggrepel)
# similar output to solution 1
p3 <- mtcars %>%
select(carName, mpg, wt) %>%
mutate(carName.column = as.integer(factor(carName))) %>%
gather(variable, value, -carName) %>%
group_by(variable) %>%
mutate(value = scale(value)) %>%
ungroup() %>%
ggplot(aes(x = variable, y = value, label = carName, group = carName)) +
geom_line() +
geom_text(data = . %>% filter(variable == "carName.column"),
hjust = 1.1) +
scale_x_discrete(labels = function(x) c("", x[-1]))
p3
# similar output to solution 2
p4 <- mtcars %>%
select(carName, mpg, wt) %>%
gather(variable, value, -carName) %>%
group_by(variable) %>%
mutate(value = scale(value)) %>%
ungroup() %>%
ggplot(aes(x = variable, y = value, label = carName, group = carName)) +
geom_line() +
geom_text_repel(data = . %>% filter(variable == "mpg"),
xlim = c(NA, 1))
p4
Edit
You can add text labels on the right as well, for each of the above. Do note that the location for labels may not be nicely spaced out, since they are positioned according to wt's scaled values:
p1 +
geom_text(data = mtcars %>%
select(carName, wt) %>%
mutate(x = 3,
y = scale(wt)),
aes(x = x, y = y, label = carName),
hjust = -0.1,
inherit.aes = FALSE)
p2 +
geom_text_repel(data = . %>%
filter(variable == "wt"),
aes(x = variable, y = value, label = carName),
xlim = c(2, NA))
p3 +
geom_text(data = . %>% filter(variable == "wt"),
hjust = -0.1)
p4 +
geom_text_repel(data = . %>% filter(variable == "wt"),
xlim = c(2, NA))

Resources