Create ggplots with the same scale in R

I'd like to do the following in R: I have 2 datasets (one with 4 values, the other with 3) and I'd like to plot them with ggplot2 as separate bar charts. However, I'd like to use the same scale for both, i.e. if the minimum value of dataset #1 is 0.2 and that of dataset #2 is 0.4, then I want to use 0.2 for both. The same applies to the maximum values (choosing the greater one).
So, basically, I want to make the 2 plots comparable. It would also be great to apply the common scale to the bar colours. Currently I'm using colorRampPalette and passing the result to scale_fill_gradient2.
An MWE is provided below:
library("ggplot2")
val <- c(0.2, 0.35, 0.5, 0.65)
labels <- c('A', 'B', 'C', 'D')
LtoM <-colorRampPalette(c('green', 'yellow'))
df <- data.frame(val)
bar <- ggplot(data = df,
aes(x = factor(labels),
y = val,
fill = val)) +
geom_bar(stat = 'identity') +
scale_fill_gradient2(low=LtoM(100), mid='snow3',
high=LtoM(100), space='Lab') +
geom_text(aes(label = val), vjust = -1, fontface = "bold") +
labs(title = "Title", y = "Value", x = "Methods") +
theme(legend.position = "none")
print(bar)
Given the code above and another dataset like c(0.4, 0.8, 1.2) with labels c('E', 'F', 'G'), how can I adjust the code to create 2 separate plots (eventually saved as PNGs) that use a common scale (0.2 to 1.2) for both the bar heights and the bar colours? Placing the images next to each other should show that bars with the same value have the same height and the same colour, regardless of which image they belong to.

We can use a mix of the breaks argument in scale_y_continuous to ensure that we have consistent axis ticks, then use coord_cartesian to ensure that we force both plots to have the same y-axis range.
df1 <- data.frame(val = c(0.2, 0.35, 0.5, 0.65), labels = c('A', 'B', 'C', 'D'))
df2 <- data.frame(val = c(0.4, 0.8, 1.2), labels = c('E', 'F', 'G'))
g_plot <- function(df) {
ggplot(data = df,
aes(x = factor(labels),
y = val,
fill = val)) +
geom_bar(stat = 'identity') +
scale_fill_gradient2(low=LtoM(100), mid='snow3',
high=LtoM(100), space='Lab') +
geom_text(aes(label = val), vjust = -1, fontface = "bold") +
scale_y_continuous(breaks = seq(0, 1.2, 0.2)) +
coord_cartesian(ylim = c(0, 1.2)) +
labs(title = "Title", y = "Value", x = "Methods") +
theme(legend.position = "none")
}
bar1 <- g_plot(df1)
bar2 <- g_plot(df2)
gridExtra::grid.arrange(bar1, bar2, ncol = 2)

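You actually don't need coord_cartesian here. You can set the limits argument in scale_y_continuous instead. Note that scale limits drop values outside the range, whereas coord_cartesian only zooms; for this data the result is the same: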
scale_y_continuous(limits = c(0,1.2), breaks = seq(0, 1.2, 0.2))
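The answers above fix the bar heights but not the fill colours. A minimal sketch of one way to share both, assuming a plain two-colour scale_fill_gradient is acceptable in place of the original scale_fill_gradient2 call (the helper name shared_plot and the PNG file names are made up for illustration):

```r
library(ggplot2)

df1 <- data.frame(val = c(0.2, 0.35, 0.5, 0.65), labels = c("A", "B", "C", "D"))
df2 <- data.frame(val = c(0.4, 0.8, 1.2), labels = c("E", "F", "G"))

# Common value range across both datasets (0.2 to 1.2 here)
val_range <- range(df1$val, df2$val)

# Hypothetical helper: one plot per data frame, all scales driven by val_range
shared_plot <- function(df, val_range) {
  ggplot(df, aes(x = labels, y = val, fill = val)) +
    geom_col() +
    geom_text(aes(label = val), vjust = -1, fontface = "bold") +
    # Identical fill limits: a given value gets the same colour in every plot
    scale_fill_gradient(low = "green", high = "yellow", limits = val_range) +
    # Bars start at 0; leave some headroom above the shared maximum for labels
    coord_cartesian(ylim = c(0, val_range[2] * 1.1)) +
    theme(legend.position = "none")
}

p1 <- shared_plot(df1, val_range)
p2 <- shared_plot(df2, val_range)

# Save each plot separately (file names are placeholders):
# ggsave("plot1.png", p1, width = 4, height = 4)
# ggsave("plot2.png", p2, width = 4, height = 4)
```

Computing the range from the data keeps the two plots in sync if the values change later; the same width and height in both ggsave calls keeps the panels aligned when the images are placed side by side.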

Related

Position stacked identity data sample size as geom_text directly over a bar using geom_bar from ggplot2

In this experiment, we tracked presence or absence of bacterial infection in our subject animals. We were able to isolate which type of bacteria was present in our animals and created a plot that has Week Since Experiment Start on the X axis, and Percentage of Animals Positive for bacterial infection on the Y axis. This is a stacked identity ggplot where each geom_bar contains the different identities of the bacteria that were in the infected animals each week. Here is a sample dataset with the corresponding ggplot code and result:
DummyData <- data.frame(matrix(ncol = 5, nrow = 78))
colnames(DummyData) <- c('WeeksSinceStart','BacteriaType','PositiveOccurences','SampleSize','NewSampleSize')
DummyData$WeeksSinceStart <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,9,9,9,9,9,10,10,10,10)
DummyData$BacteriaType <- c("BactA","BactB","BactD","BactB","BactE","BactA","BactS","BactF","BactE","BactH","BactJ","BactK","BactE","BactB","BactS","BactF","BactL","BactE","BactW","BactH","BactS","BactJ","BactQ","BactN","BactW","BactA","BactD","BactE","BactA","BactC","BactD","BactK","BactL","BactE","BactD","BactA","BactS","BactK","BactB","BactE","BactF","BactH","BactN","BactE","BactL","BactZ","BactE","BactC","BactR","BactD","BactJ","BactN","BactK","BactW","BactR","BactE","BactW","BactA","BactM","BactG","BactO","BactI","BactE","BactD","BactM","BactH","BactC","BactM","BactW","BactA","BactL","BactB","BactE","BactA","BactS","BactH","BactQ","BactF")
PosOcc <- seq(from = 1, to = 2, by = 1)
DummyData$PositiveOccurences <- rep(PosOcc, times = 13)
DummyData$SampleSize <- c(78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,78,29,29,29,29,29,10,10,10,10)
DummyData$NewSampleSize <- c(78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,NA,NA,78,NA,NA,NA,NA,NA,NA,29,NA,NA,NA,NA,10,NA,NA,NA)
numcolor <- 20
plotcolors <- colorRampPalette(RColorBrewer::brewer.pal(8, "Set3"))(numcolor)  # brewer.pal comes from RColorBrewer
#GGplot for Dummy Data
DummyDataPlot <- ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences/SampleSize, fill = BacteriaType)) + geom_bar(position = "stack", stat = "identity") +
geom_text(label = DummyData$NewSampleSize, nudge_y = 0.1) +
scale_y_continuous(limits = c(0,0.6), breaks = seq(0, 1, by = 0.1)) + scale_x_continuous(limits = c(0.5,11), breaks = seq(0,10, by =1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive") +
scale_fill_manual(values = plotcolors)
The problem: I cannot seem to find a way to position the labels from geom_text directly over each bar. I would also love to add the text "n = " to the sample size value directly over each bar. Thank you for your help!
I have tried different values for position_dodge statement and nudge_y statement with no success.
Sometimes the easiest approach is to do some data wrangling, i.e. one option would be to create a separate dataframe for your labels:
library(ggplot2)
library(dplyr)
dat_label <- DummyData |>
group_by(WeeksSinceStart) |>
summarise(y = sum(PositiveOccurences / SampleSize), SampleSize = unique(SampleSize))
ggplot(DummyData, aes(x = WeeksSinceStart, y = PositiveOccurences / SampleSize, fill = BacteriaType)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(data = dat_label, aes(x = WeeksSinceStart, y = y, label = SampleSize), inherit.aes = FALSE, nudge_y = .01) +
#scale_y_continuous(limits = c(0, 0.6), breaks = seq(0, 1, by = 0.1)) +
scale_x_continuous(limits = c(0.5, 11), breaks = seq(0, 10, by = 1)) +
labs(
x = "Weeks Since Start",
y = "Proportion Positive"
) +
scale_fill_manual(values = plotcolors)
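The question also asked for an "n = " prefix, which the code above leaves out. A minimal sketch on a small made-up data frame (toy and lab are hypothetical names standing in for DummyData and dat_label), using base aggregate() instead of dplyr:

```r
library(ggplot2)

# Small hypothetical stand-in for DummyData: three weeks, two bacteria types
toy <- data.frame(
  WeeksSinceStart    = c(1, 1, 2, 2, 3, 3),
  BacteriaType       = rep(c("BactA", "BactB"), times = 3),
  PositiveOccurences = c(1, 2, 1, 2, 1, 2),
  SampleSize         = c(78, 78, 78, 78, 29, 29)
)
toy$prop <- toy$PositiveOccurences / toy$SampleSize

# One row per week: the summed proportion is the top of the stacked bar
lab <- aggregate(prop ~ WeeksSinceStart + SampleSize, data = toy, FUN = sum)
lab$label <- paste0("n = ", lab$SampleSize)

p <- ggplot(toy, aes(x = WeeksSinceStart, y = prop)) +
  geom_col(aes(fill = BacteriaType)) +
  # fill is mapped only in geom_col, so the label layer inherits just x and y
  geom_text(data = lab, aes(label = label), vjust = -0.5)
```

Mapping fill inside geom_col rather than in ggplot() is what lets the label layer reuse the top-level x and y without inherit.aes = FALSE.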

ggforce facet_zoom error with ggplot2 on R

I have a data.frame in R 4.0.2 with a continuous variable in one column and a categorical variable ('type', with two possible values: known or novel) in another, which I use to colour the data differently (using a palette from the ggsci 2.9 package). I draw a histogram (stat_bin) with ggplot2 3.3.2 and want to use the facet_zoom function of ggforce 0.3.2 to zoom only on the data belonging to one of the types (using the zoom.data option, as in the volcano example at http://cran.univ-paris1.fr/web/packages/ggforce/vignettes/Visual_Guide.html#contextual-zoom). However, I get this error:
Error: Aesthetics must be either length 1 or the same as the data (2000): x
Reproducible example:
library(ggplot2)
library(ggsci)
library(ggforce)
testdata <- as.data.frame(sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Working code
ggplot(testdata) +
stat_bin(aes(x=testdata[,1], fill = type), binwidth = 1, color="white") +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3)
# Desired code
ggplot(testdata) +
stat_bin(aes(x=testdata[,1], fill = type), data = cbind(testdata, zoom = FALSE), binwidth = 1, color="white") +
stat_bin(aes(x=testdata[testdata$type == "novel",1]), data = cbind(testdata, zoom = TRUE), binwidth = 0.5) +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3, zoom.data = zoom)
Thanks!
The issue is that you pass the whole dataset as data in the second stat_bin. Simply pass the subsetted data frame instead of trying to subset inside aes().
BTW: I also renamed the first variable in your data to x:
library(ggplot2)
library(ggsci)
library(ggforce)
set.seed(42)
testdata <- data.frame(x = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Desired code
ggplot(testdata) +
stat_bin(aes(x = x, fill = type), data = cbind(testdata, zoom = FALSE), binwidth = 1, color="white") +
stat_bin(aes(x = x), data = cbind(testdata[testdata$type == "novel", ], zoom = TRUE), binwidth = 0.5) +
scale_fill_npg() + theme_light() +
facet_zoom(xlim = c(0, 4), ylim = c(0, 300), horizontal = TRUE, zoom.size = 0.3, zoom.data = zoom)
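To see where the mismatch comes from, it may help to compare the number of rows in the layer data with the length of the vector supplied inside aes(). A small base R check on the same test data (the names whole, novel_x and novel are made up for illustration):

```r
set.seed(42)
testdata <- data.frame(x = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000, 2] <- "novel"

# Layer data as in the failing call: every row is kept
whole <- cbind(testdata, zoom = TRUE)

# Vector that the failing call put inside aes(): only the "novel" rows
novel_x <- testdata[testdata$type == "novel", 1]

nrow(whole)      # rows in the layer data
length(novel_x)  # values supplied for x, fewer than the rows above

# Subsetting the data itself keeps rows and aesthetics the same length
novel <- cbind(testdata[testdata$type == "novel", ], zoom = TRUE)
nrow(novel)
```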
To only show the type == "novel" data in the zoomed plot, try this:
library(tidyverse)
library(ggsci)
library(ggforce)
testdata <- data.frame(values = sort(rnorm(1000)))
testdata$type <- "known"
testdata[501:1000,2] <- "novel"
# Desired code
ggplot(testdata) +
stat_bin(aes(x = values, fill = type),
binwidth = 1, color="white") +
scale_fill_npg() + theme_light() +
facet_zoom(zoom.data = ifelse(type == "novel", NA, FALSE),
xlim = c(0, 4), ylim = c(0, 300),
horizontal = TRUE)

Plot the confidence band with ggplot2

I have a dataset that is similar to this :
x <- data.frame(date = c(20190902, 20190903, 20190904),
Group = c(rep("A", 3)),
mean = c(2.5, 3.4, 4.6),
ci_upper = c(1.2, 0.5, 0.3),
ci_lower = c(0.5, 0.4, 0.25))
y <- data.frame(date= c(20190902, 20190903, 20190904),
Group = c(rep("B", 3)),
mean = c(0.4, 3.8, 6.2),
ci_upper = c(1.9, 0.9, 0.5),
ci_lower = c(0.5, 0.8, 0.8))
df <- rbind(x, y)
I would like to plot the confidence band across the timeframe, with 2 different groups (A and B).
Currently I'm using this approach, but it doesn't work:
p <- ggplot(df) +
geom_line(aes(y = mean, x = date, group = type ))+
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper, x = week, fill = "grey70"), alpha = 0.3)+
scale_colour_manual("", values = "blue")+
scale_fill_manual("", values = "grey12")
I wasn't sure how I can approach this.
You are almost there. Only some small corrections to the aes() calls are needed.
But first I would slightly modify the input to make the result look nicer (as given, ci_upper/ci_lower are not always above/below the corresponding mean value):
# to ensure reproducibility of the samples
set.seed(123)
df$ci_lower <- df$mean - sample(nrow(x))
df$ci_upper <- df$mean + sample(nrow(x))
The main thing to change in your ggplot() call is the definition of the aesthetics used for plotting. Note that default aesthetic values need to be set only once, in the top-level ggplot() call.
p <- ggplot(df,
aes(x = as.Date(as.character(date), format = "%Y%m%d"),
y = mean,
group = Group, col = Group, fill = Group)) +
geom_line() +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.3)+
scale_colour_manual("", values = c("red", "blue")) +
scale_fill_manual("", values = c("red", "blue"))
The result is as follows:
Actually, the last two lines of code are not even necessary, as the default ggplot colour scheme (which you used to show the desired result) also looks very nice.
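In the question's data, ci_upper and ci_lower look like distances from the mean rather than actual bounds. Under that assumption (it is only a guess about the data), the band could be built as mean - ci_lower and mean + ci_upper instead of replacing the columns with random values. The column names ymin and ymax are introduced here for illustration:

```r
library(ggplot2)

df <- data.frame(
  date     = rep(c(20190902, 20190903, 20190904), times = 2),
  Group    = rep(c("A", "B"), each = 3),
  mean     = c(2.5, 3.4, 4.6, 0.4, 3.8, 6.2),
  ci_upper = c(1.2, 0.5, 0.3, 1.9, 0.9, 0.5),
  ci_lower = c(0.5, 0.4, 0.25, 0.5, 0.8, 0.8)
)

# Convert the numeric yyyymmdd values to real dates once, outside aes()
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

# Assumption: the ci_* columns are offsets from the mean, not bounds
df$ymin <- df$mean - df$ci_lower
df$ymax <- df$mean + df$ci_upper

p <- ggplot(df, aes(x = date, y = mean, group = Group)) +
  geom_ribbon(aes(ymin = ymin, ymax = ymax, fill = Group), alpha = 0.3) +
  geom_line(aes(colour = Group))
```

If the columns are in fact absolute bounds, ymin and ymax should be mapped to them directly instead.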

How to stop ggrepel labels moving between gganimate frames in R/ggplot2?

I would like to add labels to the end of lines in ggplot, avoid them overlapping, and avoid them moving around during animation.
So far I can put the labels in the right place and hold them static using geom_text, but the labels overlap, or I can prevent them overlapping using geom_text_repel but the labels do not appear where I want them to and then dance about once the plot is animated (this latter version is in the code below).
I thought a solution might involve effectively creating a static layer in ggplot (p1 below) then adding an animated layer (p2 below), but it seems not.
How do I hold some elements of a plot constant (i.e. static) in an animated ggplot? (In this case, the labels at the end of lines.)
Additionally, with geom_text the labels appear as I want them - at the end of each line, outside of the plot - but with geom_text_repel, the labels all move inside the plotting area. Why is this?
Here is some example data:
library(dplyr)
library(ggplot2)
library(gganimate)
library(ggrepel)
set.seed(99)
# data
static_data <- data.frame(
hline_label = c("fixed_label_1", "fixed_label_2", "fixed_label_3", "fixed_label_4",
"fixed_label_5", "fixed_label_6", "fixed_label_7", "fixed_label_8",
"fixed_label_9", "fixed_label_10"),
fixed_score = c(2.63, 2.45, 2.13, 2.29, 2.26, 2.34, 2.34, 2.11, 2.26, 2.37))
animated_data <- data.frame(condition = c("a", "b")) %>%
slice(rep(1:n(), each = 10)) %>%
group_by(condition) %>%
mutate(time_point = row_number()) %>%
ungroup() %>%
mutate(score = runif(20, 2, 3))
and this is the code I am using for my animated plot:
# colours for use in plot
condition_colours <- c("red", "blue")
# plot static background layer
p1 <- ggplot(static_data, aes(x = time_point)) +
scale_x_continuous(breaks = seq(0, 10, by = 2), expand = c(0, 0)) +
scale_y_continuous(breaks = seq(2, 3, by = 0.10), limits = c(2, 3), expand = c(0, 0)) +
# add horizontal line to show existing scores
geom_hline(aes(yintercept = fixed_score), alpha = 0.75) +
# add fixed labels to the end of lines (off plot)
geom_text_repel(aes(x = 11, y = fixed_score, label = hline_label),
hjust = 0, size = 4, direction = "y", box.padding = 1.0) +
coord_cartesian(clip = 'off') +
guides(colour = "none") +
labs(title = "[Title Here]", x = "Time", y = "Mean score") +
theme_minimal() +
theme(panel.grid.minor = element_blank(),
plot.margin = margin(5.5, 120, 5.5, 5.5))
# animated layer
p2 <- p1 +
geom_point(data = animated_data,
aes(x = time_point, y = score, colour = condition, group = condition)) +
geom_line(data = animated_data,
aes(x = time_point, y = score, colour = condition, group = condition),
show.legend = FALSE) +
scale_color_manual(values = condition_colours) +
geom_segment(data = animated_data,
aes(xend = time_point, yend = score, y = score, colour = condition),
linetype = 2) +
geom_text(data = animated_data,
aes(x = max(time_point) + 1, y = score, label = condition, colour = condition),
hjust = 0, size = 4) +
transition_reveal(time_point) +
ease_aes('linear')
# render animation
animate(p2, nframes = 50, end_pause = 5, height = 1000, width = 1250, res = 120)
Suggestions for consideration:
1. The specific repelling direction / amount / etc. in geom_text_repel is determined by a random seed. You can set seed to a constant value in order to get the same repelled positions in each frame of the animation.
2. I don't think it's possible for repelled text to go beyond the plot area, even if you turn off clipping & specify some repel range outside the plot limits. The whole point of that package is to keep text labels away from one another while remaining within the plot area. However, you can extend the plot area & use geom_segment instead of geom_hline to plot the horizontal lines, such that these lines stop before they reach the repelled text labels.
3. Since there are more geom layers using animated_data as their data source, it would be cleaner to put animated_data & the associated common aesthetic mappings in the top-level ggplot() call, rather than static_data.
Here's a possible implementation. Explanation in annotations:
p3 <- ggplot(animated_data,
aes(x = time_point, y = score, colour = condition, group = condition)) +
# static layers (assuming 11 is the desired ending point)
geom_segment(data = static_data,
aes(x = 0, xend = 11, y = fixed_score, yend = fixed_score),
inherit.aes = FALSE, colour = "grey25") +
geom_text_repel(data = static_data,
aes(x = 11, y = fixed_score, label = hline_label),
hjust = 0, size = 4, direction = "y", box.padding = 1.0, inherit.aes = FALSE,
seed = 123, # set a constant random seed
xlim = c(11, NA)) + # specify repel range to be from 11 onwards
# animated layers (only specify additional aesthetic mappings not mentioned above)
geom_point() +
geom_line() +
geom_segment(aes(xend = time_point, yend = score), linetype = 2) +
geom_text(aes(x = max(time_point) + 1, label = condition),
hjust = 0, size = 4) +
# static aesthetic settings (limits / expand arguments are specified in coordinates
# rather than scales, margin is no longer specified in theme since it's no longer
# necessary)
scale_x_continuous(breaks = seq(0, 10, by = 2)) +
scale_y_continuous(breaks = seq(2, 3, by = 0.10)) +
scale_color_manual(values = condition_colours) +
coord_cartesian(xlim = c(0, 13), ylim = c(2, 3), expand = FALSE) +
guides(colour = "none") +
labs(title = "[Title Here]", x = "Time", y = "Mean score") +
theme_minimal() +
theme(panel.grid.minor = element_blank()) +
# animation settings (unchanged)
transition_reveal(time_point) +
ease_aes('linear')
animate(p3, nframes = 50, end_pause = 5, height = 1000, width = 1250, res = 120)

R stacked % frequency histogram with percentage of aggregated data based on

I believe my question is very similar to this post. The only difference is that my aes fill is a factor with multiple levels. This is what I am after,
and this is how far I have gotten:
set.seed(123)
n = 100
LoanStatus = sample(c('Chargedoff', 'Completed', 'Current', 'Defaulted', 'PastDue'), n, replace = TRUE, prob = NULL)
ProsperScore = sample(1:11, n, replace = TRUE, prob = NULL)
df = data.frame(ProsperScore,LoanStatus)
probs = data.frame(prop.table(table(df),1))
Code for the stacked bar plot could look something like this:
library(ggplot2)
brks <- c(0, 0.25, 0.5, 0.75, 1)
ggplot(data=probs,aes(x=ProsperScore,y=Freq,fill=LoanStatus)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks = brks, labels = scales::percent(brks)) +
scale_x_discrete(breaks = c(3,6,9))
More complete code, demonstrating how you would go about adding percentages to the plot, is here:
library(ggplot2)
library(dplyr)  # provides %>%, group_by() and mutate()
brks <- c(0, 0.25, 0.5, 0.75, 1)
probs <- probs %>% dplyr::group_by(ProsperScore) %>%
dplyr::mutate(pos=cumsum(Freq)-(Freq*0.5)) %>%
dplyr::mutate(pos=ifelse(Freq==0,NA,pos))
probs$LoanStatus <- factor(probs$LoanStatus, levels = rev(levels(probs$LoanStatus)))
ggplot(data=probs,aes(x=ProsperScore,y=Freq,fill=LoanStatus)) +
geom_bar(stat="identity") +
scale_y_continuous(breaks = brks, labels = scales::percent(brks)) +
scale_x_discrete(breaks = c(3,6,9)) +
geom_text(data=probs, aes(x = ProsperScore, y = pos,
label = paste0(round(100*Freq),"%")), size=2)
To only show the percentages in the first column of the graph, add %>%
dplyr::mutate(pos=ifelse(ProsperScore==1,pos,NA)) to the dplyr calls.
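With ggplot2 2.2.0 or later, position_stack(vjust = 0.5) may save computing pos by hand, since it centres each label within its stacked segment. A sketch under that assumption, rebuilding probs from the question's data (the label column is added here for illustration):

```r
library(ggplot2)

set.seed(123)
n <- 100
LoanStatus <- sample(c("Chargedoff", "Completed", "Current", "Defaulted", "PastDue"),
                     n, replace = TRUE)
ProsperScore <- sample(1:11, n, replace = TRUE)

# Row-wise proportions: within each ProsperScore the shares sum to 1
probs <- as.data.frame(prop.table(table(ProsperScore, LoanStatus), 1))

# Empty string for zero shares so no "0%" labels are drawn
probs$label <- ifelse(probs$Freq == 0, "", paste0(round(100 * probs$Freq), "%"))

p <- ggplot(probs, aes(x = ProsperScore, y = Freq, fill = LoanStatus)) +
  geom_col() +
  # Stack the labels exactly like the bars and centre them in each segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), size = 2) +
  scale_y_continuous(labels = scales::percent)
```

Because the labels are stacked by the same position adjustment as the bars, reversing the factor levels to line them up is no longer needed.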
