Population pyramid plot with ggplot2 and dplyr (instead of plyr) - r

I am trying to reproduce the simple population pyramid from the post "Simpler population pyramid in ggplot2" using ggplot2 and dplyr (instead of plyr).
Here is the original example with plyr and a seed:
set.seed(321)
test <- data.frame(v=sample(1:20,1000,replace=T), g=c('M','F'))
require(ggplot2)
require(plyr)
ggplot(data=test,aes(x=as.factor(v),fill=g)) +
geom_bar(subset=.(g=="F")) +
geom_bar(subset=.(g=="M"),aes(y=..count..*(-1))) +
scale_y_continuous(breaks=seq(-40,40,10),labels=abs(seq(-40,40,10))) +
coord_flip()
This works fine.
But how can I generate the same plot with dplyr instead? The example uses plyr's .() function in the subset = .(g == ...) arguments.
I have tried the following with dplyr::filter but got an error:
require(dplyr)
ggplot(data=test,aes(x=as.factor(v),fill=g)) +
geom_bar(dplyr::filter(test, g=="F")) +
geom_bar(dplyr::filter(test, g=="M"),aes(y=..count..*(-1))) +
scale_y_continuous(breaks=seq(-40,40,10),labels=abs(seq(-40,40,10))) +
coord_flip()
Error in get(x, envir = this, inherits = inh)(this, ...) :
Mapping should be a list of unevaluated mappings created by aes or aes_string

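You can avoid the error by passing the filtered data to the data argument of geom_bar by name. The first positional argument of geom_bar is mapping, not data, so an unnamed data frame in that position is treated as an aesthetic mapping, which is what triggers the error: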
ggplot(data = test, aes(x = as.factor(v), fill = g)) +
geom_bar(data = dplyr::filter(test, g == "F")) +
geom_bar(data = dplyr::filter(test, g == "M"), aes(y = ..count.. * (-1))) +
scale_y_continuous(breaks = seq(-40, 40, 10), labels = abs(seq(-40, 40, 10))) +
coord_flip()
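If you want dplyr to do more of the work, one option (a sketch, not the only way) is to count the individuals per age and sex with dplyr::count() first, negate the male counts, and draw the precomputed bars with geom_col(). This assumes reasonably recent versions of dplyr and ggplot2; the object name `pyramid` is just for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(321)
test <- data.frame(v = sample(1:20, 1000, replace = TRUE), g = c("M", "F"))

# One row per age (v) and sex (g); the number of individuals is in column n
pyramid <- test %>%
  count(v, g) %>%
  # Negate the male counts so their bars extend to the left after coord_flip()
  mutate(n = ifelse(g == "M", -n, n))

ggplot(pyramid, aes(x = factor(v), y = n, fill = g)) +
  geom_col() +
  # Show absolute values on the count axis
  scale_y_continuous(labels = abs) +
  coord_flip()
```

Because the counts are computed before plotting, the same pattern also works when your data already consists of counts per age-sex group.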

You can avoid both dplyr and plyr when making population pyramids with recent versions of ggplot2.
If you have counts of the sizes of age-sex groups then use the answer here
If your data is at the individual level (as yours is) then use the following:
set.seed(321)
test <- data.frame(v=sample(1:20,1000,replace=T), g=c('M','F'))
head(test)
# v g
# 1 20 M
# 2 19 F
# 3 5 M
# 4 6 F
# 5 8 M
# 6 7 F
library("ggplot2")
ggplot(data = test, aes(x = as.factor(v), fill = g)) +
geom_bar(data = subset(test, g == "F")) +
geom_bar(data = subset(test, g == "M"),
mapping = aes(y = - ..count.. ),
position = "identity") +
scale_y_continuous(labels = abs) +
coord_flip()
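In ggplot2 3.4.0 and later, the ..count.. notation is deprecated in favour of after_stat() (available since ggplot2 3.3.0). A sketch of the same plot with the newer syntax, assuming one of those versions:

```r
library(ggplot2)

set.seed(321)
test <- data.frame(v = sample(1:20, 1000, replace = TRUE), g = c("M", "F"))

p <- ggplot(data = test, aes(x = as.factor(v), fill = g)) +
  geom_bar(data = subset(test, g == "F")) +
  # after_stat() evaluates the expression on the counts computed by the stat
  geom_bar(data = subset(test, g == "M"),
           mapping = aes(y = after_stat(-count))) +
  scale_y_continuous(labels = abs) +
  coord_flip()
p
```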

To build an age pyramid from individual-level data (microdata) with a continuous age variable, you can use geom_histogram():
test <- data.frame(v=sample(1:100, 1000, replace=T), g=c('M','F'))
ggplot(data = test, aes(x = v, fill = g)) +
geom_histogram(data = subset(test, g == "F"), binwidth = 5, color="white", position = "identity") +
geom_histogram(data = subset(test, g == "M"), binwidth = 5, color="white", position = "identity",
mapping = aes(y = - ..count.. )) +
scale_x_continuous("Age", breaks = c(seq(0, 100, by=5))) +
scale_y_continuous("Population", breaks = seq(-30, 30, 10), labels = abs) +
scale_fill_discrete(name = "Sex") +
coord_flip() +
theme_bw()
Changing the binwidth in geom_histogram() can group your data in wider categories.
Changing binwidth to 10 and adjusting the axis breaks:
ggplot(data = test, aes(x = v, fill = g)) +
geom_histogram(data = subset(test, g == "F"), binwidth = 10, color="white", position = "identity") +
geom_histogram(data = subset(test, g == "M"), binwidth = 10, color="white", position = "identity",
mapping = aes(y = - ..count.. )) +
scale_x_continuous("Age", breaks = c(seq(0, 100, by = 10))) +
scale_y_continuous("Population", breaks = seq(-100, 100, 10), labels = abs) +
scale_fill_discrete(name = "Sex") +
coord_flip() +
theme_bw()
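By default (no center or boundary given), geom_histogram() centres the bins on multiples of binwidth, so with binwidth = 5 the bins run from 2.5 to 7.5, 7.5 to 12.5, and so on. If you want conventional age groups such as 0-4, 5-9, ..., one option is to set boundary = 0 together with closed = "left". A sketch, which also uses after_stat() instead of ..count.. and therefore assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

set.seed(321)
test <- data.frame(v = sample(1:100, 1000, replace = TRUE), g = c("M", "F"))

p <- ggplot(data = test, aes(x = v, fill = g)) +
  # boundary = 0 puts the bin edges at 0, 5, 10, ...; closed = "left" makes
  # each bin include its left edge, i.e. [0, 5), [5, 10), ...
  geom_histogram(data = subset(test, g == "F"), binwidth = 5,
                 boundary = 0, closed = "left",
                 color = "white", position = "identity") +
  geom_histogram(data = subset(test, g == "M"), binwidth = 5,
                 boundary = 0, closed = "left",
                 color = "white", position = "identity",
                 mapping = aes(y = after_stat(-count))) +
  scale_x_continuous("Age", breaks = seq(0, 100, by = 10)) +
  scale_y_continuous("Population", labels = abs) +
  scale_fill_discrete(name = "Sex") +
  coord_flip() +
  theme_bw()
p
```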

Related

How to smooth out a time-series geom_area with fill in ggplot?

I have the following graph and code:
ggplot(long2, aes(x = DATA, y = value, fill = variable)) + geom_area(position="fill", alpha=0.75) +
scale_y_continuous(labels = scales::comma,n.breaks = 5,breaks = waiver()) +
scale_fill_viridis_d() +
scale_x_date(date_labels = "%b/%Y",date_breaks = "6 months") +
ggtitle("Proporcions de les visites, només 9T i 9C") +
xlab("Data") + ylab("% visites") +
theme_minimal() + theme(legend.position="bottom") + guides(fill=guide_legend(title=NULL)) +
annotate("rect", fill = "white", alpha = 0.3,
xmin = as.Date.character("2020-03-16"), xmax = as.Date.character("2020-06-22"),
ymin = 0, ymax = 1)
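But the plot shows some sawtooth-like spikes. How can I smooth them out?
I believe your situation is roughly analogous to the following example, in which one group has missing x-positions where the other group does not. With position = "fill", the stacked values are rescaled to sum to 1 at every x-position, so wherever only one group has a value, that group takes up 100% of the height, which shows up as a spike.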
library(ggplot2)
x <- seq_len(100)
df <- data.frame(
x = c(x[-c(25, 75)], x[-50]),
y = c(cos(x[-c(25, 75)]), sin(x[-50])) + 5,
group = rep(c("A", "B"), c(98, 99))
)
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "fill")
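You can check this with a few lines of base R: count how many groups have a value at each x-position, then compute group B's share of the total at each of its x-positions, which is the proportion that position = "fill" stacks to. The block re-creates `df` from the example above so it runs on its own.

```r
x <- seq_len(100)
df <- data.frame(
  x = c(x[-c(25, 75)], x[-50]),
  y = c(cos(x[-c(25, 75)]), sin(x[-50])) + 5,
  group = rep(c("A", "B"), c(98, 99))
)

# x-positions where only one of the two groups has a value
n_groups <- table(df$x)
gaps <- as.integer(names(n_groups)[n_groups < 2])
print(gaps)
# [1] 25 50 75

# Group B's share of the total at each of its x-positions
totals <- tapply(df$y, df$x, sum)
b <- subset(df, group == "B")
b$share <- b$y / as.vector(totals[as.character(b$x)])
# At x = 25 group A is missing, so B's share jumps to 1 (100%)
print(b$share[b$x %in% c(24, 25, 26)])
```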
To smooth out these spikes, you can linearly interpolate each group's data at the missing positions, so that every group has a value at every x-position used by any group.
# Find all used x-positions
ux <- unique(df$x)
# Split data by group, interpolate data groupwise
df <- lapply(split(df, df$group), function(xy) {
approxed <- approx(xy$x, xy$y, xout = ux)
data.frame(x = ux, y = approxed$y, group = xy$group[1])
})
# Recombine data
df <- do.call(rbind, df)
# Now without spikes :)
ggplot(df, aes(x, y, fill = group)) +
geom_area(position = "fill")
Created on 2022-06-17 by the reprex package (v2.0.1)
P.S. I would also have expected a red spike at x=50, but for some reason this didn't happen.
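Applied to the question's data, the same interpolation needs one extra step because the x-axis holds dates. The sketch below assumes `long2` has the columns DATA (class Date), value and variable used in the question's ggplot() call. Since the real data isn't shown, it builds a small made-up `long2` (hypothetical weekly values, series labels borrowed from the plot title) with a few missing dates. approx() works on plain numbers, so the dates are passed through as.numeric().

```r
library(ggplot2)

# Made-up stand-in for the question's `long2`: two series with some missing weeks
set.seed(1)
weeks <- seq(as.Date("2020-01-06"), by = "week", length.out = 30)
long2 <- data.frame(
  DATA = c(weeks[-c(5, 12)], weeks[-20]),
  value = c(runif(28, 50, 100), runif(29, 20, 60)),
  variable = rep(c("9T", "9C"), c(28, 29))
)

# All dates used by any series (stays of class Date)
ux <- sort(unique(long2$DATA))

# Interpolate each series at every date, converting dates to numbers for approx()
long2_filled <- do.call(rbind, lapply(split(long2, long2$variable), function(d) {
  y <- approx(as.numeric(d$DATA), d$value, xout = as.numeric(ux))$y
  data.frame(DATA = ux, value = y, variable = d$variable[1])
}))

ggplot(long2_filled, aes(x = DATA, y = value, fill = variable)) +
  geom_area(position = "fill", alpha = 0.75)
```

Note that approx() returns NA outside a series' own date range (its default rule = 1), so dates before a series starts or after it ends would still need separate handling.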

Set size line plot with different y axis as addition to a stacked barplot

I would like to plot a stacked barplot with an added line plot that shows the overall set sizes. Plotting the stacked barplot in ggplot2 works without problems, but adding a line with a different y axis is the difficulty. I'm using a long-format table as input, so there is no 'overall size' column.
Code to reproduce the sample table:
library(data.table)
df <- data.frame(Sample=c("S1","S2","S3","S4","S5","S6"), A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(setDT(df), id.vars = "Sample", variable.name = "Group")
Head of the table:
Sample Group value
1: S1 A 30
2: S2 A 52
3: S3 A 50
4: S4 A 81
5: S5 A 23
6: S6 A 48
Code to draw the stacked barplot:
library(ggplot2)
ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = scales::percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
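The line should run across the six stacked bars and indicate the size of each set, e.g. 57 for sample S1 (A + B + C = 30 + 12 + 15), and the y axis labels on the right side of the plot should show the range of set sizes.
You can pass a data set directly to each geom, which lets you use a different data set for each geom (here the long table for the bars and a table of set sizes for the line). Secondary axes are a bit tricky: a secondary axis must be a transformation of the primary axis, so the line data has to be scaled into the primary axis range (0 to 1 for the fill proportions) and the secondary axis multiplies it back. I've used 120 as the scaling factor, just above the largest set size (118 for S4).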
library(ggplot2)
library(dplyr)
# Labels for the default breaks (0, 0.25, 0.5, 0.75, 1) of the proportion axis
percent <- c("0%", "25%", "50%", "75%", "100%")
# Overall size of each set (A + B + C)
set_sizes <- df %>%
mutate(Size = A + B + C)
ggplot() +
geom_col(data = df.melt, mapping = aes(x = Sample, y = value, fill = Group), position = position_fill(reverse = TRUE)) +
geom_line(data = set_sizes, mapping = aes(x = Sample, y = Size / 120, group = 1)) +
scale_y_continuous(name = "% of Total", labels = percent, sec.axis = sec_axis(~ .*120, name = "Sample Size")) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
scale_x_discrete(limits = unique(df.melt$Sample))
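Rather than hard-coding 120, you can derive the scaling factor from the data, which keeps the line and the secondary axis in sync if the set sizes change. A sketch that rounds the largest set size up to the next multiple of 10 (118 for S4 becomes 120). The name `scale_factor` is illustrative, and scales::percent replaces the hand-written label vector for the primary axis.

```r
library(ggplot2)
library(data.table)

df <- data.table(Sample = c("S1", "S2", "S3", "S4", "S5", "S6"),
                 A = c(30, 52, 50, 81, 23, 48),
                 B = c(12, 20, 15, 22, 30, 14),
                 C = rep(15, 6))
df.melt <- melt(df, id.vars = "Sample", variable.name = "Group")

# Total size of each set (A + B + C)
set_sizes <- df.melt[, .(Size = sum(value)), by = "Sample"]

# Round the largest set size up to the next multiple of 10
scale_factor <- ceiling(max(set_sizes$Size) / 10) * 10

p <- ggplot() +
  geom_col(data = df.melt, mapping = aes(x = Sample, y = value, fill = Group),
           position = position_fill(reverse = TRUE)) +
  # Divide by scale_factor to bring the sizes into the 0-1 range of the fill axis
  geom_line(data = set_sizes,
            mapping = aes(x = Sample, y = Size / scale_factor, group = 1)) +
  scale_y_continuous(name = "% of Total", labels = scales::percent,
                     # ...and multiply by it again to label the secondary axis
                     sec.axis = sec_axis(~ . * scale_factor, name = "Sample Size")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.title = element_blank()) +
  scale_fill_brewer(palette = "Set3")
p
```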
Alternatively, you can use cowplot to arrange two independent plots on top of each other, e.g.:
suppressMessages(invisible(lapply(c("data.table", "ggplot2", "cowplot"),
require, character.only=TRUE)))
df <- data.table(Sample=c("S1","S2","S3","S4","S5","S6"),
A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(df, id.vars = "Sample", variable.name = "Group")
percent <- paste0(sprintf("%s", seq(0, 100, 25)), "%")
p1 <- ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
p2 <- ggplot(df.melt[, .(value=sum(value)), by="Sample"],
aes(x = Sample, y = value, group=1)) +
geom_line() +
scale_x_discrete(labels = NULL, breaks = NULL) +
labs(x = NULL)
plot_grid(p2, NULL, p1, align="hv", nrow=3, axis='tlbr', rel_heights=c(1, -.28, 4), greedy=FALSE)
Created on 2022-02-20 by the reprex package (v2.0.1)

How to make a sorted geom_bar() ggplot [duplicate]

This question already has answers here:
Order discrete x scale by frequency/value
(7 answers)
Closed 5 years ago.
My data frame is called d3, with the variables course_name, id, total_enrolled and total_capacity.
I did:
# assuming melt() comes from reshape2
library(reshape2)
library(ggplot2)
d3a <- head(d3[order(d3$total_capacity, decreasing = TRUE),], 15)
d3.plottable <- d3a[, c(1,3,4)]
d3.plottable <- melt(d3.plottable, id.vars = "course_name")
g <- ggplot(d3.plottable, aes(x = course_name, y = value))
g <- g + geom_bar(aes(fill = variable), position = position_dodge(), stat = "identity") +
coord_flip() + theme(legend.position = "top")
g <- g + labs(x = "Course Name")
g <- g + labs(y = "Number of Students")
g
And what I get is this:
No matter what I do I can't sort the orange bar in descending order.
Is there a way to do that? I would like to sort on the variable total_enrolled.
PS: I apologize for the badly formatted code; I am still figuring out Stack Overflow.
Here is an example that redefines the order of the factor levels.
Note, since you don't provide sample data I will simulate some data.
# Sample data
set.seed(2017)
df <- cbind.data.frame(
course_name = rep(LETTERS[1:6], each = 2),
value = sample(300, 12),
variable = rep(c("total_enrolled", "total_capacity"), length.out = 12)
)
# Relevel course_name so its levels follow the total_enrolled values (ascending)
enrolled <- subset(df, variable == "total_enrolled")
df$course_name <- factor(
df$course_name,
levels = as.character(enrolled$course_name[order(enrolled$value)]))
# Plot
library(ggplot2)
g <- ggplot(df, aes(x = course_name, y = value))
g <- g + geom_bar(aes(fill = variable), position = position_dodge(), stat = "identity")
g <- g + coord_flip() + theme(legend.position = "top")
g <- g + labs(x = "Course Name")
g <- g + labs(y = "Number of Students")
g
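A shorter alternative is base R's reorder(), which sorts factor levels by a summary of another variable. A sketch on the same simulated data: the capacity rows are set to NA so that only total_enrolled drives the order, and na.rm = TRUE is passed through to mean(). geom_col() is used here as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

set.seed(2017)
df <- data.frame(
  course_name = rep(LETTERS[1:6], each = 2),
  value = sample(300, 12),
  variable = rep(c("total_enrolled", "total_capacity"), length.out = 12)
)

# Only the enrolled values count towards the ordering; capacity rows become NA
enrolled <- ifelse(df$variable == "total_enrolled", df$value, NA)
df$course_name <- reorder(factor(df$course_name), enrolled, FUN = mean, na.rm = TRUE)

ggplot(df, aes(x = course_name, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  coord_flip() +
  theme(legend.position = "top") +
  labs(x = "Course Name", y = "Number of Students")
```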

How to plot a barplot/barchart with continuous x axis

I would like to plot a barplot with dates on the x axis, and I want the bars to be spaced according to the actual dates (the axis is not categorical).
set.seed(1)
m = matrix(abs(rnorm(6)),3,2)
rownames(m) = as.Date(c('2011-01-01','2011-01-03','2011-01-10'))
barplot(t(m),beside=T,col=c('red','blue'),las=2)
In this example, I would like the bars for 14984 (2011-01-10, shown as a day number in the row names) to be offset further to the right.
I'd prefer a base graphics solution, but ggplot2 is fine too.
Would you mind using ggplot2 instead?
library(ggplot2)
set.seed(1)
df <- data.frame(y=abs(rnorm(6)),
x=rep(as.Date(c('2011-01-01','2011-01-03','2011-01-10')),
times = 2),
g = factor(rep(c(1,2), each = 3)))
ggplot(aes(x=x, y=y, group = g, fill = g), data = df) +
geom_bar(stat = 'identity', position = 'dodge')
You can improve the axis formatting with `scale_x_date`:
library(scales)
ggplot(aes(x=x, y=y, group = g, fill = g), data = df) +
geom_bar(stat = 'identity', position = 'dodge') +
scale_x_date(breaks = '1 day') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
And customize it to your purpose
ggplot(aes(x=x, y=y, group = g, fill = g), data = df) +
geom_bar(stat = 'identity', position = 'dodge') +
scale_x_date(breaks = '1 day') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
scale_fill_manual('My\nclasses', values = c('1'='red', '2' = 'blue')) +
labs(title = 'Barplot\n', x = 'Date', y = 'Values')
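In current ggplot2 versions, the same plot can be written with geom_col() (equivalent to geom_bar(stat = "identity")) and the date_breaks and date_labels arguments of scale_x_date(). A sketch reusing the simulated data; the "%Y-%m-%d" label format is an arbitrary choice:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = abs(rnorm(6)),
                 x = rep(as.Date(c("2011-01-01", "2011-01-03", "2011-01-10")), times = 2),
                 g = factor(rep(c(1, 2), each = 3)))

p <- ggplot(df, aes(x = x, y = y, fill = g)) +
  geom_col(position = "dodge") +
  # One tick per day, labelled as year-month-day
  scale_x_date(date_breaks = "1 day", date_labels = "%Y-%m-%d") +
  scale_fill_manual("My\nclasses", values = c("1" = "red", "2" = "blue")) +
  labs(title = "Barplot", x = "Date", y = "Values") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```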
With base graphics, you probably have to prepare the data first: add rows of missing values (NA) for the dates in between, so that every day gets its own slot. Then you can use barplot.
# matrix definition
set.seed(1)
m = matrix(abs(rnorm(6)),3,2)
rownames(m) = as.Date(c('2011-01-01','2011-01-03','2011-01-10'))
# get all dates in between
dts <- do.call(":", as.list(range(rownames(m))))
dts <- dts[!dts%in%rownames(m)]
mat <- matrix(NA, nrow=length(dts), ncol=2, dimnames=list(dts, NULL))
# combine with original matrix
m <- rbind(m, mat)
m <- m[order(rownames(m)), ]
which(!is.na(m[,1]))
# plot
barplot(t(m), beside=T, col=c('red','blue'),las=2, axes=FALSE, axisnames=FALSE)
axis(2)
axis(1, at=3*which(!is.na(m[,1]))-1, labels=rownames(m[!is.na(m[,1]),]))
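If you'd rather not pad the matrix, another base-graphics option is to skip barplot() and draw the bars yourself with rect() at their actual dates, so the spacing comes directly from the date values. A sketch; the bar width `w` (in days) and the one-day margin are arbitrary choices:

```r
set.seed(1)
m <- matrix(abs(rnorm(6)), 3, 2)
dates <- as.Date(c("2011-01-01", "2011-01-03", "2011-01-10"))

# Dates as day numbers, so the gaps between bars match the gaps between dates
x <- as.numeric(dates)
w <- 0.4  # width of each bar, in days

# Empty plot covering all dates (plus a one-day margin) and the tallest bar
plot(range(x) + c(-1, 1), c(0, max(m)), type = "n", xaxt = "n",
     xlab = "Date", ylab = "Values")
# First column of m to the left of each date, second column to the right
rect(x - w, 0, x, m[, 1], col = "red")
rect(x, 0, x + w, m[, 2], col = "blue")
# Label the x axis with the dates themselves
axis(1, at = x, labels = format(dates))
```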

Fill being ignored with group + facet_wrap in ggplot2 / geom_bar

I suspect I might be using group incorrectly here, but I can't seem to understand why the fill color is getting ignored in the example below.
df <- data.frame(a = factor(c(1,1,2,2,1,2,1,2)),
b = factor(c(1,2,3,4,5,6,7,2)),
c = factor(c(1,2,1,2,1,2,1,2)))
p <- ggplot(df, aes(x=b)) +
geom_bar(aes(y = ..density.., group = c, fill=a), binwidth = 1) +
facet_wrap(~ c) +
scale_y_continuous(labels = percent_format()) +
scale_color_hue()
p
Any help would be greatly appreciated.
Thanks in advance,
--JT
I think I understand what plot you're after now. I'd do something like this:
df <- data.frame(a = c(1,1,2,2,1,2,1,2),
b = c(1,2,3,4,5,6,7,2),
c = c(1,2,1,2,1,2,1,2))
df <- within(df, { f <- 1 / ave(b, list(c), FUN=length)})
df[, 1:3] <- lapply(df[, 1:3], as.factor)
library(ggplot2)
library(scales)
ggplot(df, aes(x = b)) + geom_bar(stat = "identity", position = "stack",
aes(y = f, group = c, fill = a)) + facet_wrap(~ c) +
scale_y_continuous(labels = percent_format())
This gives the plot:
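The ave() line computes, for each row, 1 divided by the number of rows in its facet (the same value of c), so the bars in each panel add up to 100%. A dplyr sketch of the same computation, assuming dplyr 1.0 or later for across(), with geom_col() in place of geom_bar(stat = "identity"):

```r
library(dplyr)
library(ggplot2)

df <- data.frame(a = c(1, 1, 2, 2, 1, 2, 1, 2),
                 b = c(1, 2, 3, 4, 5, 6, 7, 2),
                 c = c(1, 2, 1, 2, 1, 2, 1, 2))

df <- df %>%
  group_by(c) %>%
  # Each row's share of its facet: 1 / (number of rows with the same c)
  mutate(f = 1 / n()) %>%
  ungroup() %>%
  mutate(across(a:c, as.factor))

ggplot(df, aes(x = b, y = f, fill = a)) +
  geom_col() +
  facet_wrap(~ c) +
  scale_y_continuous(labels = scales::percent)
```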
