Creating 2 y axes in ggplot with count and cumulative count - r

Here's some dummy data
dummy <- data.frame(numbers = 1:5,
symptomdate = as.Date(c("2012-08-30", "2012-08-30", "2012-08-31", "2012-09-01", "2012-09-01")),
reporteddate = as.Date(c("2012-09-02", "2012-09-03", "2012-09-05", "2012-09-07", "2012-09-08")),
dateofdeath = as.Date(c("2012-09-10", NA, NA, NA, "2012-09-30")),
gender = c("Female", "Male", "Male","Female", "Male"),
position = c("Resident", "Staff", "Resident", "Staff", "Staff"),
outbreakdate = as.Date(c("2012-08-31","2012-08-31","2012-08-31","2012-08-31","2012-08-31")))
Each observation is a 'case'. I would like to create a histogram that shows the case count on the y axis, with a secondary y axis showing the cumulative count of cases, but I can't figure out how to do it using 'sec.axis'. Do I need to add a cumulative count to my data frame first?
What I have so far:
ggplot(dummy, aes(x= symptomdate, group = position, fill = position)) + stat_bin(colour = "black", binwidth = 0.5, alpha = 1, position = "identity") + theme_bw() +
xlab("Symptom date") + ylab("Number of cases") + scale_x_date(breaks= date_breaks("1 day"), labels = date_format("%b-%d")) + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(legend.position="top") + scale_fill_manual(values = my_colours)
I know this must be simple but I've looked at countless posts and can't figure it out. Thank you in advance!

Try this. With your dummy data you can first compute the case count and the cumulative count per symptom date. After computing a scaling factor, reshape the data to long format and build the plot with the desired structure. Here is the code, which uses tidyverse functions on the dummy data frame:
library(tidyverse)
#Code
newdf <- dummy %>% group_by(symptomdate) %>%
summarise(Count=n()) %>% ungroup() %>%
mutate(Cum=cumsum(Count))
#Scaling factor
sf <- max(newdf$Count)
newdf$Cum <- newdf$Cum/sf
#plot
newdf %>%
pivot_longer(-symptomdate) %>%
ggplot(aes(x=symptomdate)) +
geom_bar( aes(y = value, fill = name, group = name),
stat="identity", position=position_dodge(),
color="black", alpha=.6) +
scale_fill_manual(values = c("blue", "red")) +
scale_y_continuous(name = "Cases",sec.axis = sec_axis(~.*sf, name="Cum Cases"))+
labs(fill='Variable')+
theme_bw()
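The scaling step can be checked numerically without plotting. The following is a minimal base R sketch, assuming the same five symptom dates as the dummy data; the names `counts`, `cum_scaled`, and `cum_back` are illustrative and not part of the original answer. It shows that dividing the cumulative count by `sf` for plotting and multiplying by `sf` inside `sec_axis()` are inverse operations, so the secondary axis reads the true cumulative count.

```r
# Symptom dates copied from the dummy data
symptomdate <- as.Date(c("2012-08-30", "2012-08-30", "2012-08-31",
                         "2012-09-01", "2012-09-01"))

# Cases per day (table() sorts by date) and the running total
counts <- as.vector(table(symptomdate))
cum    <- cumsum(counts)

# Same scaling factor as in the answer: the largest daily count
sf <- max(counts)

# Values that are actually drawn against the primary axis
cum_scaled <- cum / sf

# What sec_axis(~ . * sf) prints on the secondary axis
cum_back <- cum_scaled * sf
```

Using `sf <- max(cum) / max(counts)` instead would make both series reach the same bar height; that is a design choice rather than something the answer requires.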

Related

Ordering y axis by another variable in a ggplot bar plot

I have a swimlane plot which I want to order by a group variable. I was also wondering if it is possible to label the groups on the ggplot.
Here is the code to create the data set and plot the data
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "10034","10034","10034","10034","10054","10054","10054","1003","1003","1003","1003"),
exdose = c(5,10,20,5,5,10,20,20,5,10,20,5,20,10,5),
p= c(1,2,3,4,1,2,3,4,1,2,3,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5,3,5,6,3,5,6,7),
group =c("grp1","grp1","grp1","grp1","grp2","grp2","grp2","grp2","grp1","grp1","grp1","grp2","grp2","grp2","grp2")
)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE))
I want the y axis ordered by group and, if possible, a label on the side identifying the groups.
As you can see from the plot, it is currently ordered by subject number, but I want it ordered by group, with some indicator of the group.
I tried reorder but was unsuccessful in getting the desired plot.
As Stefan points out, facets are probably the way to go here, but you can use them with subtle theme tweaks to make it look as though you have just added a grouping variable on the y axis:
library(tidyverse)
dataset %>%
mutate(group = factor(group),
subject = reorder(subject, as.numeric(group)),
exdose = factor(exdose)) %>%
ggplot(aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = exdose), color = "gray50",
position = position_stack(reverse = TRUE)) +
scale_y_discrete(expand = c(0.1, 0.4)) +
scale_fill_brewer(palette = "Set2") +
facet_grid(group ~ ., scales = "free_y", switch = "y") +
theme_minimal(base_size = 16) +
theme(strip.background = element_rect(color = "gray"),
strip.text = element_text(face = 2),
panel.spacing.y = unit(0, "mm"),
panel.background = element_rect(fill = "#f9f8f6", color = NA))
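What `reorder()` does to the subject factor can be seen in isolation. This is a small base R sketch, assuming one row per subject for brevity (the real data set has several rows per subject, which `reorder()` handles by averaging); `subject_ord` is an illustrative name.

```r
# One row per subject, with the same subject-to-group assignment as the data set
subject <- c("1002", "10034", "10054", "1003")
group   <- factor(c("grp1", "grp2", "grp1", "grp2"))

# reorder() sorts the levels of `subject` by the numeric code of `group`,
# so all grp1 subjects come before all grp2 subjects
subject_ord <- reorder(subject, as.numeric(group))
levels(subject_ord)
```

Because ggplot2 draws a discrete axis in level order, this is what groups the subjects together before the facets add the visible group labels.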

How to change color of moving averages in ggplot, plotting two series into one graph?

In order to highlight the moving average in my ggplot visualization, I want to give it a different color (in this case grey or black for both MA lines). When it comes to a graph representing two time series, I struggle to find the best solution. Maybe I need to take a different approach.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidyquant))
V = 365
data <- data.frame (var1 = c(rnorm(V)),
var2 = c(rnorm(V)+12),
date = c(dates <- ymd("2013-01-01")+ days(0:364))
)
data_melted <- reshape2::melt(data, id.var='date')
data_melted %>%
ggplot() +
geom_line(mapping = aes(x= date, y=value, col=variable)) +
scale_color_manual(values=c("#CC6666", "steelblue")) +
geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, col=variable)) +
theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
scale_x_date(date_breaks = "1 month")
I think you can get what you want by not mapping variable to color in aes() for the MA part. I instead include group = variable to tell ggplot2 that the two MA's should be separate series, but no difference in their color based on that.
data_melted %>%
ggplot() +
geom_line(mapping = aes(x= date, y=value, col=variable)) +
scale_color_manual(values=c("#CC6666", "steelblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, group = variable), color = "black") +
theme(axis.text.x = element_text(angle = 50, vjust = 0.5)) +
scale_x_date(date_breaks = "1 month")
If you want different colors, the natural way to do this in ggplot would be to give the moving averages their own values to be mapped to color.
...
scale_color_manual(values=c("#CC6666", "#996666", "steelblue", "slateblue")) +
tidyquant::geom_ma(ma_fun = SMA, n = 30, mapping = aes(x= date, y=value, col=paste(variable, "MA"))) +
...
I would consider looking at the tsibble library for time series data.
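library(tsibble)
library(fabletools) # assumption: supplies the autoplot() method used below for tsibbles
data_melted <- as_tsibble(data_melted, key = 'variable', index = 'date')
data_melted <- data_melted %>%
group_by_key() %>% # compute the moving average within each series, not across var1 and var2
mutate(
`5-MA` = slider::slide_dbl(value, mean,
.before = 2, .after = 2, .complete = TRUE)
) %>%
ungroup()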
data_melted %>%
filter(variable == "var1") %>%
autoplot(value) +
geom_line(aes(y = `5-MA`), colour = "#D55E00") +
labs(y = "y",
title = "title") +
guides(colour = guide_legend(title = "series"))
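The idea of a centered 5-point moving average computed separately for each series can also be sketched in base R. This is only an illustration under stated assumptions: the toy data frame `long`, the helper `ma5`, and the seed are made up here, and `stats::filter()` stands in for `slider::slide_dbl()` with `.before = 2, .after = 2, .complete = TRUE`.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
long <- data.frame(
  variable = rep(c("var1", "var2"), each = 10),
  value    = c(rnorm(10), rnorm(10) + 12)
)

# Centered 5-point mean; the first and last two positions of a series are NA
ma5 <- function(x) as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 2))

# ave() applies ma5 within each series, so no window spans var1 and var2
long$ma <- ave(long$value, long$variable, FUN = ma5)
```

Grouping matters because the long data stacks the two series on top of each other; an ungrouped rolling mean would blend the last values of one series with the first values of the next.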

Back-to-back bar plot with ggplot2 in R

I am trying to obtain a back-to-back bar plot (or pyramid plot) similar to the ones shown here:
Population pyramid with gender and comparing across two time periods with ggplot2
Basically, a pyramid plot of a quantitative variable whose values have to be displayed for combinations of three categorical variables.
library(ggplot2)
library(dplyr)
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
"51-60", "61-70", "71-80", "81-90", "91-100"), 4),
Year = factor(rep(c(2009, 2010, 2009, 2010), each= 10)),
Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
mutate(Value = ifelse(Gender == "F", Value *-1 , Value))
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge") +
scale_y_continuous(labels = abs,
expand = c(0, 0)) +
scale_fill_manual(values = hcl(h = c(15,195,15,195),
c = 100,
l = 65,
alpha=c(0.4,0.4,1,1)),
name = "") +
coord_flip() +
facet_wrap(.~ Gender,
scale = "free_x",
strip.position = "bottom") +
theme_minimal() +
theme(legend.position = "bottom",
panel.spacing.x = unit(0, "pt"),
strip.background = element_rect(colour = "black"))
[Figure: example of the back-to-back bar plot I want to mimic]
When I try to mimic this example on my data, things go wrong from the first ggplot function call, as the bars are not dodged on both sides of the axis:
mydf = read.table("https://raw.githubusercontent.com/gilles-guillot/IPUMS_R/main/tmp/df.csv",
header=TRUE,sep=";")
ggplot(mydf) +
geom_col(aes(fill = interaction(mig,ISCO08WHO_yrstud, sep = "-"),
x = country,
y = f),
position = "dodge")
[Figure: failed attempt to get a back-to-back bar plot]
whereas I expected something like the result of:
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge")
[Figure: geom_col plot with bars dodged symmetrically around the axis]
In the example you are following, df$Value is made negative when Gender == 'F'. You need to do something similar with your own data to get bars dodged symmetrically around the axis.
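A minimal sketch of that sign flip, using a made-up data frame: the column names `side` and `f_signed` and the values are hypothetical, and which column of the asker's `mydf` should play the role of `side` is not stated in the question. The plotting part only runs if ggplot2 is installed.

```r
# Hypothetical data: `side` decides which side of the axis a bar goes on
toy <- data.frame(
  country = rep(c("A", "B", "C"), times = 2),
  side    = rep(c("left", "right"), each = 3),
  f       = c(0.2, 0.5, 0.3, 0.4, 0.1, 0.6)
)

# Flip the sign for one side, as the example does for Gender == "F"
toy$f_signed <- ifelse(toy$side == "left", -toy$f, toy$f)

# Build the plot only if ggplot2 is available; labels = abs hides the minus signs
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = country, y = f_signed, fill = side)) +
    geom_col() +
    scale_y_continuous(labels = abs) +
    coord_flip()
  # print(p) draws the plot
}
```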

How to label the count of each bin within ggridges package?

I have a data frame that simulates the NFL season with 2 columns: team and rank. I am trying to use ggridges to make a distribution plot of the frequency of each team at each rank from 1-10. I can get the plot working, but I'd like to display the count of each team/rank in each bin. I have been unsuccessful so far.
ggplot(results,
aes(x=rank, y=team, group = team)) +
geom_density_ridges2(aes(fill=team), stat='binline', binwidth=1, scale = 0.9, draw_baseline=T) +
scale_x_continuous(limits = c(0,11), breaks = seq(1,10,1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F", "#0C264C", "#192E6C", "#136677", "#203731"), name = NULL)
Which creates this plot:
I tried adding in this line to get the count added to each bin, but it did not work.
geom_text(stat='bin', aes(y = team + 0.95*stat(count/max(count)),
label = ifelse(stat(count) > 0, stat(count), ""))) +
Not the exact dataset but this should be enough to at least run the original plot:
results = data.frame(team = rep(c('Jets', 'Giants', 'Washington', 'Falcons', 'Bengals', 'Jaguars', 'Texans', 'Cowboys', 'Vikings'), 1000), rank = sample(1:20,9000,replace = T))
How about calculating the count for each bin, joining to the original data and using the new variable n as the label?
library(dplyr) # for count, left_join
results %>%
count(team, rank) %>%
left_join(results) %>%
ggplot(aes(rank, team, group = team)) +
geom_density_ridges2(aes(fill = team),
stat = 'binline',
binwidth = 1,
scale = 0.9,
draw_baseline = TRUE) +
scale_x_continuous(limits = c(0, 11),
breaks = seq(1, 10, 1)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930", "#00143F",
"#0C264C", "#192E6C", "#136677", "#203731"), name = NULL) +
geom_text(aes(label = n),
color = "white",
nudge_y = 0.2)
Neilfws' answer is great, but I've always found geom_ridgelines difficult to work with in circumstances like this so I usually recreate them with geom_rect:
library(dplyr)
results %>%
count(team, rank) %>%
filter(rank<=10) %>%
mutate(team=factor(team)) %>%
ggplot() +
geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin=team, fill=team,
ymax=as.numeric(team)+n*0.75/max(n))) +
geom_text(aes(x=rank, y=as.numeric(team)-0.1, label=n)) +
theme_ridges() +
theme(legend.position = "none") +
scale_fill_manual(values = c("#4F2E84", "#FB4F14", "#7C1415", "#A71930",
"#00143F", "#0C264C", "#192E6C", "#136677",
"#203731"), name = NULL) +
ylab("team")
I especially like the level of fine control I get from geom_rect rather than ridgelines. But you do lose out on the nice bounding line drawn around each ridgeline, so if that's important then go with the other answer.
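The two ingredients of the geom_rect approach, per-bin counts and rectangle heights, can be sketched in base R. This assumes a smaller made-up data set (`results_toy`, three teams, an arbitrary seed); `as.data.frame(table(...))` stands in for `dplyr::count()`, and the `ymin`/`ymax` columns mirror the aesthetics used in the answer.

```r
set.seed(42)  # arbitrary seed for a reproducible toy data set
results_toy <- data.frame(
  team = rep(c("Jets", "Giants", "Texans"), 200),
  rank = sample(1:20, 600, replace = TRUE)
)

# One row per team/rank combination with its frequency n
counts <- as.data.frame(table(team = results_toy$team, rank = results_toy$rank),
                        responseName = "n", stringsAsFactors = FALSE)
counts$rank <- as.integer(counts$rank)
counts <- counts[counts$rank <= 10 & counts$n > 0, ]

# Rectangle geometry: each team sits on an integer baseline and the
# tallest bin reaches 0.75 units above it
counts$team <- factor(counts$team)
counts$ymin <- as.numeric(counts$team)
counts$ymax <- counts$ymin + counts$n * 0.75 / max(counts$n)
```

Capping the height at 0.75 keeps each team's bars from reaching into the row above, which plays the same role as `scale = 0.9` in the ridgeline version.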

Reordering the Barplots in ggplot2 in R

I have a data frame from which I draw a bar plot with ggplot2 in R.
library(dplyr)
library(ggplot2)
library(reshape2)
Dataset<- c("MO", "IP", "MP","CC")
GPP <- c(1, 3, 4,3)
NPP<-c(4,3,5,2)
df <- data.frame(Dataset,GPP,NPP)
df.m<-melt(df)
ggplot(df.m, aes(Dataset, value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")
my_se <- df.m %>%
group_by(Dataset) %>%
summarise(n=n(),
sd=sd(value),
se=sd/sqrt(n))
df.m %>%
left_join(my_se) %>%
ggplot(aes(x = Dataset, y = value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")+
geom_errorbar(aes(x=Dataset, ymin=value-se, ymax=value+se), width=0.4, position = position_dodge(.9))+
scale_fill_manual(labels = c("GPP", "NPP"),values=cbp1)+
theme(legend.text=element_text(size=11),axis.text.y=element_text(size=11.5),
axis.text.x=element_text(size=11.5),axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12))+
theme_bw()+theme(legend.title =element_blank())+
labs(y= fn, x = "")
When my bar graph is plotted, the bars appear in alphabetical order (CC, IP, MO, MP).
I would like to arrange the bars in the order MO, IP, MP, CC (not alphabetically).
Help would be appreciated.
You need to set your factor levels explicitly or R will pick an order for them.
In the case of characters R will pick alphabetical order. Since you want a non-alphabetical order you'll need to set levels inside of factor at some point before plotting (there are several places where you could do it).
df <- data.frame(Dataset = factor(Dataset, levels=c("MO", "IP", "MP", "CC")), GPP, NPP)
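The effect of explicit levels is easy to check in base R before plotting. A short sketch, where `ordered_ds` and `bad` are illustrative names; it also shows that a level string has to match the data exactly, otherwise the value silently becomes NA.

```r
Dataset <- c("MO", "IP", "MP", "CC")

# Default: levels are sorted alphabetically
levels(factor(Dataset))
# "CC" "IP" "MO" "MP"

# Explicit levels fix the order that ggplot2 uses on a discrete axis
ordered_ds <- factor(Dataset, levels = c("MO", "IP", "MP", "CC"))
levels(ordered_ds)
# "MO" "IP" "MP" "CC"

# A level that does not match exactly (note the leading space in " MP")
# turns the corresponding value into NA
bad <- factor(Dataset, levels = c("MO", "IP", " MP", "CC"))
anyNA(bad)
# TRUE
```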
Try this (I changed the colors because cbp1 is not present):
df.m %>%
left_join(my_se) %>%
ggplot(aes(x = factor(Dataset,levels=c('MO', 'IP', 'MP', 'CC')), y = value, fill = variable)) +
geom_bar(stat="identity", position = "dodge")+
geom_errorbar(aes(x=Dataset, ymin=value-se, ymax=value+se), width=0.4, position = position_dodge(.9))+
scale_fill_manual(labels = c("GPP", "NPP"),values=c('pink','cyan'))+
theme(legend.text=element_text(size=11),axis.text.y=element_text(size=11.5),
axis.text.x=element_text(size=11.5),axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12))+
theme_bw()+theme(legend.title =element_blank())+
labs(y= "fn", x = "")
