R- stacked charts - r

Hi, I'm having issues with a stacked bar chart.
The goal is to plot a bar chart that shows the sum of products sold, stacked on top of each other. I have done that, but the products are not grouped together: instead of one big block per product, each product is split into many small segments. I need some way to aggregate the counts so that they are summed, and then I can draw the chart in some sort of order.
library(ggplot2)
library(plyr) #Is this automatically loaded with ggplot2?
library(dplyr)
salesMixData <- read.csv("SalesMix.csv", stringsAsFactors = FALSE, header = TRUE)
productMix <- salesMixData[,c(1,6,7)]
ggplot(productMix, aes(x = JoinMonthYear, y = Count, fill = Prod)) +
  geom_bar(stat = 'identity') +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), legend.position = "bottom")
The output looks like the following:

You probably want to summarise the data first, calculating an aggregate sum for each combination of JoinMonthYear and Prod.
Here's an example with a dummy data set:
library(ggplot2)
library(dplyr)
set.seed(1)  # make the dummy data reproducible
d <- data.frame(x = sample(20, 1000, replace = TRUE),
                count = rpois(1000, 10),
                grp = sample(LETTERS[1:10], 1000, replace = TRUE))
This is equivalent to what you're seeing:
ggplot(d, aes(x = x, y = count, fill = grp)) +
  geom_bar(stat = 'identity')
Grouping the observations (in your case by JoinMonthYear and Prod), and then summarising to the groups' sums, should get you what you're after:
d %>%
  group_by(x, grp) %>%
  summarise(sum_count = sum(count, na.rm = TRUE)) %>%
  ggplot(aes(x = x, y = sum_count, fill = grp)) +
  geom_bar(stat = 'identity')
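The same summing step can also be done in base R with `aggregate()`. The question also asked for the blocks in some order; one way, assuming you want products ordered by their overall total, is to set the factor levels of `Prod` from those totals, because ggplot2 stacks and lists fill groups in factor-level order. This is a sketch on made-up stand-in data with the question's column names (`JoinMonthYear`, `Prod`, `Count`), since SalesMix.csv isn't available; the month and product values are hypothetical.

```r
# Hypothetical stand-in for productMix (SalesMix.csv is not available)
set.seed(1)
productMix <- data.frame(
  JoinMonthYear = sample(c("Jan-2016", "Feb-2016", "Mar-2016"), 200, replace = TRUE),
  Prod          = sample(c("Alpha", "Beta", "Gamma"), 200, replace = TRUE),
  Count         = rpois(200, 5),
  stringsAsFactors = FALSE
)

# One row per month/product combination, with the counts summed
agg <- aggregate(Count ~ JoinMonthYear + Prod, data = productMix, FUN = sum)

# Order products by their overall total, largest first;
# the factor levels control stacking and legend order
prod_totals <- tapply(agg$Count, agg$Prod, sum)
agg$Prod <- factor(agg$Prod,
                   levels = names(prod_totals)[order(prod_totals, decreasing = TRUE)])

# Note: month strings sort alphabetically; convert to Date for a chronological axis
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(agg, aes(x = JoinMonthYear, y = Count, fill = Prod)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          legend.position = "bottom")
  print(p)
}
```

`geom_col()` is the shorthand for `geom_bar(stat = 'identity')`.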

Related

How to make Financial Times faceted charts with ggplot2 in R

The Financial Times has a nice faceted coronavirus chart: see "Daily death tolls" at https://www.ft.com/coronavirus-latest. Do you have an idea how to make it using R and ggplot2?
The facet_wrap function is not useful in this case: it separates every country's line into its own mini-graph, so the other countries are not shown in gray in the background.
Should I prepare 20+ charts and join them using gridExtra::grid.arrange()?
I wondered whether there is a way to plot the above without replicating the data.frame, so I simulate some data:
set.seed(111)
data = data.frame(group=rep(letters[1:6],each=60),
do.call(rbind,
replicate(6,
data.frame(x=1:60,y=cumsum(rnbinom(60,mu=20,size=0.1))),simplify=FALSE))
)
Below, I loop through each group and create a copy of the data.frame with two extra columns: "facet", naming the panel, and "highlight", flagging the group of interest:
library(purrr)
library(ggthemes)
library(ggplot2)
library(dplyr)
unique(data$group) %>%
map_dfr(~cbind(data,facet=.x,highlight=data$group %in% .x)) %>%
ggplot(aes(x=x,y=y,group=group))+
geom_line(aes(col=highlight)) +
facet_wrap(~facet,ncol=3,scales="free") +
theme_tufte() + scale_color_manual(values=c("#e5dfdf","#357376")) +
theme(strip.text=element_text(size=12,colour="steelblue"))+
guides(colour = "none")
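To make the replication explicit, here is the same stacking step written in base R, with `lapply()` plus `do.call(rbind, ...)` in place of `purrr::map_dfr()`. It is a sketch that rebuilds the simulated `data` from above; each group contributes one full copy of the data, so the result has (number of groups) × nrow(data) rows.

```r
set.seed(111)
data <- data.frame(group = rep(letters[1:6], each = 60),
                   do.call(rbind,
                           replicate(6,
                                     data.frame(x = 1:60,
                                                y = cumsum(rnbinom(60, mu = 20, size = 0.1))),
                                     simplify = FALSE)))

groups <- unique(data$group)

# One copy of the full data per panel: 'facet' names the panel and
# 'highlight' marks the rows of the group that panel is about
stacked <- do.call(rbind, lapply(groups, function(g) {
  cbind(data, facet = g, highlight = data$group == g)
}))

nrow(stacked)  # 6 groups x 360 rows = 2160
```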
One can, of course, create a list of ggplots instead, but that also replicates the data, because each ggplot object stores its own data.frame underneath:
plotfun = function(data,highlight){
data %>%
mutate(highlight = group == highlight) %>%
ggplot(aes(x=x,y=y,group=group))+
geom_line(aes(col=highlight)) +
theme_tufte() + scale_color_manual(values=c("#e5dfdf","#357376")) +
ggtitle(highlight)+
theme(plot.title = element_text(size=12,colour="steelblue"))+
guides(colour = "none")
}
library(gridExtra)  # provides grid.arrange()
grid.arrange(grobs=unique(data$group) %>% map(~plotfun(data,.x)),ncol=3)
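Another option, not taken from the answers above and offered only as a sketch: ggplot2 draws a layer whose data lacks the faceting variable in every panel. So you can facet on `group` and add a grey background layer from a copy of the data in which `group` has been renamed (the name `bg_group` is arbitrary). ggplot2 still duplicates those rows per panel internally when it builds the plot, but you no longer construct the stacked data frame yourself.

```r
set.seed(111)
data <- data.frame(group = rep(letters[1:6], each = 60),
                   x = rep(1:60, 6),
                   y = as.vector(replicate(6, cumsum(rnbinom(60, mu = 20, size = 0.1)))))

# Background copy without the facet column, so facet_wrap(~group) cannot split it
background <- data
names(background)[names(background) == "group"] <- "bg_group"

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data, aes(x = x, y = y)) +
    # repeated in every panel: all groups in grey
    geom_line(data = background, aes(group = bg_group), colour = "#e5dfdf") +
    # drawn only in its own panel: the highlighted group
    geom_line(aes(group = group), colour = "#357376") +
    facet_wrap(~group, ncol = 3, scales = "free")
  print(p)
}
```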

rearrange facet_wrap plots based on the points in the subplot

I would like to rearrange the facet_wrap plots in a better way.
library(ggplot2)
set.seed(123)
freq <- sample(1:10, 20, replace = T)
labels <- sample(LETTERS, 20)
value <- paste("i",1:13,sep='')
lab <- rep(unlist(lapply(1:length(freq), function(x) rep(labels[x],freq[x]))),2)
ival <- rep(unlist(lapply(1:length(freq), function(x) value[1:freq[x]])),2)
n <- sum(freq)  # rows per type; avoids hard-coding 119, which depends on the sampled freq
df <- data.frame(lab, ival, type=c(rep('Type1',n),rep('Type2',n)),val=runif(2*n,0,1))
ggplot(df, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
It results in the below plot:
Is there any way to rearrange the plots based on their frequency? Some of the lab frequencies (i.e. the number of points per type) are very low (1-3). I would like facet_wrap to arrange the panels by their frequencies instead of by label order. One advantage is that this would reduce the plotting area and give better intuition from the plots.
Can it be done by computing the frequency values on the fly and passing them to facet_wrap? Or should it be done separately, using dplyr to divide the data into low/medium/high-frequency sets of plots?
Here is one idea. We can use dplyr to calculate the number of each group in lab and use fct_reorder from forcats to reorder the factor level.
library(dplyr)
library(forcats)
df2 <- df %>%
group_by(lab) %>%
mutate(N = n()) %>%
ungroup() %>%
mutate(lab = fct_reorder(lab, N))
ggplot(df2, aes(x=ival, y=val, col = type, group = type)) +
geom_line() +
geom_point(aes(x=ival, y=val)) +
facet_wrap( ~lab, ncol=3) +
theme(axis.text.x=element_text(angle=45, vjust=0.3)) +
scale_x_discrete(limits=paste('i',1:13,sep=''))
Set .desc = TRUE when using fct_reorder if you want to reverse the factor levels.
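If you prefer to avoid forcats, base R's `stats::reorder()` does the same job when given `FUN = length`, which scores each label by its row count. A sketch, rebuilding a `lab` column laid out like the question's (`rep(labels, times = freq)` is equivalent to the `lapply()` construction); negating the count gives the most frequent panels first. The column names `lab_asc` and `lab_desc` are just for illustration.

```r
set.seed(123)
freq   <- sample(1:10, 20, replace = TRUE)
labels <- sample(LETTERS, 20)
# Same layout as the question's 'lab': each label repeated freq times, twice
lab <- rep(rep(labels, times = freq), 2)
df  <- data.frame(lab = lab, val = runif(length(lab)))

# Levels ordered by row count, fewest first (like fct_reorder(lab, N))
df$lab_asc  <- reorder(factor(df$lab), df$val, FUN = length)
# Most frequent first (like fct_reorder(lab, N, .desc = TRUE))
df$lab_desc <- reorder(factor(df$lab), df$val, FUN = function(v) -length(v))

table(df$lab_asc)  # counts appear in non-decreasing order
```

Faceting with `facet_wrap(~lab_asc)` then lays the panels out in that level order.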

Rank Stacked Bar Chart by Sum of Subset of Fill Variable

Sample data:
set.seed(145)
df <- data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
df.plot <- ggplot(df,aes(x=Age,y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Within the ggplot call, how can I reorder x = Age by the sum of the "Extremely" and "Very" ranks only?
I tried using the below, without success.
df.plot <- ggplot(df,aes(x=reorder(Age,Rank=="Extremely",sum),y=Percent,fill=Rank))+
geom_bar(stat="identity")+
coord_flip()
df.plot
Couple of notes:
The way you are simulating your data does not rule out the possibility that, for some ages, not all categories are represented (which is fine), but also that, for some ages, some categories are duplicated. I am assuming that this is not true for your real data, so I have left it as is. Note also that your simulation does not produce percentages that add up to 100%, although the category names suggest that they should.
The way I would do this is to compute the ordering of ages with your desired logic, and then pass that order to the factor() call. This decouples the ordering logic from the plot and allows arbitrary ordering rules.
Here, then, is what I think you are looking for:
library(ggplot2)
library(dplyr)
library(scales)
set.seed(145)
# simulate the data
df_foo = data.frame(Age=sample(c(1:10),20,replace=TRUE),
Rank=sample(c("Extremely","Very","Slightly","Not At All"),
20,replace=TRUE),
Percent=(runif(10,0,.01)))
# get the ordering that you are interested in
age_order = df_foo %>%
  filter(Rank %in% c("Extremely", "Very")) %>%
  group_by(Age) %>%
  summarize(SumRank = sum(Percent)) %>%
  arrange(desc(SumRank)) %>%
  pull(Age)
# in some cases ages do not appear in the order because the
# ordering logic does not span all categories
age_order = c(age_order, setdiff(unique(df_foo$Age), age_order))
# make age a factor sorted by the ordering above
ggplot(df_foo, aes(x = factor(Age, levels = age_order), y = Percent, fill = Rank))+
geom_bar(stat = "identity") +
coord_flip() +
theme_bw() +
scale_y_continuous(labels = percent)
Which code produces:
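The ordering step itself does not need dplyr. A base-R sketch of the same logic: sum `Percent` over the "Extremely" and "Very" rows for each age with `tapply()`, sort descending, then append the ages that have none of those ranks. The `AgeF` column name is only for illustration; you would then map `x = AgeF` in the plot.

```r
set.seed(145)
df_foo <- data.frame(Age = sample(1:10, 20, replace = TRUE),
                     Rank = sample(c("Extremely", "Very", "Slightly", "Not At All"),
                                   20, replace = TRUE),
                     Percent = runif(10, 0, .01))

# Sum Percent over the two ranks of interest, per age
top  <- df_foo[df_foo$Rank %in% c("Extremely", "Very"), ]
sums <- tapply(top$Percent, top$Age, sum)

# Largest sum first, then any ages with no "Extremely"/"Very" rows
age_order <- as.integer(names(sums)[order(sums, decreasing = TRUE)])
age_order <- c(age_order, setdiff(unique(df_foo$Age), age_order))

df_foo$AgeF <- factor(df_foo$Age, levels = age_order)
levels(df_foo$AgeF)
```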

2 stacked histograms with a common x-axis

I want to plot two stacked histograms that share a common x-axis. I want the second histogram to be plotted as the inverse (pointing downward) of the first. I found this post that shows how to plot stacked histograms (How to plot multiple stacked histograms together in R?). For the sake of simplicity, let's say I just want to plot that same histogram on the same x-axis, but facing in the negative y-axis direction.
You could count up the cases and then multiply the count by -1 for one category. Example with data.table / ggplot:
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace=T)),
category = sample(c('a', 'b'), 200, replace=T))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
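Without data.table, the same counts can be built in base R from `table()`; here is a sketch on the same kind of fake data. Unlike the data.table version, `table()` also keeps value/category combinations with zero cases. Negating the counts for category `b` is what makes those bars point downward.

```r
set.seed(123)
dat <- data.frame(value = factor(sample(1:5, 200, replace = TRUE)),
                  category = sample(c("a", "b"), 200, replace = TRUE))

# One row per value/category combination, count in column N
plot_dat <- as.data.frame(table(value = dat$value, category = dat$category),
                          responseName = "N")
# Category b is drawn below the axis
plot_dat$N <- ifelse(plot_dat$category == "b", -plot_dat$N, plot_dat$N)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_dat, aes(x = value, y = N, fill = category)) +
    geom_col(position = "identity") +
    theme_classic()
  print(p)
}
```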
You can try something like this:
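library(ggplot2)
# after_stat(count) replaces the older ..count.. notation (ggplot2 >= 3.3.0)
ggplot() +
  stat_bin(data = diamonds, aes(x = depth)) +
  stat_bin(data = diamonds, aes(x = depth, y = -after_stat(count)))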
Responding to the additional comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
select(depth,table) %>%
gather(key = grp,value = val,depth,table)
ggplot() +
  stat_bin(data = d1, aes(x = val, fill = grp)) +
  stat_bin(data = diamonds, aes(x = price, y = -after_stat(count)))
Visually, that's a bad example because the scales of the variables are all off, but that's the general idea.

ggplot2: Stack barcharts with group means

I have tried several things to make ggplot plot bar charts of means derived from factors in a data frame, but I wasn't successful.
If you consider:
df <- as.data.frame(matrix(rnorm(60*2, mean=3,sd=1), 60, 2))
df$factor <- c(rep(factor(1:3), each=20))
I want to achieve a stacked, relative barchart like this:
This chart was created by manually calculating the group means in a separate data frame, melting it, and using geom_bar(stat="identity", position = "fill") and scale_y_continuous(labels = percent_format()). I haven't found a way to use stat_summary with stacked bar charts.
In a second step, I would like to have error bars attached to the breaks of each column. I have six treatments and three species, so error bars should be OK.
For anything this complicated, I think it's loads easier to pre-calculate the numbers, then plot them. This is easily done with dplyr/tidyr (even the error bars):
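library(dplyr)
library(tidyr)
library(ggplot2)
gather(df, 'cat', 'value', 1:2) %>%
  group_by(factor, cat) %>%
  summarise(mean = mean(value), se = sd(value)/sqrt(n())) %>%
  group_by(cat) %>%
  mutate(perc = mean/sum(mean), ymin = cumsum(perc) - se/sum(mean), ymax = cumsum(perc) + se/sum(mean)) %>%
  ggplot(aes(x = cat, y = perc, fill = factor(factor))) +
  # reverse = TRUE stacks level 1 at the bottom, matching cumsum() above;
  # since ggplot2 2.2.0 the first level is otherwise stacked on top
  geom_bar(stat = 'identity', position = position_stack(reverse = TRUE)) +
  geom_errorbar(aes(ymax = ymax, ymin = ymin))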
Of course, this looks a bit strange, because there are error bars around 100% at the top of the stacked bars. I think you'd be much better off plotting the actual data points, plus means and error bars, and using faceting:
gather(df, 'cat', 'value', 1:2) %>%
group_by(cat, factor) %>%
summarise(mean=mean(value), se=sd(value)/sqrt(n())) %>%
ggplot(aes(x=cat, y=mean, colour=factor(factor))) +
geom_point(aes(y=value), position=position_jitter(width=.3, height=0), data=gather(df, 'cat', 'value', 1:2) ) +
geom_point(shape=5, size = 3) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.1) +
facet_grid(factor ~ .)
This way, anyone can examine the data and see for themselves that they are normally distributed.
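For reference, the summary table behind both plots (mean and standard error per column and group, plus each group's share of its column total, which is what the stacked version shows) can also be computed in base R with `stack()` and `aggregate()`. This is a sketch on data simulated like the question's; `summ` and `perc` are illustrative names.

```r
set.seed(42)
df <- as.data.frame(matrix(rnorm(60 * 2, mean = 3, sd = 1), 60, 2))
df$factor <- factor(rep(1:3, each = 20))

# Long format: 'values' holds the numbers, 'ind' says which column (V1/V2)
long <- data.frame(stack(df[1:2]), factor = rep(df$factor, 2))

means <- aggregate(values ~ ind + factor, data = long, FUN = mean)
ses   <- aggregate(values ~ ind + factor, data = long,
                   FUN = function(v) sd(v) / sqrt(length(v)))
# Both aggregate() calls group identically, so their rows line up
summ <- data.frame(means[c("ind", "factor")], mean = means$values, se = ses$values)

# Share of each group in its column total (the stacked-bar heights)
summ$perc <- ave(summ$mean, summ$ind, FUN = function(m) m / sum(m))
summ
```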
