Want individual components as subsets in a stacked bar chart - r

I am trying to create a stacked bar plot that shows a company's revenue and the various components of its cost of sales (operating expenses, other fixed costs, etc.). I want the individual components of the cost of sales to be drawn over the revenue bar, so that it is clear what part of the revenue is cost of sales.
Right now, I am only able to create a stacked bar plot which lays everything on top of each other. In other words, cost of sales is displayed on top of revenue.
Ideally, I would want the individual components of cost of sales to be displayed as subset of revenue.
Here's a brief look at the molten data frame I have:
Time variable value
2013-01-01 A 84.32153
2013-02-01 A 91.41203
2013-01-01 B 1214.29960
2013-02-01 B 1224.21256
2013-01-01 C 312.78462
2013-02-01 C 175.58130
2013-01-01 D 321.12000
2013-02-01 D 298.82000
In the above scenario, I want B to be displayed as the superset, with A, C and D shown as components of B for the two months above.
I am using the following code:
stackbar <- ggplot(temp, aes_string(x = 'Time',y='value', fill = "variable")) +
geom_bar(stat='identity') +
ylab("Count") + theme(legend.title = element_blank()) +
theme(legend.direction = "horizontal") +
theme(legend.position = c(1, 1)) +
theme(legend.justification = c(1, 0)) +
theme(panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(), panel.background=element_blank(),
panel.grid.major.x=element_line(color='grey90',linetype='dashed'),
panel.grid.major.y=element_line(color='grey90',linetype='dashed')) +
theme(axis.ticks.x=element_blank()) + theme(axis.ticks.y=element_blank()) +
scale_colour_discrete(limits = levels(temp$variable))
Any help in this regard would be much appreciated.

Subset your data and use two geom_bar layers. Draw the bars for variable B first, then draw the remaining variables in a second geom_bar(). Each layer is stacked independently starting from zero, so the components A, C and D are drawn over the B bar rather than on top of it. I hope this gives you the figure you want.
stackbar <- ggplot(mydf, aes(x = Time, y = value, fill = variable)) +
geom_bar(data = mydf[mydf$variable == "B",], stat = "identity") +
geom_bar(data = mydf[mydf$variable != "B",], stat = "identity") +
ylab("Count") +
theme(legend.title = element_blank()) +
theme(legend.direction = "horizontal") +
theme(legend.position = c(1, 1)) +
theme(legend.justification = c(1, 0)) +
theme(panel.grid.minor.x=element_blank(),
panel.grid.minor.y=element_blank(), panel.background=element_blank(),
panel.grid.major.x=element_line(color='grey90',linetype='dashed'),
panel.grid.major.y=element_line(color='grey90',linetype='dashed')) +
theme(axis.ticks.x=element_blank()) + theme(axis.ticks.y=element_blank()) +
scale_colour_discrete(limits = levels(mydf$variable))
stackbar
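For a self-contained version of the two-layer idea, here is a minimal sketch that rebuilds the sample data from the question. The narrower width for the cost bars is my own addition (not part of the answer above) so that the revenue bar stays visible behind them; geom_col() is shorthand for geom_bar(stat = "identity") and assumes ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Sample data copied from the question (long / "molten" format)
mydf <- data.frame(
  Time     = as.Date(rep(c("2013-01-01", "2013-02-01"), times = 4)),
  variable = rep(c("A", "B", "C", "D"), each = 2),
  value    = c(84.32153, 91.41203, 1214.29960, 1224.21256,
               312.78462, 175.58130, 321.12000, 298.82000)
)

revenue <- mydf[mydf$variable == "B", ]  # the superset
costs   <- mydf[mydf$variable != "B", ]  # the components

# Sanity check: the stacked costs should fit inside the revenue bar
cost_totals <- tapply(costs$value, costs$Time, sum)
stopifnot(all(cost_totals < revenue$value))

# Each layer is stacked on its own, starting from zero, so the cost
# bars are drawn over the revenue bar. On a Date axis the width is in
# days; 25 and 15 are arbitrary choices for monthly data.
p <- ggplot(mapping = aes(x = Time, y = value, fill = variable)) +
  geom_col(data = revenue, width = 25) +
  geom_col(data = costs, width = 15) +
  ylab("Value")
p
```

If the costs ever exceed revenue, the stopifnot() call fails, which is a hint that a "subset" picture would be misleading for that month.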

Related

Why can't I plot an actual pie chart with percentages?

There are two columns in my data set Q6.1,
count(market_segment) market_segment
1 201 Complementary
2 2309 Corporate
3 6513 Direct
4 5836 Groups
5 7472 Offline TA/TO
6 17729 Online TA
I am using this code
ggplot(Q6.1, aes(x="", y= 'count(market_segment)', fill = market_segment))+
geom_bar(stat="identity", width=1)+coord_polar("y", start=0)+
geom_text(aes(label = paste0('count(market_segment')))
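You need to use coord_polar to convert the bar chart to a pie chart, and add the percentage labels using geom_text. Note that quoting the column name with single quotes, as in the question, maps y to a constant string rather than to the column; a non-syntactic column name has to be wrapped in backticks. Given the unwieldy column names, it is also easier to convert the counts to proportions first (here the data frame is called df):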
library(ggplot2)
df$percent <- df$`count(market_segment)`/ sum(df$`count(market_segment)`)
ggplot(df, aes(x = 1, y = percent, fill = market_segment)) +
geom_col(color = "gray90", width = 1) +
geom_text(aes(x = c(1.1, rep(1, 5)), label = scales::percent(percent)),
position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
scale_fill_brewer(palette = "Pastel1", name = "Market segment") +
theme_void()
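To try this without the original data set, here is a minimal sketch that rebuilds the table from the question. Renaming the count column to n is my own choice (it avoids the backticks); the label formatting via scales::percent(accuracy = 0.1) is likewise just one option.

```r
library(ggplot2)

# Table copied from the question; check.names = FALSE keeps the
# awkward "count(market_segment)" column name as it was
df <- data.frame(
  `count(market_segment)` = c(201, 2309, 6513, 5836, 7472, 17729),
  market_segment = c("Complementary", "Corporate", "Direct",
                     "Groups", "Offline TA/TO", "Online TA"),
  check.names = FALSE
)

# Rename the count column to something easier to type (my choice)
names(df)[names(df) == "count(market_segment)"] <- "n"

# Share of each segment, plus a formatted label
df$percent <- df$n / sum(df$n)
df$label   <- scales::percent(df$percent, accuracy = 0.1)

p <- ggplot(df, aes(x = 1, y = percent, fill = market_segment)) +
  geom_col(color = "gray90", width = 1) +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
p
```

The smallest slice is very thin, so its label will overlap its neighbours; the answer above nudges that one label outwards with x = 1.1 for this reason.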

How do I represent percent of a variable in a filled barplot?

I have a data frame (t1) and I want to illustrate the shares of companies by size.
I added a dummy variable so that I get a single filled bar plot rather than three separate bars:
t1$row <- 1
Company sizes are split into medium, small and micro:
f_size <- factor(t1$size,
ordered = TRUE,
levels = c("medium", "small", "micro"))
The plot is built with the Economist theme (theme_economist):
ggplot(t1, aes(x = "Size", y = prop.table(row), fill = f_size)) +
geom_col() +
geom_text(aes(label = as.numeric(f_size)),
position = position_stack(vjust = 0.5)) +
theme_economist(base_size = 14) +
scale_fill_economist() +
theme(legend.position = "right",
legend.title = element_blank()) +
theme(axis.title.y = element_text(margin = margin(r = 20))) +
ylab("Percentage") +
xlab(NULL)
How can I modify my code to get the share for medium, small and micro in the middle of the three filled parts in the barplot?
Thanks in advance!
Your question isn't quite clear to me, and I suggest you rephrase it. I believe you're trying to align the annotations accurately along the y-axis. For this, pre-calculate the labels and then use annotate:
library(data.table)
library(ggplot2)
set.seed(3432)
df <- data.table(
cat= sample(LETTERS[1:3], 1000, replace = TRUE)
, x= rpois(1000, lambda = 5)
)
tmp <- df[, .(pct= sum(x) / sum(df[,x])), cat][, cumsum := cumsum(pct)]
ggplot(tmp, aes(x= 'size', y= pct, fill= cat)) + geom_bar(stat='identity') +
annotate('text', y= tmp[,cumsum] - 0.15, x= 1, label= as.character(tmp[,pct]))
But this is a poor choice graphically. The segments of a stacked bar chart of shares sum to 100% by construction. Rather than labeling the components with text, let the graphic do this for you via the axis labels:
ggplot(tmp, aes(x= cat, y= pct, fill= cat)) + geom_bar(stat='identity') + coord_flip() +
scale_y_continuous(breaks= seq(0,1,.05))
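If you do want the share printed in the middle of each segment, an alternative to annotate() is to pre-compute the shares and let position_stack(vjust = 0.5) place the text, as the question's own code already attempts. This is only a sketch: the t1 data below is simulated, since the original data frame was not posted, and the plain default theme is used instead of theme_economist().

```r
library(ggplot2)

# Simulated stand-in for the asker's t1 (the real data was not posted)
set.seed(1)
t1 <- data.frame(
  size = sample(c("medium", "small", "micro"), 200,
                replace = TRUE, prob = c(0.2, 0.3, 0.5))
)
t1$size <- factor(t1$size, levels = c("medium", "small", "micro"))

# One row per size class with its share of all companies
counts <- table(t1$size)
shares <- data.frame(
  size  = factor(names(counts), levels = names(counts)),
  share = as.numeric(counts) / sum(counts)
)

# geom_col and geom_text both use stacking, so vjust = 0.5 puts each
# label in the middle of its own segment
p <- ggplot(shares, aes(x = "Size", y = share, fill = size)) +
  geom_col() +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percentage") +
  xlab(NULL)
p
```

Because the bar and the labels are built from the same summarised data frame, they cannot drift apart the way hand-computed annotate() positions can.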

Using stat_ecdf in ggplot in R

I am trying to reproduce a figure similar to this one (linked image: ggplot2_ecdf).
My Data looks like this
Category Value
A 2
A 3
A 4
A 2
A 4
B 2
B 1
B 6
C 1
C 2
C 3
C 3
I would like to plot the distribution with the category on the x-axis and the values on the y-axis. Since some entries share the same value, stat_ecdf() would be a good way to visualize the distribution, with a curve per category that horizontally displaces similar points (as in the linked figure).
I have used a beeswarm plot in ggplot, but I would like to use stat_ecdf to get a displaced distribution (showing each entry as a dot per category), and also add a median line in red.
What I tried
a <- ggplot(df, aes(x=Category, y=value)) +
stat_ecdf()+
scale_y_continuous() +
theme_light() +
theme(axis.text.x = element_text(angle = 90)) +
xlab('category') +
ylab('values')
a
I'm a bit limited on time today, but maybe this can point you in the right direction.
a <- ggplot(data = df,
aes(x = value)) +
stat_ecdf(geom = "point",
size = 1,
pad = FALSE) +
xlab("category") +
ylab("values") +
facet_wrap(~ Category,
scales = "free_x",
strip.position = "bottom") +
coord_cartesian(clip = "off") +
theme_minimal() +
theme(axis.text.x = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank())
a
Update:
I played around a bit more. Hopefully this looks a bit better.
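As a complementary sketch, the ECDF positions can also be computed by hand with base R's ecdf() instead of stat_ecdf(), which makes it easy to put the values on the y-axis and add the red median line the question asks for. This is a swapped-in approach, not the code behind the update above; the data is copied from the question, with the column named value to match the plotting code.

```r
library(ggplot2)

# Data copied from the question
df <- data.frame(
  Category = rep(c("A", "B", "C"), times = c(5, 3, 4)),
  value    = c(2, 3, 4, 2, 4,  2, 1, 6,  1, 2, 3, 3)
)

# Empirical CDF evaluated at each point, computed within each category;
# this is used as the horizontal displacement
df$ecdf <- ave(df$value, df$Category, FUN = function(v) ecdf(v)(v))

# Median per category, for the red line
medians <- aggregate(value ~ Category, data = df, FUN = median)

p <- ggplot(df, aes(x = ecdf, y = value)) +
  geom_point(size = 1) +
  geom_hline(data = medians, aes(yintercept = value), colour = "red") +
  facet_wrap(~ Category, strip.position = "bottom") +
  xlab("category") +
  ylab("values") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        panel.grid = element_blank())
p
```

Tied values get the same ECDF position, so duplicates within a category are drawn on top of each other in this version.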

R: geom_point - how to show statistics on top of figure

I made a figure using geom_point from ggplot2 (just showing part of it). Colors are representing 3 classes. Black bar is mean (not relevant for the question).
The data structure is the following (stored in a list):
V1 V2 V3
1 L. brevis 5 class1
3 L. sp. 13 class1
4 L. rhamnosus 14 class1
5 L. lindneri 17 class1
6 L. plantarum 17 class1
7 L. acidophilus 18 class1
8 L. acidophilus 18 class1
10 L. plantarum 18 class1
... ... .. ...
Where V2 is the position of the datapoints on the y-axis and V3 is the class (color).
Now I would like to show the percentages for each of the three classes on top of the figure (Or maybe even as pie charts :-) ). I made an example for "L. acidophilus" on the image (66.7% / 33.3%).
The legend explaining groups ideally is also produced by R but I can do it manually.
How do I do that?
Forgot to add the 0% for group three on top of column "L. acidophilus"... Sorry for that.
EDIT: Here the ggplot2 code:
p <- ggplot(myData, aes(x=V1, y=V2)) +
geom_point(aes(color=V3, fill=V3), size=2.5, cex=5, shape=21, stroke=1) +
scale_color_manual(values=colBorder, labels=c("Class I","Class II","Class III","This study")) +
scale_fill_manual(values=col, labels=c("Class I","Class II","Class III","This study")) +
theme_bw() +
theme(axis.text.x=element_text(angle=50,hjust=1,face="italic", color="black"), text = element_text(size=12),
axis.text.y=element_text(color="black"), panel.grid.major = element_line(color="gray85",size=.15), panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank(), axis.ticks = element_line(size = 0.3), panel.border = element_rect(fill=NA, colour = "black", size=0.3)) +
stat_summary(aes(shape="mean"), fun.y=mean, size = 6, shape=95, colour="black", geom="point") +
guides(fill=guide_legend(title="Class", order=1), color=guide_legend(title="Class",order=1), shape=guide_legend(title="Blup", order=2))
Option A: Secondary Axis
You can do this using a secondary x axis (new to ggplot2 v2.2.0), but it's hard to do with a categorical variable on the x axis because it doesn't work with scale_x_discrete(), only scale_x_continuous(). So, you have to convert the factor to integer, plot based on that, and then overwrite the labels on the primary x axis.
For example:
set.seed(123)
df <- iris[sample.int(nrow(iris),size=300,replace=TRUE),]
# Assume we are grouping by species
# Some group-level stats -- how about count and mean/sdev of sepal length
library(dplyr)
df_stats <- df %>%
group_by(Species) %>%
summarize(stat_txt = paste0(c('N=','avg=','sdev='),
c(n(),round(mean(Sepal.Length),2),round(sd(Sepal.Length),3) ),
collapse='\n') )
library(ggplot2)
ggplot(data = df,
aes(x = as.integer(Species),
y = Sepal.Length)) +
geom_point() +
stat_summary(aes(shape="mean"), fun=mean, size = 6, shape=95,  # fun replaces fun.y (deprecated in ggplot2 3.3.0)
colour="black", geom="point") +
theme_bw() +
scale_x_continuous(breaks=1:length(levels(df$Species)),
limits = c(0,length(levels(df$Species))+1),
labels = levels(df$Species),
minor_breaks=NULL,
sec.axis=sec_axis(~.,
breaks=1:length(levels(df$Species)),
labels=df_stats$stat_txt)) +
xlab('Species') +
theme(axis.text.x = element_text(hjust=0))
Option B: grid.arrange your statistics as a separate chart atop your main chart.
This is a little more straightforward, but the two charts don't quite perfectly line up, possibly because of the ticks and labels being suppressed on the axes of the top chart.
library(ggplot2)
library(gridExtra)
p <-
ggplot(data = df,
aes(x = Species,
y = Sepal.Length)) +
geom_point() +
stat_summary(aes(shape="mean"), fun=mean, size = 6, shape=95,  # fun replaces fun.y (deprecated in ggplot2 3.3.0)
colour="black", geom="point") +
theme_bw() +
theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1))
annot <-
ggplot(data=df_stats, aes(x=Species, y = 0)) +
geom_text(aes(label=stat_txt), hjust=0) +
theme_minimal() +
scale_x_discrete(breaks=NULL) +
scale_y_continuous(breaks=NULL) +
xlab(NULL) + ylab('')
grid.arrange(annot, p, heights=c(1,8))
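The examples above show counts and means for iris; for the class percentages the question actually asks about, the label text could be built along these lines. This is a sketch only: the column names V1, V2 and V3 follow the question, but the rows are invented for illustration, and placing the text at y = Inf is just one simple way to pin it to the top of the panel.

```r
library(ggplot2)

# Invented rows in the shape of the asker's data (V1 = species,
# V2 = y position, V3 = class); not the real data set
myData <- data.frame(
  V1 = c("L. brevis", "L. acidophilus", "L. acidophilus",
         "L. acidophilus", "L. plantarum", "L. plantarum"),
  V2 = c(5, 18, 18, 20, 17, 18),
  V3 = factor(c("class1", "class1", "class1",
                "class2", "class1", "class1"),
              levels = c("class1", "class2", "class3"))
)

# Row-wise percentages: share of each class within each species
pct <- prop.table(table(myData$V1, myData$V3), margin = 1) * 100

# One label per species, e.g. "66.7% / 33.3% / 0.0%"
pct_labels <- apply(pct, 1, function(r) {
  paste0(sprintf("%.1f%%", r), collapse = " / ")
})
label_df <- data.frame(V1 = names(pct_labels), label = unname(pct_labels))

# y = Inf pins the text to the top edge of the panel
p <- ggplot(myData, aes(x = V1, y = V2)) +
  geom_point(aes(colour = V3), size = 2.5) +
  geom_text(data = label_df, aes(x = V1, y = Inf, label = label),
            vjust = 1.5, size = 3) +
  theme_bw()
p
```

Keeping class3 as a factor level is what makes the 0% entry appear for species that have no points in that class. The same pct_labels vector could instead be passed as the labels of the secondary axis in Option A.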

How to make box plots within the same column to represent the soil column

I am trying to demonstrate the soil type (soil column) at different depths in the ground using box plots. However, as the sampling interval is not consistent, there are also gaps in between the samples.
My questions are as follows:
Is it possible to put the box plots within the same column? i.e. all box plots in 1 straight column
Is it possible to remove the x-axis labels and ticks when using ggdraw? I tried to remove it when using plot, but appears again when I use ggdraw.
My code looks like this:
SampleID <- c("Rep-1", "Rep-2", "Rep-3", "Rep-4")
From <- c(0,2,4,9)
To <- c(1,4,8,10)
Mid <- (From+To)/2
ImaginaryVal <- c(1,1,1,1)
Soiltype <- c("organic", "silt","clay", "sand")
df <- data.frame(SampleID, From, To, Mid, ImaginaryVal, Soiltype)
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype,
middle=`Mid`, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity") + scale_y_reverse(breaks = seq(0,10,0.5)) + xlab('Soiltype') + ylab('Depth (m)') + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
ggdraw(switch_axis_position(plot + theme_bw(8), axis = 'x'))
In the image I have pointed out what I want, using the red arrows and lines.
You can use position = position_dodge(width = 0), which removes the horizontal offset between the groups so that all boxes sit in one column:
plot <- ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
edit: I don't think you need cowplot at all, if this is what you want your plot to look like:
ggplot(df, aes(x=ImaginaryVal, ymin=From, lower=From,fill=Soiltype, middle=Mid, upper=To, ymax=To)) +
geom_boxplot(colour= "black", stat="identity", position = position_dodge(width=0)) +
scale_y_reverse(breaks = seq(0,10,0.5)) +
xlab('Soiltype') +
ylab('Depth (m)') +
theme_bw() +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
xlab("") +
ggtitle("Soiltype")
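Since each box only encodes a From/To interval, another option is to draw plain rectangles with geom_rect() instead of box plots. This is a different technique from the answer above, shown as a sketch with the data from the question; the xmin/xmax values of 0 and 1 are arbitrary.

```r
library(ggplot2)

# Data copied from the question
df <- data.frame(
  SampleID = c("Rep-1", "Rep-2", "Rep-3", "Rep-4"),
  From     = c(0, 2, 4, 9),
  To       = c(1, 4, 8, 10),
  Soiltype = c("organic", "silt", "clay", "sand")
)

# Unsampled depth between consecutive intervals; these stay empty in the plot
gaps <- df$From[-1] - df$To[-nrow(df)]

# One rectangle per sample, all sharing the same x range so they form
# a single column; depth increases downwards via scale_y_reverse()
p <- ggplot(df) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = From, ymax = To,
                fill = Soiltype), colour = "black") +
  scale_y_reverse(breaks = seq(0, 10, 0.5)) +
  ylab("Depth (m)") +
  ggtitle("Soiltype") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_blank())
p
```

With rectangles there is no median line to hide and no dodging to switch off, which may be simpler if the box plot statistics are not needed.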
