ggplot2: geom_bar with group, position_dodge and fill - r

I am trying to generate a barplot in which the x-axis is grouped by patient, with each patient having multiple samples. So for instance (using the mtcars data as a template of what the data would look like):
library("ggplot2")
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
This would produce a plot with each bar representing a sample in each patient.
I want to add additional information about each patient sample by using colors to fill the barplots (e.g. different types of mutations in each patient sample). I was thinking I could specify the fill parameter like this:
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But this doesn't produce "stacked barplots" for each patient sample barplot. I am assuming this is because position_dodge() is set. Is there any way to get around this? Basically, what I want is:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar() +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But with these fill colors applied to the dodged bars of the first plot I listed. Is this possible with ggplot2?

I think facets are the closest approximation to what you seem to be looking for:
ggplot(mtcars, aes(x = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample") +
facet_wrap(~cyl)
I haven't found anything related in the issue tracker of ggplot2.

If I understand your question correctly, you want to pass aes() to your geom_bar layer. This lets you set a fill aesthetic. You can then position your bars with "dodge" or "fill", depending on how you want to display the data.
A short example is listed here:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar(aes(fill = factor(vs)), position = "dodge") +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
With the resulting plot: http://imgur.com/ApUJ4p2 (sorry S/O won't let me post images yet)
Hope that helps!

I have hacked around this a few times by layering multiple geom_cols on top of each other in the order I prefer. For example, the code
ggplot(data, aes(x=cert, y=pct, fill=Party, group=Treatment, shape=Treatment)) +
geom_col(aes(x=cert, y=1), position=position_dodge(width=.9), fill="gray90") +
geom_col(position=position_dodge(width=.9)) +
scale_fill_manual(values=c("gray90", "gray60"))
Allowed me to produce the feature you're looking for without faceting. Notice how I set the background layer's y value to 1. To add more layers, you can just cumulatively sum your variables.

I guess, my answer in this post will help you to build the chart with multiple stacked vertical bars for each patient ...
Layered axes in ggplot?

One approach I don't see suggested above is to use facet_wrap to group samples by patient and then stack mutations by sample, which removes the need for dodging. I also changed which mtcars attributes are used, to match the question and to get more variety in the mutations attribute.
patients <- c('Tom','Harry','Sally')
samples <- c('S1','S2','S3')
mutations <- c('M1','M2','M3','M4','M5','M6','M7','M8')
ds <- data.frame(
patients=patients[mtcars$cyl/2 - 1],
samples=samples[mtcars$gear - 2],
mutations=mutations[mtcars$carb]
)
ggplot(
ds,
aes(
x = factor(samples),
group = factor(mutations),
fill = factor(mutations)
)
) +
geom_bar() +
facet_wrap(~patients,nrow=1) +
ggtitle('Patient') +
xlab('Sample') +
ylab('Number of Mutations per Patient Sample') +
labs(fill = 'Mutation')
The output now has labels that match the language of the question, which makes it easier to see what is going on.

Related

Selecting color to mark means on boxplot with stat_summary

There are several useful pages about marking means on boxplots with multiple series, but even with those I can't select a color for the points and still show the two different means. I can do this:
library(ggplot2)
d <- subset(mpg,class=="compact"|class=="midsize")
ggplot(d,aes(drv,hwy,color=class)) + geom_boxplot() + scale_color_manual(values=c("blue","orange")) +
stat_summary(fun=mean,size=.5,shape=5,position=position_dodge(width=.75))
And that gives me the two different means, but they're the same color as the boxplots themselves and so not the best to look at.
So I add a color specification into the code:
ggplot(d,aes(drv,hwy,color=class)) + geom_boxplot() + scale_color_manual(values=c("blue","orange")) +
stat_summary(fun=mean,size=.5,color="black",shape=5,position=position_dodge(width=.75))
But then it's only showing the one mean.
So what am I missing here to get both a specified color and the multiple means being marked?
When you overwrite the colour aesthetic in stat_summary() you also lose the grouping information. You need to bring it back explicitly with aes(group = class):
library(ggplot2)
d <- subset(mpg, class == "compact" | class == "midsize")
ggplot(d, aes(drv, hwy, color = class)) +
geom_boxplot() +
stat_summary(
aes(group = class),
colour = "black",
fun = mean,
size = .5,
shape = 5,
position = position_dodge(width = .75)
)
#> Warning: Removed 4 rows containing missing values (geom_segment).
Using fill to color the boxes and color for stat_summary gives the desired output.
ggplot(d,aes(drv,hwy, fill=class)) + geom_boxplot() + scale_fill_manual(values=c("cyan","orange")) +
stat_summary(fun=mean,size=.5, color="red",
shape=5,position=position_dodge(width=.75))

Filling stacked/dodged bar with different colors

I'm trying to build a chart combining stacked and dodged to compare two business lines over months on two different KPIs (VOL and NV).
I would have something like this:
(https://imgur.com/a/IambH09)
I would like to use 4 different colours, but even with scale_fill_manual only the first two are used for all the categories.
Do you think it is possible? Otherwise I won't go further adjusting other details.
Thanks
Bruno
This is the result I'm stuck with:
https://imgur.com/a/5RJMMiN
df=data.frame(
SOC=rep(c("ENERGIA","ENERGIE"),each=4),
MESE_RIF=rep(c("2019_01","2019_02")),
CHURN_TYPE=rep(c("VOL","NV"),each=2),
CHURN_RATE=rep(c(1.35,1.14,0.23,0.22,1.49,1.54,0.13,0.10)),
NR_LOST=rep(c(8288,7010,1432,1372,2818,2857,247,186)))
#filling colors
fill <- c("#72A3C9", "#B9DDF1","#F07E27","#FFC786")
#graph
ggplot(df, aes(x = SOC, y = CHURN_RATE, fill = CHURN_TYPE)) +
geom_bar(position = "stack", stat = "identity") + facet_wrap( ~ MESE_RIF) +
geom_text(aes(label = NR_LOST), size=4,
position=position_stack(vjust = 0.5)) + scale_fill_manual(values=fill)
You have four fill colors, and fill is being mapped to CHURN_TYPE, which has two values.
One approach could be to map fill to the combination of CHURN_TYPE and SOC, like this.
ggplot(df, aes(x = SOC, y = CHURN_RATE,
fill = interaction(CHURN_TYPE, SOC))) +
...
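A fuller version of that idea, as a minimal sketch only: it rebuilds the df and fill vector from the question and assumes the remaining layers stay roughly as the asker wrote them. SEGMENT is a hypothetical column name introduced here for readability, and geom_col() is used as shorthand for geom_bar(stat = "identity"). Because interaction(CHURN_TYPE, SOC) has one level per churn type and business line, scale_fill_manual has four levels to assign its four colours to.

```r
library(ggplot2)

# Same values as the question's data frame, with the recycling written out
df <- data.frame(
  SOC        = rep(c("ENERGIA", "ENERGIE"), each = 4),
  MESE_RIF   = rep(c("2019_01", "2019_02"), times = 4),
  CHURN_TYPE = rep(rep(c("VOL", "NV"), each = 2), times = 2),
  CHURN_RATE = c(1.35, 1.14, 0.23, 0.22, 1.49, 1.54, 0.13, 0.10),
  NR_LOST    = c(8288, 7010, 1432, 1372, 2818, 2857, 247, 186)
)
fill <- c("#72A3C9", "#B9DDF1", "#F07E27", "#FFC786")

# SEGMENT (hypothetical name): one level per CHURN_TYPE x SOC combination
df$SEGMENT <- interaction(df$CHURN_TYPE, df$SOC)

p <- ggplot(df, aes(x = SOC, y = CHURN_RATE, fill = SEGMENT)) +
  geom_col() +                                   # stacked by default
  geom_text(aes(label = NR_LOST), size = 4,
            position = position_stack(vjust = 0.5)) +
  facet_wrap(~ MESE_RIF) +
  scale_fill_manual(values = fill)
p
```

The colours are assigned in the order of levels(df$SEGMENT), so reordering that factor is one way to control which segment gets which colour.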

getting a ggplot2 to display relative contributions of each element to the total

genocount <-ggplot(SNPs, aes(genotype))
genocount + geom_bar()
Creates this Bar Chart:
I would like to be able to display the percentage contribution per chromosome to each genotype in a stacked orientation (those are displayed along the x axis). I've tried some methods that I've seen suggested by others, but they return different errors...I'm not sure if there's an incompatibility with my data set or if it's something else.
Thanks for your help!
library(scales)
ggplot(SNPs, aes(genotype))
genocount + geom_bar(aes(position = "fill", fill = chromosome))+
geom_text(aes(label = percent(chromosome/sum(chromosome))))
scale_y_continuous(labels = percent_format())
I know exactly what you mean.
In the case of a bar chart the code is the following
ggplot(mydf) +
geom_bar(aes(x = var1,y = (..count..)/sum(..count..)),
stat = "count",position = "identity")
In the case of a histogram, the code is the following:
ggplot(data = df) +
geom_histogram(aes(x = var1, y = (..count..)/sum(..count..)),
position = "identity")
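..count.. is a special variable computed by the stat: the number of observations in each bar or bin. Dividing it by sum(..count..) turns the counts into proportions of the total.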
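For what it's worth, ggplot2 3.3.0 and later also accept the spelling after_stat(count) for ..count... The sketch below does not use the asker's data: mpg stands in for SNPs, with class playing the role of genotype and drv the role of chromosome (both are assumptions made purely for illustration). It shows the proportion-of-total bars from this answer and, next to it, a position = "fill" variant, which rescales each stacked bar to 100% and is probably closer to what the question asks for.

```r
library(ggplot2)
library(scales)

# Each bar's height as a share of all observations
p_share <- ggplot(mpg, aes(x = class)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent_format())

# Stacked bars rescaled so that every bar sums to 100%
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  ylab("Share within class")

p_share
p_fill
```

Note that position is an argument of geom_bar() itself, not an aesthetic, so it goes outside aes().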

How to label stacked histogram in ggplot

I am trying to add corresponding labels to the colors in the bars of a histogram. Here is reproducible code.
ggplot(aes(displ),data =mpg) + geom_histogram(aes(fill=class),binwidth = 1,col="black")
This code gives a histogram with different colors for the car "class" in the histogram bars. But is there any way I can add the labels of the "class" inside the corresponding colors in the graph?
The inbuilt functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, if you are looking to do more advanced styling it is often required to create the data before you build the plot. In your case you have overlapping labels which are visually messy.
The following code builds a binned frequency table for the data frame:
library(ggplot2)
library(plyr)      # provides ddply()
library(reshape2)  # provides melt()
# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
melt(table(mpg_df[, c("displ", "class")]))
# Bin Data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)
# Count the data
mpg_df <- ddply(mpg_df, .(mpg_df$class, mpg_df$bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")
I don't think it makes a lot of sense duplicating the labels, but it may be more useful showing the frequency of each group:
ggplot(mpg_df, aes(x = bin, y = Freq, fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")
Update
I realised you can actually filter the labels selectively using the special ggplot2 variable ..count.., so there is no need to preformat the data!
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))
This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)
This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.
Looking at the examples from the other stackoverflow links you shared, all you need to do is change the vjust parameter.
ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", vjust=1.5)
That said, it looks like you have other issues. Namely, the labels stack on top of each other because there aren't many observations at each point. Instead I'd just let people use the legend to read the graph.

What is the simplest method to fill the area under a geom_freqpoly line?

The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
I would prefer to have a filled area, such as provided by geom_density, but without smoothing the line:
The geom_area has been suggested, is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or, does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution:
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + theme(axis.text.x=element_text(angle=-60, hjust=0, size = 6))
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
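As a rough illustration of that call on data that ships with ggplot2: mpg$hwy is only a stand-in for the tweet intervals, binwidth = 2 is an arbitrary choice, and after_stat(count) is the newer spelling of ..count.. (ggplot2 3.3.0 and later). The filled area traces the binned counts with no smoothing. The sketch assumes a continuous x; with a factor on the x-axis, as in the question, newer ggplot2 versions may need stat = "count" and an explicit group instead.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy)) +
  # Filled area whose height is the number of observations per bin
  geom_area(aes(y = after_stat(count)), stat = "bin", binwidth = 2,
            fill = "steelblue", alpha = 0.6) +
  # Outline drawn from the same bin width, for comparison
  geom_freqpoly(binwidth = 2)
p
```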
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon. There is good documentation here.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line filled in underneath then you should look at geom_area which I haven't personally used but it appears the construct will be almost the same.
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index column (just the row number of the data), use that as x, and then use
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = index, labels = intervals)
