geom_text with faceted barplot messes up scales - r

I'm trying to facet a barplot of survey responses. Each facet should show the percentage of answers to the question, and each facet should be annotated with the number of respondents. Everything seems to work fine until I add the annotation with geom_text, which messes up the scale. I can't figure out why.
This seems to work fine
ggplot(dat2, aes(Q1a)) +geom_bar(aes(y = ..prop.., group = D3b) ) +
coord_flip() + facet_wrap(~ D3b) + labs(x="",y="") +
scale_y_continuous(labels = percent)
and gets me this, which has the barplots scaled the way I need them
But when I add geom_text the scale gets all messed up
ggplot(dat2, aes(Q1a)) +geom_bar(aes(y = ..prop.., group = D3b) )+
coord_flip() + facet_wrap(~ D3b) + labs(x="",y="") +
scale_y_continuous(labels = percent)+ geom_text(data=dat2.cor,
aes(x=0,y=0,label=n), nudge_x = 10, nudge_y=20,
colour="black",inherit.aes=FALSE)
Like so:
dat2.cor is a separate dataset I created with the data to annotate the facets, which looks like this:
Does anybody have any idea what's going on here?
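One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: nudge_x and nudge_y are expressed in data units, and the y axis here holds proportions between 0 and 1, so nudge_y = 20 places the label at y = 20 (2000%) and the scale stretches to fit it. The example below uses a small made-up data frame (dat and counts are hypothetical stand-ins for dat2 and dat2.cor) and pins the label to a panel corner with Inf instead of nudging. It assumes ggplot2 3.3.0 or later for after_stat(), which replaces the older ..prop.. notation.

```r
library(ggplot2)

# Hypothetical stand-ins for dat2 and dat2.cor from the question
dat <- data.frame(
  Q1a = c("Yes", "No", "Yes", "Yes", "No", "No"),
  D3b = c("A", "A", "A", "B", "B", "B")
)
tab <- table(dat$D3b)
counts <- data.frame(D3b = names(tab), n = as.vector(tab))

p <- ggplot(dat, aes(Q1a)) +
  geom_bar(aes(y = after_stat(prop), group = D3b)) +
  facet_wrap(~ D3b) +
  # Inf positions sit at the panel edge and are ignored when the scale
  # limits are computed, so the 0-1 proportion axis is left alone.
  geom_text(data = counts,
            aes(x = Inf, y = Inf, label = paste("n =", n)),
            hjust = 1.1, vjust = 1.5, inherit.aes = FALSE) +
  coord_flip()

print(p)
```

If a fixed position is preferred over a corner, keeping y inside the 0 to 1 range (for example y = 0.9 with no nudge) should have the same effect on the scale.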

Related

Change axis breaks/limits of ggplot with geom_col

I am having problems with changing the axis ticks in a barplot. I am fairly new in using ggplot so the answer might be very obvious.
Here is some data (yes it is strange, but designed to mimic the original dataset I have, which I am not allowed to share):
lab='this is just a very long example text and it will be longer and longer and longer and longer and longer and longer and longer and longer and longer and end'
number=1:20
n=unlist(lapply(number,paste,value=lab))
a=round(runif(n=20,min=-48000,max=-40000))
b=round(runif(n=20,min=-48000,max=-40000))
c=round(runif(n=20,min=-48000,max=-40000))
d=data.frame(cbind(n,a,b,c))
df=pivot_longer(d,cols=c('a','b','c'))
l1=round(as.numeric(min(df$value))/1000 )*1000+1000
l2=round(as.numeric(max(df$value))/1000 )*1000-1000
lim=seq(from=l1,to=l2,by=-1000)
colScale <- scale_fill_manual(name = "n",values = c(rainbow(nrow(df)/3)))
from which I create a barplot
p1=ggplot(df, aes(name, value, fill = as.factor(n))) +
geom_col(position = "dodge",colour='black') +
#scale_y_continuous(breaks = lim , labels = as.character(lim)) +
coord_flip() +
theme_bw() +
theme(axis.text.x=element_text(angle=90),axis.title.x=element_text(face='bold')) +
theme(axis.text.y=element_text(angle=90,size=15)) +
theme(legend.title=element_blank()) +
labs(x = "",y="test") +
colScale +
guides(fill=guide_legend(ncol=1)) +
ggtitle('something') +
theme(plot.title = element_text(hjust = 0.5,size=20))
which is this
that is basically working as I wanted, but the scaling of the x-axis is very unpleasant. What I want instead is an axis, where the breaks and labels are equal to the vector 'lim'. What I understood was that it should be possible to do this by scaling the respective axis as in the commented line. But when I'm trying this I get the error 'Discrete value supplied to continuous scale'. I tried to change the scale to 'scale_y_discrete' but then the ticks disappear completely. I tried everything I could find but nothing worked, so what is wrong?
Based on the answers I changed the plot definition to:
p1=ggplot(df, aes(name, as.numeric(value), fill = as.factor(n))) +
geom_col(position = "dodge",colour='black') +
scale_y_continuous(breaks = lim , labels = as.character(lim)) +
coord_flip() +
theme_bw() +
theme(axis.text.x=element_text(angle=90),axis.title.x=element_text(face='bold')) +
theme(axis.text.y=element_text(angle=90,size=15)) +
theme(legend.title=element_blank()) +
labs(x = "",y="test") +
colScale +
guides(fill=guide_legend(ncol=1)) +
ggtitle('something') +
theme(plot.title = element_text(hjust = 0.5,size=20))
which produced this plot
Now I am able to change the axis ticks, but the plot looks nothing like the first one. My goal is to keep the original look, showing only the top part of the bars.
I'd suggest converting value to numeric with as.numeric (preferably before calling ggplot, but you can do it inside, as below) and using coord_cartesian to specify the "view window". You might also find it simpler to specify your axes in the order you want them rather than using coord_flip, which is mostly unnecessary since ggplot2 3.3.0.
ggplot(df, aes(as.numeric(value), name, fill = as.factor(n))) +
geom_col(position = "dodge",colour='black') +
scale_x_continuous(breaks = lim , labels = as.character(lim)) +
coord_cartesian(xlim = c(min(as.numeric(df$value)), max(as.numeric(df$value))))
# Theming after this up to you
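As background for the as.numeric suggestion: the "Discrete value supplied to continuous scale" error most likely comes from cbind(), which builds a character matrix when one of its inputs (here n) is character, so every column of the resulting data frame stops being numeric. A minimal base R sketch, with shortened toy vectors that mirror the question's names:

```r
# Base R only; a, b and n mimic the question's toy data.
set.seed(1)
n <- paste(1:20, "label")
a <- round(runif(20, min = -48000, max = -40000))
b <- round(runif(20, min = -48000, max = -40000))

# cbind() coerces everything to character because n is character,
# so no column of the resulting data frame is numeric.
d_bad <- data.frame(cbind(n, a, b))
bad_is_numeric <- sapply(d_bad, is.numeric)

# Passing the vectors directly keeps each column's own type.
d_good <- data.frame(n, a, b)
good_is_numeric <- sapply(d_good, is.numeric)

print(bad_is_numeric)
print(good_is_numeric)
```

Building the data frame with data.frame(n, a, b, c) in the first place would make the later as.numeric() calls unnecessary.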

Overlay points (and error bars) over bar plot with position_dodge

I have been trying to look for an answer to my particular problem but I have not been successful, so I have just made a MWE to post here.
I tried the answers here with no success.
The task I want to do seems easy enough, but I cannot figure it out, and the results I get are making me have some fundamental questions...
I just want to overlay points and error bars on a bar plot, using ggplot2.
I have a long format data frame that looks like the following:
mydf <- data.frame(cell=paste0("cell", rep(1:3, each=12)),
scientist=paste0("scientist", rep(rep(rep(1:2, each=3), 2), 3)),
timepoint=paste0("time", rep(rep(1:2, each=6), 3)),
rep=paste0("rep", rep(1:3, 12)),
value=runif(36)*100)
I have attempted to get the plot I want the following way:
myPal <- brewer.pal(3, "Set2")[1:2]
myPal2 <- brewer.pal(3, "Set1")
outfile <- "test.pdf"
pdf(file=outfile, height=10, width=10)
print(#or ggsave()
ggplot(mydf, aes(cell, value, fill=scientist )) +
geom_bar(stat="identity", position=position_dodge(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_manual(values=myPal) +
scale_color_manual(values=myPal2)
)
dev.off()
But I obtain this:
The problem is, there should be 3 "rep" values per "scientist" bar, but the values are ordered by "rep" instead (they should be 1,2,3,1,2,3, instead of 1,1,2,2,3,3).
Besides, I would like to add error bars with geom_errorbar but I didn't manage to get a working example...
Furthermore, overlaying the actual value points on the bars makes me wonder what is actually being plotted here: whether the values are taken properly for each bar, and why the max value (or so it seems) is plotted by default.
The way I think this should be properly plotted is with the median (or mean), adding the error bars like the whiskers in a boxplot (min and max value).
Any idea how to...
... have the "rep" value points appear in proper order?
... change the value shown by the bars from max to median?
... add error bars with max and min values?
I restructured your plotting code a little to make things easier.
The secret is to use proper grouping (which is otherwise inferred from fill and color). Also, since you're dodging on multiple levels, dodge2 has to be used.
When you are unsure about "what is plotted where" in bar/column charts, it's always helpful to add the option color="black", which reveals that things are still stacked on top of each other because of your use of dodge instead of dodge2.
p = ggplot(mydf, aes(x=cell, y=value, group=paste(scientist,rep))) +
geom_col(aes(fill=scientist), position=position_dodge2(.9)) +
geom_point(aes(cell, color=rep), position=position_dodge2(.9), size=5) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
ggsave(filename = outfile, plot=p, height = 10, width = 10)
gives:
Regarding error bars
Since there are only three replicates, I would show the original data points and maybe a violin plot. For completeness' sake I also added a geom_errorbar.
ggplot(mydf, aes(x=cell, y=value,group=paste(cell,scientist))) +
geom_violin(aes(fill=scientist),position=position_dodge(),color="black") +
geom_point(aes(cell, color=rep), position=position_dodge(0.9), size=5) +
geom_errorbar(stat="summary",position=position_dodge())+
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
gives
Update after comment
As I mentioned in my comment below, the stacking of the percentages leads to an undesirable outcome.
ggplot(mydf, aes(x=paste(cell, scientist), y=value)) +
geom_bar(aes(fill=rep),stat="identity", position=position_stack(),color="black") +
geom_point(aes(color=rep), position=position_dodge(.9), size=3) +
facet_grid(timepoint~., scales="free_x", space="free_x") +
scale_y_continuous("% of total cells") +
scale_fill_brewer(palette = "Set2")+
scale_color_brewer(palette = "Set1")
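The question's other two requests (bars showing the median, and error bars spanning min to max) can be sketched with stat_summary. This is one possible approach and not the code from the answer above; it assumes ggplot2 3.3.0 or later, where stat_summary takes the fun, fun.min and fun.max arguments, and it regenerates the question's mydf with a fixed seed.

```r
library(ggplot2)

set.seed(42)
mydf <- data.frame(
  cell = paste0("cell", rep(1:3, each = 12)),
  scientist = paste0("scientist", rep(rep(rep(1:2, each = 3), 2), 3)),
  timepoint = paste0("time", rep(rep(1:2, each = 6), 3)),
  rep = paste0("rep", rep(1:3, 12)),
  value = runif(36) * 100
)

# One shared dodge so bars, whiskers and points line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(mydf, aes(cell, value, fill = scientist)) +
  # Bar height is the median of the three replicates
  stat_summary(fun = median, geom = "bar", position = dodge) +
  # Whiskers run from the smallest to the largest replicate
  stat_summary(fun = median, fun.min = min, fun.max = max,
               geom = "errorbar", position = dodge, width = 0.3) +
  # Raw replicate values drawn on top of each bar
  geom_point(position = dodge, size = 2) +
  facet_grid(timepoint ~ .) +
  scale_y_continuous("% of total cells")

print(p)
```

Because the bar is now a summary of the replicates rather than a set of overlapping bars, its height no longer depends on drawing order.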

getting a ggplot2 to display relative contributions of each element to the total

genocount <-ggplot(SNPs, aes(genotype))
genocount + geom_bar()
Creates this Bar Chart:
I would like to be able to display the percentage contribution per chromosome to each genotype in a stacked orientation (those are displayed along the x axis). I've tried some methods that I've seen suggested by others, but they return different errors...I'm not sure if there's an incompatibility with my data set or if it's something else.
Thanks for your help!
library(scales)
ggplot(SNPs, aes(genotype))
genocount + geom_bar(aes(position = "fill", fill = chromosome))+
geom_text(aes(label = percent(chromosome/sum(chromosome))))
scale_y_continuous(labels = percent_format())
I know exactly what you mean.
In the case of a bar chart the code is the following
ggplot(mydf) +
geom_bar(aes(x = var1,y = (..count..)/sum(..count..)),
stat = "count",position = "identity")
In the case of a histogram, the code is the following:
ggplot(data = df) +
geom_histogram(aes(x = var1, y = (..count..)/sum(..count..)),
position = "identity")
Don't ask me what ..count.. is.
I only know it is black magic that works.
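For what it is worth, ..count.. is not magic: it is a variable computed by the counting stat behind geom_bar (the number of rows in each bar), and the double dots tell ggplot2 to look it up in the stat's output instead of the original data. Since ggplot2 3.3.0 the same thing is written after_stat(count). The sketch below uses mtcars as a stand-in because the SNPs data is not available; the second plot shows position = "fill" passed as a geom argument (not inside aes()), which is one way to get the stacked per-category shares the question asks about.

```r
library(ggplot2)

# Share of all rows that falls in each bar, using the computed variable `count`
p_share <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = function(v) paste0(round(100 * v), "%"))

# Stacked bars rescaled so every bar sums to 100%;
# position is an argument of geom_bar(), not an aesthetic.
p_fill <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = function(v) paste0(round(100 * v), "%"))

print(p_share)
print(p_fill)
```

Here cyl and gear play the roles of genotype and chromosome from the question.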

ggplot2: geom_bar with group, position_dodge and fill

I am trying to generate a barplot such that the x-axes is by patient with each patient having multiple samples. So for instance (using the mtcars data as a template of what the data would look like):
library("ggplot2")
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
This would produce something like this:
With each barplot representing a sample in each patient.
I want to add additional information about each patient sample by using colors to fill the barplots (e.g. different types of mutations in each patient sample). I was thinking I could specify the fill parameter like this:
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But this doesn't produce "stacked barplots" for each patient sample barplot. I am assuming this is because position_dodge() is set. Is there any way to get around this? Basically, what I want is:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar() +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But with these colors available in the first plot I listed. Is this possible with ggplot2?
I think facets are the closest approximation to what you seem to be looking for:
ggplot(mtcars, aes(x = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample") +
facet_wrap(~cyl)
I haven't found anything related in the issue tracker of ggplot2.
If I understand your question correctly, you want to pass in aes() into your geom_bar layer. This will allow you to pass a fill aesthetic. You can then place your bars as "dodge" or "fill" depending on how you want to display the data.
A short example is listed here:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar(aes(fill = factor(vs)), position = "dodge") +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
With the resulting plot: http://imgur.com/ApUJ4p2 (sorry S/O won't let me post images yet)
Hope that helps!
I have hacked around this a few times by layering multiple geom_cols on top of each other in the order I prefer. For example, the code
ggplot(data, aes(x=cert, y=pct, fill=Party, group=Treatment, shape=Treatment)) +
geom_col(aes(x=cert, y=1), position=position_dodge(width=.9), fill="gray90") +
geom_col(position=position_dodge(width=.9)) +
scale_fill_manual(values=c("gray90", "gray60"))
Allowed me to produce the feature you're looking for without faceting. Notice how I set the background layer's y value to 1. To add more layers, you can just cumulatively sum your variables.
Image of the plot:
I guess, my answer in this post will help you to build the chart with multiple stacked vertical bars for each patient ...
Layered axes in ggplot?
One way I don't see suggested above is to use facet_wrap to group samples by patient and then stack mutations by sample. This removes the need for dodging. I also changed which mtcars attributes are used, to match the question and to get more variety in the mutations attribute.
patients <-c('Tom','Harry','Sally')
samples <- c('S1','S2','S3')
mutations <- c('M1','M2','M3','M4','M5','M6','M7','M8')
ds <- data.frame(
patients=patients[mtcars$cyl/2 - 1],
samples=samples[mtcars$gear - 2],
mutations=mutations[mtcars$carb]
)
ggplot(
ds,
aes(
x = factor(samples),
group = factor(mutations),
fill = factor(mutations)
)
) +
geom_bar() +
facet_wrap(~patients,nrow=1) +
ggtitle('Patient') +
xlab('Sample') +
ylab('Number of Mutations per Patient Sample') +
labs(fill = 'Mutation')
Output now has labels that match the specific language of the request...easier to see what is going on.

Enforcing more logarithmic axis ticks under free scales in facet_wrap() of ggplot2

Apologies for not providing my data. It is too big, and with mock data I wasn't able to reproduce the problem.
If you look at this:
You will notice that some facets have no x-tick or only one. I want to enforce at least 2 ticks per facet, even though the axis is logarithmically scaled. Is it possible?
The graph was produced from this code:
ggplot(data, aes(x = Frequency, y = Treatment)) +
facet_wrap(~ SG, scale = "free_x") +
geom_point(aes(col = factor(Treatment)), shape = "|") +
scale_color_manual(values = somecols, guide=FALSE) +
scale_x_log10()
