ggplot2 adding stacked barchart to heatmap - r
I would like to add functional information to a HeatMap (geom_tile). I've got the following simplified DataFrame and R code producing a HeatMap and a separate stacked BarPlot (in the right order, corresponding to the HeatMap).
Question:
How can I add the BarPlot to the right side of the HeatMap? It shouldn't overlap any of the tiles, and the tiles of the BarPlot should align with the rows of the HeatMap.
Data:
AccessionNumber <- c('A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0')
Bincode <- c(13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1)
MMName <- c('amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS')
cluster <- c(1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4)
variable <- c('rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96')
value <- c(2.15724042939,1.48366099919,1.29388509992,1.59969471112,1.82681962192,2.13347487296,1.08298157478,1.20709456306,1.02011775131,0.88018823632,1.41435923375,1.31680079684,1.32041325076,1.23402873856,2.04977975574,1.90651971106,0.911615352178,1.05021352328,1.18437303394,1.05620421143,1.02132613918,1.22080237755,1.40759491365,1.43131574695,1.65848581311,1.91886008221,0.639581269674,1.11779720968,1.09406554542,1.02259316617,1.00529867534,1.30885290475,1.39376458384,1.35503544429,1.81418617518,1.92505106722,0.862870707741,1.0832577668,1.03118887309,1.21310404226)
df <- data.frame(AccessionNumber, Bincode, MMName, cluster, variable, value)
HeatMap plot:
library(ggplot2)
hm <- ggplot(df, aes(x=variable, y=AccessionNumber))
hm + geom_tile(aes(fill=value), colour = 'white') + scale_fill_gradient2(low='blue', midpoint=1, high='red')
stacked BarPlot:
bp <- ggplot(df, aes(x=sum(df$Bincode), fill=MMName))
bp + stat_bin(aes(ymax = ..count..), binwidth = 1, geom='bar')
Thank you very much for your help/support!!
The entries on the y-axis are sorted first by increasing "cluster", then alphabetically by "AccessionNumber". This is true for both the HeatMap and the BarPlot. The values appear in the same order in both plots, but the plots show two different variables (same number of rows, same order, different content). The HeatMap displays a continuous variable, whereas the BarPlot displays a categorical variable. Combining the two plots would therefore display additional information alongside the HeatMap.
Please help!
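One possible approach, offered only as a sketch: draw MMName as a one-column strip of tiles that shares the HeatMap's y ordering, then place that strip to the right of the HeatMap. The layout step below uses the patchwork package, which is an assumption on my part (the question names no layout tool), and the names acc, key, acc_order, hm_plot, ann_plot and combined are illustrative. The data is a trimmed stand-in for the question's df (two time points, values rounded).

```r
library(ggplot2)

# Trimmed stand-in for the question's df: 10 proteins, 2 time points
acc <- c('A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4',
         'DY615543_2','G7I6Q7','G7I9C1','G7I9Z0')
df <- data.frame(
  AccessionNumber = rep(acc, 2),
  MMName = rep(c('amino acid metabolism','C1-metabolism','protein',
                 'tetrapyrrole synthesis','PS','PS','not assigned',
                 'secondary metabolism','glycolysis','PS'), 2),
  cluster = rep(c(1,2,2,2,3,3,4,4,4,4), 2),
  variable = rep(c('rd2c_24','rd2c_48'), each = 10),
  value = c(2.157, 1.484, 1.294, 1.600, 1.827, 2.133, 1.083, 1.207, 1.020, 0.880,
            1.414, 1.317, 1.320, 1.234, 2.050, 1.907, 0.912, 1.050, 1.184, 1.056)
)

# Fix the row order once: by cluster, then alphabetically by accession
key <- unique(df[, c("AccessionNumber", "cluster")])
key$AccessionNumber <- as.character(key$AccessionNumber)
acc_order <- key$AccessionNumber[order(key$cluster, key$AccessionNumber)]
df$AccessionNumber <- factor(df$AccessionNumber, levels = acc_order)

# The heatmap from the question
hm_plot <- ggplot(df, aes(x = variable, y = AccessionNumber)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient2(low = "blue", midpoint = 1, high = "red")

# One tile per protein, coloured by functional category, same y levels
ann <- unique(df[, c("AccessionNumber", "MMName")])
ann_plot <- ggplot(ann, aes(x = "function", y = AccessionNumber)) +
  geom_tile(aes(fill = MMName), colour = "white") +
  theme(axis.title = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

# Assumption: patchwork is installed; it aligns the two panels side by side
if (requireNamespace("patchwork", quietly = TRUE)) {
  combined <- patchwork::wrap_plots(hm_plot, ann_plot, widths = c(4, 1))
  # print(combined) draws the combined figure
}
```

Because both plots use the same factor levels on a discrete y-axis, each annotation tile should sit on the same row as the corresponding HeatMap tiles. The strip does not overlap the HeatMap because it is a separate panel.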
Related
visualize relationship between categorical variable and frequency of other variable in one graph?
how in R, should I have a histogram with a categorical variable in x-axis and the frequency of a continuous variable on the y axis? is this correct?
There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the ggplot2 package, there are at least a couple of ways to render histograms by group on a single page of results.
First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that has one of four values, A, B, C, or D.
set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A","B","C","D"),200), y_value = rnorm(800,mean=100,sd = 20))
The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations. Next, we will render this in ggplot2::ggplot() as a histogram, where the color of the bars is based on the value of group.
library(ggplot2)
ggplot(data = df,aes(x = y_value, fill = group)) + geom_histogram()
In this style of histogram the values from each group are stacked atop each other (i.e. the frequency of group A is added to B, etc. before rendering the chart), which might not be what the original poster intended. We can verify the "stacking" behavior by removing the fill = group argument from aes().
# verify the stacking behavior
ggplot(data = df,aes(x = y_value)) + geom_histogram()
The output looks just like the first chart, but drawn in a single color.
Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.
ggplot(data = df,aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
The facet approach makes it easier to see differences in frequency of y values between the groups.
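The stacking behavior can also be checked numerically rather than by eye. The following is a rough sketch in base R, not part of the original answer: it bins the data with hist() on a shared set of 30 bins (the bin edges are my own choice and will not match the ones geom_histogram() picks) and confirms that the per-group counts add up to the overall counts in every bin.

```r
set.seed(950141237) # same seed as in the answer
df <- data.frame(group = rep(c("A","B","C","D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Shared bin edges (30 bins) so per-group and overall counts are comparable
breaks <- seq(floor(min(df$y_value)), ceiling(max(df$y_value)), length.out = 31)

# Counts per bin for all 800 values
overall <- hist(df$y_value, breaks = breaks, plot = FALSE)$counts

# Counts per bin for each group: a 30 x 4 matrix, one column per group
by_group <- sapply(split(df$y_value, df$group),
                   function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

# The height of a stacked bar is the row sum across groups
stacked_height <- rowSums(by_group)
all(stacked_height == overall)
```

If the final expression is TRUE, the stacked histogram and the single-color histogram have identical bar heights, which is why the two charts share the same outline.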
(ggplot) Percentage labels in stacked barplot with two categorical variables
I am finding it impossible to create the appropriate labels when plotting two categorical variables in a 100% stacked barplot. Consider the code below (a fictional dataset that reproduces my problem):
data <- data.frame( gender=sample(c("M", "F"), 40, replace=TRUE), football=sample(c("Yes", "No"), 40, replace=TRUE) )
What I am trying to do is create a 100% stacked barplot and display labels for each category. I succeeded in creating the plot with the following code (both ways produce the same plot):
library(ggplot2)
library(scales) # provides percent
ggplot(data, aes(gender))+ geom_bar(aes(fill=football), position="fill")+ scale_y_continuous(labels=percent)
ggplot(data, aes(gender, ..count..))+ geom_bar(aes(fill=football), position="fill", stat="count")+ scale_y_continuous(labels=percent)
I understand that to create percentage labels I first need to compute the cumulative sum. However, I cannot find a way to use "cumsum(var)" properly with a categorical variable. The closest I have got is the following:
ggplot(data, aes(gender))+ geom_bar(aes(fill=football), position="fill")+ geom_text(aes(label=(..count../sum(..count..))*100, fill=football), stat="count")+ scale_y_continuous(labels=percent)
But if you run the code above you will see that the percentages refer to the total number of observations (not to the categories within "gender"), and the y-axis gets messy. Any help will be truly appreciated. Thanks!
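One way to get labels that refer to each gender rather than to all observations is to compute the shares before plotting. This is only a sketch under my own assumptions: the seed is added for reproducibility, and the names counts, share, label and p are illustrative. It counts each gender/football combination with table(), divides each count by its gender's total with ave(), and lets position_fill() place the labels in the middle of each segment.

```r
library(ggplot2)

set.seed(1) # assumption: seed added so the sketch is reproducible
data <- data.frame(
  gender   = sample(c("M", "F"), 40, replace = TRUE),
  football = sample(c("Yes", "No"), 40, replace = TRUE)
)

# Count every gender/football combination
counts <- as.data.frame(table(gender = data$gender, football = data$football))

# Share of each combination within its own gender (not out of all 40 rows)
counts$share <- counts$Freq / ave(counts$Freq, counts$gender, FUN = sum)
counts$label <- sprintf("%.0f%%", 100 * counts$share)

# 100% stacked bars with one label centred in each segment
p <- ggplot(counts, aes(x = gender, y = Freq, fill = football)) +
  geom_col(position = "fill") +
  geom_text(aes(label = label), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent)
```

Precomputing the shares keeps the label logic out of the plotting call, and because the labels use the same fill position as the bars, the y-axis stays on the 0 to 100% scale.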
Histogram color fills with categorical variables in R
I am trying to create a plot like this:
qplot(carat, data = diamonds, geom = "histogram", fill = color)
However, instead of a quantitative variable on the x-axis, I am using categorical data. I am using a data frame like this:
refBases=c("A","A","A","C","C","C","G","G","G","T","T","T")
altBases=c("C","G","T","A","G","T","A","C","T","A","C","G")
myDF <- data.frame(ref=refBases, alt=altBases, Freq=c(5,2,3,6,9,6,8,6,7,4,6,4))
I would like my plot to look the same, except that the x-axis will be four bins from the ref column (A,C,G,T), the y-axis will be the Freq, and the color legend will be the four values in the alt column (A,C,G,T). In other words, there will be four ref bins on the x-axis, each divided into three parts along the y-axis, with the color legend indicating the alt value. I get something rather silly when I try what I expect to work:
qplot(ref,Freq,data=myDF,fill=alt)
What you're describing doesn't sound like a histogram (which is a specific plot for estimating the distribution of a continuous variable); it sounds like you just want a bar chart. I believe this is what you're looking for:
myDF <- data.frame( ref=c("A","A","A","C","C","C","G","G","G","T","T","T"), alt=c("C","G","T","A","G","T","A","C","T","A","C","G"), Freq=c(5,2,3,6,9,6,8,6,7,4,6,4) )
library(ggplot2)
ggplot(myDF, aes(ref, Freq, fill=alt)) + geom_bar(stat="identity", position="dodge")
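The question asks for each ref bin to be "divided into three parts along the y-axis", which reads more like stacked than side-by-side bars. As a hedged variant of the answer's code (the names bar_height and p_stacked are mine), switching the position to "stack" gives one bar per ref value, split by alt:

```r
library(ggplot2)

myDF <- data.frame(
  ref  = c("A","A","A","C","C","C","G","G","G","T","T","T"),
  alt  = c("C","G","T","A","G","T","A","C","T","A","C","G"),
  Freq = c(5,2,3,6,9,6,8,6,7,4,6,4)
)

# Total height of each stacked bar: the sum of Freq within each ref bin
bar_height <- tapply(myDF$Freq, myDF$ref, sum)

# geom_col() uses the y values as given; "stack" piles the alt segments
p_stacked <- ggplot(myDF, aes(ref, Freq, fill = alt)) +
  geom_col(position = "stack")
```

Whether stacked or dodged bars are the better fit depends on whether the totals per ref bin or the individual alt frequencies matter more to the reader.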
Problems making a graphic in ggplot
I am working with ggplot and want to design a graphic with two continuous variables, x and y. My problem is that I can't get it to show circles along the line of the plot. I would like the plot to have a circle for each pair of observations from the continuous variables. For example, in the attached graphic, there is a circle for each of the pairs (1,1), (2,2) and (3,3). Is it possible to get this? (The colour of the line doesn't matter.)
library(ggplot2)
# dummy data
dat <- data.frame(x = 1:5, y = 1:5)
ggplot(dat, aes(x,y,color=x)) + geom_line(size=3) + geom_point(size=10) + scale_colour_continuous(low="blue",high="red")
Playing with low/high will change the colours. In general, to remove the legend, use + theme(legend.position="none")
Scatterplot with single regression line despite two groups using ggplot2
I would like to produce a scatter plot with ggplot2 that contains a single regression line through all data points (regardless of which group they are from), but at the same time varies the shape of the markers by the grouping variable. The code below produces the group markers, but comes up with TWO regression lines, one for each group.
#model=lm(df, ParamY~ParamX)
p1<-ggplot(df,aes(x=ParamX,y=ParamY,shape=group)) + geom_point() + stat_smooth(method=lm)
How can I program that?
You shouldn't have to redo your full aes in geom_point or add another layer; just move the shape aes into the geom_point call:
library(ggplot2)
df <- data.frame(x=1:10,y=1:100+5,grouping = c(rep("a",10),rep("b",10)))
ggplot(df,aes(x=x,y=y)) + geom_point(aes(shape=grouping)) + stat_smooth(method=lm)
EDIT: To help with your comment: annotate can end up, for me anyway, putting the same labels on every facet. Instead, I like to make a mini data.frame that holds the faceting variable with its levels, along with whatever columns the labels need (here, their positions). In this case the label data frame is called dflabs. Then use this data frame to label the facets individually, e.g.
df <- data.frame(x=1:10,y=1:10,grouping = c(rep("a",5),rep("b",5)),faceting=c(rep(c("oneR2","twoR2"),5)))
dflabs <- data.frame(faceting=c("oneR2","twoR2"),posx=c(7.5,7.5),posy=c(2.5,2.5))
ggplot(df,aes(x=x,y=y,group=faceting)) + geom_point(aes(shape=grouping),size=5) + stat_smooth(method=lm) + facet_wrap( ~ faceting) + geom_text(data=dflabs,aes(x=posx,y=posy,label=faceting))