I'm making slow progress in R, but I'm now able to do some things.
Right now I'm plotting the effects of 4 treatments on plant growth in one graph. As you can see, the error bars overlap, which is why I made them different colors. I think the graph would be clearer if the lower two lines used only the lower error bars, as "half whiskers", and the top two lines used only the upper error bars (as they do now); see the attached image for reference.
Is that doable with the way my script is set up now?
Here is the part of my script that specifies the plot itself (I have a lot more, but I've left out the aesthetics and such). Thanks in advance:
"soda1" is my altered data frame, set up in a clear way; "sdtv" holds the standard deviation for each timepoint/treatment; "oppervlak" is my y variable and "Measuring Date" is my x variable. "Tray ID" is the treatment, i.e. my grouping variable.
library(ggplot2)
p <- ggplot(soda1, aes(x = reorder(`Measuring Date`, oppervlak), y = oppervlak,
                       group = `Tray ID`, fill = `Tray ID`, colour = `Tray ID`)) +
  scale_fill_brewer(palette = "Spectral") +
  # upper half-whiskers only: from oppervlak up to oppervlak + sdtv
  geom_errorbar(aes(ymin = oppervlak, ymax = oppervlak + sdtv), width = 0.1) +
  geom_line(aes(linetype = `Tray ID`)) +
  # inherit x and y so the points share the lines' (reordered) x positions
  geom_point(aes(shape = `Tray ID`))
print(p)
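If you do want half whiskers, one way is to precompute per-row ymin/ymax columns and choose the direction depending on whether a treatment is one of the upper lines. The sketch below is only an illustration: the small data frame `soda`, its columns `date`, `tray`, `lo` and `hi`, and the rule "trays whose mean is above the median of the tray means get upward whiskers" are all hypothetical stand-ins, not the original soda1 data.

```r
library(ggplot2)

# Hypothetical stand-in for soda1: 4 trays measured on 5 dates
soda <- data.frame(date = rep(1:5, times = 4),
                   tray = factor(rep(c("A", "B", "C", "D"), each = 5)))
soda$oppervlak <- soda$date * as.integer(soda$tray)  # made-up growth values
soda$sdtv <- 0.5                                     # made-up standard deviation

# Assumed rule: a tray is an "upper" line if its mean exceeds the median of the tray means
tray_mean <- ave(soda$oppervlak, soda$tray)          # each row gets its own tray's mean
upper <- tray_mean > median(unique(tray_mean))

# Upper lines get whiskers pointing up, lower lines get whiskers pointing down
soda$lo <- ifelse(upper, soda$oppervlak, soda$oppervlak - soda$sdtv)
soda$hi <- ifelse(upper, soda$oppervlak + soda$sdtv, soda$oppervlak)

p_half <- ggplot(soda, aes(x = date, y = oppervlak, colour = tray)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = lo, ymax = hi), width = 0.1)
print(p_half)
```

With the real data, only the two ifelse() columns and the rule that defines `upper` would need adapting; the rest of the script can keep mapping ymin/ymax to those columns.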
Showing only one side of the error bars can hide overlap between the uncertainties of two or more measurements.
Instead of hiding this overlap, you can shift the error bars horizontally by adding position = position_dodge(width = w) to your call to geom_errorbar(), where w is a small offset such as 0.25. Give geom_line() (and geom_point(), if you use it) the same position so each bar stays attached to its line.
For example:
library(ggplot2)
set.seed(1)  # make the random example reproducible
# some random data with two treatments
df <- data.frame(a = rep(1:10, times = 2),
                 b = runif(20),
                 treat = as.factor(rep(c(0, 1), each = 10)),
                 errormax = runif(20),
                 errormin = runif(20))
# one shared dodge so the lines and error bars shift by the same amount
pd <- position_dodge(width = 0.25)
# plotting both sides of the error bars, but dodging them horizontally
p <- ggplot(data = df, aes(x = a, y = b, colour = treat)) +
  geom_line(position = pd) +
  geom_errorbar(aes(ymin = b - errormin, ymax = b + errormax),
                width = 0.2, position = pd)
print(p)
I would like to add functional information to a HeatMap (geom_tile). I have the following simplified data frame and R code, which produce a HeatMap and a separate stacked BarPlot (in the right order, corresponding to the HeatMap).
Question:
How can I add the BarPlot to the right edge of the HeatMap? It shouldn't overlap any of the tiles, and the tiles of the BarPlot should align with the tiles of the HeatMap.
Data:
AccessionNumber <- c('A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0','A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4','DY615543_2','G7I6Q7','G7I9C1','G7I9Z0')
Bincode <- c(13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1,13,25,29,19,1,1,35,16,4,1)
MMName <- c('amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS','amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS','PS','not assigned','secondary metabolism','glycolysis','PS')
cluster <- c(1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4,1,2,2,2,3,3,4,4,4,4)
variable <- c('rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_24','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_48','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_72','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96','rd2c_96')
value <- c(2.15724042939,1.48366099919,1.29388509992,1.59969471112,1.82681962192,2.13347487296,1.08298157478,1.20709456306,1.02011775131,0.88018823632,1.41435923375,1.31680079684,1.32041325076,1.23402873856,2.04977975574,1.90651971106,0.911615352178,1.05021352328,1.18437303394,1.05620421143,1.02132613918,1.22080237755,1.40759491365,1.43131574695,1.65848581311,1.91886008221,0.639581269674,1.11779720968,1.09406554542,1.02259316617,1.00529867534,1.30885290475,1.39376458384,1.35503544429,1.81418617518,1.92505106722,0.862870707741,1.0832577668,1.03118887309,1.21310404226)
df <- data.frame(AccessionNumber, Bincode, MMName, cluster, variable, value)
HeatMap plot:
library(ggplot2)
hm <- ggplot(df, aes(x = variable, y = AccessionNumber))
hm + geom_tile(aes(fill = value), colour = 'white') + scale_fill_gradient2(low = 'blue', midpoint = 1, high = 'red')
stacked BarPlot:
# one stacked bar: a constant x, with segments counting rows per MMName
bp <- ggplot(df, aes(x = sum(Bincode), fill = MMName))
bp + geom_bar()
Thank you very much for your help/support!!
The y-axis values are sorted first by increasing "cluster", then alphabetically by "AccessionNumber", in both the HeatMap and the BarPlot. The rows appear in the same order in both plots but show two different variables (same number of rows, same order, different content): the HeatMap displays a continuous variable, whereas the BarPlot displays a categorical one. Combining the plots would therefore display additional information.
Please help!
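ggplot2 on its own allows only one fill scale per plot, so a continuous heatmap fill and a categorical MMName fill cannot easily share one panel. One workaround is to draw the functional annotation as a second, one-column geom_tile plot and place it beside the heatmap with the patchwork package. This is a minimal sketch, assuming patchwork is installed (it is not used in the question); the data are rebuilt compactly from the question's vectors with rep(), and the names `ids`, `ann_df`, `ann` and `combined` are illustrative. The y order is set explicitly by making AccessionNumber a factor ordered by cluster, then accession; since both plots share those factor levels, their rows line up.

```r
library(ggplot2)
library(patchwork)  # assumption: patchwork is installed

# Rebuild the question's data frame (10 proteins x 4 time points)
ids <- c('A4PU48','A9YWS0','B7FKR5','G4W9I5','B7FGU7','B7FIR4',
         'DY615543_2','G7I6Q7','G7I9C1','G7I9Z0')
mm  <- c('amino acid metabolism','C1-metabolism','protein','tetrapyrrole synthesis','PS',
         'PS','not assigned','secondary metabolism','glycolysis','PS')
df <- data.frame(
  AccessionNumber = rep(ids, times = 4),
  Bincode  = rep(c(13,25,29,19,1,1,35,16,4,1), times = 4),
  MMName   = rep(mm, times = 4),
  cluster  = rep(c(1,2,2,2,3,3,4,4,4,4), times = 4),
  variable = rep(c('rd2c_24','rd2c_48','rd2c_72','rd2c_96'), each = 10),
  value = c(2.15724042939,1.48366099919,1.29388509992,1.59969471112,1.82681962192,
            2.13347487296,1.08298157478,1.20709456306,1.02011775131,0.88018823632,
            1.41435923375,1.31680079684,1.32041325076,1.23402873856,2.04977975574,
            1.90651971106,0.911615352178,1.05021352328,1.18437303394,1.05620421143,
            1.02132613918,1.22080237755,1.40759491365,1.43131574695,1.65848581311,
            1.91886008221,0.639581269674,1.11779720968,1.09406554542,1.02259316617,
            1.00529867534,1.30885290475,1.39376458384,1.35503544429,1.81418617518,
            1.92505106722,0.862870707741,1.0832577668,1.03118887309,1.21310404226),
  stringsAsFactors = FALSE)

# Fix the row order: by cluster, then alphabetically (first level is drawn at the bottom)
ord <- unique(as.character(df$AccessionNumber[order(df$cluster, df$AccessionNumber)]))
df$AccessionNumber <- factor(df$AccessionNumber, levels = ord)

hm <- ggplot(df, aes(x = variable, y = AccessionNumber)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient2(low = "blue", midpoint = 1, high = "red")

# One annotation tile per protein, in a single column
ann_df <- unique(df[, c("AccessionNumber", "MMName")])
ann <- ggplot(ann_df, aes(x = "MMName", y = AccessionNumber, fill = MMName)) +
  geom_tile(colour = "white") +
  theme(axis.title = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

# Place the annotation column to the right; collect both legends on the far right
combined <- hm + ann + plot_layout(widths = c(4, 1), guides = "collect")
print(combined)
```

The widths in plot_layout() only control how much horizontal space the annotation column gets relative to the four heatmap columns; adjust them to taste.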
Using the following data set:
day <- gl(8,1,48,labels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
day <- factor(day, levels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
month<-gl(3,8,48,labels=c("Jan","Mar","Apr"))
month<-factor(month,levels=c("Jan","Mar","Apr"))
snow<-gl(2,24,48,labels=c("Y","N"))
snow<-factor(snow,levels=c("Y","N"))
count <- c(.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429,.94,.95,.96,.98,.93,.94,.99,.9557143,.82,.84,.83,.86,.91,.89,.93,.8685714,1.07,.99,.86,1.03,.81,.92,.88,.9371429)
d <- data.frame(day=day,count=count,month=month,snow=snow)
I like the y-scale in this graph, but not the bars:
library(ggplot2)
library(scales)  # for percent_format()
ggplot() +
  geom_line(data = d[d$day != "Avg", ], aes(x = day, y = count, group = month, colour = month)) +
  geom_col(data = d[d$day == "Avg", ], aes(x = day, y = count, fill = month), position = "dodge") +
  scale_x_discrete(limits = levels(d$day)) +
  facet_wrap(~snow, ncol = 1, scales = "free") +
  scale_y_continuous(labels = percent_format())
I like the points, but not the scale:
ggplot(data = d[d$day == "Avg", ], aes(x = day, y = count, fill = month, group = month, label = month)) +
  facet_wrap(~snow, ncol = 1, scales = "free") +
  geom_line(data = d[d$day != "Avg", ], aes(x = day, y = count, group = month, colour = month), show.legend = FALSE) +
  scale_x_discrete(limits = levels(d$day)) +
  scale_y_continuous(labels = percent_format()) +
  geom_point(aes(colour = month), size = 4, position = position_dodge(width = 1.2))
How can I combine the desirable qualities of the two graphs above?
Essentially, I'm asking: how can I plot the points with a free y-maximum in each panel while fixing the y-minimum at zero?
Note: the solution I'm looking for will apply to about 27 graphs built from one data frame, so I'll upvote solutions that avoid altering individual graphs. I'm hoping for a solution that applies to all the facet-wrapped graphs.
Minor questions (possibly for a separate post):
- How can I add a legend to each of the facet-wrapped graphs?
- How can I change the title of the legend to read "Weekly Average"?
- How can the shape/color of the lines/points be varied and then reported in one single legend?
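There's expand_limits(y = 0), which essentially adds a dummy layer with an invisible geom_blank() only to stretch the scales. Because that dummy layer has no facet variable, it is drawn in every panel, so each free-scaled panel keeps its own maximum while its range always includes zero.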
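Putting the pieces together: keep the points-and-lines version and append expand_limits(y = 0). Since the same single line is added to the plot rather than to any panel, it should carry over unchanged to many faceted graphs. This is a sketch assuming a current ggplot2 (show.legend instead of the older show_guide) and the scales package for percent_format(); the name `p_zero` is illustrative. The labs() line is one possible answer to the legend-title question: it relies on ggplot2 merging the colour and shape legends into one when they share the same title and labels.

```r
library(ggplot2)
library(scales)  # percent_format()

# Same data as in the question
day   <- gl(8, 1, 48, labels = c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Avg"))
month <- gl(3, 8, 48, labels = c("Jan","Mar","Apr"))
snow  <- gl(2, 24, 48, labels = c("Y","N"))
count <- rep(c(.94,.95,.96,.98,.93,.94,.99,.9557143,
               .82,.84,.83,.86,.91,.89,.93,.8685714,
               1.07,.99,.86,1.03,.81,.92,.88,.9371429), times = 2)
d <- data.frame(day = day, count = count, month = month, snow = snow)

p_zero <- ggplot(d[d$day == "Avg", ], aes(x = day, y = count, colour = month)) +
  facet_wrap(~snow, ncol = 1, scales = "free") +
  # daily values as lines, without their own legend entries
  geom_line(data = d[d$day != "Avg", ], aes(group = month), show.legend = FALSE) +
  # weekly averages as dodged points; colour and shape both mapped to month
  geom_point(aes(shape = month), size = 4, position = position_dodge(width = 1.2)) +
  scale_x_discrete(limits = levels(d$day)) +
  scale_y_continuous(labels = percent_format()) +
  # identical titles for colour and shape, so the two legends merge into one
  labs(colour = "Weekly Average", shape = "Weekly Average") +
  # dummy geom_blank layer with no snow column: repeated in every panel
  expand_limits(y = 0)
print(p_zero)
```

The default scale expansion still adds a little padding below zero; scale_y_continuous(expand = ...) can remove it if the axis should start exactly at 0.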