I work with some RNA-seq data, and I need to plot a heatmap with dots on specific gene transcripts. I cannot figure out how to do this with ggplot2 or pheatmap, so I have to use Inkscape to manually place every dot on the plot. It's exhausting and a waste of time. Below is the image from Inkscape:
I've made the basic plot with this code:
library(pheatmap)
pal <- colorRampPalette(c("blue", "white", "red"))
a <- pal(200)
my_sample_col <- data.frame(Condition = c("ALZxCon", "PAxCon", "PSPxCon"))
rownames(my_sample_col) <- colnames(transcript.table[, 1:3])
my_colour <- list(Condition = c(ALZxCon = "lightblue", PAxCon = "pink", PSPxCon = "yellow"))
pheatmap(transcript.table[, 1:3],
annotation_col = my_sample_col, annotation_colors = my_colour[1],
color = a, show_colnames = F, cellheight = 15, cex = 1,
cluster_rows = F, cluster_cols = F,
fontsize_row = 10, gaps_col = c(1, 2), cellwidth = 15)
Where transcript table is something like this:
                        log2FC(AZ)  log2FC(PA)  log2FC(PSP)  Sig(AZ)   Sig(PA)   Sig(PSP)
ABCA7_ENST000002633094  -0.2        -0.3        -0.2         Not Sig   FDR<0.05  FDR<0.05
ABCA7_ENST0000043319    -0.6        -0.37       -0.7         FDR<0.05  FDR<0.05  FDR<0.05
I want to generate a heatmap where the square of the transcripts with FDR < 0.05 gets a black dot. Can you guys help with this?
I'm personally not an enormous fan of functions such as pheatmap, precisely because you can't customise every detail you would want. I'll show an alternative with ggplot2.
First things first, ggplot likes data in a long format, which I would do as follows:
# Loading in your data
z <- "log2FC(AZ),log2FC(PA),log2FC(PSP),Sig(AZ),Sig(PA),Sig(PSP)
ABCA7_ENST000002633094,-0.2,-0.3,-0.2,Not Sig,FDR<0.05,FDR<0.05
ABCA7_ENST0000043319,-0.6,-0.37,-0.7,FDR<0.05,FDR<0.05,FDR<0.05"
tab <- read.table(text=z, header = T, sep = ",")
# Converting to long format
lfc <- tab[,1:3]
pval <- tab[,4:6]
colnames(lfc) <- colnames(pval) <- c("AZ", "PA", "PSP")
lfc <- reshape2::melt(as.matrix(lfc))
pval <- reshape2::melt(as.matrix(pval))
df <- cbind(lfc, pval = pval$value)
Which will get us our main ingredients for the heatmap and the significance dots, but we would need a little extra data.frame for some annotation:
anno <- data.frame(x = levels(df$Var2),
y = "Condition")
Now the trick in getting this annotation to work nicely with the heatmap is a package called ggnewscale, which will allow us to set both a continuous fill for the heatmap and a discrete fill for the annotation. What remains is to make the actual plot, wherein I've tried to conserve some aspects of the pheatmap function in your example.
library(ggplot2)
library(ggnewscale)
ggplot(df, aes(Var2, Var1)) +
# Important for ggnewscale is to specify a fill in the layer/geom itself
geom_tile(aes(fill = value),
width = 0.9, colour = "grey50") +
geom_point(data = df[df$pval == "FDR<0.05",]) +
scale_fill_gradientn(colours = c("blue", "white", "red"),
limits = c(-1,1)*max(abs(df$value)),
name = expression(atop("Log"[2]*" Fold","Change"))) +
# Set new scale fill after you've specified the scale for the heatmap
new_scale_fill() +
geom_tile(data = anno, aes(x, y, fill = x),
width = 0.9, height = 0.8, colour = "grey50") +
scale_fill_discrete(name = "Condition") +
scale_x_discrete(name = "", expand = c(0,0)) +
scale_y_discrete(name = "", expand = c(0,0),
limits = c(levels(df$Var1), "Condition"),
position = "right") +
coord_equal() +
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(face = c(rep("plain", nlevels(df$Var1)), "bold")))
Which looks like this:
Mix and match the ggplot code as you please.
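If you only need the significance dots and not the annotation row, the idea reduces to one extra geom_point layer drawn on the significant rows. Here is a minimal sketch of that, assuming a long-format table; the data frame toy, its column names, and the transcript labels T1/T2 are made up for illustration and are not from the question:

```r
library(ggplot2)

# Hypothetical long-format data: one row per transcript/contrast pair
toy <- data.frame(
  transcript = rep(c("T1", "T2"), each = 3),
  contrast   = rep(c("AZ", "PA", "PSP"), times = 2),
  lfc        = c(-0.2, -0.3, -0.2, -0.6, -0.37, -0.7),
  sig        = c("Not Sig", "FDR<0.05", "FDR<0.05",
                 "FDR<0.05", "FDR<0.05", "FDR<0.05")
)

# Logical flag marking the cells that should get a dot
toy$is_sig <- toy$sig == "FDR<0.05"

p_dots <- ggplot(toy, aes(contrast, transcript)) +
  geom_tile(aes(fill = lfc), colour = "grey50") +
  # Only the significant rows are passed to the point layer
  geom_point(data = subset(toy, is_sig), size = 2) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)

p_dots
```

Because the point layer inherits the x and y mappings from ggplot(), the dots land in the centre of the matching tiles without any further positioning.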
I want to plot some point estimates with a couple of interval estimates around them, and then to superimpose the true point values using a different color and size, with a legend for the color.
I've tried lots of things. If I just use a new call to geom_point, I can't figure out how to add a legend. Therefore, my current approach resorts to stacking the data on top of itself, which is clumsy. Even then, the graph comes out wrong with big blue points for the True values, with the desired orange points on top of them.
I'd appreciate any help I can get.
nms <- c("2.5%","25%","50%","75%","97.5%","dose","truep")
a <- c(9.00614679684893e-44,0.000123271800672435,0.0339603711049475,0.187721170170911,0.67452033450121,5,0.040752445325937)
b <- c(1.59502878028266e-25,0.00328588588499889,0.0738203422543555,0.25210200886225,0.714843425007051,10,0.0885844107052267)
cc <- c(1.41975723605948e-14,0.0184599181547097,0.118284929584256,0.311068595276067,0.74339745948793,15,0.141941915501108)
d <- c(0.0311851190805834,0.154722028150561,0.299318020818234,0.50887634580605,0.838779816278485,25,0.359181624981881)
e <- c(0.0529617924263383,0.289588386297245,0.566777817134668,0.883959271416755,0.999999999999317,40,0.680133380561602)
f <- c(0.0598904847882839,0.327655201251564,0.640100529843672,0.950060245074853,1,50,0.768120635812406)
g <- c(0.0641613025760661,0.355626055560067,0.686504841650593,0.978023943968809,1,60,0.823805809980712)
p <- as.data.frame(t(data.frame(a, b, cc, d, e, f, g)))
names(p) <- nms
# Faff duplicating data
p$truep <- 1.2 * p$truep
p2 <- p
p2[, 1:5] <- p$truep # truep is known, so there are no intervals
p3 <- rbind(p2, p)
p3$wh <- rep((c(2, 3)), each=nrow(p))
p3$col <- rep(c("orange", "blue"), each=nrow(p))
ggplot(p3, aes(dose, `50%`)) +
geom_point(aes(size=wh, color=col)) +
scale_size(range=c(5, 7), guide="none") +
scale_color_manual(name="", labels=c("Prior", "True"), values=c("blue", "orange")) +
geom_pointrange(aes(ymin=`2.5%`, ymax=`97.5%`, x=dose), color="blue") +
geom_pointrange(aes(ymin=`25%`, ymax=`75%`, x=dose), color="blue", size=2) +
geom_point(aes(dose, truep), color="orange") +
theme(axis.text.x=element_text(size=12), axis.title.x=element_text(size=14),
axis.text.y=element_text(size=12), axis.title.y=element_text(size=14),
legend.text=element_text(size=12))
R 3.3.1, ggplot2_2.1.1
Thanks,
Harry
I found a solution by splitting the dataset in two parts:
library(dplyr)
priors <- p%>%
mutate(datatype = 'Prior')
truevals <- p%>%
select(dose, truep)%>%
mutate(datatype = 'True')
ggplot(truevals, aes(x = dose, y = truep, colour = datatype))+
geom_pointrange(data = priors, aes(ymin=`25%`, ymax=`75%`, y = `50%`), size=1.5) +
geom_pointrange(data = priors, aes(ymin=`2.5%`, ymax=`97.5%`, y = `50%`))+
geom_point()+
scale_color_manual(name="", values=c("Prior" = "blue", "True" = "orange")) +
theme(axis.text.x=element_text(size=12), axis.title.x=element_text(size=14),
axis.text.y=element_text(size=12), axis.title.y=element_text(size=14),
legend.text=element_text(size=12))
First we plot the two pointranges based on the dataset with priors. Then the actual values. By adding a row with the datatype to both datasets we can add the legend. The result is this graph:
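A variation on the same idea, if you would rather not build two data frames: mapping colour to a constant string inside aes() also creates a legend entry for that layer. This is only a sketch under that assumption; the data frame est and its column names are hypothetical stand-ins for the question's p, with rounded values:

```r
library(ggplot2)

# Hypothetical, simplified stand-in for the question's data
est <- data.frame(
  dose  = c(5, 10, 15),
  lo    = c(0.00, 0.00, 0.02),
  mid   = c(0.03, 0.07, 0.12),
  hi    = c(0.67, 0.71, 0.74),
  truep = c(0.04, 0.09, 0.14)
)

p_leg <- ggplot(est, aes(x = dose)) +
  # A constant inside aes() becomes a level of the colour scale
  geom_pointrange(aes(y = mid, ymin = lo, ymax = hi, colour = "Prior")) +
  geom_point(aes(y = truep, colour = "True"), size = 3) +
  scale_colour_manual(name = "", values = c(Prior = "blue", True = "orange"))

p_leg
```

The named vector in scale_colour_manual ties each label to its colour, so the order of the layers does not affect which colour goes with which label.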
ggplot2::geom_point() has a show.legend argument, which is NA by default, so setting it to TRUE should help.
You can set the legend labels using the labels argument as follows:
ggplot2::scale_fill_manual(values = c("red", "black"),
labels = c("Number of people",
"Number of birds"))
You are already doing this with labels=c("Prior", "True")
You can also change the look of the legend with:
ggplot2::theme(legend.position = "bottom",
legend.text = ggplot2::element_text(size = 22),
legend.box = "horizontal",
legend.key = ggplot2::element_blank())
I have a data.frame that I'm trying to plot in a facetted manner with R's ggplot's geom_boxplot:
set.seed(1)
vals <- rnorm(12)
min.vals <- vals-0.5
low.vals <- vals-0.25
max.vals <- vals+0.5
high.vals <- vals+0.25
df <- data.frame(sample=c("c0.A_1","c0.A_2","c1.A_1","c1.A_2","c2.A_1","c2.A_2","c0.B_1","c0.B_2","c1.B_1","c1.B_2","c2.B_1","c2.B_2"),
replicate=rep(c(1,2),6),val=vals,min.val=min.vals,low.val=low.vals,max.val=max.vals,high.val=high.vals,
group=c(rep("A",6),rep("B",6)),cycle=rep(c("c0","c0","c1","c1","c2","c2"),2),
stringsAsFactors = F)
In this example there are two factors which I'd like to facet:
facet.factors <- c("group","cycle")
for(f in 1:length(facet.factors)) df[,facet.factors[f]] <- factor(df[,facet.factors[f]],levels=unique(df[,facet.factors[f]]))
levels.vec <- sapply(facet.factors,function(f) length(levels(df[,f])))
But in other cases I may have only one or more than two factors.
Is there a way to pass to facet_wrap the vector of factors by which to facet and the number of columns?
Here's what I tried, where in addition I created my own colors for each factor level:
library(RColorBrewer,quietly=T)
library(scales,quietly=T)
level.colors <- brewer.pal(sum(levels.vec),"Set2")
require(ggplot2)
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(~facet.factors,ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
which obviously throws this error:
Error in combine_vars(data, params$plot_env, vars, drop = params$drop) :
At least one layer must contain all variables used for facetting
Clearly this works:
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(group~cycle,ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
But it ignores the colors I'm passing and doesn't add the legend, I imagine since I cannot pass a vector to col and fill in aesthetics, and clearly I have to hard code the facetting.
This doesn't work either for the facetting problem:
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(facet.factors[1]~facet.factors[2],ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
So my questions are:
1. Is there a way to pass a vector to facet_wrap?
2. Is there a way to color and fill by a vector of factors rather by single ones?
We cannot assign two colours to the colour/fill of a single box, so I suggest pasting the faceting variables together and using the result as the colour/fill variable:
df$col.fill <- Reduce(paste, df[facet.factors])
The facets argument of facet_wrap accepts either a character vector or a one-sided formula:
facet.formula <- as.formula(paste('~', paste(facet.factors, collapse = '+')))
So the code finally looks like this:
ggplot(df,
aes_string(
x = "replicate", ymin = "min.val", ymax = "max.val",
lower = "low.val", middle = "val", upper = "high.val",
col = "col.fill", fill = "col.fill"
)) +
geom_boxplot(position = position_dodge(width = 0),
alpha = 0.5,
stat = "identity") +
facet_wrap(facet.factors, ncol = max(levels.vec)) +
# alternatively: facet_wrap(facet.formula, ncol = max(levels.vec)) +
labs(x = "Replicate", y = "Val") +
scale_x_continuous(breaks = unique(df$replicate)) +
theme_bw() +
theme(
#legend.position = "none",
panel.border = element_blank(),
strip.background = element_blank(),
axis.title = element_text(size = 8)
)
The legend is not displayed because you added legend.position = "none".
BTW, it would definitely improve readability if you added some spaces and line breaks to your code.
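For reference, the two helper steps can also be written with base R functions that do the same job. This is a small sketch, assuming reformulate() and interaction() are acceptable substitutes for the paste-based versions; the data frame toy is made up for illustration:

```r
facet.factors <- c("group", "cycle")

# Build the one-sided formula ~group + cycle from the character vector
facet.formula <- reformulate(facet.factors)

# Hypothetical miniature data set with the two faceting columns
toy <- data.frame(group = c("A", "A", "B"),
                  cycle = c("c0", "c1", "c0"))

# Combine the faceting columns into one factor usable for colour/fill;
# drop = TRUE keeps only the combinations that occur in the data
toy$col.fill <- interaction(toy[facet.factors], sep = " ", drop = TRUE)

print(facet.formula)
print(toy)
```

Either the character vector or the formula can then be handed to facet_wrap, and col.fill can be named in the col and fill aesthetics.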
I'm struggling with ggplot (I always do). There are a number of very similar questions about forcing ggplot to include zero value categories in legends - here and here (for example). BUT I (think I) have a slightly different requirement to which all my mucking about with scale_x_discrete and scale_fill_manual has not helped.
Requirement: As you can see, the right-hand plot has no data in the TM=5 category, so that category is missing from the axis. What I need is for the right-hand plot to show category 5 on the axis, but obviously with no points or box.
Current Plot Script:
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
#plot
p <- ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
geom_point() +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
Attempted Solutions:
Adding drop=FALSE to the scales (suggested by @Jarretinha here) totally borks the margins and x-axis labels
> p + scale_x_discrete(drop=FALSE) + scale_fill_manual(drop=FALSE)
Following logic from here and manually setting the labels in scale_fill_manual does nothing and results in the same right-hand plot from example above.
> p + scale_fill_manual(values = c("red", "blue", "green", "purple", "pink"),
labels = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
drop=FALSE)
Playing with this logic and trying something with scale_x_discrete changes the category names on the x-axis, but the fifth is still missing AND the margins (as in attempt 1) are borked again. It is apparent that scale_x_discrete is important, BUT it is NOT the whole answer
> p + scale_x_discrete(limits = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"), drop=FALSE)
ANSWER for the above example, courtesy of input from @Bouncyball and @aosmith:
#data
plotData <- data.frame("TM" = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
"Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
plotData$TM <- factor(plotData$TM, levels=1:5) # add the desired factor levels to the input data, including the empty one
#vars
xTitle <- bquote("T"["M"])
v.I <- plotData$TM
depVar <- plotData$Score
myPalette <- c('#5c9bd4','#a5a5a4','#4770b6','#275f92','#646464','#002060')
#plot
ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
scale_colour_manual(values = myPalette, drop=F) + # new line added here
scale_x_discrete(drop=F) + # new line added here
theme_bw() +
labs(x=xTitle) +
labs(y=NULL) +
theme(legend.position='none',
axis.text=element_text(size=10, face="bold"),
axis.title=element_text(size=16))
Here's a workaround you could use:
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y))+
geom_boxplot(aes(fill = lets))+
geom_point(data = NULL, aes(x = 'e', y = 0), pch = NA)+
scale_fill_brewer(drop = F, palette = 'Set1')+
theme_bw()
Basically, we plot an "empty" point (i.e. pch = NA) so that the category shows up on the x-axis, but has no visible geom associated with it. We also define our discrete variable, lets as a factor with five levels when only four are present in the data.frame. The missing category is the letter e.
NB: You'll have to adjust the positioning of this "empty" point so that it doesn't skew your y axis.
Otherwise, you could use the result from this answer to avoid having to plot an "empty" point.
# generate dummy data
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = T),
y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y)) +
geom_boxplot(aes(fill = lets)) +
scale_x_discrete(drop = F) +
scale_fill_brewer(drop = F, palette = 'Set1') +
theme_bw()
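Both versions rely on the same base R behaviour: a factor keeps every level you declare, even when no observation uses it, and drop = FALSE simply tells the scale not to discard those empty levels. A quick sketch to check this outside of ggplot2 (the vector tm is a made-up example):

```r
# Declare five levels although only 1-4 occur in the data
tm <- factor(c(3, 2, 3, 4, 1), levels = 1:5)

# table() reports the unused level with a count of zero
counts <- table(tm)
print(counts)

# droplevels() shows what is lost when unused levels are dropped
print(levels(droplevels(tm)))
```

If the variable is left as numeric or character, there is no unused level for the scale to keep, which is why setting drop = FALSE alone has no effect in that case.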
Here is a graph I am trying to develop:
I have row and column coordinate variables, plus three quantitative variables (rectheat = fill of the rectangle heatmap, circlesize = size of the circles, circlefill = fill colour of the circles). NA values should be shown as missing in a different colour (for example gray).
The following is data:
set.seed (1234)
rectheat = sample(c(rnorm (10, 5,1), NA, NA), 7*14, replace = T)
dataf <- data.frame (rowv = rep (1:7, 14), columnv = rep(1:14, each = 7),
rectheat, circlesize = rectheat*1.5,
circlefill = rectheat*10 )
dataf
Here is code that I worked on:
require(ggplot2)
ggplot(dataf, aes(y = factor(rowv),x = factor(columnv))) +
geom_rect(aes(colour = rectheat)) +
geom_point(aes(colour = circlefill, size =circlesize)) + theme_bw()
I am not sure whether geom_rect is appropriate or whether the rest is fine, as I could not get any result other than errors.
Here it is better to use geom_tile (heatmap).
require(ggplot2)
ggplot(dataf, aes(y = factor(rowv),
x = factor(columnv))) + ## global aes
geom_tile(aes(fill = rectheat)) + ## to get the rect filled
geom_point(aes(colour = circlefill,
size =circlesize)) + ## geom_point for circle illusion
scale_color_gradient(low = "yellow",
high = "red")+ ## color of the corresponding aes
scale_size(range = c(1, 20))+ ## to tune the size of circles
theme_bw()
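The question also asked for NA cells to appear in a different colour such as gray. One possible way to get that, sketched here under the assumption that the na.value argument of the continuous fill scale is what you want; the data frame grid_df and its values are made up for illustration:

```r
library(ggplot2)

# Hypothetical 2 x 3 grid with two missing cells
grid_df <- data.frame(
  rowv    = rep(1:2, times = 3),
  columnv = rep(1:3, each = 2),
  heat    = c(4.2, NA, 5.1, 6.0, NA, 5.5)
)

p_na <- ggplot(grid_df, aes(x = factor(columnv), y = factor(rowv))) +
  geom_tile(aes(fill = heat), colour = "white") +
  # na.rm = TRUE silently skips circles for the missing cells
  geom_point(aes(size = heat), na.rm = TRUE) +
  # na.value sets the tile colour used for NA
  scale_fill_gradient(low = "yellow", high = "red", na.value = "grey80") +
  theme_bw()

p_na
```

The tiles with missing heat are still drawn, just in gray, while the point layer leaves them empty.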
The graph I'm currently trying to make falls a little between two stools. I want to make a histogram that is composed of stacked and labelled boxes. Here's an example of exactly the sort of thing I'm talking about, taken from a recent article in the New York Times:
http://farm8.staticflickr.com/7109/7026409819_1d2aaacd0a.jpg
Is it possible to achieve this using ggplot2?
To amplify the question somewhat, so far what I have is:
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15)
)
ggplot(dfr, aes(x=percent, fill=name)) + geom_bar() +
stat_bin(geom="text", aes(label=name))
...which I'm clearly doing all wrong. Ultimately what I'd ideally like is something along the lines of the manually-modified graph below, with (say) letters A to M filled one shade and N to Z filled another.
http://farm8.staticflickr.com/7116/7026536711_4df9a1aa12.jpg
Here you go!
library(plyr) # ddply() and mutate()
library(ggplot2)
set.seed(3421)
# added type to mimic which candidate is supported
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15),
type = sample(c("A", "B"), 26, replace = TRUE)
)
# easier to prepare data in advance. uses two ideas
# 1. calculate histogram bins (quite flexible)
# 2. calculate frequencies and label positions
dfr <- transform(dfr, perc_bin = cut(percent, 5))
dfr <- ddply(dfr, .(perc_bin), mutate,
freq = length(name), pos = cumsum(freq) - 0.5*freq)
# start plotting. key steps are
# 1. plot bars, filled by type and grouped by name
# 2. plot labels using name at position pos
# 3. get rid of grid, border, background, y axis text and labels
ggplot(dfr, aes(x = perc_bin)) +
# reverse = TRUE stacks the first name of each bin at the bottom, matching pos
geom_col(aes(y = freq, group = name, fill = type), colour = 'gray',
position = position_stack(reverse = TRUE), show.legend = FALSE) +
geom_text(aes(y = pos, label = name), colour = 'white') +
scale_fill_manual(values = c('red', 'orange')) +
theme_bw() + xlab("") + ylab("") +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.ticks = element_blank(), panel.border = element_blank(),
axis.text.y = element_blank())
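If you would rather not depend on plyr for the preparation step, the label positions can be computed with base R. This sketch is a variant, not the code above: it assumes every box should be exactly one unit tall, so the label of the k-th box in a bin sits at k - 0.5. The data frame boxes and its column names are hypothetical:

```r
set.seed(3421)

# Hypothetical data in the same shape as dfr
boxes <- data.frame(name = LETTERS[1:26],
                    percent = rnorm(26, mean = 15))

# Histogram bins, as in the answer
boxes$bin <- cut(boxes$percent, 5)

# Running count within each bin, shifted to the middle of a unit-height box
boxes$pos <- ave(rep(1, nrow(boxes)), boxes$bin, FUN = cumsum) - 0.5

print(head(boxes[order(boxes$bin, boxes$pos), ]))
```

With unit-height boxes the height of each stack equals the number of names in that bin, which is what a histogram of counts would show.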