Superscripts in heat plot labels in ggplot r - r

Good morning,
I am making a heat map in ggplot of correlations between specific phenotypes. I would like to label each tile with the R^2 for the association.
I have a correlation matrix, max_all, which looks like this:
phenolist2 pheno1 pheno2 pheno3 pheno4 pheno5
max.pheno1 pheno1 0.05475998 0.05055959 0.05056578 0.10330301 0.05026997
max.pheno2 pheno2 0.15743312 0.05036100 0.05151750 0.04880302 0.31008809
max.pheno3 pheno3 0.05458550 0.07672537 0.04043422 0.16845294 0.14268895
max.pheno4 pheno4 0.05484327 0.04391523 0.05151107 0.09521869 0.19776296
max.pheno5 pheno5 0.08658449 0.05183693 0.16292683 0.22369817 0.53630569
Otherwise, my code is as follows:
tmp_Rsq <- melt(max_all)
tmp_Rsq <- ddply(tmp_Rsq, .(variable), transform, rescale=rescale(value))
labels_Rsq <- expression(paste(R^2, " = ", format(tmp_Rsq$value, digits=2), sep=""))
ggplot(tmp, aes(variable, phenolist2)) +
geom_tile(aes(fill =-log10(value)), colour = "white") +
geom_text(aes(label=as.character(labels_Rsq), parse = TRUE, size=4)) +
scale_fill_gradientn(colours = myPalette(101), name="-log10(P)", limits=c(0 , 3.5)) +
theme(axis.title.x = element_blank(), axis.title.y=element_blank(),
plot.title=element_text(size=20))+
theme(axis.text = element_text(colour="black", face="bold"))
My problem is that I can not get the expression to write out so that 2 is a superscript of R.
I realize there are a number of questions on this website addressing similar issues, for example ggplot2 two-line label with expression, Combining paste() and expression() functions in plot labels and Adding Regression Line Equation and R2 on graph but I have been unable to get the solutions suggested in these answers to apply to my case (likely because I have been trying to use a vector of labels).
Thanks a lot for your help.

Parse needs to be outside the aes, and the labels need to be a character vector.
labels_Rsq <- paste0("R^2 ==", format(tmp_Rsq$value, digits=2))
> head(labels_Rsq)
[1] "R^2 ==0.055" "R^2 ==0.157" "R^2 ==0.055" "R^2 ==0.055" "R^2 ==0.087" "R^2 ==0.051"
ggplot(tmp_Rsq, aes(variable, phenolist2)) +
geom_tile(aes(fill =-log10(value)), colour = "white") +
geom_text(aes(label=as.character(labels_Rsq)), parse = TRUE, size=4) +
# scale_fill_gradientn(colours = myPalette(101), name="-log10(P)", limits=c(0 , 3.5)) +
theme(axis.title.x = element_blank(), axis.title.y=element_blank(),
plot.title=element_text(size=20))+
theme(axis.text = element_text(colour="black", face="bold"))

Related

How to display the variable actual value in ggplot legend

I was trying to do a ggplot scatter plot with var_x, var_y, and var_group as user-defined parameter. The code is as below
var_x='Expression'
var_y='Purity'
var_group = c('Type', 'Stage')
ggplot(data=d, mapping = aes(x=get(var_x), y=get(var_y), color=get(var_group[2]))) +
geom_point(size=2)+
xlab(var_x) +
ylim(0,1) + ylab(var_y)+
theme(legend.position = "top",
legend.text=element_text(size=40),
axis.text = element_text(size=30))+
ggtitle(paste0("Gene expression vs tumor purity in TCGA"))+
stat_poly_line() + stat_poly_eq()+
facet_wrap(.~get(var_group[1]), ncol=wNum, scale='fixed')
It's working but the legend is showing as "get(var_group[2]", while I'd like it to be displayed as the real value ("Type" in my test case). How can I do that?

How to illustrate non available data points in a different shape using ggplot2?

Is there a way to change the shape of the points for missing data in R? I am plotting .csv files like this one in a lollipop style.
Name,chr,Pos,Reads...ME_016,Reads...ME_017,Reads...ME_018,Reads...ME_019
cg01389728,chr10,6620395,33.82,41.38,41.38,38.46
cg01389728,chr10,6620410,0,-,-,-
cg01389728,chr10,6620430,0,0,-,-
cg01389728,chr10,6620447,0,-,0,-
cg01389728,chr10,6620478,0,-,-,-
cg01389728,chr10,6620510,28.33,29.85,25.64,28.13
cg01389728,chr10,6620520,0,0,-,0
cg01389728,chr10,6620531,0,-,50,-
Using ggplot2, my graphs are created with this:
dataset <-read.table("testset", sep=",",na.strings="-", header=TRUE)
dataset <- subset(dataset, select=c(-Name, -chr))
dataset <- melt(dataset, id.vars="Pos")
dataset$variable <- gsub("\\.\\.\\.","_",dataset$variable)
xaxes <- unique(dataset$Pos)
dataset$Pos <- as.factor(dataset$Pos)
ggplot(dataset, aes(x=Pos, y=variable,fill=cut(value, breaks=10))) + geom_point(size=4, shape=21) + geom_line() + scale_fill_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(fill="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
However, I want to set the shape of the missing points ("-") in the plot to an "x", (shape=4) and show them also in the legend.
I've tried approaches like:
scale_fill_manual(values=c(value, NA))
or:
scale_shape_manual(values=c(21,4))
By default, the "-" are also shown with shape 21 and grey colour. There must be a way to manipulate this? Writing a method like this might be the trick, but how to call it for the whole column?
formas <- function(x){
+ if(is.na(x)) forma <- 4
+ if(!is.na(x)) forma <- 21
+ return(forma)
+ }
This comes pretty close, I think.
ggplot(dataset, aes(x=Pos, y=variable,
color=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_shape_manual(name="",values=c(Missing=4,Present=19))+
scale_color_discrete(labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%")) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),plot.title = element_text(vjust=2),axis.title.x = element_text(vjust=-0.5),axis.title.y = element_text(vjust=1.5))
Change are:
used color instead of fill, with shape=19 for points with data
added shape aesthetic to ggplot(...) call.
removed shape=21 from geom_point(...) call.
added scale_shape_manual(...) to define the shapes for Missing and Present, and turn off the guide label.
I know you wanted filled points with a black outline (it does look better), but when I tried that with the added shape aesthetic, the fill legend does not display the colors correctly. Try it yourself.
Here is another approach that comes closer to producing the graph you specified (circular points with black outline and fill color determined by coverage).
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable,
fill=cut(value, breaks=10),
shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
geom_line() +
scale_fill_manual(name="Coverage in %",
values=fill.colors,
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),
drop=FALSE) +
scale_shape_manual(name="",values=c(Missing=4,Present=21),limits=c("Missing"))+
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5))+
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
The problem in the other answer with using point shape 21 and the fill aesthetic is that, while the fill colors are displayed correctly in the plot, they are not displayed correctly in the legend. One way around that is to force ggplot to set the legend fill colors using
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
Unfortunately, to do that you have to specify the fill colors manually (so that the actual fill and the override fill are the same). This code does that using
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
which creates a color palette that mimics the ggplot default. You could of course use your own color palette here.
While this does come closer to your original intent, I actually think the other answer provides a better data visualization. The black outlines around the points, while "attractive", make it much more difficult to distinguish between fill colors, especially with 10 possible colors (which is at the edge of discernability anyway).
I can't see, why this is not working:
fill.colors <- hcl(h=seq(15, 375, length=11), l=65, c=100)[1:10]
ggplot(dataset, aes(x=Pos, y=variable
,color=cut(value, breaks=c(-0.01,10,20,30,40,50,60,70,80,90,100))
,shape=ifelse(is.na(value),"Missing","Present"))) +
geom_point(size=4) +
scale_shape_manual(name="",values=c("Missing"=4,"Present"=19),limits=c("Missing"))+
scale_color_manual(name="Coverage in %",
values=ifelse(is.na(dataset$value),"grey",fill.colors),
labels=c("0-10%","10-20%","20-30%","30-40%","40-50%","50-60%","60-70%","70-80%","80-90%","90-100%"),drop=FALSE) +
theme_bw() +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5),
plot.title = element_text(vjust=2),
axis.title.x = element_text(vjust=-0.5),
axis.title.y = element_text(vjust=1.5)) +
xlab("CpG Positions") +
ylab("Sample") +
labs(color="Coverage in %") +
guides(fill=guide_legend(override.aes=list(colour=fill.colors),order=1))
NA values are not shown anymore with an X, and instead of displaying them in "grey", the class 90-100% will be shown in grey. No error message is shown - what is the problem?

ggplot: legend for a plot the combines bars / lines?

I have a empirical PDF + CDF combo I'd like to plot on the same panel. distro.df has columns pdf, cdf, and day. I'd like the pdf values to be plotted as bars, and the cdf as lines. This does the trick for making the plot:
p <- ggplot(distro.df, aes(x=day))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95, fill=fillCol)
p <- p + geom_line(aes(y=cdf))
p <- p + xlab("Day") + ylab("")
p <- p + theme_bw() + theme_update(panel.background = element_blank(), panel.border=element_blank())
However, I'm having trouble getting a legend to appear. I'd like a line for the cdf and a filled block for the pdf. I've tried various contortions with guides, but can't seem to get anything to appear.
Suggestions?
EDIT:
Per #Henrik's request: to make a suitable distro.df object:
df <- data.frame(day=0:10)
df$pdf <- runif(length(df$day))
df$pdf <- df$pdf / sum(df$pdf)
df$cdf <- cumsum(df$pdf)
Then the above to make the plot, then invoke p to see the plot.
This generally involves moving fill into aes and using it in both the geom_bar and geom_line layers. In this case, you also need to add show_guide = TRUE to geom_line.
Once you have that, you just need to set the fill colors in scale_fill_manual so CDF doesn't have a fill color and use override.aes to do the same thing for the lines. I didn't know what your fill color was, so I just used red.
ggplot(df, aes(x=day)) +
geom_bar(aes(y=pdf/max(pdf), fill = "PDF"), stat="identity", width=0.95) +
geom_line(aes(y=cdf, fill = "CDF"), show_guide = TRUE) +
xlab("Day") + ylab("") +
theme_bw() +
theme_update(panel.background = element_blank(),
panel.border=element_blank()) +
scale_fill_manual(values = c(NA, "red"),
breaks = c("PDF", "CDF"),
name = element_blank(),
guide = guide_legend(override.aes = list(linetype = c(0,1))))
I'd still like a solution to the above (and will checkout #aosmith's answer), but I am currently going with a slightly different approach to eliminate the need to solve the problem:
p <- ggplot(distro.df, aes(x=days, color=pdf, fill=pdf))
p <- p + geom_bar(aes(y=pdf/max(pdf)), stat="identity", width=0.95)
p <- p + geom_line(aes(y=cdf), color="black")
p <- p + xlab("Day") + ylab("CDF")
p <- p + theme_bw() + theme_update(panel.background = element_blank(), panel.border=element_blank())
p
This also has the advantage of displaying some of the previously missing information, namely the PDF values.

How can I make a Frequency distribution bar plot in ggplot2?

Sample of the dataset.
nq
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305
I would like to make a Percentage Bar plot/Histogram like this question: RE: Alignment of numbers on the individual bars with ggplot2
The max value of NQ for full dataset is 21 and minimum value is 0.00005
But I am unable to adapt the code as I don't have a Freq column and I have one series.
I have made a mockup of the figure I am trying to make.
Could you please help?
Would that work for you?
nq <- read.table(text = "
0.140843018
0.152855833
0.193245919
0.156860105
0.171658019
0.186281942
0.290739146
0.162779517
0.164694042
0.171658019
0.195866609
0.166967913
0.136841748
0.108907644
0.264136384
0.356655651
0.250508305", header = F) # Your data
nq$V2 <- cut(nq$V1, 5, include.lowest = T)
nq2 <- aggregate(V1 ~ V2, nq, length)
nq2$V3 <- nq2$V1/sum(nq2$V1)
library(ggplot2)
ggplot() + geom_bar(data = nq2, aes(V2, V1), stat = "identity", width=1, fill = "white", col = "black", size = 2) +
geom_text(vjust=1, fontface="bold", data = nq2, aes(label = paste(sprintf("%.1f", V3*100), "%", sep=""), x = V2, y = V1 + 0.4), size = 5) +
theme_bw() +
scale_x_discrete(expand = c(0,0), labels = sprintf("%.3f",seq(min(nq$V1), max(nq$V1), by = max(nq$V1)/6))) +
ylab("No. of Cases") + xlab("") +
scale_y_continuous(expand = c(0,0)) +
theme(
axis.title.y = element_text(size = 20, face = "bold", angle = 0),
panel.grid.major = element_blank() ,
panel.grid.minor = element_blank() ,
panel.border = element_blank() ,
panel.background = element_blank(),
axis.line = element_line(color = 'black', size = 2),
axis.text.x = element_text(face="bold"),
axis.text.y = element_text(face="bold")
)
I thought this would be easy, but it turned out to be frustrating. So perhaps the "right" way is to transform your data before using ggplot as it looks like #DavidArenburg has done. But, if you feel like hacking ggplot, here's what I ended up doing.
First, some sample data.
set.seed(15)
dd<-data.frame(x=sample(1:25, 100, replace=T, prob=25:1))
br <- seq(0,25, by=5) # break points
My first attempt was
library(ggplot2)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., label=..density..*..width.., ymax=..count..+1),
vjust=-.5, breaks=br, stat="bin")
but that didn't make "pretty labels"
so i thought i'd use the percent() function from the scales package to make it pretty. However, silly ggplot doesn't really make it possible to use functions with ..().. variables because it evaluates them in the data.frame only (then the empty baseenv()). It doesn't have a way to find the function you use. So this is when I turned to hacking. First i'll extract the "Layer" definition from ggplot and the map_statistic from it. (NOTE: this was done with "ggplot2_1.0.0" and is specific to that version; this is a private function that may change in future releases)
orig.map_statistic <- ggplot2:::Layer$map_statistic
new.map_statistic <- orig.map_statistic
body(new.map_statistic)[[9]]
# stat_data <- as.data.frame(lapply(new, eval, data, baseenv()))
here's the line that's causing grief I would prefer it the function resolved other names in the plot environment that are not found in the data.frame. So I decided to change it with
body(new.map_statistic)[[9]] <- quote(stat_data <- as.data.frame(lapply(new, eval, data, plot$plot_env)))
assign("map_statistic", new.map_statistic, envir=ggplot2:::Layer)
So now I can use functions with ..().. variables. So I can do
library(scales)
ggplot(dd, aes(x)) +
stat_bin(position="stack", breaks=br) +
geom_text(aes(y=..count.., ymax=..count..+2,
label=percent(..density..*..width..)),
vjust=-.5, breaks=br, stat="bin")
to get
So i'm not sure why ggplot has this default behavior. There could be some good reason for it but I don't know what it is. This does change how ggplot will behave for the rest of the session. You can change back to default with
assign("map_statistic", orig.map_statistic, envir=ggplot2:::Layer)

Set different bandwidths in ggplot2 facet_grid plotting

Suppose I have a data set called "data", and is generated through:
library(reshape2) # Reshape data, needed in command "melt"
library(ggplot2) # apply ggplot
density <-rep (0.05, each=800)
tau <-rep (0.05, each=800)
# define two different models: network and non-network
model <-rep(1:2, each=400, times=1)
## Create data and factors for the plot
df <- melt(rnorm(800, -3, 0.5))
data <- as.data.frame(cbind(density, tau, model, df$value))
data$density <- factor(data$density,levels=0.05,
labels=c("Density=0.05"))
data$tau <- factor(data$tau,levels=0.05,
labels=c("tau=0.05"))
data$model<- factor(data$model,levels=c(1,2),
labels=c("Yes",
"No"))
ggplot(data=data, aes(x=V4, shape=model, colour=model, lty=model)) +
stat_density(adjust=1, geom="line",position="identity") +
facet_grid(tau~density, scale="free") +
geom_vline(xintercept=-3, lty="dashed") +
ggtitle("Kernel Density") +
xlab("Data") +
ylab("Kernel Density") +
theme(plot.title=element_text(face="bold", size=17), # change fond size of title
axis.text.x= element_text(size=14),
axis.text.y= element_text(size=14),
legend.title=element_text(size=14),
legend.text =element_text(size=12),
strip.text.x=element_text(size=14), # change fond size of x_axis
strip.text.y=element_text(size=14)) # change fond size of y_axis
Looking at the data, variable V4 is separated into two subsets by the model (Yes [1:400] and No [401:800]), and the kernel density is plotted without change the original bandwidth since adjust=1.
What I want to do is: for the Yes model, the bandwidth changes to 10 times of the original, but for the No model, the bandwidth keeps unchanged. Can I do something like letting the adjust=c(10, 1)? I know how to realize this by plot()+lines(), but I want to do this in ggplot() for further analysis.
I wouldn't recommend this, since it creates a very misleading plot, but you can do it with two calls to stat_density(...).
ggplot(data=data, aes(x=V4, shape=model, colour=model, lty=model)) +
stat_density(data=data[data$model=="Yes",], adjust=10,
geom="line",position="identity") +
stat_density(data=data[data$model=="No",], adjust=1,
geom="line",position="identity") +
facet_grid(tau~density, scale="free") +
geom_vline(xintercept=-3, lty="dashed") +
ggtitle("Kernel Density") +
xlab("Data") +
ylab("Kernel Density") +
theme(plot.title=element_text(face="bold", size=17),
axis.text.x= element_text(size=14),
axis.text.y= element_text(size=14),
legend.title=element_text(size=14),
legend.text =element_text(size=12),
strip.text.x=element_text(size=14),
strip.text.y=element_text(size=14))

Resources