Scaling heat map colours for multiple heat maps - r

So I have a bunch of matrices that I am trying to plot as a heatmaps. I am using the heatmap.2() function in the ggplot2 packaage.
I have been trying for quite some time with it, and I am sure there is a very simple fix, but my issue is this:
How do I keep the colours consistent between heatmaps? For example, to make the values that provide the colours absolute as opposed to relative.
I have tried doing something similar to this question:
R/ggplot: Reuse color key for multiple heat maps
But I was unable to figure out the ggplot function; I kept receiving an error message stating that there were "no layers in plot".
After reading the comments on the above question, I tried using scales::rescale() and discrete_scale() but the former does not remove the problem, while the latter did not work.
I am fully aware that I might be doing something very simple wrong, and just being a bit of an idiot, but for the life of me I can't figure out where I am going wrong.
As for the data itself, I am trying to plot 10 matrices/heatmaps, each 10x10 cells (showing change over time) and the values in the cells range from 1.0 to 1.2.
As an example, this is the code I am using (once I have my 10x10 matrix).
Matrix1<-matrix(data=(runif(100,1.0,1.2)),nrow=10,ncol=10)
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
trace="none", key=F, cellnote=round(Matrix1,digits=2),
notecex=1.6, notecol="black",
labRow=seq(10,100,10), labCol=seq(10,100,10),
main="Title1", xlab="Xlab1", ylab="Ylab1"
)
So any help with either figuring out how to create the scaled values for the heatmap.2() function, or how I can use the ggplot() function would be greatly appreciated!

It's important to note that heatmap.2 is not a ggplot2 function. The ggplot2 package is not necessarily compatible with all plotting types. If you look at the ?heatmap.2 help page, in the upper left corner it shows you where the function is from. heatmap.2 {gplots} means that function comes from the gplots package. These are different pacakges so they have different rules how they work.
To get the same colors across different heatmaps, you want to explicitly get the breaks= parameter. By default it splits the observed range of the data into equal chunks. But since each data set may have a different min and max, these chunks may have different start and end points. By specifying breaks, you can make them all consistent. Since your data ranges from 1 to 1.2, you can set
mybreaks <- seq(1.0, 1.2, length.out=7)
and then in your call add
heatmap.2(Matrix1, Colv=NA, Rowv=NA, dendrogram="none",
...
breaks=mybreaks,
...
)
That should make them all match up.

Maybe this will help you. With the following code multiple heatmaps are stored in a list and displayed in a grid later on. This will allow you to control the colours of each heatmap since each heatmap is created separately. So in this case I chose to use green and red for the number range in each chart.
data(mtcars)
require(ggplot2)
require(gridExtra)
myplotslist2 <- list()
var = c("mpg", "wt", "drat")
new = cbind(mtcars, "variable")
new = cbind(car = rownames(mtcars), new)
for (i in 1:length(var)){
t= paste("new[[\"variable\"]] = \"", var[[i]],"\"; a = ggplot(new, aes(variable, car)) + geom_tile(aes(fill = ", var[[i]], "),colour = \"white\") + scale_fill_gradient(low = \"red\", high = \"green\") + theme(axis.title.y=element_blank(), axis.text.y=element_blank(),legend.position=\"none\"); myplotslist2[[i]] = a")
eval(parse(text=t))
}
grid.arrange(grobs=myplotslist2, ncol=length(var))
The result looks like this:
I hope this helps.
I explain more in my blogpost. https://dwh-businessintelligence.blogspot.nl/2016/05/pca-3d-and-k-means.html

Related

Is it possible to over-ride the x axis range in R package ggbio when using autoplot and ensdb transcripts?

I am trying to use ggbio to plot gene transcripts. I want to plot a very specific range so it matches my ggplot2 plots. The problem is my example plot ends up having range of 133,567,500-133,570,000 regardless of the GRange and whether I specify xlim or not.
This example should only plot a small bit of intron (the thin arrowed line) but instead plots the full 2 exons and intron in between. I believe autoplot wants to plot the entire transcript or transcripts present in the range and widens the range to accommodate for that.
library(EnsDb.Hsapiens.v86)
library(ggbio)
ensdb <- EnsDb.Hsapiens.v86
mut<-GRanges("10", IRanges(133568909, 133569095))
gene <- autoplot(ensdb, which=mut, names.expr="gene_name",xlim=c(133568909,133569095))
gene.gg <- gene#ggplot
png("test_gene_plot_5.png")
gene.gg
dev.off()
Is there any way to over-ride this? I've looked at the manual page for autoplot and I couldn't narrow down an option that would fix it. Others have said to use xlim, but that does not seem to change anything
I like ggbio because it can make a ggplot2 object to be plotted along with other ggplot2 objects. I have not seen an example for that with other approaches like Gvis. But I would entertain other approaches if they could be combined with my existing plots.
Thanks!
Amy
It kind of depends wether you want clipped or squished data. Usually autoplot outputs a ggplot object at some point that can be manipulated as such.
For squished data:
library(GenomicRanges) # just to be sure start and end work
gene#ggplot +
scale_x_continuous(limits = c(start(mut), end(mut)), oob = scales::squish)
For clipped data:
gene#ggplot +
coord_cartesian(xlim = c(start(mut), end(mut)))
But to be totally honest, I'm unsure wether this is the most informative way to communicate that you are plotting the internals of an intron.
Alternatively, I've written a gene model geom at some point that doesn't work through the autoplot methods (which can sometimes be a pain if you want to customise everything). Downside is that you'd have to do some manual gene searching and setting aesthetics. Upside is that it works like most other geoms and is therefore easy to combine with some other data.
library(ggnomics) # from: https://github.com/teunbrand/ggnomics
# Finding a gene's exons manually
my_gene <- transcriptsByOverlaps(EnsDb.Hsapiens.v86, mut)
my_gene <- exonsByOverlaps(EnsDb.Hsapiens.v86, my_gene)
my_gene <- as.data.frame(my_gene)
some_other_data <- data.frame(
x = seq(start(mut), end(mut), by = 10),
y = cumsum(rnorm(19))
)
ggplot(some_other_data) +
geom_line(aes(x, y)) +
geom_genemodel(data = my_gene,
aes(xmin = start, xmax = end,
y = max(some_other_data$y) + 1,
group = 1, strand = strand)) +
coord_cartesian(xlim = c(start(mut), end(mut)))
Hope that helped!

How to set heigth of rows grid in graph lines on ggplots (R)?

I'm trying plots a graph lines using ggplot library in R, but I get a good plots but I need reduce the gradual space or height between rows grid lines because I get big separation between lines.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1)+
xlab("iteraciones") +
ylab("valores")
and this is my actual graphs:
..the first line is very distant from the second. How I can reduce heigth?
regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
change the scale (e.g. convert the plot to a log scale), although this can make it harder for people to interpret the numbers. This can also change the behavior of each line, not just change the space between the lines. I'm guessing this isn't what you will want, ultimately.
normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
As stated above, you can graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
Not recommended:
I know that some graphs will have the a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2 because it masks the true separation between the data points. If you really do want a gap, I would look at this post: axis.break and ggplot2 or gap.plot? plot may be too complexe
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
you could use an axis transformation that maps your data to the screen in a non-linear fashion,
fun_trans <- function(x){
d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
model1 <- lm(y~poly(x,2), data=d)
model2 <- lm(x~poly(y,2), data=d)
scales::trans_new("fun",
function(x) as.vector(predict(model1,data.frame(x=x))),
function(x) as.vector(predict(model2,data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")
enter image description here

Manually creating an object that looks like a heatmap color key

I'm working on trying to create a key for a heatmap, but as far as I know, I cannot use the existing tools for adding a legend since I've generated the colors myself (I manually turn a scaled variable into rgb values for a short rainbow ( [255,0,0] to [0,0,255] ).
Basically, all I want to do is use the rightmost 10th of the screen to create a rectangle with these 10 colors: "#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38", "#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000"
with three numerical labels - at 0, max/2, and max
In essence, I want to manually produce an object that looks like a rudimentary heatmap color key.
As far as I know, split.screen can only split the screen in half, which isn't what I'm looking for. I want the graphic I already know how to produce to take up the leftmost 90% of the screen, and I want this colored rectangle to take up the other 10%.
Thanks.
EDIT: I greatly appreciate the advice about the best way to the the plot - that said, I still would like to know the best way to do the task originally asked - creating the legend by hand; I already am able to produce the exact heatmap graphic that I'm looking for - the false coloring wasn't the only problem with ggplot that I was having - it was just the final factor convincing me to switch. I need a non ggplot solution.
EDIT #2: This is close to the solution I am looking for, except this only goes up to 10 instead of accepting a maximum value as a parameter (I will be running this code on multiple data-sets, all with different maximum values - I want the legend to reflect this). Additionally, if I change the size of the graph, the key falls apart into disconnected squares.
Take a look at the layouts function (link). I think you want something like this:
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
## plot heatmap
## plot legend
I would also recommend the ggplot2 package and the geom_tile function which will take care of all of this for you.
Assuming your data is in a data frame with the x and y coordinates and heatmap value (e.g. gdat <- data.frame(x_coord=c(1,2,...), y_coord=c(1,1,...), val=c(6,2,...))) Then you should be able to produce your desired heat map plot with the following ggplot command:
ggplot(gdat) + geom_tile(aes(x=x_coord, y=y_coord, fill=val)) +
scale_fill_gradient(low="#0000FF", high="#FF0000")
To get your data into the following format you may want to look into the very useful reshape2 package.
Given a script no ggplot restriction on this answer here is how one could produce the plot with just base R.
colors <- c("#0000FF", "#0072FF", "#00E3FF", "#00FFAA", "#00FF38",
"#39FF00", "#AAFF00", "#FFE200", "#FF7100", "#FF0000")
layout(matrix(c(1,2), 1, 2, byrow = TRUE), widths=c(9,1))
plot(rnorm(20), rnorm(20), col=sample(colors, 20, replace=TRUE))
par(mar=c(0,0,0,0))
plot(x=rep(1,10), y=1:10, col=colors, pch=15, cex=7.1)
You may have to adjust the cex for your device.

Coloring scatterplot in R based on fold enrichment

I'm very new to R and have tried to search around for an answer to my question, but couldn't find quite what I was looking for (or I just couldn't figure out the right keywords to include!). I think this is a fairly common task in R though, I am just very new.
I have a x vs y scatterplot and I want to color those points for which there is at least a 2-fold enrichment, ie where x/y>=2 . Since my values are expressed as log2 values, the the transformed value needs to be x/y>=4.
I currently have the scatterplot plotted with
plot(log2(counts[,40], log2(counts[,41))
where counts is a .csv imported files and 40 & 41 are my columns of interested.
I've also created a column for fold change using
counts$fold<-counts[,41]/counts[,40]
I don't know how to incorporate these two pieces of information... Ultimately I want a graph that looks something like the example here: http://s17.postimg.org/s3k1w8r7j/error_messsage_1.png
where those points that are at least two-fold enriched will colored in blue.
Any help would be greatly appreciated. Thanks!
Is this what you're looking for:
# Fake data
dat = data.frame(x=runif(100,0,50), y = rnorm(100, 10, 2))
plot(dat$x, dat$y, col=ifelse(dat$x/dat$y > 4, "blue", "red"), pch=16)
The ifelse statement creates a vector of "blue" and "red" (or whatever colors you want) based on the values of dat$x/dat$y and plot uses that to color the points.
This might be helpful if you've never worked with colors in R.
Another option is to use ggplot2 instead of base graphics. Here's an example:
library(ggplot2)
ggplot(dat, aes(x,y, colour=cut(x/y, breaks=c(-1000,4,1000),
labels=c("<=4",">4")))) +
geom_point(size=5) +
labs(colour="x/y")

R: how to make multiple plots from one CSV, grouping by a column

I'd like to put multiple plots onto a single visual output in R, based on data that I have in a CSV that looks something like this:
user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
I can read this just fine, using, e.g.:
data = read.csv('data.csv')
For the moment, a fairly simple plot is fine, so I'm just trying plot(), without much to it (setting type='o' to get lines and points), and' from solving a past problem, I know that I can do, e.g., the following, to get data for just fred:
plot(data$time[which(data$user == 'fred')], data$size[which(data$user == 'fred')], type='o')
What I'd like, though, is to have the data for each user all showing up on one set of axes, with color coding (and a legend to match users to colors) to identify different user data.
And if another user shows up, I'd like another line to show up, with another color (perhaps recycling if I have too many users at once).
However, just this doesn't do it:
plot(data$size, data$time, type='o',col=c("red", "blue", "green"))
Because it doesn't seem to group by the user.
And just this:
plot(data, type='o')
gives me an error:
Error in plot.default(...) :
formal argument "type" matched by multiple actual arguments
This:
plot(data)
does do something, but not what I want.
I've poked around, but I'm new enough to R that I'm not quite sure how best to search for this, nor where to look for examples that would hit a use-case like this.
I even got somewhat closer with this:
plot(data$size[which(data$user == 'wilma')], data$time[which(data$user == 'wilma')], type='o', col=c('red'))
lines(data$size[which(data$user == 'fred')], data$time[which(data$user == 'fred')], type='o', col=c('green'))
lines(data$size[which(data$user == 'barney')], data$time[which(data$user == 'barney')], type='o', col=c('blue'))
This gives me a plot (which I'd post inline, but as a new user, I'm not allowed to yet):
not-quite-right plot
which is kind of close to what I want, except that it:
doesn't have a legend
has ugly axis labels, instead of just time and size
is scaled to the first plot, and thus is missing data from some of the others
isn't sorted by x-axis, which I could do externally, though I'm guessing I could do it fairly easily in R.
So, the question, ultimately, is this:
What's an easy way to plot data like this which:
has multiple lines based on the labels in the first column of the CSV
uses the same set of axes for the data in columns 2 and 3, regardless of the label
has a legend and color-coding for which label is being used for a particular line (or set of points)
will adapt to adding new labels to the data file, hopefully without change to the R code.
Thanks in advance for any help or pointers on this.
P.S. I looked around for similar questions, and found one that's sort of close, but it's not quite the same, and I failed to figure out how to adapt it to what I'm trying to do.
Good question. This is doable in base plot, but it's even easier and more intuitive using ggplot2. Below is an example of how to do this with random data in ggplot2
First download and install the package
install.packages("ggplot2",repos='http://cran.us.r-project.org')
require(ggplot2)
Next generate the data
a <- c(rep('a',3),rep('b',3),rep('c',3))
b <- rnorm(9,50,30)
c <- rep(seq(1,3),3)
dat <- data.frame(a,b,c)
Finally, make the plot
ggplot(data=dat, aes(x=c, y=b , group=a, colour=a)) + geom_line() + geom_point()
Basically, you are telling ggplot that your x axis corresponds to the c column (dat$c), your y axis corresponds to the b column (y$b) and to group (draw separate lines) by the a column (dat$a). Colour specifies that you want to group colour by the a column as well.
The resulting graph looks like this:

Resources