Heatmap with high expression values on the bottom
I'm quite new to RStudio and I'm trying to make a heatmap using the heatmaply function in R. In some heatmaps (with different data) the high expression values (in red) show on top, but with another dataset the high expression values show up at the bottom, with low expression values on top, as in the image.
I use the same code for the different datasets
heatmaply(Heatmap_DEXFORM, dendrogram = "row", scale_fill_gradient_fun = scale_fill_gradient2(low="blue", high="red", midpoint=0, limits=c(-4,6)))
Is this a result of the way my data is shaped? Is there a command where I can make the heatmap flip so the high expression values show on top, as in my other heatmaps?
Thanks in advance!
Heatmaps are typically ordered by hierarchical clustering rather than by the magnitude of the values. To order by magnitude (high at the top or vice versa) you would need to supply a dendrogram (as Tal suggested) or manually re-order your data, for example by the row sums or row means (or the column sums or means).
See the toy example below.
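library(heatmaply)
mat <- scale(mtcars)
# original row order, no clustering
heatmaply(mat, dendrogram = "none")
# rows re-ordered by their row sums, no clustering
heatmaply(mat[order(rowSums(mat)), ], dendrogram = "none")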
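If you want to keep the clustering but flip which end is on top, one option is to build the row dendrogram yourself and reverse it. This is only a sketch: it assumes that heatmaply uses a dendrogram passed through Rowv as-is, and which end ends up on top still depends on how the plot is drawn, so check the result visually. The names dend and dend_flipped are made up for the illustration.

```r
# Toy data, as in the example above
mat <- scale(mtcars)

# Cluster the rows, then reverse the leaf order of the dendrogram
dend <- as.dendrogram(hclust(dist(mat)))
dend_flipped <- rev(dend)

# The flipped tree has the same leaves in exactly the opposite order
row_order <- order.dendrogram(dend)
row_order_flipped <- order.dendrogram(dend_flipped)

# Assumption: a dendrogram supplied via Rowv is used without re-ordering
if (requireNamespace("heatmaply", quietly = TRUE)) {
  p <- heatmaply::heatmaply(mat, Rowv = dend_flipped, dendrogram = "row")
}
```

Printing p (or calling heatmaply directly at the console) shows the plot. The clusters themselves are unchanged, only their top-to-bottom order is mirrored.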
Related
I want to plot the distribution of my dataset using a histogram in R. I tried different values for the breaks argument (a fixed number, Freedman-Diaconis, and Scott) to get the best representation. I am considering a log scale later, but first I want to see the raw distribution without any scaling. However, the results look different. Why is that? The dataset I use can be downloaded from here (data) or here (data). The code I'm running is
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks = 200)
result is
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks = "Scott")
Result is
hist(as.matrix(deviation_all_genes_all_spots), xlim = c(-(1*10^(4)), 10^(4.5)), breaks="Freedman-Diaconis")
result is
Please help. Thank you very much.
Histograms are very sensitive to the choice of cell break points. Even for the same (!) number of cells, the histogram can become considerably different by just a small shift of the cell borders. It is thus generally preferable to use kernel density estimators instead of histograms, because they do not depend on random cell border placement:
# increase n if you have a wide range of values
d <- density(as.matrix(deviation_all_genes_all_spots), n=512)
plot(d$x, d$y)
In your second and third call of hist, you ask for an automatic way to select the number of cells and the cell borders. Obviously, this results in more cells than in your first call with breaks=200. You can query the cells from the return value of hist, e.g.
h <- hist(as.matrix(deviation_all_genes_all_spots))
cat(sprintf("number of cells = %i\n", length(h$mids)))
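To see how much the rules can differ, you can compare the cell counts directly. The following is a rough sketch on simulated data (the vector x is a made-up stand-in, not your dataset); nclass.Sturges, nclass.scott and nclass.FD are the helper functions that hist uses for the named rules.

```r
set.seed(1)
# Hypothetical stand-in for the real data: a skewed sample
x <- rexp(5000, rate = 1 / 1000)

# Number of cells suggested by each automatic rule
n_sturges <- nclass.Sturges(x)
n_scott   <- nclass.scott(x)
n_fd      <- nclass.FD(x)

# A numeric breaks value is only a suggestion: hist() picks "pretty" borders
h200   <- hist(x, breaks = 200, plot = FALSE)
hscott <- hist(x, breaks = "Scott", plot = FALSE)
hfd    <- hist(x, breaks = "FD", plot = FALSE)

cat(sprintf("cells: breaks=200 -> %d, Scott -> %d, FD -> %d\n",
            length(h200$mids), length(hscott$mids), length(hfd$mids)))
```

Running the same comparison on your own matrix should show whether the three calls really use different numbers of cells, which would explain why the plots look different.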
I am using a package called vegan to calculate the PCA of my environmental samples and then plot the PCA values in ordination space in R. The data come from an Excel sheet, and only certain columns are used for the PCA calculation, as below:
library(vegan)
library("readxl")
library(devtools)
SRSummer21 <- read_excel(file.choose())
SRSummer21PCA <- rda(SRSummer21[,c(3:17)], scale = TRUE)
summary(SRSummer21PCA)
ordiplot (SRSummer21PCA, display = 'species', type = 't')
ef <- envfit (SRSummer21PCA, SRSummer21[,c(41:43)])
plot (ef)
I am trying to add 3 other variables on top of the PCA, because I don't want them to be part of the original calculation. They are "Alkalinity", "Conductivity", and "pH", which are in columns 41 to 43 of my datasheet. When I plot the data, I get a simple plot with a set of black points as my data, but I can't see all of the arrows for the variables I added on top. I need the plot to be resized so that all data and arrows fit in the space. I would also like to change the shape and color of my data points according to their "Treatment", which takes one of 3 values (Cold, Moderate, Warm) and is stored in the "Treatment" column of my Excel datasheet. You can see that part of the Alkalinity arrow is present, but I can't see the rest.
Does anyone know how I can do this?
Thanks
How to plot heatmap with multiple categories in a single cell with ggplot2? Heatmap plot of categorical variables could be done with this code
# data
datf <- data.frame(
  indv = factor(paste("ID", 1:20), levels = rev(paste("ID", 1:20))),
  matrix(sample(LETTERS[1:7], 400, TRUE), ncol = 20)
)
library(ggplot2)
library(reshape2)
# converting data to long form for ggplot2 use
datf1 <- melt(datf, id.vars = "indv")
ggplot(datf1, aes(variable, indv)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_manual(values = rainbow(7))
The code came from here:
http://rgraphgallery.blogspot.com/2013/04/rg54-heatmap-plot-of-categorical.html
But what about multiple categories in a single cell like this? Is it possible to use triangle or other shape as a cell?
http://postimg.org/image/4dudrv0nz/
Copied from Biostars, as Alex Reynolds suggested.
For those interested, this appears to be Figure 2 from "Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia".
I wanted to create a similar plot with ggplot and geom_tile for a bigger collection of genes (a few hundred) but finally decided to use geom_point instead to provide additional information per cell (tile). It also looks to me a lot like this plot was generated in Excel or some other spreadsheet software (maybe along those lines: https://www.youtube.com/watch?v=0s5OiRMMzuY). The colors in the cells (tiles) do not match those in the legend, suggesting that they were added separately and not automatically, and there appears to be an erroneous cell: the diagonal separating the colors runs from upper left to lower right, whereas the diagonal drawn in black runs from lower left to upper right.
Hence, my concluding two cents: doing this automatically is probably very time-consuming and in my opinion only makes sense if you want to do it repeatedly, e.g., on data that is subject to change or on multiple datasets, and/or if you have a larger collection of genes.
Otherwise, following the instructions in the YouTube video for a rather small number of cells is likely to be more efficient. Or use geom_point (similar to "Adding points to a geom_tile layer in ggplot2" or "Marking specific tiles in geom_tile() / geom_raster()") to represent information about an additional category (variable).
In any case, should anyone have other suggestions on how to automatically create such a figure, I am more than happy to hear about that.
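For what it's worth, one way this might be automated is to draw each cell as two triangles with geom_polygon instead of one tile. This is only a sketch on made-up data: the data frame cells, its columns cat_lower and cat_upper, and the helper make_triangles are all hypothetical, and it only covers the case of exactly two categories per cell.

```r
set.seed(42)
# Hypothetical data: two categorical calls per (gene, sample) cell
cells <- expand.grid(gene = 1:4, sample = 1:5)
cells$cat_lower <- sample(LETTERS[1:3], nrow(cells), replace = TRUE)
cells$cat_upper <- sample(LETTERS[1:3], nrow(cells), replace = TRUE)

# Build the three corners of one triangle for every cell.
# Both triangles share the diagonal from lower right to upper left.
make_triangles <- function(cells, upper) {
  dx <- if (upper) c(0.5, 0.5, -0.5) else c(-0.5, 0.5, -0.5)
  dy <- if (upper) c(0.5, -0.5, 0.5) else c(-0.5, -0.5, 0.5)
  n <- nrow(cells)
  data.frame(
    id    = rep(paste(seq_len(n), if (upper) "u" else "l"), each = 3),
    x     = rep(cells$sample, each = 3) + rep(dx, times = n),
    y     = rep(cells$gene, each = 3) + rep(dy, times = n),
    value = rep(if (upper) cells$cat_upper else cells$cat_lower, each = 3)
  )
}
tri <- rbind(make_triangles(cells, FALSE), make_triangles(cells, TRUE))

# One polygon per triangle, filled by its category
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tri, aes(x, y, group = id, fill = value)) +
    geom_polygon(colour = "white") +
    coord_equal()
}
```

Printing p draws the grid. Cells where both categories agree come out as a single-coloured square, so the same data frame handles single-category and split cells.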
I am trying to create 2 heatmaps with variable values in R. I would like the colors and values to be scaled so that the values of the two heatmaps are comparable. Right now I am using heatmap.2 from the gplots package.
MyHeatMap <- heatmap.2(MyData, trace="none", col=greenred)
My data is in the form of a numeric matrix. I have two of these matrices, where the numeric ranges of the values are slightly different, and I would like to create quality heatmaps for both (it does not necessarily have to be with the same package).
I've encountered this issue a number of times in my own analyses and here is how I would suggest handling it.
Firstly, set your greenred color variable to have 256 colors with greenred(256).
Then, create a break variable that contains the range of numbers that you would like to split these 256 colors on for both heatmaps (the length will be one more than the length of the color vector). So, for instance, if you wanted the spread to be from -1 to 1 from green to red, respectively, you would do
pairs.breaks = seq(from=-1,to=1,length.out=257)
Then, when calling your heatmaps, use
MyHeatMap1 <- heatmap.2(MyData1, trace="none", col=greenred(256), breaks=pairs.breaks)
MyHeatMap2 <- heatmap.2(MyData2, trace="none", col=greenred(256), breaks=pairs.breaks)
This should produce two heat maps with different data sets that use identical color scales.
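If you don't know a sensible range in advance, the breaks can also be derived from the two matrices themselves. Here is a small sketch under that assumption; the two matrices are simulated stand-ins for your data, and lim is a name made up for the illustration.

```r
set.seed(1)
# Hypothetical stand-ins for the two real matrices
MyData1 <- matrix(rnorm(100, sd = 1.0), nrow = 10)
MyData2 <- matrix(rnorm(100, sd = 1.5), nrow = 10)

# One symmetric range that covers every value in both matrices
lim <- max(abs(c(MyData1, MyData2)))

# 256 colours need 257 break points
pairs.breaks <- seq(from = -lim, to = lim, length.out = 257)

if (requireNamespace("gplots", quietly = TRUE)) {
  MyHeatMap1 <- gplots::heatmap.2(MyData1, trace = "none",
                                  col = gplots::greenred(256), breaks = pairs.breaks)
  MyHeatMap2 <- gplots::heatmap.2(MyData2, trace = "none",
                                  col = gplots::greenred(256), breaks = pairs.breaks)
}
```

Using a symmetric range keeps zero in the middle of the green-to-red scale. If your values are not centred on zero, range(c(MyData1, MyData2)) may be the better choice.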
Hope this helps!
Ron
I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output, however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show much differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap, it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using, maybe someone has an idea on what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it will allow you to add breaks that assign colors to ranges of values in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6), seq(-.5999999, .1, length.out=4), seq(.100009, 5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=TRUE, scale='none', dendrogram="row", symm=TRUE, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
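For the outlier part of the question, a simple approach that does not depend on how the plotting function treats out-of-range values is to clamp the matrix yourself before plotting, so that everything beyond a cutoff lands in the outermost bin. A minimal sketch, assuming a column-scaled matrix and a cutoff of 3 (the matrix m and the value of cap are made up for the example):

```r
set.seed(1)
# Hypothetical column-scaled matrix with two extreme outliers
m <- scale(matrix(rnorm(200), nrow = 20))
m[1, 1] <- 12
m[2, 3] <- -9

# Clamp everything outside [-cap, cap] so outliers fall into the outermost bins
cap <- 3
m_capped <- pmin(pmax(m, -cap), cap)

# 16 colour bins need 17 break points
my.breaks <- seq(-cap, cap, length.out = 17)

if (requireNamespace("gplots", quietly = TRUE)) {
  result <- gplots::heatmap.2(m_capped, scale = "none", trace = "none",
                              dendrogram = "row", Colv = FALSE,
                              col = gplots::bluered(16), breaks = my.breaks)
}
```

Note that the clustering here is computed on the clamped values. If you want the tree to reflect the original data, build the dendrogram from the unclamped matrix and pass it in through Rowv.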
One thing you are doing in your program is to call hclust on your data then to call heatmap on it, however if you look in the heatmap manual page it states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.