heatmap with multiple RowSideColors


I am creating a heatmap for a given matrix, and I have several factors that I want to show alongside it. Right now I can create one RowSideColors bar for one factor. Is there a way to show RowSideColors for multiple factors with the heatmap.2 function from gplots?
In other words, several RowSideColors bars next to the heatmap. Any tips?

Based on what you've posted, I've attempted to include a reproducible example below in case anyone else has a similar question:
require(gplots)
data(mtcars)
df <- as.matrix(mtcars[,8:11])
df = df[order(rownames(df)),] # sorts the rows in alphabetical order
# specifying a column dendrogram
heatmap.2(df, Rowv=FALSE, dendrogram=c("column"))
The resulting heatmap keeps the rows in the sorted order and draws only the column dendrogram.

After a bit of digging, I found the solution myself: if you specify
tmpSorted = tmp[order(rownames(tmp)),] # sorts alphabetical order
heatmap.2(tmpSorted, Rowv=F .... )
the option Rowv=F works!
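heatmap.2 takes a single colour vector for RowSideColors, so one possible way to show several factors at once is to use pheatmap and pass a data frame to its annotation_row argument. The following is only a minimal sketch, assuming the pheatmap package is installed; the matrix columns and the two factors (vs and am from mtcars) are chosen purely for illustration.

```r
library(pheatmap)

data(mtcars)
mat <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

# One column per factor; the row names must match rownames(mat)
row_annot <- data.frame(
  vs = factor(mtcars$vs),
  am = factor(mtcars$am),
  row.names = rownames(mtcars)
)

# Each column of row_annot becomes its own colour bar next to the rows
pheatmap(mat, scale = "column", cluster_rows = FALSE,
         annotation_row = row_annot)
```

Each additional factor is just another column in row_annot.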

Related

How do I add a legend to my heatmap in R?

I have a large dataset of the expression of genes.
The rows are the genes.
The columns are specific tissues, so each value is the gene expression in that tissue.
I'm using the following code to make a heatmap:
heatmap(expression_all_tissues_matrix, scale= "column",col=brewer.pal(9,"Blues"))
I do not know how to make a legend.
I've tried to make the legend/key separately but I cannot figure out how to use "Blues" in brewer.pal.
Thanks!

The pheatmap package, with its eponymous function, gives you what you are looking for. The following code draws the legend on the same graph.
require(pheatmap)
require(RColorBrewer)
pheatmap(as.matrix(expression_all_tissues_matrix),color=brewer.pal(9,"Blues"))
You can also use several arguments to group rows and columns by clustering; if you don't want clustering, set cluster_rows = FALSE and cluster_cols = FALSE. Don't forget to normalize the data (for example with the scale argument), as it can give a nicer rendering. See ?pheatmap for more information.
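As a rough sketch of those options together, assuming pheatmap and RColorBrewer are installed: the matrix expr below is made-up data standing in for expression_all_tissues_matrix, and the gene and tissue names are hypothetical.

```r
library(pheatmap)
library(RColorBrewer)

# Made-up stand-in for expression_all_tissues_matrix
set.seed(1)
expr <- matrix(rexp(60), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("tissue", 1:6)))

pheatmap(expr,
         color = brewer.pal(9, "Blues"),
         scale = "column",       # normalize within each tissue
         cluster_rows = FALSE,   # keep the original row order
         cluster_cols = FALSE,   # keep the original column order
         legend = TRUE)          # colour key drawn next to the heatmap
```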

How can I stop sapply dropping my barplot titles?

I want to make a barplot for each factor variable in my data set. To do this I've been running sapply(data[sapply(data, class)=='factor'],function(x) barplot(table(x))). To my annoyance, the plots keep their factor labels, but none of them has a title. How can I fix this without titling each graph by hand?
Currently, I'm getting humorously vague untitled graphs.

How about
## extract names
fvars <- names(data)[which(sapply(data,inherits,"factor"))]
## apply barplot() with main=
lapply(fvars, function(x) barplot(table(data[[x]]), main=x))
?
Example data:
data <- mtcars
for (i in c("vs","am","gear","carb")) data[[i]] <- factor(data[[i]])
Note that this creates all the plots at once. If you're working in a GUI with a plot history (RStudio or RGui) you can page back through the graphs. Otherwise, you might want to use par(mfrow=c(nr,nc)) (fill in number of rows and columns) to set up subplots before you start.
The numbers that are returned are the bar midpoints (see ?barplot): you could wrap the barplot() call in invisible() if you don't want to see them.
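Putting those two suggestions together, a minimal sketch with the example data above might look like this; the 2 x 2 grid is an assumption that happens to fit the four factor columns.

```r
data <- mtcars
for (i in c("vs", "am", "gear", "carb")) data[[i]] <- factor(data[[i]])

# Names of all factor columns
fvars <- names(data)[sapply(data, is.factor)]

# 2 x 2 grid: an assumption that fits the four factors in this example
op <- par(mfrow = c(2, 2))
# invisible() suppresses printing of the bar midpoints
invisible(lapply(fvars, function(x) barplot(table(data[[x]]), main = x)))
par(op)
```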

qqnorm plotting for multiple subsets

I am very new to R. I have figured out how to make qqnorm plots on a subset of my dataframe. However, I would like to make qqnorm plots on subsets that are defined by two factors (one factor has 48 categories (brain_region) and each of those categories can be further subdivided by another factor, which has three levels (GroupID)). I have tried the following:
by(t, t[,"GroupID"], function(x) tapply(t$FA,t$brain_region,qqnorm))
but it does not seem to be working. I'm also not sure if this is the best approach, as I'm new to this program.
I would also like to save each separately generated qqnorm plot, with the x axis labeled "FA" and the title showing the specific level of each of the two factors (brain region/GroupID). Thank you very much for any help.

Plotting is one of the few things where apply isn't the optimal solution. ggplot offers you enough possibilities to get this done, as shown in this answer.
Plotting all levels in one go
If you use base graphics, it is better to use a for loop for this. Plus, if you want several plots on the same graphics device, you can use e.g. par(mfrow=) or layout() (see the help page ?layout).
Let's take the built-in data set iris as an example:
data(iris)
op <- par(mfrow=c(1,3))
for(i in levels(iris$Species)){
tmp <- with(iris, Petal.Width[Species==i])
qqnorm(tmp,xlab="Petal.Width",main=i)
qqline(tmp)
}
par(op)
rm(i,tmp)
This gives three Q-Q plots side by side, one per species.
Don't forget to clean up your workspace after using a for loop. Not really obligatory, but it can prevent serious confusion later on.
Combine two factors
In order to get this done for 2 factor levels at the same time, you can either construct a nested for-loop, or combine both factors into a single factor. Take the dataset mtcars:
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am,
labels=c('automatic','manual'))
To combine both levels, you can use this simple construct :
mtcars$combined <- factor(paste(mtcars$cyl,mtcars$am,sep='/'))
And then do the same again. With two for loops, your code would look like the code below. Be warned, though, that this only works if you have data for every combination of the factors and you don't have too many levels. If you have a lot of levels, you had better save the plots using e.g. png() (see ?png for info) instead of plotting them all on the same graphics device.
lcyl <- levels(mtcars$cyl)
lam <- levels(mtcars$am)
par(mfrow=c(length(lam),length(lcyl)))
for(i in lam){
for(j in lcyl){
tmp <- with(mtcars,mpg[am==i & cyl==j])
qqnorm(tmp,xlab="mpg",
main=paste(i,j,sep="/"))
qqline(tmp)
}
}
This gives one Q-Q plot per combination of am and cyl, arranged in a 2 x 3 grid.
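For many levels, or when some combinations have no data, one possible variation is to split the values by both factors, drop the empty combinations, and write each plot to its own file with png(). This is a sketch only: the output directory (tempdir()) and the file-name pattern are assumptions for illustration.

```r
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("automatic", "manual"))

out_dir <- tempdir()  # hypothetical output location

# One vector of mpg values per am/cyl combination; drop = TRUE skips empty ones
groups <- split(mtcars$mpg, list(mtcars$am, mtcars$cyl), sep = "_", drop = TRUE)

for (nm in names(groups)) {
  x <- groups[[nm]]
  png(file.path(out_dir, paste0("qq_", nm, ".png")))
  qqnorm(x, xlab = "mpg", main = nm)
  qqline(x)
  dev.off()  # close the file before starting the next plot
}
```

The same pattern should carry over to the original question by splitting FA on brain_region and GroupID.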

Plotting distribution of differences in R

I have a dataset with numbers indicating daily difference in some measure.
https://dl.dropbox.com/u/22681355/diff.csv
I would like to create a plot of the distribution of the differences with special emphasis on the rare large changes.
I tried plotting each column using the hist() function but it doesn't really provide a detailed picture of the data.
For example plotting the first column of the dataset produces the following plot:
https://dl.dropbox.com/u/22681355/Rplot.pdf
My problem is that this gives very little detail to the infrequent large deviations.
What is the easiest way to do this?
Also any suggestions on how to summarize this data in a table? For example besides showing the min, max and mean values, would you look at quantiles? Any other ideas?

You could use boxplots to visualize the distribution of the data:
sdiff <- read.csv("https://dl.dropbox.com/u/22681355/diff.csv")
boxplot(sdiff[,-1])
Outliers are printed as circles.

I back @Sven's suggestion for identifying outliers, but you can get more refinement in your histograms by specifying a denser set of breakpoints than hist chooses by default.
d <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv', header=TRUE, row.names=1)
with(d, hist(a, breaks=seq(min(a), max(a), length.out=100)))

Violin plots could be useful:
df <- read.csv('https://dl.dropbox.com/u/22681355/diff.csv')
library(vioplot)
with(df,vioplot(a,b,c,d,e,f,g,h,i,j))
I would use a boxplot on transformed data, e.g.:
boxplot(sign(df[,-1]) * sqrt(abs(df[,-1]))) # signed square root; same as x/sqrt(abs(x)) but avoids 0/0 = NaN for zero differences
Obviously a histogram would also look better after transformation.
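To illustrate the effect of such a transformation on a histogram, here is a small sketch using synthetic data in place of diff.csv. The helper signed_sqrt and the simulated values are assumptions for illustration only. The last line shows one way to get tail quantiles for a summary table.

```r
set.seed(42)
# Synthetic stand-in for one column of daily differences:
# mostly small changes plus a few rare large ones
x <- c(rnorm(500, sd = 1), rnorm(10, sd = 15))

# Signed square root: compresses large values but keeps the sign
signed_sqrt <- function(v) sign(v) * sqrt(abs(v))

op <- par(mfrow = c(1, 2))
hist(x, breaks = 50, main = "raw")
hist(signed_sqrt(x), breaks = 50, main = "signed sqrt")
par(op)

# Tail quantiles for a summary table
quantile(x, probs = c(0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99))
```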

Producing statistics over levels

I've generated a set of levels from my dataset, and now I want to find a way to sum the rest of the data columns in order to plot it while plotting my first column. Something like:
levelSet <- cut(frame$x1, "cutting")
boxplot(frame$x1~levelSet)
for (l in levelSet)
{
x2Sum<-sum(frame$x2[levelSet==l])
}
or maybe the inside of the loop should look like:
lines(sum(frame$x2[levelSet==l]))
Any thoughts? I am new to R, but I can't seem to get a hang of the indexing and ~ notation thus far.
I know R doesn't work this way, but I'd like functionality that 'looks' like
hist(frame$x2~levelSet)
## Or
hist(frame$x2, breaks = levelSet)

To plot a histogram, boxplot, etc. over a level set:
Try the lattice package:
library(lattice)
histogram(~x2|equal.count(x1),data=frame)
Substitute shingle for equal.count to set your own break points.
ggplot2 would also work nicely for this.
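If the goal is the sum of x2 within each level rather than a full histogram, tapply() can do this without an explicit loop. A minimal sketch, assuming a made-up data frame in place of frame and an arbitrary choice of five equal-width bins:

```r
set.seed(1)
# Made-up data frame standing in for `frame`
frame <- data.frame(x1 = runif(100, 0, 10), x2 = rpois(100, 3))

# Five equal-width bins of x1; the bin count is an assumption
levelSet <- cut(frame$x1, breaks = 5)

# Sum of x2 within each bin of x1
x2Sum <- tapply(frame$x2, levelSet, sum)

barplot(x2Sum, las = 2, main = "Sum of x2 per x1 bin")
```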
To put a histogram over a boxplot:
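par(mfrow=c(2,1)) # two rows, one column
hist(frame$x2)
boxplot(frame$x2, horizontal=TRUE) # horizontal so it lines up with the histogram axis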
You can also use the layout() command to fine-tune the arrangement.
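As a sketch of the layout() variant, the following gives the histogram twice the height of the boxplot. The data, the height ratio, and the margin settings are all assumptions to adjust to taste.

```r
set.seed(1)
x2 <- rnorm(200)  # made-up data

op <- par(mar = c(4, 4, 2, 1))  # slightly smaller margins for the short panel
# Two rows in one column; the top panel is twice as tall as the bottom one
layout(matrix(c(1, 2), nrow = 2), heights = c(2, 1))
hist(x2, main = "x2")
boxplot(x2, horizontal = TRUE)
layout(1)  # back to a single panel
par(op)
```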
