Heatmap function in R dendrogram failure - r

For the life of me I cannot understand why this call is failing. I would really appreciate an additional set of eyes here:
heatmap.2(TEST,trace="none",density="none",scale="row",
ColSideColors=c("red","blue")[data.test.factors],
col=redgreen,labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1 - cor(x))/2))
The error that I get is:
row dendrogram ordering gave index of wrong length
If I don't include the distfun, everything works really well and is responsive to the hclust function. Any advice would be greatly appreciated.

The standard call to dist computes the distances between the rows of the matrix provided, whereas cor computes the correlations between the columns of the matrix provided. For the above example to work, you need to transpose the matrix inside distfun:
heatmap.2(TEST,trace="none",density="none",scale="row",
ColSideColors=c("red","blue")[data.test.factors],
col=redgreen,labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1 - cor( t(x) ))/2))
should work. If you use a square matrix, the original code runs without error, but the row dendrogram is then built from correlations between columns (and vice versa), so it won't be calculating what you think it is.
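A minimal base-R sketch of the dimension mismatch described above, assuming a non-square 10 x 5 matrix x made up for illustration. It compares the number of objects ("Size") in the distance object produced by each approach:

```r
set.seed(1)
x <- matrix(runif(50), nrow = 10)   # 10 rows, 5 columns (not square)

# dist() measures distances between rows: 10 objects
size_dist <- attr(dist(x), "Size")

# cor() correlates columns: only 5 objects, which cannot order 10 rows
size_cor <- attr(as.dist((1 - cor(x)) / 2), "Size")

# transposing first gives correlations between rows: 10 objects again
size_cor_t <- attr(as.dist((1 - cor(t(x))) / 2), "Size")

c(dist = size_dist, cor = size_cor, cor_t = size_cor_t)
```

Only the transposed version matches what dist() would return for the same input, which is why the untransposed distfun produces an ordering of the wrong length.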

This is not reproducible yet ...
TEST <- matrix(runif(100),nrow=10)
heatmap.2(TEST, trace="none", density="none",
scale="row",
labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1-cor(x))/2))
works for me. I don't know what redgreen or data.test.factors are.
Have you tried debug(heatmap.2) or options(error=recover) (or traceback(), although it's unlikely to be useful on its own) to try to track down the precise location of the error?
> sessionInfo()
R version 2.13.0 alpha (2011-03-18 r54865)
Platform: i686-pc-linux-gnu (32-bit)
...
other attached packages:
[1] gplots_2.8.0 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2 gtools_2.6.2

Building on Ben Bolker's reply, your code seems to work if TEST is an n×n matrix and data.test.factors is a vector of n integers. So for example starting with
n1 <- 5
n2 <- 5
n3 <- 5
TEST <- matrix(runif(n1*n2), nrow=n1)
data.test.factors <- sample(n3)
then your code will work. However, if n1 and n2 are different, you will get the error "row dendrogram ordering gave index of wrong length". If they are the same but n3 is different, or data.test.factors contains non-integers, you will get the error "'ColSideColors' must be a character vector of length ncol(x)".
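The ColSideColors error comes down to how R indexes the colour vector. A small sketch of that indexing in base R, with a made-up grouping factor grp standing in for data.test.factors:

```r
cols <- c("red", "blue")

# a factor is indexed by its integer codes, giving one colour per column
grp <- factor(c("a", "b", "b", "a", "a"))
by_factor <- cols[grp]

# non-integer indices are truncated towards zero, and index 0 selects
# nothing, so values in (0, 1) return a zero-length vector
by_fraction <- cols[c(0.3, 0.7, 0.2, 0.9, 0.5)]

# indices beyond length(cols) give NA rather than an error
by_large <- cols[1:5]

length(by_factor)     # 5
length(by_fraction)   # 0
anyNA(by_large)       # TRUE
```

So it is worth checking length(c("red","blue")[data.test.factors]) against ncol(TEST) before calling heatmap.2.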

Related

Having problem with having a fraction result

I was trying to solve a basic matrix problem.
I used:
A<- matrix(c(2,7,5,7), 2,2)
b<- c(8,12)
solve(A,b, fractions = TRUE)
However, the result is only given in decimals. How can I get the result as fractions?
I also want to plot this equation above.
I used:
plotEqn(A,b)
However, it tells me this equation can't be found. Can I have some advice please?
Thank you very much!!!
For your first question,
MASS::fractions(solve(A,b))
gives {4/21, 32/21}. Note that you are not always guaranteed the exact answer, because R works in floating point and fractions() recovers a rational approximation, unlike e.g. Mathematica.
For your second question, it looks like the plotEqn() function is in the matlib package: if you have that package installed, then either first loading the package (with library("matlib")) or matlib::plotEqn(A,b) should work.
On closer inspection it looks like you want matlib::Solve() for the first question (note that R is case-sensitive, so solve and Solve are different):
library(matlib)
Solve(A,b, fractions=TRUE)
## x1 = 4/21
## x2 = 32/21
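As a quick sanity check in base R only (no extra packages assumed), the reported fractions can be compared with the floating-point solution and substituted back into the system:

```r
A <- matrix(c(2, 7, 5, 7), 2, 2)   # filled column by column
b <- c(8, 12)

x_float <- solve(A, b)             # floating-point solution
x_exact <- c(4 / 21, 32 / 21)      # the fractions reported above

# both comparisons use all.equal(), which allows for rounding error
same_solution <- isTRUE(all.equal(x_float, x_exact))
satisfies_system <- isTRUE(all.equal(as.vector(A %*% x_exact), b))

c(same_solution, satisfies_system)
```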

cummerbund, scatterplots and a statistical value

I'm working in cummeRbund with cuffdiff files from an RNA-Seq analysis. I made a scatterplot with two conditions, but I'd like to see de correlation value of my data. Is it possible? Is there a command to do this? Any idea? Thanks!!
I searched for "decorrelation" and found nothing significant, so I am guessing you mean the correlation.
You are looking for the cor function. Just type ?cor into R and you will get the info. Here is an example.
> cor(1:5,1:5)
[1] 1
> cor(1:5,5:1)
[1] -1
This is taken from the seqanswers thread about the same thing.
Suppose your working directory contains a directory of your cuffdiff output, say cuffdiff_out. Then you can run this to find the correlation of the FPKM values.
library(cummeRbund)
cuff_data <- readCufflinks("cuffdiff_out/")
m <- fpkmMatrix(genes(cuff_data))
cor(m[, 1], m[, 2])
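A rough sketch of some variations on that last line, using a simulated matrix m as a hypothetical stand-in for the fpkmMatrix() output so that it runs without cuffdiff files. The log transform and the Spearman option are suggestions of this sketch, not part of the thread above; they are often considered because expression values tend to be heavily skewed:

```r
set.seed(42)
# hypothetical stand-in for fpkmMatrix(genes(cuff_data)):
# 200 genes measured in two conditions
cond_a <- rlnorm(200, meanlog = 2, sdlog = 1.5)
m <- cbind(cond_a = cond_a,
           cond_b = cond_a * rlnorm(200, meanlog = 0, sdlog = 0.3))

# Pearson correlation on the raw values
pearson_raw <- cor(m[, 1], m[, 2])

# Pearson correlation on log2 values; the pseudocount of 1 avoids log2(0)
pearson_log <- cor(log2(m[, 1] + 1), log2(m[, 2] + 1))

# rank-based correlation, unaffected by monotone transforms
spearman <- cor(m[, 1], m[, 2], method = "spearman")

c(pearson_raw = pearson_raw, pearson_log = pearson_log, spearman = spearman)
```

cor.test() takes the same two vectors if a p-value or confidence interval is also wanted.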

Output of parApply different from my input

I am still quite new to R (I used to program in Matlab) and I am trying to use the parallel package to speed up some calculations. Below is an example in which I calculate the rolling standard deviation of a matrix (by column) using the zoo package, with and without parallelising the code. However, the shapes of the two outputs are different.
# load library
library('zoo')
library('parallel')
library('snow')
# Data
z <- matrix(runif(100000,0,1),100,1000)
# This is what I want to calculate with timing
system.time(zz <- rollapply(z,10,sd,by.column=TRUE, fill=NA))
# Trying to achieve the same output with parallel computing
cl<-makeSOCKcluster(4)
clusterEvalQ(cl, library(zoo))
system.time(yy <-parCapply(cl,z,function(x) rollapplyr(x,10,sd,fill=NA)))
stopCluster(cl)
My first output zz has the same dimensions as the input z, whereas the output yy is a vector rather than a matrix. I understand that I can do something like matrix(yy,nrow(z),ncol(z)), but I would like to know if I have done something wrong or if there is a better way of coding this. Thank you.
From the documentation:
parRapply and parCapply always return a vector. If FUN always returns
a scalar result this will be of length the number of rows or columns:
otherwise it will be the concatenation of the returned values.
And:
parRapply and parCapply are parallel row and column apply functions
for a matrix x; they may be slightly more efficient than parApply but
do less post-processing of the result.
So, I'd suggest you use parApply.
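A minimal sketch of the difference, assuming a two-worker cluster from the parallel package and a smaller matrix than in the question. To keep it free of extra packages, roll_sd is a hypothetical base-R stand-in for zoo's right-aligned rolling sd, not the function used in the question:

```r
library(parallel)

# hypothetical base-R stand-in for a right-aligned rolling sd of width k,
# padded with NA at the start
roll_sd <- function(x, k) {
  c(rep(NA_real_, k - 1), apply(embed(x, k), 1, sd))
}

set.seed(1)
z <- matrix(runif(100 * 20), 100, 20)

cl <- makeCluster(2)
yy_mat <- parApply(cl, z, 2, roll_sd, k = 10)   # reshaped like apply()
yy_vec <- parCapply(cl, z, roll_sd, k = 10)     # concatenated vector
stopCluster(cl)

dim(yy_mat)      # same shape as z
length(yy_vec)   # nrow(z) * ncol(z)
```

The values are the same in both cases; parApply simply does the reshaping that matrix(yy, nrow(z), ncol(z)) would otherwise do by hand.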

R psych package - Sort correlation matrix after PCA using mat.sort()

After principal component analysis I am trying to get the variables of my original correlation matrix to be sorted in the same way as in the (sorted) loadings matrix (as displayed with print.psych). I have used the following functions from the psych package, but I can't get the two to align.
pc <- principal(myCorrMatrix$correlations, nfactors=15, n.obs=49, rotate="oblimin")
print.psych(pc, cut=0.3, sort=TRUE)
sortedPC <- fa.sort(pc)
sortedMatrix <- mat.sort(myCorrMatrix$correlations, sortedPC)
I am not 100% sure what the second parameter of mat.sort should be, but I tried several of the elements of sortedPC to no avail. Any pointers would be much appreciated!
Don't pass fa.sort() to mat.sort(). mat.sort() expects an fa object, which has the $loadings in it. Note that principal() also gives the $loadings in the resulting object. I think this should work:
sortedMatrix <- mat.sort(myCorrMatrix$correlations, pc)
Here's an example:
data(Bechtoldt.1)
sorted <- mat.sort(Bechtoldt.1,principal(Bechtoldt.1,5))
cor.plot(sorted)

Matrix Math in R on Large Datasets

I've got a big square matrix, from which I've taken the first row for testing purposes,
so the test matrix is 1x63000, which is pretty big. I try to multiply it by itself, using
a %*% b
Every time I do this, I get...
Error in fooB %*% fooB : non-conformable arguments
However, this works with smaller matrices. Are there any packages for handling mathematical functions of large matrices? or is there a trick I'm missing somewhere?
cheers
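The multiplication fails because a 1x63000 matrix cannot be multiplied by another 1x63000 matrix: the inner dimensions must match. You are looking for the cross product, i.e. a %*% t(a), and there is a base R function for this. Try:
tcrossprod(a)
Note that crossprod(a) computes t(a) %*% a instead, which for a 1x63000 matrix returns a 63000x63000 result.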
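A small base-R illustration of the dimension rules involved, using a 1 x 5 matrix as a stand-in for the 1 x 63000 row. In base R, tcrossprod(a) is a %*% t(a) and crossprod(a) is t(a) %*% a:

```r
a <- matrix(1:5, nrow = 1)   # 1 x 5 stand-in for the 1 x 63000 row

# a %*% a fails: the inner dimensions are 5 and 1
msg <- tryCatch(a %*% a, error = function(e) conditionMessage(e))

inner <- tcrossprod(a)        # a %*% t(a), a 1 x 1 matrix
outer_prod <- crossprod(a)    # t(a) %*% a, a 5 x 5 matrix

msg
dim(inner)
dim(outer_prod)
```

Which of the two is wanted depends on whether the goal is a single sum of squares or the full n x n matrix. For n = 63000 the latter needs memory for 63000^2 numbers.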
