I'm working in cummeRbund with cuffdiff files from an RNA-Seq analysis. I made a scatterplot with two conditions, but I'd like to see de correlation value of my data. Is it possible? Is there a command to do this? Any idea? Thanks!!
I searched for "decorrelation" and found nothing significant, so I am guessing you mean the correlation.
You are looking for the cor function. Just type ?cor into R and you will get the info. Here is an example:
> cor(1:5,1:5)
[1] 1
> cor(1:5,5:1)
[1] -1
This is taken from the seqanswers thread about the same thing.
Suppose your working directory contains a directory of your cuffdiff output, say cuffdiff_out. Then you can run this to find the correlation of the FPKM values.
library(cummeRbund)
cuff_data <- readCufflinks("cuffdiff_out/")
m <- fpkmMatrix(genes(cuff_data))
cor(m[, 1], m[, 2])
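FPKM values tend to be strongly right-skewed, so a Pearson correlation on the raw values can be dominated by a handful of highly expressed genes. Here is a minimal sketch of some alternatives, using simulated numbers in place of the fpkmMatrix() output. The matrix m, its column names cond1/cond2 and the pseudocount of 1 are all made up for illustration.

```r
# Simulated stand-in for an FPKM matrix: 100 genes, 2 conditions (hypothetical data)
set.seed(1)
cond1 <- rexp(100, rate = 0.1)
cond2 <- cond1 * exp(rnorm(100, sd = 0.3))
m <- cbind(cond1 = cond1, cond2 = cond2)

# Pearson correlation on the raw values, as in the answer above
pearson_raw <- cor(m[, 1], m[, 2])

# Pearson correlation after a log10 transform (pseudocount of 1 avoids log10(0))
pearson_log <- cor(log10(m[, 1] + 1), log10(m[, 2] + 1))

# Rank-based correlation, which is unaffected by monotone transforms
spearman <- cor(m[, 1], m[, 2], method = "spearman")

# All pairwise correlations between conditions at once
all_pairs <- cor(m)
```

With more than two conditions, cor(m) gives the full condition-by-condition matrix in one call.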
I am a second-year M.Sc. student and I am running into a bit of a snag running my statistics.
I am trying to run a contingency table and Fisher's test and I keep getting an error.
Error in fisher.test(GAL4UAS) : if 'x' is not a matrix, 'y' must be given
If anyone can see what I have done wrong or may be missing, I would really appreciate it.
This is the code:
setwd("/Users/Pria/Desktop/Data Analysis/")
GAL4UAS <-- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- lapply(GAL4UAS, abs)
fisher.test(GAL4UAS)
fisher.test(GAL4UAS[c(1,2)])
fisher.test(GAL4UAS[c(1,3)])
fisher.test() is expecting a matrix as its input, and the object you pass it is not one. Try putting your data into a matrix. One option among several would be:
m <- matrix(c(20,21,19,10,9,11),nrow = 3,ncol=2,byrow=FALSE)
fisher.test(m)
When you apply abs() using lapply, the output is a list and not a data.frame. The apply function, by contrast, returns its output as a matrix, which is what fisher.test() expects. So maybe you can try this:
GAL4UAS <- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- apply(GAL4UAS, abs, MARGIN=c(1,2))
fisher.test(GAL4UAS)
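To see the difference for yourself, you can check what each step returns before calling fisher.test(). This is only a sketch using the counts from the question. The names as_list and as_matrix are made up here, and abs(as.matrix(...)) is just one more way of getting a matrix, not the only one.

```r
GAL4UAS <- data.frame(Yes = c(20, 21, 19), No = c(10, 9, 11))

# lapply() always returns a plain list, so the data frame structure is lost
as_list <- lapply(GAL4UAS, abs)
class(as_list)

# Converting to a matrix first keeps the 3 x 2 table shape; abs() is vectorised
as_matrix <- abs(as.matrix(GAL4UAS))
dim(as_matrix)

# fisher.test() runs on the matrix without complaint
res <- fisher.test(as_matrix)
res$p.value
```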
From what I've seen, R cannot very easily produce usable output for large correlation matrices (50-100 variables). For instance, "corr.test" or "cor" output is horrendously wrapped (each variable should have only one row and one column, but this is certainly not the case) and does not copy well into Excel for later examination. Is there a way to produce SPSS-like correlation output in R? Namely, correlation matrices that can be copied and pasted easily into something like Excel, where each row and each column pertains to one variable (no wrapping of text), and ideally, sample-sizes and significance values are somehow available. Corr.test provides this information, albeit in an inconvenient format, and when variables exceed output viewer space in R, the output is basically unreadable. Any thoughts would be greatly, greatly appreciated, as I'm frequently working with many variables at once.
Is there anything wrong with
z <- matrix(rnorm(10000),100)
write.csv(cor(z),file="cortmp.csv")
? View(cor(z)) works for me, although I don't know if it's copy-and-pasteable.
For psych::corr.test
dimnames(z) <- list(1:100,1:100)
z[1,2] <- NA ## unbalance to induce sample size matrix
ct <- psych::corr.test(z)
write.csv(ct$n,file="ntmp.csv") ## sample sizes
write.csv(ct$t,file="ttmp.csv") ## t statistics
write.csv(ct$p,file="ptmp.csv") ## p-values
et cetera. (See str(ct).)
R's paradigm is that if you want to transfer information to another program you're going to output it to a file rather than copying and pasting it from the console ...
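As a rough sketch of that workflow, the snippet below writes a correlation matrix to a CSV file and reads it back to confirm that each variable gets exactly one row and one column. The variable names V1 to V100, the rounding to 3 decimals and the use of tempfile() are choices made for this example only.

```r
set.seed(1)
z <- matrix(rnorm(10000), 100)
colnames(z) <- paste0("V", 1:100)   # hypothetical variable names

# Write the 100 x 100 correlation matrix to a CSV file that Excel can open
out_file <- tempfile(fileext = ".csv")
write.csv(round(cor(z), 3), file = out_file)

# Read it back: the first column of the file holds the row names
back <- read.csv(out_file, row.names = 1)
dim(back)
```

In practice you would replace tempfile() with a path of your own and open that file directly in Excel.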
After principal component analysis I am trying to get the variables of my original correlation matrix to be sorted in the same way as in the (sorted) loadings matrix (as displayed with print.psych). I have used the following functions from the psych package, but I can't get the two to align.
pc <- principal(myCorrMatrix$correlations, nfactors=15, n.obs=49, rotate="oblimin")
print.psych(pc, cut=0.3, sort=TRUE)
sortedPC <- fa.sort(pc)
sortedMatrix <- mat.sort(myCorrMatrix$correlations, sortedPC)
I am not 100% sure what the second parameter of mat.sort should be, but I tried several of the elements of sortedPC to no avail. Any pointers would be much appreciated!
Don't pass the result of fa.sort() to mat.sort(). mat.sort() expects an fa object, which has the $loadings in it. Note that principal() also gives the $loadings in the resulting object. I think this should work:
sortedMatrix <- mat.sort(myCorrMatrix$correlations, pc)
Here's an example:
data(Bechtoldt.1)
sorted <- mat.sort(Bechtoldt.1,principal(Bechtoldt.1,5))
cor.plot(sorted)
I have created a correlation matrix with an external program (SparCC). I have calculated p-values from the same data in SparCC as well and I end up with two objects which I imported into R, let's call them corr and pval and
> ncol(corr)==nrow(corr)
[1] TRUE
> ncol(pval)==nrow(pval)
[1] TRUE
and
> colnames(corr)==rownames(pval)
[1] TRUE ...
and the same the other way around.
Since the matrices (or should I be using data.frame?) are fairly large (about 1000 items), I would like to extract the significant correlations from the corr matrix by looking up their p-values in the pval matrix. I have looked into doing something with apply:
extracted.values <- apply(corr, nrows(corr), which(pval<0.1))
But since the part with which isn't really a function, it will output an error.
Since the which command outputs a list of positions in the pval matrix, I'm a bit at a loss as to how to retrieve the colnames and rownames for each desired item.
Is there an easier way of doing what I want, like creating a correlation object from scratch in R (is this at all possible?) which contains both corr and pval matrices and extracting the significant values? I have found this solution in Python, but a solution with R would be much appreciated if it's less complicated than what I think it is.
Thanks for any help!
Edit: the Python example doesn't keep headers.
You can simply do
corr[pval < 0.1]
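That returns the values as a bare vector, without the row and column names. If you want to keep the names, one possibility is which() with arr.ind = TRUE, which gives row/column positions instead of a single index. This is a sketch on a small made-up example. The 4 x 4 corr and pval matrices and the item names A to D are invented, and restricting to the upper triangle assumes both matrices are symmetric.

```r
items <- c("A", "B", "C", "D")

# Toy correlation and p-value matrices (made-up numbers)
corr <- diag(4)
dimnames(corr) <- list(items, items)
pval <- matrix(0, 4, 4, dimnames = list(items, items))
corr[upper.tri(corr)] <- c(0.8, 0.1, -0.3, -0.6, 0.2, 0.05)
pval[upper.tri(pval)] <- c(0.01, 0.70, 0.30, 0.04, 0.50, 0.90)

# Row/column positions of significant pairs, upper triangle only
# so that each pair is reported once and the diagonal is skipped
idx <- which(pval < 0.1 & upper.tri(pval), arr.ind = TRUE)

# One row per significant pair, with names, correlation and p-value
sig <- data.frame(row  = rownames(corr)[idx[, "row"]],
                  col  = colnames(corr)[idx[, "col"]],
                  corr = corr[idx],
                  pval = pval[idx],
                  stringsAsFactors = FALSE)
sig
```

Indexing a matrix with the two-column idx picks out one element per row of idx, so corr[idx] and pval[idx] line up with the names.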
For the life of me I cannot understand why this method is failing. I would really appreciate an additional set of eyes here:
heatmap.2(TEST,trace="none",density="none",scale="row",
ColSideColors=c("red","blue")[data.test.factors],
col=redgreen,labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1 - cor(x))/2))
The error that I get is:
row dendrogram ordering gave index of wrong length
If I don't include the distfun, everything works really well and is responsive to the hclust function. Any advice would be greatly appreciated.
The standard call to dist computes the distance between the rows of the matrix provided, while cor computes the correlation between the columns of the provided matrix, so for the above example to work, you need to transpose the matrix:
heatmap.2(TEST,trace="none",density="none",scale="row",
ColSideColors=c("red","blue")[data.test.factors],
col=redgreen,labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1 - cor( t(x) ))/2))
should work. If you use a square matrix, you'll get code that works, but it won't be calculating what you think it is.
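You can check the mismatch without heatmap.2 at all by comparing the sizes of the distance objects. A small sketch in base R, where the 10 x 5 matrix x is made up for illustration:

```r
set.seed(1)
x <- matrix(runif(50), nrow = 10)   # 10 rows, 5 columns

# dist() works on rows, so the result covers 10 objects
size_dist <- attr(dist(x), "Size")

# cor() works on columns, so without transposing only 5 objects are covered
size_cor <- attr(as.dist((1 - cor(x)) / 2), "Size")

# Transposing first gives a row-by-row correlation distance again
size_cor_t <- attr(as.dist((1 - cor(t(x))) / 2), "Size")

c(size_dist, size_cor, size_cor_t)
```

Only the transposed version matches the number of rows, which is what the row dendrogram needs.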
This is not reproducible yet ...
TEST <- matrix(runif(100),nrow=10)
heatmap.2(TEST, trace="none", density="none",
scale="row",
labRow="",
hclustfun=function(x) hclust(x,method="complete"),
distfun=function(x) as.dist((1-cor(x))/2))
works for me. I don't know what redgreen or data.test.factors are.
Have you tried debug(heatmap.2) or options(error=recover) (or traceback(), although it's unlikely to be useful on its own) to try to track down the precise location of the error?
> sessionInfo()
R version 2.13.0 alpha (2011-03-18 r54865)
Platform: i686-pc-linux-gnu (32-bit)
...
other attached packages:
[1] gplots_2.8.0 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2 gtools_2.6.2
Building on Ben Bolker's reply, your code seems to work if TEST is an n×n matrix and data.test.factors is a vector of n integers. So for example starting with
n1 <- 5
n2 <- 5
n3 <- 5
TEST <- matrix(runif(n1*n2), nrow=n1)
data.test.factors <- sample(n3)
then your code will work. However if n1 and n2 are different then you will get the error row dendrogram ordering gave index of wrong length, while if they are the same but n3 is different or data.test.factors has non-integers then you will get the error 'ColSideColors' must be a character vector of length ncol(x).