looping through two variables in lists - r

I have a list of lists with distance matrices:
obs <- list(AA=list(A=dist(runif(100)),
B=dist(runif(100)),
C=dist(runif(100))),
BB=list(A=dist(runif(100)),
B=dist(runif(100)),
C=dist(runif(100))))
obs <- lapply(obs, function(x)
lapply(x, function(x) as.data.frame(as.matrix(x))))
And another one, however, with only one level of hierarchy:
distances <- lapply(list(A=rnorm(100),B=rnorm(100),C=rnorm(100)), function(x)
as.data.frame(as.matrix(dist(x, "euclidean"))))
I would like to compare all the matrices in the 2nd level of obs with those in the first level of distances, if their names match (obs[[i]]$A with distances$A, B with B, C with C; never all combinations!). After trying to run sapply within lapply, which failed, I resorted to for loops, in which I store the results and extract some values from them:
coef <- pvals <- res <- vector("list", length(names(obs)))
library(vegan)
for(i in names(obs)){
res[[i]]$A <- mantel(obs[[i]]$A, distances$A, "spearman", perm=999)
#tmp <- res[[i]]$A
#coef[[i]]$A <- tmp$statistic
#pvals[[i]]$A <- tmp$signif
}
I loop through the first level of obs and fix the second level, and repeat this for obs[[i]]$B and C (not pasting the # lines from above again to save space):
for(i in names(obs)){
res[[i]]$B <- mantel(obs[[i]]$B, distances$B, "spearman", perm=999)
...
}
for(i in names(obs)){
res[[i]]$C <- mantel(obs[[i]]$C, distances$C, "spearman", perm=999)
...
}
The question is now to loop through the second level (obs[i]$A,B,C) as well, while also pointing to the correct matrix in distances. Would it be better to put the three loops above into one parental loop (through obs[i][j]) or is there a way to use lapply? Thank you!

If I understood your question correctly, I would do something like this:
for(i in seq_along(obs)) { # 1st level of obs
for (j in names(obs[[i]])) { # 2nd level of obs, 1st level of distances
res[[i]][[j]] <- mantel(obs[[i]][[j]], distances[[j]], "spearman", perm=999)
}
}
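Since the question also asks about lapply, here is a minimal sketch of the same name-matched comparison without explicit loops. To keep it runnable without vegan, it uses a hypothetical stand-in function compare() (a plain Spearman correlation of the lower triangles, with no permutation test); the small make_dist() helper and the reduced matrix size are also assumptions made for illustration. In real use, a call such as function(a, b) mantel(a, b, "spearman", perm=999) would take the place of compare().

```r
set.seed(1)

# Illustrative data with the same shape as in the question (smaller matrices)
make_dist <- function() as.matrix(dist(runif(20)))
obs <- list(AA = list(A = make_dist(), B = make_dist(), C = make_dist()),
            BB = list(A = make_dist(), B = make_dist(), C = make_dist()))
distances <- list(A = make_dist(), B = make_dist(), C = make_dist())

# Hypothetical stand-in for vegan::mantel(): correlation of the lower triangles only
compare <- function(a, b) {
  cor(a[lower.tri(a)], b[lower.tri(b)], method = "spearman")
}

# Outer lapply walks the 1st level of obs; Map pairs elements by matching name
res <- lapply(obs, function(group) {
  shared <- intersect(names(group), names(distances))
  Map(compare, group[shared], distances[shared])
})

# Collect the statistics into a matrix: rows A, B, C; columns AA, BB
coef <- sapply(res, unlist)
coef
```

With mantel() itself, each element of res would be a mantel object, so the coefficient and p-value would be pulled out of its statistic and signif components rather than with unlist().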

Related

T test for two lists of matrices

I have two lists of matrices (list A and list B, each matrix is of dimension 14x14, and list A contains 10 matrices and list B 11)
and I would like to do a t test for each coordinate to compare the means of each coordinate of group A and group B.
As a result I would like to have a matrix of dimension 14x14 which contains the p value associated with each t test.
Thank you in advance for your answers.
Here's a method using a for loop and then applying the lm() function.
First we'll generate some fake data as described in the question.
#generating fake matrices described by OP
listA <- vector(mode = "list", length = 10)
listB <- vector(mode = "list", length = 11)
for (i in 1:10) {
listA[[i]] <- matrix(rnorm(196),14,14,byrow = TRUE)
}
for (i in 1:11) {
listB[[i]] <- matrix(rnorm(196),14,14,byrow = TRUE)
}
Then we'll unwrap each matrix as described by dcarlson in a for loop.
UnwrappedCorMats <- NULL
for (ID in 1:10) {
unwrapped <- as.vector(as.matrix(listA[[ID]])) #Unwrapping each matrix into a vector (column by column)
withID <- c(ID, "GroupA", unwrapped) #labeling it with ID# and which group it belongs to
UnwrappedCorMats <- rbind(UnwrappedCorMats, withID) #appending one row per matrix
}
for (ID in 1:11) {
unwrapped <- as.vector(as.matrix(listB[[ID]]))
withID <- c(ID, "GroupB", unwrapped)
UnwrappedCorMats <- rbind(UnwrappedCorMats, withID)
}
Then write and apply a function to run lm(). In this context lm() is statistically equivalent to an unpaired t-test with equal variances assumed (t.test(..., var.equal = TRUE), not the default Welch test), but I'm using it because it is more easily adapted into a mixed effect model if anyone wants to add mixed effects.
UnwrappedDF <- as.data.frame(UnwrappedCorMats)
lmPixel2Pixel <- function(i) { #defining function to run lm
i <- as.numeric(i) #c() with the group label coerced every value to character, so convert back
lmoutput <- summary(lm(i ~ V2, data= UnwrappedDF))
lmoutput$coefficients[2,4] #p value of the group effect; return lmoutput instead to get the full lm output
}
Vector_pvals <- sapply(UnwrappedDF[3:length(UnwrappedDF)], lmPixel2Pixel)
Finally we will reshape the vector into the same shape as the original matrix to make the results easier to interpret.
LM_mat.again <- as.data.frame(matrix(Vector_pvals, nrow = nrow(listA[[1]]), ncol = ncol(listA[[1]]), byrow = FALSE)) #as.vector() unwrapped column by column, so refill column by column
#if your matrix has row or column names then restoring them is helpful for interpretation
if (!is.null(colnames(listA[[1]]))) colnames(LM_mat.again) <- colnames(listA[[1]])
if (!is.null(rownames(listA[[1]]))) rownames(LM_mat.again) <- rownames(listA[[1]])
head(LM_mat.again)
I'm sure there are faster methods, but this one is pretty straightforward, and I was surprised there weren't answers for this type of operation yet.
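As a cross-check on the lm() route, here is a minimal sketch that calls t.test() directly for each coordinate. It stacks each list into a 3-dimensional array so that every cell's values across matrices can be read off as a vector. The names arrA, arrB and pmat are made up for this illustration, and the data are simulated as in the answer above.

```r
set.seed(1)

# Simulated data as in the question: 10 and 11 matrices of size 14 x 14
listA <- replicate(10, matrix(rnorm(196), 14, 14), simplify = FALSE)
listB <- replicate(11, matrix(rnorm(196), 14, 14), simplify = FALSE)

# Stack each list into a 14 x 14 x (number of matrices) array
arrA <- array(unlist(listA), dim = c(14, 14, length(listA)))
arrB <- array(unlist(listB), dim = c(14, 14, length(listB)))

# One unpaired t-test per coordinate; t.test() defaults to the Welch test
pmat <- matrix(NA_real_, nrow = 14, ncol = 14)
for (r in 1:14) {
  for (k in 1:14) {
    pmat[r, k] <- t.test(arrA[r, k, ], arrB[r, k, ])$p.value
  }
}
dim(pmat)
```

Passing var.equal = TRUE to t.test() should reproduce the p-values of the lm() approach, since both then assume a common variance in the two groups.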

nested loop in r to correlate columns of df1 to columns of df2

I have two datasets with abundance data from groups of different species. Columns are species and rows are sites. The sites (rows) are identical between the two datasets and what i am trying to do is to correlate the columns of the first dataset to the columns of the second dataset in order to see if there is a positive or a negative correlation.
library(Hmisc)
rcorr(otu.table.filter$sp1,new6$spA, type="spearman")$P
rcorr(otu.table.filter$sp1,new6$spA, type="spearman")$r
The first will give me the p value of the relation between sp1 and spA and the second the r value.
I initially created a loop that allowed me to check all species of the first dataframe with a single column of the second dataframe. Needless to say if I was to make this work I would have to repeat the process a few hundred times.
My simple loop for one column of df1(new6) against all columns of df2(otu.table.filter)
pvalues = list()
for(i in 1:ncol(otu.table.filter)) {
pvalues[[i]] <-(rcorr(otu.table.filter[ , i], new6$Total, type="spearman"))$P
}
rvalues = list()
for(i in 1:ncol(otu.table.filter)) {
rvalues[[i]] <-(rcorr(otu.table.filter[ , i], new6$Total, type="spearman"))$r
}
p<-NULL
for(i in 1:length(pvalues)){
tmp <-print(pvalues[[i]][2])
p <- rbind(p, tmp)
}
r<-NULL
for(i in 1:length(rvalues)){
tmp <-print(rvalues[[i]][2])
r <- rbind(r, tmp)
}
fdr<-as.matrix(p.adjust(p, method = "fdr", n = length(p)))
sprman<-cbind(r,p,fdr)
and using the above as a starting point I tried to create a nested loop that each time would examine a column of df1 vs all columns of df2 and then it would proceed to the second column of df1 against all columns of df2 etc etc
but here i am a bit lost and i could not find an answer for a solution in r
I would assume that the pvalues output should be a list of
pvalues[[i]][[j]]
and similarly the rvalues output
rvalues[[i]][[j]]
but I am a bit lost and I dont know how to do that as I tried
pvalues = list()
rvalues = list()
for (j in 1:7){
for(i in 1:ncol(otu.table.filter)) {
pvalues[[i]][[j]] <-(rcorr(otu.table.filter[ , i], new7[,j], type="spearman"))$P
}
for(i in 1:ncol(otu.table.filter)) {
rvalues[[i]][[j]] <-(rcorr(otu.table.filter[ , i], new7[,j], type="spearman"))$r
}
}
but I cannot make it work cause I am not sure how to direct the output in the lists and then i would also appreciate if someone could help me with the next part which would be to extract for each comparison the p and r value and apply the fdr function (similar to what i did with my simple loop)
here is a subset of my two dataframes
Here is a small demo. Let's assume two matrices x and y with a sample size n. Then the correlations and approximate p-values can be estimated as:
n <- 100
x <- matrix(rnorm(10 * n), nrow = n)
y <- matrix(rnorm(5 * n), nrow = n)
## correlation matrix
r <- cor(x, y, method = "spearman")
## p-values
pval <- function(r, n) 2 * (1 - pt(abs(r)/sqrt((1 - r^2)/(n - 2)), n - 2))
pval(r, n)
## for comparison
cor.test(x[,1], y[,1], method = "spearman", exact = FALSE)
More details can be found here: https://stats.stackexchange.com/questions/312216/spearman-correlation-significancy-test
Edit
And finally a loop with cor.test:
## for comparison
p <- matrix(NA, nrow = ncol(x), ncol=ncol(y))
for (i in 1:ncol(x)) {
for (j in 1:ncol(y)) {
p[i, j] <- cor.test(x[,i], y[,j], method = "spearman")$p.value
}
}
p
The values differ somewhat, because the first approach uses the t-approximation while the second uses the "exact AS 89 algorithm" of cor.test.
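The question also asked how to apply the FDR correction to all comparisons. A minimal sketch, assuming the matrix approach above and made-up column names (sp1..sp10, spA..spE): p.adjust() is applied to all p-values at once, and the result is reshaped back into a matrix and into a long table with one row per species pair.

```r
set.seed(42)
n <- 100
# Simulated data; the column names are invented for illustration
x <- matrix(rnorm(10 * n), nrow = n, dimnames = list(NULL, paste0("sp", 1:10)))
y <- matrix(rnorm(5 * n), nrow = n, dimnames = list(NULL, paste0("sp", LETTERS[1:5])))

# Spearman correlations and t-approximation p-values, as in the demo above
r <- cor(x, y, method = "spearman")
p <- 2 * pt(-abs(r) * sqrt((n - 2) / (1 - r^2)), df = n - 2)

# FDR adjustment over all 50 comparisons, reshaped back to the matrix layout
fdr <- matrix(p.adjust(p, method = "fdr"), nrow = nrow(p), dimnames = dimnames(p))

# Long format: one row per pair of columns
result <- data.frame(var_x = rownames(r)[row(r)],
                     var_y = colnames(r)[col(r)],
                     r = as.vector(r),
                     p = as.vector(p),
                     fdr = as.vector(fdr))
head(result)
```

Adjusting once over the whole matrix treats all pairs as a single family of tests; adjusting per column instead would be a different, less conservative choice.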

Cycling through two lists for all pairwise combinations

There are two lists including many matrices:
df <- data.frame(replicate(100,sample(0:100,100,rep=TRUE)))
l.i <- vector("list")
l.j <- vector("list")
for (var in names(df[1:50])) {
l.i[[var]] <- as.matrix(dist(df[var], "euclidean"))
}
for (var in names(df[51:100])) {
l.j[[var]] <- as.matrix(dist(df[var], "euclidean"))
}
I want to compute Mantel tests between all pairwise elements in l.i and l.j (but not within them). I can do e.g.:
library(vegan)
all.i.vs.j1 <- lapply(l.i, function(x) mantel(x, l.j$X51))
all.i.vs.j2 <- lapply(l.i, function(x) mantel(x, l.j$X52))
and this would be indeed my desired output environment, but i would like to wrap this into a for loop or lapply.
Thank you!
We can use Map to apply the function mantel on corresponding elements of 'l.i' and 'l.j'
library(vegan)
out <- Map(mantel, l.i, l.j)
length(out)
#[1] 50
If we need pairwise, then use outer
f1 <- function(x, y) list(mantel(x, y))
out1 <- outer(l.i, l.j, FUN = Vectorize(f1))
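If a nested list is easier to work with than the matrix of lists returned by outer, two nested lapply calls give the same all-pairs result in the shape the question started from. The following is only a sketch: compare() is a hypothetical stand-in for mantel (a correlation of the lower triangles, no permutation test) so that it runs without vegan, and the data are much smaller than in the question.

```r
set.seed(1)

# Smaller version of the data in the question: 6 variables, 15 observations
df <- data.frame(replicate(6, sample(0:100, 15, replace = TRUE)))
l.i <- lapply(df[1:3], function(v) as.matrix(dist(v)))
l.j <- lapply(df[4:6], function(v) as.matrix(dist(v)))

# Hypothetical stand-in for vegan::mantel()
compare <- function(a, b) cor(a[lower.tri(a)], b[lower.tri(b)])

# Outer lapply over l.j, inner lapply over l.i:
# all.pairs$X4 plays the role of all.i.vs.j1 in the question
all.pairs <- lapply(l.j, function(y) lapply(l.i, function(x) compare(x, y)))

# Optional: collapse to a matrix, rows from l.i and columns from l.j
stat.mat <- sapply(all.pairs, unlist)
stat.mat
```

With mantel in place of compare(), each innermost element would be a full mantel object, so the final sapply step would need to extract a single component (for example statistic) instead of calling unlist.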

How to store data from for loop inside of for loop? (rolling correlation in r)

require(quantmod)
require(TTR)
iris2 <- iris[1:4]
b=NULL
for (i in 1:ncol(iris2)){
for (j in 1:ncol(iris2)){
a<- runCor(iris2[,i],iris2[,j],n=21)
b<-cbind(b,a)}}
I want to calculate a rolling correlation of different columns within a dataframe and store the data separately by a column. Although the code above stores the data into variable b, it is not as useful as it is just dumping all the results. What I would like is to be able to create different dataframe for each i.
In this case, as I have 4 columns, what I would ultimately want are 4 dataframes, each containing 4 columns showing rolling correlations, i.e. df1 = corr of col 1 vs col 1,2,3,4, df2 = corr of col 2 vs col 1,2,3,4...etc)
I thought of using lapply or rollapply, but ran into the same problem.
d=NULL
for (i in 1:ncol(iris2))
for (j in 1:ncol(iris2))
{c<-rollapply(iris2, 21 ,function(x) cor(x[,i],x[,j]), by.column=FALSE)
d<-cbind(d,c)}
Would really appreciate any inputs.
If you want to keep the expanded loop, how about a list of dataframes?
e <- vector("list", ncol(iris2)) # preallocate one slot per column
for (i in 1:ncol(iris2)) {
d <- matrix(0, nrow = nrow(iris2), ncol = ncol(iris2))
for (j in 1:ncol(iris2)) {
d[,j]<- runCor(iris2[,i],iris2[,j],n=21)
}
e[[i]] <- d
}
It's also a good idea to allocate the amount of space you want with placeholders and put items into that space rather than use rbind or cbind.
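The same list-of-data-frames idea can be sketched without explicit loops at the top level. This version depends only on base R: roll_cor() is a hypothetical helper written for this illustration (it is not runCor from TTR), which fills a preallocated vector with the correlation over a trailing window of n rows.

```r
iris2 <- iris[1:4]

# Hypothetical helper: correlation over a trailing window of n observations
roll_cor <- function(x, y, n) {
  out <- rep(NA_real_, length(x))        # preallocate; first n - 1 entries stay NA
  for (k in n:length(x)) {
    idx <- (k - n + 1):k
    out[k] <- cor(x[idx], y[idx])
  }
  out
}

# One data frame per column i, holding its rolling correlation with every column j
results <- lapply(iris2, function(xi) {
  as.data.frame(lapply(iris2, function(xj) roll_cor(xi, xj, n = 21)))
})

names(results)
dim(results$Sepal.Length)
```

Because lapply keeps the column names, results$Sepal.Length$Petal.Width is the rolling correlation between those two columns, which avoids having to remember numeric positions.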
Although it is not good practice to create dataframes on the fly in R (you should prefer putting them in a list as in the other answer), the way to do so is to use the assign and get functions.
for (i in 1:ncol(iris2)) {
d <- NULL
for (j in 1:ncol(iris2)){
# collect the rolling correlation of column i with every column j
d <- cbind(d, runCor(iris2[,i],iris2[,j],n=21))
}
# Assign the result to the name df1, df2... (once per i, after the inner loop)
assign(paste0("df", i), as.data.frame(d))
}
# to have access to the dataframe:
get("df1")
# or inside a loop
get(paste0("df", i))
Since you stated your computation was slow, I wanted to provide you with a parallel solution. If you have a modern computer, it probably has 2 cores, if not 4 (or more!). You can easily check this via:
require(parallel) # for parallelization
detectCores()
Now the code:
require(quantmod)
require(TTR)
iris2 <- iris[,1:4]
Parallelization requires the functions and variables be placed into a special environment that is created and destroyed with each process. That means a wrapper function must be created to define the variables and functions.
wrapper <- function(data, n) {
# variables placed into environment
force(data)
force(n)
# functions placed into environment
# same inner loop written in earlier answer
runcor <- function(data, n, i) {
d <- matrix(0, nrow = length(data[,1]), ncol = length(data[1,]))
for (j in 1:ncol(data)) {
d[,j] <- TTR::runCor(data[,i], data[,j], n = n)
}
return(d)
}
# call function to loop over iterator i
worker <- function(i) {
runcor(data, n, i)
}
return(worker)
}
Now create a cluster on your local computer. This allows the multiple cores to run separately.
parallelcluster <- makeCluster(parallel::detectCores())
models <- parallel::parLapply(parallelcluster, 1:ncol(iris2),
wrapper(data = iris2, n = 21))
Stop and close the cluster when finished.
stopCluster(parallelcluster)

statistical moments in R

I've got a data set in R of a variable, repeated 10,000 times and sampled 200 times on each repeat so a 10,000 by 200 matrix, I would like to calculate statistical moments for the variable up to an arbitrary number. So in the end I would like a numeric vector holding the value of moments.
I can get the variance and the mean for the data set using colMean and colVar, but they only go so far.
I am also aware of the moments package in R, however using the all.moments command is returning me moments for each time course, or treating each column or row as an individual variable, not what I want.
Does anyone know an equivalent to colMean and colVar for higher order moments? And if possible also for cross moments?
Many thanks!
I stole this code from an obscure R package e1071:
theskew<- function (x) {
x<-as.vector(x)
sum((x-mean(x))^3)/(length(x)*sd(x)^3)
}
thekurt <- function (x) {
x<-as.vector(x)
sum((x-mean(x))^4)/(length(x)*var(x)^2) - 3
}
You can fold that into your code by feeding them one column at a time
Okay, I did this yesterday. For posterity, here is a loop that will do what I asked.
Provided your data is a time course of a variable you are measuring, and you want the moments of that variable:
rm(list=ls())
yourdata<-read.table("whereveryourdatais/and/variableyouwant")
yourdata<-t(yourdata) #only do this at your own discretion
mu<-colMeans(yourdata)
NumMoments <- 5
rawmoments <- matrix(NA, nrow=NumMoments, ncol=ncol(yourdata))
for(i in 1:NumMoments) {
rawmoments[i, ] <- colMeans(yourdata^i)
}
plot(rawmoments[1,])
holder<-matrix(NA,nrow=nrow(yourdata),ncol=ncol(yourdata))
middles<-matrix(NA,nrow=1,ncol=ncol(yourdata))
for(j in 1:nrow(yourdata)){
for(o in 1:ncol(rawmoments)){
middles[o]<-yourdata[j,o]-rawmoments[1,o]
}
holder[j,] <- middles
}
centmoments<-matrix(NA,nrow=NumMoments,ncol=ncol(yourdata))
for(i in 1:NumMoments){
centmoments[i,]<-colMeans(holder^i)
}
Then centmoments has the central moments and rawmoments has the raw moments. You can specify how many moments to take by changing the value of NumMoments.
Note that the first row in "centmoments" will be approximately 0.
Is this what you're looking for?
X <- matrix(1:12, 3, 4) # your data
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
for(i in 1:NumMoments) {
moments[i, ] <- colMeans(X^i)
}
EDIT:
okay, apparently you want "central moments"
X <- matrix(1:12, 3, 4)
NumMoments <- 5
moments <- matrix(NA, nrow=NumMoments, ncol=ncol(X))
moments[1, ] <- colMeans(X) # first raw moment (the mean), needed for the centering below
Y <- X
for(i in 1:ncol(X)) {
Y[, i] <- Y[, i] - moments[1, i]
}
for(i in 2:NumMoments) {
moments[i, ] <- colMeans(Y^i)
}
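A more compact variant of the same calculation, offered as a sketch: sweep() subtracts the column means in one step, and crossprod() gives the second-order cross moments that the question also asked about. The names centered, central and cross2 are made up here, and all moments use the 1/n (population) convention, unlike var() and cov(), which divide by n - 1.

```r
X <- matrix(1:12, 3, 4) # your data
NumMoments <- 5

# Subtract each column's mean from that column
centered <- sweep(X, 2, colMeans(X))

# Row k holds the k-th central moment of every column
central <- t(sapply(seq_len(NumMoments), function(k) colMeans(centered^k)))

# Second-order cross moments between all pairs of columns (1/n scaling)
cross2 <- crossprod(centered) / nrow(X)

central
cross2
```

The first row of central is zero by construction, and the diagonal of cross2 matches its second row.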
