Cycling through two lists for all pairwise combinations - r

I have two lists, each containing many matrices:
df <- data.frame(replicate(100,sample(0:100,100,rep=TRUE)))
l.i <- vector("list")
l.j <- vector("list")
for (var in names(df[1:50])) {
l.i[[var]] <- as.matrix(dist(df[var], "euclidean"))
}
for (var in names(df[51:100])) {
l.j[[var]] <- as.matrix(dist(df[var], "euclidean"))
}
I want to compute Mantel tests between all pairwise elements in l.i and l.j (but not within them). I can do e.g.:
library(vegan)
all.i.vs.j1 <- lapply(l.i, function(x) mantel(x, l.j$X51))
all.i.vs.j2 <- lapply(l.i, function(x) mantel(x, l.j$X52))
This gives exactly the output I want, but I would like to wrap it in a for loop or lapply.
Thank you!

We can use Map to apply the function mantel on corresponding elements of 'l.i' and 'l.j'
library(vegan)
out <- Map(mantel, l.i, l.j)
length(out)
#[1] 50
If we need all pairwise combinations, use outer:
f1 <- function(x, y) list(mantel(x, y))
out1 <- outer(l.i, l.j, FUN = Vectorize(f1))
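To see how the two calls differ in shape without needing vegan, here is a minimal base-R sketch. `pair_stat` is a hypothetical stand-in for `mantel`, and the small lists `l.a` and `l.b` are made up for illustration.

```r
# Hypothetical stand-in for mantel(): correlation of the entries of two
# equally sized matrices. Not a Mantel test, just something cheap to compute.
pair_stat <- function(x, y) cor(as.vector(x), as.vector(y))

set.seed(42)
l.a <- list(a1 = matrix(runif(9), 3), a2 = matrix(runif(9), 3))
l.b <- list(b1 = matrix(runif(9), 3), b2 = matrix(runif(9), 3),
            b3 = matrix(runif(9), 3))

# Map() pairs elements by position (a1 with b1, a2 with b2), so both
# inputs should have the same length.
paired <- Map(pair_stat, l.a, l.b[1:2])

# outer() with a Vectorize()d function visits every combination and
# returns a length(l.a) x length(l.b) matrix.
all_pairs <- outer(l.a, l.b, FUN = Vectorize(pair_stat))
dim(all_pairs)
all_pairs["a2", "b3"]  # same value as pair_stat(l.a$a2, l.b$b3)
```

Because `pair_stat` returns a single number, the result is a plain numeric matrix. A function that returns a whole object, as `mantel` does, needs the `list()` wrapper used in `f1` so that each cell of the result can hold one object.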

Related

Use of dim() in the lapply command

I have a list that itself contains multiple lists of matrices, each with dimensions 3834 x 1. So all values are basically stored in one column. Now I want to change the dimensions of every matrix in each sublist to 54 x 71.
Here is some code to reproduce sample data:
######################### create sample data ###########################
# create empty list
list1 <- list()
# fill the list with arrays/matrices
for (i in 1:10) {
list1[[i]] <- array(sample(1:100, 600, replace=T), dim= c(54*71,1))
}
# create the big list
big_list <- list()
for (i in 1:8) {
big_list[[paste0("list", i)]] <- list1
}
The goal can be achieved by using a for loop:
# adjust the dimensions of the matrices by using for loop
for (i in 1:length(big_list)) {
for (j in 1:length(big_list[[1]])) {
dim(big_list[[i]][[j]]) <- c(54,71)
}
}
I am sure that there is a more elegant way than using five lines for this, most likely by using lapply/apply/tapply etc. But I could not figure out how to place the dim() and c(54,71) properly in the command.
Anybody with a hint?
In R, the code
f(x) <- y
is equivalent to
x <- `f<-`(x, value = y)
With that in mind, you can use (nested) lapply with dim<-:
big_list <- lapply(
big_list,
function (lst) lapply(lst, `dim<-`, c(54L, 71L))
)
… and in principle you can omit the anonymous function — but whether that’s readable is debatable:
big_list <- lapply(big_list, lapply, `dim<-`, c(54L, 71L))
For what it's worth, map_depth() from purrr is useful for digging into nested lists.
library(purrr)
map_depth(big_list, 2, matrix, nrow = 54, ncol = 71)
# or
map_depth(big_list, 2, `dim<-`, c(54L, 71L))
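The equivalence between `f(x) <- y` and a call to `` `f<-` `` described above can be checked on a small vector. This is a minimal sketch; the names `v`, `a`, `b`, `cols`, and `mats` are made up for illustration.

```r
v <- 1:6

# The usual replacement syntax...
a <- v
dim(a) <- c(2L, 3L)

# ...and the explicit call to the replacement function `dim<-`.
b <- `dim<-`(v, value = c(2L, 3L))

identical(a, b)  # TRUE

# Passing `dim<-` to lapply() reshapes every element of a list.
cols <- list(p = 1:6, q = 7:12)
mats <- lapply(cols, `dim<-`, c(2L, 3L))
dim(mats$q)
```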

Using sapply instead of loop in R

I have a function that requires 4 parameters:
myFun <- function(a,b,c,d){}
I have a matrix where each row contains the parameters:
myMatrix = matrix(c(a1,a2,b1,b2,c1,c2,d1,d2), nrow=2, ncol=4)
Currently I have a loop which feeds the parameters to myFun:
m <- myMatrix
i <- 1
someVector <- c()
while (i<(length(m[,1])+1)){
someVector[i] <-
myFun(m[i,1],m[i,2],m[i,3],m[i,4])
i = i+1
}
print(someVector)
What I would like to know is whether there is a better way to get the same result using sapply instead of a loop.
You can use mapply() here, which accepts vectors as arguments and calls the function once per set of corresponding elements. Convert your matrix into a data frame first:
df <- as.data.frame(myMatrix)
# as.data.frame() names the columns of an unnamed matrix V1, V2, V3, V4
results <- mapply(myFun, df$V1, df$V2, df$V3, df$V4)
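As a runnable illustration of the row-wise call, here is a minimal sketch. The body of `myFun` and the matrix values are hypothetical, chosen only so the result is easy to check by hand.

```r
# Hypothetical four-argument function, used only for illustration.
myFun <- function(a, b, c, d) a + b * c - d

# Each row holds one set of parameters (a, b, c, d).
myMatrix <- matrix(c(1, 2,    # a1, a2
                     3, 4,    # b1, b2
                     5, 6,    # c1, c2
                     7, 8),   # d1, d2
                   nrow = 2, ncol = 4)

# mapply() walks the four columns in parallel, one row at a time.
res_mapply <- mapply(myFun, myMatrix[, 1], myMatrix[, 2],
                     myMatrix[, 3], myMatrix[, 4])

# Equivalent sapply() over row indices, closer to the original loop.
res_sapply <- sapply(seq_len(nrow(myMatrix)), function(i) {
  myFun(myMatrix[i, 1], myMatrix[i, 2], myMatrix[i, 3], myMatrix[i, 4])
})

res_mapply  # 9 18
```

Indexing the matrix columns directly avoids the data frame conversion. Either form should give the same vector as the while loop.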

R: using apply over two data.frames

I want to use apply instead of a for-loop. The problem is, my for-loop uses two data.frames as an input. For example:
x <- data.frame(col1=c(1,NA,3,NA), col2=c(9,NA,11,12))
y <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))
output <- rep(NA,2)
for(i in 1:2)
{
output[i] <- sum(is.na(x[,i]))+sum(y[,i])
}
The result here is, correctly, c(12,27).
But if I try a function with apply:
test <- function(vector1,vector2) sum(is.na(vector1))+sum(vector2)
apply(x,y,MARGIN=2,FUN=test)
With apply the result is c(38,37).
How can I fix this?
You can use mapply instead of apply:
x <- data.frame(col1=c(1,NA,3,NA), col2=c(9,NA,11,12))
y <- data.frame(col1=c(1,2,3,4), col2=c(5,6,7,8))
test <- function(vector1,vector2) sum(is.na(vector1))+sum(vector2)
mapply(test, x, y)
# col1 col2
# 12 27
?mapply
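mapply works here because a data frame is a list of columns, so `test` receives one column of x and the matching column of y per call. For this particular computation there is also a fully vectorized alternative that was not part of the original answer; a minimal sketch using colSums:

```r
x <- data.frame(col1 = c(1, NA, 3, NA), col2 = c(9, NA, 11, 12))
y <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(5, 6, 7, 8))

# Count the NAs in each column of x and add the column sums of y.
output <- colSums(is.na(x)) + colSums(y)
output
# col1 col2
#   12   27
```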

Clip outliers in columns in df2,3,4... based on quantiles from columns in df.tr

I am trying to replace the "outliers" in each column of a dataframe with the Nth percentile.
n <- 1000
set.seed(1234)
df <- data.frame(a=runif(n), b=rnorm(n), c=rpois(n,1))
df.t1 <- as.data.frame(lapply(df, function(x) { q <- quantile(x,.9,names=F); x[x>q] <- q; x }))
I need the computed quantiles to truncate other dataframes. For example, I compute these quantiles on a training dataset and apply it; I want to use those same thresholds in several test datasets. Here's an alternative approach which allows that.
q.df <- sapply(df, function(x) quantile(x,.9,names=F))
df.tmp <- rbind(q.df, df.t1)
df.t2 <- as.data.frame(lapply(df.tmp, function(x) { x[x>x[1]] <- x[1]; x }))
df.t2 <- df.t2[-1,]
rownames(df.t2) <- NULL
identical(df.t1, df.t2)
The dataframes are very large, so I would prefer not to use rbind and then delete the row later. Is it possible to truncate the columns in the dataframes using q.df but without having to rbind? Thx.
Write a function that computes the quantiles once, then applies the clipping directly to each column. The conditional <- assignment inside your lapply call is unnecessary; ifelse already returns a vectorized result for the entire column.
# Make up some dummy df2 output (it's supposed to have 1000 cols really)
df2 <- data.frame(d=runif(1000), e=rnorm(1000), f=runif(1000))
require(plyr)
print(colwise(summary)(df2)) # show the summary before we clamp...
# Compute quantiles on df1...
df1 <- df
df1.quantiles <- apply(df1, 2, function(x, prob=0.9) { quantile(x, prob, names=F) })
# ...now clamp by sweeping col-index across both quantile vector, and df2 cols
clamp <- function(x, xmax) { ifelse(x<=xmax, x, xmax) }
for (j in 1:ncol(df2)) {
df2[,j] <- clamp(df2[,j], df1.quantiles[j]) # don't know how to use apply(...,2,)
}
print(colwise(summary)(df2)) # show the summary after we clamp...
Reference:
[1] "Clip values between a minimum and maximum allowed value in R"
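The column loop above can also be written without an explicit index. This is a minimal sketch that swaps `ifelse` for base R's `pmin` and pairs columns with thresholds by position using `Map`; the data frames `train` and `test` and the names `q90` and `test_clipped` are hypothetical.

```r
set.seed(1234)
train <- data.frame(a = runif(100), b = rnorm(100))
test  <- data.frame(a = runif(100), b = rnorm(100))

# Thresholds come from the training data only.
q90 <- sapply(train, quantile, probs = 0.9, names = FALSE)

# Clip each test column at the matching training threshold.
# pmin() returns the elementwise minimum of the column and the threshold.
test_clipped <- as.data.frame(Map(pmin, test, q90))

# No clipped column exceeds its threshold.
all(mapply(function(col, q) max(col) <= q, test_clipped, q90))
```

Since the pairing is positional, this assumes the test data frame has its columns in the same order as the training one.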

looping through two variables in lists

I have a list of lists with distance matrices:
obs <- list(AA=list(A=dist(runif(100)),
B=dist(runif(100)),
C=dist(runif(100))),
BB=list(A=dist(runif(100)),
B=dist(runif(100)),
C=dist(runif(100))))
obs <- lapply(obs, function(x)
lapply(x, function(x) as.data.frame(as.matrix(x))))
And another list with only one level of hierarchy:
distances <- lapply(list(A=rnorm(100),B=rnorm(100),C=rnorm(100)), function(x)
as.data.frame(as.matrix(dist(x, "euclidean"))))
I would like to compare the matrices in the 2nd level of obs with those in the first level of distances whenever their names match (obs[[i]]$A with distances$A, B with B, C with C; never all combinations!). After trying to run sapply within lapply, which failed, I resorted to for loops, in which I store the results and extract some values from them:
coef <- pvals <- res <- vector("list", length(names(obs)))
library(vegan)
for (i in names(obs)) {
res[[i]]$A <- mantel(obs[[i]]$A, distances$A, "spearman", perm=999)
#tmp <- res[[i]]$A
#coef[[i]]$A <- tmp$statistic
#pvals[[i]]$A <- tmp$signif
}
I loop through the first level of obs with the second level fixed, and then repeat this for obs[[i]]$B and obs[[i]]$C (not pasting the # lines from above again to save space):
for (i in names(obs)) {
res[[i]]$B <- mantel(obs[[i]]$B, distances$B, "spearman", perm=999)
...
}
for (i in names(obs)) {
res[[i]]$C <- mantel(obs[[i]]$C, distances$C, "spearman", perm=999)
...
}
The question now is how to loop through the second level (obs[[i]]$A, B, C) as well, while also pointing to the correct matrix in distances. Would it be better to put the three loops above into one parent loop (through obs[[i]][[j]]), or is there a way to use lapply? Thank you!
If I understood your question correctly, I would do something like this:
for(i in seq_along(obs)) { # 1st level of obs
for (j in names(obs[[i]])) { # 2nd level of obs, 1st level of distances
res[[i]][[j]] <- mantel(obs[[i]][[j]], distances[[j]], "spearman", perm=999)
}
}
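The same name-matched traversal can be written with lapply and Map instead of nested for loops. The sketch below runs in base R only: `compare` is a hypothetical stand-in for the mantel call, and `make_dist` is a made-up helper that builds small random distance matrices.

```r
# Hypothetical stand-in for mantel(): correlation of the matrix entries.
compare <- function(m1, m2) {
  cor(as.vector(as.matrix(m1)), as.vector(as.matrix(m2)))
}

set.seed(1)
make_dist <- function() as.matrix(dist(runif(10)))
obs <- list(AA = list(A = make_dist(), B = make_dist(), C = make_dist()),
            BB = list(A = make_dist(), B = make_dist(), C = make_dist()))
distances <- list(A = make_dist(), B = make_dist(), C = make_dist())

# The outer lapply() walks the groups (AA, BB); Map() pairs each matrix
# with the entry of 'distances' that has the same name.
res <- lapply(obs, function(group) {
  Map(compare, group[names(distances)], distances)
})

names(res)     # "AA" "BB"
names(res$AA)  # "A" "B" "C"
```

Subsetting with `group[names(distances)]` is what enforces the match by name rather than by position. Replacing `compare` with a wrapper around the mantel call from the answer should give one test result per name in each group.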
