Using na.rm = TRUE in pmin with do.call in R

I want to extract the minimum value of each element across several matrices that are stored inside a list. I'm using pmin:
do.call(pmin, mylist)
The problem is that some elements of those matrices are NA, and pmin yields an NA where I want it to yield the minimum of the remaining values after excluding the NAs. I tried to solve the problem with
do.call(pmin(na.rm = TRUE), mylist)
but I get an error. I also tried the approach from this answer: data.table and pmin with na.rm=TRUE argument, but I get an error because .SD is not in the environment.
Simple code for a similar problem would be:
mymat1 <- matrix(rnorm(10), ncol=2)
mymat2 <- matrix(rnorm(10), ncol=2)
mymat2[2,2] <- NA
mymat3 <- matrix(rnorm(10), ncol=2)
mylist <- list(mymat1, mymat2, mymat3)
do.call(pmin, mylist)
I get an NA at position [2,2] of the resulting matrix, but I want the minimum values with the NAs ignored.
Any suggestions?
Thank you.

Concatenate na.rm = TRUE to the list as a named element and then call pmin with do.call, so that the na.rm parameter is found:
do.call(pmin, c(mylist, list(na.rm = TRUE)))
# [,1] [,2]
#[1,] -1.0830716 -0.1237099
#[2,] -0.5949517 -3.7873790
#[3,] -2.1003236 -1.2565663
#[4,] -0.4500171 -1.0588205
#[5,] -1.0937602 -1.0537657
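As a quick sanity check on the question's own mylist (a sketch; the numeric values vary because rnorm() is not seeded), the cell that was NA before is now filled:
res <- do.call(pmin, c(mylist, list(na.rm = TRUE)))
is.na(res[2, 2])
# [1] FALSE   # the NA planted in mymat2[2, 2] is ignored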

If you use purrr / the tidyverse, you can use purrr::invoke.
library(purrr)
invoke(pmin, mylist, na.rm = TRUE)
# [,1] [,2]
# [1,] -0.3053884 -1.3770596
# [2,] 0.9189774 -0.4149946
# [3,] -0.1027877 -0.3942900
# [4,] -0.6212406 -1.4707524
# [5,] -2.2146999 -0.4781501
It is basically do.call with a ... argument, and its source code is more or less @akrun's answer above:
function (.f, .x = NULL, ..., .env = NULL)
{
    .env <- .env %||% parent.frame()
    args <- c(as.list(.x), list(...))
    do.call(.f, args, envir = .env)
}
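Newer purrr versions steer users toward exec() instead of invoke(); assuming a recent purrr (my note, not part of the original answer), a roughly equivalent call splices the list with !!! and passes na.rm by name:
exec(pmin, !!!mylist, na.rm = TRUE)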
purrr::partial is also interesting:
pmin2 <- partial(pmin, na.rm = TRUE)
do.call(pmin2, mylist)
# [,1] [,2]
# [1,] -0.3053884 -1.3770596
# [2,] 0.9189774 -0.4149946
# [3,] -0.1027877 -0.3942900
# [4,] -0.6212406 -1.4707524
# [5,] -2.2146999 -0.4781501
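For completeness, a base-R alternative that avoids do.call entirely (my sketch, not one of the answers above) folds the list with Reduce() and a wrapper that fixes na.rm = TRUE; a position stays NA only if it is NA in every matrix, just as in the do.call versions:
Reduce(function(a, b) pmin(a, b, na.rm = TRUE), mylist)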

Related

Extract elements from matrix diagonal saved in multiple lists in R

I'm trying to get different elements from the diagonals of multiple matrices saved in a list. My data looks something like this:
res <- list()
res[[1]] <- matrix(c(0.04770856,0.02854005,0.02854005,0.03260190), nrow=2, ncol=2)
res[[2]] <- matrix(c(0.05436957,0.04887182,0.04887182, 0.10484454), nrow=2, ncol=2)
> res
[[1]]
[,1] [,2]
[1,] 0.04770856 0.02854005
[2,] 0.02854005 0.03260190
[[2]]
[,1] [,2]
[1,] 0.05436957 0.04887182
[2,] 0.04887182 0.10484454
> diag(res[[1]])
[1] 0.04770856 0.03260190
> diag(res[[2]])
[1] 0.05436957 0.10484454
I would like to save the first and second elements of each matrix's diagonal into vectors like these:
d.1st.el <- c(0.04770856, 0.05436957)
d.2nd.el <- c(0.03260190, 0.10484454)
My issue is writing a function that runs over all the matrices in the list and gets the diagonals. For some reason, when I use unlist() to extract the values of each matrix at a given level, it doesn't give me the number but the full matrix.
Does anyone have a simple solution?
sapply(res, diag)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# or
lapply(res, diag)
[[1]]
[1] 0.04770856 0.03260190
[[2]]
[1] 0.05436957 0.10484454
If you want the vectors for some reason in your global environment:
alld <- lapply(res, diag)
names(alld) <- sprintf("d.%d.el", 1:length(alld))
list2env(alld, globalenv())
In two steps you can do:
# Step 1 - Get the diagonals
all_diags <- sapply(res, function(x) diag(t(x)))
print(all_diags)
[,1] [,2]
[1,] 0.04770856 0.05436957
[2,] 0.03260190 0.10484454
# Step 2 - Append to vectors
d.1st.el <- all_diags[1,]
d.2nd.el <- all_diags[2,]
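If the list of matrices can grow, a hedged sketch (my addition, not part of the answers) keeps one vector per diagonal position without hard-coding row indices, using base split() and row():
all_diags <- sapply(res, diag)                        # row i = i-th diagonal element of each matrix
diag_by_position <- split(all_diags, row(all_diags))
diag_by_position[[1]]   # 0.04770856 0.05436957
diag_by_position[[2]]   # 0.03260190 0.10484454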

Extract the minimum matrix of a set of matrices inside a list in R

I have a list with N matrices in R:
mylist <- list(a=matrix(rnorm(1:10), ncol=2), b=matrix(rnorm(1:10), ncol=2), c=matrix(rnorm(1:10), ncol=2))
I would like to extract a matrix with the minimum value at each (i, j) position across the three matrices (in my real example all matrices have the same dimensions).
I can do it manually with
pmin(a, b, c), but that would be impractical in my case, because I have several matrices in the list.
I tried with
lapply(mylist, function(x) pmin(x))
but I get the original list.
Any suggestions?
Thank you.
Cheers.
Try this solution:
am <- matrix(rnorm(1:10), ncol=2)
bm <- matrix(rnorm(1:10), ncol=2)
cm <- matrix(rnorm(1:10), ncol=2)
mylist <- list(a=am, b=bm, c=cm)
working <- pmin(am, bm, cm) # What you stated is working
new <- do.call(pmin, mylist) # Calling for all elements of a list
identical(working, new) # verify if the new answer outputs the same
Hope it helps! :)
You can use Reduce
# creating a list of matrices
set.seed(1)
mylist <- list(a = matrix(rnorm(1:10), ncol = 2),
               b = matrix(rnorm(1:10), ncol = 2),
               c = matrix(rnorm(1:10), ncol = 2))
# getting min
> Reduce("pmin", mylist)
[,1] [,2]
[1,] -0.6264538 -0.8204684
[2,] 0.1836433 -0.1557955
[3,] -0.8356286 -1.4707524
[4,] -2.2146999 -0.4781501
[5,] 0.3295078 -0.3053884
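As a side note (not from the original answer), the same Reduce() call scales to a list of any length, for example with simulated data:
big_list <- replicate(100, matrix(rnorm(10), ncol = 2), simplify = FALSE)
dim(Reduce(pmin, big_list))
# [1] 5 2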

Recursive indexing to unlist a matrix

Consider the following vector x and list s
x <- c("apples and pears", "one banana", "pears, oranges, and pizza")
s <- strsplit(x, "(,?)\\s+")
The desired result will be the following, but please keep reading.
> t(sapply(s, `length<-`, 4))
# [,1] [,2] [,3] [,4]
#[1,] "apples" "and" "pears" NA
#[2,] "one" "banana" NA NA
#[3,] "pears" "oranges" "and" "pizza"
That's fine, it's a good way to do it. But R's vectorization is one of its best features, and I'd like to see if I can do this with recursive indexing, that is, using only [ subscript indexing.
I want to start with the following, and use the row and column indices to turn the list s into a 3x4 matrix. So I'm calling cbind on the list s and starting from there.
(cb <- cbind(s))
# s
# [1,] Character,3
# [2,] Character,2
# [3,] Character,4
class(cb[1])
#[1] "list"
is.recursive(cb)
#[1] TRUE
I've gotten this far, but now I'm struggling with the higher dimensions. Here's the first row; from here I want to unlist the rest of the matrix using [ and [[ indexing.
w <- character(nrow(cb)+nrow(cb)^2)
dim(w) <- c(3,4)
w[cbind(1, 1:3)] <- cb[[1]]
# [,1] [,2] [,3] [,4]
#[1,] "apples" "and" "pears" ""
#[2,] "" "" "" ""
#[3,] "" "" "" ""
At level 2 it gets more difficult. I've been doing things like this
> cb[[c(1,2,1), exact = TRUE]]
# Error in cb[[c(1, 2, 1), exact = TRUE]] :
# recursive indexing failed at level 2
> cb[[cbind(1,2,1)]]
# Error in cb[[cbind(1, 2, 1)]] : recursive indexing failed at level 2
Here's an example of how the indexing proceeds. I've tried all kinds of combinations of w[[cbind(1, 1:2)]] and the like.
w[cbind(1, 1:3)] <- cb[[1]]
w[cbind(2, 1:2)] <- cb[[2]]
w[cbind(3, 1:4)] <- cb[[3]]
From the empty matrix w, this produces the result
# [,1] [,2] [,3] [,4]
#[1,] "apples" "and" "pears" ""
#[2,] "one" "banana" "" ""
#[3,] "pears" "oranges" "and" "pizza"
Is it possible to use recursive indexing at all levels, so that I can unlist cb into an empty matrix directly from when it was a list, i.e. put the three w[] <- cb[[]] lines into one?
I'm asking this because it gets to the heart of matrix structures in R. It's about learning the indexing, and not about finding an alternative solution to my problem.
You can use the rbind.fill.matrix function from the plyr package.
library(plyr)
rbind.fill.matrix(lapply(s, rbind))
This returns
1 2 3 4
[1,] "apples" "and" "pears" NA
[2,] "one" "banana" NA NA
[3,] "pears" "oranges" "and" "pizza"
Note that this does use as.matrix internally: rbind.fill.matrix calls matrices[] <- lapply(matrices, as.matrix)
If you wanted to bypass the intermediary steps, you can just use my cSplit function, like this:
cSplit(as.data.table(x), "x", "(,?)\\s+", fixed = FALSE)
# x_1 x_2 x_3 x_4
# 1: apples and pears NA
# 2: one banana NA NA
# 3: pears oranges and pizza
as.matrix(.Last.value)
# x_1 x_2 x_3 x_4
# [1,] "apples" "and" "pears" NA
# [2,] "one" "banana" NA NA
# [3,] "pears" "oranges" "and" "pizza"
Under the hood, however, that still does require creating a matrix and filling it in. It uses matrix indexing to fill in the values, so it is quite fast.
A manual approach would look something like:
myFun <- function(invec, split, fixed = TRUE) {
  s <- strsplit(invec, split, fixed)
  Ncol <- vapply(s, length, 1L)
  M <- matrix(NA_character_, ncol = max(Ncol),
              nrow = length(invec))
  M[cbind(rep(sequence(length(invec)), times = Ncol),
          sequence(Ncol))] <- unlist(s, use.names = FALSE)
  M
}
myFun(x, "(,?)\\s+", FALSE)
# [,1] [,2] [,3] [,4]
# [1,] "apples" "and" "pears" NA
# [2,] "one" "banana" NA NA
# [3,] "pears" "oranges" "and" "pizza"
Speed is not everything, but it certainly should be a consideration for this type of transformation.
Here are some tests of what has been suggested so far:
## The manual approach
fun1 <- function(x) myFun(x, "(,?)\\s+", FALSE)
## The cSplit approach
fun2 <- function(x) cSplit(as.data.table(x), "x", "(,?)\\s+", fixed = FALSE)
## The OP's approach
fun3 <- function(x) {
  s <- strsplit(x, "(,?)\\s+")
  mx <- max(sapply(s, length))
  do.call(rbind, lapply(s, function(x) { length(x) <- mx; x }))
}
## The plyr approach
fun4 <- function(x) {
  s <- strsplit(x, "(,?)\\s+")
  rbind.fill.matrix(lapply(s, rbind))
}
And, for fun, here's another approach, this one using dcast.data.table:
fun5 <- function(x) {
  dcast.data.table(
    data.table(strsplit(x, "(,?)\\s+"))[
      , list(unlist(V1)), by = sequence(length(x))][
      , N := sequence(.N), by = sequence],
    sequence ~ N, value.var = "V1")
}
Testing is on slightly bigger data, though not very big: 12k values.
x <- unlist(replicate(4000, x, FALSE))
length(x)
# [1] 12000
## I expect `rbind.fill.matrix` to be slow:
system.time(fun4(x))
# user system elapsed
# 3.38 0.00 3.42
library(microbenchmark)
microbenchmark(fun1(x), fun2(x), fun3(x), fun5(x))
# Unit: milliseconds
# expr min lq median uq max neval
# fun1(x) 97.22076 100.8013 102.5754 107.8349 166.6632 100
# fun2(x) 115.01466 120.6389 125.0622 138.0614 222.7428 100
# fun3(x) 146.33339 155.9599 158.8394 170.3917 228.5523 100
# fun5(x) 257.53868 266.5994 273.3830 296.8003 346.3850 100
A bit bigger data, but still not what others might consider big: 1.2M values.
X <- unlist(replicate(100, x, FALSE))
length(X)
# [1] 1200000
## Dropping fun3 and fun5 now, though they are very close...
## I wonder how fun5 scales further (but don't have the patience to wait)
system.time(fun5(X))
# user system elapsed
# 31.28 0.43 31.76
system.time(fun3(X))
# user system elapsed
# 31.62 0.33 31.99
microbenchmark(fun1(X), fun2(X), times = 10)
# Unit: seconds
# expr min lq median uq max neval
# fun1(X) 11.65622 11.76424 12.31091 13.38226 13.46488 10
# fun2(X) 12.71771 13.40967 14.58484 14.95430 16.15747 10
The penalty for the cSplit approach comes from having to convert to a data.table and from checking different conditions, but as your data grows, those penalties become less noticeable.

How to use some apply function to solve what requires two for-loops in R

I have a matrix, named "mat", and a smaller matrix, named "center".
temp = c(1.8421,5.6586,6.3526,2.904,3.232,4.6076,4.8,3.2909,4.6122,4.9399)
mat = matrix(temp, ncol=2)
[,1] [,2]
[1,] 1.8421 4.6076
[2,] 5.6586 4.8000
[3,] 6.3526 3.2909
[4,] 2.9040 4.6122
[5,] 3.2320 4.9399
center = matrix(c(3, 6, 3, 2), ncol=2)
[,1] [,2]
[1,] 3 3
[2,] 6 2
I need to compute the (squared Euclidean) distance between each row of mat and every row of center. For example, the distance between mat[1,] and center[1,] can be computed as
diff = mat[1,]-center[1,]
t(diff)%*%diff
[,1]
[1,] 3.92511
Similarly, I can find the distance of mat[1,] and center[2,]
diff = mat[1,]-center[2,]
t(diff)%*%diff
[,1]
[1,] 24.08771
Repeating this process for each row of mat, I end up with
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
I know how to implement it with for-loops. I was really hoping someone could tell me how to do it with some kind of apply() function, maybe mapply().
Thanks
apply(center, 1, function(x) colSums((x - t(mat)) ^ 2))
# [,1] [,2]
# [1,] 3.925110 24.087710
# [2,] 10.308154 7.956554
# [3,] 11.324550 1.790750
# [4,] 2.608405 16.408805
# [5,] 3.817036 16.304836
If you want apply for expressiveness of code, that's one thing, but it's still looping, just with different syntax. This can be done without any loops, or with a very small one across center instead of mat. I'd transpose first, because it's wise to get into the habit of pulling as much work as possible out of the apply statement. (The BrodieG answer is pretty much identical in function.) These work because R automatically recycles the smaller vector along the matrix, and does so much faster than apply or for.
tm <- t(mat)
apply(center, 1, function(m) {
  colSums((tm - m)^2)
})
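Building on that, a fully vectorized sketch (my addition, not one of the original answers) uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b, so no explicit looping over rows is needed at all:
outer(rowSums(mat^2), rowSums(center^2), "+") - 2 * mat %*% t(center)
#           [,1]      [,2]
# [1,]  3.925110 24.087710
# [2,] 10.308154  7.956554
# [3,] 11.324550  1.790750
# [4,]  2.608405 16.408805
# [5,]  3.817036 16.304836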
Use dist and then extract the relevant submatrix:
ix <- 1:nrow(mat)
as.matrix( dist( rbind(mat, center) )^2 )[ix, -ix]
6 7
# 1 3.925110 24.087710
# 2 10.308154 7.956554
# 3 11.324550 1.790750
# 4 2.608405 16.408805
# 5 3.817036 16.304836
REVISION: simplified slightly.
You could use outer as well
d <- function(i, j) sum((mat[i, ] - center[j, ])^2)
outer(1:nrow(mat), 1:nrow(center), Vectorize(d))
This will solve it
t(apply(mat, 1, function(row) {
  d1 <- sum((row - center[1, ])^2)
  d2 <- sum((row - center[2, ])^2)
  return(c(d1, d2))
}))
Result:
[,1] [,2]
[1,] 3.925110 24.087710
[2,] 10.308154 7.956554
[3,] 11.324550 1.790750
[4,] 2.608405 16.408805
[5,] 3.817036 16.304836
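A hedged generalization of the above (a hypothetical helper, not part of the original answer) works for any number of rows in center:
sq_dists <- function(mat, center) {
  t(apply(mat, 1, function(row) {
    apply(center, 1, function(cen) sum((row - cen)^2))
  }))
}
sq_dists(mat, center)   # same 5 x 2 result as above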

Equivalent way using apply family functions instead of a for loop to get the max value of each row in a list of sublists in R

What would be the equivalent of this using apply family functions, or a combination of do.call and apply? I would like to keep it simple and, when possible, in one line:
a <- list(as.data.frame(matrix(rnorm(12), 4, 3)),
          as.data.frame(matrix(rnorm(12), 4, 3)),
          as.data.frame(matrix(rnorm(12), 4, 3)))
l <- list()
for (i in 1:length(a)) {
  l[[i]] <- apply(a[[i]], 1, max)
}
b <- do.call(data.frame, l)
I would use sapply for this particular example; however, I don't know how representative this example is of your actual, larger problem.
> sapply(a, function(x) apply(x, 1, max))
[,1] [,2] [,3]
[1,] 0.5757814 0.9189774 0.6198257
[2,] 0.1836433 0.9438362 0.4179416
[3,] 1.5117812 1.1249309 1.3586796
[4,] 1.5952808 0.5939013 -0.1027877
sapply will simplify to a matrix whenever possible. If you want a data.frame, just wrap the output in data.frame.
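For instance, a sketch of that wrapping (the column names may differ from those produced by the original loop):
b <- data.frame(sapply(a, function(x) apply(x, 1, max)))
str(b)   # 4 obs. of 3 variables, one column per element of a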
