R: applying function over matrix and keeping matrix dimensions

So I want to apply a function over a matrix in R. This works really intuitively for simple functions:
> (function(x)x*x)(matrix(1:10, nrow=2))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 25 49 81
[2,] 4 16 36 64 100
...but clearly I don't understand all of its workings:
> m = (matrix(1:10, nrow=2))
> (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })(m)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 4 6 8 10
[2,] 3 5 7 9 11
Warning message:
In if (x%%3 == 0) { :
the condition has length > 1 and only the first element will be used
I read up on this and found out about Vectorize and sapply, which both seemed great and just like what I wanted, except that both of them flatten my matrix into a plain vector, dropping its dimensions:
> y = (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })
> sapply(m, y)
[1] 2 3 NA 5 6 NA 8 9 NA 11
> Vectorize(y)(m)
[1] 2 3 NA 5 6 NA 8 9 NA 11
...whereas I'd like to keep it in a matrix with its current dimensions. How might I do this? Thanks!

@Joshua Ulrich (and Dason) have a great answer, and doing it directly without the function y is the best solution. But if you really need to call a function, you can make it faster using vapply. Like sapply, it produces a vector without dimensions (but faster), and you can add the dimensions back using structure:
# Your function (optimized)
y = function(x) if (x %% 3) x+1 else NA
m <- matrix(1:1e6,1e3)
system.time( r1 <- apply(m,1:2,y) ) # 4.89 secs
system.time( r2 <- structure(sapply(m, y), dim=dim(m)) ) # 2.89 secs
system.time( r3 <- structure(vapply(m, y, numeric(1)), dim=dim(m)) ) # 1.66 secs
identical(r1, r2) # TRUE
identical(r1, r3) # TRUE
As the timings show, the vapply approach is about 3x faster than apply (1.66 vs 4.89 seconds here). The reason vapply is faster than sapply is that sapply must analyse the results to figure out that they can be simplified to a numeric vector. With vapply you specify the result type up front (numeric(1)), so it doesn't have to guess.
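To illustrate the "doesn't have to guess" point, the sketch below uses a hypothetical two_vals() that returns two numbers per element: sapply() quietly reshapes the results into a 2 x 10 matrix, while vapply() with numeric(1) stops with an error instead.

```r
y <- function(x) if (x %% 3) x + 1 else NA
m <- matrix(1:10, nrow = 2)

# vapply checks every result against FUN.VALUE: here, exactly one double
v <- vapply(m, y, numeric(1))

# Hypothetical function that returns two values per element
two_vals <- function(x) c(x, x + 1)

# sapply guesses a shape from whatever comes back (here a 2 x 10 matrix)
s <- sapply(m, two_vals)

# vapply refuses the length mismatch instead of guessing
err <- tryCatch(vapply(m, two_vals, numeric(1)),
                error = function(e) "length mismatch")
```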
UPDATE: I found another (shorter) way of preserving the matrix structure:
m <- matrix(1:10, nrow=2)
m[] <- vapply(m, y, numeric(1))
You simply assign the new values into the object using m[] <-. All other attributes are then preserved (dim, dimnames, class, etc.).
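To see the dimnames part in action, here is a sketch with a hypothetical 2 x 3 matrix that carries row and column names; after m[] <- the values change but the names stay.

```r
y <- function(x) if (x %% 3) x + 1 else NA

# Hypothetical matrix with row and column names
m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("A", "B", "C")))

# Replace the contents only; dim and dimnames are kept
m[] <- vapply(m, y, numeric(1))
m
```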

One way is to use apply on both rows and columns:
apply(m,1:2,y)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
You can also do it with subscripting, because %% and == are already vectorized:
m[m %% 3 == 0] <- NA
m <- m+1
m
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
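A small check that the two approaches in this answer give the same matrix. The vectorized version here works on a copy (r_vec), so m itself is not overwritten; y is restated in the same form as the question's function.

```r
m <- matrix(1:10, nrow = 2)
y <- function(x) if (x %% 3 == 0) NA else x + 1

# Element by element: MARGIN = 1:2 calls y once per cell
r_apply <- apply(m, 1:2, y)

# Vectorized: add 1 everywhere, then blank out the multiples of 3
r_vec <- m + 1
r_vec[m %% 3 == 0] <- NA
```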

For this specific example you can just do something like this
> # Create some fake data
> mat <- matrix(1:16, 4, 4)
> # Set all elements divisible by 3 to NA
> mat[mat %% 3 == 0] <- NA
> # Add 1 to all elements (NA + 1 stays NA)
> mat <- mat + 1
> mat
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17

There's a slight refinement of Dason and Josh's solution using ifelse.
mat <- matrix(1:16, 4, 4)
ifelse(mat %% 3 == 0, NA, mat + 1)
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17
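ifelse() returns a value with the same attributes (including dim and dimnames) as its test argument, and mat %% 3 == 0 is itself a matrix, which is why the shape survives. A quick check, with hypothetical row and column names added:

```r
mat <- matrix(1:16, 4, 4, dimnames = list(letters[1:4], LETTERS[1:4]))

# The result inherits dim and dimnames from the logical test matrix
res <- ifelse(mat %% 3 == 0, NA, mat + 1)
```

One thing to keep in mind: ifelse() still computes the whole of mat + 1, including for cells that end up as NA.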


Lapply over several parameters, faster method

Suppose I have two vectors
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
and a function
calc <- function(x, y) {x + y}
I want to apply this function to the 1st value in a with each value in b, then to the 2nd value in a with each value in b, and so on. Suppose that in my case calc only accepts a single value from a and from b as input, so lapply(a,calc,b) wouldn't work because length(b) is not 1 (it gives me an error).
Also, mapply doesn't give me the wanted solution either; it only applies the function to paired values, i.e. 1+6, 2+7, etc.
So I built a function that gave me the wanted solution
myfunc <- function(z){lapply(a,calc,z)}
and applied it on b
solution <- lapply(b,myfunc)
The difference to lapply(a,calc,b) or a nested lapply(a,lapply,calc,b) is that this gives me each result in its own list. That's what I wanted, or at least it is a function that gives me the right result without an error.
Now, is there a faster or simpler method? I just experimented here a little, and with my real function, which is much larger than calc, it takes 10 minutes. Maybe I have to slim down my original function and there is no faster method here...
EDIT:
In my function there is something like this,
calc <- function(x,y){
# ...
number <- x
example <- head(number,n=y)
# ...
}
where a vector as input for y doesn't work. With lapply(a,lapply,calc,b) or lapply(a,calc,b) I get an error:
Error in head.default(number, n = y) : length(n) == 1L is not TRUE
As Florian says, outer() could be an option.
outer(a, b, calc)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
But as MichaelChirico mentions, with a function that isn't vectorized it won't work. In that case something else has to be hacked together. These might or might not be quicker than your current solution.
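outer() calls its function once, with two long vectors in which every value of a is paired with every value of b; that single call is why a non-vectorized function fails. If calc() returns exactly one value per call, a common workaround is to wrap it in Vectorize(), which loops over the pairs with mapply() internally. calc_scalar() below is a hypothetical stand-in that refuses vector input, like the question's real function.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

# Hypothetical scalar-only function, standing in for the question's calc()
calc_scalar <- function(x, y) {
  stopifnot(length(x) == 1, length(y) == 1)
  x + y
}

# Called directly, outer() would pass two length-25 vectors and fail;
# Vectorize() makes it call calc_scalar once per (x, y) pair
res <- outer(a, b, Vectorize(calc_scalar))
```

This is still one R-level call per pair, so it is mainly a convenience rather than a speed-up.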
All combinations (so both calc(a[1], b[2]) and calc(a[2], b[1]) are performed, similar to outer()):
Number of calculations: n^2
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2])
matrix(m1, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
Only unique combinations (so assumes your function is symmetric)
Number of calculations: (n^2 - n) / 2
cn <- t(combn(1:length(a), 2))
m2 <- mapply(calc, a[cn[, 1]], b[cn[, 2]])
mat <- matrix(, length(a), length(a))
mat[cn] <- m2  # cn holds (row, col) pairs in the same order as m2
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 8 9 10 11
# [2,] NA NA 10 11 12
# [3,] NA NA NA 12 13
# [4,] NA NA NA NA 14
# [5,] NA NA NA NA NA
This second one ignores the diagonal, but adding those values is easy, as that's exactly what the OP's mapply() call returned.
diag(mat) <- mapply(calc, a, b)
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] NA 9 10 11 12
# [3,] NA NA 11 12 13
# [4,] NA NA NA 13 14
# [5,] NA NA NA NA 15
This solved it for me: adding SIMPLIFY=FALSE to the mapply call, thanks to @AkselA.
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2],SIMPLIFY=FALSE)
However, this method is only slightly faster than my own solution in my OP.
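To see what SIMPLIFY = FALSE buys, here is a small sketch with a hypothetical calc() whose result length depends on y, mimicking head(number, n = y) from the question's EDIT. With SIMPLIFY = FALSE, mapply() always returns one list element per (a, b) pair; the default would instead try to collapse equal-length results into a vector or matrix.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(1, 2, 3)

# Hypothetical stand-in for the question's calc(): returns y values
calc <- function(x, y) head(seq_len(10) * x, n = y)

eg  <- expand.grid(x = a, y = b)   # 15 rows; x varies fastest
res <- mapply(calc, eg$x, eg$y, SIMPLIFY = FALSE)
```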

How to extract the value from matrix cell without "subscript out of bounds" error?

I have a matrix my.data which consists of 32 columns and about 23000 rows. I want to extract the values in columns 10 and 20 of each row (and drop rows whose sum exceeds 2200), so I use a for loop:
for (i in 1:nrow(my.data)) {
day <- as.numeric(my.data[i,10])
night <- as.numeric(my.data[i,20])
sum <- day+night
if (sum > 2200 ) {
my.data <- my.data [-i,]
}
}
but I get "Error in my.data[i, 10] : subscript out of bounds". Can you explain what is wrong? Thank you!
Negative indexes remove elements, so your my.data matrix gets smaller inside the loop, while i still runs over every row that was present in the original data. Once i exceeds the current number of rows, my.data[i,10] is out of bounds.
> matrix(1:9, 3,3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(1:9, 3,3)[-1,]
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
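try (advance i only when the current row is kept, because removing row i moves the next row into position i):
i <- 1
while (i <= nrow(my.data)) {
  day <- as.numeric(my.data[i,10])
  night <- as.numeric(my.data[i,20])
  sum <- day+night
  if (sum > 2200) {
    my.data <- my.data[-i, , drop = FALSE]  # row i now holds the next row; check it on the next pass
  } else {
    i <- i + 1
  }
}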
UPDATE: or use a more R-like, vectorized solution:
row_idx <- ( as.numeric(my.data[, 10]) +
as.numeric(my.data[, 20]) ) <= 2200
my.data <- my.data[row_idx, , drop = FALSE]
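A quick way to convince yourself that a loop that only advances i on kept rows and the vectorized filter agree is to run both on random data. my.data here is a hypothetical 200 x 32 stand-in, and the values are already numbers, so the as.numeric() calls are dropped.

```r
set.seed(1)
# Hypothetical stand-in for my.data: 200 rows, 32 numeric columns
my.data <- matrix(sample(0:1500, 200 * 32, replace = TRUE), ncol = 32)

# Loop version: advance i only when the current row is kept
loop_filter <- function(d) {
  i <- 1
  while (i <= nrow(d)) {
    if (d[i, 10] + d[i, 20] > 2200) {
      d <- d[-i, , drop = FALSE]  # the next row moves into position i
    } else {
      i <- i + 1
    }
  }
  d
}

# Vectorized version: one logical index over all rows
vec_filter <- function(d) d[d[, 10] + d[, 20] <= 2200, , drop = FALSE]

kept <- vec_filter(my.data)
```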

subtract a constant vector from each row in a matrix in r

I have a matrix with 5 columns and 4 rows. I also have a vector with 3 elements. I want to subtract the values in the vector from columns 3, 4 and 5 respectively, in every row of the matrix.
b <- matrix(rep(1:20), nrow=4, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
c <- c(5,6,7)
to get
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 4 7 10
[2,] 2 6 5 8 11
[3,] 3 7 6 9 12
[4,] 4 8 7 10 13
This is exactly what sweep was made for:
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- sweep(b[,3:5], 2, x)
b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13
...or even without subsetting or reassignment:
sweep(b, 2, c(0,0,x))
Perhaps not that elegant, but
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- t(t(b[,3:5])-x)
should do the trick. We subset the matrix to change only the part we need, and we use t() (transpose) to flip it, so that ordinary vector recycling, which runs down the columns, subtracts each element of x from the matching original column.
If you want to avoid the transposes, you could do something like
b[,3:5] <- b[,3:5]-x[col(b[,3:5])]
as well. Here col(b[,3:5]) returns a matrix of column numbers with the same shape as b[,3:5], so x[col(b[,3:5])] picks the right element of x for every cell, in the same column-major order as b[,3:5] itself.
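Both tricks in this answer can be checked against sweep() on the question's data. This sketch computes all three on b[, 3:5] and compares them; via_t, via_col and via_sweep are just illustrative names.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)
sub <- b[, 3:5]

# x recycles down the columns of t(sub), i.e. across the original columns
via_t <- t(t(sub) - x)

# col(sub) is a 4 x 3 matrix of column numbers 1, 2, 3
via_col <- sub - x[col(sub)]

# Reference result
via_sweep <- sweep(sub, 2, x)
```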
I think my favorite from the question that @thelatemail linked was
b[,3:5] <- sweep(b[,3:5], 2, x, `-`)
Another way, with apply:
b[,3:5] <- t(apply(b[,3:5], 1, function(x) x-c))
A simple solution:
b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(5,6,7)
for(i in 1:nrow(b)) {
b[i,3:5] <- b[i,3:5] - c
}
This can be done with the rray package in a very satisfying way, using its NumPy-like broadcasting subtraction operator %b-%:
#install.packages("rray")
library(rray)
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5, 6, 7)
b[, 3:5] <- b[, 3:5] %b-% matrix(x, 1)
b
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 5 4 7 10
#> [2,] 2 6 5 8 11
#> [3,] 3 7 6 9 12
#> [4,] 4 8 7 10 13
For large matrices this is even faster than sweep:
#install.packages("bench")
res <- bench::press(
size = c(10, 1000, 10000),
frac_selected = c(0.1, 0.5, 1),
{
B <- matrix(sample(size*size), nrow=size, ncol=size)
B2 <- B
x <- sample(size, size=ceiling(size*frac_selected))
idx <- sample(size, size=ceiling(size*frac_selected))
bench::mark(rray = {B2[, idx] <- B[, idx, drop = FALSE] %b-% matrix(x, nrow = 1); B2},
sweep = {B2[, idx] <- sweep(B[, idx, drop = FALSE], MARGIN = 2, x); B2}
)
}
)
plot(res)

Build a square-ish matrix with a specified number of cells

I would like to write a function that transforms an integer, n, (specifying the number of cells in a matrix) into a square-ish matrix that contain the sequence 1:n. The goal is to make the matrix as "square" as possible.
This involves a couple of considerations:
1. How to maximize "square"-ness? I was thinking of a penalty equal to the difference in the dimensions of the matrix, e.g. penalty <- abs(dim(mat)[1]-dim(mat)[2]), such that penalty==0 when the matrix is square and is positive otherwise. Ideally this would then, e.g., for n==12 lead to a preference for a 3x4 rather than 2x6 matrix. But I'm not sure of the best way to do this.
2. Account for odd-numbered values of n. Odd-numbered values of n do not necessarily produce an obvious choice of matrix (unless they have an integer square root, like n==9). I thought about simply adding 1 to n, then handling it as an even number and allowing for one blank cell, but I'm not sure if this is the best approach. I imagine it might be possible to obtain a more square matrix (by the definition in #1) by adding more than 1 to n.
3. Allow the function to trade off squareness (as described in #1) and the number of blank cells (as described in #2), so the function should have some kind of parameter(s) to address this trade-off. For example, for n==11, a 3x4 matrix is pretty square but not as square as a 4x4, but the 4x4 would have many more blank cells than the 3x4.
4. The function needs to optionally produce wider or taller matrices, so that n==12 can produce either a 3x4 or a 4x3 matrix. But this would be easy to handle with a t() of the resulting matrix.
Here's some intended output:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Here's basically a really terrible start to this problem.
makemat <- function(n) {
n <- abs(as.integer(n))
d <- seq_len(n)
out <- d[n %% d == 0]
if(length(out)<2)
stop('n has fewer than two factors')
dim1a <- out[length(out)-1]
m <- matrix(1:n, ncol=dim1a)
m
}
As you'll see, I haven't really been able to account for odd-numbered values of n as described in #2 (look at the output of makemat(7) or makemat(11)), or to enforce the "squareness" rule described in #1, or the trade-off between them as described in #3.
I think the logic you want is already in the utility function n2mfrow() from grDevices. As its name suggests, it creates input for the mfrow graphical parameter: given an integer number of plots, it returns how many rows and columns of panels to split the display into:
> n2mfrow(11)
[1] 4 3
It favours tall layouts over wide ones, but that is easily fixed via rev() on the output or t() on a matrix produced from the results of n2mfrow().
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
m <- matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
m
}
Notice I have to special-case n = 3: n2mfrow(3) returns c(3, 1), because we are abusing a function intended for another use, and for plot panels a 3x1 layout makes more sense than a 2x2 with an empty space.
In use we have:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> makemat(11, wide = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Edit:
The original function padded seq_len(n) with NA, but I realised the OP wanted to have a sequence from 1 to prod(nrows, ncols), which is what the version above does. The one below pads with NA.
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
s <- rep(NA, prod(dims))
ind <- seq_len(n)
s[ind] <- ind
m <- matrix(s, nrow = dims[1], ncol = dims[2])
m
}
I think this function implicitly satisfies your constraints. The parameter can range from 0 to Inf. The function always returns either a square matrix with sides of ceiling(sqrt(n)), or a (maybe) rectangular matrix with rows floor(sqrt(n)) and just enough columns to "fill it out". The parameter trades off the selection between the two: if it is less than 1, then the second, more rectangular matrices are preferred, and if greater than 1, the first, always square matrices are preferred. A param of 1 weights them equally.
makemat<-function(n,param=1,wide=TRUE){
if (n<1) stop('n must be positive')
s<-sqrt(n)
bottom<-n-(floor(s)^2)
top<-(ceiling(s)^2)-n
if((bottom*param)<top) {
rows<-floor(s)
cols<-rows + ceiling(bottom / rows)
} else {
cols<-rows<-ceiling(s)
}
if(!wide) {
hold<-rows
rows<-cols
cols<-hold
}
m<-seq.int(rows*cols)
dim(m)<-c(rows,cols)
m
}
Here is an example with the parameter at its default, which weighs the two distances equally:
lapply(c(2,3,9,11),makemat)
# [[1]]
# [,1] [,2]
# [1,] 1 2
#
# [[2]]
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
#
# [[4]]
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
Here is an example of using the param with 11, to get a 4x4 matrix.
makemat(11,3)
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
What about something fairly simple and you can handle the exceptions and other requests in a wrapper?
# odd() came from taRifx; base R's n %% 2 == 1 does the same check
neven <- 8
nodd <- 11
nsquareodd <- 9
nsquareeven <- 16
makemat <- function(n) {
s <- seq(n)
if( n %% 2 == 1 ) {
s[ length(s)+1 ] <- NA
n <- n+1
}
sq <- sqrt( n )
dimx <- ceiling( sq )
dimy <- floor( sq )
if( dimx*dimy < length(s) ) dimy <- ceiling( sq )
l <- dimx*dimy
ldiff <- l - length(s)
stopifnot( ldiff >= 0 )
if( ldiff > 0 ) s[ seq( length(s) + 1, length(s) + ldiff ) ] <- NA
matrix( s, nrow = dimx, ncol = dimy )
}
> makemat(neven)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
> makemat(nodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 NA
> makemat(nsquareodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 NA
[3,] 3 7 NA
[4,] 4 8 NA
> makemat(nsquareeven)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
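The penalty idea from the question can also be coded directly as a brute-force search. make_square_ish() below is a hypothetical helper, not from any package: it tries rows x cols shapes with up to ceiling(sqrt(n)) rows, scores each as |rows - cols| + blank_weight * (number of blank cells), and keeps the cheapest. blank_weight plays the role of the trade-off parameter from #3, and wide covers #4.

```r
make_square_ish <- function(n, blank_weight = 1, wide = TRUE) {
  stopifnot(length(n) == 1, n >= 1)
  side <- ceiling(sqrt(n))
  best <- NULL
  best_score <- Inf
  for (rows in seq_len(side)) {
    min_cols <- ceiling(n / rows)          # fewest columns that hold n cells
    for (cols in seq(min_cols, max(min_cols, side))) {
      score <- abs(rows - cols) + blank_weight * (rows * cols - n)
      if (score < best_score) {            # ties keep the earlier shape
        best_score <- score
        best <- c(rows, cols)
      }
    }
  }
  dims <- sort(best)                       # rows <= cols, i.e. wide
  if (!wide) dims <- rev(dims)
  matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
}
```

For n = 11 the default weight picks 3 x 4, while blank_weight = 0.1 accepts five blank cells in exchange for a 4 x 4.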

Blockwise sum of matrix elements

I want to go from something like this:
1> a = matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
1> a
[,1] [,2] [,3] [,4]
[1,] 1 2 4 1
[2,] 4 5 4 6
[3,] 2 2 3 7
[4,] 5 1 2 4
To something like this:
[,1] [,2]
[1,] 12 15
[2,] 10 16
...without using for-loops, plyr, or otherwise without looping. Possible? I'm trying to shrink a geographic lat/long dataset from 5 arc-minutes to half-degree, and I've got an ascii grid. A little function where I specify blocksize would be great. I've got hundreds of such files, so things that allow me to do it quickly without parallelization/supercomputers would be much appreciated.
You can use matrix multiplication for this.
# Computation matrix:
mat <- function(n, r) {
suppressWarnings(matrix(c(rep(1, r), rep(0, n)), n, n/r))
}
Square-matrix example, uses a matrix and its transpose on each side of a:
# Reduce a 4x4 matrix by a factor of 2:
x <- mat(4, 2)
x
## [,1] [,2]
## [1,] 1 0
## [2,] 1 0
## [3,] 0 1
## [4,] 0 1
t(x) %*% a %*% x
## [,1] [,2]
## [1,] 12 15
## [2,] 10 16
Non-square example:
b <- matrix(1:24, 4 ,6)
t(mat(4, 2)) %*% b %*% mat(6, 2)
## [,1] [,2] [,3]
## [1,] 14 46 78
## [2,] 22 54 86
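The same 0/1 block-indicator matrix can be built without relying on recycling (and without suppressWarnings()) via kronecker(). block_indicator() and block_sum() are hypothetical names for this sketch; for the 5 arc-minute to half-degree case the block size would be r = 6.

```r
# One column per block; kronecker() stretches each 1 of the identity into r stacked 1s
block_indicator <- function(n, r) {
  stopifnot(n %% r == 0)
  kronecker(diag(n / r), matrix(1, nrow = r, ncol = 1))
}

# Sum every r x r block: indicator' %*% a %*% indicator
block_sum <- function(a, r) {
  t(block_indicator(nrow(a), r)) %*% a %*% block_indicator(ncol(a), r)
}

a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4), 4)
b <- matrix(1:24, 4, 6)
```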
tapply(a, list((row(a) + 1L) %/% 2L, (col(a) + 1L) %/% 2L), sum)
# 1 2
# 1 12 15
# 2 10 16
I used 1L and 2L instead of 1 and 2 so indices remain integers (as opposed to numerics) and it should run faster that way.
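The same grouping idea generalizes to any block size k: (row(a) - 1) %/% k labels each cell with its row block, and likewise for columns. block_sum_tapply() is a hypothetical wrapper; unname() just drops the "0", "1", ... labels that tapply() attaches.

```r
block_sum_tapply <- function(a, k) {
  stopifnot(nrow(a) %% k == 0, ncol(a) %% k == 0)
  # Cells in the same k x k block share a (row-block, col-block) label
  unname(tapply(a, list((row(a) - 1L) %/% k, (col(a) - 1L) %/% k), sum))
}

a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4), 4)
b <- matrix(1:24, 4, 6)
```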
I guess this might help you, but it still uses sapply, which can be considered a loop-like tool.
a <- matrix(c(1,4,2,5,2,5,2,1,4,4,3,2,1,6,7,4),4)
block.step <- 2
# Outer sapply runs over column blocks, inner over row blocks, so each
# inner result becomes one column of res (no transpose needed)
res <- sapply(seq(1, ncol(a), by=block.step), function(y)
  sapply(seq(1, nrow(a), by=block.step), function(x)
    sum(a[x:(x+block.step-1), y:(y+block.step-1)])
  )
)
res
Is it of any help?
