I'm trying to add two matrices in R, and I'd like the addition to treat any NAs as 0s. I know I could always do something like this:
ifelse(is.na(A), 0, A) + ifelse(is.na(B), 0, B)
but it seems like there should be a more elegant way of doing this. For example, is there some way of supplying the na.rm argument to the + function?
Assuming that "A" and "B" have the same dimensions,
`dim<-`(colSums(rbind(c(A), c(B)), na.rm=TRUE), dim(A))
# [,1] [,2] [,3] [,4]
#[1,] 4 7 6 6
#[2,] 5 7 2 4
#[3,] 8 9 6 1
#[4,] 4 2 5 5
Or, instead of ifelse, we could use replace, which is a bit faster:
replace(A, is.na(A), 0) + replace(B, is.na(B), 0)
# [,1] [,2] [,3] [,4]
#[1,] 4 7 6 6
#[2,] 5 7 2 4
#[3,] 8 9 6 1
#[4,] 4 2 5 5
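If you don't need to keep the NAs in the inputs, another option is to zero them with logical subscripting and then use plain +. This is a minimal sketch; A0 and B0 are small made-up matrices, and copies are modified so the originals keep their NAs.

```r
# Small made-up matrices with some missing values
A0 <- matrix(c(1, NA, 3, 4), 2)
B0 <- matrix(c(NA, 2, NA, 5), 2)

# Work on copies so A0 and B0 keep their NAs
A1 <- A0; A1[is.na(A1)] <- 0
B1 <- B0; B1[is.na(B1)] <- 0

# Ordinary elementwise addition now treats the former NAs as 0
total <- A1 + B1
total
```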
Or, if there are multiple datasets, we can place them in a list and combine them with Reduce:
Reduce(`+`, lapply(list(A,B), function(x) replace(x, is.na(x), 0)))
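The Reduce approach can be wrapped in a small function that accepts any number of matrices, which is about as close as base R gets to an na.rm argument for +. This is a sketch: add_na_as_zero is a hypothetical name, and M1, M2, M3 are made-up inputs.

```r
# Hypothetical helper: add any number of same-shaped matrices, counting NA as 0
add_na_as_zero <- function(...) {
  mats <- list(...)
  # All inputs must have the same dimensions as the first one
  same_dim <- vapply(mats, function(m) identical(dim(m), dim(mats[[1]])), logical(1))
  stopifnot(all(same_dim))
  Reduce(`+`, lapply(mats, function(x) replace(x, is.na(x), 0)))
}

M1 <- matrix(c(1, NA, 5, 9, 3, NA), 2)
M2 <- matrix(c(NA, 10, 3, NA, 21, 3), 2)
M3 <- matrix(c(NA, NA, 1, 1, NA, 2), 2)
add_na_as_zero(M1, M2, M3)
```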
Another compact option would be to use NAer from qdap
library(qdap)
NAer(A)+NAer(B)
For multiple datasets
Reduce(`+`, lapply(list(A,B), NAer))
data
set.seed(324)
A <- matrix(sample(c(NA,1:5), 4*4, replace=TRUE), ncol=4)
set.seed(59)
B <- matrix(sample(c(NA,1:5), 4*4, replace=TRUE), ncol=4)
You can try recode from the car package
A <- matrix(c(1,NA,5,9,3,NA), 2)
B <- matrix(c(NA,10,3,NA,21,3), 2)
library(car)
Reduce("+", lapply(list(A, B), recode, "NA=0"))
# [,1] [,2] [,3]
# [1,] 1 8 24
# [2,] 10 9 3
I am trying to make combinations of 6 numbers using three pairs from four pairs (1,2), (3,4), (5,6), (7,8) in R
d<-c(1,2,3,4,5,6,7,8)
dc1<-cbind(d[1:2],d[3:4],d[5:6])
dim(dc1)<-c(1,6)
dc2<-cbind(d[1:2],d[3:4],d[7:8])
dim(dc2)<-c(1,6)
dc3<-cbind(d[1:2],d[5:6],d[7:8])
dim(dc3)<-c(1,6)
dc4<-cbind(d[3:4],d[5:6],d[7:8])
dim(dc4)<-c(1,6)
rbind(dc1,dc2,dc3,dc4)
Is it possible to use combn to obtain the following?
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 1 2 3 4 7 8
[3,] 1 2 5 6 7 8
[4,] 3 4 5 6 7 8
I have tried
d<-structure(list(d1=c(1,2),d2=c(3,4),d3=c(5,6),d4=c(7,8)),.Names = c("d1", "d2", "d3", "d4"), row.names = 1:2, class = "data.frame")
dc <- combn(d, 3, simplify=FALSE)
for(i in 1:length(dc)){
dim(dc[i])<-c(1,6)
}
but it is not working. I would appreciate your help. Thanks.
We can create a grouping variable, split the vector by it, and then apply combn to the resulting list:
grp <- as.integer(gl(length(d), 2, length(d)))
out <- do.call(rbind, combn(split(d, grp), 3, simplify = FALSE, FUN = unlist))
dimnames(out) <- NULL
out
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 2 3 4 5 6
#[2,] 1 2 3 4 7 8
#[3,] 1 2 5 6 7 8
#[4,] 3 4 5 6 7 8
NOTE: Here, the starting object is the plain vector d created below, not the pre-processed data frame 'd' from the question. If the data have already been separated into columns, it is much easier, as @markus mentioned:
t(combn(d, 3, FUN =unlist))
data
d <- 1:8
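The same split-then-combn idea generalizes to groups of any size and any number of groups per combination. This is a sketch under the assumption that the groups are consecutive runs of equal length; combine_groups is a hypothetical name.

```r
# Hypothetical helper: split v into consecutive groups of `group_size`
# elements, then return one row per combination of k groups
combine_groups <- function(v, group_size, k) {
  stopifnot(length(v) %% group_size == 0)
  grp <- rep(seq_len(length(v) / group_size), each = group_size)
  groups <- split(v, grp)
  # combn over the group indices; each column is one concatenated combination
  out <- combn(seq_along(groups), k,
               FUN = function(idx) unlist(groups[idx], use.names = FALSE))
  t(out)
}

combine_groups(1:8, 2, 3)   # the four rows from the question
combine_groups(1:9, 3, 2)   # triples instead of pairs
```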
Here's another way, using the combinations function from the gtools package:
Create a list of your pairs:
pair.list <- list(c(1, 2), c(3, 4), c(5, 6), c(7, 8))
Then create the 4-choose-3 combination matrix (one row per combination):
library(gtools)
combos <- combinations(4, 3)
Then use the purrr map function to generate the list of output vectors:
library(purrr)
vec.list <- map(seq_len(nrow(combos)), function(x) unlist(pair.list[combos[x, ]]))
Finally convert the list of vectors to a data.frame:
df <- data.frame(Reduce(rbind, vec.list))
The benefit of this strategy is that your tuples can be of any length and have any values.
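The same strategy also works in base R with combn alone. A sketch, assuming tuples of different lengths (made-up values): the concatenated results can then differ in length, so they are kept as a list rather than forced into a data.frame.

```r
# Tuples of different lengths (made-up example)
tuple.list <- list(c(1, 2), c(3, 4, 5), 6, c(7, 8))

# combn over the tuple indices; simplify = FALSE keeps a list because
# the concatenated vectors can have different lengths
combo.list <- combn(seq_along(tuple.list), 3,
                    FUN = function(idx) unlist(tuple.list[idx]),
                    simplify = FALSE)
combo.list
lengths(combo.list)
```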
Another possibility, starting with the vector 'd':
i <- (seq_along(d) + 1) %/% 2
t(combn(unique(i), 3, function(cb) d[i %in% cb]))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 2 3 4 5 6
# [2,] 1 2 3 4 7 8
# [3,] 1 2 5 6 7 8
# [4,] 3 4 5 6 7 8
I am struggling to find a way to use apply to run a function on only a "chunk" of each row.
For instance, I have a matrix:
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 5 3 10 1
[2,] 12 2 7 11
[3,] 4 8 9 6
And I would like to end up with a new matrix made up of the sum of the first two values and the sum of the last two values in each row, like so:
[,1] [,2]
[1,] 8 11
[2,] 14 18
[3,] 12 15
I have tried something like this:
chunks<-c("1:2","3:4")
sumchunks<-function(x,chunks){
apply(x,1,
function(row){
for (i in chunks){
v<-sum(row[chunks[i]])
}})
}
But it doesn't work at all. Any suggestions for an approach that works?
Thank you.
You can do:
chunks <- list(1:2, 3:4)
sumchunks <- function(x, chunks) sapply(chunks, function(ch) sum(x[ch]))
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
apply(x, 1, sumchunks, chunks=chunks)
# [,1] [,2] [,3]
# [1,] 8 14 12
# [2,] 11 18 15
You may want to transpose the result with t() to get one row per row of x.
Here is a vectorized variant:
chunks <- list(1:2, 3:4)
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
sapply(chunks, function(ch) rowSums(x[,ch]))
# [,1] [,2]
# [1,] 8 11
# [2,] 14 18
# [3,] 12 15
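One caveat with rowSums(x[,ch]): if a chunk has only one column, x[,ch] drops to a plain vector and rowSums fails. A sketch that guards against this with drop = FALSE and uses vapply to fix the output shape; sum_row_chunks is a hypothetical name.

```r
# Hypothetical helper: sum arbitrary column chunks of each row.
# drop = FALSE keeps x[, ch] a matrix even for a one-column chunk,
# so rowSums() always receives a matrix.
sum_row_chunks <- function(x, chunks) {
  vapply(chunks, function(ch) rowSums(x[, ch, drop = FALSE]), numeric(nrow(x)))
}

x <- matrix(c(5, 12, 4, 3, 2, 8, 10, 7, 9, 1, 11, 6), nrow = 3)
sum_row_chunks(x, list(1:2, 3:4))
sum_row_chunks(x, list(1, 2:4))  # uneven chunks, one of width 1
```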
We can convert x to a 3 x 2 x 2 array (rows x columns within a chunk x chunks) and then do:
t(apply(array(x, c(3, 2, 2)), 1, colSums))
Or
sapply(seq(1, ncol(x), 2), function(i) rowSums(x[,i:(i+1)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
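Another base R option is rowsum, which sums rows by a grouping vector; transposing lets it sum columns by chunk instead. A sketch, assuming each column gets a chunk label in grp (an illustrative name). The result carries the group labels as column names, which unname removes.

```r
x <- matrix(c(5, 12, 4, 3, 2, 8, 10, 7, 9, 1, 11, 6), nrow = 3)

# Chunk label for each column: columns 1-2 -> chunk 1, columns 3-4 -> chunk 2
grp <- c(1, 1, 2, 2)

# rowsum() sums *rows* by group, so transpose, sum, and transpose back
res <- t(rowsum(t(x), grp))
unname(res)
```

Unlike the index-based versions, the chunks here do not have to be contiguous; any labelling of the columns works.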
Like this?
x <- matrix(sample(1:12),nrow=3)
f = function(s) {
c(sum(s[1:2]), sum(s[3:4]))
}
t(apply(x, 1, f))
rowSums is built to sum over rows, so it should be quite fast. You can restrict the columns to sum over and then cbind the results to get what you want:
cbind(rowSums(x[,c(1,2)]), rowSums(x[,c(3,4)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
Consider the following 3-dimensional array:
set.seed(123)
arr = array(sample(c(1:10)), dim=c(3,4,2))
which yields
> arr
, , 1
[,1] [,2] [,3] [,4]
[1,] 10 9 8 2
[2,] 5 1 4 10
[3,] 6 7 3 5
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 7 3 5
[2,] 9 8 2 6
[3,] 1 4 10 9
I'd like to subset it like
arr[c(1,2), c(2,4), c(1)]
but the catch is that I don't know in advance (a) which indices or (b) which dimensions the indices refer to.
What is the best way to access an N-dimensional array with index variables?
ll = list(c(1,2), c(2,4), c(1))
arr[ll] # doesn't work
arr[expand.grid(ll)] # doesn't work
# ..what else?
Use do.call, for example:
do.call(`[`, c(list(arr), ll))
or, more cleanly, using a wrapper function:
getArr <- function(...)
`[`(arr, ...)
do.call(getArr, ll)
[,1] [,2]
[1,] 9 2
[2,] 1 10
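To cover part (b), where only some dimensions are indexed, one approach is to start from "all positions" on every dimension and overwrite just the requested ones before calling do.call. A sketch; subset_dims is a hypothetical name, and a24 is a deterministic stand-in for arr. drop = FALSE keeps the result's rank predictable.

```r
# Hypothetical helper: index array `a` only along the dimensions in `dims`,
# taking every position along the remaining dimensions
subset_dims <- function(a, idx, dims) {
  full <- lapply(dim(a), seq_len)   # default: every index on every dimension
  full[dims] <- idx                 # overwrite only the requested dimensions
  do.call(`[`, c(list(a), full, list(drop = FALSE)))
}

a24 <- array(1:24, dim = c(3, 4, 2))
s <- subset_dims(a24, list(c(2, 4), 1), dims = c(2, 3))
dim(s)
```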
There is the asub function from the abind package:
library(abind)
asub(arr, ll)
which can also do a lot more, in particular extract along a subset of the dimensions (https://stackoverflow.com/a/17752012/1201032). Worth having in your toolbox.
So I want to apply a function over a matrix in R. This works really intuitively for simple functions:
> (function(x)x*x)(matrix(1:10, nrow=2))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 25 49 81
[2,] 4 16 36 64 100
...but clearly I don't understand all of its workings:
> m = (matrix(1:10, nrow=2))
> (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })(m)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 4 6 8 10
[2,] 3 5 7 9 11
Warning message:
In if (x == 3) { :
the condition has length > 1 and only the first element will be used
I read up on this and found out about Vectorize and sapply, which both seemed to be just what I wanted, except that both of them turn my matrix into a plain vector (the dimensions are dropped):
> y = (function(x) if (x %% 3 == 0) { return(NA) } else { return(x+1) })
> sapply(m, y)
[1] 2 3 NA 5 6 NA 8 9 NA 11
> Vectorize(y)(m)
[1] 2 3 NA 5 6 NA 8 9 NA 11
...whereas I'd like to keep it in a matrix with its current dimensions. How might I do this? Thanks!
@Joshua Ulrich (and Dason) have great answers, and doing it directly without the function y is the best solution. But if you really need to call a function, you can make it faster using vapply. Like sapply, it returns a plain vector without dimensions (only faster), and you can then add the dimensions back using structure:
# Your function (optimized)
y = function(x) if (x %% 3) x+1 else NA
m <- matrix(1:1e6,1e3)
system.time( r1 <- apply(m,1:2,y) ) # 4.89 secs
system.time( r2 <- structure(sapply(m, y), dim=dim(m)) ) # 2.89 secs
system.time( r3 <- structure(vapply(m, y, numeric(1)), dim=dim(m)) ) # 1.66 secs
identical(r1, r2) # TRUE
identical(r1, r3) # TRUE
...As you can see, the vapply approach is about 3x faster than apply... And the reason vapply is faster than sapply is that sapply must analyse the result to figure out that it can be simplified to a numeric vector. With vapply, you specified the result type (numeric(1)), so it doesn't have to guess...
UPDATE: I figured out another (shorter) way of preserving the matrix structure:
m <- matrix(1:10, nrow=2)
m[] <- vapply(m, y, numeric(1))
You simply assign the new values to the object using m[] <-. Then all other attributes are preserved (like dim, dimnames, class etc).
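A quick check that m[] <- keeps dimnames as well as dim. This is a sketch with a small made-up matrix whose row and column names are arbitrary; y is the same function as above. Note that vapply accepts y's logical NA because logical values may be promoted to numeric(1).

```r
y <- function(x) if (x %% 3) x + 1 else NA

m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("p", "q", "r")))
m[] <- vapply(m, y, numeric(1))   # values replaced; dim and dimnames kept
m
```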
One way is to use apply on both rows and columns:
apply(m,1:2,y)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
You can also do it with subscripting because == is already vectorized:
m[m %% 3 == 0] <- NA
m <- m+1
m
[,1] [,2] [,3] [,4] [,5]
[1,] 2 NA 6 8 NA
[2,] 3 5 NA 9 11
For this specific example you can just do something like this:
> # Create some fake data
> mat <- matrix(1:16, 4, 4)
> # Set all elements divisible by 3 to NA
> mat[mat %% 3 == 0] <- NA
> # Add 1 to all non NA elements
> mat <- mat + 1
> mat
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17
Here is a slight refinement of Dason's and Josh's solutions, using ifelse:
mat <- matrix(1:16, 4, 4)
ifelse(mat %% 3 == 0, NA, mat + 1)
[,1] [,2] [,3] [,4]
[1,] 2 6 NA 14
[2,] 3 NA 11 15
[3,] NA 8 12 NA
[4,] 5 9 NA 17
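The same one-liner can also be written with replace, which avoids ifelse and leaves mat unchanged because it returns a modified copy. A minimal sketch; res is an illustrative name.

```r
mat <- matrix(1:16, 4, 4)

# Add 1 everywhere, then overwrite the positions divisible by 3 with NA
res <- replace(mat + 1, mat %% 3 == 0, NA)
res
```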
I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.
Maybe like this:
t(sapply(1:3, function(x) X[, , indx][x, , x]))
I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))
Matrix indexing to the rescue! No applys needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx),n),
rep(seq.int(n), each=length(indx)),
rep(indx,n))
The result is:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 1 3
[3,] 3 1 1
[4,] 1 2 2
[5,] 2 2 3
[6,] 3 2 1
Then use it as the index, converting back to a matrix so the result looks like your output:
> matrix(X[foo],ncol=n)
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
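The matrix-indexing answer can be wrapped in a reusable function that works for any number of rows and columns. A sketch; pick_rows is a hypothetical name, and the check compares it against the rbind output from the question.

```r
# Hypothetical helper: row i of the result is row i of X[, , indx[i]]
pick_rows <- function(X, indx) {
  stopifnot(length(indx) == dim(X)[1])
  n_row <- dim(X)[1]
  n_col <- dim(X)[2]
  # One (row, column, slice) triple per output cell, in column-major order
  idx <- cbind(rep(seq_len(n_row), times = n_col),
               rep(seq_len(n_col), each = n_row),
               rep(indx, times = n_col))
  matrix(X[idx], nrow = n_row, ncol = n_col)
}

X <- array(1:18, dim = c(3, 2, 3))
pick_rows(X, c(2, 3, 1))
```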