I have a matrix with 5 columns and 4 rows. I also have a vector of length 3. I want to subtract the values in the vector from columns 3, 4 and 5 respectively, in every row of the matrix.
b <- matrix(rep(1:20), nrow=4, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
c <- c(5,6,7)
and I want to get:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 4 7 10
[2,] 2 6 5 8 11
[3,] 3 7 6 9 12
[4,] 4 8 7 10 13
This is exactly what sweep was made for:
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- sweep(b[,3:5], 2, x)
b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 5 4 7 10
#[2,] 2 6 5 8 11
#[3,] 3 7 6 9 12
#[4,] 4 8 7 10 13
Or even without subsetting or reassignment (applied to the original b):
sweep(b, 2, c(0,0,x))
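As a quick sanity check (a sketch on the example data; b0, b1 and b2 are illustrative names), padding x with zeros for columns 1 and 2 gives the same matrix as subtracting from the subset:

```r
# Example data from the question; the vector is called x to avoid masking base::c
b0 <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

# Subtract x from columns 3:5 only
b1 <- b0
b1[, 3:5] <- sweep(b0[, 3:5], 2, x)

# Sweep over all five columns, subtracting 0 from columns 1 and 2
b2 <- sweep(b0, 2, c(0, 0, x))

print(all.equal(b1, b2))
print(b2)
```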
Perhaps not that elegant, but
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5,6,7)
b[,3:5] <- t(t(b[,3:5])-x)
should do the trick. We subset the matrix so that only the part we need is changed, and we use t() (transpose) to flip it: R recycles x down each column, so after transposing, x[1], x[2] and x[3] line up with the rows that were originally columns 3, 4 and 5. The outer t() flips the result back.
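A small sketch on the example data contrasting the subtraction with and without the transpose (sub, tsub, wrong and right are illustrative names):

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

sub <- b[, 3:5]     # the 4 x 3 block to change
tsub <- t(sub)      # 3 x 4: original columns 3, 4, 5 are now rows 1, 2, 3

# Without transposing, x is recycled down the rows of sub instead:
# sub[1, 1] - 5, sub[2, 1] - 6, sub[3, 1] - 7, sub[4, 1] - 5, ...
wrong <- sub - x

# After transposing, x[k] meets row k of tsub in every column;
# the outer t() restores the original 4 x 3 shape
right <- t(tsub - x)

b[, 3:5] <- right
print(b)
```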
If you want to avoid the transposes, you could do something like
b[,3:5] <- b[,3:5]-x[col(b[,3:5])]
as well. Here col(b[,3:5]) gives the column number of every element of the subset, so x[col(b[,3:5])] lists the matching value of x for each element, in the same column-major order in which the subset itself is stored.
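A short sketch of what col() contributes here, on the example data (sub and expanded are illustrative names):

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)
sub <- b[, 3:5]

# col() returns a matrix of sub's shape holding each element's column number
print(col(sub))

# Used as an index, that matrix is read in column-major order, so every
# element of sub is paired with the value of x that belongs to its column
expanded <- x[col(sub)]
print(expanded)   # 5 5 5 5 6 6 6 6 7 7 7 7

b[, 3:5] <- sub - expanded
print(b)
```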
I think my favorite from the question that @thelatemail linked was
b[,3:5] <- sweep(b[,3:5], 2, x, `-`)
Another way, with apply (which returns one column per row of input, hence the t()):
b[,3:5] <- t(apply(b[,3:5], 1, function(row) row - x))
A simple solution:
b <- matrix(rep(1:20), nrow=4, ncol=5)
c <- c(5,6,7)
for(i in 1:nrow(b)) {
b[i,3:5] <- b[i,3:5] - c
}
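To check that the answers above agree, the following sketch runs each of them on a fresh copy of the example data. It calls the vector x rather than c, writes the apply() helper with an argument named row, and uses seq_len() in the loop; the via_* names are only for illustration.

```r
b <- matrix(1:20, nrow = 4, ncol = 5)
x <- c(5, 6, 7)

via_sweep <- b
via_sweep[, 3:5] <- sweep(b[, 3:5], 2, x)

via_t <- b
via_t[, 3:5] <- t(t(b[, 3:5]) - x)

via_col <- b
via_col[, 3:5] <- b[, 3:5] - x[col(b[, 3:5])]

via_apply <- b
via_apply[, 3:5] <- t(apply(b[, 3:5], 1, function(row) row - x))

via_loop <- b
for (i in seq_len(nrow(b))) {
  via_loop[i, 3:5] <- b[i, 3:5] - x
}

# Compare every variant against the sweep() result
others <- list(t = via_t, col = via_col, apply = via_apply, loop = via_loop)
agree <- vapply(others, function(m) isTRUE(all.equal(m, via_sweep)), logical(1))
print(agree)
```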
This can be done with the rray package in a very satisfying way, using its NumPy-like broadcasting subtraction operator %b-%:
#install.packages("rray")
library(rray)
b <- matrix(rep(1:20), nrow=4, ncol=5)
x <- c(5, 6, 7)
b[, 3:5] <- b[, 3:5] %b-% matrix(x, 1)
b
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 5 4 7 10
#> [2,] 2 6 5 8 11
#> [3,] 3 7 6 9 12
#> [4,] 4 8 7 10 13
For large matrices this is even faster than sweep:
#install.packages("bench")
res <- bench::press(
size = c(10, 1000, 10000),
frac_selected = c(0.1, 0.5, 1),
{
B <- matrix(sample(size*size), nrow=size, ncol=size)
B2 <- B
x <- sample(size, size=ceiling(size*frac_selected))
idx <- sample(size, size=ceiling(size*frac_selected))
bench::mark(rray = {B2[, idx] <- B[, idx, drop = FALSE] %b-% matrix(x, nrow = 1); B2},
sweep = {B2[, idx] <- sweep(B[, idx, drop = FALSE], MARGIN = 2, x); B2}
)
}
)
plot(res)
I want to set NA in every element of a matrix where the value in a column is greater than or equal to the corresponding element of a given vector (so zz[i, j] is compared with xx[i]). For example, I can create a matrix:
set.seed(1)
zz <- matrix(data = round(10L * runif(12)), nrow = 4, ncol = 3)
which gives for zz:
[,1] [,2] [,3]
[1,] 8 5 7
[2,] 6 5 1
[3,] 5 10 3
[4,] 9 1 9
and for the comparison vector (for example):
xx <- round(10L * runif(4))
where xx is:
[1] 6 3 8 2
if I perform this operation:
apply(zz,2,function(x) x >= xx)
I get:
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
What I want is an NA wherever there is a TRUE, and the corresponding number from the zz matrix wherever there is a FALSE (worked out manually here):
NA 5 NA
NA NA 1
5 NA 3
NA 1 NA
I can cobble together some for loops to do what I want, but is there a vectorized way to do this?
Thanks for any tips.
You could simply do:
zz[zz>=xx] <- NA
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
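This works because xx has one value per row of zz: R recycles xx down every column, so zz[i, j] is compared with xx[i]. A sketch that makes the recycling explicit, using the data given at the end of this thread (xx_mat and res are illustrative names):

```r
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)

# Recycling repeats xx down each column, i.e. it compares against this matrix
xx_mat <- matrix(xx, nrow = nrow(zz), ncol = ncol(zz))
print(identical(zz >= xx, zz >= xx_mat))

# Work on a copy so zz itself is left unchanged
res <- zz
res[res >= xx] <- NA
print(res)
```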
Here is one option to get the expected output. zz >= xx gives a logical matrix; raising NA to that power gives NA for the TRUE values (NA^1) and 1 for the FALSE values (NA^0 is 1 in R). Multiplying by the original matrix zz then leaves the NAs in place and turns each 1 into the corresponding value of zz.
NA^(zz >= xx)*zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Another option is ifelse():
ifelse(zz >= xx, NA, zz)
Data:
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)
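With the data above, a quick sketch confirming that the three answers produce the same matrix (by_index, by_power and by_ifelse are illustrative names):

```r
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)

# 1. Subassignment on a copy
by_index <- zz
by_index[by_index >= xx] <- NA

# 2. NA^TRUE is NA and NA^FALSE is 1, then multiply by zz
by_power <- NA^(zz >= xx) * zz

# 3. ifelse() keeps the dimensions of its logical test
by_ifelse <- ifelse(zz >= xx, NA, zz)

# all.equal() compares NA positions via is.na(), so it does not depend on
# exactly which kind of NA each method produces
print(all.equal(by_index, by_power))
print(all.equal(by_index, by_ifelse))
```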
I have a matrix (m.idx) containing the positions of elements of a vector I want to index.
> m.idx
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 3 4 5 6 7
[3,] 5 6 7 8 9
Suppose x is my vector.
x <- c(9,3,2,5,3,2,4,8,9)
I want to fill the index matrix with the elements of x at the corresponding positions, so that I would have:
> m.pop
[,1] [,2] [,3] [,4] [,5]
[1,] 9 3 2 5 3
[2,] 2 5 3 2 4
[3,] 3 2 4 8 9
I can kind of do it in a kludgy way with the following.
> m.pop <- t(matrix(t(matrix(x[c(t(m.idx))])),ncol(m.idx),nrow(m.idx)))
> m.pop
[,1] [,2] [,3] [,4] [,5]
[1,] 9 3 2 5 3
[2,] 2 5 3 2 4
[3,] 3 2 4 8 9
But it seems like there may be an easier method to index the values.
What is the best (and fastest/most efficient for large sets) way to do this?
How about:
m.idx[] <- x[m.idx]
m.idx
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9 3 2 5 3
# [2,] 2 5 3 2 4
# [3,] 3 2 4 8 9
Or if you don't want to overwrite the m.idx matrix, you can do this instead:
m.pop <- m.idx
m.pop[] <- x[m.pop]
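A small sketch of why the empty brackets matter (vals and plain are illustrative names): subassignment with [] keeps the dimensions of the target, while a plain assignment rebinds the name to the dimensionless vector that x[m.idx] returns.

```r
m.idx <- matrix(c(1, 2, 3, 4, 5,
                  3, 4, 5, 6, 7,
                  5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
x <- c(9, 3, 2, 5, 3, 2, 4, 8, 9)

# x has no dim, so indexing it with a matrix returns a plain vector
vals <- x[m.idx]
print(dim(vals))      # NULL

# Copy first so m.idx is not overwritten; [] keeps the 3 x 5 shape
m.pop <- m.idx
m.pop[] <- vals
print(m.pop)

# Without [], the name is simply rebound to the plain vector
plain <- m.idx
plain <- x[plain]
```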
Added:
One other method, using structure, is also quite fast:
structure(x[m.idx], .Dim = dim(m.idx))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9 3 2 5 3
# [2,] 2 5 3 2 4
# [3,] 3 2 4 8 9
When applied to the large m.idx matrix in Ananda Mahto's answer, the timings on my machine are:
fun5 <- function() structure(x[m.idx], .Dim = dim(m.idx))
microbenchmark(fun1(), fun2(), fun3(), fun4(), fun5(), times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 303.3473 307.2064 309.2275 352.5076 353.6911 10
# fun2() 548.0928 555.3363 587.6144 593.4492 596.5611 10
# fun3() 480.6181 487.5807 507.5960 529.9696 533.0403 10
# fun4() 1222.6718 1231.3384 1259.8395 1269.6629 1292.2309 10
# fun5() 401.8450 403.7216 432.7162 455.4638 487.1755 10
identical(fun1(), fun5())
# [1] TRUE
You can see that structure is actually not too bad in terms of speed.
Alternatively, build a new matrix from the indexed values:
matrix(x[m.idx], ncol = ncol(m.idx))
[,1] [,2] [,3] [,4] [,5]
[1,] 9 3 2 5 3
[2,] 2 5 3 2 4
[3,] 3 2 4 8 9
Maybe you can just set the dimensions after indexing the vector with the matrix:
`dim<-`(x[m.idx], dim(m.idx))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9 3 2 5 3
# [2,] 2 5 3 2 4
# [3,] 3 2 4 8 9
x[m.idx] gives you the values you're interested in, as a plain vector:
> x[m.idx]
[1] 9 2 3 3 5 2 2 3 4 5 2 8 3 4 9
Since the result should have the same dimensions as the original, you just assign dim(m.idx) to it.
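A sketch, on the small example from the question, checking that the dim<-, structure(), matrix() and []<- versions from this thread return identical matrices (the via_* names are illustrative):

```r
m.idx <- matrix(c(1, 2, 3, 4, 5,
                  3, 4, 5, 6, 7,
                  5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
x <- c(9, 3, 2, 5, 3, 2, 4, 8, 9)

vals <- x[m.idx]   # plain vector, in the column-major order of m.idx

via_dim       <- `dim<-`(vals, dim(m.idx))
via_structure <- structure(vals, .Dim = dim(m.idx))
via_matrix    <- matrix(vals, ncol = ncol(m.idx))
via_fill      <- m.idx
via_fill[]    <- vals

print(identical(via_dim, via_structure) &&
      identical(via_dim, via_matrix) &&
      identical(via_dim, via_fill))
print(via_dim)
```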
For fun, some timings:
fun1 <- function() `dim<-`(x[m.idx], dim(m.idx))
fun2 <- function() { m.idx[] <- x[m.idx]; m.idx }
fun3 <- function() matrix(x[m.idx], ncol = ncol(m.idx))
fun4 <- function() t(matrix(t(matrix(x[c(t(m.idx))])),ncol(m.idx),nrow(m.idx)))
m.idx <- matrix(c(1, 2, 3, 4, 5,
3, 4, 5, 6, 7,
5, 6, 7, 8, 9),
nrow = 3, byrow = TRUE)
x <- c(9, 3, 2, 5, 3, 2, 4, 8, 9)
set.seed(1)
nrow = 10000 ## Adjust nrow and ncol to test different sizes
ncol = 1000
m.idx <- matrix(sample(unique(m.idx), nrow*ncol, TRUE), ncol = ncol)
library(microbenchmark)
microbenchmark(fun1(), fun2(), fun3(), fun4(), times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# fun1() 388.7123 403.3614 419.5792 475.7645 553.3420 10
# fun2() 800.5524 838.2398 872.8189 912.1007 978.1500 10
# fun3() 694.1511 720.5165 737.9900 799.5069 876.2552 10
# fun4() 1941.1999 2022.6578 2095.1537 2175.4864 2341.3900 10
I want to do two things:
1. How do I remove rows that contain NA/NaN/Inf?
2. How do I set the value of an NA/NaN/Inf data point to 0?
So far, I have tried the following for NA values, but I get an error and a warning:
> eg <- data[rowSums(is.na(data)) == 0,]
Error in rowSums(is.na(data)) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In is.na(data) : is.na() applied to non-(list or vector) of type 'closure'
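The error message points at the likely cause: if no object named data has been created, R finds the function utils::data() instead, which has type 'closure'. A sketch of the two row filters with a hypothetical stand-in matrix called dat (substitute the name of your real object); it also shows why is.na() alone does not remove Inf:

```r
# In a fresh session no object called `data` exists, so the name resolves to
# the function utils::data(), which is a "closure" (hence the error message)
print(is.function(data))

# Hypothetical stand-in for the real data
# columns: (1, NA, 3), (4, NaN, 6), (Inf, 8, 9)
dat <- matrix(c(1, NA, 3, 4, NaN, 6, Inf, 8, 9), nrow = 3)

# is.na() catches NA and NaN but not Inf, so row 1 (which holds Inf) survives
no_na <- dat[rowSums(is.na(dat)) == 0, , drop = FALSE]

# !is.finite() also catches Inf and -Inf
all_finite <- dat[rowSums(!is.finite(dat)) == 0, , drop = FALSE]
print(no_na)
print(all_finite)
```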
I guess I'll throw my hat into the ring with my preferred methods:
# sample data
m <- matrix(c(1,2,NA,NaN,1,Inf,-1,1,9,3),5)
# remove all rows with non-finite values
m[!rowSums(!is.finite(m)),]
# replace all non-finite values with 0
m[!is.finite(m)] <- 0
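The reason to use is.finite() rather than is.na() is that is.na() is TRUE for NA and NaN but FALSE for Inf and -Inf, whereas !is.finite() catches all four. A sketch comparing the predicates, then applying both lines from this answer to its sample matrix (v and kept are illustrative names):

```r
v <- c(1, NA, NaN, Inf, -Inf)
print(rbind(is.na = is.na(v), is.nan = is.nan(v), not_finite = !is.finite(v)))

# The sample matrix from this answer
m <- matrix(c(1, 2, NA, NaN, 1, Inf, -1, 1, 9, 3), 5)

# Keep only rows whose count of non-finite values is zero
kept <- m[!rowSums(!is.finite(m)), , drop = FALSE]
print(kept)

# Replace every NA, NaN, Inf and -Inf with 0
m[!is.finite(m)] <- 0
print(m)
```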
Another option for removing the rows, using Compose() from the functional package:
library(functional)
m[apply(m, 1, Compose(is.finite, all)),]
Demonstration:
m <- matrix(c(1,2,3,NA,4,5), 3)
m
## [,1] [,2]
## [1,] 1 NA
## [2,] 2 4
## [3,] 3 5
m[apply(m, 1, Compose(is.finite, all)),]
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
Note: Compose(is.finite, all) is equivalent to function(x) all(is.finite(x))
To set the values to 0, use matrix indexing:
m[!is.finite(m)] <- 0
m
## [,1] [,2]
## [1,] 1 0
## [2,] 2 4
## [3,] 3 5
NaRV.omit() from the IDPmisc package is my preferred option for question 1. The mnemonic NaRV stands for "not a regular value".
library(IDPmisc)
m <- matrix(c(1,2,3,NA,5, NaN, 7, 8, 9, Inf, 11, 12, -Inf, 14, 15), 5)
> m
[,1] [,2] [,3]
[1,] 1 NaN 11
[2,] 2 7 12
[3,] 3 8 -Inf
[4,] NA 9 14
[5,] 5 Inf 15
> NaRV.omit(m)
[,1] [,2] [,3]
[1,] 2 7 12
attr(,"na.action")
[1] 1 3 4 5
attr(,"class")
[1] "omit"
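Just another way (for the first question). Multiplying by 0 turns Inf and -Inf into NaN and every finite value into 0, so complete.cases() then flags every row that contains NA, NaN or Inf: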
m <- structure(c(1, 2, 3, NA, 4, 5, Inf, 5, 6, NaN, 7, 8),
.Dim = c(4L, 3L))
# [,1] [,2] [,3]
# [1,] 1 4 6
# [2,] 2 5 NaN
# [3,] 3 Inf 7
# [4,] NA 5 8
m[complete.cases(m * 0), , drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 4 6
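A quick sketch on the same matrix checking that this agrees with the is.finite() approach from the first answer (keep_cc and keep_fin are illustrative names):

```r
m <- structure(c(1, 2, 3, NA, 4, 5, Inf, 5, 6, NaN, 7, 8), .Dim = c(4L, 3L))

# Finite values become 0; Inf becomes NaN; NA and NaN stay missing
print(m * 0)

keep_cc  <- complete.cases(m * 0)
keep_fin <- rowSums(!is.finite(m)) == 0
print(identical(keep_cc, keep_fin))
```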
I can't think of anything other than Matthew's answer for the second part.