Indexing matrices when some elements of the selector are missing (R)

When some elements of a vector used for row-indexing a matrix or a data.frame are missing (NA) in R, the indexing operation gives results that I find unexpected.
m = matrix(1:15,ncol = 3)
m[1,1] = NA
m[m[,1] < 4 ,]
Gives
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 2 7 12
[3,] 3 8 13
While I would have expected
[,1] [,2] [,3]
[1,] NA 6 11
[2,] 2 7 12
[3,] 3 8 13
One option seems to be
m[m[,1] < 4 | is.na(m[,1]) ,]
But I find this unwieldy. I often lose data by mistake when indexing matrices and data.frames that contain missing values. Is there an easier and safer way to reach the desired result?
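One possible approach, offered as a sketch rather than a definitive answer: the all-NA row appears because an NA in a logical selector returns a row of NAs. The helper na_to() below is a hypothetical name (not a base R function) that fixes how NA in the selector is treated before indexing. For comparison, which() is a base function that drops the NA positions instead of keeping them.

```r
m <- matrix(1:15, ncol = 3)
m[1, 1] <- NA

# Hypothetical helper: replace NA in a logical selector with a chosen value
na_to <- function(cond, value) {
  cond[is.na(cond)] <- value
  cond
}

# Treat NA as TRUE: the row with the missing value is kept with its real data
keep_na <- m[na_to(m[, 1] < 4, TRUE), ]

# which() returns only the positions that are TRUE, so the NA row is dropped
drop_na <- m[which(m[, 1] < 4), ]
```

Making the choice explicit (TRUE or FALSE for NA) is what avoids losing or corrupting rows by accident.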

Related

Column having maximum L2 norm

How can I find the column of a matrix that has the maximum L2 norm? The matrix has NA values in some columns, and we want to ignore those columns.
I am trying the following code, but it throws an error because of the NA values.
#The matrix is T
for(i in 1:ncol(T)){
if(norm(y,type='2') < norm(T[,i],type = '2'))
y = T[,i]
}
I think it would also be useful if we could somehow get the columns of T as a list, since we could use which.max function then, but I could not do that. Is that possible?
Please help
Maybe you can write your own L2 norm and find the column with the maximum, i.e.,
which.max(sqrt(colSums(T**2)))
Example
T <- matrix(c(1:10,NA,12:19,NA),nrow = 4)
> T
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 NA 15 19
[4,] 4 8 12 16 NA
> which.max(sqrt(colSums(T**2)))
[1] 4
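Regarding the question about getting the columns of T as a list, here is a minimal sketch assuming the same example matrix. split() with col() is one base R way to do it; the names cols, norms and best are only illustrative.

```r
T <- matrix(c(1:10, NA, 12:19, NA), nrow = 4)

# Split the matrix into a list with one numeric vector per column
cols <- split(T, col(T))

# Euclidean (L2) norm of each column; columns containing NA yield NA
norms <- sapply(cols, function(v) sqrt(sum(v^2)))

# which.max() skips NA entries, so columns with missing values are ignored
best <- which.max(norms)
```

Note that T is also R's shorthand for TRUE, so a different variable name may be safer in real code.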

Lapply over several parameters, faster method

Suppose I have two vectors
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
and a function
calc <- function(x,y){x + y}
I want to apply this function to every value in a for each value in b. In my case calc only accepts a single value from a and from b as input, so lapply(a,calc,b) doesn't work because length(b) is not 1 (it gives me an error).
Also, mapply doesn't give me the wanted solution either; it only applies the function to paired values, i.e. 1+6, 2+7, etc.
So I built a function that gave me the wanted solution
myfunc <- function(z){lapply(a,calc,z)}
and applied it on b
solution <- lapply(b,myfunc)
We see here that the difference from lapply(a,calc,b) or a nested lapply(a,lapply,calc,b) is that it gives me all the values in their own list. That's what I wanted, or at least it was a function that gave me the right result with no error.
Now, is there a faster or simpler method? I just experimented a little here. With my real function, which is much larger than calc, it takes 10 minutes, but maybe I have to slim down my original function and there is no faster method...
EDIT:
In my function there is something like this,
calc <- function(x,y){
# ...
number <- x
example <- head(number,n=y)
# ...
}
where a vector as an input for y doesn't work anymore. With lapply(a,lapply,calc,b) or lapply(a,calc,b) I get an error,
Error in head.default(number, n = y) : length(n) == 1L is not TRUE
As Florian says, outer() could be an option.
outer(a, b, calc)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
But as MichaelChirico mentions, with a function that isn't vectorized it won't work. In that case something else has to be hacked together. These might or might not be quicker than your current solution.
All combinations (so both calc(1, 6) and calc(6, 1) are performed, similar to outer())
Number of calculations: n^2
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2])
matrix(m1, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] 8 9 10 11 12
# [3,] 9 10 11 12 13
# [4,] 10 11 12 13 14
# [5,] 11 12 13 14 15
Only unique combinations (so assumes your function is symmetric)
Number of calculations: (n^2 - n) / 2
cn <- t(combn(1:length(a), 2))
m2 <- mapply(calc, a[cn[, 1]], b[cn[, 2]])
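mat <- matrix(NA, length(a), length(a))
# index by (row, column) pairs so each value lands in the cell it was computed for;
# mat[upper.tri(mat)] fills column by column, which does not match the order of combn()
mat[cn] <- m2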
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 8 9 10 11
# [2,] NA NA 10 11 12
# [3,] NA NA NA 12 13
# [4,] NA NA NA NA 14
# [5,] NA NA NA NA NA
This second one ignores the diagonal, but adding those values is easy, as that's what the OP's mapply() call returned.
diag(mat) <- mapply(calc, a, b)
mat
# [,1] [,2] [,3] [,4] [,5]
# [1,] 7 8 9 10 11
# [2,] NA 9 10 11 12
# [3,] NA NA 11 12 13
# [4,] NA NA NA 13 14
# [5,] NA NA NA NA 15
This solved it for me: adding SIMPLIFY=FALSE to the mapply call, thanks to @AkselA.
eg <- expand.grid(a, b)
m1 <- mapply(calc, eg[,1], eg[, 2],SIMPLIFY=FALSE)
However, this method is only slightly faster than my own solution in my OP.
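For a calc that only accepts scalars, as in the EDIT above, another option might be to wrap it in Vectorize() so that outer() can call it. The sketch below uses a made-up stand-in for calc (the body is an assumption, not the OP's real function). Since Vectorize() is built on mapply(), it is unlikely to be faster than the expand.grid() version; it is mainly shorter to write.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 10)

# Stand-in for a function that fails on vector input for y (hypothetical body)
calc <- function(x, y) {
  stopifnot(length(y) == 1L)      # mimics the head(n = y) restriction
  sum(head(seq_len(10), n = y)) + x
}

# Vectorize() wraps calc so it is called once per (x, y) pair;
# outer() then arranges the results in a length(a) x length(b) matrix
res <- outer(a, b, Vectorize(calc))
```

Here res[i, j] holds calc(a[i], b[j]), which is the same layout that outer(a, b, calc) produces for a vectorized function.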

R: How to convert a vector into matrix without replicating the vector?

Here's my problem:
I have a vector and I want to convert it into a matrix with fixed number of columns, but I don't want to replicate the vector to fill the matrix when it's necessary.
For example:
My vector has a length of 15, and I want a matrix with 4 columns. I wish to get a matrix with the 15 elements from the vector and a 0 for the last element in the matrix.
How can I do this?
Edit:
Sorry for not stating the question clearly and misleading you with my example. In my program, I don't know the length of my vector; it depends on other parameters, and this question involves a loop, so I need a general solution that can handle many different cases, not just my example.
Thanks for answering.
You could subset your vector up to a multiple of the number of columns (so as to include all the elements). Indexing past the end of the vector adds the necessary number of NAs. Then convert to a matrix.
x = 1:15
matrix(x[1:(4 * ceiling(length(x)/4))], ncol = 4)
# [,1] [,2] [,3] [,4]
#[1,] 1 5 9 13
#[2,] 2 6 10 14
#[3,] 3 7 11 15
#[4,] 4 8 12 NA
If you want to replace NA with 0, you can do so using is.na() in another step.
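A minimal sketch of that extra step, assuming the same x and 4 columns (ncols, pad, mat and mat2 are illustrative names). It also shows a variant that pads with zeros directly, which may be preferable when x can contain genuine NA values that should not be turned into 0.

```r
x <- 1:15
ncols <- 4

# Variant 1: pad with NA by over-indexing, then replace NA with 0
mat <- matrix(x[1:(ncols * ceiling(length(x) / ncols))], ncol = ncols)
mat[is.na(mat)] <- 0

# Variant 2: compute how many elements are missing and pad with 0 directly
pad <- (-length(x)) %% ncols
mat2 <- matrix(c(x, rep(0, pad)), ncol = ncols)
```

Both variants work for any vector length, including lengths that are already a multiple of ncols (pad is then 0).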
We can also do this with dim<- and length<-
n <- 4
n1 <- ceiling(length(x)/n)
`dim<-`(`length<-`(x, n*n1), c(n1, n))
# [,1] [,2] [,3] [,4]
#[1,] 1 5 9 13
#[2,] 2 6 10 14
#[3,] 3 7 11 15
#[4,] 4 8 12 NA
data
x <- 1:15

Preserve structure, when indexing a matrix with another matrix in R

Dear StackOverflowers,
I have an integer matrix in R and I would like to subset it so that I remove 1 specified cell in each column. So that, for instance, a 4x3 matrix becomes a 3x3 matrix. I have tried doing it by creating the second logical matrix of the same dimensions.
(subject.matrix <- matrix(1:12, nrow = 4))
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
(query.matrix <- matrix(c(T, T, F, T, T, F, T, T, T, T, T, F), nrow = 4))
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
[4,] TRUE TRUE FALSE
The problem is that, when I index the first matrix by the second one, it is simplified to an integer vector.
subject.matrix[query.matrix]
[1] 1 2 4 5 7 8 9 10 11
I've tried adding drop=F, but to no avail. I know, I can just wrap the resulting vector into a 3x3 matrix. So the expected outcome would be:
matrix(subject.matrix[query.matrix], nrow = 3)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
But I wonder if there's a more elegant/direct solution. I'm also not attached to using a logical matrix as the index, if that means a simpler solution. Perhaps, I could subset it with a vector of indices for the rows to be removed in each column, which in this case would translate into c(3, 2, 4).
Many thanks!
Edit based on @LyzandeR's suggestion: My final goal was to take column sums of the resulting matrix. So replacing the redundant values with NAs seems to be the best way to go.
I think the only way to preserve the matrix structure is a more general version of the approach in your question, i.e.:
matrix(subject.matrix[query.matrix], ncol = ncol(subject.matrix))
You could even convert it into a function if you plan on using it multiple times:
subset.mat <- function(mat, index, cols=ncol(mat)) {
matrix(mat[index], ncol = cols)
}
Output:
> subset.mat(subject.matrix, query.matrix)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
Also (sorry just read your updated comment) you might consider using NAs in the matrix instead of subsetting them out, which will allow you to calculate the column sums as you say:
subject.matrix[!query.matrix] <- NA
subject.matrix
# [,1] [,2] [,3]
#[1,] 1 5 9
#[2,] 2 NA 10
#[3,] NA 7 11
#[4,] 4 8 NA
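As a sketch of how the column sums could then be taken (the names masked, masked2, drop_rows and col_sums are illustrative): colSums() has an na.rm argument that skips the NA cells. The second part assumes the c(3, 2, 4) row-index vector from the question and uses two-column matrix indexing instead of a logical matrix.

```r
subject.matrix <- matrix(1:12, nrow = 4)
query.matrix <- matrix(c(TRUE, TRUE, FALSE, TRUE,
                         TRUE, FALSE, TRUE, TRUE,
                         TRUE, TRUE, TRUE, FALSE), nrow = 4)

# Mask the unwanted cells with NA, then sum each column ignoring NA
masked <- subject.matrix
masked[!query.matrix] <- NA
col_sums <- colSums(masked, na.rm = TRUE)

# Same mask built from one row index per column: cbind() gives (row, column) pairs
drop_rows <- c(3, 2, 4)
masked2 <- subject.matrix
masked2[cbind(drop_rows, seq_along(drop_rows))] <- NA
```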
This is a little brute-forceish, but I think you'll be able to extrapolate it into something more general:
new.matrix = matrix(ncol = ncol(subject.matrix), nrow = nrow(subject.matrix) - 1)
for(i in 1:ncol(subject.matrix)){
new.matrix[,i] = subject.matrix[,i][query.matrix[,i] == TRUE]
}
new.matrix
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
Essentially, I just initialized an empty matrix, and then iterated through each column of subject.matrix taking only the TRUE values for query.matrix.
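The same column-by-column idea can probably be written without preallocating, using sapply(). This is only a sketch and assumes every column keeps the same number of elements; otherwise sapply() returns a list instead of a matrix.

```r
subject.matrix <- matrix(1:12, nrow = 4)
query.matrix <- matrix(c(TRUE, TRUE, FALSE, TRUE,
                         TRUE, FALSE, TRUE, TRUE,
                         TRUE, TRUE, TRUE, FALSE), nrow = 4)

# For each column i, keep the rows flagged TRUE; equal-length results
# are bound together as the columns of a matrix
new.matrix <- sapply(seq_len(ncol(subject.matrix)),
                     function(i) subject.matrix[query.matrix[, i], i])
```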

R: subsetting N-dimensional arrays

Consider the following 3-dimensional array:
set.seed(123)
arr = array(sample(c(1:10)), dim=c(3,4,2))
which yields
> arr
, , 1
[,1] [,2] [,3] [,4]
[1,] 10 9 8 2
[2,] 5 1 4 10
[3,] 6 7 3 5
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 7 3 5
[2,] 9 8 2 6
[3,] 1 4 10 9
I'd like to subset it like
arr[c(1,2), c(2,4), c(1)]
but the catch is that I don't know (a) which indices or (b) which dimension the indices are.
What is the best way to access an N-dimensional array with index variables?
ll = list(c(1,2), c(2,4), c(1))
arr[ll] # doesn't work
arr[expand.grid(ll)] # doesn't work
# ..what else?
Use do.call, such as:
do.call(`[`, c(list(arr), ll))
or more cleanly, using a wrapper function:
getArr <- function(...)
`[`(arr, ...)
do.call(getArr, ll)
[,1] [,2]
[1,] 9 2
[2,] 1 10
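A small extension of the do.call idea, as a sketch on a made-up array (1:24 rather than the sampled one above): extra named arguments such as drop = FALSE can be appended to the argument list, and TRUE can stand in for "everything along this dimension" when only some dimensions are restricted.

```r
arr <- array(1:24, dim = c(3, 4, 2))
ll <- list(c(1, 2), c(2, 4), 1)

# Same as arr[c(1, 2), c(2, 4), 1, drop = FALSE]; all three dimensions are kept
sub <- do.call(`[`, c(list(arr), ll, list(drop = FALSE)))

# TRUE selects every index along a dimension, like leaving that slot empty
ll2 <- list(TRUE, c(2, 4), TRUE)
sub2 <- do.call(`[`, c(list(arr), ll2, list(drop = FALSE)))
```

Because the index list is built at run time, the same call works for an array with any number of dimensions, as long as the list has one entry per dimension.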
There is the asub function from the abind package:
library(abind)
asub(arr, ll)
which can also do a lot more, in particular extract along a subset of the dimensions (https://stackoverflow.com/a/17752012/1201032). Worth having in your toolbox.

Resources