How to store multidimensional subscript as variable in R - r

Suppose I have a matrix,
mat <- matrix((1:9)^2, 3, 3)
I can slice the matrix like so
> mat[2:3, 2]
[1] 25 36
How does one store the subscript as a variable? That is, what should my_sub be, such that
> mat[my_sub]
[1] 25 36
A list gets "invalid subscript type" error. A vector will lose the multidimensionality. Seems like such a basic operation to not have a primitive type that fits this usage.
I know I can access the matrix via vector addressing, which means converting from [2:3, 2] to c(5, 6), but that mapping presumes knowledge of matrix shape. What if I simply want [2:3, 2] for any matrix shape (assuming it is at least those dimensions)?

Here are two alternatives. Both generalize to higher-dimensional arrays.
1) Matrix subscripting. If the indexes are all scalar except possibly one, as in the question, then:
mi <- cbind(2:3, 2)
mat[mi]
# test
identical(mat[mi], mat[2:3, 2])
## [1] TRUE
In higher dimensions:
a <- array(1:24, 2:4)
mi <- cbind(2, 2:3, 3)
a[mi]
# test
identical(a[mi], a[2, 2:3, 3])
## [1] TRUE
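To illustrate the "any matrix shape" point from the question: each row of mi names a (row, column) pair rather than a linear position, so the same stored index matrix can be reused on matrices of different shapes. The matrices m1 and m2 below are made up for this sketch.

```r
# Stored subscript equivalent to [2:3, 2]
mi <- cbind(2:3, 2)

# Two hypothetical matrices with different shapes
m1 <- matrix(1:12, nrow = 3)
m2 <- matrix(1:20, nrow = 5)

r1 <- m1[mi]
r2 <- m2[mi]

identical(r1, m1[2:3, 2])
identical(r2, m2[2:3, 2])
```

Note that the result is always a plain vector, one element per row of mi, which is why this form only matches ordinary slicing when at most one index is non-scalar.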
It would be possible to extend this to eliminate the scalar restriction using:
L <- list(2:3, 2:3)
array(mat[as.matrix(do.call(expand.grid, L))], lengths(L))
However, in light of (2), which also uses do.call but avoids the need for expand.grid, this seems unnecessarily complex.
2) do.call. This approach does not have the scalar limitation. mat and a are from above:
L2 <- list(2:3, 1:2)
do.call("[", c(list(mat), L2))
# test
identical(do.call("[", c(list(mat), L2)), mat[2:3, 1:2])
## [1] TRUE
L3 <- list(2, 2:3, 3:4)
do.call("[", c(list(a), L3))
# test
identical(do.call("[", c(list(a), L3)), a[2, 2:3, 3:4])
## [1] TRUE
This could be made prettier by defining:
`%[%` <- function(x, indexList) do.call("[", c(list(x), indexList))
mat %[% list(2:3, 1:2)
a %[% list(2, 2:3, 3:4)
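A possible extension of the do.call idea, as a sketch only: the helper name subscript and its drop parameter are hypothetical and not part of the answer above. It shows two things that a stored list subscript can also express: passing drop = FALSE through to `[`, and using TRUE as a stand-in for an empty index such as mat[, 2].

```r
mat <- matrix((1:9)^2, 3, 3)

# Hypothetical helper: apply a stored list of indexes to any array,
# forwarding drop to `[`
subscript <- function(x, idx, drop = TRUE) {
  do.call("[", c(list(x), idx, list(drop = drop)))
}

my_sub <- list(2:3, 2)

res_vec  <- subscript(mat, my_sub)                # like mat[2:3, 2]
res_mat  <- subscript(mat, my_sub, drop = FALSE)  # like mat[2:3, 2, drop = FALSE]
all_rows <- subscript(mat, list(TRUE, 2))         # TRUE selects every row, like mat[, 2]
```

Because my_sub is an ordinary list, it can be built, stored, and modified like any other R object before it is applied.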

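Use which with the argument arr.ind = TRUE to turn the matching cells into a matrix of (row, column) subscripts. Comparing with %in% rather than == avoids relying on recycling x against mat, which only works by coincidence and warns when length(x) does not divide length(mat).
x <- c(25, 36)
inx <- which(array(mat %in% x, dim(mat)), arr.ind = TRUE)
mat[inx]
#[1] 25 36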
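Related to the mapping between [2:3, 2] and c(5, 6) mentioned in the question, base R's arrayInd converts linear positions into (row, column) subscripts for a given shape. The sketch below assumes the 3 x 3 mat from the question; the variable names lin, sub and back are illustrative.

```r
mat <- matrix((1:9)^2, 3, 3)

# Linear (vector) positions of the wanted cells in a 3 x 3 matrix
lin <- c(5, 6)

# Convert to a two-column matrix of (row, column) subscripts
sub <- arrayInd(lin, dim(mat))
mat[sub]

# Convert back to linear positions (column-major storage)
back <- (sub[, 2] - 1) * nrow(mat) + sub[, 1]
```

The conversion depends on dim(mat), which is exactly why a stored linear index does not carry over to a matrix of another shape, while the subscript matrix does.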

This is an interesting question. The subset function can help here. You cannot subset your matrix directly using a vector or a list, but you can store the indexes in a list and use subset to do the trick.
mat <- matrix(1:12, nrow=4)
mat[2:3, 1:2]
# example using subset
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2)
# double check
identical(mat[2:3, 1:2],
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2))
# TRUE
Actually, we can write a custom function if we want to store the row- and column- indexes in the same list.
cust.subset <- function(mat, dim.list){
subset(mat, subset = 1:nrow(mat) %in% dim.list[[1]], select = dim.list[[2]])
}
# initialize a list that includes your sub-setting indexes
sbdim <- list(2:3, 1:2)
sbdim
# [[1]]
# [1] 2 3
# [[2]]
# [1] 1 2
# subset using your custom f(x) and your list
cust.subset(mat, sbdim)
# [,1] [,2]
# [1,] 2 6
# [2,] 3 7

Related

Rolling correlations across multiple columns, some with NAs?

I have the below dataset, where I am trying to do a rolling 3 days correlation across x,y,z,a. So the code should do rolling correlations of xy,xz,xa, yx, yz,ya and so on. Also, as you can see below, the data for y and a is incomplete, but I would wish to do rolling correlations of them starting from the date where they first had values (i.e. id 3 and id 4).
How should I accomplish this? Don't know where to start...
set.seed(42)
n <- 10
dat <- data.frame(id=1:n,
date=seq.Date(as.Date("2020-12-22"), as.Date("2020-12-31"), "day"),
x=rnorm(n),
y=rnorm(n),
z=rnorm(n),
a=rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA
I was able to find this code on Stack Overflow, but it only gives the correlations for the first column, not for all the columns:
rollapplyr(x, 5, function(x) cor(x[, 1], x[, -1]), by.column = FALSE)
Create a data frame with only the columns wanted and then use rollapplyr with cor. cor takes a use= argument that specifies how missing values are to be handled. See ?cor for the values it can take since you may or may not wish to use the value we used below.
The result r is a matrix whose i-th row describes the correlation matrix of the 5 dat2 rows ending in and including row i. That is, matrix(r[i, ], 4, 4) is the correlation matrix of dat2[i-(4:0), ].
We can also create ar which is a 3d array which is such that ar[i,,] is the correlation matrix of the 5 rows of dat2 ending in and including row i.
That is, the following are equal for each i in 5, ..., nrow(dat2). (The first 4 rows of r are all NA since there are not 5 rows ending at any of those rows.)
1. cor(dat2[i-(4:0), ], use = "pairwise")
2. matrix(r[i, ], 4, 4)
3. ar[i,,]
We run checks for these equivalences for i=5 below.
library(zoo)
w <- 5
dat2 <- dat[c("x", "y", "z", "a")]
nr <- nrow(dat2)
nc <- ncol(dat2)
r <- rollapplyr(dat2, w, cor, use = "pairwise", by.column = FALSE, fill = NA)
colnames(r) <- paste(names(dat2)[c(row(diag(nc)))],
names(dat2)[c(col(diag(nc)))], sep = ".")
ar <- array(r, c(nr, nc, nc),
dimnames = list(NULL, names(dat2), names(dat2)))
# run some checks
cor5 <- cor(dat2[1:w, ], use = "pairwise") # cor of 1st w rows
# same except for names
all.equal(unname(cor5), matrix(r[w, ], nc))
## [1] TRUE
all.equal(cor5, ar[w,,])
## [1] TRUE
The above shows a matrix whose rows are strung out correlation matrices and a 3d array whose slices are correlation matrices. Another possibility for output is to create a list of correlation matrices.
lapply(1:nr, function(i) {
if (i >= w) cor(dat2[i-((w-1):0), ], use = "pairwise")
})
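For the 3-day window the question asks about, the same idea can be sketched in base R without zoo. This is only an illustration under stated assumptions: roll_cor is a hypothetical helper, the data frame keeps just the four numeric columns, and a window is reported as NA whenever it contains a missing value, so each pair effectively starts once both columns have values.

```r
set.seed(42)
n <- 10
dat <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n), a = rnorm(n))
dat$y[1:2] <- NA
dat$a[1:3] <- NA

# Hypothetical helper: right-aligned rolling correlation of two vectors.
# Windows that are too short or contain an NA are left as NA.
roll_cor <- function(u, v, w) {
  out <- rep(NA_real_, length(u))
  for (i in seq_along(u)) {
    if (i < w) next
    idx <- (i - w + 1):i
    if (anyNA(u[idx]) || anyNA(v[idx])) next
    out[i] <- cor(u[idx], v[idx])
  }
  out
}

w <- 3
pairs <- combn(names(dat), 2, simplify = FALSE)
res <- lapply(pairs, function(p) roll_cor(dat[[p[1]]], dat[[p[2]]], w))
names(res) <- vapply(pairs, paste, character(1), collapse = ".")
```

Each element of res, for example res$x.y, has one value per row of dat, so the results line up with the original dates.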
combn produces all the combinations.
cols <- c("x", "y", "z", "a")
combn(cols, 2)
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "x" "x" "x" "y" "y" "z"
# [2,] "y" "z" "a" "z" "a" "a"
combn also takes a function argument that is applied to each combination. In that function, first na.omit all rows with NAs, then use mapply to slide the window 1:w along the remaining rows and calculate the correlation at each offset, until nrow is reached.
w <- 3 ## size of the rolling window
combn(dat[cols], 2, function(x) {
X <- na.omit(x)
n <- nrow(X)
mapply(function(y, z) cor(X[y + z, 1], X[y + z, 2]), list(1:w), 0:(n - w))
}, simplify=FALSE)
# [[1]]
# [1] 0.5307784 -0.9874843 -0.8364802 0.2407730 0.3655328 -0.4458231
#
# [[2]]
# [1] 0.8121466 0.9652715 0.3304100 0.8278965 -0.1425097 0.5832558 0.9959705
# [8] 0.8696023
#
# [[3]]
# [1] 0.6733985 0.2194488 0.5593983 -0.6589249 -0.9291184
#
# [[4]]
# [1] 0.97528684 -0.90599558 -0.42319742 0.92882443 0.28058418 0.05427966
#
# [[5]]
# [1] -0.7815678 -0.7182037 -0.6698260 0.4592962 0.7452225
#
# [[6]]
# [1] 0.9721521 0.9343926 -0.3470329 -0.7237291 -0.6253825

Use `[` in *apply

I'm sure I'm missing something obvious in base R itself. I want to subset a matrix by its columns based on indices stored in a list.
m <- matrix(1:50L, ncol = 5)
l <- lapply(0:3, `+`, 1:2)
> l
[[1]]
[1] 1 2
[[2]]
[1] 2 3
[[3]]
[1] 3 4
[[4]]
[1] 4 5
I want a list of matrices as below -
list(m[, l[[1]]], m[, l[[2]]], m[, l[[3]]], m[, l[[4]]])
I tried lapply(l, '[', m), but that obviously didn't work because '[' works on m like a vector. How can I make it work on m like a matrix and specify the index 2? I also tried apply(X = do.call(rbind, l), MARGIN = 1, FUN = '[', m) but that didn't work either.
I know I can define a function f as below; but that feels hacky.
f <- function(ind){m[, ind]}
lapply(l, f)
Is there any clever way I can avoid defining f or am I being unrealistic in not working with f?

Multiply values of column with itself in R

I am trying to multiply elements of column with itself but am unable to do it.
I have column A with values a, b, c, I want answer as (a*b + a*c + b*c).
For example, with
A <- c(2, 3, 5) the expected output is sum(6 + 10 + 15) = 31.
I am trying to run for loop to execute but was failing. Can anyone please provide R code to do this.
example data :
df1 <- data.frame(A=c(2,3,5))
combn will give you the combinations
combinations <- combn(df1$A,2)
# [,1] [,2] [,3]
# [1,] 2 2 3
# [2,] 3 5 5
apply with margin 2 (by columns), will do the multiplication
multiplied_terms <- apply(combinations,2,function(x) x[1]*x[2])
# [1] 6 10 15
Or shorter and more general, thanks to @zacdav:
multiplied_terms <- apply(combinations,2,prod)
then we can sum them
output <- sum(multiplied_terms)
# [1] 31
Piped for a compact solution:
library(magrittr)
df1$A %>% combn(2) %>% apply(2,prod) %>% sum
Here's another way. The approach by @Moody_Mudskipper may be easier to extend to groups of 3 etc., but I think this should be much faster since there is no need to actually find the combinations.
Using for loop
It just goes through the vector A multiplying the rest of the elements until the last one.
A <- df1$A
len <- length(A)
res <- 0 # running total; must start at 0, not numeric(0)
for (j in seq_len(len - 1))
res <- res + sum(A[j] * A[(j+1) : len])
res
#[1] 31
Using lapply or sapply
The for loop can be replaced by using lapply
res <- sum(unlist(lapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len]))))
or sapply,
res <- sum(sapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len])))
I didn't check which of these is the fastest.
# If you need to store the pairwise multiplications, then use the following;
# res <- NULL
# for (j in 1 : (len-1))
# res <- c(res, A[j] * A[(j+1) : len])
# res
# [1] 6 10 15
# sum(res)
# [1] 31
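A closed-form alternative, offered as a sketch rather than as part of either answer: since (a + b + c)^2 = a^2 + b^2 + c^2 + 2(ab + ac + bc), the sum of all pairwise products equals (sum(A)^2 - sum(A^2)) / 2, with no loop or combinations needed. The variable names below are illustrative.

```r
A <- c(2, 3, 5)

# Sum of pairwise products via the square-of-sum identity
res_identity <- (sum(A)^2 - sum(A^2)) / 2

# Cross-check against the combn approach
res_combn <- sum(combn(A, 2, prod))

res_identity
res_combn
```

For A = c(2, 3, 5) this is (100 - 38) / 2 = 31, matching the results above.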

Consistently subset matrix to a vector and avoid colnames?

I would like to know if there is R syntax to extract a column from a matrix and always have no name attribute on the returned vector (I wish to rely on this behaviour).
My problem is the following inconsistency:
when a matrix has more than one row and I do myMatrix[, 1] I will get the first column of myMatrix with no name attribute. This is what I want.
when a matrix has exactly one row and I do myMatrix[, 1], I will get the first column of myMatrix but it has the first colname as its name.
I would like to be able to do myMatrix[, 1] and consistently get something with no name.
An example to demonstrate this:
# make a matrix with more than one row,
x <- matrix(1:2, nrow=2)
colnames(x) <- 'foo'
# foo
# [1,] 1
# [2,] 2
# extract first column. Note no 'foo' name is attached.
x[, 1]
# [1] 1 2
# now suppose x has just one row (and is a matrix)
x <- x[1, , drop=FALSE]
# extract first column
x[, 1]
# foo # <-- we keep the name!!
# 1
Now, the documentation for [ (?'[') mentions this behaviour, so it's not a bug or anything (although, why?! why this inconsistency?!):
A vector obtained by matrix indexing will be unnamed unless ‘x’ is one-dimensional when the row names (if any) will be indexed to provide names for the result.
My question is, is there a way to do x[, 1] such that the result is always unnamed, where x is a matrix?
Is my only hope unname(x[, 1]) or is there something analogous to ['s drop argument? Or is there an option I can set to say "always unname"? Some trick I can use (somehow override ['s behaviour when the extracted result is a vector?)
Update on why the code below works (as far as I can tell)
Subsetting with [ is handled using functions contained in the R source file subset.c in ~/src/main. When using matrix indexing to subset a matrix, the function VectorSubset is called. When there is more than one index used (i.e., one each for rows and columns as in x[,1]), then MatrixSubset is called.
The function VectorSubset only assigns names to 1-dimensional arrays being subsetted. Since a matrix is a 2-D array, no names are assigned to the result when using matrix indexing. The function MatrixSubset, however, does attempt to pass on dimnames under certain circumstances.
Therefore, the matrix indexing you refer to in the quote from the help page seems to be the key:
x <- matrix(1)
colnames(x) <- "foo"
x[, 1] ## 'Normal' indexing
# foo
# 1
x[matrix(c(1, 1), ncol = 2)] ## Matrix indexing
# [1] 1
And with a wider 1-row matrix:
xx <- matrix(1:10, nrow = 1)
colnames(xx) <- sprintf('foo%i', seq_len(ncol(xx)))
xx[, 6] ## 'Normal' indexing
# foo6
# 6
xx[matrix(c(1, 6), ncol = 2)] ## Matrix indexing
# [1] 6
With a matrix with both dimensions > 1:
yy <- matrix(1:10, nrow = 2, dimnames = list(NULL,
sprintf('foo%i', 1:5)))
yy[cbind(seq_len(nrow(yy)), 3)] ## Matrix indexing
# [1] 5 6
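The matrix-indexing trick can be wrapped in a small function so a column is always returned without names. This is a sketch: col_unnamed is a hypothetical name, and it assumes x is a matrix with at least one row. unname(x[, j]) is the simpler alternative mentioned in the question.

```r
# Hypothetical helper: extract column j of matrix x via matrix indexing,
# which does not attach dimnames to the result.
# Assumes nrow(x) >= 1.
col_unnamed <- function(x, j) {
  x[cbind(seq_len(nrow(x)), j)]
}

x2 <- matrix(1:2, nrow = 2)
colnames(x2) <- "foo"
x1 <- x2[1, , drop = FALSE]  # one-row matrix

two_rows <- col_unnamed(x2, 1)
one_row  <- col_unnamed(x1, 1)

names(x1[, 1])   # "foo" with ordinary indexing
names(one_row)   # NULL with matrix indexing
```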

Check that a vector is contained in a matrix in R

I can't believe this is taking me this long to figure out, and I still can't figure it out.
I need to keep a collection of vectors, and later check that a certain vector is in that collection. I tried lists combined with %in% but that doesn't appear to work properly.
My next idea was to create a matrix and rbind vectors to it, but now I don't know how to check if a vector is contained in a matrix. %in% appears to compare sets and not exact rows. Same appears to apply to intersect.
Help much appreciated!
Do you mean like this:
wantVec <- c(3,1,2)
myList <- list(A = c(1:3), B = c(3,1,2), C = c(2,3,1))
sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or, is the vector in the set?
any(sapply(myList, function(x, want) isTRUE(all.equal(x, want)), wantVec))
We can do a similar thing with a matrix:
myMat <- matrix(unlist(myList), ncol = 3, byrow = TRUE)
## As the vectors are now in the rows, we use apply over the rows
apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat, 1, function(x, want) isTRUE(all.equal(x, want)), wantVec))
Or by columns:
myMat2 <- matrix(unlist(myList), ncol = 3)
## As the vectors are now in the cols, we use apply over the cols
apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec)
## or
any(apply(myMat2, 2, function(x, want) isTRUE(all.equal(x, want)), wantVec))
If you need to do this a lot, write your own function
vecMatch <- function(x, want) {
isTRUE(all.equal(x, want))
}
And then use it, e.g. on the list myList:
> sapply(myList, vecMatch, wantVec)
A B C
FALSE TRUE FALSE
> any(sapply(myList, vecMatch, wantVec))
[1] TRUE
Or even wrap the whole thing:
vecMatch <- function(x, want) {
out <- sapply(x, function(x, want) isTRUE(all.equal(x, want)), want)
any(out)
}
> vecMatch(myList, wantVec)
[1] TRUE
> vecMatch(myList, 5:3)
[1] FALSE
EDIT: Quick comment on why I used isTRUE() wrapped around the all.equal() calls. This is due to the fact that where the two arguments are not equal, all.equal() doesn't return a logical value (FALSE):
> all.equal(1:3, c(3,2,1))
[1] "Mean relative difference: 1"
isTRUE() is useful here because it returns TRUE if and only if its argument is TRUE, whilst it returns FALSE if it is anything else.
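A short sketch of why all.equal() rather than identical() suits the examples above: 1:3 is stored as integer while c(1, 2, 3) is double, so identical() treats them as different, whereas all.equal() compares numeric values within a tolerance. The variable names are illustrative.

```r
a <- 1:3          # integer storage
b <- c(1, 2, 3)   # double storage

same_strict  <- identical(a, b)                      # FALSE: types differ
same_numeric <- isTRUE(all.equal(a, b))              # TRUE: values agree
different    <- isTRUE(all.equal(1:3, c(3, 2, 1)))   # FALSE: all.equal returned a string
```

If exact type-and-value matching is what is wanted, identical() can be swapped in for isTRUE(all.equal()) in the functions above.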
> M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
v <- c(2, 5, 8)
check each column:
c1 <- which(M[, 1] == v[1])
c2 <- which(M[, 2] == v[2])
c3 <- which(M[, 3] == v[3])
Here is a way to still use intersect() on more than 2 elements
> intersect(intersect(c1, c2), c3)
[1] 2
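The column-by-column check can be generalised to any number of columns. The function below is a sketch with a hypothetical name, row_in_matrix; it assumes M is the 3 x 3 matrix printed above, that v has one entry per column, and that neither contains NA.

```r
M <- matrix(1:9, 3)
v <- c(2, 5, 8)

# Hypothetical helper: indices of the rows of M that equal v element-wise.
# Assumes no NA values in M or v.
row_in_matrix <- function(M, v) {
  stopifnot(length(v) == ncol(M))
  hits <- apply(M, 1, function(r) all(r == v))
  which(hits)
}

found   <- row_in_matrix(M, v)            # row 2 matches
missing <- row_in_matrix(M, c(2, 5, 9))   # no row matches
```

length(row_in_matrix(M, v)) > 0 then answers the original yes/no question of whether the vector is contained in the matrix.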
