Matrix indexing notation syntax - r

Let's create a matrix m.
m <- matrix(1:9, 3, 3, byrow = TRUE); m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
m[3,1] # 7
m[3][1] # 7
Why does the second indexing notation work? Is there a difference between these two notations? Is it safe to use?
But sequences behave differently:
m[1:2, 1:2] # works as expected, returns a matrix
m[1:2][1:2] # returns the vector 1 4, why?

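A matrix is a vector with a dim attribute, stored in column-major order. So m[3] returns the 3rd element of that underlying vector, which is m[3,1] = 7. The trailing [1] then just takes the first (and only) element of that length-one result. The two notations agree here only by coincidence: in general m[i][j] is not m[i, j], so it is not safe to rely on. If we want to use a chained extract, first extract the row by leaving the column index blank after the comma (with drop = FALSE in case we want to avoid coercing the matrix to a vector), and then select the first element, which is the first column: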
m[3,, drop = FALSE][1]
#[1] 7
In the OP's first option, m[3, 1] uses both a row index (3) and a column index (1), so it selects the element based on both indices.
In the updated example, the OP specified the first 2 rows as the row index and the first 2 columns as the column index. So it returns a matrix omitting the 3rd row and the 3rd column:
m[1:2, 1:2]
# [,1] [,2]
#[1,] 1 2
#[2,] 4 5
But in the second case
m[1:2]
#[1] 1 4
extracts the first two elements of the underlying vector (in column-major order).
Likewise, if we do
m[1:5]
#[1] 1 4 7 2 5
we get the first five elements in column-major order.
Therefore,
m[1:2][1:2]
returns only 1 and 4 because the first extract returns just those two elements. The second extract then indexes into that length-2 subset, so [1:2] returns both of them again. If we increase the index beyond that length, those positions do not exist and are filled with NA:
m[1:2][1:4]
#[1] 1 4 NA NA
The single-index (linear) extraction acts on the underlying vector
c(m)
#[1] 1 4 7 2 5 8 3 6 9
where the first two elements are 1 and 4
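The following small base R check makes the "is it safe?" point concrete. It is a sketch and is not taken from the answer. It computes the linear index of m[i, j] with the column-major rule k = (j - 1) * nrow(m) + i, uses arrayInd() for the reverse direction, and shows a chained index that breaks. The names i, j and k are only for illustration.

```r
m <- matrix(1:9, 3, 3, byrow = TRUE)

# Column-major rule: element (i, j) sits at linear position (j - 1) * nrow + i
i <- 2; j <- 3
k <- (j - 1) * nrow(m) + i   # 8
m[k]                         # 6, the same element as m[2, 3]
m[i, j]                      # 6

# Reverse direction: linear index 3 is row 3, column 1,
# which is why m[3][1] happens to equal m[3, 1]
arrayInd(3, dim(m))

# Chained single indexing does not generalize:
# m[2] has length 1, so asking for its 3rd element gives NA
m[2][3]
```

In other words, m[a][b] always indexes into the result of m[a]. It only coincides with m[a, b] in special cases like b = 1.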

Related

Mystery Matrix Subset

I came across this strange matrix operation the other day and can't figure out what it is doing.
Consider:
a<-matrix(nrow=2,ncol=2,c(9,8,7,6))
b<-matrix(nrow=2,ncol=2,c(1,2,1,2))
a[b]
Whoa! How can you even use a matrix to subset another matrix? Anyway - this is the result
a[b]
#[1] 9 6
I thought maybe b was providing the indices to reference a (i.e. get [1,1] and then get [2,2]). But if that is what is happening, the rules get thrown out of the window when you do this:
a<-matrix(nrow=3,ncol=3,c(9,8,7,6,5,4,3,2,1))
b<-matrix(nrow=3,ncol=3,c(1,2,3,2,2,2,1,1,1))
a[b]
#[1] 9 8 7 8 8 8 9 9 9
Does anyone know what is happening here?
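This is not a mystery. In your first example, b has two columns, which matches the number of dimensions of a. So R reads each row of b as a (row, column) pair: a[1, 1] = 9 and a[2, 2] = 6. In your second example, b has three columns, which does not match, so the indexing matrix b is treated as a plain numeric vector: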
as.numeric(b)
#[1] 1 2 3 2 2 2 1 1 1
a[as.numeric(b)]
#[1] 9 8 7 8 8 8 9 9 9
You have to remember that matrices have two kinds of indexing. On top of the two-dimensional ij (row x column) indexing, they also have a one-dimensional one, where each element is assigned a number in sequence, starting with the top-left element and going down the columns. So a[1, 1] is the same as a[1], and a[2, 2] is the same as a[5]. Hence a[b] gives you c(a[1], a[2], a[3], a[2], ..., a[1]), which is the same as c(a[1, 1], a[2, 1], a[3, 1], a[2, 1], a[2, 1], ..., a[1, 1]).
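The help page ?`[` describes a column-count rule: a matrix index is read as (row, column) pairs only when it has as many columns as the indexed array has dimensions. A quick way to see this rule is to index the question's 2 x 2 example with the same numbers twice, once as a two-column matrix and once flattened to a vector. This is a small base R sketch, and a2 and b2 are just local names for the question's first example.

```r
a2 <- matrix(nrow = 2, ncol = 2, c(9, 8, 7, 6))
b2 <- matrix(nrow = 2, ncol = 2, c(1, 2, 1, 2))

# Two columns = two dimensions: rows of b2 are the pairs (1,1) and (2,2)
a2[b2]              # 9 6

# The same numbers as a plain vector are linear indices 1, 2, 1, 2
a2[as.vector(b2)]   # 9 8 9 8
```

The 3 x 3 example in the question has three index columns for a two-dimensional a, so it falls into the second case.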
A matrix is essentially a vector (numeric here, but any atomic type works) with a dimension attribute. In R, matrices are stored in "column-major order", meaning that the matrix is filled column by column. This implies the following:
a <- matrix(1:4, nrow = 2)
> a
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
Since it is also a vector, you will still be able to access elements of a using single indices.
> a[1]
# [1] 1
> a[2]
# [1] 2
> a[3]
# [1] 3
Sometimes you put a plain vector inside the brackets, or a matrix whose number of columns does not match the number of dimensions of a. In that case R coerces it to an integer vector, so it becomes a set of linear indices into the underlying vector a.
To understand better, you could try the following:
a<-matrix(nrow=3,ncol=3,c(9,8,7,6,5,4,3,2,1))
b<-matrix(nrow=3,ncol=3,c(1,2,3,2,2,2,1,10,1)) # a[10] is out of range, so it gives NA
> a[b]
# [1] 9 8 7 8 8 8 9 NA 9
Since the brackets coerce numeric indices to integers (truncating toward zero), you can even use a b matrix with floating-point values:
b<-matrix(nrow=3,ncol=3,c(1.1,2.1,3.9,2.8,2,2,1,10.5,1))
> a[b]
# [1] 9 8 7 8 8 8 9 NA 9
This is because, as said earlier:
> as.integer(b) # same as as.integer(c(1.1,2.1,3.9,2.8,2,2,1,10.5,1))
# [1] 1 2 3 2 2 2 1 10 1

R: How to randomly sample from each row of matrix into new column but only switch columns with rng

I'm new to R and I'm having difficulty finding answers to my question. I found this question that was asked years ago which leads me in the right direction:
R: How to add a column with a randomly chosen value from each row of a matrix?
The difference in mine is that I want to build the third column with conditional sampling while randomly choosing which one to start with. If I have two columns in my matrix, I want to 1. randomly pick which column to start sampling from and 2. only switch columns if I run an rng and it falls below a certain threshold. I want to be able to run the rng before sampling each row.
I haven't been able to find much in the way of help and that might be a consequence of just not knowing what to look for. Thank you.
Updated with example:
This code (from the link) generates a third column by randomly picking an element from row 1 (either from column 1 or column 2) and does so for all the rows.
t <- matrix(c(1,1,4,6,1,3,2,1,1,7), ncol=2)
cbind(t,apply(t,1,function(x) sample(x,size=1)))
[,1] [,2] [,3]
[1,] 1 3 1
[2,] 1 2 2
[3,] 4 1 4
[4,] 6 1 1
[5,] 1 7 1
I want to be able to generate a matrix that builds column 3 by randomly choosing column 1 or 2 at first and only samples from that column. And after each row is built, run an rng. If the rng generates a number below a threshold, I want it to switch the column it is building from.
[,1] [,2] [,3]
[1,] 1 3 3
[2,] 1 2 2
[3,] 4 1 4 (supposedly rng meets threshold here and switches from col 2 to 1)
[4,] 6 1 6
[5,] 1 7 1
We can restrict the range we sample from:
x <- 1:10
sample(x[x > 5])
sample(x[x < 10])
We first randomly choose the default column
def <- sample(1:2, 1) # set default
We can write a simple function that takes two arguments: a vector x (which should have length 2) and a threshold (here the default threshold is 0.2). If the random number generated by runif() is at or below the threshold, pick element 3 - def from x (so we switch the column); otherwise use the default.
sample_row <- function(x, thr = .2) x[ifelse(runif(1) <= thr, 3 - def, def)]
Run that through apply and cbind. In this example def is 2, and only row 2 has been "switched":
cbind(t, apply(t, 1, sample_row))
# [,1] [,2] [,3]
# [1,] 1 3 3
# [2,] 1 2 1
# [3,] 4 1 1
# [4,] 6 1 1
# [5,] 1 7 7
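The sample_row() approach above decides each row independently against the fixed default column, so a switch does not carry over to later rows. The question describes a different behavior: start from a random column, keep sampling from it, and move to the other column for good once a random draw falls below the threshold. One possible sketch of that is below. It counts the switches with cumsum() and then picks one element per row with a two-column index matrix. The function name sample_sticky and the strict < comparison are my own choices and do not come from the original answer.

```r
# Hypothetical helper (not from the original answer): a switch persists.
sample_sticky <- function(mat, thr = 0.2) {
  n <- nrow(mat)
  start <- sample(1:2, 1)                 # randomly choose the starting column
  # After each of rows 1..n-1, draw a uniform number; a draw below thr
  # flips the column used for all following rows.
  flips <- c(0, cumsum(runif(n - 1) < thr))
  cols <- (start - 1 + flips) %% 2 + 1    # alternates between 1 and 2 per flip
  mat[cbind(seq_len(n), cols)]            # one element per row via matrix indexing
}

t <- matrix(c(1,1,4,6,1,3,2,1,1,7), ncol = 2)  # note: this name masks base::t()
set.seed(42)
cbind(t, sample_sticky(t))
```

Drawing all n - 1 uniform numbers up front should be equivalent in distribution to running the rng after each row. It avoids an explicit loop.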

Return first 3 smallest values and their indexes from a matrix

I want to get the first 3 smallest values in each row and the coordinate for each value.
Let's say I have the following matrix:
[,1] [,2] [,3] [,4] [,5]
[1,] 4 3 6 5 2
[2,] 5 2 1 3 6
Take the first row:
I want:
value coordinate
2 [1,5]
3 [1,2]
4 [1,1]
Currently, I am able to get the first 3 smallest values in each row with something like the following:
# example for first row
a <- m[1,]
a
ndx <- order(a)[1:3]
a[ndx]
But how to get the corresponding coordinate?
We could use apply with MARGIN = 1 to loop over the rows. If we want the smallest values, we can simply use order (as in the OP's post) and select the first 3 elements. order gives the indices, which can be used to subset the elements to create the 'value' column. To create 'coordinate', we use order again to get the column indices, replicate the row numbers for the row index, and paste them together with sprintf. Then use 'value' and 'coordinate' to create a data.frame.
value <- c(apply(m, 1, function(x) x[order(x)[1:3]]))
coordinate <- sprintf('[%d,%d]', rep(1:nrow(m), each=3),
c(apply(m, 1, function(x) order(x)[1:3])))
df1 <- data.frame(value, coordinate, stringsAsFactors=FALSE)
df1
# value coordinate
#1 2 [1,5]
#2 3 [1,2]
#3 4 [1,1]
#4 1 [2,3]
#5 2 [2,2]
#6 3 [2,4]
If we want the largest values, use order(., decreasing=TRUE) in the above code.
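A vectorized alternative avoids apply(). This is my own sketch and is not from the answer above. It sorts all linear indices by row and then by value with order(row(m), m), keeps the first k entries of each row's block, and converts them back to (row, column) pairs with base R's arrayInd(). The matrix is the one from the question; k and df2 are illustrative names. Ties are broken by column position, just as with order() on a single row.

```r
m <- matrix(c(4, 3, 6, 5, 2,
              5, 2, 1, 3, 6), nrow = 2, byrow = TRUE)
k <- 3

# Linear indices sorted by row first, then by value within each row
idx <- order(row(m), m)

# Each row contributes ncol(m) consecutive entries; keep the first k of each
sel <- c(matrix(idx, nrow = ncol(m))[seq_len(k), ])

# Convert linear indices back to (row, col) pairs
rc <- arrayInd(sel, dim(m))

df2 <- data.frame(value = m[sel],
                  coordinate = sprintf("[%d,%d]", rc[, 1], rc[, 2]),
                  stringsAsFactors = FALSE)
df2
```

For the largest values of a numeric matrix, replacing the sort key with order(row(m), -m) should give the descending order within each row.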

How can I extract matrix elements corresponding to column list

This seems like it should be very simple to do with an apply function, but I find myself struggling with it.
I have a matrix (dataframe ok also) of data:
u <- matrix(sample(seq(4),20,T),5,4)
u
[,1] [,2] [,3] [,4]
[1,] 1 2 4 2
[2,] 4 3 2 2
[3,] 3 3 3 1
[4,] 3 2 4 4
[5,] 4 1 3 4
Suppose I just wanted to use the elements of column j (like INDIRECT in Excel) to select a corresponding column value from each row.
e.g. given j = 3:
row 1 would get the element corresponding to row = 1, col(j = 3) = 4 and return 2 (row 1, col 4)
row 2 would get the element corresponding to row = 2, col(j = 3) = 2 and return 3 (row 2, col 2)
...
row 5 would get the element corresponding to row = 5, col(j = 3) = 3 and return 3 (row 5, col 3)
I end up with a vector of those values v <- c(2, 3, ..., 3)
You can use matrix indexing:
i <- seq_len(nrow(u))
j <- u[, 3]
u[cbind(i, j)]
An sapply() over the rows also works, although it is not vectorized:
sapply(1:nrow(u), function(i) u[i,u[i,3]])
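As a quick self-contained check, the two approaches can be compared on the matrix printed in the question. This is a sketch and is not part of either answer. The matrix is typed in directly because sample() gives different values on each run.

```r
# The matrix shown in the question, entered row by row
u <- matrix(c(1, 2, 4, 2,
              4, 3, 2, 2,
              3, 3, 3, 1,
              3, 2, 4, 4,
              4, 1, 3, 4), nrow = 5, byrow = TRUE)

j <- u[, 3]                                    # column indices taken from column 3
v1 <- u[cbind(seq_len(nrow(u)), j)]            # vectorized matrix indexing
v2 <- sapply(seq_len(nrow(u)), function(i) u[i, u[i, 3]])  # row-by-row version
v1                                             # 2 3 3 4 3
```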

Select a column by column-name, but a different name for each row of a matrix in R?

Suppose I have a matrix and I want to select values from column 1 for the first row, column 5 for the second row, and column 4 for the third row (...). The columns are stored as column names in a vector, and the position in this vector equals the row in which the column is to be chosen.
How can I achieve this efficiently, i.e. without looping?
(The background is: My purpose is to use this in a simulation, that's why I would love to vectorize it to speed it up)
A minimal example:
# Creating my dummy matrix
aMatrix <-matrix(1:15,3,5,dimnames=list(NULL,LETTERS[1:5]))
aMatrix
A B C D E
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
# Here are the columns I want for each row
columns <-c("A","E","D")
columns
[1] "A" "E" "D"
# means: select for row 1 column "A" = 1,
# select for row 2 column "E" = 14,
# select for row 3 column "D" = 12
# Now obviously I could do looping, but this is inefficient
for (i in columns) print(grep(i,colnames(aMatrix))) #grep is necessary for my specific matrix-names in my simulation only.
[1] 1 #wanting col. 1 for row 1
[1] 5 #wanting col. 5 for row 2
[1] 4 #wanting col. 4 for row 3
I just saw that looping the way I did it does not work very efficiently.
I was thinking about sapply/tapply but somehow could not get that to work, since there are two arguments that change (the row to be searched in the matrix, and the letter to be chosen from the target columnname-vector).
I would appreciate your help a lot.
Thanks!
Jana
P.S. I use "grep" here as the column names are substrings of the actual column names in the simulation I will run. But that substring-creation would have made the example more complicated, thus I skipped it.
As the help page ?`[` says, you can subset with a matrix to get individual elements. Each row of the subsetting matrix is an element, and the columns specify the indices for each dimension.
match(columns,colnames(aMatrix)) #gets the column indices
# [1] 1 5 4
b <- cbind(seq_along(columns),match(columns,colnames(aMatrix))) #subset matrix
# [,1] [,2]
# [1,] 1 1 #first element: first row first column
# [2,] 2 5 #second element: second row fifth column
# [3,] 3 4 #third element: third row fourth column
aMatrix[b]
# [1] 1 14 12
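The P.S. says the real column names only contain the codes as substrings, so match() would not find them. One possible adaptation is to look up the first grep() hit for each code and then index with the same two-column matrix. This sketch assumes hypothetical names such as "A_run1", invented here for illustration. It still loops once over the codes, but the extraction itself is a single vectorized step. A code with no hit gives an NA column index, and so an NA element in the result.

```r
# Hypothetical column names: the short codes appear as substrings
aMatrix <- matrix(1:15, 3, 5,
                  dimnames = list(NULL, paste0(LETTERS[1:5], "_run1")))
columns <- c("A", "E", "D")

# First column whose name contains each code (fixed = TRUE: literal substring)
col_idx <- vapply(columns,
                  function(s) grep(s, colnames(aMatrix), fixed = TRUE)[1],
                  integer(1))

# Row k of the index matrix is (k, col_idx[k])
aMatrix[cbind(seq_along(columns), col_idx)]   # values 1, 14, 12
```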
