Let's say you have a matrix
m <- matrix(1:25*2, nrow = 5, ncol=5)
How do you go from matrix subscripts (row index, column index) to a linear index you can use on the matrix? For example, you can extract values of the matrix with either of these two methods:
m[2,3] == 24
m[12] == 24
How do you go from (2,3) => 12 or 12 => (2,3) in R?
In Matlab, the functions you would use for converting matrix subscripts to linear indices and vice versa are sub2ind and ind2sub.
Is there an equivalent way in R?
This is not something I've used before, but according to this handy dandy Matlab to R cheat sheet, you might try something like this, where m is the number of rows in the matrix (not the matrix m from the question), r and c are the row and column numbers respectively, and ind is the linear index:
MATLAB:
[r,c] = ind2sub(size(A), ind)
R:
r = ((ind-1) %% m) + 1
c = floor((ind-1) / m) + 1
MATLAB:
ind = sub2ind(size(A), r, c)
R:
ind = (c-1)*m + r
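As a quick sanity check, the formulas above can be wrapped in two small helpers and tried on the matrix from the question. The names sub2ind_r and ind2sub_r are hypothetical (they are not base R functions), nr stands in for m (the number of rows), and this sketch only covers the 2-D case.

```r
# Hypothetical helpers, not part of base R; 2-D matrices only.
# nr is the number of rows, r and c are row and column numbers.
sub2ind_r <- function(nr, r, c) (c - 1) * nr + r
ind2sub_r <- function(nr, ind) {
  cbind(row = ((ind - 1) %% nr) + 1,
        col = floor((ind - 1) / nr) + 1)
}

m <- matrix(1:25 * 2, nrow = 5, ncol = 5)
ind <- sub2ind_r(nrow(m), 2, 3)   # linear index of m[2, 3]
rc  <- ind2sub_r(nrow(m), 12)     # row and column of m[12]
```

Both helpers are vectorised, so ind, r and c can be whole vectors of positions.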
For higher dimension arrays, there is the arrayInd function.
> abc <- array(dim=c(10,5,5))
> arrayInd(12,dim(abc))
dim1 dim2 dim3
[1,] 2 2 1
You mostly don't need those functions in R. In Matlab you need those because you can't do e.g.
A(i, j) = x
where i and j are vectors of row and column indices and x contains the corresponding values. (see also this question)
In R you can simply write:
A[ cbind(i, j) ] <- x
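A minimal sketch of that idiom, with made-up values for i, j and x: a two-column matrix inside [ ] addresses exactly one cell per row of the index matrix.

```r
A <- matrix(0, nrow = 3, ncol = 4)
i <- c(1, 3, 2)      # row indices (illustrative values)
j <- c(2, 4, 1)      # column indices
x <- c(10, 20, 30)   # values to store at (i[k], j[k])

A[cbind(i, j)] <- x  # writes three cells, one per (row, column) pair
vals <- A[cbind(i, j)]  # the same index matrix reads them back
```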
There are row and col functions that return those indices in matrix form. So it should be as simple as indexing the return from those two functions:
M<- matrix(1:6, 2)
row(M)[5]
#[1] 1
col(M)[5]
#[1] 3
rc.ind <- function(M, ind) c(row(M)[ind], col(M)[ind] )
rc.ind(M,5)
[1] 1 3
Late answer but there's an actual function for ind2sub in the base package called arrayInd
m <- matrix(1:25, nrow = 5, ncol=5)
# linear indices in R increase row number first, then column
arrayInd(5, dim(m))
arrayInd(6, dim(m))
# so, for any arbitrary row/column
numCol <- 3
numRow <- 4
arrayInd(numRow + ((numCol-1) * nrow(m)), dim(m))
# find the row/column of the maximum element in m
arrayInd(which.max(m), dim(m))
# actually, which() has an arr.ind parameter for returning array indices
which(m == max(m), arr.ind = TRUE)
For sub2ind, JD Long's answer seems to be the best
Something like this works for arbitrary dimensions-
ind2sub = function(sz,ind)
{
ind = as.matrix(ind,ncol=1);
sz = c(1,sz);
den = 1;
sub = c();
for(i in 2:length(sz)){
den = den * sz[i-1];
num = den * sz[i];
s = floor(((ind-1) %% num)/den) + 1;
sub = cbind(sub,s);
}
return(sub);
}
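For the opposite direction, one possible sketch for arbitrary dimensions uses the cumulative products of the dimensions as strides. The name sub2ind_nd is hypothetical, and the check against arrayInd is only a spot test, not a proof.

```r
# Hypothetical helper: subscripts -> linear index for any number of dimensions.
# sz is the dim vector; sub is one subscript tuple or a matrix with one tuple per row.
sub2ind_nd <- function(sz, sub) {
  sub <- matrix(sub, ncol = length(sz))
  strides <- cumprod(c(1, sz[-length(sz)]))   # 1, d1, d1*d2, ...
  as.vector((sub - 1) %*% strides) + 1
}

sz   <- c(10, 5, 5)
ind  <- sub2ind_nd(sz, c(2, 2, 1))   # linear index of element (2, 2, 1)
back <- arrayInd(ind, sz)            # base R converts it back
```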
I have two large sparse matrices (about 41,000 x 55,000 in size). The density of nonzero elements is around 10%. They both have the same row index and column index for nonzero elements.
I now want to modify the values in the first sparse matrix if values in the second matrix are below a certain threshold.
library(Matrix)
# Generating the example matrices.
set.seed(42)
# Rows with values.
i <- sample(1:41000, 227000000, replace = TRUE)
# Columns with values.
j <- sample(1:55000, 227000000, replace = TRUE)
# Values for the first matrix.
x1 <- runif(227000000)
# Values for the second matrix.
x2 <- sample(1:3, 227000000, replace = TRUE)
# Constructing the matrices.
m1 <- sparseMatrix(i = i, j = j, x = x1)
m2 <- sparseMatrix(i = i, j = j, x = x2)
I now get the rows, columns and values from the first matrix in a new matrix. This way, I can simply subset them and only the ones I am interested in remain.
# Getting the positions and values from the matrices.
position_matrix_from_m1 <- rbind(i = m1@i, j = summary(m1)$j, x = m1@x)
position_matrix_from_m2 <- rbind(i = m2@i, j = summary(m2)$j, x = m2@x)
# Subsetting to get the elements of interest.
position_matrix_from_m1 <- position_matrix_from_m1[,position_matrix_from_m1[3,] > 0 & position_matrix_from_m1[3,] < 0.05]
# We add 1 to the values, since the sparse matrix is 0-based.
position_matrix_from_m1[1,] <- position_matrix_from_m1[1,] + 1
position_matrix_from_m1[2,] <- position_matrix_from_m1[2,] + 1
Now I am getting into trouble. Overwriting the values in the second matrix takes too long. I let it run for several hours and it did not finish.
# This takes hours.
m2[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 1
m1[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 0
I thought about pasting the row and column information together. Then I have a unique identifier for each value. This also takes too long and is probably just very bad practice.
# We would get the unique identifiers after the subsetting.
m1_identifiers <- paste0(position_matrix_from_m1[1,], "_", position_matrix_from_m1[2,])
m2_identifiers <- paste0(position_matrix_from_m2[1,], "_", position_matrix_from_m2[2,])
# Now, I could use which and get the position of the values I want to change.
# This also uses too much memory.
m2_identifiers_of_interest <- which(m2_identifiers %in% m1_identifiers)
# Then I would modify the x values in the position_matrix_from_m2 matrix and overwrite m2@x in the sparse matrix object.
Is there a fundamental error in my approach? What should I do to run this efficiently?
Is there a fundamental error in my approach?
Yes. Here it is.
# This takes hours.
m2[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 1
m1[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 0
Syntax like mat[rn, cn] (whether mat is a dense or sparse matrix) selects all rows in rn and all columns in cn, so you get a length(rn) x length(cn) matrix. Here is a small example:
A <- matrix(1:9, 3, 3)
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
rn <- 1:2
cn <- 2:3
A[rn, cn]
# [,1] [,2]
#[1,] 4 7
#[2,] 5 8
What you intend to do is to select only (rn[1], cn[1]), (rn[2], cn[2]), and so on. The correct syntax is then mat[cbind(rn, cn)]. Here is a demo:
A[cbind(rn, cn)]
#[1] 4 8
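The same difference applies to assignment, which is what matters for the question. A small sketch reusing A, rn and cn from the example: the two-argument form overwrites the whole length(rn) x length(cn) block, while the cbind form touches one cell per pair.

```r
A  <- matrix(1:9, 3, 3)
rn <- 1:2
cn <- 2:3

B <- A
B[rn, cn] <- 0L          # zeroes the whole 2 x 2 block

C <- A
C[cbind(rn, cn)] <- 0L   # zeroes only cells (1, 2) and (2, 3)

n_block <- sum(B == 0)   # cells changed by block indexing
n_pairs <- sum(C == 0)   # cells changed by matrix indexing
```

With n index pairs, the block form addresses up to n * n cells, which is consistent with the multi-hour run time reported in the question.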
So you need to fix your code to:
m2[cbind(position_matrix_from_m1[1,], position_matrix_from_m1[2,])] <- 1
m1[cbind(position_matrix_from_m1[1,], position_matrix_from_m1[2,])] <- 0
Oh wait... Based on your construction of position_matrix_from_m1, this is just
ij <- t(position_matrix_from_m1[1:2, ])
m2[ij] <- 1
m1[ij] <- 0
Now, let me explain how you can do better. You have underused summary(). It returns a 3-column data frame of (i, j, x) triplets, where both i and j are indices starting from 1. You could have worked with this output directly, as follows:
# Getting (i, j, x) triplet (stored as a data.frame) for both `m1` and `m2`
position_matrix_from_m1 <- summary(m1)
# you never seem to use `position_matrix_from_m2` so I skip it
# Subsetting to get the elements of interest.
position_matrix_from_m1 <- subset(position_matrix_from_m1, x > 0 & x < 0.05)
Now you can do:
ij <- as.matrix(position_matrix_from_m1[, 1:2])
m2[ij] <- 1
m1[ij] <- 0
Is there an even better solution? Yes! Note that nonzero elements in m1 and m2 are located in the same positions. So basically, you just need to change m2@x according to m1@x.
ind <- m1@x > 0 & m1@x < 0.05
m2@x[ind] <- 1
m1@x[ind] <- 0
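This shortcut relies on both matrices storing their entries in the same order. A small sketch, with made-up 4 x 3 data, of how that assumption could be checked before touching the x slots. It assumes both objects are column-compressed (dgCMatrix, as sparseMatrix() returns by default), so that the i and p slots describe the pattern. The final drop0() call is optional and only removes the explicit zeros that the assignment leaves behind.

```r
library(Matrix)

# Tiny illustrative matrices with an identical sparsity pattern.
i <- c(1, 3, 2, 4)
j <- c(1, 1, 2, 3)
m1 <- sparseMatrix(i = i, j = j, x = c(0.01, 0.50, 0.03, 0.90), dims = c(4, 3))
m2 <- sparseMatrix(i = i, j = j, x = c(2, 3, 1, 2), dims = c(4, 3))

# The slot-level update is only valid if row indices and column pointers agree.
same_pattern <- identical(m1@i, m2@i) && identical(m1@p, m2@p)
stopifnot(same_pattern)

ind <- m1@x > 0 & m1@x < 0.05
m2@x[ind] <- 1
m1@x[ind] <- 0
m1 <- drop0(m1)   # drop the stored zeros so m1 stays compact
```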
A complete R session
I don't have enough RAM to create your large matrix, so I reduced your problem size a little bit for testing. Everything worked smoothly.
library(Matrix)
# Generating the example matrices.
set.seed(42)
## reduce problem size to what my laptop can bear with
squeeze <- 0.1
# Rows with values.
i <- sample(1:(41000 * squeeze), 227000000 * squeeze ^ 2, replace = TRUE)
# Columns with values.
j <- sample(1:(55000 * squeeze), 227000000 * squeeze ^ 2, replace = TRUE)
# Values for the first matrix.
x1 <- runif(227000000 * squeeze ^ 2)
# Values for the second matrix.
x2 <- sample(1:3, 227000000 * squeeze ^ 2, replace = TRUE)
# Constructing the matrices.
m1 <- sparseMatrix(i = i, j = j, x = x1)
m2 <- sparseMatrix(i = i, j = j, x = x2)
## give me more usable RAM
rm(i, j, x1, x2)
##
## fix to your code
##
m1a <- m1
m2a <- m2
# Getting (i, j, x) triplet (stored as a data.frame) for both `m1` and `m2`
position_matrix_from_m1 <- summary(m1)
# Subsetting to get the elements of interest.
position_matrix_from_m1 <- subset(position_matrix_from_m1, x > 0 & x < 0.05)
ij <- as.matrix(position_matrix_from_m1[, 1:2])
m2a[ij] <- 1
m1a[ij] <- 0
##
## the best solution
##
m1b <- m1
m2b <- m2
ind <- m1@x > 0 & m1@x < 0.05
m2b@x[ind] <- 1
m1b@x[ind] <- 0
##
## they are identical
##
all.equal(m1a, m1b)
#[1] TRUE
all.equal(m2a, m2b)
#[1] TRUE
Caveat:
I know that some people may propose
m1c <- m1
m2c <- m2
logi <- m1 > 0 & m1 < 0.05
m2c[logi] <- 1
m1c[logi] <- 0
It looks completely natural in R's syntax. But trust me, it is extremely slow for large matrices.
I want to use rmultinom(), combined with a transition matrix, to generate whole number outputs that, when summed, are equal to the original values. However, I can't figure out how to do it without iterating over the matrix. Here is an example:
a = matrix(runif(16),nrow=4,ncol=4)
a = apply(a,2,FUN = function(x) x/sum(x))
b = c(5,7,5,9)
out = c(0,0,0,0) # initialize
for (i in 1:ncol(a)){
tmp = rmultinom(1,b[i],a[,i])
out = tmp + out
}
sum(out) == sum(b) ## Should eval to true
a represents a transition matrix, with each column summing to 1. b is a starting vector of integers. The loop iterates along the columns to generate a vector in out that sums to the same total as b. How can I do this without using a loop? The result would be similar to multiplying a %*% b, but that leaves me with floating point values.
You could map over the columns and then use rowSums (the result is stochastic):
library(magrittr)
set.seed(1)
a = matrix(runif(16),nrow=4,ncol=4)
a = apply(a,2,FUN = function(x) x/sum(x))
b <- c(5,7,5,9)
out <- purrr::map(1:4, ~rmultinom(1, b[.x], a[,.x])) %>%
unlist() %>%
matrix(nrow = 4) %>%
rowSums()
out
[1] 7 7 9 3
sum(out)
[1] 26
sum(b)
[1] 26
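If you would rather avoid the extra packages, a base R sketch of the same idea uses vapply. Since rmultinom() takes a single size and a single prob vector per call, this still iterates over the columns internally; it only hides the explicit for loop.

```r
set.seed(1)
a <- matrix(runif(16), nrow = 4, ncol = 4)
a <- sweep(a, 2, colSums(a), "/")   # normalise each column to sum to 1
b <- c(5, 7, 5, 9)

# One multinomial draw per column; each draw is an integer vector of length nrow(a).
draws <- vapply(seq_along(b),
                function(k) as.vector(rmultinom(1, b[k], a[, k])),
                integer(nrow(a)))
out <- rowSums(draws)   # column k of draws sums to b[k], so sum(out) == sum(b)
```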
I want to create a function which replaces a chosen row of a matrix with zeros. I try to think of the matrix as arbitrary, but for this example I have used a sample 3x3 matrix with the numbers 1-9, called a_matrix:
1 4 7
2 5 8
3 6 9
I have done:
zero_row <- function(M, n){
n <- c(0,0,0)
M*n
}
And then I have set the matrix and tried to get my desired result by using my zero_row function
mat1 <- a_matrix
zero_row(M = mat1, n = 1)
zero_row(M = mat1, n = 2)
zero_row(M = mat1, n = 3)
However, right now all I get is a matrix with only zeros, and I do understand why. But if I instead change the vector n to one of the following
n <- c(0,1,1)
n <- c(1,0,1)
n <- c(1,1,0)
I get my desired result for n=1, n=2 and n=3 separately. But what I want is a function that zeroes whichever row n I pass in, instead of me having to change the vector for every separate n. So that I get (n=2 for example)
1 4 7
0 0 0
3 6 9
And is it better to do it in another form, instead of using vectors?
Here is a way.
zero_row <- function(M, n){
stopifnot(n <= nrow(M))
M[n, ] <- 0
M
}
A <- matrix(1:9, nrow = 3)
zero_row(A, 1)
zero_row(A, 2)
zero_row(A, 3)
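For comparison, the multiplication idea from the question can also be made to work by building the 0/1 vector from n inside the function. This is only a sketch, and the name zero_row_mult is hypothetical. It relies on R recycling a vector of length nrow(M) down each column, and unlike direct assignment it turns Inf or NA entries in the zeroed row into NaN or NA rather than 0.

```r
# Hypothetical variant of zero_row() based on elementwise multiplication.
zero_row_mult <- function(M, n) {
  stopifnot(all(n <= nrow(M)))
  keep <- rep(1, nrow(M))   # 1 = keep the row
  keep[n] <- 0              # 0 = wipe the row
  M * keep                  # keep is recycled down every column
}

A <- matrix(1:9, nrow = 3)
res <- zero_row_mult(A, 2)

# The assignment-based result for the same row, built inline for comparison.
expected <- A
expected[2, ] <- 0
```

Direct assignment with M[n, ] <- 0 remains the simpler and safer choice.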
I have a 25x25 matrix with numeric values and I want to select values based on some conditions. For example, I want only the values from 0 to 0.2, and I want to put them into another matrix. How can I do this?
x<-matrix(rnorm(25*25),25,25)
which(x>0.2) # indices where x>0.2
n<-40
h<-hist(x,breaks = seq(min(x),max(x),length.out = n+1),plot = FALSE) # For multiple ranges and counts
h$breaks #n+1 break points
h$counts #n counts of numbers between those breakpoints
What you want can be done with simple logical operations, see file R-intro.pdf that comes with your distribution of R, section 2.7 Index vectors; selecting and modifying subsets of a data set.
set.seed(1356) # make the results reproducible
m <- matrix(rnorm(25*25), 25) # input matrix
i <- 0 <= m & m <= 0.2 # logical index into 'm'
# create a result matrix with the same dimensions as the input
m2 <- matrix(NA, nrow = nrow(m), ncol = ncol(m))
m2[i] <- m[i] # assign the values you want
m2
sum(i) # count of values in [0, 0.2]
sum(m < 0) # count of values less than zero
sum(m > 0.2) # count of values greater than 0.2
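If you also need to know where the selected values sit, a small sketch using which() with arr.ind = TRUE returns their row and column positions; the names pos and picked are only illustrative.

```r
set.seed(1356)
m <- matrix(rnorm(25 * 25), 25)
i <- 0 <= m & m <= 0.2

pos <- which(i, arr.ind = TRUE)        # two columns: row, col
picked <- cbind(pos, value = m[pos])   # one line per selected cell
head(picked)
```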
I am generating a matrix in R using the following:
ncolumns = 3
nrows = 10
my.mat <- matrix(runif(ncolumns*nrows), ncol=ncolumns)
This matrix holds the coordinates of points in 3D. How do I calculate the following in R?
sum of x(i)*y(i)
e.g. if the matrix is,
x y z
1 2 3
4 5 6
then output = 1*2 + 4*5
I'm trying to learn R. So any help will be really appreciated.
Thanks
You're looking for the %*% function.
ncolumns = 3
nrows = 10
my.mat <- matrix(runif(ncolumns*nrows), ncol=ncolumns)
(my.answer <- my.mat[,1] %*% my.mat[,2])
# [,1]
# [1,] 1.519
You simply do:
# x is the first column; y is the 2nd; i is a vector of the row indices you want
sum(my.mat[i, 1] * my.mat[i, 2])
Now, if you want to name your columns, you can refer to them directly
colnames(my.mat) <- c("x", "y", "z")
sum(my.mat[i, "x"] * my.mat[i, "y"])
# or if you want to get the product of each i'th element
# just leave empty the space where the i would go
sum(my.mat[ , "x"] * my.mat[ , "y"])
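To check the approaches against the 2-row example from the question (expected value 1*2 + 4*5), here is a small sketch comparing the elementwise sum with the %*% and crossprod() forms. drop() only strips the 1 x 1 matrix wrapper that the matrix products return.

```r
my.mat <- matrix(c(1, 2, 3,
                   4, 5, 6), ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("x", "y", "z")))

s1 <- sum(my.mat[, "x"] * my.mat[, "y"])               # elementwise product, then sum
s2 <- drop(my.mat[, "x"] %*% my.mat[, "y"])            # inner product
s3 <- drop(crossprod(my.mat[, "x"], my.mat[, "y"]))    # t(x) %*% y
```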
Each column is selected by the second argument in [], so
sum(my_matrix[,1] * my_matrix[,2])
is all you need.