R: Converting a data frame of row/column indices to a matrix

I would like to transform an object like this
df <- data.frame(ROW = c(1,3),COLUMN =c(2,3),VALUE = c(10,20))
df
ROW COLUMN VALUE
1 2 10
3 3 20
to a matrix like this
m <-matrix(c(0,10,0,0,0,0,0,0,20),ncol = 3,nrow = 3)
m
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 10 0 0
[3,] 0 0 20
I know that the data frame represents a sparse matrix but I did not find any other question that relates exactly to what I am looking for.

We can use sparseMatrix
library(Matrix)
as.matrix( sparseMatrix(i = df$COLUMN, j= df$ROW, x= df$VALUE))
# [,1] [,2] [,3]
#[1,] 0 0 0
#[2,] 10 0 0
#[3,] 0 0 20
Or create a matrix of 0's and then assign
m1 <- matrix(0, 3, 3)
m1[as.matrix(df[2:1])] <- df$VALUE
Note: Based on the output, the ROW/COLUMN index seems to be reversed
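To make the note about the reversed indices concrete, here is a minimal sketch that wraps the matrix-indexing approach in a helper. The name to_dense and its swap argument are hypothetical, not part of any answer above; the helper assumes the indices are positive integers and sizes the matrix from the largest index it sees.

```r
df <- data.frame(ROW = c(1, 3), COLUMN = c(2, 3), VALUE = c(10, 20))

# Hypothetical helper: fill a zero matrix from (row, column, value) triplets.
# swap = TRUE treats COLUMN as the row index, which is what the question's
# expected output implies.
to_dense <- function(d, swap = FALSE) {
  i <- if (swap) d$COLUMN else d$ROW
  j <- if (swap) d$ROW else d$COLUMN
  out <- matrix(0, nrow = max(i), ncol = max(j))
  out[cbind(i, j)] <- d$VALUE   # a two-column index matrix addresses single cells
  out
}

as_given <- to_dense(df)               # 10 lands at [1, 2]
as_asked <- to_dense(df, swap = TRUE)  # 10 lands at [2, 1], as in the question
```

If the target matrix should be larger than the largest index, the dimensions would have to be passed in explicitly instead of being derived with max().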

We can iterate over the rows of df and fill a matrix according to the row and column indexes included in df:
# initialize
new_mat <- matrix(0, ncol = max(df$ROW), nrow = max(df$COLUMN))
for (i in seq_len(nrow(df))) {
  new_mat[df[i,]$COLUMN, df[i,]$ROW] <- df[i,]$VALUE
}
# [,1] [,2] [,3]
# [1,] 0 0 0
# [2,] 10 0 0
# [3,] 0 0 20
As @akrun has noted, it seems as though the row and column indexes are flipped.

Here is a solution with apply
mat <- matrix( 0, ncol = max(df$ROW), nrow = max(df$COLUMN) )
f <- function(x) { mat[x[2], x[1]] <<- x[3] }  # x is c(ROW, COLUMN, VALUE); indices swapped to match the expected output
invisible(apply(df, 1, f))
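The <<- operator assigns the value into the mat defined outside the function.
Because apply() first coerces df to a matrix, a data frame with non-numeric columns would yield character values, so as.numeric() may be needed for type conversion.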

Related

How to extract values from a vector and create a diagonal matrix?

I have a vector of n numbers, for simplicity assume that
test <- c(1:100)
It is simple to construct a diagonal matrix for a vector with diag().
However, I want to extract every value of the vector and create a 4x4 matrix with the extracted value being in i = 1 and j = 1 (upper left hand corner) and all other values being zero.
Personally, I have no clue whatsoever how to accomplish that.
Why do I want to do that? I'm performing Input/Output analysis and want to calculate the inoperability of a sector. For that I need the sector recovery time which is in a vector of 1000 randomly generated recovery times from a pert distribution.
To be more precise:
If I have this vector from 1:100 I want to extract every value from 1:100 and create a separate matrix that looks like this (for 1 to 100):
1 0 0 0
0 0 0 0
0 0 0 0
A slightly shorter version would be
lapply(1:100, function(x) matrix(c(x, rep(0, 15)), 4))
lapply(1:100, \(z) {
  m <- matrix(0, nrow = 4, ncol = 4)
  m[1, 1] <- z
  m
})
If sparse matrices are an option, they could be more memory-efficient:
lapply(1:100, \(z) {
  Matrix::sparseMatrix(1, 1, x = z, dims = c(4, 4))
})
Another option would be to use a 3D array:
test <- 1:100
nr <- 3L
nc <- 4L
a <- array(0L, c(nr, nc, length(test)))
a[1,1,] <- test
a[,,1]
#> [,1] [,2] [,3] [,4]
#> [1,] 1 0 0 0
#> [2,] 0 0 0 0
#> [3,] 0 0 0 0
a[,,12]
#> [,1] [,2] [,3] [,4]
#> [1,] 12 0 0 0
#> [2,] 0 0 0 0
#> [3,] 0 0 0 0
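The array and the list-of-matrices answers hold the same data, so it may help to see how to move from one to the other. This is only a sketch; the name mats is made up for illustration.

```r
test <- 1:100
a <- array(0L, c(3, 4, length(test)))
a[1, 1, ] <- test

# Split the array along its third dimension into a list of 3 x 4 matrices
mats <- lapply(seq_len(dim(a)[3]), function(k) a[, , k])

mats[[12]][1, 1]  # the 12th matrix carries the 12th value of test
```

The array form keeps everything in one contiguous object, while the list form is easier to pass to functions that expect one matrix at a time.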

Vectorizing an easy loop in R

I am trying to improve my code, but there is a task where I am not able to vectorize a loop. I have 5 sparse matrices, where I have to loop through and add a 1x3-vector resulting from the non-null elements to another matrix. In this example I work with only 1 sparse matrix to keep it simple.
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 2] <- 5
m[3, 3] <- 8
n <- matrix(0, nrow=3, ncol=5)
for (i in 1:nrow(m)){
  for (j in 1:ncol(m)){
    if (!m[i,j]==0){
      n[i, j:(j+2)] <- n[i, j:(j+2)] + rep(m[i, j], 3)
    }
  }
}
The code works, but I have the feeling that there are much better solutions with vectorized functions. I tried and failed with the apply functions because the new matrix n has different dimensions than m.
Would be great to get some ideas here.
Here's a vectorized base R solution:
1. Create the sequence of indices where you want your values to be inserted. Check ?sequence to understand how this works under the hood.
2. Insert the values.
idx <- sequence(nvec = rep(3, length(m[m != 0])), from = which(m != 0), by = nrow(m))
n[idx] <- rep(m[m != 0], each = 3)
n
[,1] [,2] [,3] [,4] [,5]
[1,] 0 5 5 5 0
[2,] 0 0 0 0 0
[3,] 0 0 8 8 8
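One caveat with the indexed assignment above: it overwrites cells rather than adding to them, so it only matches the original loop when the three-column ranges do not overlap. The following sketch, which assumes R 4.0.0 or later for the from and by arguments of sequence(), sums the contributions per target cell first. The example matrix and the names nz, vals and sums are made up to force an overlap.

```r
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 1] <- 2
m[1, 2] <- 5          # ranges 1:3 and 2:4 in row 1 overlap
n <- matrix(0, nrow = 3, ncol = 5)

nz   <- which(m != 0)                                   # linear indices of non-zero cells
idx  <- sequence(nvec = rep(3, length(nz)), from = nz, by = nrow(m))
vals <- rep(m[nz], each = 3)

# Sum all contributions that land on the same cell, then add them in one step
sums   <- tapply(vals, idx, sum)
target <- as.integer(names(sums))
n[target] <- n[target] + as.vector(sums)
n[1, ]
```

With non-overlapping ranges this gives the same result as the plain assignment.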
I'm not sure this is all that much better, but you could use which() to identify the row and column indices of the non-zero values in m, then use a bit of dplyr code to expand the list of column indices by the required length. Then use those indices and the values to replace the appropriate cells in the matrix.
library(dplyr)
library(tidyr)
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 2] <- 5
m[3, 3] <- 8
n <- matrix(0, nrow=3, ncol=5)
w <- which(m != 0, arr.ind=TRUE)
w <- cbind(w, value=m[w])
w <- as_tibble(w)
w <- w %>%
  rowwise() %>%
  mutate(col = list(col:(col+2))) %>%
  unnest(col)
n[cbind(w$row, w$col)] <- w$value
n
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 5 5 5 0
#> [2,] 0 0 0 0 0
#> [3,] 0 0 8 8 8
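The same expansion of row and column indices can be sketched in base R, without dplyr or tidyr. The variable names rows and cols are made up for illustration, and like the answer above this assigns values rather than accumulating them.

```r
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 2] <- 5
m[3, 3] <- 8
n <- matrix(0, nrow = 3, ncol = 5)

w <- which(m != 0, arr.ind = TRUE)         # one (row, col) pair per non-zero cell
rows <- rep(w[, "row"], each = 3)          # each row index repeated three times
cols <- rep(w[, "col"], each = 3) + 0:2    # col, col + 1, col + 2 for each cell
n[cbind(rows, cols)] <- rep(m[w], each = 3)
n
```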
Created on 2023-02-10 by the reprex package (v2.0.1)
Nested loops can often be solved with mapply() or outer().
#Sample Data
m <- matrix(0, nrow = 3, ncol = 3)
m[1, 2] <- 5
m[3, 3] <- 8
n <- matrix(0, nrow=3, ncol=5)
#Build the function using the <<- operators, which are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned
FN <- function (i, j){
  if (m[i, j] != 0){
    n[i, j:(j+2)] <<- n[i, j:(j+2)] + rep(m[i, j], 3)
  }
}
#Outer() function to iterate, with mapply() to vectorize the if statement
outer(1:nrow(m), 1:ncol(m), FUN=function(x, y) mapply(FN, x, y))
n
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 5 5 5 0
#[2,] 0 0 0 0 0
#[3,] 0 0 8 8 8

How to write the function to create a diagonal matrix from upper right to lower left in R?

I would like to write one function whose input is a square matrix, and it returns a square matrix whose numbers from the upper right corner down to lower left corner are preserved and other numbers are zero.
For example
Suppose A is the following 4*4 matrix (sorry, I do not know how to type the matrix expression):
[1,2,3,4]
[5,6,7,8]
[9,10,11,12]
[13,14,15,16]
How can I write a function in R without any loops to transform the matrix into this?
[0,0,0,4]
[0,0,7,0]
[0,10,0,0]
[13,0,0,0]
This feels like a gymnastics exercise...
xy <- matrix(1:16, ncol = 4, byrow = TRUE)
xy <- apply(xy, MARGIN = 1, rev)
xy[lower.tri(xy)] <- 0
xy[upper.tri(xy)] <- 0
t(apply(xy, MARGIN = 1, rev))
[,1] [,2] [,3] [,4]
[1,] 0 0 0 4
[2,] 0 0 7 0
[3,] 0 10 0 0
[4,] 13 0 0 0
Here is another option.
mat <- matrix(1:16, 4, byrow = TRUE)
idx <- cbind(seq_len(nrow(mat)),
ncol(mat):1)
values <- mat[idx]
mat <- matrix(0, nrow = dim(mat)[1], ncol = dim(mat)[2])
mat[idx] <- values
mat
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 4
#[2,] 0 0 7 0
#[3,] 0 10 0 0
#[4,] 13 0 0 0
A non-apply solution using some maths to generate the indices, stealing xy from @Roman:
xy <- matrix(1:16, ncol = 4, byrow = TRUE)
ind <- nrow(xy)
xy[setdiff(1:length(xy), seq(ind, by = ind -1, length.out = ind))] <- 0
xy
# [,1] [,2] [,3] [,4]
#[1,] 0 0 0 4
#[2,] 0 0 7 0
#[3,] 0 10 0 0
#[4,] 13 0 0 0
Trying it on 5 X 5 matrix
xy <- matrix(1:25, 5, byrow = TRUE)
ind <- nrow(xy)
xy[setdiff(1:length(xy), seq(ind, by = ind -1, length.out = ind))] <- 0
xy
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0 0 0 0 5
#[2,] 0 0 0 9 0
#[3,] 0 0 13 0 0
#[4,] 0 17 0 0 0
#[5,] 21 0 0 0 0
This answer takes a slightly different approach than the other answers. Instead of trying to zero out everything except for the diagonal, we can just build the diagonal by itself:
m <- matrix(rep(0,16), nrow = 4, byrow = TRUE)
for (i in 0:15) {
  row <- floor(i / 4)
  col <- i %% 4
  if (i == 3 + (row*3)) {
    m[row+1, col+1] <- i+1
  }
}
m
[,1] [,2] [,3] [,4]
[1,] 0 0 0 4
[2,] 0 0 7 0
[3,] 0 10 0 0
[4,] 13 0 0 0
I just thought about a way to reverse the original diag function from base R.
You can see it by just typing diag in the console.
Here the highlighted change I made in my diag_reverse:
y <- x[((m - 1L):0L * (dim(x)[1L])) + (1L:m)] # m is min(dim(x))
And here's the complete function (I kept all the code except that one line):
diag_reverse <- function (x = 1, nrow, ncol, names = TRUE)
{
    if (is.matrix(x)) {
        if (nargs() > 1L && (nargs() > 2L || any(names(match.call()) %in%
            c("nrow", "ncol"))))
            stop("'nrow' or 'ncol' cannot be specified when 'x' is a matrix")
        if ((m <- min(dim(x))) == 0L)
            return(vector(typeof(x), 0L))
        y <- x[((m - 1L):0L * (dim(x)[1L])) + (1L:m)] # HERE I made the change
        if (names) {
            nms <- dimnames(x)
            if (is.list(nms) && !any(vapply(nms, is.null, NA)) &&
                identical((nm <- nms[[1L]][seq_len(m)]), nms[[2L]][seq_len(m)]))
                names(y) <- nm
        }
        return(y)
    }
    if (is.array(x) && length(dim(x)) != 1L)
        stop("'x' is an array, but not one-dimensional.")
    if (missing(x))
        n <- nrow
    else if (length(x) == 1L && nargs() == 1L) {
        n <- as.integer(x)
        x <- 1
    }
    else n <- length(x)
    if (!missing(nrow))
        n <- nrow
    if (missing(ncol))
        ncol <- n
    .Internal(diag(x, n, ncol))
}
Then we can call it:
m <- matrix(1:16,nrow=4,ncol=4,byrow = T)
diag_reverse(m)
#[1] 4 7 10 13
I'll test it on other matrices to see if it gives always the correct answer.
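As a rough check of the changed line on other sizes, the index expression can be compared against plain (row, column) indexing of the anti-diagonal. This sketch only covers square matrices; the helper name anti_idx is made up and simply isolates the modified line.

```r
# The index expression from the changed line, isolated for testing
anti_idx <- function(x) {
  m <- min(dim(x))
  ((m - 1L):0L * dim(x)[1L]) + (1L:m)
}

# Compare with explicit (row, column) pairs (1, n), (2, n - 1), ..., (n, 1)
for (n in 1:6) {
  x <- matrix(seq_len(n^2), n, byrow = TRUE)
  stopifnot(identical(x[anti_idx(x)], x[cbind(seq_len(n), n:1)]))
}

x4 <- matrix(1:16, 4, byrow = TRUE)
x4[anti_idx(x4)]
```

For non-square input the expression picks the anti-diagonal of the leading m x m block, which may or may not be what is wanted.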
The apply family are really just loops with a bow tie.
Here is a way to do it without apply. With some input checking and should work on any size matrix.
off_diag = function(X)
{
  if(!is.matrix(X)) stop('Argument is not a matrix')
  n <- nrow(X)
  if(ncol(X) != n) stop('Matrix is not square')
  if(n<2) return(X)
  Y <- X * c(0,rep(rep(c(0,1),c(n-2,1)),n),rep(0,n-1))
  return(Y)
}
It works on numeric matrices:
mat <- matrix(1:16, 4, byrow = TRUE)
off_diag(mat)
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 4
# [2,] 0 0 7 0
# [3,] 0 10 0 0
# [4,] 13 0 0 0
Edit: improvement
I realised my function will fail if there are NAs, since NA*0 is NA. It also will not work on character matrices, yet it doesn't check that the matrix is numeric. So instead I use the same setup to build a logical index vector, which lets the function handle numeric matrices, character matrices and NAs:
minor_diag = function(X)
{
  if(!is.matrix(X)) stop('Argument is not a matrix')
  n <- nrow(X)
  if(ncol(X) != n) stop('Matrix is not square')
  if(n<2) return(X)
  index = c(TRUE,rep(rep(c(TRUE,FALSE),c(n-2,1)),n),rep(TRUE,n-1))
  X[index]=0
  return(X)
}
mat <- matrix(letters[1:16], 4, byrow = TRUE)
minor_diag(mat)
## [,1] [,2] [,3] [,4]
## [1,] "0" "0" "0" "d"
## [2,] "0" "0" "g" "0"
## [3,] "0" "j" "0" "0"
## [4,] "m" "0" "0" "0"
minor_diag(matrix(NA,2,2))
## [,1] [,2]
## [1,] 0 NA
## [2,] NA 0
A one liner without loops
#setup
n <- 5
A <- matrix(1:(n^2), n)
#solution
diag(diag(A[n:1,]))[n:1,]
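Since the question asked for a function, here is one more sketch. It uses a different technique from the answers above: in an n x n matrix the cells on the upper-right to lower-left diagonal are exactly those where row + column equals n + 1, which row() and col() can test directly. The name anti_diag_only is hypothetical.

```r
anti_diag_only <- function(X) {
  stopifnot(is.matrix(X), nrow(X) == ncol(X))
  keep <- row(X) + col(X) == nrow(X) + 1   # TRUE only on the anti-diagonal
  X[!keep] <- 0                            # logical indexing, so NAs elsewhere are overwritten too
  X
}

A <- matrix(1:16, 4, byrow = TRUE)
res <- anti_diag_only(A)
res
```

Like the logical-index version above, assigning 0 into a character matrix would store the string "0".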

R function to compute deviation matrix

I've written a function to compute a matrix where each column is the corresponding input matrix column minus the column mean.
# compute the deviation matrix
deviation <- function(X) {
  one <- rep(1, nrow(X))
  n <- ncol(X)
  d <- matrix(data = NA, nrow = nrow(X), ncol = ncol(X))
  for(i in seq.int(from = 1, to = n)) {
    d[,i] <- X[,i] - mean(X[,i], na.rm = TRUE) * one
  }
  d
}
Could this function be written more idiomatically in R (using functional programming, perhaps)?
Use sweep and colMeans:
sweep(mat, 2, colMeans(mat))
By default, sweep uses - (the subtraction function), subtracting the column means calculated by colMeans from the values in each column (MARGIN = 2). This gives the same result:
mat <- matrix(1:12,nrow=3)
deviation(mat)
# [,1] [,2] [,3] [,4]
#[1,] -1 -1 -1 -1
#[2,] 0 0 0 0
#[3,] 1 1 1 1
sweep(mat, 2, colMeans(mat))
# [,1] [,2] [,3] [,4]
#[1,] -1 -1 -1 -1
#[2,] 0 0 0 0
#[3,] 1 1 1 1
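One difference worth noting: the original deviation() computes means with na.rm = TRUE, while colMeans() defaults to na.rm = FALSE. A minimal sketch of an NA-tolerant variant follows, together with scale(), which also centers columns and ignores NAs when computing the means. The small matrix here is made up for illustration.

```r
mat <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3)

# Pass na.rm = TRUE so a column with NAs still gets a usable mean
dev_sweep <- sweep(mat, 2, colMeans(mat, na.rm = TRUE))

# scale() with scale = FALSE only centers; it attaches a "scaled:center" attribute
dev_scale <- scale(mat, center = TRUE, scale = FALSE)

dev_sweep
```

The NA cell itself stays NA in both results; only the means are computed without it.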

Randomly selecting values from an existing matrix after adding a vector (in R)

Thank you so much for your help in advance!
I am trying to modify an existing matrix such that, when a new line is added to the matrix, it removes values from the preexisting matrix.
For example, I have the matrix:
[,1] [,2] [,3] [,4]
1 1 0 0
0 1 0 0
1 0 1 0
0 0 1 1
I want to add another vector, I.vec, which has two values (I.vec=c(0,1,1,0)).
This is easy enough to do. I just rbind it to the matrix.
Now, for every column where I.vec is equal to 1, I want to randomly select a value from the other rows and make it zero.
Ideally, this would end up with a matrix like:
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
0 1 1 0
But each time I run the iteration, I want it to randomly sample again.
So this is what I have tried:
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I<-rbind(mat1,I.vec)
mat.I.r<-mat.I
d1<-mat.I[,which(mat.I[5,]==1)]
mat.I.r[sample(which(d1[1:4]==1),1),which(mat.I[5,]==1)]<-0
But this only deletes one of the two values I would like to delete. I have also tried variations on subsetting the matrix, but I have not been successful.
Thank you again!
There is a little bit of ambiguity in the description from the OP, so two solutions are suggested:
Assuming that only existing 1s in relevant columns can be set to 0
I'll just alter the original function (see below). The change is to the line defining rows. I now have (there was a bug in the original; the version below is revised to handle it):
rows <- sapply(seq_along(cols),
               function(x, mat, cols) {
                 ones <- which(mat[,cols[x]] == 1L)
                 out <- if(length(ones) == 1L) {
                   ones
                 } else {
                   sample(ones, 1)
                 }
                 out
               }, mat = mat, cols = cols)
Basically, for each column in which we need to swap a 1 for a 0, this works out which rows of that column contain 1s and samples one of them.
Edit: We have to handle the case where there is only a single 1 in a column. If we just sample from a length 1 vector, R's sample() will treat it as if we wanted to sample from the set seq_len(n) not from the length 1 set n. We handle this now with an if, else statement.
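The length-1 behaviour of sample() described here can be seen directly, and it can also be sidestepped by sampling a position and indexing with it. The helper name safe_sample is made up for this sketch; it is an alternative to the if/else used in the answer, not the answer's own method.

```r
# Sample from the elements of x itself, even when x has length 1
safe_sample <- function(x, size = 1) x[sample.int(length(x), size)]

set.seed(1)
draws <- replicate(1000, sample(4, 1))     # treated as sampling from 1:4
safe  <- replicate(1000, safe_sample(4))   # always returns 4

table(draws)
unique(safe)
```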
We have to do this individually for each column so we get the correct rows. I suppose we could do some nice manipulation to avoid repeated calls to which() and sample(), but how escapes me at the moment, because we do have to handle the case where there is only one 1 in the column. Here's the finished function (updated to handle the length 1 sample bug in the original):
foo <- function(mat, vec) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  cols <- which(vec == 1L)
  rows <- sapply(seq_along(cols),
                 function(x, mat, cols) {
                   ones <- which(mat[,cols[x]] == 1L)
                   out <- if(length(ones) == 1L) {
                     ones
                   } else {
                     sample(ones, 1)
                   }
                   out
                 }, mat = mat, cols = cols)
  ind <- (nr*(cols-1)) + rows
  mat[ind] <- 0
  mat <- rbind(mat, vec)
  rownames(mat) <- NULL
  mat
}
and here it is in action:
> set.seed(2)
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
and it works when there is only one 1 in a column we want to do a swap in:
> foo(mat1, c(0,0,1,1))
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 0
[5,] 0 0 1 1
Original Answer: Assuming any value in a relevant column can be set to zero
Here is a vectorised answer, where we treat the matrix as a vector when doing the replacement. Using the example data:
mat1 <- matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1), byrow = TRUE, nrow = 4)
ivec <- c(0,1,1,0)
## Set a seed to make reproducible
set.seed(2)
## number of rows and columns of our matrix
nr <- nrow(mat1)
nc <- ncol(mat1)
## which of ivec are 1L
cols <- which(ivec == 1L)
## sample length(cols) row indices, with replacement
## so same row can be drawn more than once
rows <- sample(seq_len(nr), length(cols), replace = TRUE)
## Compute the index of each rows cols combination
## if we treated mat1 as a vector
ind <- (nr*(cols-1)) + rows
## ind should be of length length(cols)
## copy for illustration
mat2 <- mat1
## replace the indices we want with 0, note sub-setting as a vector
mat2[ind] <- 0
## bind on ivec
mat2 <- rbind(mat2, ivec)
This gives us:
> mat2
[,1] [,2] [,3] [,4]
1 0 0 0
0 1 0 0
1 0 0 0
0 0 1 1
ivec 0 1 1 0
If I were doing this more than once or twice, I'd wrap this in a function:
foo <- function(mat, vec) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  cols <- which(vec == 1L)
  rows <- sample(seq_len(nr), length(cols), replace = TRUE)
  ind <- (nr*(cols-1)) + rows
  mat[ind] <- 0
  mat <- rbind(mat, vec)
  rownames(mat) <- NULL
  mat
}
Which gives:
> foo(mat1, ivec)
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 0 1 0 0
[3,] 1 0 1 0
[4,] 0 0 0 1
[5,] 0 1 1 0
If you wanted to do this for multiple ivecs, growing mat1 each time, then you probably don't want to do that in a loop as growing objects is slow (it involves copies etc). But you could just modify the definition of ind to include the extra n rows you bind on for the n ivecs.
You could try something like this. Having 'nrow' in there will allow you to run it multiple times with other 'I.vec's. I tried to do this in a single line with 'apply' but couldn't get a matrix to come out again.
mat1<-matrix(c(1,1,0,0,0,1,0,0,1,0,1,0,0,0,1,1),byrow=T, nrow=4)
I.vec<-c(0,1,1,0)
mat.I.r<-rbind(mat1,I.vec)
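for (i in seq_len(ncol(mat.I.r))) {
  if (mat.I.r[nrow(mat.I.r), i] == 1) {
    ones <- which(mat.I.r[1:(nrow(mat.I.r) - 1), i] == 1)
    # sample a position in 'ones' so a single candidate row is not expanded to 1:row by sample()
    if (length(ones) > 0) mat.I.r[ones[sample.int(length(ones), 1)], i] <- 0
  }
}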
mat.I.r
