How can I create a sparse matrix from a list of dimension names?
Suppose you have this matrix edgelist in a data frame:
from to weight
1 4 a 1
2 5 b 2
3 6 c 3
It can be created like this:
from <- factor(c(4:6))
to <- factor(c("a", "b", "c")) # a factor, so that levels(foo$to) also works on R >= 4.0.0
weight <- c(1:3)
foo <- data.frame(from, to, weight)
A matrix can be created by first creating an empty matrix filled with 0s, naming the rows and columns, and then filling the values in:
bar <- matrix(
0,
nrow = length(unique(foo$from)),
ncol = length(unique(foo$to)),
dimnames = list(levels(foo$from), levels(foo$to))
)
bar[as.matrix(foo[,1:2])] <- foo[,3]
The result looks like this:
a b c
4 1 0 0
5 0 2 0
6 0 0 3
How can I create a sparse matrix?
Solution
An elegant way is to use the Matrix package, whose sparseMatrix() constructor takes integer row and column indices; the integer codes of the factors provide them:
library(Matrix)
bar_sparse <- sparseMatrix(
i = as.integer(factor(foo$from)),
j = as.integer(factor(foo$to)),
x = foo$weight,
dimnames = list(levels(factor(foo$from)), levels(factor(foo$to)))
)
Here we go:
a b c
4 1 . .
5 . 2 .
6 . . 3
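If the name columns are plain character or integer vectors rather than factors, the integer positions can be looked up with match() against explicit name vectors. This is only a minimal sketch, assuming the Matrix package is installed; the edge list edges and the helper names row_names and col_names are made up for illustration:

```r
library(Matrix)

# Hypothetical edge list with plain integer/character columns (no factors).
edges <- data.frame(
  from = c(4L, 5L, 6L, 4L),
  to = c("a", "b", "c", "c"),
  weight = c(1, 2, 3, 7),
  stringsAsFactors = FALSE
)

# The dimension names, in the order the rows and columns should appear.
row_names <- sort(unique(edges$from))
col_names <- sort(unique(edges$to))

# match() turns each name into its integer position in the name vector.
m <- sparseMatrix(
  i = match(edges$from, row_names),
  j = match(edges$to, col_names),
  x = edges$weight,
  dims = c(length(row_names), length(col_names)),
  dimnames = list(as.character(row_names), col_names)
)
```

Elements can then be read back by name, for example m["4", "c"].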
Thanks, Martin, for pointing me in this direction.
As maintainer of the Matrix package: using dimnames for sparseMatrix objects is allowed in construction,
and column names are even of importance, notably for sparse model matrices (in glmnet etc.).
But for efficiency reasons (and partly for lack of use cases, hence "not yet
implemented") they are not always propagated, e.g., IIRC, in matrix multiplications.
The main reason for this "semi-discouraged" support is the fact that sparse matrices are particularly important when they are very large, in the sense of nrow(.) * ncol(.) being large.
In such cases, carrying (and copying!) hundreds of thousands of row (and column) names is costly.
After all these caveats, I of course acknowledge that you've asked a perfectly valid question, and you may not have a choice for now and indeed need to work with row and column names instead of integer indices.
Yes, you are (almost) right:
Using
M <- Matrix(0, n,m, dimnames=....)
for(i in ...)
for(j in ...)
M[i,j] <- ...
is never a good idea for sparseMatrix objects (i.e. all Matrix objects inheriting from sparseMatrix).
Rather, use sparseMatrix(...., dimnames = ..), noting by the way that using the dimnames argument is more efficient than setting colnames and rownames separately afterwards.
I presume that you know you can do something as simple as:
for (i in seq_len(nrow(foo))) bar[as.character(foo[i, 1]), as.character(foo[i, 2])] <- foo[i, 3]
but if you want to get something more efficient to work with Matrix, you may need to write your own function to assign it. Something like:
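Convert the from and to columns to factors, ordered in whatever way you want
Sort foo by to, then from (column-major order, if you can't guarantee this is already true) and remove duplicates
Create an empty sparse Matrix bar (a column-compressed dgCMatrix) with the correct dimensions
Set bar@i to as.integer(foo$from) - 1L (zero-based row indices)
Set bar@p to the zero-based column pointers, i.e. 0 followed by the cumulative number of entries in each column
Set bar@x to foo$weight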
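The slot-filling steps above could be sketched as follows. This assumes the column-compressed class dgCMatrix from the Matrix package, where @i holds zero-based row indices and @p holds one zero-based pointer per column boundary (so it has ncol + 1 entries, not one per value). It builds the object in one new() call instead of assigning to an empty matrix, which is a simplification on my part, not necessarily what the answer intended:

```r
library(Matrix)

foo <- data.frame(
  from = factor(c(4, 5, 6)),
  to = factor(c("a", "b", "c")),
  weight = c(1, 2, 3)
)

# Column-compressed storage wants entries sorted by column, then by row.
foo <- foo[order(as.integer(foo$to), as.integer(foo$from)), ]

bar <- new(
  "dgCMatrix",
  # zero-based row index of every stored value
  i = as.integer(foo$from) - 1L,
  # column pointers: 0, then the running count of entries per column
  p = c(0L, cumsum(tabulate(as.integer(foo$to), nbins = nlevels(foo$to)))),
  x = as.numeric(foo$weight),
  Dim = c(nlevels(foo$from), nlevels(foo$to)),
  Dimnames = list(levels(foo$from), levels(foo$to))
)

# Raises an error if the slots are inconsistent with each other.
validObject(bar)
```

In practice sparseMatrix() does this bookkeeping for you, so filling the slots by hand is mainly useful for understanding the storage layout.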
Related
I have extracted the array indices of some elements I want to look at as follows:
mat = matrix(0,10,10)
arrInd = which(mat == 0, arr.ind = TRUE)
Then I do some more operations on this matrix and eventually end up with a vector of rows rowInd and a vector of columns colInd. I want to use these indices to insert values into another matrix, say mat2. But I can't seem to figure out a way to do this without looping or doing the modular arithmetic calculation myself. I realize I could use something like
mat2[nrow(mat2)*(colInd-1)+rowInd]
in order to transform back to the 1-d indexing. But since R usually has built-in functions to do this sort of thing, I was wondering if there is a more concise way to do this. It would just seem natural that such a handy data-manipulation function as which(,arr.ind=T) would have a handy inverse.
I also tried using mat2[rowInd,colInd], but this did not work.
Have a read of the section on indexing a matrix in R intro, which covers the use of matrix indexing. which(, arr.ind = TRUE) returns a two-column matrix suitable for direct use in matrix indexing. For example:
A <- matrix(c(1L,2L,2L,1L), 2)
iv <- which(A == 1L, arr.ind = TRUE)
# row col
#[1,] 1 1
#[2,] 2 2
A[iv]
# [1] 1 1
If you have another matrix B whose values you want to update according to iv, just do
B[iv] <- replacement
Maybe for some reason you've separated row index and column index into rowInd and colInd. In that case, just use
cbind(rowInd, colInd)
as the indexing matrix.
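As a small illustration of the cbind() approach, here is a sketch with made-up values for mat2, rowInd and colInd. It also shows the equivalent column-major linear index and base R's arrayInd(), which converts linear indices back to row/column pairs:

```r
# Made-up example data: a 4 x 3 target matrix and three (row, col) positions.
mat2 <- matrix(0, nrow = 4, ncol = 3)
rowInd <- c(1, 3, 4)
colInd <- c(2, 2, 3)

# A two-column index matrix addresses one element per row of the index.
mat2[cbind(rowInd, colInd)] <- c(10, 20, 30)

# The same elements through 1-d (column-major) indexing.
lin <- (colInd - 1) * nrow(mat2) + rowInd
vals <- mat2[lin]

# arrayInd() goes the other way: linear index -> (row, col).
back <- arrayInd(lin, dim(mat2))
```

Note that mat2[rowInd, colInd] selects the whole sub-block of those rows and columns, which is why it does not give the element-wise result.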
I am working on my first real project within R and ran into a problem. I am trying to compare 2 columns within 2 different data.frames. I tried running the code,
matrix1 <- matrix(NA_character_, nrow = 2000, ncol = 1)
for (i in 1:2000){
if(data.QW[i,1] == data.RS[i,1]){
matrix1[i,1] <- "True"
}
else{
matrix1[i,1] <- "False"
}
}
I got this error:
Error in Ops.factor(data.QW[i,1], data.RS[i,1]) :
level sets of factors are different
I think this may be because QW and RS have different row lengths. But I am trying to see where these errors might be within the different data.frames and fix them according to the source document.
I am also unsure if matrix will work for this or if I need to make it into a vector and rbind it into the matrix every time.
Any good readings on this would also be appreciated.
As mentioned in the comments, providing a reproducible example with the contents of the dataframe will be helpful.
Going by how the question topic sounds, it appears that you want to compare column 1 of data frame A against column 1 of data frame B and store the result in a logical vector. If that summary is accurate, please take a look here.
Too long for a comment.
Some observations:
Your columns, data.QW[,1] and data.RS[,1] are almost certainly factors.
The factors almost certainly have different sets of levels (it's possible that one of the factors has a subset of the levels in the other factor). When this happens, comparisons using == will not work.
If you read your data into these data.frames using something like read.csv(...), any columns containing character data were converted to factors by default (this was the default before R 4.0.0). You can change that behavior by setting stringsAsFactors=FALSE in the call to read.csv(...). This is a very common problem.
Once you've sorted out the factors/levels problem, you can avoid the loop by using, simply: data.QW[1:2000,1]==data.RS[1:2000,1]. This will create a vector of length 2000 containing all the comparisons. No loop needed. Of course this assumes that both data.frames have at least 2000 rows.
Here's an example of item 2:
x <- as.factor(rep(LETTERS[1:5],3)) # has levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3],5)) # has levels: A, B, C
y==x
# Error in Ops.factor(y, x) : level sets of factors are different
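Two ways around the error, sketched on the same x and y: compare the labels as character vectors, or rebuild both factors on a shared level set first. The names same, lv and same2 are just illustrative:

```r
x <- as.factor(rep(LETTERS[1:5], 3)) # levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3], 5)) # levels: A, B, C

# Option 1: compare the labels, ignoring the level sets entirely.
same <- as.character(x) == as.character(y)

# Option 2: give both factors the same (union) level set, then compare.
lv <- union(levels(x), levels(y))
same2 <- factor(x, levels = lv) == factor(y, levels = lv)

# Positions where the two columns disagree.
mismatches <- which(!same)
```

which(!same) is probably the most useful part for tracking down the rows that differ between two data.frames.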
The function compare below compares data.frames or matrices a and b to find row matches of a in b. It returns the first row position in b which matches (after some internal sorting required to speed things up). Rows in a which have no match in b will have a return value of 0. It should handle numeric, character and factor column types and mixtures thereof (the latter for data.frames only). Check the example below the function definition.
compare<-function(a,b){
#################################################
if(dim(a)[2]!=dim(b)[2]){
stop("\n Matrices a and b have different number of columns!")
}
if(!all(sapply(a, class)==sapply(b, class))){
stop("\n Matrices a and b have incomparable column data types!")
}
#################################################
if(is.data.frame(a)){
i <- sapply(a, is.factor)
a[i] <- lapply(a[i], as.character)
}
if(is.data.frame(b)){
i <- sapply(b, is.factor)
b[i] <- lapply(b[i], as.character)
}
len1<-dim(a)[1]
len2<-dim(b)[1]
ord1<-do.call(order,as.data.frame(a))
a<-a[ord1,]
ord2<-do.call(order,as.data.frame(b))
b<-b[ord2,]
#################################################
found<-rep(0,len1)
dims<-dim(a)[2]
do_dims<-c(1:dim(a)[2])
at<-1
for(i in 1:len1){
for(m in do_dims){
while(b[at,m]<a[i,m]){
at<-(at+1)
if(at>len2){break}
}
if(at>len2){break}
if(b[at,m]>a[i,m]){break}
if(m==dims){found[i]<-at}
}
if(at>len2){break}
}
#################################################
found<-found[order(ord1)]
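# map matches back to original row numbers of b; zeros are skipped because ord2[0] would drop them
found[found>0]<-ord2[found[found>0]]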
return(found)
}
# example data sets:
ncols<-10
nrows<-1E4
a <- matrix(sample(LETTERS,size = (ncols*nrows), replace = T), ncol = ncols, nrow = nrows)
b <- matrix(sample(LETTERS,size = (ncols*nrows), replace = T), ncol = ncols, nrow = nrows)
b <- rbind(a,b) # example of b containing a
b <- b[sample(dim(b)[1],dim(b)[1],replace = F),]
found<-compare(a,b)
a<-as.data.frame(a) # = conversion to factors
b<-as.data.frame(b) # = conversion to factors
found<-compare(a,b)
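For cross-checking results on small inputs, a much shorter (though not necessarily faster) alternative is to paste each row into a single key and use match(). This is a different technique from the sorted scan above, and the helper name row_match is made up. It assumes the separator "\r" never occurs in the data and that comparing the pasted string form of the values is acceptable:

```r
# Hypothetical helper: first matching row of b for every row of a, 0 if none.
row_match <- function(a, b) {
  # One string key per row; factors are pasted by their labels.
  key_a <- do.call(paste, c(as.data.frame(a, stringsAsFactors = FALSE), sep = "\r"))
  key_b <- do.call(paste, c(as.data.frame(b, stringsAsFactors = FALSE), sep = "\r"))
  match(key_a, key_b, nomatch = 0L)
}

a_small <- matrix(c("x", "y", "p", "q"), nrow = 2) # rows: (x, p), (y, q)
b_small <- rbind(c("y", "q"), c("z", "z"), c("x", "p"))
found_small <- row_match(a_small, b_small)
```

Like compare, this returns the first matching row position in b, in the original row order of a.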
I am a newbie to R, but avid to learn.
I have been trying endlessly to create a matrix with a variable element (in this case [2,2]). The variable element should take number 4 on the first run and 5 on the second (numbers).
This matrix would be multiplied by another matrix (N0) and produce a result matrix (resul).
So far, I have only been able to create the initial matrix with the variable element using a for loop, but I am having problems indexing the result matrix. I have tried several versions, but this is the latest. Any suggestions would be greatly appreciated. Thank you.
numbers <- c(4,5,length.out = 2)
A <- matrix(c(1,2,3,NA),nrow=2,ncol=2)
resul <- matrix(nrow=2,ncol=1)
for (i in 1:2) {
A[2,2]<- matrix(numbers[i])
N0 <- matrix(c(1,2),nrow=2,ncol=1)
resul[i,]<- A[i,i]%*%N0
}
Your code has two distinct problems. The first is that A[i,i] is a single
element (effectively 1 x 1), so you're getting an error because you're multiplying it
by a 2 x 1 matrix (N0).
You could either drop the subscript [i,i] and initialize the result to be
a two by two matrix like so:
result <- matrix(nrow=2,ncol=2)
for (i in 1:2){
A[2,2]<- matrix(numbers[i])
# a column vector
N0 <- matrix(c(1,2),
nrow=2,
ncol=1)
# note the index is on the column b/c `A%*%N0` is a column matrix
result[,i]<- A%*%N0
}
or you could drop only the second subscript, using A[i,], and keep the result as
a two by one matrix like so:
result <- matrix(nrow=2,ncol=1)
for (i in 1:2){
A[2,2]<- matrix(numbers[i])
# a column vector
N0 <- matrix(c(1,2),
nrow=2,
ncol=1)
result[i,]<- A[i,]%*%N0
}
but it's not clear from your post which (if either) answer is the correct one. Indexing is tricky :)
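If the first reading is the intended one (one full product A %*% N0 per value of the variable element), the loop can also be avoided. A minimal sketch, assuming numbers is simply c(4, 5) and that each product should become one column of the result:

```r
numbers <- c(4, 5)
N0 <- matrix(c(1, 2), nrow = 2, ncol = 1)

# One column of the result per value placed in A[2, 2].
result <- sapply(numbers, function(v) {
  A <- matrix(c(1, 2, 3, v), nrow = 2, ncol = 2)
  A %*% N0
})
```

Here sapply() collects the two 2 x 1 products into a 2 x 2 matrix, one column per entry of numbers.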
So I know that if you have:
m = matrix(1:9, 3,3)
z = as.matrix(expand.grid(1:3, 1:3))
and you do
m[z]
# you get back 1 2 3 4 5 6 7 8 9
But if you do
m[] = m[z]
# You get back a matrix..
I'm a little confused as to what this [] operator does. Why doesn't something like m[][z] or m[z][] return a matrix? And how would I get it to return a matrix without assigning it to a variable with m[]?
Thanks!
The key here is that when the argument to "[" (which is really a function) is a two-column matrix like the one you provided, the result will be a vector, where the first column specifies the row and the second column specifies the column in the operated-upon matrix. This is a "feature" (and a very handy one, I might add) of the language.
The argument might or might not contain all of the possible combinations of row and column, so the result could not predictably be something that would sensibly be a matrix of the same dimensions. The form m[] <- m[ z[1:4, ] ] will produce a result but also a warning. You should look at the result and then make an effort to understand what is happening.
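To make the difference concrete, here is a short sketch using the m and z from the question. The names v, m2 and m3 are only for illustration:

```r
m <- matrix(1:9, 3, 3)
z <- as.matrix(expand.grid(1:3, 1:3))

# Matrix indexing always yields a plain vector, one value per row of z.
v <- m[z]

# m2[] <- v replaces the contents but keeps the existing dim attribute.
m2 <- m
m2[] <- v

# Without assignment, rebuild the shape explicitly.
m3 <- matrix(m[z], nrow = nrow(m))
```

So m[] on the left-hand side of an assignment means "all elements, keeping the attributes", while on its own it just returns m unchanged; wrapping the extracted vector in matrix() is one way to get a matrix back without assigning into m.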