How to create a factor from a binary indicator matrix? - r

Say I have the following matrix mat, which is a binary indicator matrix for the levels A, B, and C for a set of 5 observations:
mat <- matrix(c(1,0,0,
1,0,0,
0,1,0,
0,1,0,
0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
> mat
A B C
[1,] 1 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 0 1
I want to convert that into a single factor such that the output is equivalent to fac, defined as:
> fac <- factor(rep(LETTERS[1:3], times = c(2,2,1)))
> fac
[1] A A B B C
Levels: A B C
Extra points if you get the labels from the colnames of mat, but a set of numeric codes (e.g. c(1,1,2,2,3)) would also be acceptable as desired output.

Elegant solution with matrix multiplication (and shortest up to now):
as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
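To make the one-liner easier to follow, here is a minimal sketch of the same idea split into steps. The names codes and fac are only illustrative, and the sketch assumes every row contains exactly one 1.

```r
# Indicator matrix from the question
mat <- matrix(c(1, 0, 0,
                1, 0, 0,
                0, 1, 0,
                0, 1, 0,
                0, 0, 1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# Each row has a single 1, so multiplying by 1:ncol(mat) yields the column index of that 1
codes <- drop(mat %*% seq_len(ncol(mat)))

# Use the indices to pick the labels; levels are fixed to the column names
fac <- factor(colnames(mat)[codes], levels = colnames(mat))
fac
```

The intermediate codes vector is already the numeric coding mentioned in the question, so the last step can be skipped if labels are not needed.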

This solution makes use of the arr.ind=TRUE argument of which, returning the matching positions as array locations. These are then used to index the colnames:
> factor(colnames(mat)[which(mat==1, arr.ind=TRUE)[, 2]])
[1] A A B B C
Levels: A B C
Decomposing into steps:
> which(mat==1, arr.ind=TRUE)
row col
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 3
Use the values of the second column, i.e. which(...)[, 2] and index colnames:
> colnames(mat)[c(1, 1, 2, 2, 3)]
[1] "A" "A" "B" "B" "C"
And then convert the result to a factor with factor().
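One caveat worth checking: which() scans the matrix column by column, so the positions come back grouped by column rather than in row order. That matches the observation order only when the rows are already grouped by level, as in the question. Approaches that read the matrix column-wise, such as as.logical(mat) or colSums, share this behaviour. A minimal sketch of a row-ordered variant, using a reshuffled matrix that is assumed here purely for illustration:

```r
# Indicator rows in mixed order: B, A, C, B (assumed example data)
mat <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 0, 1,
                0, 1, 0), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

pos <- which(mat == 1, arr.ind = TRUE)           # ordered by column, not by row
pos <- pos[order(pos[, "row"]), , drop = FALSE]  # restore the original row order
fac <- factor(colnames(mat)[pos[, "col"]], levels = colnames(mat))
fac
```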

One way is to replicate the names out by row number and index directly with the matrix, then wrap that with factor to restore the levels:
factor(rep(colnames(mat), each = nrow(mat))[as.logical(mat)])
[1] A A B B C
Levels: A B C
If this is from model.matrix, the colnames have fac prepended, and so this should work the same but removing the extra text:
factor(gsub("^fac", "", rep(colnames(mat), each = nrow(mat))[as.logical(mat)]))
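For the model.matrix case, a small round trip can be sketched as follows. It assumes the matrix was built with ~ fac - 1 (no intercept), so there is one indicator column per level; the names mm, labels and rebuilt are illustrative only. It uses the matrix-multiplication trick rather than as.logical so that row order does not matter.

```r
fac <- factor(c("A", "A", "B", "B", "C"))

# One indicator column per level; the columns are named facA, facB, facC
mm <- model.matrix(~ fac - 1)

# Strip the variable name that model.matrix prepends
labels <- sub("^fac", "", colnames(mm))

# Column index of the 1 in each row, then back to a factor
codes <- drop(mm %*% seq_len(ncol(mm)))
rebuilt <- factor(labels[codes], levels = labels)
rebuilt
```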

You could use something like this:
lvls <- apply(mat, 1, function(currow) match(1, currow))
fac <- factor(lvls, levels = seq_len(ncol(mat)), labels = colnames(mat))

Here is another one
factor(rep(colnames(mat), colSums(mat)))
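Another option, not taken from the answers above, is base R's max.col, which returns the column holding the largest value in each row. For a 0/1 indicator matrix that is the column of the 1. This is only a sketch: ties.method = "first" is set so that a row without any 1 deterministically maps to the first column instead of a random one, which may or may not be what you want.

```r
# Example data assumed for illustration; rows are deliberately not sorted by level
mat <- matrix(c(0, 0, 1,
                1, 0, 0,
                0, 1, 0), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# Column index of the row-wise maximum, i.e. of the single 1 in each row
idx <- max.col(mat, ties.method = "first")
fac <- factor(colnames(mat)[idx], levels = colnames(mat))
fac
```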

Related

How to iterate through nxn matrix and store the x co-ord, y co-ord as well as value into an nx3 matrix

I have written a script that iterates through a matrix and returns the x co-ordinate and y co-ordinate of every non-NA value in the matrix. How do I extend this code to add another column holding the value of each element alongside its co-ordinates?
matrixop = function(m2){
zzz <- NULL
for (i in 1:ncol(m2)){
for (j in 1:nrow(m2)) {
if ((is.na(m2[i,j])) == FALSE ){
}
zzz <- rbind(zzz,c(i,j))
}
}
zzz
}
result = lapply(m1, FUN = matrixop) #m1 being existing nxn matrix
The actual result was an n x 2 matrix with the x co-ordinates in the first column and the y co-ordinates in the second. I am trying to get a third column with the value attached to those co-ordinates.
Take advantage of the arr.ind argument of which, and cbind the result with the values of the matrix treated as a vector. The missing values are then removed with complete.cases.
mat2coord <- function(x){
d <- which(x == x | is.na(x), arr.ind = TRUE)
d <- cbind(d, value = c(x))
d[complete.cases(d), ]
}
m <- matrix(1:6, nrow = 3)
mat2coord(m)
# row col value
#[1,] 1 1 1
#[2,] 2 1 2
#[3,] 3 1 3
#[4,] 1 2 4
#[5,] 2 2 5
#[6,] 3 2 6
set.seed(1234)
is.na(m) <- sample(6, 2)
mat2coord(m)
# row col value
#[1,] 1 1 1
#[2,] 3 1 3
#[3,] 2 2 5
#[4,] 3 2 6
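A slightly shorter variant of the same idea, sketched here under the assumption that only non-NA cells are wanted: ask which directly for the non-missing positions, then use that two-column index matrix to pull the values. The function name mat2coord_simple and the test matrix are illustrative.

```r
mat2coord_simple <- function(x) {
  # Row/column positions of every non-missing cell
  idx <- which(!is.na(x), arr.ind = TRUE)
  # A two-column index matrix extracts one value per (row, col) pair
  cbind(idx, value = x[idx])
}

m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 3)
res <- mat2coord_simple(m)
res
```

This avoids building rows for the NA cells only to drop them again with complete.cases.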

What does rbind.fill.matrix really do?

I have this code and can't understand how rbind.fill.matrix is used.
dtmat is a matrix with the documents on rows and words on columns.
word <- do.call(rbind.fill.matrix,lapply(1:ncol(dtmat), function(i) {
t(rep(1:length(dtmat[,i]), dtmat[,i]))
}))
I read the description of the function, which says that it binds matrices and fills missing columns with NA, but I cannot work out which matrices are being bound here.
From what I understand, the function fills the columns that a matrix is missing with NA.
Say I have two matrices: A with two columns, col1 and col2, and B with three columns, col1, col2 and colA. I want to stack them, but rbind requires the matrices to have the same number of columns. rbind.fill.matrix stacks them anyway and fills with NA the entries of any column that a given matrix does not have. The code below shows this more clearly.
a <- matrix(c(1,1,2,2), nrow = 2, byrow = T)
> a
[,1] [,2]
[1,] 1 1
[2,] 2 2
>
> b <- matrix(c(1,1,1,2,2,2,3,3,3), nrow = 3, byrow = T)
> b
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
>
> library(plyr)
> r <- rbind.fill.matrix(a,b)
> r
1 2 3
[1,] 1 1 NA
[2,] 2 2 NA
[3,] 1 1 1
[4,] 2 2 2
[5,] 3 3 3
The documentation also describes how column names are handled, which I think you can also work out from the example.
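As for which matrices are bound in the code from the question: lapply produces one single-row matrix per word (per column of dtmat), holding each document index repeated as many times as the word occurs in that document. These rows have different lengths, so they are padded with NA when stacked. The sketch below reproduces that on a toy matrix in base R. The dtmat values are made up, and pad_rbind is a hypothetical stand-in that only mimics the padding for unnamed single-row matrices; it is not plyr's implementation.

```r
# Toy document-term matrix: 3 documents (rows) x 2 words (columns) of counts
dtmat <- matrix(c(2, 0, 1,
                  1, 3, 0), nrow = 3,
                dimnames = list(NULL, c("apple", "pear")))

# One 1-row matrix per word: each document index repeated by its count
rows <- lapply(seq_len(ncol(dtmat)), function(i) {
  t(rep(seq_len(nrow(dtmat)), dtmat[, i]))
})

# Hypothetical helper: stack 1-row matrices, padding short ones with NA
pad_rbind <- function(mats) {
  width <- max(vapply(mats, ncol, integer(1)))
  out <- matrix(NA_integer_, nrow = length(mats), ncol = width)
  for (k in seq_along(mats)) {
    out[k, seq_len(ncol(mats[[k]]))] <- mats[[k]]
  }
  out
}

word <- pad_rbind(rows)
word
```

Each row of word then corresponds to one word, and the non-NA entries list the documents in which it occurs, once per occurrence.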

Write a value for maximum/minimum between two values

I have a two-column matrix and I want to produce a new matrix/data.frame where each column holds 1 if its value is the row maximum and 0 otherwise (the two values are never equal). This is my attempt:
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
>testM
V1 V2
1 1 2
2 3 1
3 1 5
apply(data.frame(testM), 1, function(row) ifelse(max(row[1],row[2]),1,0))
I expect to have:
0 1
1 0
0 1
because of the 0,1 parameters in max() function, but I just get
[1] 1 1 1
Any ideas?
Or using pmax
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
--(testM==pmax(testM[,1],testM[,2]))
V1 V2
[1,] 0 1
[2,] 1 0
[3,] 0 1
You can perform arithmetic on Booleans in R! Just check whether each element in a row is equal to the row's max value and multiply by 1.
t(apply(testM, 1, function(row) 1*(row == max(row))))
You can use max.col and col to produce a logical matrix:
res <- col(testM) == max.col(testM)
res
[,1] [,2]
[1,] FALSE TRUE
[2,] TRUE FALSE
[3,] FALSE TRUE
If you want it as 0/1, you can do:
res <- as.integer(col(testM) == max.col(testM)) # this removes the dimension
dim(res) <- dim(testM) # puts the dimension back
res
[,1] [,2]
[1,] 0 1
[2,] 1 0
[3,] 0 1
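As a side note on why the original attempt returns 1 1 1, here is a minimal sketch. max(row[1], row[2]) returns the larger value itself (2, 3, 5), and ifelse treats any non-zero number as TRUE, so the result is always 1. Comparing every entry with its row maximum gives the indicator instead. The names attempt, row_max and indicator are illustrative.

```r
testM <- matrix(c(1, 2, 3, 1, 1, 5), ncol = 2, byrow = TRUE)

# The attempt from the question: the test is a non-zero number, hence always TRUE
attempt <- apply(testM, 1, function(row) ifelse(max(row[1], row[2]), 1, 0))

# Row maxima, one per row
row_max <- apply(testM, 1, max)

# row_max is recycled down each column, so every entry is compared with its own row's maximum
indicator <- (testM == row_max) * 1L
indicator
```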

NA won't omit in R when 0 is between 1's

I am supposed to take a square matrix which represents a graph (the vertices-and-edges kind) and change it into a list that represents the same graph.
square matrix: element (i,j) = 1 means there is an edge i -> j
list: element i is a vector (possibly empty, coded as NA) of all j s.t. there is an edge i -> j
My problem is that if there is a zero in the middle of a row, the function returns an NA, and it is only supposed to do that when a vector is empty (no edges). It only happens when a zero is between two 1's. I don't know why, and na.omit doesn't work.
This is my first time programming in R.
squaretolist <- function(m){
ml <- list() #creates an empty list that we will return at the end
for(i in 1:ncol(m)){ #loop through columns
b1 <- c()
for(j in 1:nrow(m)){ #loop through rows
ifelse(m[i,j] %in% 1, b1[j] <- j, next)
}
ifelse(length(b1) == 0, ml[[i]]<- NA, ml[[i]] <- b1 )
}
return(ml)
}
In your function, if you have a zero in between two 1s, for example 1 in the 1st position and in the 3rd position, you're assigning b1[1] to 1, b1[3] to 3 but, as you have a 0 in the 2nd position, you're not assigning b1[2] to anything so it becomes NA.
To avoid that, you can replace ifelse(m[i,j] %in% 1, b1[j] <- j, next)
by ifelse(m[i,j] %in% 1, b1 <- c(b1,j), next).
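Put together, the corrected function could look like the sketch below. It swaps ifelse/next for a plain if, which is a stylistic choice of this sketch rather than part of the fix, and it loops over rows in the outer loop to match the m[i, j] indexing. The small test matrix is made up for illustration.

```r
squaretolist <- function(m) {
  ml <- vector("list", nrow(m))          # one list element per vertex
  for (i in seq_len(nrow(m))) {          # row i holds the edges leaving vertex i
    b1 <- c()
    for (j in seq_len(ncol(m))) {
      # Append j instead of assigning b1[j], so no NA gaps are created
      if (m[i, j] %in% 1) b1 <- c(b1, j)
    }
    ml[[i]] <- if (length(b1) == 0) NA else b1
  }
  ml
}

m <- matrix(c(1, 0, 1,
              0, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)
res <- squaretolist(m)
res
```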
You can also get what you want with the use of grep and apply functions :
ml <- apply(m, 1, function(i) {if(any(i==1)) grep(1, i) else NA})
This instruction tells R to apply, for each row of the matrix m, a function that returns, if there is at least one 1, the position of the 1(s), else NA.
Example:
set.seed(123)
m<-matrix(sample(c(0,1),25,replace=T),nrow=5)
m[4,]<-rep(0,5)
# > m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0 1 1 1
# [2,] 1 1 0 0 1
# [3,] 0 1 1 0 1
# [4,] 0 0 0 0 0
# [5,] 1 0 0 1 1
ml<-apply(m,1,function(i){if(any(i==1)) grep(1,i) else NA})
# > ml
# [[1]]
# [1] 3 4 5
# [[2]]
# [1] 1 2 5
# [[3]]
# [1] 2 3 5
# [[4]]
# [1] NA
# [[5]]
# [1] 1 4 5
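One thing to keep in mind with the apply version: apply simplifies its result, so when every row has the same number of 1s it returns a vector or matrix rather than a list. A sketch that always returns a list uses lapply over the row indices; which(... == 1) is used in place of grep here as a design choice of this sketch. The name adj_list is illustrative.

```r
adj_list <- function(m) {
  lapply(seq_len(nrow(m)), function(i) {
    j <- which(m[i, ] == 1)           # positions of the 1s in row i
    if (length(j) == 0) NA else j     # empty neighbourhood is coded as NA
  })
}

# Identity matrix: every row has exactly one 1, the case where apply would simplify
m <- diag(2)
res <- adj_list(m)
res
```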

Named rows and cols for matrices in R

Is it possible to have named rows and columns in matrices?
For example:
[,a] [,b]
[a,] 1 , 2
[b,] 3 , 4
Is it even reasonable to have such a thing for exploring the data?
Sure. Use dimnames:
> a <- matrix(1:4, nrow = 2)
> a
[,1] [,2]
[1,] 1 3
[2,] 2 4
> dimnames(a) <- list(c("A", "B"), c("AA", "BB"))
> a
AA BB
A 1 3
B 2 4
With dimnames, you can provide a list of (first) rownames and (second) colnames for your matrix. Alternatively, you can specify rownames(x) <- whatever and colnames(x) <- whatever.
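A short sketch of the rownames/colnames route, and of what the names buy you when exploring data, namely indexing by name instead of by position:

```r
a <- matrix(1:4, nrow = 2)
rownames(a) <- c("A", "B")
colnames(a) <- c("AA", "BB")

# Index a single cell by row and column name
cell <- a["B", "AA"]

# Extract a whole column by name; the row names carry over to the result
col_bb <- a[, "BB"]
col_bb
```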
