I have the following matrix 'x'
a b
a 1 3
b 2 4
It is a very large matrix (trimmed down for this question).
I would like to print every row-name/column-name combination along with the value in that cell, so the expected output would be:
a,a,1
a,b,3
b,a,2
b,b,4
I could loop through them, but I'm pretty sure this can be avoided (apply?). Any pointers would be much appreciated.
One way is to use the melt function from the reshape2 package.
x <- matrix(1:4, nrow = 2, ncol = 2,
dimnames = list(dim1 = c("a", "b"), dim2 = c("a", "b")))
library(reshape2)
melt(x)
# dim1 dim2 value
# 1 a a 1
# 2 b a 2
# 3 a b 3
# 4 b b 4
Edit
If your data is so big that speed is an issue, I would also suggest:
data.frame(dim1 = rep(rownames(x), ncol(x)),
dim2 = rep(colnames(x), each = nrow(x)),
value = c(x))
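The data frame above lists cells in column-major order, because R stores a matrix column by column and c(x) follows that order; rep(..., each = nrow(x)) lines the row names up with it. The question's expected output goes row by row instead, so a minimal base R sketch (transposing first, then printing comma-separated lines) could look like this:

```r
x <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(dim1 = c("a", "b"), dim2 = c("a", "b")))
# t(x) stores the original rows one after another, so c(t(x)) walks x row by row
df <- data.frame(dim1 = rep(rownames(x), each = ncol(x)),
                 dim2 = rep(colnames(x), times = nrow(x)),
                 value = c(t(x)))
# one "row,col,value" line per cell, in the order shown in the question
out <- paste(df$dim1, df$dim2, df$value, sep = ",")
cat(out, sep = "\n")
# a,a,1
# a,b,3
# b,a,2
# b,b,4
```

For a very large matrix, writing df with write.csv() is likely preferable to cat().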
Edit2
After testing with relatively big data, I would not rule out melt:
x <- matrix(runif(9e6), nrow = 3000, ncol = 3000,
dimnames = list(dim1 = paste0("x", runif(3000)),
dim2 = paste0("y", runif(3000))))
system.time(y1 <- melt(x))
# user system elapsed
# 1.17 0.44 1.61
system.time(y2 <- data.frame(dim1 = rep(rownames(x), ncol(x)),
dim2 = rep(colnames(x), each = nrow(x)),
value = c(x)))
# user system elapsed
# 1.98 0.37 2.36
You can also use the base R functions row() and col().
If you want the row names and column names rather than the indices, use as.factor = TRUE. Applying as.character() and as.numeric() also flattens the matrix into a vector.
do.call(data.frame, list(lapply(list(row = row(x, TRUE), col = col(x, TRUE)), as.character),
value = as.numeric(x)))
## row col value
## 1 a a 1
## 2 b a 2
## 3 a b 3
## 4 b b 4
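To make the as.factor = TRUE step concrete, here is a small check on the question's 2 x 2 example (a sketch that only confirms what row() and col() return before the values are flattened):

```r
x <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(dim1 = c("a", "b"), dim2 = c("a", "b")))
# plain row()/col() return integer indices laid out like x
row(x)
# with as.factor = TRUE the indices are labelled with the row/column names;
# as.character() then flattens them in column-major order
row_labels <- as.character(row(x, as.factor = TRUE))
col_labels <- as.character(col(x, as.factor = TRUE))
row_labels  # "a" "b" "a" "b"
col_labels  # "a" "a" "b" "b"
```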
If you want a matrix, all the columns need to be of the same class (character or numeric). You could then use:
do.call(cbind, lapply(list(row = row(x), col = col(x), value = x), as.numeric))
## row col value
## [1,] 1 1 1
## [2,] 2 1 2
## [3,] 1 2 3
## [4,] 2 2 4
Or as character:
do.call(cbind, lapply(list(row = row(x, TRUE), col = col(x, TRUE), value = x), as.character))
## row col value
## [1,] "a" "a" "1"
## [2,] "b" "a" "2"
## [3,] "a" "b" "3"
## [4,] "b" "b" "4"
I have data like the following:
class.df <- data.frame(
A = sample(1:2, 100, replace=TRUE),
B = sample(1:2, 100, replace=TRUE),
C = sample(1:2, 100, replace=TRUE),
D = sample(1:2, 100, replace=TRUE)
)
ids_df <- t(combn(names(class.df), 2))
fisher_tests <- apply(ids_df, 1, function(i) tryCatch(fisher.test(table(class.df[,i])), error = function(e) NA_real_))
edge_table <- cbind(ids_df, t(sapply(fisher_tests, "[", c("p.value", "estimate"))))
edge_table
p.value estimate
[1,] "A" "B" 0.6826874 0.7919741
[2,] "A" "C" 0.6873498 1.219358
[3,] "A" "D" 0.5473441 0.7356341
[4,] "B" "C" 0.6828843 0.8164863
[5,] "B" "D" 1 1.033625
[6,] "C" "D" 0.2257244 0.5789776
write.csv(edge_table,"/Users/Results/EE2.csv")
But when I write it with write.csv and open the CSV file, the last column (estimate) shows weird matrix-like values instead of numbers. I just need the numerical values as they appear in R. How can I resolve this?
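Your edge_table object is a matrix whose cells are list elements (a list-matrix) because of the way you created it: sapply() simplifies the list of sub-lists into a matrix of mode list, and cbind() keeps it that way. If you want a standard data frame, you can try: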
edge_table <- as.data.frame(edge_table)
edge_table[] <- lapply(edge_table, unlist)
write.csv(edge_table, 'test.csv', row.names = FALSE)
read.csv('test.csv')
#> V1 V2 p.value estimate
#> 1 A B 0.2372953 0.6160489
#> 2 A C 0.3164033 1.6070638
#> 3 A D 0.3238934 1.5017413
#> 4 B C 0.8411522 1.1562830
#> 5 B D 0.6882391 0.8017990
#> 6 C D 0.5514280 1.2965724
Created on 2022-12-06 with reprex v2.0.2
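Alternatively (a sketch, not part of the original answer), the list-matrix can be avoided at the source by having the per-pair function return a named numeric vector, so that apply() simplifies to an ordinary numeric matrix. The seed and the temporary output file are assumptions made only to keep the example self-contained:

```r
set.seed(42)  # hypothetical seed, only for reproducibility
class.df <- data.frame(
  A = sample(1:2, 100, replace = TRUE),
  B = sample(1:2, 100, replace = TRUE),
  C = sample(1:2, 100, replace = TRUE),
  D = sample(1:2, 100, replace = TRUE)
)
ids_df <- t(combn(names(class.df), 2))

# each call returns c(p.value, estimate) as plain numbers, so apply() builds a numeric matrix
stats <- t(apply(ids_df, 1, function(i) {
  tryCatch({
    ft <- fisher.test(table(class.df[, i]))
    # unname() stops c() from producing a name like "estimate.odds ratio"
    c(p.value = ft$p.value, estimate = unname(ft$estimate))
  }, error = function(e) c(p.value = NA_real_, estimate = NA_real_))
}))

edge_table <- data.frame(V1 = ids_df[, 1], V2 = ids_df[, 2], stats)
out_file <- tempfile(fileext = ".csv")  # stand-in for the real output path
write.csv(edge_table, out_file, row.names = FALSE)
back <- read.csv(out_file)
str(back)
```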
I would like to process the following matrix with which() and return all the row numbers, for example:
m = matrix(seq(-1,1, 0.5), nrow = 3)
# [,1] [,2]
#[1,] -1.0 0.5
#[2,] -0.5 1.0
#[3,] 0.0 -1.0
which(m==0.5,arr.ind=TRUE)
# row col
# [1,] 1 2
How can I get a matrix like the one below? Every row number should appear in the table, with NA in col for rows that have no match.
# row col
# [1,] 1 2
# [2,] 2 NA
# [3,] 3 NA
Here is a method that first converts the matrix into a data frame, then uses tidyr::complete() to "expand" it to every row number of m, and finally converts it back to a matrix.
library(tidyverse)
as.data.frame(which(m==0.5,arr.ind=TRUE)) %>%
complete(row = 1:nrow(m)) %>%
as.matrix()
row col
[1,] 1 2
[2,] 2 NA
[3,] 3 NA
Using only base R, you could build a function. Please find below a reprex:
Reprex
Your data
m = matrix(seq(-1,1, 0.5), nrow = 3)
Code of the function
indexfunct <- function(x, value){
  # find the matching (row, col) positions once
  idx <- which(x == value, arr.ind = TRUE)
  # one output row per row of x; the result always has exactly 2 columns (row, col)
  res <- matrix(NA, nrow = nrow(x), ncol = 2, dimnames = list(seq_len(nrow(x)), c("row", "col")))
  res[, 1] <- seq_len(nrow(x))
  res[idx[, 1], 2] <- idx[, 2]
  return(res)
}
Output of the function
indexfunct(m, value = 0.5)
#> row col
#> 1 1 2
#> 2 2 NA
#> 3 3 NA
Created on 2022-03-06 by the reprex package (v2.0.1)
Another possible base R solution:
m = matrix(seq(-1,1, 0.5), nrow = 3)
rbind(which(m == 0.5, arr.ind = TRUE),
cbind(setdiff(1:nrow(m), which(m == 0.5, arr.ind = TRUE)[,"row"]), NA))
#> row col
#> [1,] 1 2
#> [2,] 2 NA
#> [3,] 3 NA
In case a function is needed:
getindexes <- function(mat, value)
{
rbind(which(mat == value, arr.ind = TRUE),
cbind(setdiff(1:nrow(mat), which(mat == value,arr.ind = TRUE)[,"row"]), NA))
}
getindexes(m, 0.5)
#> row col
#> [1,] 1 2
#> [2,] 2 NA
#> [3,] 3 NA
We can also try this base R option without which():
> v <- rowSums(col(m) * (m == 0.5))
> cbind(row = 1:nrow(m), col = v * NA^(!v))
row col
[1,] 1 2
[2,] 2 NA
[3,] 3 NA
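For comparison, a base R analogue of the complete() idea is a left join with merge() (a sketch, not taken from the answers above). Unlike the rowSums() version, it keeps one output row per match, so a row containing the value twice would simply appear twice. The m below spells out the six values that the question's seq()-based call produces after recycling, which avoids the recycling warning:

```r
m <- matrix(c(-1, -0.5, 0, 0.5, 1, -1), nrow = 3)  # same values as in the question
hits <- as.data.frame(which(m == 0.5, arr.ind = TRUE))
# left join: every row number of m is kept; rows without a hit get col = NA
res <- as.matrix(merge(data.frame(row = seq_len(nrow(m))), hits, all.x = TRUE))
res
```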
I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in% operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in% because the row vector c("A",3) is not a row vector in data.set. Using the %in% operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find any (even if they exist), I tried to write my own function. It works for an input matrix of row vectors, but I'm wondering whether anyone can suggest improvements:
is.tuple.in <- function(matrix1, matrix2){
# Apply rbind() so that matrix1 has columns even if it is a row vector.
matrix1 = rbind(matrix1)
if(ncol(matrix1) != ncol(matrix2)){
stop("Matrices must have the same number of columns.") }
# Now check for the first row and handle other rows recursively
row.vec = matrix1[1,]
tuple.found = FALSE
for(i in 1:nrow(matrix2)){
# If we find a match, then this row exists in matrix 2 and we can break the loop
if(all(row.vec == matrix2[i,])){
tuple.found = TRUE
break
}
}
# If there are more rows to be checked, use a recursive call
if(nrow(matrix1) > 1){
return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1),],matrix2)))
} else {
return(tuple.found)
}
}
I see a couple of problems with it that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage this because I pass matrix1[2:nrow(matrix1),] in the recursive call, which produces an error if matrix1 has one row. So instead of reaching a case where matrix1 is empty, I have an if condition at the end deciding whether more iterations are necessary.
Second, I think the use of rbind() at the start is sloppy, but I needed it for when matrix1 had been reduced to a single row. Without using rbind(), ncol(matrix1) produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.
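Both problems described above come from R's default of dropping dimensions when a subset has a single row. A small sketch (m1 is a hypothetical example matrix) showing that drop = FALSE keeps the result a matrix, so rbind() is not needed, and that removing the first row of a one-row matrix then yields a 0-row matrix. That would allow a base case such as if (nrow(matrix1) == 0) return(logical(0)) at the top of the function:

```r
m1 <- matrix(1:6, ncol = 2)  # hypothetical 3 x 2 matrix

dim(m1[3, ])                 # NULL: a single row is simplified to a plain vector
dim(m1[3, , drop = FALSE])   # 1 2: still a matrix, so ncol() keeps working

# with drop = FALSE, dropping the first row of a one-row matrix gives 0 rows
last <- m1[3, , drop = FALSE]
rest <- last[-1, , drop = FALSE]
nrow(rest)                   # 0
```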
I'm wondering if you have made this a bit more complicated than it is. For example,
set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# gives me this
# [,1] [,2]
# 6 3 1
# 38 3 1
# 39 3 1
# 85 1 3
# 88 1 3
# 89 1 3
# 95 3 1
# 113 1 3
# ...
You could subset this further if you care about the order,
or you could modify the function slightly:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '_') %in% paste(mat[x, ], collapse = '_'))), ]
# [,1] [,2]
# 85 1 3
# 88 1 3
# 89 1 3
# 113 1 3
# 133 1 3
# 139 1 3
# 187 1 3
# ...
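A caveat for building keys with paste(): without a separator, different tuples can collapse to the same string, so a separator that cannot occur in the values is safer. For example:

```r
a <- c(1, 23)
b <- c(12, 3)
paste(a, collapse = "") == paste(b, collapse = "")    # TRUE: both become "123"
paste(a, collapse = "_") == paste(b, collapse = "_")  # FALSE: "1_23" vs "12_3"
```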
Another example with a longer vector:
set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# [,1] [,2] [,3] [,4]
# 57 2 5 1 4
# 147 1 5 2 4
# 279 1 2 5 4
# 303 1 5 2 4
# 437 1 5 4 2
# 443 1 4 5 2
# 580 5 4 2 1
# ...
With the order taken into account, I see a few that match:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '_') %in% paste(mat[x, ], collapse = '_'))), ]
# [,1] [,2] [,3] [,4]
# 443 1 4 5 2
# 901 1 4 5 2
# 1047 1 4 5 2
but only three
For your single-row case:
vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '_') %in% paste(mat[x, ], collapse = '_'))), ]
# [1] 1 4 5 2
Here is a simple function wrapping the code above:
is.tuplein <- function(vec, mat, exact = TRUE) {
rownames(mat) <- 1:nrow(mat)
if (exact)
tmp <- mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '_') %in% paste(mat[x, ], collapse = '_'))), ]
else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
return(tmp)
}
is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2
That seems to work, so let's make our own %in%-style operators:
`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)
and try them out:
set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 133 1 2 3
# 190 1 2 3
# 321 1 2 3
set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 48 2 3 1
# 64 2 3 1
# 71 1 3 2
# 73 3 1 2
# 108 3 1 2
# 112 1 3 2
# 133 1 2 3
# 166 2 1 3
Does this do what you want (even for more than 2 columns)?
paste(row.vec,collapse="_") %in% apply(data.set,1,paste,collapse="_")
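The same idea extends to several query rows at once. A minimal sketch (the function name is_tuple_in is hypothetical, and the "\u001f" separator is assumed never to occur inside the values) that treats a plain vector as a single row and returns one TRUE/FALSE per row:

```r
is_tuple_in <- function(x, table, sep = "\u001f") {
  # treat a plain vector as a one-row matrix
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)
  if (ncol(x) != ncol(table)) stop("x and table must have the same number of columns")
  # one string key per row; sep is assumed not to appear in the values
  key <- function(m) apply(m, 1, paste, collapse = sep)
  key(x) %in% key(table)
}

row.vec <- c("A", 3)
data.set <- rbind(c("A", 1), c("B", 3), c("C", 2))
is_tuple_in(row.vec, data.set)                                   # FALSE
is_tuple_in(rbind(c("A", 1), c("A", 3), c("C", 2)), data.set)   # TRUE FALSE TRUE
```

Because the comparison is a single vectorised %in% on the keys, no recursion or explicit loop over matrix1 is needed.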
Been stuck with this for a while!
How can I transform the first list into the second?
list("X" = X, "Y" = Y, ...)
list("X" = c(X,n), "Y" = c(Y,n), ...)
where X and Y are matrices, n is an integer, and the lists are of unknown length. Thanks!
If c(X, n), which coerces the matrix to a vector, is what you really want, then:
lst <- list(a = matrix(1:4, 2), b = matrix(1:4, 2))
n <- 5
lapply(lst, c, n)
# $a
# [1] 1 2 3 4 5
#
# $b
# [1] 1 2 3 4 5
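This works because lapply() passes any extra arguments on to FUN, so lapply(lst, c, n) calls c(lst[[i]], n) for each element. A small check (note that c() drops the dim attribute, so the results are plain vectors):

```r
lst <- list(a = matrix(1:4, 2), b = matrix(1:4, 2))
n <- 5
out1 <- lapply(lst, c, n)                    # extra argument forwarded to c()
out2 <- lapply(lst, function(m) c(m, n))     # the same thing written out explicitly
identical(out1, out2)                        # TRUE
```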
Say I have the following matrix mat, which is a binary indicator matrix for the levels A, B, and C for a set of 5 observations:
mat <- matrix(c(1,0,0,
1,0,0,
0,1,0,
0,1,0,
0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]
> mat
A B C
[1,] 1 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 0 1
I want to convert that into a single factor such that the output is equivalent to fac, defined as:
> fac <- factor(rep(LETTERS[1:3], times = c(2,2,1)))
> fac
[1] A A B B C
Levels: A B C
Extra points if you get the labels from the colnames of mat, but a set of numeric codes (e.g. c(1,1,2,2,3)) would also be acceptable as desired output.
An elegant solution with matrix multiplication (and the shortest so far):
as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
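Why this works: each row of mat contains a single 1, so multiplying that row by 1:ncol(mat) returns the index of its column. A minimal check on the question's mat (passing levels = colnames(mat) is an addition here, so a column with no observations would still appear as a level):

```r
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# row %*% 1:k picks out the position of the single 1 in each row
codes <- drop(mat %*% seq_len(ncol(mat)))
codes  # 1 1 2 2 3
fac <- factor(colnames(mat)[codes], levels = colnames(mat))
fac
```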
This solution makes use of the arr.ind=TRUE argument of which, returning the matching positions as array locations. These are then used to index the colnames:
> factor(colnames(mat)[which(mat==1, arr.ind=TRUE)[, 2]])
[1] A A B B C
Levels: A B C
Decomposing into steps:
> which(mat==1, arr.ind=TRUE)
row col
[1,] 1 1
[2,] 2 1
[3,] 3 2
[4,] 4 2
[5,] 5 3
Use the values of the second column, i.e. which(...)[, 2], to index the colnames:
> colnames(mat)[c(1, 1, 2, 2, 3)]
[1] "A" "A" "B" "B" "C"
And then convert to a factor
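One caveat, not raised in the answer: which(..., arr.ind = TRUE) lists matches in column-major order, so the result only follows the observation order when the rows are already grouped by column, as in the question. A sketch with a hypothetical reordered matrix mat2, sorting the hits by row first:

```r
# hypothetical matrix whose observations are B, A, C
mat2 <- matrix(c(0, 1, 0,
                 1, 0, 0,
                 0, 0, 1), ncol = 3, byrow = TRUE,
               dimnames = list(NULL, LETTERS[1:3]))

naive <- factor(colnames(mat2)[which(mat2 == 1, arr.ind = TRUE)[, 2]])
naive  # A B C: which() scanned column A first, so row 2 came first

idx <- which(mat2 == 1, arr.ind = TRUE)
idx <- idx[order(idx[, "row"]), , drop = FALSE]  # restore observation order
fixed <- factor(colnames(mat2)[idx[, "col"]], levels = colnames(mat2))
fixed  # B A C
```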
One way is to replicate the names out by row number and index directly with the matrix, then wrap that with factor to restore the levels:
factor(rep(colnames(mat), each = nrow(mat))[as.logical(mat)])
[1] A A B B C
Levels: A B C
If mat comes from model.matrix(), the column names have the variable name (fac) prepended, so this should work the same way once that prefix is removed:
factor(gsub("^fac", "", rep(colnames(mat), each = nrow(mat))[as.logical(mat)]))
You could use something like this:
lvls <- apply(mat, 1, function(currow) match(1, currow))
fac <- factor(lvls, 1:3, labels = colnames(mat))
Here is another one
factor(rep(colnames(mat), colSums(mat)))
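The colSums() version above, like the rep(..., each = nrow(mat))[as.logical(mat)] one, rebuilds the factor from column totals or column-major positions, so it also assumes the rows are grouped by level as in the question. A quick comparison on a hypothetical reordered matrix mat2, where the row-wise match() approach keeps the observation order:

```r
# hypothetical matrix whose observations are B, A, C
mat2 <- matrix(c(0, 1, 0,
                 1, 0, 0,
                 0, 0, 1), ncol = 3, byrow = TRUE,
               dimnames = list(NULL, LETTERS[1:3]))

by_counts   <- factor(rep(colnames(mat2), colSums(mat2)))
by_position <- factor(rep(colnames(mat2), each = nrow(mat2))[as.logical(mat2)])
# row-wise lookup, as in the apply()/match() answer: independent of row order
lvls   <- apply(mat2, 1, function(currow) match(1, currow))
by_row <- factor(lvls, levels = seq_len(ncol(mat2)), labels = colnames(mat2))

by_counts    # A B C
by_position  # A B C
by_row       # B A C
```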