How to do basic row name mapping of matrix in R?


I have a very big matrix called A, and I need to add one column to it: the mapped names of its row names, taken from another matrix called B.
The row names of A appear in B's column ID, and their mapped names are in B's column Sample.
Here is a simple reproducible example and the expected output.
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
And the second matrix, in which the row names of A are in column ID and their mapped names are in column Sample:
B<-cbind(c("d1","f2","g5","y4"),c("q","L","w","r"),c("qw","we","zr","ls"))
colnames(B)<-c("M","ID","Sample")
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
Here is the expected output:
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15), c("qw","zr","ls"))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3] [,4]
q "a" "1" "10" "qw"
w "b" "2" "14" "zr"
r "c" "3" "15" "ls"
>
Could someone help me implement this in R?

You can also use the merge function in R.
> A <-matrix( data = NA, nrow = 3, ncol =3)
> A[1,] <- c("a" , "1", "10")
> A[2,] <- c( "b" , "2" , "14")
> A[3,] <- c("c" , "3" , "15")
>
> row.names(A) = c("q","w","r")
>
>
> B <- matrix(data = "NA" , nrow = 4, ncol = 3)
> B[1,] <- c("d1" ,"q" ,"qw")
> B[2,] <- c( "f2" ,"L" ,"we")
> B[3,] <- c("g5" ,"w", "zr")
> B[4,] <- c("y4", "r", "ls" )
> colnames(B) = c("M", "ID", "Sample")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
> C <- merge(A, B, by.x = 0, by.y = "ID" )
> D <- C[,-5]
> D
Row.names V1 V2 V3 Sample
1 q a 1 10 qw
2 r c 3 15 ls
3 w b 2 14 zr
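merge returns a data frame whose rows are sorted by the key (q, r, w above), not in A's original order. If you need a matrix shaped like A again, one sketch is to reorder the merged rows with match() and convert back with as.matrix(). This assumes every row name of A has an ID in B; merge does an inner join by default, so a missing ID would come back as an NA row here.

```r
A <- cbind(c("a", "b", "c"), c(1, 2, 3), c(10, 14, 15))
rownames(A) <- c("q", "w", "r")
B <- cbind(c("d1", "f2", "g5", "y4"), c("q", "L", "w", "r"), c("qw", "we", "zr", "ls"))
colnames(B) <- c("M", "ID", "Sample")

# data frame with a Row.names column, rows sorted by the key
C <- merge(A, B, by.x = 0, by.y = "ID")
# put the merged rows back in the order of A's row names, dropping M
D <- C[match(rownames(A), C$Row.names), c("V1", "V2", "V3", "Sample")]
D <- as.matrix(D)
rownames(D) <- rownames(A)
D
```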

You were almost there when you built the sample matrices.
While we cannot use the $ operator on matrices, we can use the dimnames (as well as the row/column numbers) to subset them. Then %in% finds which IDs are among the row names of A:
> cbind(A, B[,"Sample"][B[,"ID"] %in% rownames(A)])
# [,1] [,2] [,3] [,4]
# q "a" "1" "10" "qw"
# w "b" "2" "14" "zr"
# r "c" "3" "15" "ls"
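The %in% subset above returns the Sample values in B's row order, so it only lines up with A when B lists those IDs in the same order as A's rows. A sketch using match() instead, which looks up each row name of A in B's ID column and therefore does not depend on B's order (B is deliberately shuffled here to show that):

```r
A <- cbind(c("a", "b", "c"), c(1, 2, 3), c(10, 14, 15))
rownames(A) <- c("q", "w", "r")
B <- cbind(c("d1", "f2", "g5", "y4"), c("q", "L", "w", "r"), c("qw", "we", "zr", "ls"))
colnames(B) <- c("M", "ID", "Sample")

# shuffle B's rows to show the result does not depend on B's order
B <- B[c(4, 2, 1, 3), ]

# position in B of each row name of A (NA if a row name has no ID in B)
idx <- match(rownames(A), B[, "ID"])
A2 <- cbind(A, B[idx, "Sample"])
A2
```

A row name with no matching ID simply gets NA in the new column rather than shifting the other values.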

Related

Comparing rows of matrix and replacing matching elements

I want to compare two matrices. If the row elements in the first matrix match row elements in the second matrix, I want to keep those rows of the second matrix. If the rows do not match, I want those rows to be empty. Apologies, I asked a quite similar question recently, but I still haven't been able to solve this one.
INPUT:
> mat1<-cbind(letters[3:8])
> mat1
[,1]
[1,] "c"
[2,] "d"
[3,] "e"
[4,] "f"
[5,] "g"
[6,] "h"
> mat2<-cbind(letters[1:5],1:5)
> mat2
[,1] [,2]
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
Expected OUTPUT:
> mat3
[,1] [,2]
[1,] "NA" "NA"
[2,] "NA" "NA"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
I have unsuccessfully attempted this:
> mat3<-mat2[ifelse(mat2[,1] %in% mat1[,1],mat2,""),]
Error in mat2[ifelse(mat2[, 1] %in% mat1[, 1], mat2, ""), ] :
no 'dimnames' attribute for array
I have been struggling for hours, so any suggestions are welcomed.

You were on the right track, but the answer is a little simpler than what you were trying. mat2[, 1] %in% mat1[, 1] returns the matches as a logical vector, and we can just set the non-matches to NA using that vector as an index.
mat1<-cbind(letters[3:8])
mat2<-cbind(letters[1:5],1:5)
is_match <- mat2[, 1] %in% mat1[, 1] # logical vector: TRUE where mat2's first column appears in mat1
mat3 <- mat2
mat3[!is_match, ] <- NA # blank out the non-matching rows
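The same two steps can be wrapped in a small helper so the fill value is a parameter, for example "" if you want the rows literally empty rather than NA. The name blank_unmatched and its arguments are hypothetical, just a sketch of the answer above:

```r
mat1 <- cbind(letters[3:8])
mat2 <- cbind(letters[1:5], 1:5)

# hypothetical helper: replace rows of x whose key column is not in keys
blank_unmatched <- function(x, keys, key_col = 1, fill = NA) {
  is_match <- x[, key_col] %in% keys
  x[!is_match, ] <- fill
  x
}

blank_unmatched(mat2, mat1[, 1])             # NA, as in the answer
blank_unmatched(mat2, mat1[, 1], fill = "")  # empty strings instead
```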

How to perform a check on a permutation "on-the-fly" without storing the result in R

Assume we have the following permutations of the letters "a", "b", and "c":
library(combinat)
do.call(rbind, permn(letters[1:3]))
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "a" "c" "b"
# [3,] "c" "a" "b"
# [4,] "c" "b" "a"
# [5,] "b" "c" "a"
# [6,] "b" "a" "c"
Is it possible to perform some function on a given permutation "on-the-fly" (i.e., a particular row) without storing the result?
That is, if the row == "a" "c" "b" or row == "b" "c" "a", do not store the result. The desired result in this case would be:
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "c" "a" "b"
# [3,] "c" "b" "a"
# [4,] "b" "a" "c"
I know I can apply a function to all the permutations on the fly within combinat::permn with the fun argument such as:
permn(letters[1:3], fun = function(x) {
res <- paste0(x, collapse = "")
if (res == "acb" | res == "bca") {
return(NA)
} else {
return(res)
}
})
But this still stores an NA, and the returned list has 6 elements instead of the desired 4:
# [[1]]
# [1] "abc"
#
# [[2]]
# [1] NA
#
# [[3]]
# [1] "cab"
#
# [[4]]
# [1] "cba"
#
# [[5]]
# [1] NA
#
# [[6]]
# [1] "bac"
Note, I am not interested in subsequently removing the NA values; I am specifically interested in not appending to the result list "on-the-fly" for a given permutation.

We could use a magrittr pipeline: rbind the rows to be excluded (Rows) on top of the permutation matrix, flag the duplicated rows, drop the flags that belong to Rows itself, and keep the rest.
library(combinat)
library(magrittr)
Rows <- rbind(c("a", "c", "b"), c("b", "c", "a"))
do.call(rbind, permn(letters[1:3])) %>%
subset(tail(!duplicated(rbind(Rows, .)), -nrow(Rows)))
giving:
[,1] [,2] [,3]
[1,] "a" "b" "c"
[2,] "c" "a" "b"
[3,] "c" "b" "a"
[4,] "b" "a" "c"
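The same duplicated() trick without the pipe, split into steps. To keep the sketch runnable without combinat, the permutation matrix is typed in by hand from the printed permn output in the question. Note that it filters after all permutations have been generated, so it does not avoid storing them.

```r
# permutations as printed by do.call(rbind, permn(letters[1:3])) above
perms <- rbind(c("a", "b", "c"), c("a", "c", "b"), c("c", "a", "b"),
               c("c", "b", "a"), c("b", "c", "a"), c("b", "a", "c"))
Rows <- rbind(c("a", "c", "b"), c("b", "c", "a"))

# stack the rows to drop on top; duplicated() then flags every permutation
# that repeats one of them (or repeats an earlier permutation)
flags <- duplicated(rbind(Rows, perms))
# discard the flags belonging to Rows itself, keep the unflagged permutations
keep <- !tail(flags, -nrow(Rows))
perms[keep, , drop = FALSE]
```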

You can have fun return NULL for the permutations you want to ignore. Because rbind ignores NULL arguments, do.call(rbind, ...) then binds only the combinations you need.
do.call(rbind, combinat::permn(letters[1:3], function(x)
if(!all(x == c("a", "c", "b") | x == c("b", "c", "a")))
return(x)
))
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "c" "a" "b"
#[3,] "c" "b" "a"
#[4,] "b" "a" "c"
Similarly, when fun returns the pasted string:
do.call(rbind, permn(letters[1:3],function(x) {
res <- paste0(x, collapse = "")
if (!res %in% c("acb","bca"))
return(res)
}))
# [,1]
#[1,] "abc"
#[2,] "cab"
#[3,] "cba"
#[4,] "bac"
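If the goal is that an excluded permutation is never appended to the result at all, another option is to generate the permutations yourself and test each one as it is completed. The sketch below is a plain recursive generator, not combinat::permn, so its output order differs (lexicographic in the input order); permn_filtered, keep and not_excluded are hypothetical names.

```r
# hypothetical helper: depth-first permutation generator that appends a
# permutation to the result only when keep(perm) is TRUE
permn_filtered <- function(x, keep) {
  out <- list()
  recurse <- function(prefix, rest) {
    if (length(rest) == 0) {
      # complete permutation: store it only if it passes the check
      if (keep(prefix)) out[[length(out) + 1]] <<- prefix
      return(invisible(NULL))
    }
    for (i in seq_along(rest)) {
      recurse(c(prefix, rest[i]), rest[-i])
    }
  }
  recurse(x[0], x)
  do.call(rbind, out)
}

excluded <- list(c("a", "c", "b"), c("b", "c", "a"))
not_excluded <- function(p) !any(vapply(excluded, identical, logical(1), p))

permn_filtered(letters[1:3], not_excluded)
```

The recursion depth equals length(x), which is fine for the small inputs where enumerating all permutations is feasible anyway.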

Access vectors in a list with an array of indices in R

I have a list containing 3 vectors, e.g.:
> test_list
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "d" "e"
[[3]]
[1] "f" "g"
I want to access elements of those vectors using an array containing the vector indices, e.g.:
> indices
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 2 2
This is the desired output:
[,1] [,2] [,3]
[1,] "a" "e" "g"
[2,] "b" "d" "g"
I found the following way to do it:
test_list <- list(c("a", "b"), c("c", "d", "e"), c("f", "g"))
indices <- matrix(c(1, 3, 2, 2, 2, 2), nrow = 2, ncol = 3, byrow = TRUE)
t(apply(indices, 1, function(row){mapply(`[[`, test_list, row)}))
Is there a cleaner, more idiomatic way?

One option using purrr:
library(purrr)
map2(.x = test_list,
.y = asplit(indices, 2),
~ .x[.y]) %>%
transpose()
[[1]]
[1] "a" "e" "g"
[[2]]
[1] "b" "d" "g"
Or a base R solution, using the idea from @nicola's comment:
mapply(`[`, test_list, asplit(indices, 2))

Another option in base R:
out <- do.call(rbind, lapply(test_list, `length<-`, max(lengths(test_list))))
`dim<-`(out[cbind(c(col(indices)), c(indices))], dim(indices))
# [,1] [,2] [,3]
#[1,] "a" "e" "g"
#[2,] "b" "d" "g"
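A further base R variant, sketched here as an alternative rather than taken from the answers: flatten the list once with unlist() and shift each column of indices by the start offset of its vector. This avoids building the NA-padded matrix, but an out-of-range index would silently read from the next vector, so the sketch checks the bounds first.

```r
test_list <- list(c("a", "b"), c("c", "d", "e"), c("f", "g"))
indices <- matrix(c(1, 3, 2, 2, 2, 2), nrow = 2, ncol = 3, byrow = TRUE)

flat <- unlist(test_list)
lens <- lengths(test_list)
# start offset of each vector inside flat: 0, 2, 5 for lengths 2, 3, 2
offsets <- cumsum(c(0, head(lens, -1)))
# column j of indices refers to vector j; reject indices outside that vector
stopifnot(all(indices >= 1), all(indices <= rep(lens, each = nrow(indices))))
pos <- c(indices) + rep(offsets, each = nrow(indices))
res <- matrix(flat[pos], nrow = nrow(indices))
res
```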

R dealing with NULL values while converting list to matrix

Suppose I have a structure like the following:
data = structure(list(person = structure(list(name = "A, B",
gender = "F", dead = NULL), .Names = c("name",
"gender", "dead")), person = structure(list(name = "C",
gender = "M", dead = "RIP"), .Names = c("name",
"gender", "dead"))), .Names = c("person", "person"))
and I want to convert it into a matrix
data = matrix(unlist(data), nrow = length(data), ncol=length(data[[1]]), byrow = TRUE)
How do I avoid the recycling, either when calling matrix or before that, using only base functions (without plyr's rbind.fill)?
The result is:
> data
[,1] [,2] [,3]
[1,] "A, B" "F" "C"
[2,] "M" "RIP" "A, B"
I would like to get NA or "" where the value is NULL. For instance:
> data
[,1] [,2] [,3]
[1,] "A, B" "F" ""
[2,] "C" "M" "RIP"
Any help would be appreciated.

You could try the stri_list2matrix function from the stringi package:
library(stringi)
stri_list2matrix(lapply(data, unlist), byrow=TRUE, fill="")
# [,1] [,2] [,3]
# [1,] "A, B" "F" ""
# [2,] "C" "M" "RIP"
Or, for NA instead of "", leave out the fill argument:
stri_list2matrix(lapply(data, unlist), byrow=TRUE)
# [,1] [,2] [,3]
# [1,] "A, B" "F" NA
# [2,] "C" "M" "RIP"
Or, if you prefer a base R answer, you can avoid the recycling by first making all vectors the same length with length<-. This pads every shorter vector with NA up to the length of the longest vector.
len <- max(sapply(data, length)) ## get length of longest vector
t(sapply(unname(data), function(x) `length<-`(unname(unlist(x)), len)))
# [,1] [,2] [,3]
# [1,] "A, B" "F" NA
# [2,] "C" "M" "RIP"
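Both answers pad at the end, which is right here only because the NULL field (dead) is the last one. If a NULL can occur in any field, a sketch that aligns values by field name instead keeps each value in its own column. The helper name list_to_matrix_by_name is hypothetical, and it assumes every field holds at most one value.

```r
# hypothetical helper: one row per list element, one column per field name
list_to_matrix_by_name <- function(lst) {
  fields <- unique(unlist(lapply(lst, names)))
  rows <- lapply(lst, function(p) {
    vapply(fields, function(f) {
      v <- p[[f]]
      # NULL (or absent) field becomes NA; assumes at most one value per field
      if (is.null(v)) NA_character_ else as.character(v)
    }, character(1))
  })
  do.call(rbind, unname(rows))
}

# second person now has a NULL in the middle field
data <- list(person = list(name = "A, B", gender = "F", dead = NULL),
             person = list(name = "C", gender = NULL, dead = "RIP"))
list_to_matrix_by_name(data)
```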

Subset dataframe into equal subgroup chunks

I have a data frame df that needs to be split into chunks of 2 names each. In the example below there are 4 unique names (a, b, c, d), so I need two one-column matrices: one for a,b and one for c,d.
Output format:
name1
item_value
item_value
...
END
name2
item_value
item_value
...
END
Example:
#dummy data
df <- data.frame(name=sort(c(rep(letters[1:4],2),"a","a","c")),
item=round(runif(11,1,10)),
stringsAsFactors=FALSE)
#tried approach - split per name. I need to split per 2 names.
lapply(split(df,f=df$name),
function(x)
{name <- unique(x$name)
as.matrix(c(name,x[,2],"END"))
})
#expected output
[,1]
[1,] "a"
[2,] "8"
[3,] "9"
[4,] "6"
[5,] "4"
[6,] "END"
[1,] "b"
[2,] "2"
[3,] "10"
[4,] "END"
[,2]
[1,] "c"
[2,] "6"
[3,] "6"
[4,] "2"
[5,] "END"
[1,] "d"
[2,] "4"
[3,] "1"
[4,] "END"
Note: Actual df has ~300000 rows with ~35000 unique names.

You may try this.
# for each 'name', "pad" 'item' with 'name' and 'END'
l1 <- lapply(split(df, f = df$name), function(x){
name <- unique(x$name)
as.matrix(c(name, x$item, "END"))
})
# create the offsets 0, 2, 4, ... used to select list elements two at a time
steps <- seq(from = 0, to = length(l1) - 1, by = 2)
# loop over 'steps' to bind together list elements, two by two.
l2 <- lapply(steps, function(x){
do.call(rbind, l1[1:2 + x])
})
l2
# [[1]]
# [,1]
# [1,] "a"
# [2,] "6"
# [3,] "4"
# [4,] "10"
# [5,] "3"
# [6,] "END"
# [7,] "b"
# [8,] "6"
# [9,] "7"
# [10,] "END"
#
# [[2]]
# [,1]
# [1,] "c"
# [2,] "2"
# [3,] "6"
# [4,] "10"
# [5,] "END"
# [6,] "d"
# [7,] "5"
# [8,] "4"
# [9,] "END"

Instead of building the list from individual names, build it from the item column of the data frame subsets:
res <- list("a_b" = c(df[df$name == "a",2],"END",df[df$name == "b", 2],"END"),
"c_d" = c(df[df$name == "c",2],"END", df[df$name == "d", 2],"END"))
res2 <- vector(mode="list",length=2)
res2 <- sapply(1:(length(unique(df$name))/2),function(x) {
sapply(seq(1,length(unique(df$name))-1,by=2), function(y) {
name <- unique(df$name)
res2[x] <- as.matrix(c(name[y],df[df$name == name[y],2],"END",name[y+1],df[df$name == name[y+1],2],"END"))
})
})
answer <- res2[,1]
This gives a matrix of lists because two sapply calls are nested; I think everything you want is in res2[,1].
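The second answer compares df$name against each name in turn, so it scans every row once per name, which grows quickly with ~35000 names. A sketch of a vectorised alternative: give every row a pair number once, split on it, and pad each name's items with the name and "END". It assumes names are paired in sorted order (a,b then c,d); name_rank, pair_id and chunks are illustrative names, and set.seed is only there to make the dummy data reproducible.

```r
set.seed(1)
df <- data.frame(name = sort(c(rep(letters[1:4], 2), "a", "a", "c")),
                 item = round(runif(11, 1, 10)),
                 stringsAsFactors = FALSE)

# rank each name in sorted order, then map ranks 1-2 -> pair 1, 3-4 -> pair 2, ...
name_rank <- match(df$name, sort(unique(df$name)))
pair_id <- ceiling(name_rank / 2)

chunks <- lapply(split(df, pair_id), function(chunk) {
  # split keeps the original row order of the items within each name
  by_name <- split(chunk$item, chunk$name)
  padded <- Map(function(items, nm) c(nm, items, "END"), by_name, names(by_name))
  as.matrix(unlist(padded, use.names = FALSE))
})
chunks
```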