R: dealing with NULL values while converting a list to a matrix

Suppose I have a structure like the following:
data = structure(list(person = structure(list(name = "A, B",
gender = "F", dead = NULL), .Names = c("name",
"gender", "dead")), person = structure(list(name = "C",
gender = "M", dead = "RIP"), .Names = c("name",
"gender", "dead"))), .Names = c("person", "person"))
and I want to convert it into a matrix
data = matrix(unlist(data), nrow = length(data), ncol=length(data[[1]]), byrow = TRUE)
How do I avoid recycling the elements when using matrix or even before that using only the base functions without plyr's rbind.fill?
The result is:
> data
[,1] [,2] [,3]
[1,] "A, B" "F" "C"
[2,] "M" "RIP" "A, B"
and I would like to get NA or "" where the value is NULL. For instance:
> data
[,1] [,2] [,3]
[1,] "A, B" "F" ""
[2,] "C" "M" "RIP"
Any help would be appreciated.

You could try the new stri_list2matrix function in the stringi package.
library(stringi)
stri_list2matrix(lapply(data, unlist), byrow=TRUE, fill="")
# [,1] [,2] [,3]
# [1,] "A, B" "F" ""
# [2,] "C" "M" "RIP"
Or for NA instead of "", leave out the fill argument
stri_list2matrix(lapply(data, unlist), byrow=TRUE)
# [,1] [,2] [,3]
# [1,] "A, B" "F" NA
# [2,] "C" "M" "RIP"
Or, if you prefer a base R answer, you can avoid the problem by first making all vectors the same length with length<-. This appends NA to every shorter vector so that each has the same length as the longest one.
len <- max(sapply(data, length)) ## get length of longest vector
t(sapply(unname(data), function(x) `length<-`(unname(unlist(x)), len)))
# [,1] [,2] [,3]
# [1,] "A, B" "F" NA
# [2,] "C" "M" "RIP"
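Note that padding with length<- always appends NA at the end, so it only lines up with the right column when the NULL field is the last one. A minimal base R sketch that replaces each NULL in place instead, so every value stays in its own column, could look like this. The helper name null_to_na is made up for illustration, and the sketch assumes every field holds either NULL or a single string.

```r
# Same structure as in the question, written with plain list() calls.
data <- list(
  person = list(name = "A, B", gender = "F", dead = NULL),
  person = list(name = "C", gender = "M", dead = "RIP")
)

# Hypothetical helper: turn a NULL field into a character NA.
null_to_na <- function(x) if (is.null(x)) NA_character_ else x

# One character vector per person, with NULL fields replaced in place.
rows <- lapply(data, function(p) vapply(p, null_to_na, character(1)))

# Stack the vectors as rows; the field names become column names.
result <- do.call(rbind, unname(rows))
result
```

Use "" instead of NA_character_ in the helper if an empty string is preferred.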

Related

I want to check whether each cell of a row occurs in that row's corresponding string (such as ABCD). If it does not, NA should be returned

I have put the data and output here. In the first row, anything that is not A, B, C or D should become NA; in the second row, anything that is not A, C, B or E should become NA.
Here is an example showing one way to do it:
> t(mapply(function(a, b) b[match(a, b)], asplit(x, 1), strsplit(y, "")))
[,1] [,2] [,3] [,4]
[1,] NA "B" "C" "A"
[2,] NA "B" "C" NA
Data
> x <- rbind(c("E", "B", "C", "A"), c("S", "B", "C", "D"))
> y <- c("ABCD", "ACBE")
> x
[,1] [,2] [,3] [,4]
[1,] "E" "B" "C" "A"
[2,] "S" "B" "C" "D"
> y
[1] "ABCD" "ACBE"
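As a more explicit alternative to the mapply/match one-liner, the same result can be sketched with a plain loop and %in%. The names allowed and res are introduced here for illustration only.

```r
x <- rbind(c("E", "B", "C", "A"), c("S", "B", "C", "D"))
y <- c("ABCD", "ACBE")

# Split each string into its single characters: one allowed set per row.
allowed <- strsplit(y, "")

# Copy x and blank out every cell that is not in its row's allowed set.
res <- x
for (i in seq_len(nrow(x))) {
  res[i, !(x[i, ] %in% allowed[[i]])] <- NA
}
res
```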

create a matrix of given dimensions from a vector without replacement

I want to create a matrix from a vector, but the number of entries isn't divisible by the dimensions. Example below.
vector1 <- c('a','b','c','d','e','f','g')
result1 <- a b c
d e f
g
I want the result to be 3 columns wide and fill as many rows as necessary. I want empty spaces or something easily distinguishable at the end, not recycled values.
Pre-calculate nrow and ncol to create matrix.
vector1 <- c('a','b','c','d','e','f','g')
ncol <- 3
nrow <- ceiling(length(vector1)/ncol)
matrix(vector1[seq_len(nrow * ncol)], ncol = ncol, byrow = TRUE)
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" NA NA
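The NA padding above comes from indexing past the end of the vector, which returns NA for the missing positions. If empty strings are wanted instead, one possible sketch is to pad the vector explicitly before building the matrix. The names n_col, n_row and padded are chosen here for illustration.

```r
vector1 <- c("a", "b", "c", "d", "e", "f", "g")
n_col <- 3
n_row <- ceiling(length(vector1) / n_col)

# Append as many empty strings as are needed to fill the last row.
padded <- c(vector1, rep("", n_row * n_col - length(vector1)))

m <- matrix(padded, ncol = n_col, byrow = TRUE)
m
```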
We can use stri_list2matrix
library(collapse)
library(stringi)
stri_list2matrix(rsplit(vector1, as.integer(gl(length(vector1), 3,
length(vector1)))), byrow = TRUE)
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" NA NA
data
vector1 <- c('a','b','c','d','e','f','g')

Access vectors in a list with an array of indices in R

I have a list containing 3 vectors, e.g.:
> test_list
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "d" "e"
[[3]]
[1] "f" "g"
I want to access elements of those vectors using an array containing the vector indices, e.g.:
> indices
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 2 2
This is the desired output:
[,1] [,2] [,3]
[1,] "a" "e" "g"
[2,] "b" "d" "g"
I found the following way to do it:
test_list <- list(c("a", "b"), c("c", "d", "e"), c("f", "g"))
indices <- matrix(c(1, 3, 2, 2, 2, 2), nrow = 2, ncol = 3, byrow = TRUE)
t(apply(indices, 1, function(row){mapply(`[[`, test_list, row)}))
Is there a cleaner, more idiomatic way?
One option involving purrr could be:
map2(.x = test_list,
.y = asplit(indices, 2),
~ .x[.y]) %>%
transpose()
[[1]]
[1] "a" "e" "g"
[[2]]
[1] "b" "d" "g"
Or a base R solution using the idea from the comment provided by @nicola:
mapply(`[`, test_list, asplit(indices, 2))
Another option in base R
out <- do.call(rbind, lapply(test_list, `length<-`, max(lengths(test_list))))
`dim<-`(out[cbind(c(col(indices)), c(indices))], c(2, 3))
# [,1] [,2] [,3]
#[1,] "a" "e" "g"
#[2,] "b" "d" "g"

bind the same vector in multiple rows

I have this vector:
Vec=c("a" , "b", "c ", "d")
I want this as data frame:
[,1] [,2] [,3] [,4]
[1,] a b c d
[2,] a b c d
[3,] a b c d
[4,] a b c d
[5,] a b c d
Another option:
t(replicate(5, Vec))
# [,1] [,2] [,3] [,4]
#[1,] "a" "b" "c " "d"
#[2,] "a" "b" "c " "d"
#[3,] "a" "b" "c " "d"
#[4,] "a" "b" "c " "d"
#[5,] "a" "b" "c " "d"
One way using rbind and do.call would be:
do.call(rbind, replicate(5, Vec, simplify = FALSE))
[,1] [,2] [,3] [,4]
[1,] "a" "b" "c " "d"
[2,] "a" "b" "c " "d"
[3,] "a" "b" "c " "d"
[4,] "a" "b" "c " "d"
[5,] "a" "b" "c " "d"
You can replace 5 with any number you like.
replicate returns Vec 5 times in a list (simplify = FALSE creates the list). These elements are then rbind-ed using do.call.
Update:
Actually using matrix is probably the best:
> matrix(Vec, nrow=5, ncol=length(Vec), byrow=TRUE)
[,1] [,2] [,3] [,4]
[1,] "a" "b" "c " "d"
[2,] "a" "b" "c " "d"
[3,] "a" "b" "c " "d"
[4,] "a" "b" "c " "d"
[5,] "a" "b" "c " "d"
Change the nrow argument to whatever number you want and you'll have it ready.
All 3 answers will need to use as.data.frame to convert to a data.frame so I am excluding this from the microbenchmark:
Microbenchmark
> microbenchmark::microbenchmark(t(replicate(5, Vec)),
+ do.call(rbind, replicate(5, Vec, simplify = FALSE)),
+ matrix(Vec, nrow=5, ncol=4, byrow=TRUE),
+ times=1000)
Unit: microseconds
expr min lq mean median uq max neval
t(replicate(5, Vec)) 52.854 59.013 68.393740 63.374 70.815 1749.326 1000
do.call(rbind, replicate(5, Vec, simplify = FALSE)) 18.986 23.092 27.325856 25.144 27.710 105.708 1000
matrix(Vec, nrow = 5, ncol = 4, byrow = TRUE) 1.539 2.566 3.474166 3.079 3.593 29.763 1000
As you can see the matrix solution is by far the best.
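Since the question asks for a data frame, the final conversion step could be sketched as follows. The default column names V1 to V4 are what as.data.frame assigns to a matrix without column names.

```r
Vec <- c("a", "b", "c ", "d")

# Build the 5-row matrix, then convert it to a data frame.
m <- matrix(Vec, nrow = 5, ncol = length(Vec), byrow = TRUE)
df <- as.data.frame(m, stringsAsFactors = FALSE)
df
```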

How to do basic row name mapping of matrix in R?

I have a very big matrix called A, and I need to add one column to it containing the mapped row names taken from another matrix called B.
The row names of matrix A are in B's column ID, and their mapped names are in its column Sample.
Here is a simple reproducible example and the expected output.
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
and the second matrix, in which the row names of matrix A are in column ID and their mapped names are in column Sample:
B<-cbind(c("d1","f2","g5","y4"),c("q","L","w","r"),c("qw","we","zr","ls"))
colnames(B)<-c("M","ID","Sample")
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
Here is the expected output:
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15), c("qw","zr","ls"))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3] [,4]
q "a" "1" "10" "qw"
w "b" "2" "14" "zr"
r "c" "3" "15" "ls"
>
Would someone help me to implement it in R ?
You can also use the merge function in R.
> A <-matrix( data = NA, nrow = 3, ncol =3)
> A[1,] <- c("a" , "1", "10")
> A[2,] <- c( "b" , "2" , "14")
> A[3,] <- c("c" , "3" , "15")
>
> row.names(A) = c("q","w","r")
>
>
> B <- matrix(data = "NA" , nrow = 4, ncol = 3)
> B[1,] <- c("d1" ,"q" ,"qw")
> B[2,] <- c( "f2" ,"L" ,"we")
> B[3,] <- c("g5" ,"w", "zr")
> B[4,] <- c("y4", "r", "ls" )
> colnames(B) = c("M", "ID", "Sample")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
> C <- merge(A, B, by.x = 0, by.y = "ID" )
> D <- C[,-5]
> D
Row.names V1 V2 V3 Sample
1 q a 1 10 qw
2 r c 3 15 ls
3 w b 2 14 zr
You were almost there just putting the sample matrices together.
While we cannot use the $ operator on matrices, we can use the dimnames (as well as the row/column numbers) to subset the matrix. Then we can find which ID are in the row names of A with %in%
> cbind(A, B[,"Sample"][B[,"ID"] %in% rownames(A)])
# [,1] [,2] [,3] [,4]
# q "a" "1" "10" "qw"
# w "b" "2" "14" "zr"
# r "c" "3" "15" "ls"
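One caveat: the %in% filter returns the matching Sample values in the order they appear in B, so it only gives the right pairing when B lists the IDs in the same order as the row names of A. A sketch of an order-independent lookup with match, using the matrices from the question, might look like this. The names idx and A2 are made up for illustration.

```r
A <- cbind(c("a", "b", "c"), c(1, 2, 3), c(10, 14, 15))
rownames(A) <- c("q", "w", "r")

B <- cbind(c("d1", "f2", "g5", "y4"),
           c("q", "L", "w", "r"),
           c("qw", "we", "zr", "ls"))
colnames(B) <- c("M", "ID", "Sample")

# For each row name of A, find the row of B holding that ID
# (NA if the ID is absent from B).
idx <- match(rownames(A), B[, "ID"])

# Append the looked-up Sample values as a new column.
A2 <- cbind(A, Sample = B[idx, "Sample"])
A2
```

Row names of A that do not occur in B get NA in the new column rather than shifting the remaining values.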
