Create a matrix of given dimensions from a vector without replacement - R

I want to create a matrix from a vector, but the number of entries isn't divisible by the number of columns. Example below:
vector1 <- c('a','b','c','d','e','f','g')
result1 <- a b c
d e f
g
I want the result to be 3 columns wide and to fill as many rows as necessary. The leftover cells at the end should hold empty strings or something easily distinguishable, not recycled values.

Pre-calculate nrow and ncol to create the matrix. Indexing past the end of the vector yields NA, which pads the last row.
vector1 <- c('a','b','c','d','e','f','g')
ncol <- 3
nrow <- ceiling(length(vector1)/ncol)
matrix(vector1[seq_len(nrow * ncol)], ncol = ncol, byrow = TRUE)
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" NA NA
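The same idea can be wrapped in a small helper that also lets you choose the filler, for example an empty string instead of NA. This is only a sketch: the function name vec_to_matrix and its fill argument are made up for illustration and are not part of base R.

```r
# Hypothetical helper: pad the vector explicitly, then fill the matrix by row.
vec_to_matrix <- function(x, ncol, fill = NA) {
  nrow <- ceiling(length(x) / ncol)
  # number of cells left over in the last row
  n_pad <- nrow * ncol - length(x)
  matrix(c(x, rep(fill, n_pad)), nrow = nrow, ncol = ncol, byrow = TRUE)
}

vector1 <- c('a','b','c','d','e','f','g')
m_na    <- vec_to_matrix(vector1, 3)             # last row padded with NA
m_blank <- vec_to_matrix(vector1, 3, fill = "")  # last row padded with ""
```

Because the padding is added before matrix() is called, the data length always matches the matrix size and no recycling can happen.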

We can also use stri_list2matrix from stringi, with rsplit from collapse splitting the vector into groups of 3:
library(collapse)
library(stringi)
stri_list2matrix(rsplit(vector1, as.integer(gl(length(vector1), 3,
    length(vector1)))), byrow = TRUE)
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" NA NA
data
vector1 <- c('a','b','c','d','e','f','g')

Related

Compare two matrices, keeping values in one matrix that are TRUE in the other

This seems like an easy task, but I have not found a solution in R after searching here and elsewhere. I have two matrices, one with string values and another with logical values.
a <- matrix(c(
"A", "B", "C"
))
b <- matrix(c(
T, F, T
))
> b
[,1]
[1,] TRUE
[2,] FALSE
[3,] TRUE
> a
[,1]
[1,] "A"
[2,] "B"
[3,] "C"
I need to create a third matrix that keeps the values of the first wherever the second is TRUE and leaves NA everywhere else, like so:
> C
[,1]
[1,] "A"
[2,] NA
[3,] "C"
How do I achieve the above result?
C <- matrix(a[ifelse(b, T, NA)], ncol = ncol(a))
Here is an alternative that simply assigns NA wherever b is FALSE:
a[b==FALSE] <- NA
[,1]
[1,] "A"
[2,] NA
[3,] "C"
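Using which:
c <- a
c[which(b == FALSE)] <- NA
a <- a[b] might also work, depending on how you want the result. Note that it returns only the values where b is TRUE ("A" and "C") as a plain vector, not a matrix with NA in the other positions.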
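If the original matrix should stay untouched, two other base R forms may be worth a look. The sketch below uses the a and b from the question; the names C1 and C2 are only illustrative.

```r
a <- matrix(c("A", "B", "C"))
b <- matrix(c(TRUE, FALSE, TRUE))

# replace() returns a modified copy and leaves 'a' itself unchanged
C1 <- replace(a, !b, NA)

# ifelse() takes its shape from the test 'b', so the result is again a 3 x 1 matrix
C2 <- ifelse(b, a, NA)
```

Both keep the dimensions of the input, which a[b] does not.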

Split string with n repetitive elements into n sub-strings

I have a string that is a concatenation of m possible types of elements - for the sake of simplicity m = 4 with A, B, C and D.
Whenever a single element occurs more than once, I have to split the string so that no repetitions are left. I would like to generate all possible strings without repetitions.
To make this a little bit clearer, here is an example:
For A B A C D
String: A B C D
String: B A C D
This gets more complicated when there are several different elements that show up more than once:
For A B A C B D
String: A B C D
String: A C B D
String: B A C D
String: A C B D
Is there a smart way to compute this in R?
vec <- c("A","B","A","C","B","D")
combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
eg <- do.call(expand.grid, combs)
out <- t(apply(eg, 1, function(r) names(eg)[order(r)]))
out
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "D"
# [2,] "B" "A" "C" "D"
# [3,] "A" "C" "B" "D"
# [4,] "A" "C" "B" "D"
With the first vector:
vec <- c("A","B","A","C","D")
# ...
# [,1] [,2] [,3] [,4]
# [1,] "A" "B" "C" "D"
# [2,] "B" "A" "C" "D"
If you are starting and ending with strings rather than vectors, you can split the input with strsplit and paste each row of the result back together:
strsplit("ABACBD", "")[[1]]
# [1] "A" "B" "A" "C" "B" "D"
apply(out, 1, paste, collapse = "")
# [1] "ABCD" "BACD" "ACBD" "ACBD"
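Putting the pieces together, the string-in, string-out version could look like the sketch below. The function name unique_orderings is made up for illustration, and dropping repeated results with unique() is an assumption about what is wanted, since the output above contains "ACBD" twice.

```r
# Hypothetical wrapper around the expand.grid approach shown above.
unique_orderings <- function(s) {
  vec <- strsplit(s, "")[[1]]
  # positions of every distinct element in the input
  combs <- lapply(setNames(nm = unique(vec)), function(a) which(vec == a))
  # one row per way of picking a single position for each element
  eg <- do.call(expand.grid, combs)
  # order the elements by their picked positions and glue them together
  res <- apply(eg, 1, function(r) paste(names(eg)[order(r)], collapse = ""))
  unique(unname(res))
}

res_two_repeats <- unique_orderings("ABACBD")
res_one_repeat  <- unique_orderings("ABACD")
```

Leave out the unique() call if repeated orderings should be kept.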

Access vectors in a list with an array of indices in R

I have a list containing 3 vectors, e.g.:
> test_list
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "d" "e"
[[3]]
[1] "f" "g"
I want to access elements of those vectors using an array containing the vector indices, e.g.:
> indices
[,1] [,2] [,3]
[1,] 1 3 2
[2,] 2 2 2
This is the desired output:
[,1] [,2] [,3]
[1,] "a" "e" "g"
[2,] "b" "d" "g"
I found the following way to do it:
test_list <- list(c("a", "b"), c("c", "d", "e"), c("f", "g"))
indices <- matrix(c(1, 3, 2, 2, 2, 2), nrow = 2, ncol = 3, byrow = TRUE)
t(apply(indices, 1, function(row){mapply(`[[`, test_list, row)}))
Is there a cleaner, more idiomatic way?
One option involving purrr could be:
library(purrr)
map2(.x = test_list,
     .y = asplit(indices, 2),
     ~ .x[.y]) %>%
  transpose()
[[1]]
[1] "a" "e" "g"
[[2]]
[1] "b" "d" "g"
Or a base R solution using the idea from the comment provided by @nicola:
mapply(`[`, test_list, asplit(indices, 2))
Another option in base R
out <- do.call(rbind, lapply(test_list, `length<-`, max(lengths(test_list))))
`dim<-`(out[cbind(c(col(indices)), c(indices))], c(2, 3))
# [,1] [,2] [,3]
#[1,] "a" "e" "g"
#[2,] "b" "d" "g"
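For comparison, here is a sketch of the same lookup written with vapply, which states the expected shape of each column up front. It assumes, as in the question, that column j of indices refers to element j of test_list; the name res is only illustrative.

```r
test_list <- list(c("a", "b"), c("c", "d", "e"), c("f", "g"))
indices <- matrix(c(1, 3, 2, 2, 2, 2), nrow = 2, ncol = 3, byrow = TRUE)

# For each list element j, pick the entries named in column j of 'indices'.
# FUN.VALUE makes vapply fail loudly if a column does not yield nrow(indices) strings.
res <- vapply(
  seq_along(test_list),
  function(j) test_list[[j]][indices[, j]],
  character(nrow(indices))
)
```

With more than one row in indices the result is a matrix laid out like indices; with a single row vapply returns a plain vector instead.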

Print a vector to file with predefined number of columns, without recycling

I have a vector x containing 13 elements:
x <- letters[c(1:9, 12:15)]
x
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "l" "m" "n" "o"
I want to print the vector to a file in the following format, i.e. with 3 columns:
a b c
d e f
g h i
l m n
o
I can't find any direct way to print it in this way.
I tried to convert the vector to a matrix with 3 columns:
matrix(x, ncol = 3, byrow = TRUE)
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "d" "e" "f"
# [3,] "g" "h" "i"
# [4,] "l" "m" "n"
# [5,] "o" "a" "b" # <~~ "a" and "b" recycled
# Warning message:
# In matrix(x, ncol = 3, byrow = TRUE) :
# data length [13] is not a sub-multiple or multiple of the number of rows [5]
But this method recycles the last two values of x (a and b) at the end of the matrix and I don't want that.
We can use stri_list2matrix from stringi after splitting the vector ('v1') into a list ('lst') of groups of 3 successive elements. The grouping can be done with gl or with %/% (i.e. (seq_along(v1)-1)%/%3+1).
library(stringi)
lst <- split(v1, as.numeric(gl(length(v1), 3, length(v1))))
stri_list2matrix(lst, byrow=TRUE, fill='')
# [,1] [,2] [,3]
#[1,] "a" "b" "c"
#[2,] "d" "e" "f"
#[3,] "g" "h" "i"
#[4,] "l" "m" "n"
#[5,] "o" "" ""
Or, using base R, we can pad with NA those list elements that have fewer elements than the longest one.
t(sapply(lst, `length<-`, max(sapply(lst, length))))
data
v1 <- letters[c(1:9,12:15)]
Some possibilities:
m <- matrix(x, ncol = 3, byrow = TRUE)
# create a matrix of x indices of same dimension as "m"
# find duplicated indices, retain same dimensions using MARGIN = 0
# replace duplicated x indices with `""`
m[duplicated(matrix(seq_along(x), ncol = 3, byrow = TRUE), MARGIN = 0)] <- ""
m
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "d" "e" "f"
# [3,] "g" "h" "i"
# [4,] "l" "m" "n"
# [5,] "o" "" ""
Or:
# create row index
rr <- ceiling(seq_along(x)/3)
# create column index
cc <- ave(rr, rr, FUN = seq_along)
# create an empty matrix
m <- matrix("", nrow = max(rr), ncol = 3)
# fill the positions given by (rr, cc) with the values of x
m[cbind(rr, cc)] <- x
m
# [,1] [,2] [,3]
# [1,] "a" "b" "c"
# [2,] "d" "e" "f"
# [3,] "g" "h" "i"
# [4,] "l" "m" "n"
# [5,] "o" "" ""
Added benchmarks
a1 <- function(x){
  lst <- split(x, as.numeric(gl(length(x), 3, length(x))))
  stri_list2matrix(lst, byrow = TRUE, fill = '')
}
a2 <- function(x){
  lst <- split(x, as.numeric(gl(length(x), 3, length(x))))
  t(sapply(lst, `length<-`, max(sapply(lst, length))))
}
h1 <- function(x){
  m <- matrix(x, ncol = 3, byrow = TRUE)
  m[duplicated(matrix(seq_along(x), ncol = 3, byrow = TRUE), MARGIN = 0)] <- ""
  m
}
h2 <- function(x){
  rr <- ceiling(seq_along(x)/3)
  cc <- ave(rr, rr, FUN = seq_along)
  m <- matrix("", nrow = max(rr), ncol = 3)
  m[cbind(rr, cc)] <- x
  m
}
x <- sample(letters, 13, replace = TRUE)
library(microbenchmark)
microbenchmark(
  a1(x),
  a2(x),
  h1(x),
  h2(x),
  times = 5)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# a1(x) 135.708 140.270 171.5926 144.451 162.317 275.217 5 a
# a2(x) 270.276 270.655 277.2696 271.035 283.960 290.422 5 b
# h1(x) 191.968 217.436 246.4028 224.659 225.420 372.531 5 ab
# h2(x) 408.264 409.784 425.2176 411.685 412.445 483.910 5 c
x <- sample(letters, 1e6, replace = TRUE)
microbenchmark(
  a1(x),
  a2(x),
  h1(x),
  h2(x),
  times = 5)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# a1(x) 2406.03363 2411.56990 2554.72548 2606.9757 2623.604 2725.4443 5 b
# a2(x) 4266.47556 4292.69452 4510.61242 4513.6833 4653.349 4826.8594 5 c
# h1(x) 76.51557 76.79041 91.24598 78.8207 101.336 122.7672 5 a
# h2(x) 2419.89711 2570.22968 2636.08654 2663.2898 2751.662 2775.3540 5 b
Thus, for a small vector a1 is the fastest. For the larger vector h1 is roughly 30 times faster than the next fastest function, a1 (median of about 79 ms versus about 2607 ms).
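The answers above build the matrix but stop short of writing it to a file. One way to do that last step, sketched here with a temporary file standing in for the real output path, is to paste each group of 3 into one line and hand the lines to writeLines. The names rows, file_lines and out_file are only illustrative.

```r
x <- letters[c(1:9, 12:15)]

# group the elements three at a time; the last group is simply shorter
rows <- split(x, ceiling(seq_along(x) / 3))

# one space-separated line of text per group
file_lines <- vapply(rows, paste, character(1), collapse = " ")

# tempfile() is a placeholder for the real output path
out_file <- tempfile(fileext = ".txt")
writeLines(file_lines, out_file)

written <- readLines(out_file)
```

Since the short last group is pasted as it is, the final line holds just "o", with no padding or recycled values. Base R's write() also takes an ncolumns argument, so write(x, file = out_file, ncolumns = 3) may be a shorter route to a similar layout.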

Convert List to Vectors in R (1st vector comprises the 1st element of each element of the list, etc.)

I am wondering how to convert a list to vectors in R, where each vector consists of one element from each element of the list. In particular, I would like the 1st vector to consist of the 1st element of each element of the list, and the 2nd vector to consist of the 2nd element of each element of the list. More generally, I would like the nth vector to consist of the nth element of each element of the list. Thus, the number of vectors will equal the length of the longest element of the list.
For example, suppose we had:
mylist = list(c("a", "b"), c(character(0)), c(1, 2, 3))
I would like to create three vectors where
first_vector = c("a", NA, 1)
second_vector = c("b", NA, 2)
third_vector = c(NA, NA, 3)
As you can see in the above example, I may have additional complications due to missing values.
Thank you so much in advance for help!
-Vincent
Creating an endless number of objects in your global environment is usually bad practice, and since all your vectors are of the same length, you could just create one matrix instead:
indx <- max(lengths(mylist))
sapply(mylist, `length<-`, indx)
# [,1] [,2] [,3]
# [1,] "a" NA "1"
# [2,] "b" NA "2"
# [3,] NA NA "3"
You could also consider stri_list2matrix from the "stringi" package. Depending on the orientation desired:
library(stringi)
stri_list2matrix(mylist)
# [,1] [,2] [,3]
# [1,] "a" NA "1"
# [2,] "b" NA "2"
# [3,] NA NA "3"
stri_list2matrix(mylist, byrow = TRUE)
# [,1] [,2] [,3]
# [1,] "a" "b" NA
# [2,] NA NA NA
# [3,] "1" "2" "3"
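If separate vectors are still wanted, the rows of that matrix can be pulled out into a list, which avoids creating one global object per vector. A minimal sketch, with m and vectors as illustrative names:

```r
mylist <- list(c("a", "b"), character(0), c(1, 2, 3))

# pad every element to the length of the longest one, one column per list element
m <- sapply(mylist, `length<-`, max(lengths(mylist)))

# row i of 'm' holds the i-th element of every list element
vectors <- lapply(seq_len(nrow(m)), function(i) m[i, ])

first_vector <- vectors[[1]]
third_vector <- vectors[[3]]
```

Note that the numbers come back as the strings "1", "2" and "3", because an atomic vector or matrix in R can hold only one type. The same coercion happens in c("a", NA, 1) from the question.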

Resources