Combine multiple cells into one cell in R

I want to combine vectors of values, each currently saved as a row in a matrix, into single cells, with values separated by commas.
My current code creates random vectors.
For instance,
## Group 1
N <- 10
set.seed(06510)
grp1 <- t(replicate(N, sample(seq(1:4), 4, replace = FALSE)) )
The results look like
Table 1:
[,1] [,2] [,3] [,4]
[1,] 2 4 3 1
[2,] 4 2 1 3
[3,] 2 4 1 3
[4,] 1 4 3 2
[5,] 1 3 2 4
[6,] 2 1 3 4
[7,] 4 3 2 1
[8,] 4 1 3 2
[9,] 2 4 3 1
[10,] 1 4 2 3
But I want the results to look like:
Table 2:
[,1]
[1,] 2,4,3,1
[2,] 4,2,1,3
[3,] 2,4,1,3
[4,] 1,4,3,2
[5,] 1,3,2,4
[6,] 2,1,3,4
[7,] 4,3,2,1
[8,] 4,1,3,2
[9,] 2,4,3,1
[10,] 1,4,2,3
I'm creating a randomization table in which each cell represents the ordering of 4 survey questions for one survey respondent. Ultimately, I want to create multiple columns like the one above, so keeping 4 columns for every randomization item would make for a big, hard-to-read randomization table.

Your main problem is that you need to use the function I() to protect the list structure. Your second problem is that replicate() returns a matrix (because you have a set of equal-length vectors), whereas you need it to return a list. Set simplify = FALSE and note where the transpose operation t occurs...
grp1 <- replicate(N, t( sample(seq(1:4), 4, replace = FALSE ) ) , simplify = FALSE )
as.data.frame( I(grp1) )
# I(grp1)
#1 2, 4, 3, 1
#2 4, 2, 1, 3
#3 2, 4, 1, 3
#4 1, 4, 3, 2
#5 1, 3, 2, 4
#6 2, 1, 3, 4
#7 4, 3, 2, 1
#8 4, 1, 3, 2
#9 2, 4, 3, 1
#10 1, 4, 2, 3
# And just to check...
sapply( as.data.frame( I(grp1) ) , mode )
# I(grp1)
# "list"
However, I don't know why this is more useful to you than a plain old data.frame or, probably even better for your use case, a list of matrices.
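If you do keep the list structure, a minimal sketch of a list column looks like this. It assumes one integer vector per respondent; the names orders, df, q_order, and q_order_str are illustrative only.

```r
set.seed(06510)
N <- 10
# simplify = FALSE returns a list with one integer vector per respondent
orders <- replicate(N, sample(1:4), simplify = FALSE)

df <- data.frame(id = seq_len(N))
df$q_order <- I(orders)       # I() stores the whole list as one list column

df$q_order[[3]]               # still an integer vector, usable for indexing

# A comma-separated copy for display, if needed
df$q_order_str <- vapply(df$q_order, paste, character(1), collapse = ",")
```

The list column keeps the orderings numeric for later computation, while the string column is only for reading.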

Whatever you do, you will end up with character strings; I hope you are not surprised by that.
apply(grp1,1,paste,collapse=",")
gives you a character vector. You can turn that into a one-column matrix like this:
matrix(apply(grp1,1,paste,collapse=","),ncol=1)
See ?apply. apply() is enormously useful.
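Building on the apply() approach, here is a sketch of the multi-column randomization table the question ultimately wants: one character column per item, each cell a comma-separated ordering. The helper collapse_rows() and the names item1, item2, and rand_table are hypothetical, chosen for illustration.

```r
# Hypothetical helper: collapse each row of a matrix into one "a,b,c,d" string
collapse_rows <- function(m, sep = ",") {
  apply(m, 1, paste, collapse = sep)
}

set.seed(06510)
N <- 10
# Two randomization items, each an N x 4 matrix whose rows are permutations of 1:4
item1 <- t(replicate(N, sample(1:4)))
item2 <- t(replicate(N, sample(1:4)))

# One column per item instead of four
rand_table <- data.frame(item1 = collapse_rows(item1),
                         item2 = collapse_rows(item2),
                         stringsAsFactors = FALSE)

# strsplit() recovers the numeric ordering for a given respondent when needed
as.integer(strsplit(rand_table$item1[1], ",")[[1]])
```

Storing strings keeps the table compact to read, at the cost of needing strsplit() to get the numbers back.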

Related

Efficiently replicate matrix rows by group in R

I am trying to find a way to efficiently replicate rows of a matrix in R based on a group. Let's say I have the following matrix a:
a <- matrix(
c(1, 2, 3,
4, 5, 6,
7, 8, 9),
ncol = 3, byrow = TRUE
)
I want to create a new matrix where each row in a is repeated based on a number specified in a vector (what I'm calling a "group"), e.g.:
reps <- c(2, 3, 4)
In this case, the resulting matrix would be:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 4 5 6
[4,] 4 5 6
[5,] 4 5 6
[6,] 7 8 9
[7,] 7 8 9
[8,] 7 8 9
[9,] 7 8 9
This is the only solution I've come up with so far:
matrix(
rep(a, times = rep(reps, times = 3)),
ncol = 3, byrow = FALSE
)
Notice that this solution uses rep() twice: first to replicate the reps vector, and then again to replicate each row of a.
This works, but I'm looking for a more efficient solution: in my case it runs inside an optimization loop and is recomputed on every iteration, which is rather slow when a is large.
I'll note that a related question is very similar, but it is about repeating each row the same number of times. Another is also about efficiency, but it concerns replicating entire matrices.
UPDATE
Since I'm interested in efficiency, here is a simple comparison of the solutions provided so far. I'll update it as more come in, but in general the seq_along solution by F. Privé looks fastest.
library(dplyr)
library(tidyr)
a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)
rbenchmark::benchmark(
"original solution" = {
result <- matrix(rep(a, times = rep(reps, times = 3)),
ncol = 3, byrow = FALSE)
},
"seq_along" = {
result <- a[rep(seq_along(reps), reps), ]
},
"uncount" = {
result <- as.data.frame(a) %>%
uncount(reps)
},
replications = 1000,
columns = c("test", "replications", "elapsed", "relative")
)
test replications elapsed relative
1 original solution 1000 0.004 1.333
2 seq_along 1000 0.003 1.000
3 uncount 1000 1.722 574.000
Simply use a[rep(seq_along(reps), reps), ].
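A short sketch of why this works, plus one caveat. The row-index vector repeats each row number reps[i] times; adding drop = FALSE (an addition, not part of the answer) keeps the result a matrix even when a has a single column.

```r
a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

# Row indices 1 1 2 2 2 3 3 3 3: row i of a appears reps[i] times
idx <- rep(seq_len(nrow(a)), times = reps)
result <- a[idx, , drop = FALSE]

# The original double-rep() solution, with the column count made explicit
original <- matrix(rep(a, times = rep(reps, times = ncol(a))), ncol = ncol(a))

# Without drop = FALSE, a one-column matrix would collapse to a plain vector
b <- matrix(1:3, ncol = 1)
b_rep <- b[idx, , drop = FALSE]
```

Here seq_len(nrow(a)) and seq_along(reps) are the same vector, because reps has one entry per row of a.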
Another option with uncount
library(dplyr)
library(tidyr)
as.data.frame(a) %>%
uncount(reps)
-output
V1 V2 V3
1 1 2 3
2 1 2 3
3 4 5 6
4 4 5 6
5 4 5 6
6 7 8 9
7 7 8 9
8 7 8 9
9 7 8 9
Another base R option (not as elegant as the answers by @F. Privé or @akrun)
> t(do.call(cbind, mapply(replicate, reps, asplit(a, 1))))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
[3,] 4 5 6
[4,] 4 5 6
[5,] 4 5 6
[6,] 7 8 9
[7,] 7 8 9
[8,] 7 8 9
[9,] 7 8 9

Subset assignment of multidimensional array in R

I am trying to assign rows of a 3D array, but I don't know exactly how.
I have a 2D index array where each row gives the first and second index of the 3D array, and a 2D value array which I want to insert into the 3D array. The simplest way I found to do this was
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:20, 31:50, 71:90)
for (i in 1:nrow(indexes)) for (j in 1:3)
data[indexes[i,1], indexes[i,2], j] <- rows[i, j]
But this is hard to read, because it uses nested indexing, so I was hoping there was a simpler way, like
data[indexes,] <- rows
(this does not work)
What I've tried:
A related question shows how to index the array (without assignment)
apply(data, 3, `[`, indexes)
but this doesn't allow assignment
apply(data, 3, `[`, indexes) <- rows #: could not find function "apply<-"
nor does using [<- work:
apply(data, 3, `[<-`, indexes, rows)
because it treats rows as a vector.
Neither of the following works either
data[indexes[1], indexes[2],] <- rows #: subscript out of bounds
data[indexes,] <- rows #: incorrect number of subscripts on matrix
So is there a simpler way of assigning to a multidimensional array?
Your indexes variable implies that data has a first dimension of 30, but rows[30, j] doesn't exist. So your problem isn't well posed, and I'll change it.
The basic idea is that you can index a 3-way array by an n x 3 matrix. Each row of the matrix corresponds to a location in the 3-way array, so if you want to set entry data[1,2,3] to 4 and entry data[5,6,7] to 8, you'd use
index <- rbind(c(1,2,3), c(5,6,7))
data[index] <- c(4,8)
You will need to expand your indexes variable to replicate each row 3 times, then read the rows matrix as a vector, and then this works:
data <- array(NA, dim=c(30, 2, 3))
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:30, 31:60, 71:100)
indexes1 <- indexes[rep(1:nrow(indexes), each = 3),]
indexes2 <- cbind(indexes1, 1:3)
data[indexes2] <- t(rows) # Transpose because R reads down columns first
I don't think this is any simpler than what you had with the for loops, but maybe you'll find it preferable.
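The index-expansion trick can be wrapped in a small function and checked against the question's nested for loops. assign_rows() is a hypothetical name; it assumes data is a 3-way array whose third dimension equals ncol(rows).

```r
# Hypothetical helper: write each row of `rows` into data[idx[r, 1], idx[r, 2], ]
assign_rows <- function(data, idx, rows) {
  k <- dim(data)[3]
  # One (i, j, l) triple per value: repeat each idx row k times, pair with 1:k
  full_idx <- cbind(idx[rep(seq_len(nrow(idx)), each = k), , drop = FALSE],
                    rep(seq_len(k), times = nrow(idx)))
  data[full_idx] <- t(rows)   # t() so values run along the rows of `rows`
  data
}

data <- array(NA, dim = c(30, 2, 3))
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:30, 31:60, 71:100)
vectorised <- assign_rows(data, indexes, rows)

# Reference: the nested for loops from the question
looped <- data
for (i in seq_len(nrow(indexes))) for (j in 1:3)
  looped[indexes[i, 1], indexes[i, 2], j] <- rows[i, j]
```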
After reading @user2554330's answer, I found a slightly simpler solution:
# initialize as in user2554330's answer
data <- ...
indexes <- ...
rows <- ...
indexes3 <- as.matrix(merge(indexes, 1:3))
data[indexes3] <- rows
Comparison of indexes2 and indexes3 (using fewer elements):
# print(indexes2)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 2
[3,] 1 1 3
[4,] 2 2 1
[5,] 2 2 2
[6,] 2 2 3
[7,] 3 1 1
[8,] 3 1 2
[9,] 3 1 3
[10,] 4 2 1
[11,] 4 2 2
[12,] 4 2 3
# print(indexes3)
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 1
[3,] 3 1 1
[4,] 4 2 1
[5,] 1 1 2
[6,] 2 2 2
[7,] 3 1 2
[8,] 4 2 2
[9,] 1 1 3
[10,] 2 2 3
[11,] 3 1 3
[12,] 4 2 3
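The two index matrices list the same cells in different orders, so each needs its values in the matching order: t(rows) for the row-wise indexes2, and plain rows (read column by column) for indexes3. A quick check, reusing the setup from user2554330's answer:

```r
data <- array(NA, dim = c(30, 2, 3))
indexes <- cbind(1:30, rep(c(1, 2), 15))
rows <- cbind(1:30, 31:60, 71:100)

# Row-wise: each indexes row repeated 3 times, third index cycling 1, 2, 3
indexes2 <- cbind(indexes[rep(1:nrow(indexes), each = 3), ], 1:3)
# Cross join: all 30 rows with third index 1, then 2, then 3
indexes3 <- as.matrix(merge(indexes, 1:3))

d2 <- data
d2[indexes2] <- t(rows)   # values in row order of rows
d3 <- data
d3[indexes3] <- rows      # values in column-major order of rows
```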

Converting a list of lists into a data frame

I have a function that first generates a list of vectors (using lapply) and then cbinds it to a column vector. I thought this would produce a data frame, but instead it produces a list of lists.
The cbind function isn't working as I thought it would.
Here's a small example of what the function is generating
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2 , 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))
a_df <- cbind(col_test, lst_test)
Typing
> a_df[1,]
gives the output
$`col_test`
[1] 1
$lst_test
[1] 1 2 3
I'd like the data frame to be
[,1] [,2] [,3] [,4]
[1,] 1 1 2 3
[2,] 2 2 2 2
[3,] 1 1 1 2
[4,] 1 1 2 2
[5,] 2 1 1 1
How do I get it into this form?
data.frame(col_test,t(as.data.frame(lst_test)))
do.call(rbind, Map(c, col_test, lst_test))
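To see what the Map() answer does: Map(c, col_test, lst_test) pairs col_test[i] with lst_test[[i]] and concatenates them, giving one length-4 vector per row, which do.call(rbind, ...) then stacks. The column names in the data frame step are hypothetical.

```r
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2, 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))

# list(c(1, 1, 2, 3), c(2, 2, 2, 2), ...): one complete row per element
row_list <- Map(c, col_test, lst_test)
mat <- do.call(rbind, row_list)   # 5 x 4 numeric matrix

# If a data frame is preferred, name the columns explicitly
df <- setNames(as.data.frame(mat), c("col_test", "v1", "v2", "v3"))
```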
# [,1] [,2] [,3] [,4]
#[1,] 1 1 2 3
#[2,] 2 2 2 2
#[3,] 1 1 1 2
#[4,] 1 1 2 2
#[5,] 2 1 1 1
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2 , 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))
Name the sublists so we can use bind_rows() from dplyr:
library(dplyr)
names(lst_test) <- 1:length(lst_test)
lst_test1 <- bind_rows(lst_test)
In this case bind_rows() binds the vectors as columns, so we need to transpose the result:
lst_test_pivot <- t(lst_test1)
But this gives us a matrix, so we need to convert it back to a data frame:
lst_test_pivot_df <- as.data.frame(lst_test_pivot)
Now
cbind(col_test, lst_test_pivot_df)
produces
col_test V1 V2 V3
1 1 1 2 3
2 2 2 2 2
3 1 1 1 2
4 1 1 2 2
5 2 1 1 1
This should do the trick. Note that we use do.call so that the individual elements of lst_test are passed to cbind as separate arguments, which prevents cbind from creating a list of lists. t() transposes the resulting matrix to your preferred orientation, and finally one more cbind() adds col_test as the first column.
library(tidyverse)
mat.new <- do.call(cbind, lst_test) %>%
t %>%
cbind(col_test, .) %>%
unname
[,1] [,2] [,3] [,4]
[1,] 1 1 2 3
[2,] 2 2 2 2
[3,] 1 1 1 2
[4,] 1 1 2 2
[5,] 2 1 1 1
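The pipe is not required. A base R sketch of the same steps, plus a variant that uses do.call(rbind, ...) so that each list element is already a row and no t() is needed:

```r
col_test <- c(1, 2, 1, 1, 2)
lst_test <- list(c(1, 2, 3), c(2, 2, 2), c(1, 1, 2), c(1, 2, 2), c(1, 1, 1))

# Same steps as the pipe: bind as columns, transpose, prepend col_test, drop names
mat_cbind <- unname(cbind(col_test, t(do.call(cbind, lst_test))))

# rbind() stacks each vector as a row, so the transpose disappears
mat_rbind <- unname(cbind(col_test, do.call(rbind, lst_test)))
```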

All possible combinations of two vectors while keeping the order in R

I have a vector, say vec1, and another vector named vec2 as follows:
vec1 = c(4,1)
# [1] 4 1
vec2 = c(5,3,2)
# [1] 5 3 2
What I'm looking for is all possible combinations of vec1 and vec2 while the order of the vectors' elements is kept. That is, the resultant matrix should be like this:
> res
[,1] [,2] [,3] [,4] [,5]
[1,] 4 1 5 3 2
[2,] 4 5 1 3 2
[3,] 4 5 3 1 2
[4,] 4 5 3 2 1
[5,] 5 4 1 3 2
[6,] 5 4 3 1 2
[7,] 5 4 3 2 1
[8,] 5 3 4 1 2
[9,] 5 3 4 2 1
[10,] 5 3 2 4 1
# res=structure(c(4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 1, 5, 5, 5, 4, 4, 4,
# 3, 3, 3, 5, 1, 3, 3, 1, 3, 3, 4, 4, 2, 3, 3, 1, 2, 3, 1, 2, 1,
# 2, 4, 2, 2, 2, 1, 2, 2, 1, 2, 1, 1), .Dim = c(10L, 5L))
No repetition is allowed across the two vectors; that is, every row of the resulting matrix contains unique elements.
I'm looking for the most efficient way. One way to tackle this problem is to generate all permutations of length n (n = 5 here), whose number grows factorially, and then filter them. But that becomes time-consuming as n grows.
Is there an efficient way to do that?
Try this one:
nv1 <- length(vec1)
nv2 <- length(vec2)
n <- nv1 + nv2
result <- combn(n, nv1, function(v) {
  z <- integer(n); z[v] <- vec1; z[-v] <- vec2; z
})
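The idea is to produce all combinations of indices at which to put the elements of vec1; vec2 fills the remaining positions in order. combn() returns one arrangement per column, so t(result) gives the row layout of res.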
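The same combn() idea wrapped as a reusable function. interleave_all() is a hypothetical name; it returns one arrangement per row and, for this input, reproduces res. Only choose(n, length(vec1)) candidates are generated, instead of the factorial(n) permutations that the filtering approach enumerates.

```r
# Hypothetical helper: all order-preserving interleavings of x and y
interleave_all <- function(x, y) {
  n <- length(x) + length(y)
  # Each combination v gives the positions taken by x; y fills the rest
  t(combn(n, length(x), function(v) {
    z <- numeric(n)
    z[v] <- x
    z[-v] <- y
    z
  }))
}

res <- interleave_all(c(4, 1), c(5, 3, 2))
nrow(res)   # choose(5, 2) rows
```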
Not as elegant as Marat Talipov's solution, but you can do:
# get the ordering per vector
cc <- c(order(vec1, decreasing = TRUE), order(vec2, decreasing = TRUE) + length(vec1))
cc
[1] 1 2 3 4 5
# permutation to get all "order-combinations"
library(combinat)
m <- do.call(rbind, permn(cc))
# keep only permutations in which the indices from each vector stay in increasing order:
gr <- apply(m, 1, function(x){
!is.unsorted(x[x < (length(vec1)+1)]) & !is.unsorted(x[x > (length(vec1))])
})
# result, exchange the order index with the vector elements:
t(apply(m[gr, ], 1, function(x, y) y[x], c(vec1, vec2)))
[,1] [,2] [,3] [,4] [,5]
[1,] 4 1 5 3 2
[2,] 4 5 3 1 2
[3,] 4 5 3 2 1
[4,] 4 5 1 3 2
[5,] 5 4 1 3 2
[6,] 5 4 3 2 1
[7,] 5 4 3 1 2
[8,] 5 3 4 1 2
[9,] 5 3 4 2 1
[10,] 5 3 2 4 1

create a frequency matrix from a 4 dim matrix in R

I have a k x 4 matrix in which each row is one combination from (1:20, 1:20, 1:20, 1:20) and specifies the node types of a quadruplet. For example, for k = 3 I have 3 tetrahedra whose node types are:
X <- matrix(c(1, 3, 1 ,4,
2, 5, 6 ,1,
12,20,15 ,3), 3,4,byrow=T)
Now I want to create from it a 20 x 8000 frequency table that records how often each node type is in contact with each combination of the three remaining nodes. In other words, I want to know which node types each node in a quadruplet is in contact with.
For example, the first row should add a one at F[1, (1,3,4)], and also at F[3, (1,1,4)] and F[4, (1,1,3)].
I hope I have explained my problem clearly.
Please help me with the code for this conversion.
Note:
As the first row of my X matrix is 1,3,1,4, the output matrix F should be updated as follows:
F[1, which(colnames(F) == "1 3 4")] <- F[1, which(colnames(F) == "1 3 4")] + 1
F[1, which(colnames(F) == "1 3 4")] <- F[1, which(colnames(F) == "1 3 4")] + 1
F[3, which(colnames(F) == "1 1 4")] <- F[3, which(colnames(F) == "1 1 4")] + 1
F[4, which(colnames(F) == "1 1 3")] <- F[4, which(colnames(F) == "1 1 3")] + 1
That is, each row of X adds four ones to F, in the rows given by its four entries, and 2, 3 or 4 of these rows may coincide. For example, because 1 appears twice in the first row, F[1, which(colnames(F) == "1 3 4")] is incremented twice.
I'm not sure I understand, so this is a guess; and if I do understand, then you are not doing this correctly, because you are not ordering your triplets properly. I'm thinking the vector c(3,1,4) should be different from the vector c(1,3,4). Correct me if I'm wrong about that.
I thought trying to work with a 20^4 array was rather excessive so I constructed an input matrix that would fit in a 5^4 array:
X <- matrix(c(1, 3, 1 ,4,
2, 5, 2 ,1,
3, 2, 5 ,4), 3,4, byrow=T)
We produce the combinations of the 4 items taken three at a time from each row and arrange them in column-major fashion:
array( apply( X, 1, function(x) combn(x, 3) ), dim=c(3,4,3) )
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 1 1 3
[2,] 3 3 1 1
[3,] 1 4 4 4
, , 2
[,1] [,2] [,3] [,4]
[1,] 2 2 2 5
[2,] 5 5 2 2
[3,] 2 1 1 1
, , 3
[,1] [,2] [,3] [,4]
[1,] 3 3 3 2
[2,] 2 2 5 5
[3,] 5 4 4 4
I found an elementary answer to my question, but I don't think it is as fast as it should be.
For example, I have a 3 x 4 matrix and, for simplicity, I assume there are only 5 types.
To find the frequency table in this situation, I wrote the code below:
library(combinat)  # provides permn()
n <- 5
k <- nrow(X)
F <- matrix(0, n, n^3)
colnames(F) <- apply(expand.grid(1:n, 1:n, 1:n), 1, paste, collapse = " ")
for (i in 1:k) {
  for (j in 1:4) {
    # all orderings of the three other nodes in row i, as "a b c" strings
    per <- simplify2array(permn(X[i, -j]))
    pert_charac <- apply(per, 2, paste, collapse = " ")
    # column positions of those triplets; duplicated positions add 1 only once
    num <- match(pert_charac, colnames(F))
    F[X[i, j], num] <- F[X[i, j], num] + 1
  }
}
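Much of the work in the loop above goes into permn() and the string lookups in colnames(F). A sketch of a variant that avoids both, assuming the columns follow the expand.grid(1:n, 1:n, 1:n) order, where the first value varies fastest: the column of triplet (a, b, c) is then a + (b - 1) n + (c - 1) n^2, and the six orderings of three positions can be precomputed. col_index() and perm3 are hypothetical names, and no timing has been measured; unique() keeps the original loop's behavior of adding 1 only once when repeated nodes produce duplicate permutations.

```r
n <- 5   # number of node types
X <- matrix(c(1, 3, 1, 4,
              2, 5, 2, 1,
              3, 2, 5, 4), 3, 4, byrow = TRUE)

# The 6 orderings of three positions; fixed, so computed once
perm3 <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# Column of triplet (a, b, c) under expand.grid order (first value fastest)
col_index <- function(a, b, c, n) a + (b - 1) * n + (c - 1) * n^2

F <- matrix(0, n, n^3)
colnames(F) <- apply(expand.grid(1:n, 1:n, 1:n), 1, paste, collapse = " ")

for (i in seq_len(nrow(X))) {
  for (j in 1:4) {
    rest <- X[i, -j]
    # Row m of p is rest reordered by perm3[m, ]
    p <- matrix(rest[c(perm3)], ncol = 3)
    cols <- unique(col_index(p[, 1], p[, 2], p[, 3], n))
    F[X[i, j], cols] <- F[X[i, j], cols] + 1
  }
}
```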
