adding repeating pattern while converting list to matrix in R


I am looking for a fast way to convert a list into a matrix with an additional column containing a repeating 1:5 pattern. The list and the repeating pattern can get to thousands of values in length, so a fast approach would be ideal.
I can convert the list to a matrix using melt (which may not be ideal for large lists), but I am having trouble getting the repeating pattern to work.
The list mat looks like this:
mat
[[1]]
[1] 5
[[2]]
[1] 1 4 5
[[3]]
[1] 3 1
[[4]]
[1] 4 6 5 3
The output should contain the values of the list as well as an index column containing a repeating 1:5 pattern that depends on the length of each list element. For instance, mat[[4]] contains 4 values, so its index column should contain the values 1:4.
output
[,1] [,2]
5 1
1 1
4 2
5 3
3 1
1 2
4 1
6 2
5 3
3 4

mat <- list(5, c(1,4,5), c(3,1), c(4,6,5,3)) ## your example data
We can use basic operations:
cbind( unlist(mat), sequence(lengths(mat)) )
# [,1] [,2]
# [1,] 5 1
# [2,] 1 1
# [3,] 4 2
# [4,] 5 3
# [5,] 3 1
# [6,] 1 2
# [7,] 4 1
# [8,] 6 2
# [9,] 5 3
#[10,] 3 4
Alternatively,
cbind( unlist(mat), unlist(lapply(mat, seq_along)) )
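Both versions build the index column the same way: lengths() returns the length of each list element, and sequence() turns that vector of lengths into the concatenated runs 1:n for each n. A small check of the two pieces on the example list (the column names value and index are added here only for readability):

```r
mat <- list(5, c(1, 4, 5), c(3, 1), c(4, 6, 5, 3))

lengths(mat)            # 1 3 2 4
sequence(lengths(mat))  # 1 1 2 3 1 2 1 2 3 4

# Both vectors have the same total length, so cbind() lines them up row by row
out <- cbind(value = unlist(mat), index = sequence(lengths(mat)))
dim(out)                # 10 2
```

Because everything is vectorised, no per-element loop runs in R code, which is why this tends to scale well to long lists.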

Here is another option with Map. We get the sequence of each list element with lapply, cbind each list element with its sequence using Map, and rbind the results.
do.call(rbind, Map(cbind, mat, lapply(mat, seq_along)))
# [,1] [,2]
#[1,] 5 1
#[2,] 1 1
#[3,] 4 2
#[4,] 5 3
#[5,] 3 1
#[6,] 1 2
#[7,] 4 1
#[8,] 6 2
#[9,] 5 3
#[10,] 3 4
Or with data.table: we melt the list into a two-column data.frame (value, L1), convert it to a data.table with setDT and, after grouping by 'L1', overwrite 'L1' (:=) with the sequence within each group.
library(data.table)
library(reshape2)  # provides the melt() method for lists
setDT(reshape2::melt(mat))[, L1 := seq_len(.N), by = L1][]
# value L1
# 1: 5 1
# 2: 1 1
# 3: 4 2
# 4: 5 3
# 5: 3 1
# 6: 1 2
# 7: 4 1
# 8: 6 2
# 9: 5 3
#10: 3 4

Related

Get a grid of unique combinations from a vector

I'd like to return a grid with unique rows from a sequence vector. I'm looking for a general solution that works for a vector of any length. I don't know the terminology for this; how can I do it?
Example
num <- 3
v <- c(seq(1, num, 1))
Desired Output
1 2 3
2 3 1
3 1 2
Second and third column can be switched:
1 3 2
2 1 3
3 2 1
I tried manipulating expand.grid() but it requires sorting and filtering which seems excessive.

We can use permn from the combinat package, which generates all possible permutations of v, and then select the first num of them using head:
head(as.data.frame(do.call(rbind, combinat::permn(v))), num)
# V1 V2 V3
#1 1 2 3
#2 1 3 2
#3 3 1 2
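We can also use sample to pick num random rows instead of the first num rows. Note that this only guarantees distinct rows: unlike the desired output, the columns need not contain each value exactly once (in the output above, column 1 contains 1, 1, 3).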
where
combinat::permn(v) #gives
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 1 3 2
#[[3]]
#[1] 3 1 2
#[[4]]
#[1] 3 2 1
#[[5]]
#[1] 2 3 1
#[[6]]
#[1] 2 1 3

Here's one solution (column order differs but the idea holds):
n = 3
sweep(replicate(n, 1:n), 2, 1:n, "+") %% n + 1
[,1] [,2] [,3]
[1,] 3 1 2
[2,] 1 2 3
[3,] 2 3 1
Explanation:
replicate will first create a matrix where each column is 1:n (so row i is filled with i):
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
I then use the sweep function to add 1 to column 1, 2 to column 2, 3 to column 3:
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 3 4 5
[3,] 4 5 6
At this point, you can take each entry modulo n and then add 1 to arrive at the desired matrix.
Edit: If you need the same column order as above (first row 1, 2, ..., n), you can do the following; subtracting 2 keeps the first row equal to 1:n for any n:
(sweep(replicate(n, 1:n), 2, 1:n, "+") - 2) %% n + 1
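The replicate() plus sweep() step only builds the matrix of sums i + j. As a sketch (outer() is swapped in here; it is not part of the original answer), the same modular construction can be written with outer(), and the resulting index matrix can then pick elements from an arbitrary vector v. The helper name cyclic_grid is hypothetical.

```r
# Hypothetical helper: cyclic Latin square built from the elements of v.
# Row i is v rotated left by i - 1 positions.
cyclic_grid <- function(v) {
  n <- length(v)
  # idx[i, j] = ((i - 1) + (j - 1)) %% n + 1, the same modular arithmetic as the sweep() answer
  idx <- outer(seq_len(n) - 1, seq_len(n) - 1, "+") %% n + 1
  # use idx as plain positions into v, then refill an n x n matrix column by column
  matrix(v[as.vector(idx)], nrow = n, ncol = n)
}

cyclic_grid(1:3)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    2    3    1
# [3,]    3    1    2

cyclic_grid(c("a", "b", "c"))
```

Indexing v by position rather than doing arithmetic on its values means the same helper works for character or non-consecutive vectors.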

Another base R option
t(sapply(1:length(v), function(i) rep(v, 2)[i:(i+2)]))
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 2 3 1
#[3,] 3 1 2
Explanation: sapply stores each cyclic shift of v as a column of a matrix, and t() turns those columns into rows.
For general v (of length length(v)) this becomes
t(sapply(1:length(v), function(i) rep(v, 2)[i:(i + length(v) - 1)]))

How to stop recycling for uneven row lengths in r [duplicate]

I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and tried to combine them using do.call(cbind, ...):
nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 1
# [7,] 7 3 2
# [8,] 8 4 3
# Warning message:
# In (function (..., deparse.level = 1) :
# number of rows of result is not a multiple of vector length (arg 2)
As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.
Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 NA
# [7,] 7 NA NA
# [8,] 8 NA NA
How can I go about doing this?

You can use indexing: if you index a vector beyond its length, R returns NA. This works for any number of rows, given by foo:
nm <- list(1:8,3:8,1:5)
foo <- 8
sapply(nm, '[', 1:foo)
EDIT:
Or in one line using the largest vector as number of rows:
sapply(nm, '[', seq(max(sapply(nm,length))))
From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):
sapply(nm, '[', seq(max(lengths(nm))))
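The trick relies on `[` returning NA for positions past the end of a vector. A minimal check of that behaviour, plus the one-liner applied to the example list:

```r
x <- 1:5
x[1:8]   # positions 6 to 8 do not exist, so they come back as NA
# [1]  1  2  3  4  5 NA NA NA

nm <- list(1:8, 3:8, 1:5)
# sapply() calls `[`(element, 1:8) on each element and binds the results as columns
padded <- sapply(nm, `[`, seq_len(max(lengths(nm))))
dim(padded)              # 8 3
colSums(is.na(padded))   # 0 2 3
```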

You can pad the vectors with NA before calling do.call:
nm <- list(1:8,3:8,1:5)
max_length <- max(unlist(lapply(nm, length)))
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)  # start with an all-NA vector
  ans[seq_along(x)] <- x                   # overwrite the leading positions with x
  ans
})
do.call(cbind, nm_filled)

This is a shorter version of Wojciech's solution.
nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
c(x, rep(NA, max_length - length(x)))
})
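A related base R idiom, swapped in here rather than taken from the answers above: assigning a larger length with `length<-` extends a vector with NA, so the padding step does not need a hand-built NA vector. A minimal sketch on the same list:

```r
nm <- list(1:8, 3:8, 1:5)
max_length <- max(lengths(nm))

# `length<-`(x, n) returns x extended with NA up to length n
nm_filled <- lapply(nm, `length<-`, max_length)
do.call(cbind, nm_filled)
```

The vectors keep their original type (integer here), since the padding uses NA of the matching type.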

Here is an option using stri_list2matrix from stringi
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'  # stri_list2matrix() returns a character matrix; convert it to numbers
out
# [,1] [,2] [,3]
#[1,] 1 3 1
#[2,] 2 4 2
#[3,] 3 5 3
#[4,] 4 6 4
#[5,] 5 7 5
#[6,] 6 8 NA
#[7,] 7 NA NA
#[8,] 8 NA NA

Late to the party, but you could use cbind.fill from the rowr package with fill = NA:
library(rowr)
do.call(cbind.fill, c(nm, fill = NA))
# object object object
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
If you have a named list instead and want to keep the names as column headers, you could use setNames:
nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))
# a b c
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA

Generate all permutations of a matrix in R

I have a matrix M given by the following:
M <- matrix(1:6, nrow=2, byrow=TRUE)
1 2 3
4 5 6
and I wish to generate all possible permutations for this matrix as a list. After reading Generating all distinct permutations of a list in R, I've tried using
library(combinat)
permn(M)
but this gives me all the permutations as a single row, and not the 2 x 3 matrix I had originally.
So what I get is something like
[[1]]
[1] 1 4 2 5 3 6
[[2]]
[1] 1 4 2 5 6 3
[[3]]
[1] 1 4 2 6 5 3
...
[[720]]
[1] 4 1 2 5 3 6
But what I want is to keep the first and second rows distinct from each other so it would be a list that looks more like the following:
[[1]]
1 2 3
4 5 6
[[2]]
1 3 2
4 5 6
[[3]]
2 3 1
5 4 6
...
until I get all possible combinations of M. Is there a way to do this in R?
Thank you!

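How about using expand.grid to pair every permutation of the first row with every permutation of the second row?
library(combinat)
M <- matrix(1:6, nrow=2, byrow=TRUE)
pcM <- permn(ncol(M))  # all orderings of the column indices 1:ncol(M)
expP <- expand.grid(seq_along(pcM), seq_along(pcM))  # every (row 1, row 2) pair of orderings
Map(
  function(a, b) rbind( M[1, pcM[[a]]], M[2, pcM[[b]]] ),  # permute each row independently
  expP[,1],
  expP[,2]
)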
#[[1]]
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 4 5 6
#
#...
#
#[[36]]
# [,1] [,2] [,3]
#[1,] 2 1 3
#[2,] 5 4 6
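The answer above handles exactly two rows. A hedged generalisation to any number of rows is sketched below. It swaps combinat::permn() for a small base R helper so it runs without extra packages; both perms_of and row_permutations are hypothetical names. Each row of M is permuted independently, so a matrix with r rows and k columns yields (k!)^r results (36 for the 2 x 3 example).

```r
# Hypothetical helper: all permutations of 1:n, as a list of vectors
perms_of <- function(n) {
  if (n <= 1) return(list(seq_len(n)))
  out <- list()
  for (p in perms_of(n - 1)) {
    # insert n at every possible position of each permutation of 1:(n - 1)
    for (pos in 0:(n - 1)) {
      out[[length(out) + 1]] <- append(p, n, after = pos)
    }
  }
  out
}

# Hypothetical helper: every matrix obtained by permuting each row of M
# independently (values never move to another row)
row_permutations <- function(M) {
  perms <- perms_of(ncol(M))
  # one grid column per row of M; each grid row picks one ordering per matrix row
  grid <- expand.grid(rep(list(seq_along(perms)), nrow(M)))
  lapply(seq_len(nrow(grid)), function(k) {
    rows <- lapply(seq_len(nrow(M)), function(r) M[r, perms[[grid[k, r]]]])
    do.call(rbind, rows)
  })
}

M <- matrix(1:6, nrow = 2, byrow = TRUE)
res <- row_permutations(M)
length(res)   # 36 = 3! * 3!
```

The list grows as (k!)^r, so this is only practical for small matrices.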

apply a function on each row with different arguments

I have a data.table DT like the one below:
col1 col2 col3 col4 col5
1: 1 2 3 4 5
2: 4 5 6 8 9
3: 3 4 4 5 5
4: 4 3 5 3 3
5: 4 5 6 6 67
I want to count the unique values in certain columns for each row (for each row I want to use different columns when counting unique values).
How do I achieve this in the minimum number of steps possible? The table is huge, so a for loop is out of the question.
I am looking for a solution like
DT[ , count_unique := apply(DT[ , cols, with = FALSE], 1, function(x) { length(unique(x)) })]
But this will fail, since "cols" will need to take different columns for each row.
Any help will be appreciated.

I think this is easiest to do with matrices, which support subsetting by a matrix of (row, column) indices (which, incidentally, inspired the data.table join syntax).
Let's say this is your data:
m = matrix(c(1:4, 1,3,2,2, 1,2,3,3), ncol = 3)
# [,1] [,2] [,3]
#[1,] 1 1 1
#[2,] 2 3 2
#[3,] 3 2 3
#[4,] 4 2 3
And let's say you want to count unique values across all columns for rows 1 and 2, and across the first and last columns only for rows 3 and 4. You can represent this as follows:
cols = matrix(c(1,1, 1,2, 1,3,
                2,1, 2,2, 2,3,
                3,1, 3,3,
                4,1, 4,3), ncol = 2, byrow = TRUE)
# [,1] [,2]
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 2 1
# [5,] 2 2
# [6,] 2 3
# [7,] 3 1
# [8,] 3 3
# [9,] 4 1
#[10,] 4 3
The result you want is then easy to compute:
tapply(m[cols], cols[,1], function(x) length(unique(x)))
#1 2 3 4
#1 2 1 2
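In practice the per-row column choices often start out as a list, one vector of column numbers per row. A minimal sketch assuming that input shape (col_list is a hypothetical name) builds the two-column index matrix with rep() and lengths() and then applies the same tapply() step. For a data.table whose columns are all numeric, as.matrix(DT) would give the matrix m.

```r
m <- matrix(c(1:4, 1, 3, 2, 2, 1, 2, 3, 3), ncol = 3)

# Hypothetical input: the columns to use for each row, one vector per row
col_list <- list(1:3, 1:3, c(1, 3), c(1, 3))

# Two-column (row, column) index matrix: row i is repeated once per chosen column
cols <- cbind(rep(seq_along(col_list), lengths(col_list)), unlist(col_list))

# m[cols] picks one element per (row, column) pair; tapply() then groups by row
n_unique <- tapply(m[cols], cols[, 1], function(x) length(unique(x)))
n_unique
# 1 2 3 4
# 1 2 1 2
```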

Comparing coordinates between two matrices in r and outputting ones which are the same

I have two data frames, each consisting of rows of x, y, z coordinates.
The data frames have different lengths.
I would like to use one data frame as a reference and search the other for any coordinates that match in all three positions.
I would then like these coordinates to be written to another data frame.
For example,
data frame one:
[1,] 1 2 3
[2,] 2 3 3
[3,] 1 2 4
[4,] 4 2 5
data frame two:
[1,] 3 2 3
[2,] 1 1 2
[3,] 2 3 3
[4,] 1 2 3
and I would like this to return
[1,] 2 3 3
[2,] 1 2 3
the ones that match
That is, I want to compare each row against all rows of the other data frame, not just against the row with the same number.

You can use intersect from dplyr
library(dplyr)
intersect(as.data.frame(m1) , as.data.frame(m2))
# V1 V2 V3
#1 2 3 3
#2 1 2 3
Or, as long as neither matrix contains duplicated rows of its own, you can use
mNew <- rbind(m1,m2)
mNew[duplicated(mNew),]
# [,1] [,2] [,3]
#[1,] 2 3 3
#[2,] 1 2 3
data
m1 <- matrix(c(1,2,1,4, 2,3,2,2, 3,3,4,5), ncol=3)
m2 <- matrix(c(3,1,2,1,2,1,3,2,3,2,3,3), ncol=3)
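If dplyr is not available, here is a base R sketch of the same row matching (a technique swapped in here, not taken from the answers above): turn each row into a single string key and keep the rows of m2 whose key also occurs in m1. The row_key helper is hypothetical. Unlike the duplicated() approach, rows repeated only within m2 are not flagged. This is intended for whole-number coordinates like these; for non-integer values, the string conversion may not agree with exact numeric equality.

```r
m1 <- matrix(c(1,2,1,4, 2,3,2,2, 3,3,4,5), ncol = 3)
m2 <- matrix(c(3,1,2,1, 2,1,3,2, 3,2,3,3), ncol = 3)

# Hypothetical helper: one "x_y_z" string per row
row_key <- function(m) apply(m, 1, paste, collapse = "_")

# Keep the rows of m2 whose coordinates occur anywhere in m1 (m2's row order is preserved)
common <- m2[row_key(m2) %in% row_key(m1), , drop = FALSE]
common
#      [,1] [,2] [,3]
# [1,]    2    3    3
# [2,]    1    2    3
```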