Consider a list whose elements have mixed classes, like the one boxplot returns. I want to concatenate the corresponding elements of two such lists, in effect stacking each pair of elements horizontally.
(I clicked all of the "similar questions" and searched and am not aware of a base function to do this, modifyList being similar but not exactly what I want. I also looked quickly through the package rlist, but nothing struck me as similar. Also this question/answer is similar but only works for vectors)
f <- function(x) boxplot(mpg ~ vs, data = x, plot = FALSE)
(bp1 <- f(mtcars[mtcars$vs == 0, ]))
# $stats
# [,1]
# [1,] 10.40
# [2,] 14.70
# [3,] 15.65
# [4,] 19.20
# [5,] 21.00
#
# $n
# [1] 18
#
# $conf
# [,1]
# [1,] 13.97416
# [2,] 17.32584
#
# $out
# [1] 26
#
# $group
# [1] 1
#
# $names
# [1] "0"
(bp2 <- f(mtcars[mtcars$vs == 1, ]))
# $stats
# [,1]
# [1,] 17.8
# [2,] 21.4
# [3,] 22.8
# [4,] 30.4
# [5,] 33.9
#
# $n
# [1] 14
#
# $conf
# [,1]
# [1,] 18.99955
# [2,] 26.60045
#
# $out
# numeric(0)
#
# $group
# numeric(0)
#
# $names
# [1] "1"
The idea is to combine the two lists above into what one would get having simply done the following:
(bp <- f(mtcars))
# $stats
# [,1] [,2]
# [1,] 10.40 17.8
# [2,] 14.70 21.4
# [3,] 15.65 22.8
# [4,] 19.20 30.4
# [5,] 21.00 33.9
#
# $n
# [1] 18 14
#
# $conf
# [,1] [,2]
# [1,] 13.97416 18.99955
# [2,] 17.32584 26.60045
#
# $out
# [1] 26
#
# $group
# [1] 1
#
# $names
# [1] "0" "1"
This function seems to get the job done but is simple, so it can probably be broken easily.
cList <- function (x, y) {
islist <- function(x) inherits(x, 'list')
get_fun <- function(x, y)
switch(class(if (is.null(x)) y else x)[1L], # a matrix has class c("matrix", "array") in R >= 4.0
matrix = cbind,
data.frame = function(x, y)
do.call('cbind.data.frame', Filter(Negate(is.null), list(x, y))),
factor = function(...) unlist(list(...)), c)
stopifnot(islist(x), islist(y))
nn <- names(rapply(c(x, y), names, how = 'list'))
if (is.null(nn) || any(!nzchar(nn)))
stop('All non-NULL list elements should have unique names', domain = NA)
nn <- unique(c(names(x), names(y)))
z <- setNames(vector('list', length(nn)), nn)
for (ii in nn)
z[[ii]] <- if (islist(x[[ii]]) && islist(y[[ii]]))
Recall(x[[ii]], y[[ii]]) else
(get_fun(x[[ii]], y[[ii]]))(x[[ii]], y[[ii]])
z
}
f <- function(x) boxplot(mpg ~ vs, data = x, plot = FALSE)
bp1 <- f(mtcars[mtcars$vs == 0, ])
bp2 <- f(mtcars[mtcars$vs == 1, ])
bp <- f(mtcars)
identical(cList(bp1, bp2), bp)
# [1] TRUE
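For the flat case shown above (no nested lists, both lists having the same names in the same order), a much shorter sketch with Map should behave similarly. The helper name combine_flat is hypothetical and not part of the original function; it assumes matrices should be column-bound and everything else concatenated with c.

```r
# Hypothetical helper: combine two flat, identically named lists element-wise.
combine_flat <- function(x, y) {
  stopifnot(identical(names(x), names(y)))
  # Matrices are stacked side by side; all other elements are concatenated.
  Map(function(a, b) if (is.matrix(a)) cbind(a, b) else c(a, b), x, y)
}

f <- function(x) boxplot(mpg ~ vs, data = x, plot = FALSE)
bp1 <- f(mtcars[mtcars$vs == 0, ])
bp2 <- f(mtcars[mtcars$vs == 1, ])
bp12 <- combine_flat(bp1, bp2)
dim(bp12$stats)  # one column of statistics per group
bp12$n           # one count per group
```

Unlike cList, this sketch does not recurse and does not handle data frames, factors, or lists with differing names.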
It also works on nested lists and on lists that do not have the same elements in the same order. The caveat is that the lists must be named; otherwise the function doesn't know which elements to concatenate.
l0 <- list(x = 1:5, y = matrix(1:4, 2), z = head(cars), l = list(1:5))
l1 <- list(x = factor(1:5), y = matrix(1:4, 2), z = head(cars), l = list(zz = 1:5))
l2 <- list(z = head(cbind(cars, cars)), x = factor('a'), l = list(zz = 6:10))
cList(l0, l2) ## should throw error
cList(l1, l2)
# $x
# [1] 1 2 3 4 5 a
# Levels: 1 2 3 4 5 a
#
# $y
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# $z
# speed dist speed dist speed dist
# 1 4 2 4 2 4 2
# 2 4 10 4 10 4 10
# 3 7 4 7 4 7 4
# 4 7 22 7 22 7 22
# 5 8 16 8 16 8 16
# 6 9 10 9 10 9 10
#
# $l
# $l$zz
# [1] 1 2 3 4 5 6 7 8 9 10
Update: a new version that can rbind or cbind rectangular objects (matrices, data frames):
cList <- function(x, y, how = c('cbind', 'rbind')) {
if (missing(y))
return(x)
how <- match.arg(how)
islist <- function(x) inherits(x, 'list')
get_fun <- function(x, y)
switch(class(if (is.null(x)) y else x)[1L], # a matrix has class c("matrix", "array") in R >= 4.0
matrix = match.fun(how),
data.frame = function(x, y)
do.call(sprintf('%s.data.frame', how),
Filter(Negate(is.null), list(x, y))),
factor = function(...) unlist(list(...)), c)
stopifnot(islist(x), islist(y))
nn <- names(rapply(c(x, y), names, how = 'list'))
if (is.null(nn) || any(!nzchar(nn)))
stop('All non-NULL list elements should have unique names', domain = NA)
nn <- unique(c(names(x), names(y)))
z <- setNames(vector('list', length(nn)), nn)
for (ii in nn)
z[[ii]] <- if (islist(x[[ii]]) && islist(y[[ii]]))
Recall(x[[ii]], y[[ii]], how) else
(get_fun(x[[ii]], y[[ii]]))(x[[ii]], y[[ii]])
z
}
What is the fastest approach to saving unique values that go into matrix multiplication (without 0)?
For example, if I have a data.table object
library(data.table)
A = data.table(j3=c(3,0,3),j5=c(0,5,5),j7=c(0,7,0),j8=c(8,0,8))
I would like to see which unique values go into A*transpose(A) (or as.matrix(A) %*% as.matrix(t(A))). Right now, I can do it using for loops as:
B=t(A)
L = list()
models = c('A1','A2','A3')
for(i in 1:nrow(A)){
for(j in 1:ncol(B)){
u = union(unlist(A[i,]),B[,j])
u = u[u!=0] # remove 0
L[[paste(models[i],models[j])]]= u
}
}
However, is there a faster and more RAM-efficient way? The output doesn't have to be a list object, as in my case, it can be a data.table (data.frame) as well. Also, the order of values is not important. For example, 3 5 8 is as good as 5 3 8, 8 5 3 etc.
Any help is appreciated.
EDIT: So as.matrix(A) %*% as.matrix(t(A)) is:
[,1] [,2] [,3]
[1,] 73 0 73
[2,] 0 74 25
[3,] 73 25 98
The first element is calculated as 3*3+0*0+0*0+8*8 = 73, the second element is 3*0+0*5+0*7+8*0 = 0, etc. I need unique numbers that go to this calculation but without 0.
Therefore outputs (saved in the list L) are:
> L
$`A1 A1`
[1] 3 8
$`A1 A2`
[1] 3 8 5 7
$`A1 A3`
[1] 3 8 5
$`A2 A1`
[1] 5 7 3 8
$`A2 A2`
[1] 5 7
$`A2 A3`
[1] 5 7 3 8
$`A3 A1`
[1] 3 5 8
$`A3 A2`
[1] 3 5 8 7
$`A3 A3`
[1] 3 5 8
Once again, the output doesn't have to be a list object. I would prefer data.table if it is doable. Is it possible to rewrite my approach as Rcpp function?
Potential optimizations
Following up on @user2554330's answer, note that if A is an m-by-n matrix, then AAT = A %*% t(A) (equivalently tcrossprod(A)) is an m-by-m symmetric matrix. AAT[i, j] and AAT[j, i] are computed using the same entries of A, so you only need to inspect m*(m+1)/2 pairs of rows of A, not m*m.
You can do even better by finding and caching the unique elements of each row before pairing them. Preprocessing in this way avoids redundant computation and should noticeably improve performance when m << n.
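As a small illustration of both points, here is a sketch on the 3-by-4 matrix from the question. The variable names (row_u, pairs, res) are only for illustration, and union is used for brevity rather than speed.

```r
# The example matrix from the question (rows A1, A2, A3).
A <- rbind(c(3, 0, 0, 8), c(0, 5, 7, 0), c(3, 5, 0, 8))
m <- nrow(A)

isSymmetric(tcrossprod(A))  # A %*% t(A) is symmetric

# Cache the unique non-zero entries of each row once.
row_u <- lapply(seq_len(m), function(i) unique(A[i, A[i, ] != 0]))

# Visit only the pairs with i <= j: m * (m + 1) / 2 of them instead of m * m.
pairs <- which(upper.tri(matrix(0, m, m), diag = TRUE), arr.ind = TRUE)
res <- lapply(seq_len(nrow(pairs)), function(k)
  union(row_u[[pairs[k, 1L]]], row_u[[pairs[k, 2L]]]))

nrow(pairs)  # number of row pairs actually inspected
```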
Limitations
Another aspect of the problem is how unique works under the hood. unique has an argument nmax that you can use to specify an expected maximum number of unique elements. From ?duplicated:
Except for factors, logical and raw vectors the default nmax = NA is equivalent to nmax = length(x). Since a hash table of size 8*nmax bytes is allocated, setting nmax suitably can save large amounts of memory. For factors it is automatically set to the smaller of length(x) and the number of levels plus one (for NA). If nmax is set too small there is liable to be an error: nmax = 1 is silently ignored.
Long vectors are supported for the default method of duplicated, but may only be usable if nmax is supplied.
These comments apply to unique as well. Since you have a 300-by-4e+07 matrix, you would be evaluating (with preprocessing):
unique(<4e+07-length vector>), 300 times,
unique(<up to 8e+07-length vector>), 299*300/2 times.
That can consume a lot of memory if you don't know anything about your matrix that might allow you to set nmax. And it can take a long time if you don't have access to many CPUs.
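For example, if you happened to know that a vector holds only a handful of distinct values, you could pass a small nmax. This is only a sketch: the bound of 16 is an assumption made up for this toy vector, and a bound that is too small is liable to cause an error, as the documentation warns.

```r
set.seed(1L)
x <- sample(0:5, size = 1e5, replace = TRUE)  # at most 6 distinct values

u_default <- unique(x)             # hash table sized from length(x)
u_capped  <- unique(x, nmax = 16L) # hash table sized from the supplied bound

identical(u_default, u_capped)
```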
So I agree with comments asking you to consider why you need to do this at all and whether your underlying problem has a nicer solution.
Two answers
FWIW, here are two approaches to your general problem that actually take advantage of symmetry. f and g are without and with preprocessing. [[.utri allows you to extract elements from the return value, an m*(m+1)/2-length list, as if it were an m-by-m matrix. as.matrix.utri constructs the full, symmetric m-by-m list matrix.
f <- function(A, nmax = NA) {
a <- seq_len(nrow(A))
J <- cbind(sequence(a), rep.int(a, a))
FUN <- function(i) {
if (i[1L] == i[2L]) {
x <- A[i[1L], ]
} else {
x <- c(A[i[1L], ], A[i[2L], ])
}
unique.default(x[x != 0], nmax = nmax)
}
res <- apply(J, 1L, FUN, simplify = FALSE)
class(res) <- "utri"
res
}
g <- function(A, nmax = NA) {
l <- lapply(asplit(A, 1L), function(x) unique.default(x[x != 0], nmax = nmax))
a <- seq_along(l)
J <- cbind(sequence(a), rep.int(a, a))
FUN <- function(i) {
if (i[1L] == i[2L]) {
l[[i[1L]]]
} else {
unique.default(c(l[[i[1L]]], l[[i[2L]]]))
}
}
res <- apply(J, 1L, FUN, simplify = FALSE)
class(res) <- "utri"
res
}
`[[.utri` <- function(x, i, j) {
stopifnot(length(i) == 1L, length(j) == 1L)
class(x) <- NULL
if (i <= j) {
x[[i + (j * (j - 1L)) %/% 2L]]
} else {
x[[j + (i * (i - 1L)) %/% 2L]]
}
}
as.matrix.utri <- function(x) {
p <- length(x)
n <- as.integer(round(0.5 * (-1 + sqrt(1 + 8 * p))))
i <- rep.int(seq_len(n), n)
j <- rep.int(seq_len(n), rep.int(n, n))
r <- i > j
ir <- i[r]
i[r] <- j[r]
j[r] <- ir
res <- x[i + (j * (j - 1L)) %/% 2L]
dim(res) <- c(n, n)
res
}
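The indexing in [[.utri and as.matrix.utri relies on the pair (i, j) with i <= j being stored at position i + j*(j-1)/2 of the list. A quick sketch to check that this agrees with the order in which f and g enumerate the pairs (m = 4 is an arbitrary choice):

```r
m <- 4L
a <- seq_len(m)
# Same enumeration as in f() and g(): all pairs (i, j) with i <= j.
J <- cbind(i = sequence(a), j = rep.int(a, a))

# Position of each pair in the packed upper-triangular list.
k <- J[, "i"] + (J[, "j"] * (J[, "j"] - 1L)) %/% 2L

all(k == seq_len((m * (m + 1L)) %/% 2L))  # positions run 1, 2, ..., m*(m+1)/2
```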
Here is a simple test on a 4-by-4 integer matrix:
mkA <- function(m, n) {
A <- sample(0:(n - 1L), size = as.double(m) * n, replace = TRUE,
prob = rep.int(c(n - 1, 1), c(1L, n - 1L)))
dim(A) <- c(m, n)
A
}
set.seed(1L)
A <- mkA(4L, 4L)
A
## [,1] [,2] [,3] [,4]
## [1,] 0 0 2 3
## [2,] 0 1 0 0
## [3,] 2 1 0 3
## [4,] 1 2 0 0
identical(f(A), gA <- g(A))
## [1] TRUE
gA[[1L, 1L]] # used for 'tcrossprod(A)[1L, 1L]'
## [1] 2 3
gA[[1L, 2L]] # used for 'tcrossprod(A)[1L, 2L]'
## [1] 2 3 1
gA[[2L, 1L]] # used for 'tcrossprod(A)[2L, 1L]'
## [1] 2 3 1
gA # under the hood, an 'm*(m+1)/2'-length list
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 2 3 1
##
## [[3]]
## [1] 1
##
## [[4]]
## [1] 2 3 1
##
## [[5]]
## [1] 1 2 3
##
## [[6]]
## [1] 2 1 3
##
## [[7]]
## [1] 2 3 1
##
## [[8]]
## [1] 1 2
##
## [[9]]
## [1] 2 1 3
##
## [[10]]
## [1] 1 2
##
## attr(,"class")
## [1] "utri"
mgA <- as.matrix(gA) # the full, symmetric, 'm'-by-'m' list matrix
mgA
## [,1] [,2] [,3] [,4]
## [1,] integer,2 integer,3 integer,3 integer,3
## [2,] integer,3 1 integer,3 integer,2
## [3,] integer,3 integer,3 integer,3 integer,3
## [4,] integer,3 integer,2 integer,3 integer,2
mgA[1L, ] # used for first row of 'tcrossprod(A)'
## [[1]]
## [1] 2 3
##
## [[2]]
## [1] 2 3 1
##
## [[3]]
## [1] 2 3 1
##
## [[4]]
## [1] 2 3 1
## If you need names
dimnames(mgA) <- rep.int(list(sprintf("A%d", seq_len(nrow(mgA)))), 2L)
mgA["A1", ]
## $A1
## [1] 2 3
##
## $A2
## [1] 2 3 1
##
## $A3
## [1] 2 3 1
##
## $A4
## [1] 2 3 1
## If you need an 'm'-by-'m' 'data.table' result
DT <- data.table::as.data.table(mgA)
DT
## A1 A2 A3 A4
## 1: 2,3 2,3,1 2,3,1 2,3,1
## 2: 2,3,1 1 1,2,3 1,2
## 3: 2,3,1 1,2,3 2,1,3 2,1,3
## 4: 2,3,1 1,2 2,1,3 1,2
And here are two benchmarks on two large integer matrices, showing that preprocessing can help quite a bit:
set.seed(1L)
A <- mkA(100L, 1e+04L)
microbenchmark::microbenchmark(f(A), g(A), times = 10L, setup = gc(FALSE))
## Unit: milliseconds
## expr min lq mean median uq max neval
## f(A) 2352.0572 2383.3100 2435.7954 2403.8968 2431.6214 2619.553 10
## g(A) 843.0206 852.5757 858.7262 858.2746 863.8239 881.450 10
A <- mkA(100L, 1e+06L)
microbenchmark::microbenchmark(f(A), g(A), times = 10L, setup = gc(FALSE))
## Unit: seconds
## expr min lq mean median uq max neval
## f(A) 290.93327 295.54319 302.57001 301.17810 307.50226 318.14203 10
## g(A) 72.85608 73.83614 76.67941 76.57313 77.78056 83.73388 10
Perhaps we can try this
f <- function(A, models) {
AA <- replace(A, A == 0, NA)
setNames(
c(t(outer(
1:nrow(A),
1:nrow(A),
Vectorize(function(x, y) unique(na.omit(c(t(AA[c(x, y)])))))
))),
t(outer(models, models, paste))
)
}
which gives
$`A1 A1`
[1] 3 8
$`A1 A2`
[1] 3 8 5 7
$`A1 A3`
[1] 3 8 5
$`A2 A1`
[1] 5 7 3 8
$`A2 A2`
[1] 5 7
$`A2 A3`
[1] 5 7 3 8
$`A3 A1`
[1] 3 5 8
$`A3 A2`
[1] 3 5 8 7
$`A3 A3`
[1] 3 5 8
If you care about the speed, you can try
lst <- asplit(replace(A, A == 0, NA), 1)
mat <- matrix(list(), nrow = nrow(A), ncol = nrow(A))
mat[lower.tri(mat)] <- combn(lst, 2, function(...) unique(na.omit(unlist(...))), simplify = FALSE)
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
diag(mat) <- Map(function(x) unname(x)[!is.na(x)], lst)
L <- c(t(mat))
Thanks for posting the additional information in your edits. From what you posted, it appears that for all pairs of rows of a matrix or data table A, you want the unique non-zero values in those two rows.
To do that efficiently I'd suggest ensuring that A is a matrix. Row indexing in dataframes or data tables is a lot slower than doing so in matrices. (Column indexing can be faster, but I doubt if it's worth transposing the table to get that.)
Once you have a matrix, A[i, ] is a vector containing the values in row i, and that's a pretty fast calculation. You want the unique non-zero values in c(A[i, ], A[j, ]). The unique function will produce this, but won't leave out the zeros. I'd suggest experimenting. Depending on the contents of each row, it is conceivable that leaving the zeros out of the rows first before computing the unique entries could be either faster or slower than calculating all the unique values and deleting 0 afterwards.
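A rough sketch of such an experiment follows. The function names drop_first and drop_last are made up here, and the toy matrix is much smaller and denser than yours, so the timings may not carry over to your data.

```r
set.seed(1L)
A <- matrix(sample(0:9, size = 2e5, replace = TRUE), nrow = 2L)
x <- c(A[1L, ], A[2L, ])

# Remove the zeros first, then deduplicate.
drop_first <- function(x) unique(x[x != 0])
# Deduplicate first, then remove the single remaining zero.
drop_last <- function(x) {
  u <- unique(x)
  u[u != 0]
}

identical(drop_first(x), drop_last(x))  # both give the same values

system.time(for (r in 1:20) drop_first(x))
system.time(for (r in 1:20) drop_last(x))
```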
You say you want to do this for a few hundred rows, but each row is very long. I'd guess you won't be able to improve much on nested loops: the time will be spent on each entry, not on the loops. However, you could experiment with vectorization using the apply() function, e.g.
nrows <- nrow(A)
result <- vector("list", nrows)
for (i in 1:nrows)
result[[i]] <- apply(A, 1, function(row) setdiff(unique(c(row, A[i,])), 0))
This will give a list of lists; if you want to examine entry i, j, you can use result[[c(i,j)]].
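One thing to watch: apply() may simplify its result to a matrix when every row yields the same number of values, in which case result[[c(i, j)]] would no longer pick out a pair. A sketch that always returns a list of lists, using nested lapply() on the small matrix from the question:

```r
A <- rbind(c(3, 0, 0, 8), c(0, 5, 7, 0), c(3, 5, 0, 8))
rows <- seq_len(nrow(A))

# result[[i]][[j]] holds the unique non-zero values of rows i and j.
result <- lapply(rows, function(i)
  lapply(rows, function(j) setdiff(unique(c(A[i, ], A[j, ])), 0)))

result[[c(1, 2)]]  # recursive indexing, same as result[[1]][[2]]
```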
I want to apply a sample function to a nested list (I will call this list bb) and I also have a list of numbers (I will call this list k) to be supplied in the sample function. I would like each of the numbers in k to iterate through all the values of each list in bb. How to do this using mapply or lapply?
Here are the data:
k <- list(1,2,4,3) #this is the list of numbers to be supplied in the `sample.int` function
b1 <- list(c(1,2,3),c(2,3,4),c(3,4,5),c(4,5,6)) #The first list of bb
b2 <- list(c(1,2),c(2,3),c(3,4),c(4,5), c(5,6)) #The second list of bb
bb <- list(b1,b2) #This is list bb containing b1 and b2 whose values are to be iterated through
I created this mapply function but it didn't get the expected outcome:
mapply(function(x, y) {
x[sample.int(y,y, replace = TRUE)]
}, bb,k, SIMPLIFY = FALSE)
This only returns 10 output values but I would like each number of k to loop through all values of the two lists in bb and so there should be 10*2 outputs for the two lists in bb. I might be using mapply in the wrong way and so I would appreciate if anyone can point me to the right direction!
outer is your friend. It's normally used to calculate the outer matrix product. Consider:
outer(1:3, 2:4)
1:3 %o% 2:4 ## or
# [,1] [,2] [,3]
# [1,] 2 3 4
# [2,] 4 6 8
# [3,] 6 9 12
It also has a FUN= argument that defaults to "*". However it enables you to calculate any function over the combinations of x and y cross-wise, i.e. x[1] X y[1], x[1] X y[2], ... whereas *apply functions only calculate x[1] X y[1], x[2] X y[2], .... So let's do it:
FUN <- Vectorize(function(x, y) x[sample.int(y, y)])
set.seed(42)
res <- outer(bb, k, FUN)
res
# [,1] [,2] [,3] [,4]
# [1,] List,1 List,2 List,4 List,3
# [2,] List,1 List,2 List,4 List,3
This result looks a little weird, but we may easily unlist it.
res <- unlist(res, recursive = FALSE)
Result
res
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 1 2
#
# [[3]]
# [1] 1 2 3
#
# [[4]]
# [1] 2 3 4
#
# [[5]]
# [1] 2 3
#
# [[6]]
# [1] 1 2
#
# [[7]]
# [1] 2 3 4
#
# [[8]]
# [1] 4 5 6
#
# [[9]]
# [1] 1 2 3
#
# [[10]]
# [1] 3 4 5
#
# [[11]]
# [1] 3 4
#
# [[12]]
# [1] 4 5
#
# [[13]]
# [1] 2 3
#
# [[14]]
# [1] 1 2
#
# [[15]]
# [1] 1 2 3
#
# [[16]]
# [1] 2 3 4
#
# [[17]]
# [1] 3 4 5
#
# [[18]]
# [1] 2 3
#
# [[19]]
# [1] 3 4
#
# [[20]]
# [1] 1 2
Voilà, 20 results.
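To see the difference between pairwise and cross-wise evaluation on something smaller, here is a sketch with two short lists. The lists and the function are made up for illustration only.

```r
x <- list(1:3, 4:6)
y <- list(10, 20, 30)

# mapply() pairs the elements positionally: x[[1]] with y[[1]], x[[2]] with y[[2]].
pairwise <- mapply(function(a, b) sum(a) + b, x, y[1:2])
pairwise
# [1] 16 35

# outer() evaluates every combination; FUN must be vectorized over both arguments.
crosswise <- outer(x, y, Vectorize(function(a, b) sum(a) + b))
dim(crosswise)  # length(x) rows, length(y) columns
```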
Simple problem, given a list:
main_list <- list(1:3,
4:6,
7:9,
10:12,
13:15)
main_list
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 4 5 6
# [[3]]
# [1] 7 8 9
# [[4]]
# [1] 10 11 12
# [[5]]
# [1] 13 14 15
I want to split the list into multiple lists where I break up the original one into lists each of length x. So if I said x = 2, I would get 3 lists of length 2, 2 and the leftover 1:
target <- list(list(1:3,
4:6),
list(7:9,
10:12),
list(13:15))
target
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[1]][[2]]
# [1] 4 5 6
# [[2]]
# [[2]][[1]]
# [1] 7 8 9
# [[2]][[2]]
# [1] 10 11 12
# [[3]]
# [[3]][[1]]
# [1] 13 14 15
Something like:
my_split <- function(listtest, x) {
split(listtest, c(1:x))
}
target <- my_split(main_list, 2)
Thanks
here is an option with gl
split(main_list, as.integer(gl(length(main_list), 2, length(main_list))))
It can be converted to a custom function
f1 <- function(lstA, n) {
l1 <- length(lstA)
split(lstA, as.integer(gl(l1, n, l1)))
}
EDIT: no conditional logic needed. Just use split() with c() and rep():
my_split <- function(l, x){
l_length <- length(l)
l_div <- l_length / x
split(l, c(rep(seq_len(l_div), each = x), rep(ceiling(l_div), l_length %% x)))
}
my_split(main_list, 2)
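Another common way to build the grouping vector, shown here only as a sketch, is to divide the positions by the chunk size and round up. The function name chunk is hypothetical.

```r
main_list <- list(1:3, 4:6, 7:9, 10:12, 13:15)

# Positions 1..n divided by x and rounded up give 1, 1, 2, 2, 3 for x = 2.
chunk <- function(l, x) unname(split(l, ceiling(seq_along(l) / x)))

out <- chunk(main_list, 2)
lengths(out)
# [1] 2 2 1
```

The unname() call drops the "1", "2", "3" names that split() adds, so the result has the same shape as target.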
I have a range of values
c(1,2,3,4,5,8,9,10,13,14,15)
And I want to find the ranges where the numbers become discontinuous. All I want is this as output:
(1,5)
(8,10)
(13,15)
I need to find break points.
I need to do it in R.
Something like this?
x <- c(1:5, 8:10, 13:15) # example data
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 5
#
# [[2]]
# [1] 8 10
#
# [[3]]
# [1] 13 15
Another example:
x <- c(1, 5, 10, 11:14, 20:21, 23)
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 1
#
# [[2]]
# [1] 5 5
#
# [[3]]
# [1] 10 14
#
# [[4]]
# [1] 20 21
#
# [[5]]
# [1] 23 23
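In case the one-liner is hard to read, here are its intermediate steps on the first example. The variable names d, new_run and grp are only for illustration.

```r
x <- c(1:5, 8:10, 13:15)

d <- diff(x)               # gaps between neighbours: 1 inside a run, > 1 at a break
new_run <- c(1, d) != 1    # TRUE where a new run starts (never for the first element)
grp <- cumsum(new_run)     # running count of breaks, used as the group id

grp
# [1] 0 0 0 0 0 1 1 1 2 2 2
```

tapply(x, grp, range) then computes the minimum and maximum within each group.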
x <- c(1:5, 8:10, 13:15)
rr <- rle(x - seq_along(x))
rr$values <- seq_along(rr$values)
s <- split(x, inverse.rle(rr))
s
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 8 9 10
#
# $`3`
# [1] 13 14 15
## And then to get *literally* what you asked for:
cat(paste0("(", gsub(":", ",", sapply(s, deparse)), ")"), sep="\n")
# (1,5)
# (8,10)
# (13,15)
I published seqle which will do this for you in one line. You can load the package cgwtools or search SO for the code, as it's been posted a couple times.
Assuming that you don't care about the exact output and are looking for the min and max of each range, you can use diff/cumsum/range as follows:
x <- c(1:5, 8:10, 13:15)
x. <- c(0, cumsum( diff(x)-1 ) )
lapply( split(x, x.), range )
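If a table of start and end points is more convenient than a list, the same break points can be used for logical indexing instead of split(). This is a sketch of that alternative and it assumes x is sorted in increasing order.

```r
x <- c(1:5, 8:10, 13:15)
brk <- diff(x) != 1            # TRUE where the next value is not consecutive

starts <- x[c(TRUE, brk)]      # first element, plus every element right after a break
ends   <- x[c(brk, TRUE)]      # every element right before a break, plus the last one

ranges <- data.frame(start = starts, end = ends)
ranges
#   start end
# 1     1   5
# 2     8  10
# 3    13  15
```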
I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in% operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in% because the row vector c("A",3) is not a row vector in data.set. Using the %in% operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find them (even if they exist), I tried to write my own function to do it. It works for an input matrix of row vectors, but I'm wondering if any experts have proposed improvements:
is.tuple.in <- function(matrix1, matrix2){
# Apply rbind() so that matrix1 has columns even if it is a row vector.
matrix1 = rbind(matrix1)
if(ncol(matrix1) != ncol(matrix2)){
stop("Matrices must have the same number of columns.") }
# Now check for the first row and handle other rows recursively
row.vec = matrix1[1,]
tuple.found = FALSE
for(i in 1:nrow(matrix2)){
# If we find a match, then this row exists in matrix 2 and we can break the loop
if(all(row.vec == matrix2[i,])){
tuple.found = TRUE
break
}
}
# If there are more rows to be checked, use a recursive call
if(nrow(matrix1) > 1){
return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1),],matrix2)))
} else {
return(tuple.found)
}
}
I see a couple of problems with it that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage to do this because I pass matrix1[2:nrow(matrix1),] in the recursive call, which produces an error if matrix1 has one row. So instead of getting to a case where matrix1 is empty, I have an if condition at the end deciding if more iterations are necessary.
Second, I think the use of rbind() at the start is sloppy, but I needed it for when matrix1 had been reduced to a single row. Without using rbind(), ncol(matrix1) produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.
I'm wondering if you have made this a bit more complicated than it is. For example,
set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# gives me this
# [,1] [,2]
# 6 3 1
# 38 3 1
# 39 3 1
# 85 1 3
# 88 1 3
# 89 1 3
# 95 3 1
# 113 1 3
# ...
you could subset this further if you care about the order
or you could modify the function slightly:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2]
# 85 1 3
# 88 1 3
# 89 1 3
# 113 1 3
# 133 1 3
# 139 1 3
# 187 1 3
# ...
another example with a longer vector
set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# [,1] [,2] [,3] [,4]
# 57 2 5 1 4
# 147 1 5 2 4
# 279 1 2 5 4
# 303 1 5 2 4
# 437 1 5 4 2
# 443 1 4 5 2
# 580 5 4 2 1
# ...
I see a couple that match:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2] [,3] [,4]
# 443 1 4 5 2
# 901 1 4 5 2
# 1047 1 4 5 2
but only three
for your single row case:
vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [1] 1 4 5 2
here is a simple function with the above code
is.tuplein <- function(vec, mat, exact = TRUE) {
rownames(mat) <- 1:nrow(mat)
if (exact)
tmp <- mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
return(tmp)
}
is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2
seems to work, so let's make our own %in% operator:
`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)
and try her out
set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 133 1 2 3
# 190 1 2 3
# 321 1 2 3
set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 48 2 3 1
# 64 2 3 1
# 71 1 3 2
# 73 3 1 2
# 108 3 1 2
# 112 1 3 2
# 133 1 2 3
# 166 2 1 3
Does this do what you want (even for more than 2 columns)?
paste(row.vec,collapse="_") %in% apply(data.set,1,paste,collapse="_")
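The same idea can be wrapped into the %tuple.in% operator from the question. This is only a sketch: it assumes the tab character used as separator never occurs in the data, and it compares values as character strings, so 1 and "1" are treated as equal.

```r
# Sketch of the made-up operator: TRUE for each row of x that is a row of table.
`%tuple.in%` <- function(x, table) {
  x <- rbind(x)  # promote a plain vector to a one-row matrix
  # Collapse each row into one string key (assumes no tab inside any value).
  key <- function(m) apply(m, 1, paste, collapse = "\t")
  key(x) %in% key(table)
}

data.set <- rbind(c("A", 1), c("B", 3), c("C", 2))
c("A", 3) %tuple.in% data.set
# [1] FALSE
c("B", 3) %tuple.in% data.set
# [1] TRUE
```

Passing a matrix on the left-hand side checks each of its rows in turn.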