Is there a function that will help me output all 2^n permutations of a boolean vector of length n? For instance, if I have a boolean vector of length n=2, c(FALSE,FALSE), I should obtain 2^2=4 permutations.
So I need a function that generalizes this output to a vector of length n:
if n=3, the output should have 2^3 rows, and so on.
I have already tried permutations from gtools package but this seems to be incorrect, or providing only a partial answer to say the least. This method does not generalize well and has given me errors for n>2 as well.
> permutations(2,2,c(TRUE,FALSE))
[,1] [,2]
[1,] FALSE TRUE
[2,] TRUE FALSE
Output should be:
FALSE, FALSE,
TRUE, TRUE,
FALSE, TRUE,
TRUE, FALSE
You were missing repeats.allowed = TRUE:
gtools::permutations(2, 2, c(TRUE, FALSE), repeats.allowed = TRUE)
[,1] [,2]
[1,] FALSE FALSE
[2,] FALSE TRUE
[3,] TRUE FALSE
[4,] TRUE TRUE
You can write your own wrapper function around permutations:
my_permute <- function(vect, n, repeats = TRUE) {
gtools::permutations(length(vect), n, vect, repeats.allowed = repeats)
}
my_permute(vect=c(T,F), n=2)
Example with more elements:
my_permute(letters[1:3], n=3)
You can use expand.grid:
expand.grid(c(TRUE, FALSE), c(TRUE, FALSE))
# Var1 Var2
#1 TRUE TRUE
#2 FALSE TRUE
#3 TRUE FALSE
#4 FALSE FALSE
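To generalize the expand.grid approach to any n, one option is to pass it a list of n identical TRUE/FALSE vectors. The following is a minimal sketch in base R; the helper name all_bool_combos is made up for illustration and is not part of any package.

```r
# Sketch: all 2^n TRUE/FALSE combinations for an arbitrary n.
# `all_bool_combos` is a hypothetical helper name, not a base R function.
all_bool_combos <- function(n) {
  # rep(list(...), n) builds n identical factors for expand.grid
  combos <- expand.grid(rep(list(c(FALSE, TRUE)), n))
  # Convert the data frame to a plain logical matrix without dimnames
  unname(as.matrix(combos))
}

res <- all_bool_combos(3)
dim(res)  # 2^3 rows, 3 columns
```

Each row of the result is one combination, so the number of rows doubles with every additional element.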
You can use the gtools package and its permutations function:
library(gtools)
x <- c(TRUE, FALSE)
permutations(n=length(x),r=2,v=x,repeats.allowed=T)
I'm new to R and I've got a question:
choice <- c(TRUE, FALSE, FALSE, FALSE)
rep(sample(choice, size = 4, replace=FALSE), times = n)
always repeats the same vector, e.g. (FALSE, TRUE, FALSE, FALSE)
However, I want to have n different random samples of the vector choice in a new vector (replace must be FALSE, because only 1 in 4 elements should be TRUE).
Which function should I choose? I'm not allowed to use for-loops.
You can use replicate. It returns a matrix, which you can then turn into a vector.
choice <- c(TRUE, FALSE, FALSE, FALSE)
n <- 3
set.seed(42) # for reproducibility
as.vector(replicate(n, sample(choice, size = 4, replace=FALSE)))
#[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
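To see why this works, it can help to look at the matrix before it is flattened. A small sketch, assuming the same choice vector as above (the name draws is introduced here for illustration): each call to sample produces one column, so every column holds exactly one TRUE.

```r
choice <- c(TRUE, FALSE, FALSE, FALSE)
n <- 5

# replicate() evaluates the expression n times; each draw becomes one column
draws <- replicate(n, sample(choice))

dim(draws)        # 4 rows, n columns
colSums(draws)    # every column contains exactly one TRUE
as.vector(draws)  # column-major flattening: draw 1, then draw 2, ...
```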
I am looking for an efficient way to combine selected columns in a logical matrix by "ANDing" them together and ending up with a new matrix. An example of what I am looking for:
matrixData <- rep(c(TRUE, TRUE, FALSE), 8)
exampleMatrix <- matrix(matrixData, nrow=6, ncol=4, byrow=TRUE)
exampleMatrix
[,1] [,2] [,3] [,4]
[1,] TRUE TRUE FALSE TRUE
[2,] TRUE FALSE TRUE TRUE
[3,] FALSE TRUE TRUE FALSE
[4,] TRUE TRUE FALSE TRUE
[5,] TRUE FALSE TRUE TRUE
[6,] FALSE TRUE TRUE FALSE
The columns to be ANDed to each other are specified in a numeric vector of length ncol(exampleMatrix), where the columns to be grouped together ANDed have the same value (a value from 1 to n, where n <= ncol(exampleMatrix) and every value in 1:n is used at least once). The resulting matrix should have the columns in order from 1:n. For example, if the vector that specifies the column groups is
colGroups <- c(3, 2, 2, 1)
Then the resulting matrix would be
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
[5,] TRUE FALSE TRUE
[6,] FALSE TRUE FALSE
Where in the resulting matrix
[,1] = exampleMatrix[,4]
[,2] = exampleMatrix[,2] & exampleMatrix[,3]
[,3] = exampleMatrix[,1]
My current way of doing this looks basically like this:
finalMatrix <- matrix(TRUE, nrow=nrow(exampleMatrix), ncol=3)
for (i in 1:3){
selectedColumns <- exampleMatrix[,colGroups==i, drop=FALSE]
finalMatrix[,i] <- rowSums(selectedColumns)==ncol(selectedColumns)
}
Where rowSums(selectedColumns)==ncol(selectedColumns) is an efficient way to AND all of the columns of a matrix together.
My problem is that I am doing this on very big matrices (millions of rows) and I am looking for any way to make this quicker. My first instinct would be to use apply in some way but I can't see any way to use that to improve efficiency as I am not performing the operation in the for loop many times but instead it is the operation in the loop that is slow.
In addition, any tips to reduce memory allocation would be very useful, as I currently have to run gc() within the loop frequently to avoid running out of memory completely, and it is a very expensive operation that significantly slows everything down as well. Thanks!
For a more representative example, this is a much larger exampleMatrix:
matrixData <- rep(c(TRUE, TRUE, FALSE), 8e7)
exampleMatrix <- matrix(matrixData, nrow=6e7, ncol=4, byrow=TRUE)
From your example, I understand that there are very few columns and very many rows. In this case, it'll be efficient to just do a simple loop over colGroups (30% improvement over your suggestion):
for (jj in seq_along(colGroups))
finalMatrix[ , colGroups[jj]] =
finalMatrix[ , colGroups[jj]] & exampleMatrix[ , jj]
I think it will be hard to beat this without parallelizing. This loop is parallelizable if there are more columns, though the parallelization would have to be done carefully, in batches.
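The loop above relies on finalMatrix starting out as all TRUE, which the snippet does not show. Here is a self-contained sketch on the small example from the question; sizing the result with max(colGroups) is an assumption based on the question's statement that every value in 1:n is used.

```r
matrixData <- rep(c(TRUE, TRUE, FALSE), 8)
exampleMatrix <- matrix(matrixData, nrow = 6, ncol = 4, byrow = TRUE)
colGroups <- c(3, 2, 2, 1)

# Start from all TRUE, the identity element for AND
finalMatrix <- matrix(TRUE, nrow = nrow(exampleMatrix), ncol = max(colGroups))

# AND each source column into the output column of its group
for (jj in seq_along(colGroups)) {
  g <- colGroups[jj]
  finalMatrix[, g] <- finalMatrix[, g] & exampleMatrix[, jj]
}
finalMatrix
```

Compared with the rowSums version, this never builds an intermediate sub-matrix per group; it only touches one input column and one output column per iteration.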
As far as I can tell, this is an aggregation across columns using the all function. So if you transpose to rows, then use colGroups as the grouping factor to apply all, then transpose back to columns, you should get the intended result:
t(aggregate(t(exampleMatrix), list(colGroups), FUN=all)[-1])
# [,1] [,2] [,3]
#V1 TRUE FALSE TRUE
#V2 TRUE FALSE TRUE
#V3 FALSE TRUE FALSE
#V4 TRUE FALSE TRUE
#V5 TRUE FALSE TRUE
#V6 FALSE TRUE FALSE
The [-1] just drops the group-identifier variable which you don't require in the final output.
If you're working with stupid big data, the by-group aggregation could be done in data.table as well:
library(data.table)
t(as.data.table(t(exampleMatrix))[, lapply(.SD,all), by=colGroups][,-1])
Background: PDF Parse My program looks for data in scanned PDF documents. I've created a CSV with rows representing various parameters to be searched for in a PDF, and columns for the different flavors of document that might contain those parameters. There are different identifiers for each parameter depending on the type of document. The column headers use dot separation to uniquely identify the document by type, subtype... , like so: type.subtype.s_subtype.s_s_subtype.
t.s.s2.s3 t.s.s2.s3 t.s.s2.s3 t.s.s2.s3 ...
p1 str1 str2
p2 str3 str4
p3 str5 str6
p4 str7
...
I'm reading in PDF files, and based on the filepaths they can be uniquely categorized into one of these types. I can apply various logical conditions to a substring of a given filepath, and based on that I'd like to output an NxM Boolean matrix, where N = NROW(filepath_vector), and M = ncol(params_csv). This matrix would show membership of a given file in a type with TRUE, and FALSE elsewhere.
t.s.s2.s3 t.s.s2.s3 t.s.s2.s3 t.s.s2.s3 ...
fpath1 FALSE FALSE TRUE FALSE
fpath2 FALSE TRUE FALSE FALSE
fpath3 FALSE TRUE FALSE FALSE
fpath4 FALSE FALSE FALSE TRUE
...
My solution: I'm trying to apply a function to a matrix that takes a vector as argument, and applies the first element of the vector to the first row, the second element to the second row, etc... however, the function has conditional behavior depending on the element of the vector being applied.
I know this is very similar to the question below (my reference point), but the conditionals in my function are tripping me up. I've provided a simplified reproducible example of the issue below.
R: Apply function to matrix with elements of vector as argument
set.seed(300)
x <- y <- 5
m <- matrix(rbinom(x*y,1,0.5),x,y)
v <- c("321", "", "A160470", "7IDJOPLI", "ACEGIKM")
f <- function(x) {
sapply(v, g <- function(y) {
if(nchar(y)==8) {x=x*2
} else if (nchar(y)==7) {
if(grepl("^[[:alpha:]]*$", substr(y, 1, 1))) {x=x*3}
else {x}
} else if (nchar(y)<3) {x=x*4
} else {x=x-2}
})
}
mapply(f, as.data.frame(t(m)))
Desired output:
# [,1] [,2] [,3] [,4] [,5]
# [1,] -1 0 -1 -1 -1
# [2,] 4 4 0 4 0
# [3,] 3 0 3 3 0
# [4,] 2 0 2 2 0
# [5,] 1 1 1 1 0
But I get this error:
Error in if (y == 8) { : missing value where TRUE/FALSE needed
Can't seem to figure out the error or if I'm misguided elsewhere in my entire approach, any thoughts are appreciated.
Update (03April2018):
I had provided this as a toy example for the sake of reproducibility, but I think it would be more informative to show something similar to my actual code, using @grand_chat's excellent solution. Hopefully this helps someone who's struggling with a similar issue.
chk <- c(NA, "abc.TRO", "def.TRO", "ghi.TRO", "kjl.TRO", "mno.TRO")
len <- c(8, NA, NA)
seed <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
A = matrix(seed, nrow=3, ncol=6, byrow=TRUE)
pairs <- mapply(list, as.data.frame(t(A)), len, SIMPLIFY=F)
f <- function(pair) {
x = unlist(pair[[1]])
y = pair[[2]]
if(!is.na(y) && y==8) {
x[c(grep("TRO", chk))] <- (x[c(grep("TRO", chk))] & TRUE)
} else {x <- (x & FALSE)}
return(x)
}
t(mapply(f, pairs))
Output:
# $v1
# [1,] FALSE TRUE TRUE FALSE FALSE FALSE
# $v2
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE
# $v3
# [3,] FALSE FALSE FALSE FALSE FALSE FALSE
You're processing the elements of vector v and the rows of your matrix m (columns of data frame t(m)) in parallel, so you could zip the corresponding elements into a list of pairs and process the pairs. Try this:
x <- y <- 5
m <- matrix(rbinom(x*y,1,0.5),x,y)
v <- c("321", "", "A160470", "7IDJOPLI", "ACEGIKM")
# Zip into pairs:
pairs <- mapply(list, as.data.frame(t(m)), v, SIMPLIFY=F)
# Define a function that acts on pairs:
f <- function(pair) {
x = pair[[1]]
y = pair[[2]]
if(nchar(y)==8) {x=x*2
} else if (nchar(y)==7) {
if(grepl("^[[:alpha:]]*$", substr(y, 1, 1))) {x=x*3}
else {x}
} else if (nchar(y)<3) {x=x*4
} else {x=x-2}
}
# Apply it:
mapply(f, pairs, SIMPLIFY=F)
with result:
$V1
[1] -2 -1 -2 -2 -1
$V2
[1] 4 4 0 0 4
$V3
[1] 3 3 3 3 0
$V4
[1] 2 0 2 2 0
$V5
[1] 0 0 3 0 3
(This doesn't agree with your desired output because you don't seem to have applied your function f properly.)
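As a possible variation on the same idea, mapply can also pair the rows with v directly, without building the intermediate list of pairs. This is only a sketch: transform_row is a hypothetical name, the seed is taken from the question, and the first-character test is written as an equivalent shorter regex.

```r
set.seed(300)
m <- matrix(rbinom(25, 1, 0.5), 5, 5)
v <- c("321", "", "A160470", "7IDJOPLI", "ACEGIKM")

# Hypothetical helper: transform one row `x` according to its key `y`
transform_row <- function(x, y) {
  len <- nchar(y)
  if (len == 8) {
    x * 2
  } else if (len == 7) {
    # keys of length 7 only scale the row if they start with a letter
    if (grepl("^[[:alpha:]]", y)) x * 3 else x
  } else if (len < 3) {
    x * 4
  } else {
    x - 2
  }
}

# Row i of m is paired with v[i]; mapply returns one column per pair,
# so transpose to get the rows back in their original orientation
res <- t(mapply(transform_row, as.data.frame(t(m)), v))
res
```

Returning the value from every branch, rather than ending on an assignment, also avoids relying on the invisible result of x = ....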
How to get index number in a Boolean vector? For instance, my vector looks like this:
vector <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
How to get index number for all TRUEs? vector["TRUE"] doesn't work.
Try using the which function (type ?which):
> my.vec <- c(TRUE, FALSE, FALSE, TRUE)
> which(my.vec)
[1] 1 4
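Applied to the vector from the question, a short sketch (x and idx are names chosen here for illustration) also shows why indexing with "TRUE" fails: a character index looks up an element by name, not by value.

```r
x <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)

idx <- which(x)  # integer positions of the TRUE elements
idx              # 1 2 4 6 10 11

x["TRUE"]        # NA: there is no element *named* "TRUE"
```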
I have a matrix A,
A = as.matrix(data.frame(col1 = c(1,1,2,3,1,2), col2 = c(-1,-1,-2,-3,-1,-2), col3 = c(2,6,1,3,2,4)))
And I have a vector v,
v = c(-1, 2)
How can I get a vector of TRUE/FALSE that compares the last two columns of the matrix and returns TRUE if the last two columns match the vector, or false if they don't?
I.e., If I try,
A[,c(2:3)] == v
I obtain,
col2 col3
[1,] TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
[5,] TRUE FALSE
[6,] FALSE FALSE
Which is not what I want, I want both columns to be the same as vector v, more like,
result = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
Since the first, and 5th rows match the vector v entirely.
Here's a simple alternative
> apply(A[, 2:3], 1, function(x) all(x==v))
[1] TRUE FALSE FALSE FALSE TRUE FALSE
Oops, looking through the R mailing list I found an answer (https://stat.ethz.ch/pipermail/r-help/2010-September/254096.html):
check.equal <- function(x, y)
{
isTRUE(all.equal(y, x, check.attributes=FALSE))
}
result = apply(A[,c(2:3)], 1, check.equal, y=v)
Not sure I need to define a function and do all that, maybe there are easier ways to do it.
Here's another straightforward option:
which(duplicated(rbind(A[, 2:3], v), fromLast=TRUE))
# [1] 1 5
results <- rep(FALSE, nrow(A))
results[which(duplicated(rbind(A[, 2:3], v), fromLast=TRUE))] <- TRUE
results
# [1] TRUE FALSE FALSE FALSE TRUE FALSE
Alternatively, as one line:
duplicated(rbind(A[, 2:3], v), fromLast=TRUE)[-(nrow(A)+1)]
# [1] TRUE FALSE FALSE FALSE TRUE FALSE
A dirty one:
result <- c()
for(n in 1:nrow(A)){result[n] <-(sum(A[n,-1]==v)==2)}
> result
[1] TRUE FALSE FALSE FALSE TRUE FALSE
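The reason A[, 2:3] == v gives the wrong answer is that v is recycled down the columns rather than across each row. A vectorized sketch that avoids apply and loops, assuming exact equality is acceptable for these values: transpose first, so that each row of A[, 2:3] becomes a column and v lines up with it.

```r
A <- as.matrix(data.frame(col1 = c(1, 1, 2, 3, 1, 2),
                          col2 = c(-1, -1, -2, -3, -1, -2),
                          col3 = c(2, 6, 1, 3, 2, 4)))
v <- c(-1, 2)

# After t(), every column holds one original row, so v recycles
# element by element against that row
matches <- t(A[, 2:3]) == v

# A row matches when all length(v) comparisons are TRUE
result <- colSums(matches) == length(v)
result
```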