Combinations from grouped elements

I have a list of grouped elements and I want to make all possible combinations of these elements, however I only want to take one element from each group. Order does not matter.
vars_list <- list(Group1 = letters[1:5], Group2 = letters[6:9],
Group3 = letters[10:11], Group4 = letters[12:15])
Let's say I want to make combinations for n=2, n=3, n=4 where n is the number of groups I want to use.
I found a solution to do it when n=number of groups (Combinations from recursive lists) :
lengths <- c(1, 1, 1, 1)
combos <- expand.grid( mapply(function(element, n) combn(element, m=n,
FUN=paste0, collapse=""), vars_list, lengths, SIMPLIFY = F) )
How could I do this for n < number of groups?

You could use combn to get all combinations of your groups for n=1, n=2, n=3 and n=4, and then use expand.grid on each combination:
n = 2
apply(combn(1:length(vars_list), n), 2, function(x){expand.grid(vars_list[x])})
so for n=4, you would get the same as in your question. Is this what you meant?
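The same idea can be sketched by passing the function to combn directly, which returns one data frame per choice of n groups. The helper name combos_for_n is hypothetical, not part of the answer above:

```r
vars_list <- list(Group1 = letters[1:5], Group2 = letters[6:9],
                  Group3 = letters[10:11], Group4 = letters[12:15])

# Hypothetical helper: all one-per-group combinations using exactly n groups.
combos_for_n <- function(groups, n) {
  # combn() enumerates every choice of n group indices and applies FUN to each.
  combn(seq_along(groups), n,
        FUN = function(idx) expand.grid(groups[idx], stringsAsFactors = FALSE),
        simplify = FALSE)
}

res2 <- combos_for_n(vars_list, 2)
length(res2)             # one data frame per pair of groups
sum(sapply(res2, nrow))  # total number of two-element combinations
```

With simplify = FALSE the result is always a list, so the data frames for different group choices can have different column names without being coerced.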

Related

Assign the same index if two vectors have a common intersection

I need help with a question closely related to some other question of mine.
How to merge two different groupings if they are not disjoint with dplyr
As the title of the question says, I want to generate an index in a vector that links different vectors in a list if they have an intersection or, if not, if both intersect with some other vector in a list, and so on...
This is a question involving graph theory/networks - I want to find indirectly connected vectors.
The question above solved my problem for two columns of a data frame, but I don't know how to generalize this to a list whose elements may have different lengths.
This is an example: list(1:3, 3:5, 5, 6) should give me c(1, 1, 1, 2)
EDIT:
I've tried using the fact that the powers of an adjacency matrix count the possible walks from one vertex to another.
find_connections <- function(list_vectors){
list_vectors <- list_vectors %>%
set_names(paste0("x", 1:length(list_vectors)))
x <- crossprod(table(stack(list_vectors)))
power <- nrow(x) - 2
x <- ifelse(x >= 1, 1, 0)
if(power > 0){
z <- accumulate(replicate(power, x, simplify = FALSE),
`%*%`, .init = x) %>%
reduce(`+`)
} else{
z <- x
}
z <- ifelse(z >= 1, 1, 0)
w <- z %>%
as.data.frame() %>%
group_by(across()) %>%
group_indices()
return(w)
}
The problem is that it took too long to run my code. Each matrix is not very large, but I do need to run the function on a large number of them.
Is it possible to improve this?

This is one way to do it. It creates a loop for the elements in each vector and then uses the same trick as the previous answer to find clusters.
library(data.table)
library(igraph)
x <- list(1:3, 3:5, 5, 6)
dt <- rbindlist(lapply(x,
function(r) data.table(from = r, to = shift(r, -1, fill = r[1]))))
dg <- graph_from_data_frame(dt, directed = FALSE)
unname(sapply(x, function(v) components(dg)$membership[as.character(v[1])]))
#> [1] 1 1 1 2
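If adding igraph and data.table is not an option, the same grouping can be sketched in base R by merging group labels whenever two vectors intersect. This is only an illustrative alternative (the name group_by_overlap is hypothetical), it has not been benchmarked against the igraph version, and it compares every pair of vectors, so it is meant for short lists:

```r
# Hypothetical helper: label list elements by connected component, base R only.
group_by_overlap <- function(vecs) {
  id <- seq_along(vecs)  # start with every vector in its own group
  for (i in seq_along(vecs)) {
    for (j in seq_len(i - 1)) {
      if (id[i] != id[j] && length(intersect(vecs[[i]], vecs[[j]])) > 0) {
        # Merge the two groups by relabelling the larger id to the smaller one.
        new <- min(id[i], id[j])
        old <- max(id[i], id[j])
        id[id == old] <- new
      }
    }
  }
  match(id, unique(id))  # renumber the groups as 1, 2, 3, ...
}

group_by_overlap(list(1:3, 3:5, 5, 6))
#> [1] 1 1 1 2
```

Because a merge relabels the whole group, indirect connections are picked up in a single pass over the pairs.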

Subset in the data frame rows in R

I have a data frame with 30 rows and 4 columns (namely, x, y, z, u). It is given below.
mydata = data.frame(x = rnorm(30,4), y = rnorm(30,2,1), z = rnorm(30,3,1), u = rnorm(30,5))
Further, I have a sequence of values, which represent row numbers in my data frame.
myseq = c(seq(1, 30, by = 5))
myseq
[1] 1 6 11 16 21 26
Now I want to compute the prob value for each segment of rows. For the first segment:
filt = subset(mydata[1:6,], mydata[1:6,]$x < mydata[1:6,]$y & mydata[1:6,]$z < mydata[1:6,]$u)
filt
prob = length(filt$x)/30
prob
Then I need to compute the above prob for 1:6, ..., 27:30 and so on. Here I have only 6 prob values, so I can do them one by one, but with 100 values it would be tedious. Is there any way to compute all the prob values at once?
Thank you in advance.

BTW: in subset(DF[1:99,], ...), you only need DF[1:99,] in the first argument; there is no need to repeat it inside the condition, e.g.
subset(DF[1:99,], cumsuml < inchivaluel & cumsumr < inchivaluer)
Think about how to do this in a list.
The first step is to break your data into the va starting points. I'll start with a list of the indices to break it into:
inds <- mapply(seq, va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
this now is a list of sequences, starting with 1:99, then 100:198, etc. See str(inds) to verify.
Now we can subset a portion of the data based on each element's vector of indices:
filts <- lapply(inds, function(ind) subset(DF[ind,], cumsuml < inchivaluel & cumsumr < inchivaluer))
We now have a list of filtered data frames; let's summarize it:
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
Bottom line, it helps to think about how to break this problem into lists, examples at http://stackoverflow.com/a/24376207/3358272.
BTW: instead of initially making a list of indices, we could just break up the data in that first step, ala
DF2 <- mapply(function(a,b) DF[a:b,], va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
filts <- lapply(DF2, function(x) subset(x, cumsuml < inchivaluel & cumsumr < inchivaluer))
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
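Applied to the mydata and myseq objects from the question, the list approach might look like the sketch below. It assumes that each segment runs from one entry of myseq up to the row just before the next entry, and that the denominator is the total number of rows, as in the question's own prob line; the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
mydata <- data.frame(x = rnorm(30, 4), y = rnorm(30, 2, 1),
                     z = rnorm(30, 3, 1), u = rnorm(30, 5))
myseq <- seq(1, 30, by = 5)

# Assumption: segment k runs from myseq[k] to the row before myseq[k + 1].
ends <- c(myseq[-1] - 1, nrow(mydata))
inds <- mapply(seq, myseq, ends, SIMPLIFY = FALSE)

# For each segment, count the rows meeting both conditions and divide by
# the total number of rows, as in the question's prob calculation.
probs <- sapply(inds, function(ind) {
  seg <- mydata[ind, ]
  sum(seg$x < seg$y & seg$z < seg$u) / nrow(mydata)
})
probs
```

Since the segments do not overlap under this assumption, the six values add up to the proportion of qualifying rows in the whole data frame.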

Unique combinations of vector elements that fulfill criteria

I have a vector of integers, e.g., totalVector <- c(4,2,1), and two variables totalResult and totalNumber. What I want to do is the following:
I want to find all UNIQUE combinations of "totalNumber" elements from totalVector that add up to "totalResult". To clarify, if totalResult = 100 and totalNumber = 50, I want all combinations of 50 elements from totalVector that have a sum of 100 (repetitions are obviously allowed, but duplicate results such as 25 fours and 25 re-arranged fours should only be counted once).
I originally did this by expanding the total vector (repeating each element 50 times), getting all combinations of 50 elements with combn() and then filtering their sums. For large values however, this proved very inefficient, and failed due to the sheer amount of data. Is there a quicker and less data-heavy way to do this?

I think the OP is looking for the combinations with repetition of a vector that sum to a particular number. This will do it:
totalVector <- c(4,2,1)
totalNumber <- 50
totalResult <- 100
library(RcppAlgos)
myAns <- comboGeneral(totalVector, totalNumber, repetition = TRUE,
constraintFun = "sum", comparisonFun = "==",
limitConstraints = totalResult)
dim(myAns)
[1] 17 50
all(apply(myAns, 1, sum) == totalResult)
[1] TRUE
Disclaimer: I am the author of RcppAlgos
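As a cross-check on the 17 rows reported above, the problem for this particular vector can be reduced to counting: if a, b and c are the numbers of 4s, 2s and 1s, then a + b + c = 50 and 4a + 2b + c = 100. The base R sketch below enumerates a and b and derives c; it is specific to c(4,2,1) and is not a general replacement for comboGeneral.

```r
totalNumber <- 50
totalResult <- 100

# Enumerate how many 4s (a) and 2s (b) are used; the number of 1s (c) follows.
counts <- expand.grid(a = 0:totalNumber, b = 0:totalNumber)
counts$c <- totalNumber - counts$a - counts$b

# Keep only the count triples that are feasible and hit the target sum.
ok <- counts$c >= 0 &
  (4 * counts$a + 2 * counts$b + counts$c) == totalResult
solutions <- counts[ok, ]
nrow(solutions)
#> [1] 17
```

Subtracting the two equations gives 3a + b = 50, so a can range from 0 to 16, which is where the 17 solutions come from.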

This would give you what you need for a small sample, but you will encounter issues with combinatorial explosion very quickly as you increase the size of the problem
tv <- sample(1:10, 10, replace = TRUE)
tn <- 5
tr <- 20
combinations <- combn(tv, tn)
equals.tr <- apply(combinations, MARGIN = 2, FUN = function(x) sum(x) == tr)
combinations[, equals.tr]

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
In the example below you can find e.g. 2 companies, where we have 1 with 10 & 1 with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
bootList <- replicate(n = 5, list(), simplify = FALSE);  # here R = 5 samples
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently I am trying to do it (incorrectly) like this:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...

I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like to keep separate bootstrap samples, e.g. in case you want to estimate each company separately. Otherwise, bind them back together:
bootdata = do.call(rbind, L2)
Best wishes,
Tim
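To get several bootstrap samples directly as matrices, as the question asks, the split-and-resample step can be wrapped in a function and repeated. The sketch below uses toy stand-in data with the same panel lengths (10 and 5); the helper name boot_once and the column names are hypothetical.

```r
set.seed(7)
# Toy stand-in for the question's data: 15 rows, panel lengths 10 and 5.
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
company <- rep(1:2, times = c(10, 5))

# Hypothetical helper: resample rows within each company, return one matrix.
boot_once <- function(d, g) {
  # Split the row numbers by company and resample them with replacement.
  idx <- unlist(lapply(split(seq_len(nrow(d)), g),
                       function(i) i[sample.int(length(i), replace = TRUE)]))
  as.matrix(d[idx, , drop = FALSE])
}

bootList <- replicate(5, boot_once(data, company), simplify = FALSE)
dim(bootList[[1]])
#> [1] 15  3
```

Resampling row indices rather than data frames avoids building intermediate per-company objects, which may matter when the matrices feed into a slow estimation step.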

Sampling from a subset of data

I have the following problem.
I have multiple subarrays (say 2) that I have populated with character labels (1, 2, 3, 4, 5). My algorithm selects labels at random based on occurrence probabilities.
How can I get R to instead select labels 1:3 for subarray 1 and 4:5 for subarray 2, say, without using subsetting (i.e., []). That is, I want a random subset of labels to be selected for each subarray, instead of all labels assigned to each subarray manually using [].
I know sample() should help.
Using subsetting (which I don't want) one would do
x <- 1:5
sample(x[1:3], size, prob = probs[1:3])
but this assigns labels 1:3 to ALL subarrays.
Would
sample(sample(x), size, replace = TRUE, prob = probs)
work?
Any ideas? Please let me know if this is unclear.
Here is a small example, which selects labels from 1:5 for each of 10 subarrays.
set.seed(1)
N <- 10
K <- 2
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
perms <- 5
## Set up container(s) to hold the identity of each individual from each permutation ##
num.specs <- ceiling(N / K)
## Create an ID for each haplotype ##
haps <- 1:Hstar
## Assign individuals (N) to each subpopulation (K) ##
specs <- 1:num.specs
## Generate permutations, assume each permutation has N individuals, and sample those individuals' haplotypes from the probabilities ##
gen.perms <- function() {
sample(haps, size = num.specs, replace = TRUE, prob = probs) # I would like each subarray to contain a random subset of 1:5.
}
pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
pop[,, i] <- replicate(perms, gen.perms())
}
pop
Hopefully this helps.

I think what you actually want is something like that
num.specs <- 3
haps[sample(seq(haps),size = num.specs,replace = F)]
[1] 3 5 4
That is a random subset of your vector haps ?

Not quite what you want (returns list of matrices instead of 3D array) but this might help
lapply(split(1:5, cut(1:5, breaks=c(0, 2, 5))), function(i) matrix(sample(i, 25, replace=TRUE), ncol=5))
Use cut and split to partition your vector of character labels before sampling them. Here I split your character labels at the value 2. Also, rather than sampling 5 numbers 5 times, you can sample 25 numbers once, and convert to matrix.
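If the goal is the 3D array from the question with a random, rather than manually chosen, subset of labels per subarray, one possible sketch is below. It assumes every label belongs to exactly one subarray, that labels are drawn with equal probability within a subset, and that num.specs is 5; none of these are stated in the question.

```r
set.seed(1)
Hstar <- 5; K <- 2; perms <- 5; num.specs <- 5
haps <- 1:Hstar

# Randomly partition the labels into K non-empty subsets (sizes are random too).
# Assumption: every label belongs to exactly one subarray.
repeat {
  grp <- sample(K, Hstar, replace = TRUE)
  if (length(unique(grp)) == K) break
}
subsets <- split(haps, grp)

pop <- array(dim = c(perms, num.specs, K))
for (i in 1:K) {
  labs <- subsets[[i]]
  # Index into labs so a single-label subset is not expanded by sample() to 1:label.
  draws <- labs[sample.int(length(labs), perms * num.specs, replace = TRUE)]
  pop[, , i] <- matrix(draws, nrow = perms)
}
pop
```

Drawing all perms * num.specs values at once and reshaping them follows the same idea as sampling 25 numbers once in the answer above.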
