Getting all combinations in R, repetition allowed

The built-in combn only gives half the combinations:
> t(combn(1:5, 2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 1 5
[5,] 2 3
[6,] 2 4
[7,] 2 5
[8,] 3 4
[9,] 3 5
[10,] 4 5
For example there is no (1,1) nor (2,1).
How can I get all combinations?

As @akrun said, it looks like expand.grid will do it.
> expand.grid(rep(list(1:5), 2))
Var1 Var2
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 3 2
9 4 2
10 5 2
11 1 3
12 2 3
13 3 3
14 4 3
15 5 3
16 1 4
17 2 4
18 3 4
19 4 4
20 5 4
21 1 5
22 2 5
23 3 5
24 4 5
25 5 5
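To make the row counts concrete, here is a small base-R sketch that compares three notions of "pairs from 1:5": unordered without repetition (combn), ordered with repetition (expand.grid), and unordered with repetition (the grid filtered to Var1 <= Var2). The variable names are illustrative only.

```r
x <- 1:5

# Unordered pairs, no repetition: choose(5, 2) = 10 rows
no_rep <- t(combn(x, 2))

# Ordered pairs with repetition: 5^2 = 25 rows
ordered_rep <- expand.grid(rep(list(x), 2))

# Unordered pairs with repetition: keep one representative per pair,
# choose(5 + 2 - 1, 2) = 15 rows
unordered_rep <- ordered_rep[ordered_rep$Var1 <= ordered_rep$Var2, ]

c(nrow(no_rep), nrow(ordered_rep), nrow(unordered_rep))
```

Which of the three is wanted depends on whether (2,1) should count as different from (1,2); the question as asked points to the 25-row version.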

You could get the Cartesian product using merge:
merge(1:5, 1:5)
Output:
x y
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 3 2
9 4 2
10 5 2
11 1 3
12 2 3
13 3 3
14 4 3
15 5 3
16 1 4
17 2 4
18 3 4
19 4 4
20 5 4
21 1 5
22 2 5
23 3 5
24 4 5
25 5 5
Using sqldf:
library(sqldf)
df1 <- data.frame(a = 1:5)
df2 <- df1
sqldf("SELECT df1.a, df2.a FROM df1
CROSS JOIN df2")

This is actually called permutations with repetition. Besides the suggestions above, you can use the gtools::permutations function:
gtools::permutations(5, 2, 1:5, repeats.allowed=TRUE)
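If installing gtools is not an option, the same set of rows can be built from expand.grid. The helper below is a sketch under that assumption; perm_rep is a hypothetical name, not a function from any package.

```r
# Hypothetical helper: all length-r sequences drawn from v, repetition allowed
perm_rep <- function(v, r) {
  grid <- expand.grid(rep(list(v), r), KEEP.OUT.ATTRS = FALSE)
  # expand.grid varies the first column fastest; reversing the columns
  # puts the rows in lexicographic order
  m <- as.matrix(grid[, rev(seq_len(r)), drop = FALSE])
  dimnames(m) <- NULL
  m
}

res <- perm_rep(1:5, 2)
dim(res)      # length(v)^r rows, r columns
head(res, 3)
```

The number of rows grows as length(v)^r, so this is only practical for small inputs.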

R: Separate data into combinations of two columns

I have some data where each id is measured by different types (type), each of which can have different values (type_val). The measured value is val. A small dummy data set looks like this:
df <- data.frame(id=rep(letters[1:2],6),
type=c(rep('t1',6), rep('t2',6)),
type_val=rep(c(1,1,2,2,3,3),2),
val=1:12)
Then df is:
id type type_val val
1 a t1 1 1
2 b t1 1 2
3 a t1 2 3
4 b t1 2 4
5 a t1 3 5
6 b t1 3 6
7 a t2 1 7
8 b t2 1 8
9 a t2 2 9
10 b t2 2 10
11 a t2 3 11
12 b t2 3 12
I need to spread/cast data so that all combinations of type and type_val for each id are row-wise. I think this must be a job for pkgs reshape2 or tidyr but I have completely failed to generate anything other than errors.
The outcome data structure (somewhat redundant) would be something like this (hope I got it right!), where pairs of type (as given by combinations of type_val) are columns type_t1 and type_t2, and their associated values (val in df) are val_t1 and val_t2; column names are of course arbitrary:
id type_t1 type_t2 val_t1 val_t2
1 a 1 1 1 7
2 a 1 2 1 9
3 a 1 3 1 11
4 a 2 1 3 7
5 a 2 2 3 9
6 a 2 3 3 11
7 a 3 1 5 7
8 a 3 2 5 9
9 a 3 3 5 11
10 b 1 1 2 8
11 b 1 2 2 10
12 b 1 3 2 12
13 b 2 1 4 8
14 b 2 2 4 10
15 b 2 3 4 12
16 b 3 1 6 8
17 b 3 2 6 10
18 b 3 3 6 12
UPDATE
Note that (@Sotos)
> spread(df, type, val)
id type_val t1 t2
1 a 1 1 7
2 a 2 3 9
3 a 3 5 11
4 b 1 2 8
5 b 2 4 10
6 b 3 6 12
is not the desired output - it fails to deliver the wide format defined by combinations of type and type_val in df.
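How about this:
# split the data by type
df1 <- df[df$type == "t1", ]
df2 <- df[df$type == "t2", ]
# merging on id alone pairs every t1 row with every t2 row of the same id
DF <- merge(df1, df2, by = "id")
# drop the two constant type columns (type.x and type.y)
DF <- DF[, -c(2, 5)]
colnames(DF) <- c("id", "type_t1", "val_t1", "type_t2", "val_t2")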
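The merge-based idea can also be written so that merge itself names the columns, through its suffixes argument. This is a sketch on the dummy data from the question; the object names t1, t2 and wide are illustrative.

```r
df <- data.frame(id = rep(letters[1:2], 6),
                 type = c(rep("t1", 6), rep("t2", 6)),
                 type_val = rep(c(1, 1, 2, 2, 3, 3), 2),
                 val = 1:12)

# Keep only the columns that carry information once type is fixed
t1 <- df[df$type == "t1", c("id", "type_val", "val")]
t2 <- df[df$type == "t2", c("id", "type_val", "val")]

# Joining on id alone yields every t1/t2 pairing within each id
wide <- merge(t1, t2, by = "id", suffixes = c("_t1", "_t2"))
wide <- wide[order(wide$id, wide$type_val_t1, wide$type_val_t2), ]

nrow(wide)   # 2 ids x 3 x 3 pairings
```

Here the columns come out as id, type_val_t1, val_t1, type_val_t2, val_t2, so no positional column dropping or manual renaming is needed.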
Here is something more generic that will work with an arbitrary number of unique values of type:
library(dplyr)
# This function takes a list of dataframes (.data) and merges them by ID
reduce_merge <- function(.data, ID) {
  return(Reduce(function(x, y) merge(x, y, by = ID), .data))
}
# This function renames the cols columns in .data by appending _identifier
batch_rename <- function(.data, cols, identifier, sep = '_') {
  return(plyr::rename(.data, sapply(cols, function(x){
    x = paste(x, .data[1, identifier], sep = sep)
  })))
}
# This function creates a list of subsetted dataframes
# (subsetted by values of key),
# uses batch_rename() to give each dataframe more informative column names,
# merges them together, and returns the columns you'd like in a sensible order
multi_spread <- function(.data, grp, key, vals) {
  .data %>%
    plyr::dlply(key, subset) %>%
    lapply(batch_rename, vals, key) %>%
    reduce_merge(grp) %>%
    select(-starts_with(paste0(key, '.'))) %>%
    # use grp rather than a hard-coded id column so the function stays generic
    select(all_of(grp), all_of(sort(setdiff(colnames(.), c(grp, key, vals)))))
}
# Your example
df <- data.frame(id=rep(letters[1:2],6),
type=c(rep('t1',6), rep('t2',6)),
type_val=rep(c(1,1,2,2,3,3),2),
val=1:12)
df %>% multi_spread('id', 'type', c('type_val', 'val'))
id type_val_t1 type_val_t2 val_t1 val_t2
1 a 1 1 1 7
2 a 1 2 1 9
3 a 1 3 1 11
4 a 2 1 3 7
5 a 2 2 3 9
6 a 2 3 3 11
7 a 3 1 5 7
8 a 3 2 5 9
9 a 3 3 5 11
10 b 1 1 2 8
11 b 1 2 2 10
12 b 1 3 2 12
13 b 2 1 4 8
14 b 2 2 4 10
15 b 2 3 4 12
16 b 3 1 6 8
17 b 3 2 6 10
18 b 3 3 6 12
# An example with three unique values of 'type'
df <- data.frame(id = rep(letters[1:2], 9),
type = c(rep('t1', 6), rep('t2', 6), rep('t3', 6)),
type_val = rep(c(1, 1, 2, 2, 3, 3), 3),
val = 1:18)
df %>% multi_spread('id', 'type', c('type_val', 'val'))
id type_val_t1 type_val_t2 type_val_t3 val_t1 val_t2 val_t3
1 a 1 1 1 1 7 13
2 a 1 1 2 1 7 15
3 a 1 1 3 1 7 17
4 a 1 2 1 1 9 13
5 a 1 2 2 1 9 15
6 a 1 2 3 1 9 17
7 a 1 3 1 1 11 13
8 a 1 3 2 1 11 15
9 a 1 3 3 1 11 17
10 a 2 1 1 3 7 13
11 a 2 1 2 3 7 15
12 a 2 1 3 3 7 17
13 a 2 2 1 3 9 13
14 a 2 2 2 3 9 15
15 a 2 2 3 3 9 17
16 a 2 3 1 3 11 13
17 a 2 3 2 3 11 15
18 a 2 3 3 3 11 17
19 a 3 1 1 5 7 13
20 a 3 1 2 5 7 15
21 a 3 1 3 5 7 17
22 a 3 2 1 5 9 13
23 a 3 2 2 5 9 15
24 a 3 2 3 5 9 17
25 a 3 3 1 5 11 13
26 a 3 3 2 5 11 15
27 a 3 3 3 5 11 17
28 b 1 1 1 2 8 14
29 b 1 1 2 2 8 16
30 b 1 1 3 2 8 18
31 b 1 2 1 2 10 14
32 b 1 2 2 2 10 16
33 b 1 2 3 2 10 18
34 b 1 3 1 2 12 14
35 b 1 3 2 2 12 16
36 b 1 3 3 2 12 18
37 b 2 1 1 4 8 14
38 b 2 1 2 4 8 16
39 b 2 1 3 4 8 18
40 b 2 2 1 4 10 14
41 b 2 2 2 4 10 16
42 b 2 2 3 4 10 18
43 b 2 3 1 4 12 14
44 b 2 3 2 4 12 16
45 b 2 3 3 4 12 18
46 b 3 1 1 6 8 14
47 b 3 1 2 6 8 16
48 b 3 1 3 6 8 18
49 b 3 2 1 6 10 14
50 b 3 2 2 6 10 16
51 b 3 2 3 6 10 18
52 b 3 3 1 6 12 14
53 b 3 3 2 6 12 16
54 b 3 3 3 6 12 18

Clogit function in CEDesign does not converge

I designed a choice experiment (CE) using the package support.CEs. I generated a CE design with 3 attributes and 4 levels per attribute. The questionnaire had 4 alternatives and 4 blocks.
des1 <- rotation.design(attribute.names = list(
Qualitat = c("Aigua potable", "Cosetes.blanques.flotant", "Aigua.pou", "Aigua.marro"),
Disponibilitat.acces = c("Aixeta.24h", "Aixeta.10h", "Diposit.comunitari", "Pou.a.20"),
Preu = c("No.problemes.€", "Esforç.economic", "No.pagues.acces", "No.pagues.no.acces")),
nalternatives = 4, nblocks = 4, row.renames = FALSE,
randomize = TRUE, seed = 987)
The questionnaire was answered by 15 people (ID 1-15), giving 60 rows (15 respondents times 4 blocks):
ID BLOCK q1 q2 q3 q4
1 1 1 1 2 3 3
2 1 2 1 3 3 4
3 1 3 5 1 3 5
4 1 4 5 2 2 5
5 2 1 1 2 4 3
6 2 2 1 4 3 4
7 2 3 3 1 3 2
8 2 4 1 2 2 2
9 3 1 1 2 2 2
10 3 2 1 4 3 4
11 3 3 3 1 3 4
12 3 4 3 2 1 4
13 4 1 1 5 4 3
14 4 2 1 4 5 4
15 4 3 5 5 3 2
16 4 4 5 2 5 5
17 5 1 1 2 4 2
18 5 2 3 2 3 2
19 5 3 3 1 3 4
20 5 4 3 2 1 4
21 6 1 1 5 5 5
22 6 2 1 3 3 4
23 6 3 3 1 3 4
24 6 4 1 2 2 2
25 7 1 1 2 4 3
26 7 2 4 2 3 4
27 7 3 3 1 3 3
28 7 4 3 4 5 5
29 8 1 1 3 2 3
30 8 2 1 4 3 4
31 8 3 3 1 3 4
32 8 4 1 2 2 1
33 9 1 1 2 3 3
34 9 2 1 3 3 4
35 9 3 5 1 3 5
36 9 4 5 2 2 5
37 15 1 1 5 5 5
38 15 2 4 4 5 4
39 15 3 5 5 3 5
40 15 4 4 3 5 5
41 11 1 1 5 5 5
42 11 2 4 4 5 4
43 11 3 5 5 3 5
44 11 4 5 3 5 5
45 12 1 1 2 4 3
46 12 2 4 2 3 4
47 12 3 3 1 3 3
48 12 4 3 4 5 5
49 13 1 1 2 2 2
50 13 2 1 4 3 4
51 13 3 3 1 3 2
52 13 4 1 2 2 2
53 14 1 1 1 3 3
54 14 2 1 4 1 4
55 14 3 4 1 3 2
56 14 4 3 2 1 2
57 15 1 1 1 3 2
58 15 2 5 2 1 4
59 15 3 4 4 3 1
60 15 4 3 4 1 4
The problem is that, when I merge the questions and answers matrix with
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
R shows a warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge
I expected the generated matrix desmat1 to have 4800 observations (80 possible combinations and 60 outputs). Instead I have only 1200 observations. The matrix dataset1 only shows the combination of 1 set of alternatives instead of the 4.
For example, for ID 1, Block 1, Question 1, only alternative 1 appears. It matches the answer selected by the person, but in other cases it does not match, and that information is lost in R, so the results when clogit is applied are wrong.
I hope the problem is understood.
Regards,
Edit:
I found my problem. When I make the dataset from the respondent.dataset that I generated in .csv format, R detects only the q1 response instead of q1-q4. The call
dataset1 <- make.dataset(respondent.dataset = res1,
choice.indicators = c("q1","q2","q3","q4"),
design.matrix = desmat1)
detects q1-q4 as new columns. But the key is that q1-q4 have to fill the QES column in dataset1. I did another CE before with 1 block, and the dataset was built correctly on reading the respondent.dataset. So the key point is that I am now using 4 blocks, but I do not know how to make R interpret q1-q4 as the QES columns for each block.
The res1 matrix (respondent.dataset) is shown above (the complete matrix has 60 rows = 15 respondents (ID 1-15) * 4 questions (QES column in make.dataset)).
Kind regards,

Randomly Assign Integers in R within groups without replacement

I am running a study with two experiments: experiment_1 and experiment_2. Each experiment has 5 different treatments (i.e. 1, 2, 3, 4, 5). We are trying to randomly assign the treatments within groups.
We would like to do this via sampling without replacement iteratively within each group. We want to do this to ensure that we get as balanced a sample as possible in the treatment (e.g. we don't want to end up with 4 subjects in group 1 getting assigned to treatment 2 and no one getting treatment 1). So if a group has 23 subjects, we want to split the respondents into 4 subgroups of 5 and 1 subgroup of 3. We then want to randomly sample without replacement across the first subgroup of 5, so everyone gets assigned 1 of the treatments, do the same thing for the second, third and fourth subgroups of 5, and for the final subgroup of 3 randomly sample without replacement. This guarantees that every treatment is assigned to at least 4 subjects, and 3 treatments are assigned to 5 subjects within this group. We would like to do this for all the groups in the experiment and for both experiments. The resulting output would look something like this...
group experiment_1 experiment_2
[1,] 1 5 3
[2,] 1 3 2
[3,] 1 4 4
[4,] 1 1 5
[5,] 1 2 1
[6,] 1 2 3
[7,] 1 4 1
[8,] 1 3 2
[9,] 2 5 5
[10,] 2 1 4
[11,] 2 3 4
[12,] 2 1 5
[13,] 2 2 1
. . . .
. . . .
. . . .
I know how to use the sample function, but am unsure how to sample without replacement within each group, so that our output corresponds to above described procedure. Any help would be appreciated.
I think we just need to shuffle sample IDs, see this example:
set.seed(124)
#prepare groups and samples(shuffled)
df <- data.frame(group=sort(rep(1:3,9)),
sampleID=sample(1:27,27))
#treatments repeated nrow of df
df$ex1 <- rep(c(1,2,3,4,5),ceiling(nrow(df)/5))[1:nrow(df)]
df$ex2 <- rep(c(2,3,4,5,1),ceiling(nrow(df)/5))[1:nrow(df)]
df <- df[ order(df$group,df$sampleID),]
#check treatment distribution
with(df,table(group,ex1))
# ex1
# group 1 2 3 4 5
# 1 2 2 2 2 1
# 2 2 2 2 1 2
# 3 2 2 1 2 2
with(df,table(group,ex2))
# ex2
# group 1 2 3 4 5
# 1 1 2 2 2 2
# 2 2 2 2 2 1
# 3 2 2 2 1 2
How about this function:
f <- function(n,m) {sample( c( rep(1:m,n%/%m), sample(1:m,n%%m) ), n )}
"n" is the group size, "m" the number of treatments.
Each treatment must be contained at least "n %/% m" times in the group.
The treatment numbers of the remaining "n %% m" group members are
assigned arbitrarily without repetition.
The vector "c( rep(1:m,n%/%m), sample(1:m,n%%m) )" contains these treatment numbers. Finally, the "sample" function
permutes these numbers.
> f(8,5)
[1] 5 3 1 5 4 2 2 1
> f(8,5)
[1] 4 5 3 4 2 2 1 1
> f(8,5)
[1] 4 2 1 5 3 5 2 3
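One caveat with the one-liner: when the pool of treatment numbers has length 1 (for example n = 1), sample(k, 1) draws from 1:k instead of returning k. The variant below is a sketch that shuffles by index to avoid this; assign_balanced is a hypothetical name, and the balance check uses the 23-subject case from the question.

```r
# Hypothetical variant of f(): balanced random assignment of m treatments
# to n subjects
assign_balanced <- function(n, m) {
  # every treatment appears n %/% m times; the n %% m leftover slots get
  # distinct treatments drawn at random
  pool <- c(rep(seq_len(m), n %/% m), sample.int(m, n %% m))
  # shuffle by position, which is safe even when pool has length 1
  pool[sample.int(length(pool))]
}

set.seed(42)
counts <- table(factor(assign_balanced(23, 5), levels = 1:5))
counts   # each treatment 4 or 5 times, exactly three of them 5 times
```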
Here is a function that creates a dataframe, using the above function:
Plan <- function( groupSizes, numExp=2, numTreatment=5 )
{
  numGroups <- length(groupSizes)
  df <- data.frame( group = rep(1:numGroups,groupSizes) )
  for ( e in 1:numExp )
  {
    df <- cbind(df,unlist(lapply(groupSizes,function(n){f(n,numTreatment)})))
    colnames(df)[e+1] <- sprintf("Exp_%i", e)
  }
  return(df)
}
Example:
> P <- Plan(c(8,23,13,19))
> P
group Exp_1 Exp_2
1 1 4 1
2 1 1 4
3 1 2 2
4 1 2 1
5 1 3 5
6 1 5 5
7 1 1 2
8 1 3 3
9 2 5 1
10 2 2 1
11 2 5 2
12 2 1 2
13 2 2 1
14 2 1 4
15 2 3 5
16 2 5 3
17 2 2 4
18 2 5 4
19 2 2 5
20 2 1 1
21 2 4 2
22 2 3 3
23 2 4 3
24 2 2 5
25 2 3 3
26 2 5 2
27 2 1 5
28 2 3 4
29 2 4 4
30 2 4 2
31 2 4 3
32 3 2 5
33 3 5 3
34 3 5 1
35 3 5 1
36 3 2 5
37 3 4 4
38 3 1 4
39 3 3 2
40 3 3 2
41 3 3 3
42 3 1 1
43 3 4 2
44 3 4 4
45 4 5 1
46 4 3 1
47 4 1 2
48 4 1 5
49 4 3 3
50 4 3 1
51 4 4 5
52 4 2 4
53 4 5 3
54 4 2 1
55 4 4 2
56 4 2 5
57 4 4 4
58 4 5 3
59 4 5 4
60 4 1 2
61 4 2 5
62 4 3 2
63 4 4 4
Check the distribution:
> with(P,table(group,Exp_1))
Exp_1
group 1 2 3 4 5
1 2 2 2 1 1
2 4 5 4 5 5
3 2 2 3 3 3
4 3 4 4 4 4
> with(P,table(group,Exp_2))
Exp_2
group 1 2 3 4 5
1 2 2 1 1 2
2 4 5 5 5 4
3 3 3 2 3 2
4 4 4 3 4 4
>
The design of efficient experiments is a science on its own and there are a few R-packages dealing with this issue:
https://cran.r-project.org/web/views/ExperimentalDesign.html
I am afraid your approach is not optimal regarding the resources, no matter how you create the samples...
However this might help:
n <- 23
group <- sort(rep(1:5, ceiling(n/5)))[1:n]
exp1 <- rep(NA, length(group))
for(i in 1:max(group)) {
exp1[which(group == i)] <- sample(1:5)[1:sum(group == i)]
}
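The loop above handles a single group of 23. A sketch of how the same chunk-wise idea might be applied to several groups at once is given below; assign_in_chunks and the group sizes are assumptions made for illustration.

```r
# Hypothetical helper: cut n subjects into consecutive chunks of size m and
# draw treatments without replacement inside each chunk
assign_in_chunks <- function(n, m = 5) {
  chunk <- ceiling(seq_len(n) / m)
  unlist(lapply(split(seq_len(n), chunk),
                function(idx) sample.int(m)[seq_along(idx)]),
         use.names = FALSE)
}

set.seed(1)
groups <- rep(1:3, times = c(23, 8, 13))   # illustrative group sizes
# ave() applies the helper separately to the rows of each group
treatment <- ave(seq_along(groups), groups,
                 FUN = function(i) assign_in_chunks(length(i)))

tab <- table(groups, treatment)
tab
```

Within every group the treatment counts differ by at most one, because each full chunk contributes each treatment exactly once. Subjects are assumed to arrive in arbitrary order; if row order is meaningful, shuffle the rows within each group first.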
Not exactly sure if this meets all your constraints, but you could use the randomizr package:
library(randomizr)
experiment_1 <- complete_ra(N = 23, num_arms = 5)
experiment_2 <- block_ra(experiment_1, num_arms = 5)
table(experiment_1)
table(experiment_2)
table(experiment_1, experiment_2)
Produces output like this:
> table(experiment_1)
experiment_1
T1 T2 T3 T4 T5
4 5 5 4 5
> table(experiment_2)
experiment_2
T1 T2 T3 T4 T5
6 3 6 4 4
> table(experiment_1, experiment_2)
experiment_2
experiment_1 T1 T2 T3 T4 T5
T1 2 0 1 1 0
T2 1 1 1 1 1
T3 1 1 1 1 1
T4 1 0 2 0 1
T5 1 1 1 1 1

How to change the way split returns values in R?

I'm working on a project and I want to take a matrix, split it by the values w and x, and then for each of those splits find the maximum value of y.
Here's an example matrix
>rah = cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)
>rah
w x y z
[1,] 1 1 12 1
[2,] 2 2 11 2
[3,] 3 3 10 3
[4,] 4 1 9 4
[5,] 5 2 8 5
[6,] 6 3 7 6
[7,] 1 1 6 7
[8,] 2 2 5 8
[9,] 3 3 4 9
[10,] 4 1 3 10
[11,] 5 2 2 11
[12,] 6 3 1 12
So I run split
> doh = split(rah, list(rah[,1], rah[,2]))
> doh
$`1.1`
[1] 1 1 1 1 12 6 1 7
$`2.1`
integer(0)
$`3.1`
integer(0)
$`4.1`
[1] 4 4 1 1 9 3 4 10
$`5.1`
integer(0)
$`6.1`
integer(0)
$`1.2`
integer(0)
$`2.2`
[1] 2 2 2 2 11 5 2 8
$`3.2`
integer(0)
$`4.2`
integer(0)
$`5.2`
[1] 5 5 2 2 8 2 5 11
...
So I'm a bit confused as to how to take the output of split and use it to group the rows with the matching combination of w and x values (such as row 1 compared to row 7) and then compare them to find the one with the highest y value.
EDIT: Informative answers so far but I just realized that I forgot to mention one very important part: I want to keep the whole row (x,w,y,z).
Use aggregate instead
> aggregate(y ~ w + x, max, data=rah)
w x y
1 1 1 12
2 4 1 9
3 2 2 11
4 5 2 8
5 3 3 10
6 6 3 7
If you want to use split, try
> split_rah <- split(rah[,"y"], list(rah[, "w"], rah[, "x"]))
> ind <- sapply(split_rah, function(x) length(x)>0)
> sapply(split_rah[ind], max)
1.1 4.1 2.2 5.2 3.3 6.3
12 9 11 8 10 7
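The odd-looking output in the question arises because split treats a matrix as a plain vector, so each piece is a flattened mix of columns. One way around this, sketched here, is to split the row indices instead and subset the matrix with them; idx and blocks are illustrative names.

```r
rah <- cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)

# Split row indices rather than the matrix itself; drop = TRUE discards
# the (w, x) combinations that never occur
idx <- split(seq_len(nrow(rah)), list(rah[, "w"], rah[, "x"]), drop = TRUE)

# Each element is now a sub-matrix that keeps all four columns
blocks <- lapply(idx, function(i) rah[i, , drop = FALSE])
blocks[["1.1"]]
```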
Just for the record, summaryBy from the doBy package also works in the same fashion as aggregate
> library(doBy)
> summaryBy(y ~ w + x, FUN=max, data=as.data.frame(rah))
w x y.max
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
data.table solution:
> library(data.table)
> dt <- data.table(rah)
> dt[, max(y), by=list(w, x)]
w x V1
1: 1 1 12
2: 2 2 11
3: 3 3 10
4: 4 1 9
5: 5 2 8
6: 6 3 7
> tapply(rah[,"y"], list( rah[,"w"], rah[,"x"]), max)
1 2 3
1 12 NA NA
2 NA 11 NA
3 NA NA 10
4 9 NA NA
5 NA 8 NA
6 NA NA 7
Another option using plyr package:
ddply(as.data.frame(rah),.(w,x),summarize,z=max(y))
w x z
1 1 1 12
2 2 2 11
3 3 3 10
4 4 1 9
5 5 2 8
6 6 3 7
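The summaries above return w, x and the maximum of y, but not z. Since the edit to the question asks for the whole row, here is a base-R sketch that sorts so the largest y comes first within each (w, x) pair and then keeps the first row of each pair. Ties in y are resolved by original row order, which is an assumption about what is wanted.

```r
rah <- cbind(w = 1:6, x = 1:3, y = 12:1, z = 1:12)

# Sort by w, then x, then y in decreasing order
ord <- rah[order(rah[, "w"], rah[, "x"], -rah[, "y"]), ]

# duplicated() on a matrix works row-wise, so this keeps the first
# (highest-y) row of every (w, x) combination, with all columns intact
best <- ord[!duplicated(ord[, c("w", "x")]), ]
best
```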

Is there an expand.grid like function in R, returning permutations?

To be more specific, here is an example:
> expand.grid(5, 5, c(1:4,6),c(1:4,6))
Var1 Var2 Var3 Var4
1 5 5 1 1
2 5 5 2 1
3 5 5 3 1
4 5 5 4 1
5 5 5 6 1
6 5 5 1 2
7 5 5 2 2
8 5 5 3 2
9 5 5 4 2
10 5 5 6 2
11 5 5 1 3
12 5 5 2 3
13 5 5 3 3
14 5 5 4 3
15 5 5 6 3
16 5 5 1 4
17 5 5 2 4
18 5 5 3 4
19 5 5 4 4
20 5 5 6 4
21 5 5 1 6
22 5 5 2 6
23 5 5 3 6
24 5 5 4 6
25 5 5 6 6
This data frame was created from all combinations of the supplied vectors. I would like to create a similar data frame from all permutations of the supplied vectors. Notice that each row must contain exactly 2 fives, yet not necessarily as the first two in line.
Thank you.
The code below works (it relies on permutations from gtools):
library(gtools)
comb <- t(as.matrix(expand.grid(5, 5, c(1:4,6),c(1:4,6))))
perms <- t(permutations(4,4))
ans <- apply(comb,2,function(x) x[perms])
ans <- unique(matrix(as.vector(ans), ncol = 4, byrow = TRUE))
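For this particular example the same rows can also be built directly in base R, without generating duplicates first: choose the two positions that hold the fives, then fill the remaining two positions from the other values. This is a sketch specific to the "two fives plus two values from c(1:4, 6)" case, not a general replacement for the approach above.

```r
others <- c(1:4, 6)
fills <- as.matrix(expand.grid(others, others))   # 25 ordered fillings
pos <- combn(4, 2)                                # 6 ways to place the two fives

rows <- lapply(seq_len(ncol(pos)), function(j) {
  m <- matrix(5, nrow = nrow(fills), ncol = 4)    # start with fives everywhere
  m[, -pos[, j]] <- fills                         # overwrite the non-five slots
  m
})
res <- do.call(rbind, rows)

dim(res)   # 6 * 25 = 150 rows, 4 columns
```

The count of 150 gives a quick sanity check for any other method: 6 placements of the fives times 25 fillings of the remaining two slots.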
Try ?allPerms in the vegan package.
