Generating all possible outcomes in R

Given a vector with numeric values, how do I generate all possible outcomes for subtraction to find the differences and put them in a data.frame?
dataset1 <- data.frame(numbers = c(1,2,3,4,5,6,7,8,9,10))
i.e. (1 - 1, 1 - 2, 1 - 3, ...)
Ideally, I would want the output to give me a data frame with 3 columns (Number X, Number Y, Difference) using dataset1.

The expand.grid function gives you "pairings" that differ from the ones you get with combn. Since you included 1 - 1, I'm assuming you don't want combn: it never pairs a value with itself, so it only gives you 45 combinations.
> pairs=expand.grid(X=1:10, Y=1:10)
> pairs$diff <- with(pairs, X-Y)
> pairs
X Y diff
1 1 1 0
2 2 1 1
3 3 1 2
4 4 1 3
5 5 1 4
6 6 1 5
7 7 1 6
8 8 1 7
9 9 1 8
10 10 1 9
11 1 2 -1
12 2 2 0
13 3 2 1
14 4 2 2
15 5 2 3
16 6 2 4
17 7 2 5
(remainder snipped; 100 rows in total)
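A minimal sketch of the same expand.grid idea applied directly to dataset1 from the question, with the three requested columns. The names Number_X, Number_Y and Difference are my own choice, since column names containing spaces would need backticks in R.

```r
# dataset1 as defined in the question
dataset1 <- data.frame(numbers = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# Every ordered (X, Y) pair, including self-pairs such as 1 - 1;
# the first column varies fastest, as in the output above
pairs <- expand.grid(Number_X = dataset1$numbers,
                     Number_Y = dataset1$numbers)

# Difference for each pair
pairs$Difference <- pairs$Number_X - pairs$Number_Y

head(pairs)
```

This works for any numeric column, not just 1:10, because expand.grid only needs the values themselves.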
Use outer as another way to get the same set of paired differences:
> tbl <- matrix( outer(X=1:10, Y=1:10, "-"), 10, dimnames=list(X=1:10, Y=1:10))
> tbl
Y
X 1 2 3 4 5 6 7 8 9 10
1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9
2 1 0 -1 -2 -3 -4 -5 -6 -7 -8
3 2 1 0 -1 -2 -3 -4 -5 -6 -7
4 3 2 1 0 -1 -2 -3 -4 -5 -6
5 4 3 2 1 0 -1 -2 -3 -4 -5
6 5 4 3 2 1 0 -1 -2 -3 -4
7 6 5 4 3 2 1 0 -1 -2 -3
8 7 6 5 4 3 2 1 0 -1 -2
9 8 7 6 5 4 3 2 1 0 -1
10 9 8 7 6 5 4 3 2 1 0
But I didn't see a compact way to turn that into a data frame of the sort you specified.
The now-deleted comment by @RitchieSacramento was correct:
> tbl <- matrix( outer(X=1:10, Y=1:10, "-"), 10, dimnames=list(X=1:10, Y=1:10))
> as.data.frame.table(tbl)
X Y Freq
1 1 1 0
2 2 1 1
3 3 1 2
4 4 1 3
5 5 1 4
6 6 1 5
7 7 1 6
8 8 1 7
9 9 1 8
10 10 1 9
11 1 2 -1
12 2 2 0
13 3 2 1
14 4 2 2
15 5 2 3
16 6 2 4
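as.data.frame.table() also takes a responseName argument, so the Freq column can be named directly. A sketch using the question's numbers; converting X and Y back to numbers is my addition, because the labels come from the dimnames and are returned as factors by default (character with stringsAsFactors = FALSE).

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# 10 x 10 matrix of X - Y differences, with dimensions named X and Y
tbl <- outer(numbers, numbers, "-")
dimnames(tbl) <- list(X = numbers, Y = numbers)

# Long format; responseName replaces the default "Freq" column name
res <- as.data.frame.table(tbl, responseName = "Difference",
                           stringsAsFactors = FALSE)

# X and Y come back as character labels, so convert them to numbers
res$X <- as.numeric(res$X)
res$Y <- as.numeric(res$Y)
head(res)
```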

You can use the combn() function to generate all combinations of the values taken 2 at a time. Note that this gives only the 45 unordered pairs, without self-pairs such as 1 - 1.
numbers <- c(1,2,3,4,5,6,7,8,9,10)
output <- combn(numbers, 2)  # 2 x 45 matrix, one pair per column
answer <- as.data.frame(t(output))
answer$Difference <- answer[, 1] - answer[, 2]
head(answer)
V1 V2 Difference
1 1 2 -1
2 1 3 -2
3 1 4 -3
4 1 5 -4
5 1 6 -5
6 1 7 -6
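combn() can also compute the difference itself through its FUN argument. A sketch, assuming the same 45 unordered pairs are wanted; the column names X, Y and Difference are illustrative.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# FUN receives each pair (a length-2 vector) and returns the pair plus
# its difference; with simplify = TRUE the results become the columns
# of a 3 x 45 matrix
out <- combn(numbers, 2, FUN = function(p) c(p, p[1] - p[2]))

answer <- as.data.frame(t(out))
names(answer) <- c("X", "Y", "Difference")
head(answer)
```

Because combn() always returns each pair in increasing order, every Difference here is negative.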

Related

for loop for unique values with more than 1 occurrence

I have a questionnaire (yes, it is the same one from all my questions so far; see "A for loop for multiple likert graphs returns NOTHING").
Now I am evaluating the answers per geographical region (11 regions), which I want to do with a for loop.
The likert package I use to make my graphs won't plot anything for groups with one or fewer non-NA answers.
My (hypothetical) data looks something like this (-9 denotes NA):
M1 M2 M3 M4 M5 M6 M7 M8 M9 group
1 1 5 5 1 2 4 4 -9 5 1
2 2 4 5 1 2 4 4 1 5 1
3 3 3 5 1 2 4 3 1 3 1
4 1 5 5 1 2 4 4 -9 5 1
5 2 4 5 1 2 4 4 1 5 2
6 1 5 5 1 2 4 4 -9 5 2
7 2 4 5 1 2 4 4 1 5 2
8 3 3 5 1 2 4 3 1 3 3
9 4 5 5 1 2 4 -9 1 3 3
10 5 5 -9 1 3 4 4 2 -9 3
11 3 3 5 1 2 4 3 1 3 3
12 4 5 5 1 2 4 -9 1 3 4
13 5 5 -9 1 3 4 4 2 -9 3
14 5 5 -9 1 3 4 4 2 -9 3
15 3 3 5 1 2 4 3 1 3 4
16 1 5 5 1 2 4 4 -9 5 4
17 2 4 5 1 2 4 4 1 5 4
18 1 5 5 1 2 4 4 -9 5 4
19 2 4 5 1 2 4 4 1 5 4
20 3 3 5 1 2 4 3 1 3 4
21 -9 -9 -9 -9 -9 -9 -9 -9 -9 5
22 1 1 1 1 1 1 1 1 1 6
Neither of these two for loops works:
for (i in 1:5)
for (i in unique(mydata$group))
Both fail with: Error in FUN(X[[i]], ...) : object 'pos' not found
I think this is because group 5 contains only NAs and group 6 contains only one row.
So I need my for loop to process only those values of i whose group has more than 2 non-NA rows. Any ideas?
Inside the loop, you can check whether the current group has more than 2 rows without any NA, and only plot it if so:
mydata <- read.table(text = "
M1 M2 M3 M4 M5 M6 M7 M8 M9 group
1 1 5 5 1 2 4 4 -9 5 1
2 2 4 5 1 2 4 4 1 5 1
3 3 3 5 1 2 4 3 1 3 1
4 1 5 5 1 2 4 4 -9 5 1
5 2 4 5 1 2 4 4 1 5 2
6 1 5 5 1 2 4 4 -9 5 2
7 2 4 5 1 2 4 4 1 5 2
8 3 3 5 1 2 4 3 1 3 3
9 4 5 5 1 2 4 -9 1 3 3
10 5 5 -9 1 3 4 4 2 -9 3
11 3 3 5 1 2 4 3 1 3 3
12 4 5 5 1 2 4 -9 1 3 4
13 5 5 -9 1 3 4 4 2 -9 3
14 5 5 -9 1 3 4 4 2 -9 3
15 3 3 5 1 2 4 3 1 3 4
16 1 5 5 1 2 4 4 -9 5 4
17 2 4 5 1 2 4 4 1 5 4
18 1 5 5 1 2 4 4 -9 5 4
19 2 4 5 1 2 4 4 1 5 4
20 3 3 5 1 2 4 3 1 3 4
21 -9 -9 -9 -9 -9 -9 -9 -9 -9 5
22 1 1 1 1 1 1 1 1 1 6
", header=T, na.strings="-9")
for (i in unique(mydata$group)) {
  x <- mydata[mydata$group == i, ]
  if (sum(complete.cases(x)) > 2) {  # more than 2 rows without any NA?
    plot(x[1:9])
  }
}
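A sketch of a variant that decides up front which groups are eligible, so the loop only visits those. The toy data frame is hypothetical (two answer columns, NA already in place of -9), small enough to check by hand; with the question's data the same two lines apply to mydata.

```r
# Hypothetical toy data: NA marks a missing answer
mydata <- data.frame(
  M1    = c(1, 2, 3, 4, 2, NA, 5, 1),
  M2    = c(5, 4, 3, 5, 4, 5, NA, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 4)
)

# Number of complete (NA-free) rows in each group
complete_per_group <- tapply(complete.cases(mydata), mydata$group, sum)

# Keep only the groups with more than 2 complete rows
eligible <- as.numeric(names(complete_per_group)[complete_per_group > 2])

for (i in eligible) {
  x <- mydata[mydata$group == i, ]
  # placeholder: the likert plotting call would go here
  print(nrow(x))
}
```

Printing complete_per_group first also shows at a glance why a given region is skipped.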

Group by each increasing sequence in data frame

If I have a data frame with a column made up of runs of increasing values, such as:
x
1
2
3
4
1
2
3
1
2
3
4
5
6
1
2
How do I add a column to group each increasing sequence that results in:
x y
1 1
2 1
3 1
4 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
5 3
6 3
1 4
2 4
I can only think of using a loop, which would be slow.
You can use the cumsum function for this, as long as every sequence starts at 1:
> x <- c(1,2,3,4,1,2,3,1,2,4,5,1,2)
> cumsum(x==1)
[1] 1 1 1 1 2 2 2 3 3 3 3 4 4
I would use diff and compute the cumulative sum, so that a new group starts wherever the value decreases:
df$y <- c(1, cumsum(diff(df$x) < 0 ) + 1)
> df
x y
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 1 3
9 2 3
10 3 3
11 4 3
12 5 3
13 6 3
14 1 4
15 2 4
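A self-contained check of both answers on the full x column from the question, plus a hypothetical vector x2 whose runs do not all start at 1, which is where the two approaches differ.

```r
# The x column from the question: consecutive runs of increasing values
x <- c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2)

# Approach 1: a new group starts at every 1
y_ones <- cumsum(x == 1)

# Approach 2: a new group starts wherever the value drops
y_diff <- c(1, cumsum(diff(x) < 0) + 1)

df <- data.frame(x = x, y = y_diff)
df

# Hypothetical vector whose runs do not all start at 1
x2 <- c(3, 4, 5, 2, 3, 1, 6)
g_ones <- cumsum(x2 == 1)                  # 0 0 0 0 0 1 1
g_diff <- c(1, cumsum(diff(x2) < 0) + 1)   # 1 1 1 2 2 3 3
```

Both approaches agree on the question's data; only the diff() version handles x2. If a run could restart at the same value as the element before it (e.g. 2, 2), `diff(x) <= 0` would be needed instead of `< 0`.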

How to calculate recency in R

I have the following data:
set.seed(20)
round<-rep(1:10,2)
part<-rep(1:2, c(10,10))
game<-rep(rep(1:2,c(5,5)),2)
pay1<-sample(1:10,20,replace=TRUE)
pay2<-sample(1:10,20,replace=TRUE)
pay3<-sample(1:10,20,replace=TRUE)
decs<-sample(1:3,20,replace=TRUE)
previous_max<-c(0,1,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,0)
gamematrix<-cbind(part,game,round,pay1,pay2,pay3,decs,previous_max )
gamematrix<-data.frame(gamematrix)
Here is the output:
part game round pay1 pay2 pay3 decs previous_max
1 1 1 1 9 5 6 2 0
2 1 1 2 8 1 1 1 1
3 1 1 3 3 5 5 3 0
4 1 1 4 6 1 5 1 0
5 1 1 5 10 3 8 3 0
6 1 2 6 10 1 5 1 0
7 1 2 7 1 10 7 3 0
8 1 2 8 1 10 8 2 1
9 1 2 9 4 1 5 1 0
10 1 2 10 4 7 7 2 0
11 2 1 1 8 4 1 1 0
12 2 1 2 8 5 5 2 0
13 2 1 3 1 9 3 1 1
14 2 1 4 8 2 10 2 1
15 2 1 5 2 6 2 3 1
16 2 2 6 5 5 6 2 0
17 2 2 7 4 5 1 2 0
18 2 2 8 2 10 5 2 1
19 2 2 9 3 7 3 2 1
20 2 2 10 9 3 1 1 0
How can I calculate a new indicator variable, "previous_max", that says whether, in the next round of the same game, the same participant chose the option with the maximal payoff from the previous round?
So I want something like the following:
Participant (part) 1:
In the first round of each game, previous_max is 0 (there is no previous round). In round 2, previous_max = 1, because in round 1 the maximal payoff was max(pay1, pay2, pay3) = max(9, 5, 6) = 9 (pay1), and in round 2 the participant's decision (decs) was 1, the option with the maximal value in the previous round.
In round 3, previous_max = 0, because the maximal value in round 2 was 8 (pay1), but the participant chose 3 (pay3).
Here's a solution using dplyr and purrr::map.
I would have preferred to use group_by rather than split, but max.col ignores groups and I don't know of a dplyr equivalent.
The output is slightly different from yours, but I think that's because of mistakes in your expected values; please explain if not and I'll update my answer.
library(purrr)
library(dplyr)
gamematrix %>%
split(.$part) %>%
map(~ .x %>% mutate(
prev_max = as.integer(
decs ==
c(0,max.col(.[c("pay1","pay2","pay3")])[-n()]) # the number of the max columns, offset by one
))) %>%
bind_rows
#   part game round pay1 pay2 pay3 decs prev_max
# 1 1 1 1 9 5 6 2 0
# 2 1 1 2 8 1 1 1 1
# 3 1 1 3 3 5 5 3 0
# 4 1 1 4 6 1 5 1 0
# 5 1 1 5 10 3 8 3 0
# 6 1 2 6 10 1 5 1 1
# 7 1 2 7 1 10 7 3 0
# 8 1 2 8 1 10 8 2 1
# 9 1 2 9 4 1 5 1 0
# 10 1 2 10 4 7 7 2 0
# 11 2 1 1 8 4 1 1 0
# 12 2 1 2 8 5 5 2 0
# 13 2 1 3 1 9 3 1 1
# 14 2 1 4 8 2 10 2 1
# 15 2 1 5 2 6 2 3 1
# 16 2 2 6 5 5 6 2 1
# 17 2 2 7 4 5 1 2 0
# 18 2 2 8 2 10 5 2 1
# 19 2 2 9 3 7 3 2 1
# 20 2 2 10 9 3 1 1 0
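The output above splits by part only, so round 5 of game 1 is treated as the "previous round" of round 6, the first round of game 2 (rows 6 and 16). A base-R sketch that resets at the start of every game, swapping in ave() for split/map; it is my own variant, not the answer's code. The data is typed in from the table printed in the question rather than regenerated with set.seed(20), because sample() results can differ between R versions. Since max.col() works row by row, only the shift to the previous round needs grouping.

```r
# Data typed in from the table printed in the question
gamematrix <- data.frame(
  part  = rep(1:2, each = 10),
  game  = rep(rep(1:2, each = 5), 2),
  round = rep(1:10, 2),
  pay1  = c(9, 8, 3, 6, 10, 10, 1, 1, 4, 4, 8, 8, 1, 8, 2, 5, 4, 2, 3, 9),
  pay2  = c(5, 1, 5, 1, 3, 1, 10, 10, 1, 7, 4, 5, 9, 2, 6, 5, 5, 10, 7, 3),
  pay3  = c(6, 1, 5, 5, 8, 5, 7, 8, 5, 7, 1, 5, 3, 10, 2, 6, 1, 5, 3, 1),
  decs  = c(2, 1, 3, 1, 3, 1, 3, 2, 1, 2, 1, 2, 1, 2, 3, 2, 2, 2, 2, 1)
)

# Row-wise: which of pay1..pay3 is largest (ties go to the first column)
best <- max.col(as.matrix(gamematrix[c("pay1", "pay2", "pay3")]),
                ties.method = "first")

# Shift "best" down one round within each participant-game pair;
# the first round of every game gets 0 (no previous round)
prev_best <- ave(best, gamematrix$part, gamematrix$game,
                 FUN = function(b) c(0L, head(b, -1)))

gamematrix$previous_max <- as.integer(gamematrix$decs == prev_best)
gamematrix
```

With this data the result reproduces the previous_max column from the question exactly. ties.method = "first" is an assumption: max.col()'s default, "random", can vary between runs when two payoffs tie (rows 3 and 10 here), although those ties do not change the result for this data.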

Randomly Assign Integers in R within groups without replacement

I am running a study with two experiments: experiment_1 and experiment_2. Each experiment has 5 different treatments (i.e. 1, 2, 3, 4, 5). We are trying to randomly assign the treatments within groups.
We would like to do this by sampling without replacement iteratively within each group, to ensure that the treatments are as balanced as possible (e.g. we don't want 4 subjects in group 1 to be assigned treatment 2 and no one to get treatment 1). So if a group has 23 subjects, we want to split the respondents into 4 subgroups of 5 and 1 subgroup of 3. We then want to randomly sample without replacement across the first subgroup of 5, so everyone gets assigned 1 of the treatments, do the same for the second, third and fourth subgroups of 5, and for the final subgroup of 3 randomly sample without replacement. This would guarantee that every treatment is assigned to at least 4 subjects in this group, and that 3 of the treatments are assigned to 5 subjects. We would like to do this for all the groups in the study and for both experiments. The resulting output would look something like this...
group experiment_1 experiment_2
[1,] 1 5 3
[2,] 1 3 2
[3,] 1 4 4
[4,] 1 1 5
[5,] 1 2 1
[6,] 1 2 3
[7,] 1 4 1
[8,] 1 3 2
[9,] 2 5 5
[10,] 2 1 4
[11,] 2 3 4
[12,] 2 1 5
[13,] 2 2 1
. . . .
. . . .
. . . .
I know how to use the sample function, but I am unsure how to sample without replacement within each group so that our output matches the procedure described above. Any help would be appreciated.
I think we just need to shuffle the sample IDs; see this example:
set.seed(124)
#prepare groups and samples(shuffled)
df <- data.frame(group=sort(rep(1:3,9)),
sampleID=sample(1:27,27))
#cycle through the treatments down all rows of df
df$ex1 <- rep(c(1,2,3,4,5),ceiling(nrow(df)/5))[1:nrow(df)]
df$ex2 <- rep(c(2,3,4,5,1),ceiling(nrow(df)/5))[1:nrow(df)]
df <- df[ order(df$group,df$sampleID),]
#check treatment distribution
with(df,table(group,ex1))
# ex1
# group 1 2 3 4 5
# 1 2 2 2 2 1
# 2 2 2 2 1 2
# 3 2 2 1 2 2
with(df,table(group,ex2))
# ex2
# group 1 2 3 4 5
# 1 1 2 2 2 2
# 2 2 2 2 2 1
# 3 2 2 2 1 2
How about this function:
f <- function(n,m) {sample( c( rep(1:m,n%/%m), sample(1:m,n%%m) ), n )}
"n" is the group size, "m" the number of treatments.
Each treatment must be contained at least "n %/% m" times in the group.
The treatment numbers of the remaining "n %% m" group members are
assigned arbitrarily without repetition.
The vector "c( rep(1:m,n%/%m), sample(1:m,n%%m) )" contains these treatment numbers. Finally, the outer "sample" call
randomly permutes these numbers.
> f(8,5)
[1] 5 3 1 5 4 2 2 1
> f(8,5)
[1] 4 5 3 4 2 2 1 1
> f(8,5)
[1] 4 2 1 5 3 5 2 3
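A small variant of f, offered as my own adjustment rather than the answer's code. Base R's sample(x, size) treats a single positive number x as 1:x, so when the combined vector in f() has length 1 (n = 1), the outer sample() call draws from 1:k instead of returning k. Permuting by indexing with sample.int() avoids that. The check uses the question's group of 23 subjects and 5 treatments.

```r
# Variant of f(): n subjects in the group, m treatments
f_safe <- function(n, m) {
  # each treatment n %/% m times, plus n %% m distinct extra treatments
  treatments <- c(rep(seq_len(m), n %/% m), sample.int(m, n %% m))
  # permute by indexing, so a single-element vector is not turned into 1:x
  treatments[sample.int(length(treatments))]
}

set.seed(1)
counts <- tabulate(f_safe(23, 5), nbins = 5)
counts
```

Whatever the seed, each treatment appears 4 or 5 times and exactly 3 treatments appear 5 times, which is the guarantee asked for in the question.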
Here is a function that creates a data frame using the above function:
Plan <- function( groupSizes, numExp=2, numTreatment=5 )
{
numGroups <- length(groupSizes)
df <- data.frame( group = rep(1:numGroups,groupSizes) )
for ( e in 1:numExp )
{
df <- cbind(df,unlist(lapply(groupSizes,function(n){f(n,numTreatment)})))
colnames(df)[e+1] <- sprintf("Exp_%i", e)
}
return(df)
}
Example:
> P <- Plan(c(8,23,13,19))
> P
group Exp_1 Exp_2
1 1 4 1
2 1 1 4
3 1 2 2
4 1 2 1
5 1 3 5
6 1 5 5
7 1 1 2
8 1 3 3
9 2 5 1
10 2 2 1
11 2 5 2
12 2 1 2
13 2 2 1
14 2 1 4
15 2 3 5
16 2 5 3
17 2 2 4
18 2 5 4
19 2 2 5
20 2 1 1
21 2 4 2
22 2 3 3
23 2 4 3
24 2 2 5
25 2 3 3
26 2 5 2
27 2 1 5
28 2 3 4
29 2 4 4
30 2 4 2
31 2 4 3
32 3 2 5
33 3 5 3
34 3 5 1
35 3 5 1
36 3 2 5
37 3 4 4
38 3 1 4
39 3 3 2
40 3 3 2
41 3 3 3
42 3 1 1
43 3 4 2
44 3 4 4
45 4 5 1
46 4 3 1
47 4 1 2
48 4 1 5
49 4 3 3
50 4 3 1
51 4 4 5
52 4 2 4
53 4 5 3
54 4 2 1
55 4 4 2
56 4 2 5
57 4 4 4
58 4 5 3
59 4 5 4
60 4 1 2
61 4 2 5
62 4 3 2
63 4 4 4
Check the distribution:
> with(P,table(group,Exp_1))
Exp_1
group 1 2 3 4 5
1 2 2 2 1 1
2 4 5 4 5 5
3 2 2 3 3 3
4 3 4 4 4 4
> with(P,table(group,Exp_2))
Exp_2
group 1 2 3 4 5
1 2 2 1 1 2
2 4 5 5 5 4
3 3 3 2 3 2
4 4 4 3 4 4
The design of efficient experiments is a science in its own right, and there are several R packages dealing with this issue:
https://cran.r-project.org/web/views/ExperimentalDesign.html
I am afraid your approach is not optimal in terms of resources, no matter how you create the samples.
However, this might help:
n <- 23
# consecutive subgroups of 5 subjects (the last one may be smaller)
group <- ceiling(seq_len(n) / 5)
exp1 <- rep(NA, length(group))
# within each subgroup, draw treatments 1-5 without replacement
for (i in 1:max(group)) {
  exp1[which(group == i)] <- sample(1:5)[1:sum(group == i)]
}
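A sketch extending the subgroup idea to several groups and both experiments, as the question asks. The helper name assign_blocks and the group sizes are hypothetical. Compared with drawing the extras for the whole group at once, the per-group counts are the same, but each consecutive run of 5 subjects is also balanced on its own.

```r
# Hypothetical helper: treatments for one group of n subjects, drawn
# without replacement within consecutive blocks of m subjects
assign_blocks <- function(n, m = 5) {
  block <- ceiling(seq_len(n) / m)
  draws <- lapply(split(seq_len(n), block),
                  function(idx) sample.int(m, length(idx)))
  unname(unlist(draws))
}

# Hypothetical group sizes; the question mentions a group of 23
group_sizes <- c(8, 23)

set.seed(124)
plan <- data.frame(group = rep(seq_along(group_sizes), group_sizes))
plan$experiment_1 <- unlist(lapply(group_sizes, assign_blocks))
plan$experiment_2 <- unlist(lapply(group_sizes, assign_blocks))

with(plan, table(group, experiment_1))
```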
Not exactly sure if this meets all your constraints, but you could use the randomizr package:
library(randomizr)
experiment_1 <- complete_ra(N = 23, num_arms = 5)
experiment_2 <- block_ra(experiment_1, num_arms = 5)
table(experiment_1)
table(experiment_2)
table(experiment_1, experiment_2)
Produces output like this:
> table(experiment_1)
experiment_1
T1 T2 T3 T4 T5
4 5 5 4 5
> table(experiment_2)
experiment_2
T1 T2 T3 T4 T5
6 3 6 4 4
> table(experiment_1, experiment_2)
experiment_2
experiment_1 T1 T2 T3 T4 T5
T1 2 0 1 1 0
T2 1 1 1 1 1
T3 1 1 1 1 1
T4 1 0 2 0 1
T5 1 1 1 1 1

Can I have different aggregation rules for different columns in acast?

Brain not functioning today: how do I tell acast to return different aggregations for different columns?
# the rows and columns have integer names
Rgames> foo
1 2
1 1 1
2 2 2
3 3 3
4 4 4
1 1 4
2 2 8
3 3 2
4 4 1
Rgames> mfoo<-melt(foo)
Rgames> mfoo
Var1 Var2 value
1 1 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 1 1 1
6 2 1 2
7 3 1 3
8 4 1 4
9 1 2 1
10 2 2 2
11 3 2 3
12 4 2 4
13 1 2 4
14 2 2 8
15 3 2 2
16 4 2 1
Rgames> acast(mfoo,Var1~Var2,function(x)x[1]-x[2])
1 2
1 0 -3
2 0 -6
3 0 1
4 0 3
# what I would like is the casting formula to return
1 2
1 1 -3
2 2 -6
3 3 1
4 4 3
This is a simple example, with a caveat: in the general case there will be rows with unique names, but never more than two rows with a given name, so my x[1]-x[2] won't ever fail.
Or should I just use this:
aggregate(foo[,2],by=list((foo[,1])),function(x)x[1]-x[2])
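One option that skips acast altogether is to aggregate each column separately with base R's tapply, keyed by row name. A sketch, assuming the desired column 1 is simply the first value for each name (which matches the example). Note that the aggregate line above groups by the values in column 1 rather than by row name; the two coincide in this example.

```r
# foo as printed in the question: duplicated row names, columns "1" and "2"
foo <- matrix(c(1, 2, 3, 4, 1, 2, 3, 4,
                1, 2, 3, 4, 4, 8, 2, 1),
              ncol = 2,
              dimnames = list(as.character(c(1:4, 1:4)), c("1", "2")))

key <- rownames(foo)

# Column 1: keep the first value for each row name (assumed intent)
first_col <- tapply(foo[, 1], key, function(x) x[1])
# Column 2: difference between the two rows sharing a name
diff_col <- tapply(foo[, 2], key, function(x) x[1] - x[2])

res <- cbind(`1` = as.vector(first_col), `2` = as.vector(diff_col))
rownames(res) <- names(first_col)
res
```

For a name that occurs only once, x[1] - x[2] returns NA rather than an error, because x[2] is out of range.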
