How do I add a vector where I collapse scores from individuals within pairs? - r

I have run an experiment in which participants solved a task in pairs. Each participant then received a score for how well they did the task. Pairs went through different numbers of trials.
I have a data frame similar to the one below:
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)
participant pair trial score
1 1 1 2
1 1 2 3
2 1 1 6
2 1 2 3
3 2 1 4
3 2 2 7
3 2 3 3
4 2 1 1
4 2 2 8
4 2 3 5
5 3 1 4
6 3 1 3
I would like to add a new column to the data frame in which each participant gets the absolute difference between their own score and the other participant's score in the same trial.
Does anyone have an idea how to do that?
It should end up looking something like this:
participant pair trial score difference
1 1 1 2 4
1 1 2 3 0
2 1 1 6 4
2 1 2 3 0
3 2 1 4 3
3 2 2 7 1
3 2 3 3 2
4 2 1 1 3
4 2 2 8 1
4 2 3 5 2
5 3 1 4 1
6 3 1 3 1

Here's a solution that first reorders the data so that each consecutive pair of rows corresponds to a single pair within a single trial. This allows us to make a single call to diff() to extract the differences:
data <- data[order(data$trial,data$pair,data$participant),];
data$diff <- rep(diff(data$score)[c(TRUE,FALSE)],each=2L)*c(-1L,1L);
data;
## participant pair trial score diff
## 1 1 1 1 2 -4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 -3
## 11 5 3 1 4 1
## 12 6 3 1 3 -1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 -1
## 9 4 2 2 8 1
## 7 3 2 3 3 -2
## 10 4 2 3 5 2
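The reordering trick only works if every pair/trial combination has exactly two rows; with a missing or extra row, diff() would silently pair up the wrong participants. A sketch that checks this assumption before computing the signed difference (the table() check is an addition for illustration, not part of the answer above):

```r
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair  <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data  <- data.frame(participant, pair, trial, score)

# Every pair/trial combination must have exactly two rows for the diff() trick;
# combinations that never occur appear as 0 in the table.
counts <- table(data$pair, data$trial)
stopifnot(all(counts %in% c(0L, 2L)))

# Same steps as above: sort, keep every other diff, give the two partners opposite signs
data <- data[order(data$trial, data$pair, data$participant), ]
data$diff <- rep(diff(data$score)[c(TRUE, FALSE)], each = 2L) * c(-1L, 1L)
```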
I assumed you wanted the sign to capture the direction of the difference. So, for instance, if a participant has a score 4 points below the other participant in the same trial-pair, then I assumed you would want -4. If you want all-positive values, you can remove the multiplication by c(-1L,1L) and add a call to abs():
data$diff <- rep(abs(diff(data$score)[c(TRUE,FALSE)]),each=2L);
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 3
## 11 5 3 1 4 1
## 12 6 3 1 3 1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 1
## 9 4 2 2 8 1
## 7 3 2 3 3 2
## 10 4 2 3 5 2
Here's a solution built around ave() that doesn't require reordering the whole data.frame first (shown here on the original, unsorted data):
data$diff <- ave(data$score,data$trial,data$pair,FUN=function(x) abs(diff(x)));
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 2 1 1 2 3 0
## 3 2 1 1 6 4
## 4 2 1 2 3 0
## 5 3 2 1 4 3
## 6 3 2 2 7 1
## 7 3 2 3 3 2
## 8 4 2 1 1 3
## 9 4 2 2 8 1
## 10 4 2 3 5 2
## 11 5 3 1 4 1
## 12 6 3 1 3 1
Here's how you can get the score of the other participant in the same trial-pair:
data$other <- ave(data$score,data$trial,data$pair,FUN=rev);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 2 1 1 2 3 3
## 3 2 1 1 6 2
## 4 2 1 2 3 3
## 5 3 2 1 4 1
## 6 3 2 2 7 8
## 7 3 2 3 3 5
## 8 4 2 1 1 4
## 9 4 2 2 8 7
## 10 4 2 3 5 3
## 11 5 3 1 4 3
## 12 6 3 1 3 4
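Subtracting the ave()/rev() result from the own score gives the signed difference in the original row order, without any reordering. A minimal sketch, assuming each trial-pair has exactly two rows (for a single-row group, rev() returns that row's own score, so the difference would silently be 0):

```r
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair  <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data  <- data.frame(participant, pair, trial, score)

# Score of the other participant in the same trial-pair (rev() swaps the two values)
data$other <- ave(data$score, data$trial, data$pair, FUN = rev)

# Signed difference (own minus other) and its absolute value
data$diff    <- data$score - data$other
data$absdiff <- abs(data$diff)
```

The absdiff column reproduces the difference column the question asks for.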
Or, assuming the data.frame has been reordered as per the initial solution:
data$other <- c(rbind(data$score[c(FALSE,TRUE)],data$score[c(TRUE,FALSE)]));
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Alternatively, using matrix() instead of rbind():
data$other <- c(matrix(data$score,2L)[2:1,]);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3

Here is an option using data.table:
library(data.table)
setDT(data)[,difference := abs(diff(score)), by = .(pair, trial)]
data
# participant pair trial score difference
# 1: 1 1 1 2 4
# 2: 1 1 2 3 0
# 3: 2 1 1 6 4
# 4: 2 1 2 3 0
# 5: 3 2 1 4 3
# 6: 3 2 2 7 1
# 7: 3 2 3 3 2
# 8: 4 2 1 1 3
# 9: 4 2 2 8 1
#10: 4 2 3 5 2
#11: 5 3 1 4 1
#12: 6 3 1 3 1
A slightly faster option would be:
setDT(data)[, difference := abs((score - shift(score))[2]) , by = .(pair, trial)]
If we need the score of the other participant in the pair:
data[, other:= rev(score) , by = .(pair, trial)]
data
# participant pair trial score difference other
# 1: 1 1 1 2 4 6
# 2: 1 1 2 3 0 3
# 3: 2 1 1 6 4 2
# 4: 2 1 2 3 0 3
# 5: 3 2 1 4 3 1
# 6: 3 2 2 7 1 8
# 7: 3 2 3 3 2 5
# 8: 4 2 1 1 3 4
# 9: 4 2 2 8 1 7
#10: 4 2 3 5 2 3
#11: 5 3 1 4 1 3
#12: 6 3 1 3 1 4
Or using dplyr:
library(dplyr)
data %>%
group_by(pair, trial) %>%
mutate(difference = abs(diff(score)))
# participant pair trial score difference
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 2 4
#2 1 1 2 3 0
#3 2 1 1 6 4
#4 2 1 2 3 0
#5 3 2 1 4 3
#6 3 2 2 7 1
#7 3 2 3 3 2
#8 4 2 1 1 3
#9 4 2 2 8 1
#10 4 2 3 5 2
#11 5 3 1 4 1
#12 6 3 1 3 1

Related

How to extract or predict latent class membership in gmnl?

Let's say you run the example for a latent class model from ?gmnl:
library(mlogit)
library(gmnl)
## Examples using the Electricity data set from the mlogit package
data("Electricity", package = "mlogit")
Electr <- mlogit.data(Electricity, id.var = "id", choice = "choice",
varying = 3:26, shape = "wide", sep = "")
## Estimate a LC model with 2 classes
Elec.lc <- gmnl(choice ~ pf + cl + loc + wk + tod + seas| 0 | 0 | 0 | 1,
data = Electr,
subset = 1:3000,
model = 'lc',
panel = TRUE,
Q = 2)
summary(Elec.lc)
You get a fitted model with coefficient estimates for two classes (class 1 & 2). Is there a way to extract (or predict) for each observation, what the most likely class is that this observation belongs to?
After several helpful comments and lots of digging, it seems that there is an undocumented feature that allows you to get predicted class probabilities, which are stored in Wnq. You get one entry per observation and the number of columns matches the number of latent classes (Q = 2 from above), and entries sum to 1.
## Get class probabilities
head(Elec.lc$Wnq)
init
[1,] 0.5547805 0.4452195
[2,] 0.5547805 0.4452195
[3,] 0.5547805 0.4452195
[4,] 0.5547805 0.4452195
[5,] 0.5547805 0.4452195
[6,] 0.5547805 0.4452195
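The fitted model contains a matrix called prob.alt, which gives the probability of each alternative, so you can take the most probable alternative for each choice situation. (The code uses Elec.cor, which is not defined above and appears to come from another example in ?gmnl; use Elec.lc for the latent class model.)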
predictions <- apply(Elec.cor$prob.alt,1, which.max)
predictions
#> [1] 1 1 2 3 1 4 4 3 3 3 2 1 2 2 3 1 1 1 2 3 4 4 4 1 1 4 1 1 4 4 4 2 4 3 1 2 4
#> [38] 4 4 1 1 4 1 1 4 4 4 2 1 1 2 3 4 4 4 2 4 3 4 2 1 4 2 2 2 2 4 2 1 3 4 3 4 4
#> [75] 4 1 4 2 3 2 2 1 3 3 4 3 4 1 1 4 2 1 4 4 2 2 2 2 2 2 1 4 2 2 2 2 1 2 2 4 3
#> [112] 1 1 1 2 3 4 4 4 2 4 3 4 1 1 4 2 1 4 4 2 2 1 4 2 2 2 2 1 2 1 2 4 3 2 2 2 2
#> [149] 1 4 2 2 2 1 2 1 4 3 2 2 2 1 2 1 1 4 2 1 4 2 2 2 2 1 2 1 1 4 3 2 2 2 2 1 4
#> [186] 2 2 2 2 4 2 1 4 3 2 2 2 2 2 1 1 4 2 1 4 4 3 2 2 4 4 1 3 4 1 2 4 3 1 1 1 2
#> [223] 3 4 4 4 1 2 4 2 3 4 4 1 3 4 2 3 3 2 4 1 1 4 4 4 2 1 3 1 2 1 1 2 3 1 4 4 2
#> [260] 4 3 2 1 2 4 2 3 3 4 1 3 4 2 3 3 4 4 4 4 4 1 3 2 3 1 3 3 1 4 2 1 4 4 2 2 1
#> [297] 3 1 1 4 2 4 1 2 4 1 1 4 4 4 2 1 1 2 3 4 4 4 2 4 3 4 1 1 1 2 3 1 4 4 3 4 3
#> [334] 2 1 1 4 1 1 4 4 2 2 1 3 1 3 1 4 2 2 2 2 1 2 1 3 4 3 2 2 2 2 1 4 3 2 2 2 1
#> [371] 2 4 4 1 3 4 2 3 3 2 1 3 3 3 3 4 1 1 4 1 1 4 4 2 2 2 4 2 3 4 4 4 1 4 2 3 2
#> [408] 1 4 3 2 2 2 1 2 1 1 4 3 1 1 2 3 4 4 4 3 3 3 2 1 2 4 3 4 4 4 3 4 3 4 3 4 1
#> [445] 1 4 1 1 4 4 4 2 1 4 2 2 2 2 1 2 1 3 4 3 1 4 2 2 2 2 1 2 4 2 4 3 3 3 4 1 1
#> [482] 4 2 1 4 4 2 2 2 2 3 1 1 1 2 3 4 4 4 2 2 4 2 3 4 4 4 3 4 2 3 2 2 4 2 3 4 4
#> [519] 1 1 4 2 3 2 2 4 1 1 4 4 4 2 2 3 1 3 2 1 2 2 1 4 4 2 2 2 4 2 1 4 3 2 2 2 4
#> [556] 2 1 1 4 2 1 4 2 2 2 2 1 2 1 2 4 3 1 1 2 3 4 4 4 2 4 3 4 2 4 4 4 3 4 2 3 3
#> [593] 3 1 3 3 1 1 2 3 1 4 4 3 4 3 2 1 2 2 2 2 1 4 3 2 2 2 2 2 2 4 2 3 3 4 1 3 4
#> [630] 2 3 3 2 3 1 1 4 4 4 2 2 3 1 3 1 1 2 3 1 4 4 3 3 3 4 1 4 4 4 3 4 1 4 3 1 1
#> [667] 3 3 2 2 3 1 1 1 2 3 1 4 4 2 1 4 2 2 2 2 1 2 1 1 4 2 1 1 2 3 4 4 4 2 4 3 4
#> [704] 1 2 2 2 2 1 4 2 2 2 2 4 2 2 2 2 2 1 4 3 2 2 2 4 2 1 4 2 2 2 2 4 2 1 3 4 3
#> [741] 1 4 3 2 2 2 2 2 1 1
If we compare these predictions to the actual choices, we see that the prediction is correct about 50% of the time (374 of 750; the correct predictions are the values on the diagonal):
table(predictions, Electricity$choice[1:750])
#>
#> predictions 1 2 3 4
#> 1 78 35 28 32
#> 2 40 129 40 33
#> 3 16 27 57 24
#> 4 27 36 38 110
Created on 2022-08-06 by the reprex package (v2.0.1)
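The "about 50%" figure is the share of the diagonal in this confusion table. A sketch that rebuilds the table from the printed counts (rather than rerunning the model) and computes that share:

```r
# Counts copied from the table above: rows = predicted choice, columns = actual choice
tab <- matrix(c(78,  35, 28,  32,
                40, 129, 40,  33,
                16,  27, 57,  24,
                27,  36, 38, 110),
              nrow = 4, byrow = TRUE,
              dimnames = list(predictions = 1:4, actual = 1:4))

correct  <- sum(diag(tab))  # correctly predicted choices (the diagonal)
total    <- sum(tab)        # all 750 choice situations
accuracy <- correct / total
accuracy
```

With the fitted model at hand, the same number is sum(diag(t)) / sum(t) for t <- table(predictions, Electricity$choice[1:750]).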
I have a feeling that this object Wnq does not hold individual class membership probabilities, though.
Even in your example above, Elec.lc$Wnq seems to give class membership probabilities for your individuals, but critically they are all equal across individuals.
When looking into this I ran into the same problem. I think Elec.lc$Wnq is just the mean of the class membership probabilities.
I have not looked thoroughly at the gmnl code, but I think the object Qir is what you should look for.
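Whichever object turns out to hold the individual posterior class probabilities (Wnq or Qir; which one is correct is not verified here), the most likely class for each row is the column with the largest probability. A minimal sketch on a made-up probability matrix, assuming one row per individual and one column per class, as described for Wnq above:

```r
# Hypothetical class probabilities: one row per individual, one column per class (Q = 2)
probs <- matrix(c(0.90, 0.10,
                  0.35, 0.65,
                  0.50, 0.50,
                  0.20, 0.80),
                ncol = 2, byrow = TRUE)
stopifnot(all(abs(rowSums(probs) - 1) < 1e-8))  # each row should sum to 1

# Most likely class per row; ties.method = "first" sends a 50/50 row to class 1
most_likely <- max.col(probs, ties.method = "first")
most_likely
```

In practice probs would be replaced by the relevant matrix from the fitted gmnl object.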

Group by each increasing sequence in data frame

If I have a data frame with a column of values that increase monotonically and then restart, such as:
x
1
2
3
4
1
2
3
1
2
3
4
5
6
1
2
How do I add a column to group each increasing sequence that results in:
x y
1 1
2 1
3 1
4 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
5 3
6 3
1 4
2 4
I can only think of using a loop, which would be slow.
You can use the cumsum function for this, provided every sequence starts at 1:
> x <- c(1,2,3,4,1,2,3,1,2,4,5,1,2)
> cumsum(x==1)
[1] 1 1 1 1 2 2 2 3 3 3 3 4 4
I would use diff and compute the cumulative sum:
df$y <- c(1, cumsum(diff(df$x) < 0 ) + 1)
> df
x y
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 1 3
9 2 3
10 3 3
11 4 3
12 5 3
13 6 3
14 1 4
15 2 4
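The two answers agree on the question's data but rely on different assumptions: cumsum(x == 1) starts a new group at every 1, while the diff() version starts one wherever the value drops. A short comparison, using a made-up second vector whose runs do not start at 1:

```r
# Question data: increasing runs that restart
x <- c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2)
y_ones <- cumsum(x == 1)                    # new group at every 1
y_drop <- c(1, cumsum(diff(x) < 0) + 1)     # new group wherever the value drops

# Hypothetical data whose runs start at other values
x2 <- c(2, 3, 5, 1, 4, 2)
y2_ones <- cumsum(x2 == 1)                  # 0 0 0 1 1 1: the first and last runs are missed
y2_drop <- c(1, cumsum(diff(x2) < 0) + 1)   # 1 1 1 2 2 3
```

Neither version can separate two runs when the next one starts above where the previous one ended (for example 1 2 followed by 3 4); that needs extra information such as an explicit run start.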

How to calculate recency in R

I have the following data:
set.seed(20)
round<-rep(1:10,2)
part<-rep(1:2, c(10,10))
game<-rep(rep(1:2,c(5,5)),2)
pay1<-sample(1:10,20,replace=TRUE)
pay2<-sample(1:10,20,replace=TRUE)
pay3<-sample(1:10,20,replace=TRUE)
decs<-sample(1:3,20,replace=TRUE)
previous_max<-c(0,1,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,0)
gamematrix<-cbind(part,game,round,pay1,pay2,pay3,decs,previous_max )
gamematrix<-data.frame(gamematrix)
Here is the output:
part game round pay1 pay2 pay3 decs previous_max
1 1 1 1 9 5 6 2 0
2 1 1 2 8 1 1 1 1
3 1 1 3 3 5 5 3 0
4 1 1 4 6 1 5 1 0
5 1 1 5 10 3 8 3 0
6 1 2 6 10 1 5 1 0
7 1 2 7 1 10 7 3 0
8 1 2 8 1 10 8 2 1
9 1 2 9 4 1 5 1 0
10 1 2 10 4 7 7 2 0
11 2 1 1 8 4 1 1 0
12 2 1 2 8 5 5 2 0
13 2 1 3 1 9 3 1 1
14 2 1 4 8 2 10 2 1
15 2 1 5 2 6 2 3 1
16 2 2 6 5 5 6 2 0
17 2 2 7 4 5 1 2 0
18 2 2 8 2 10 5 2 1
19 2 2 9 3 7 3 2 1
20 2 2 10 9 3 1 1 0
How can I calculate a new indicator variable, previous_max, that shows whether, in each round of a game, the participant chose the option that had the maximal payoff in the previous round of the same game?
So I want something like the following:
Participant (part) 1:
In the first round of each game, previous_max is 0 (there is no previous round). In round 2, previous_max = 1, because in round 1 the maximal pay was max(pay1,pay2,pay3) = max(9,5,6) = 9 (pay1), and in round 2 the participant's decision (decs) was 1, the option with the maximal value in the previous round.
In round 3, previous_max = 0, because the maximal value in round 2 was 8 (pay1), but the participant chose 3 (pay3).
Here's a solution using dplyr and purrr::map.
I would have preferred to use group_by rather than split, but max.col ignores groups and I don't know of a dplyr equivalent.
The data are split by both participant and game, so the first round of each game always gets 0 and the result matches your previous_max column.
library(purrr)
library(dplyr)
gamematrix %>%
split(interaction(.$part, .$game, lex.order = TRUE)) %>% # one piece per participant and game, in the original row order
map(~ .x %>% mutate(
prev_max = as.integer(
decs ==
c(0,max.col(.[c("pay1","pay2","pay3")], ties.method = "first")[-n()]) # the number of the max column, offset by one round
))) %>%
bind_rows
# part game round pay1 pay2 pay3 decs prev_max
# 1 1 1 1 9 5 6 2 0
# 2 1 1 2 8 1 1 1 1
# 3 1 1 3 3 5 5 3 0
# 4 1 1 4 6 1 5 1 0
# 5 1 1 5 10 3 8 3 0
# 6 1 2 6 10 1 5 1 0
# 7 1 2 7 1 10 7 3 0
# 8 1 2 8 1 10 8 2 1
# 9 1 2 9 4 1 5 1 0
# 10 1 2 10 4 7 7 2 0
# 11 2 1 1 8 4 1 1 0
# 12 2 1 2 8 5 5 2 0
# 13 2 1 3 1 9 3 1 1
# 14 2 1 4 8 2 10 2 1
# 15 2 1 5 2 6 2 3 1
# 16 2 2 6 5 5 6 2 0
# 17 2 2 7 4 5 1 2 0
# 18 2 2 8 2 10 5 2 1
# 19 2 2 9 3 7 3 2 1
# 20 2 2 10 9 3 1 1 0
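A base-R sketch of the same idea, using ave() to shift the previous round's best option within each participant and game. It assumes ties in the payoffs go to the first maximal column (the question doesn't say how ties should count). The data frame is rebuilt from the printed table rather than from set.seed()/sample(), because sample() results changed in R 3.6.0 and may not reproduce the table:

```r
gamematrix <- data.frame(
  part  = rep(1:2, each = 10),
  game  = rep(rep(1:2, each = 5), 2),
  round = rep(1:10, 2),
  pay1  = c(9, 8, 3, 6, 10, 10, 1, 1, 4, 4, 8, 8, 1, 8, 2, 5, 4, 2, 3, 9),
  pay2  = c(5, 1, 5, 1, 3, 1, 10, 10, 1, 7, 4, 5, 9, 2, 6, 5, 5, 10, 7, 3),
  pay3  = c(6, 1, 5, 5, 8, 5, 7, 8, 5, 7, 1, 5, 3, 10, 2, 6, 1, 5, 3, 1),
  decs  = c(2, 1, 3, 1, 3, 1, 3, 2, 1, 2, 1, 2, 1, 2, 3, 2, 2, 2, 2, 1),
  previous_max = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0)
)

# Column number (1, 2 or 3) of the largest payoff in each round;
# assumption: ties go to the first maximal column
best <- max.col(gamematrix[c("pay1", "pay2", "pay3")], ties.method = "first")

# Best option of the previous round within each participant/game;
# NA for the first round of every game
prev_best <- ave(best, gamematrix$part, gamematrix$game,
                 FUN = function(b) c(NA, head(b, -1)))

# 1 if the decision matches the previous round's best option, else 0
gamematrix$prev_max2 <- as.integer(!is.na(prev_best) & gamematrix$decs == prev_best)
```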

Can I have different aggregation rules for different columns in acast?

Brain afunctional today: How do I tell acast to return different aggregations?
# the rows and columns have integer names
Rgames> foo
1 2
1 1 1
2 2 2
3 3 3
4 4 4
1 1 4
2 2 8
3 3 2
4 4 1
Rgames> mfoo<-melt(foo)
Rgames> mfoo
Var1 Var2 value
1 1 1 1
2 2 1 2
3 3 1 3
4 4 1 4
5 1 1 1
6 2 1 2
7 3 1 3
8 4 1 4
9 1 2 1
10 2 2 2
11 3 2 3
12 4 2 4
13 1 2 4
14 2 2 8
15 3 2 2
16 4 2 1
Rgames> acast(mfoo,Var1~Var2,function(x)x[1]-x[2])
1 2
1 0 -3
2 0 -6
3 0 1
4 0 3
# what I would like is the casting formula to return
1 2
1 1 -3
2 2 -6
3 3 1
4 4 3
This is a simplified example. In the general case there will be rows with unique names, but never more than two rows with a given name, so my x[1]-x[2] won't ever fail.
Or should I just use this:
aggregate(foo[,2],by=list((foo[,1])),function(x)x[1]-x[2])
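As far as I know, acast() takes a single fun.aggregate that is applied to every cell, so it can't use a different rule per column. A base-R sketch of the aggregate()-style route, handling each column separately with tapply() and grouping by row name (assuming, as in the question, that each row name occurs at most twice):

```r
# The example matrix from the question (row names repeat)
foo <- matrix(c(1, 2, 3, 4, 1, 2, 3, 4,
                1, 2, 3, 4, 4, 8, 2, 1),
              ncol = 2,
              dimnames = list(c(1:4, 1:4), c("1", "2")))
grp <- rownames(foo)

# Column "1": first value per row name; column "2": x[1] - x[2] per row name.
# tapply() sorts the groups as character strings, so "10" would come before "2".
result <- cbind(
  `1` = tapply(foo[, "1"], grp, function(x) x[1]),
  `2` = tapply(foo[, "2"], grp, function(x) x[1] - x[2])
)
result
```

For a row name that occurs only once, x[2] is NA, so the difference in column "2" is NA rather than an error.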

Episode count for each row

I'm sure this has been asked before but for the life of me I can't figure out what to search for!
I have the following data:
x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4
And I would like to output a running count that resets every time either x or y changes value.
x y o
1 3 1
1 3 2
1 3 3
1 2 1
1 2 2
2 2 1
2 4 1
3 4 1
3 4 2
Try something like
df<-read.table(header=T,text="x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4")
cbind(df,o=sequence(rle(paste(df$x,df$y))$lengths))
> cbind(df,o=sequence(rle(paste(df$x,df$y))$lengths))
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2
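After seeing @ttmaccer's answer, I see my first attempt with ave was wrong, and this is perhaps what is needed (it numbers rows within each x/y combination, so it matches a run-based count only when each combination occurs in a single consecutive block, as it does here):
> df$o <- ave(df$y, list(df$y, df$x), FUN=seq_along)
# seq_along avoids the warning that seq gives for single-row groups; the result is the same.
> df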
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2
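The rle() answer counts within consecutive runs, while the ave() answer counts every occurrence of an (x, y) combination, so they differ once a combination comes back after a different one. A base-R sketch that builds the run index directly (no paste()) and shows where the two approaches diverge, using a small made-up example:

```r
df <- data.frame(x = c(1, 1, 1, 1, 1, 2, 2, 3, 3),
                 y = c(3, 3, 3, 2, 2, 2, 4, 4, 4))

# A new run starts at the first row and wherever x or y differs from the row above
run <- cumsum(c(TRUE, diff(df$x) != 0 | diff(df$y) != 0))

# Running count within each run
df$o <- ave(seq_along(run), run, FUN = seq_along)

# Hypothetical data where the combination (1, 3) returns after a (2, 3) row
d2 <- data.frame(x = c(1, 1, 2, 1), y = c(3, 3, 3, 3))
run2     <- cumsum(c(TRUE, diff(d2$x) != 0 | diff(d2$y) != 0))
by_run   <- ave(seq_along(run2), run2, FUN = seq_along)       # restarts: 1 2 1 1
by_combo <- ave(seq_along(run2), d2$x, d2$y, FUN = seq_along) # keeps counting: 1 2 1 3
```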
