How to calculate recency in R - r

I have the following data:
set.seed(20)
round<-rep(1:10,2)
part<-rep(1:2, c(10,10))
game<-rep(rep(1:2,c(5,5)),2)
pay1<-sample(1:10,20,replace=TRUE)
pay2<-sample(1:10,20,replace=TRUE)
pay3<-sample(1:10,20,replace=TRUE)
decs<-sample(1:3,20,replace=TRUE)
previous_max<-c(0,1,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,0)
gamematrix<-cbind(part,game,round,pay1,pay2,pay3,decs,previous_max )
gamematrix<-data.frame(gamematrix)
Here is the output:
part game round pay1 pay2 pay3 decs previous_max
1 1 1 1 9 5 6 2 0
2 1 1 2 8 1 1 1 1
3 1 1 3 3 5 5 3 0
4 1 1 4 6 1 5 1 0
5 1 1 5 10 3 8 3 0
6 1 2 6 10 1 5 1 0
7 1 2 7 1 10 7 3 0
8 1 2 8 1 10 8 2 1
9 1 2 9 4 1 5 1 0
10 1 2 10 4 7 7 2 0
11 2 1 1 8 4 1 1 0
12 2 1 2 8 5 5 2 0
13 2 1 3 1 9 3 1 1
14 2 1 4 8 2 10 2 1
15 2 1 5 2 6 2 3 1
16 2 2 6 5 5 6 2 0
17 2 2 7 4 5 1 2 0
18 2 2 8 2 10 5 2 1
19 2 2 9 3 7 3 2 1
20 2 2 10 9 3 1 1 0
How can I calculate a new indicator variable "previous_max",which returns whether in the next round of the same game, the same participant choose the maximal payoff from the previous round.
So I want something like follows:
Participant (part) 1:
In the first round of each game, previous_max is "0" (no previous round), in round 2, previous_max ="1", because in round 1, the maximal pay was max(pay1,pay2,pay3)=max(9,5,6)=9, and in round 2, the participant's decisions (decs) was 1 (which was the maximal value in previous round).
In round 3, previous_max=0, because the maximal value in round 2 was 8 (which is "pay1"), but the participant choose "3" (which is pay3).

Here's a solution using dplyr and purr::map.
I would have preferred to use group_by than split but max.col ignores groups and I don't know of a dplyr equivalent`.
the output is slightly different but I think it's because of your mistakes, please explain if not and I'll update my answer.
library(purrr)
library(dplyr)
gamematrix %>%
split(.$part) %>%
map(~ .x %>% mutate(
prev_max = as.integer(
decs ==
c(0,max.col(.[c("pay1","pay2","pay3")])[-n()]) # the number of the max columns, offset by one
))) %>%
bind_rows
# ` part game round pay1 pay2 pay3 decs prev_max
# 1 1 1 1 9 5 6 2 0
# 2 1 1 2 8 1 1 1 1
# 3 1 1 3 3 5 5 3 0
# 4 1 1 4 6 1 5 1 0
# 5 1 1 5 10 3 8 3 0
# 6 1 2 6 10 1 5 1 1
# 7 1 2 7 1 10 7 3 0
# 8 1 2 8 1 10 8 2 1
# 9 1 2 9 4 1 5 1 0
# 10 1 2 10 4 7 7 2 0
# 11 2 1 1 8 4 1 1 0
# 12 2 1 2 8 5 5 2 0
# 13 2 1 3 1 9 3 1 1
# 14 2 1 4 8 2 10 2 1
# 15 2 1 5 2 6 2 3 1
# 16 2 2 6 5 5 6 2 1
# 17 2 2 7 4 5 1 2 0
# 18 2 2 8 2 10 5 2 1
# 19 2 2 9 3 7 3 2 1
# 20 2 2 10 9 3 1 1 0

Related

Filter ids when a maximum score is not observed in r

I need to filter ids that do not have maximum score points in them. Here is my sample dataset looks like
df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5),
score = c(0,1,2,0,1, 1,0,1,1, 0,1,2,3,3, 3,1,2,0,3, 0,1,0),
max.score = c(2,2,2,2,2, 1,1,1,1, 4,4,4,4,4, 3,3,3,3,3, 2,2,2))
> df
id score max.score
1 1 0 2
2 1 1 2
3 1 2 2
4 1 0 2
5 1 1 2
6 2 1 1
7 2 0 1
8 2 1 1
9 2 1 1
10 3 0 4
11 3 1 4
12 3 2 4
13 3 3 4
14 3 3 4
15 4 3 3
16 4 1 3
17 4 2 3
18 4 0 3
19 4 3 3
20 5 0 2
21 5 1 2
22 5 0 2
In this dataframe, I need to filter ids c(3,5) because these ids do not have the max.score in them. The desired output would be:
> df
id score max.score
1 3 0 4
2 3 1 4
3 3 2 4
4 3 3 4
5 3 3 4
6 5 0 2
7 5 1 2
8 5 0 2
Any ideas?
Thanks

dplyr: comparing values within a variable dependent on another variable

How can I compare values within a variable dependent on another variable with dplyr?
The df is based on choice data (long format) from a survey. It has one variable that indicates a participants id, another that indicates the choice instance and one that indicates which alternative was chosen.
In my data I have the feeling that a lot of people tend to get bored of the task and therefore stick to one alternative for every instance. I would therefore like to identify people who always selected the same option from a certain instance onwards till the end.
Here is an example df:
set.seed(0)
df <- tibble(
id = rep(1:5,each=12),
inst = rep(1:12,5),
alt = sample(1:3, size =60, replace=T),
)
That looks like the following:
id inst alt
1 1 1 3
2 1 2 1
3 1 3 2
4 1 4 2
5 1 5 3
6 1 6 1
7 1 7 3
8 1 8 3
9 1 9 2
10 1 10 2
11 1 11 1 <-
12 1 12 1 <-
13 2 1 1
14 2 2 3
...
I would like to create two new variables count and count_alt. The new variable count should indicate how often the same value appeared in alt based on id and inst, only counting values from the end of id. So for participant (id==1) the count variable should be 2, since alternative 1 was chosen in the last two instances (11 & 12). The count_alt would take the value 1 (always the same as inst == 12)
The new df schould look like the following
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
...
I would prefer to solve this with dplyr and not with a loop since I want to incooperate it into further data wrangling steps.
See if that solves it:
library(dplyr)
df %>%
group_by(id) %>%
mutate(
count = cumsum(alt != lag(alt, default = "rndm")),
count = sum(count == max(count)),
count_alt = alt[n()]
)
Output:
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
13 2 1 1 1 2
14 2 2 3 1 2
15 2 3 2 1 2
16 2 4 3 1 2
17 2 5 2 1 2
18 2 6 3 1 2
19 2 7 3 1 2
20 2 8 2 1 2
21 2 9 3 1 2
22 2 10 3 1 2
23 2 11 1 1 2
24 2 12 2 1 2
25 3 1 1 1 3
26 3 2 1 1 3
27 3 3 2 1 3
28 3 4 1 1 3
29 3 5 2 1 3
30 3 6 3 1 3
31 3 7 2 1 3
32 3 8 2 1 3
33 3 9 2 1 3
34 3 10 2 1 3
35 3 11 1 1 3
36 3 12 3 1 3
37 4 1 3 1 1
38 4 2 3 1 1
39 4 3 1 1 1
40 4 4 3 1 1
41 4 5 2 1 1
42 4 6 3 1 1
43 4 7 2 1 1
44 4 8 3 1 1
45 4 9 2 1 1
46 4 10 2 1 1
47 4 11 3 1 1
48 4 12 1 1 1
49 5 1 2 2 2
50 5 2 3 2 2
51 5 3 3 2 2
52 5 4 2 2 2
53 5 5 3 2 2
54 5 6 2 2 2
55 5 7 1 2 2
56 5 8 1 2 2
57 5 9 1 2 2
58 5 10 1 2 2
59 5 11 2 2 2
60 5 12 2 2 2

If a value appears in the row, all subsequent rows should take this value (with dplyr)

I'm just starting to learn R and I'm already facing the first bigger problem.
Let's take the following panel dataset as an example:
N=5
T=3
time<-rep(1:T, times=N)
id<- rep(1:N,each=T)
dummy<- c(0,0,1,1,0,0,0,1,0,0,0,1,0,1,0)
df<-as.data.frame(cbind(id, time,dummy))
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 2 1 1
5 2 2 0
6 2 3 0
7 3 1 0
8 3 2 1
9 3 3 0
10 4 1 0
11 4 2 0
12 4 3 1
13 5 1 0
14 5 2 1
15 5 3 0
I now want the dummy variable for all rows of a cross section to take the value 1 after the 1 for this cross section appears for the first time. So, what I want is:
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 2 1 1
5 2 2 1
6 2 3 1
7 3 1 0
8 3 2 1
9 3 3 1
10 4 1 0
11 4 2 0
12 4 3 1
13 5 1 0
14 5 2 1
15 5 3 1
So I guess I need something like:
df_new<-df %>%
group_by(id) %>%
???
I already tried to set all zeros to NA and use the na.locf function, but it didn't really work.
Anybody got an idea?
Thanks!
Use cummax
df %>%
group_by(id) %>%
mutate(dummy = cummax(dummy))
# A tibble: 15 x 3
# Groups: id [5]
# id time dummy
# <dbl> <dbl> <dbl>
# 1 1 1 0
# 2 1 2 0
# 3 1 3 1
# 4 2 1 1
# 5 2 2 1
# 6 2 3 1
# 7 3 1 0
# 8 3 2 1
# 9 3 3 1
#10 4 1 0
#11 4 2 0
#12 4 3 1
#13 5 1 0
#14 5 2 1
#15 5 3 1
Without additional packages you could do
transform(df, dummy = ave(dummy, id, FUN = cummax))

Count non-zero values of column in R [duplicate]

This question already has an answer here:
Add a new column of the sum by group [duplicate]
(1 answer)
Closed 6 years ago.
Suppose i have data frame like this one
DF
Id X Y Z
1 1 5 0
1 2 0 0
1 3 0 5
1 4 9 0
1 5 2 3
1 6 5 0
2 1 5 0
2 2 4 0
2 3 0 6
2 4 9 6
2 5 2 0
2 6 5 2
3 1 5 6
3 2 4 0
3 3 6 5
3 4 9 0
3 5 2 0
3 6 5 0
I want to count the number of non zero entries for variable Z in a particular Id and record that value in a new column Count, so the new data frame will look like
DF1
Id X Y Z Count
1 1 5 0 2
1 2 4 0 2
1 3 6 5 2
1 4 9 0 2
1 5 2 3 2
1 6 5 0 2
2 1 5 0 3
2 2 4 0 3
2 3 6 6 3
2 4 9 6 3
2 5 2 0 3
2 6 5 2 3
3 1 5 6 2
3 2 4 0 2
3 3 6 5 2
3 4 9 0 2
3 5 2 0 2
3 6 5 0 2
We can use base R ave
Counting the number of non-zero values for column Z grouped by Id
df$Count <- ave(df$Z, df$Id, FUN = function(x) sum(x!=0))
df$Count
#[1] 2 2 2 2 2 2 3 3 3 3 3 3 2 2 2 2 2 2
You can try this, it gives you exactly what you want:
library(data.table)
dt <- data.table(df)
dt[, Count := sum(Z != 0), by = Id]
dt
# Id X Y Z Count
# 1: 1 1 5 0 2
# 2: 1 2 0 0 2
# 3: 1 3 0 5 2
# 4: 1 4 9 0 2
# 5: 1 5 2 3 2
# 6: 1 6 5 0 2
# 7: 2 1 5 0 3
# 8: 2 2 4 0 3
# 9: 2 3 0 6 3
# 10: 2 4 9 6 3
# 11: 2 5 2 0 3
# 12: 2 6 5 2 3
# 13: 3 1 5 6 2
# 14: 3 2 4 0 2
# 15: 3 3 6 5 2
# 16: 3 4 9 0 2
# 17: 3 5 2 0 2
# 18: 3 6 5 0 2
This will also work:
df$Count <- rep(aggregate(Z~Id, df[df$Z != 0,], length)$Z, table(df$Id))
Id X Y Z Count
1 1 1 5 0 2
2 1 2 0 0 2
3 1 3 0 5 2
4 1 4 9 0 2
5 1 5 2 3 2
6 1 6 5 0 2
7 2 1 5 0 3
8 2 2 4 0 3
9 2 3 0 6 3
10 2 4 9 6 3
11 2 5 2 0 3
12 2 6 5 2 3
13 3 1 5 6 2
14 3 2 4 0 2
15 3 3 6 5 2
16 3 4 9 0 2
17 3 5 2 0 2
18 3 6 5 0 2

How do I add a vector where I collapse scores from individuals within pairs?

I have done an experiment in which participants have solved a task in pairs, with another participant. Each participant has then received a score for how well they did the task. Pairs have gone through different amounts of trials.
I have a data frame similar to the one below:
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)
participant pair trial score
1 1 1 2
1 1 2 3
2 1 1 6
2 1 2 3
3 2 1 4
3 2 2 7
3 2 3 3
4 2 1 1
4 2 2 8
4 2 3 5
5 3 1 4
6 3 1 3
I would like to add a new vector to the data frame, where each participant gets the numeric difference between their own score and the other participant's score within each trial.
Does someone have an idea about how one might do that?
It should end up looking something like this:
participant pair trial score difference
1 1 1 2 4
1 1 2 3 0
2 1 1 6 4
2 1 2 3 0
3 2 1 4 3
3 2 2 7 1
3 2 3 3 2
4 2 1 1 3
4 2 2 8 1
4 2 3 5 2
5 3 1 4 1
6 3 1 3 1
Here's a solution that involves first reordering data such that each sequential pair of rows corresponds to a single pair within a single trial. This allows us to make a single call to diff() to extract the differences:
data <- data[order(data$trial,data$pair,data$participant),];
data$diff <- rep(diff(data$score)[c(T,F)],each=2L)*c(-1L,1L);
data;
## participant pair trial score diff
## 1 1 1 1 2 -4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 -3
## 11 5 3 1 4 1
## 12 6 3 1 3 -1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 -1
## 9 4 2 2 8 1
## 7 3 2 3 3 -2
## 10 4 2 3 5 2
I assumed you wanted the sign to capture the direction of the difference. So, for instance, if a participant has a score 4 points below the other participant in the same trial-pair, then I assumed you would want -4. If you want all-positive values, you can remove the multiplication by c(-1L,1L) and add a call to abs():
data$diff <- rep(abs(diff(data$score)[c(T,F)]),each=2L);
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 3
## 11 5 3 1 4 1
## 12 6 3 1 3 1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 1
## 9 4 2 2 8 1
## 7 3 2 3 3 2
## 10 4 2 3 5 2
Here's a solution built around ave() that doesn't require reordering the whole data.frame first:
data$diff <- ave(data$score,data$trial,data$pair,FUN=function(x) abs(diff(x)));
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 2 1 1 2 3 0
## 3 2 1 1 6 4
## 4 2 1 2 3 0
## 5 3 2 1 4 3
## 6 3 2 2 7 1
## 7 3 2 3 3 2
## 8 4 2 1 1 3
## 9 4 2 2 8 1
## 10 4 2 3 5 2
## 11 5 3 1 4 1
## 12 6 3 1 3 1
Here's how you can get the score of the other participant in the same trial-pair:
data$other <- ave(data$score,data$trial,data$pair,FUN=rev);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 2 1 1 2 3 3
## 3 2 1 1 6 2
## 4 2 1 2 3 3
## 5 3 2 1 4 1
## 6 3 2 2 7 8
## 7 3 2 3 3 5
## 8 4 2 1 1 4
## 9 4 2 2 8 7
## 10 4 2 3 5 3
## 11 5 3 1 4 3
## 12 6 3 1 3 4
Or, assuming the data.frame has been reordered as per the initial solution:
data$other <- c(rbind(data$score[c(F,T)],data$score[c(T,F)]));
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Alternative, using matrix() instead of rbind():
data$other <- c(matrix(data$score,2L)[2:1,]);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Here is an option using data.table:
library(data.table)
setDT(data)[,difference := abs(diff(score)), by = .(pair, trial)]
data
# participant pair trial score difference
# 1: 1 1 1 2 4
# 2: 1 1 2 3 0
# 3: 2 1 1 6 4
# 4: 2 1 2 3 0
# 5: 3 2 1 4 3
# 6: 3 2 2 7 1
# 7: 3 2 3 3 2
# 8: 4 2 1 1 3
# 9: 4 2 2 8 1
#10: 4 2 3 5 2
#11: 5 3 1 4 1
#12: 6 3 1 3 1
A slightly faster option would be:
setDT(data)[, difference := abs((score - shift(score))[2]) , by = .(pair, trial)]
If we need the value of the other pair:
data[, other:= rev(score) , by = .(pair, trial)]
data
# participant pair trial score difference other
# 1: 1 1 1 2 4 6
# 2: 1 1 2 3 0 3
# 3: 2 1 1 6 4 2
# 4: 2 1 2 3 0 3
# 5: 3 2 1 4 3 1
# 6: 3 2 2 7 1 8
# 7: 3 2 3 3 2 5
# 8: 4 2 1 1 3 4
# 9: 4 2 2 8 1 7
#10: 4 2 3 5 2 3
#11: 5 3 1 4 1 3
#12: 6 3 1 3 1 4
Or using dplyr:
library(dplyr)
data %>%
group_by(pair, trial) %>%
mutate(difference = abs(diff(score)))
# participant pair trial score difference
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 2 4
#2 1 1 2 3 0
#3 2 1 1 6 4
#4 2 1 2 3 0
#5 3 2 1 4 3
#6 3 2 2 7 1
#7 3 2 3 3 2
#8 4 2 1 1 3
#9 4 2 2 8 1
#10 4 2 3 5 2
#11 5 3 1 4 1
#12 6 3 1 3 1

Resources