Sorting out the data with specific headers in R

A small sample of the data are as follows:
df<-read.table (text=" ID Class1a Time1a MD1a MD2a Class1b Time1b MD1b MD2b Class2a Time2a MD3a MD4a Class2b Time2b MD3b MD4b Class3a Time3a MD5a MD6a Class3b Time3b MD5b MD6b
1 1 1 1 2 2 1 1 2 9 2 2 2 10 2 1 1 17 3 2 2 18 3 1 1
2 3 1 1 1 4 1 2 1 11 2 2 1 12 2 1 1 19 3 2 1 20 3 1 1
3 5 1 2 1 6 1 2 2 13 2 1 1 14 2 2 2 21 3 1 1 22 3 2 2
4 7 1 1 1 8 1 2 2 15 2 1 1 16 2 1 1 23 3 1 1 24 3 1 1
", header=TRUE)
I want to get the following output, in particular with these headers:
ID Class Time MD MD1 MD2
1 1 1 1-2 1 2
2 3 1 1-2 1 1
3 5 1 1-2 2 1
4 7 1 1-2 1 1
1 2 1 1-2 1 2
2 4 1 1-2 2 2
3 6 1 1-2 2 2
4 8 1 1-2 2 2
1 9 2 3-4 2 2
2 11 2 3-4 2 1
3 13 2 3-4 1 1
4 15 2 3-4 1 1
1 10 2 3-4 2 1
2 12 2 3-4 2 1
3 14 2 3-4 2 2
4 16 2 3-4 2 1
1 17 3 5-6 2 2
2 19 3 5-6 2 2
3 21 3 5-6 1 2
4 23 3 5-6 1 2
1 18 3 5-6 1 1
2 20 3 5-6 1 1
3 22 3 5-6 2 2
4 24 3 5-6 1 1
df1<- df %>% pivot_longer(
cols = starts_with("Time"),
names_to = "Q",
values_to = "Score",
values_drop_na = TRUE)
df2<- df1 %>% pivot_longer(
cols = starts_with("Class"),
names_prefix = "MD",
values_drop_na = TRUE
) %>% dplyr::select(-value)
But I have failed to get the output of interest.

This answer started as a pivot_longer example using names_pattern, but while renaming some of them made sense, it becomes less intuitive how to easily extract the MD column (e.g., 1-2, 3-4) during the pivoting process.
Instead, let's split the frame by column-group, rename the columns as you'd like, then bind_rows them.
library(dplyr) # for bind_rows()
bind_rows(
lapply(split.default(df[,-1], cumsum(grepl("Class", names(df)[-1]))),
function(Z) {
out <- transform(Z,
ID = df$ID,
MD = paste(gsub("\\D", "", grep("^MD", names(Z), value = TRUE)), collapse = "-"))
names(out)[1:4] <- c("Class", "Time", "MD1", "MD2")
out
})
)
# Class Time MD1 MD2 ID MD
# 1 1 1 1 2 1 1-2
# 2 3 1 1 1 2 1-2
# 3 5 1 2 1 3 1-2
# 4 7 1 1 1 4 1-2
# 5 2 1 1 2 1 1-2
# 6 4 1 2 1 2 1-2
# 7 6 1 2 2 3 1-2
# 8 8 1 2 2 4 1-2
# 9 9 2 2 2 1 3-4
# 10 11 2 2 1 2 3-4
# 11 13 2 1 1 3 3-4
# 12 15 2 1 1 4 3-4
# 13 10 2 1 1 1 3-4
# 14 12 2 1 1 2 3-4
# 15 14 2 2 2 3 3-4
# 16 16 2 1 1 4 3-4
# 17 17 3 2 2 1 5-6
# 18 19 3 2 1 2 5-6
# 19 21 3 1 1 3 5-6
# 20 23 3 1 1 4 5-6
# 21 18 3 1 1 1 5-6
# 22 20 3 1 1 2 5-6
# 23 22 3 2 2 3 5-6
# 24 24 3 1 1 4 5-6
This relies on:
ID being the first column (ergo df[,-1] and names(df)[-1]), and
Each group of columns starting with a Class* column.
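To see why the split works, here is a minimal sketch of the grouping step on its own, using only the first eight column names from the question. The names cols, grp and md_label are just for illustration and are not part of the answer above.

```r
# Column names of the first two groups, as in the question (ID already dropped)
cols <- c("Class1a", "Time1a", "MD1a", "MD2a", "Class1b", "Time1b", "MD1b", "MD2b")

# Every "Class" column bumps the counter, so each column group gets its own index
grp <- cumsum(grepl("Class", cols))
grp
# 1 1 1 1 2 2 2 2

# Keep the MD* names of the first group, strip the non-digits, join with "-"
md_label <- paste(gsub("\\D", "", grep("^MD", cols[grp == 1], value = TRUE)), collapse = "-")
md_label
# "1-2"
```

split.default() then uses exactly this index to cut the data frame into one piece per Class* column.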

Related

identify whenever values repeat in r

I have a dataframe like this.
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3))
I want to populate a new variable Sequence which identifies whenever Condition starts again from 1.
So the new dataframe would look like this.
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3),
Sequence = c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3))
Thanks in advance for the help!
base R
data$Sequence2 <- cumsum(c(TRUE, data$Condition[-1] == 1 & data$Condition[-nrow(data)] != 1))
data
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
dplyr
library(dplyr)
data %>%
mutate(
Sequence2 = cumsum(Condition == 1 & lag(Condition != 1, default = TRUE))
)
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
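This took a while, but I finally found this solution. Note that it only starts a new group at a 1 that is followed by another 1, so it relies on every block beginning with exactly two 1s, as in the sample data:
library(dplyr)
data %>%
group_by(Sequence = cumsum(
ifelse(Condition==1, lead(Condition)+1, Condition)
- Condition==1)
)
Condition Sequence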
<dbl> <int>
1 1 1
2 1 1
3 2 1
4 3 1
5 1 2
6 1 2
7 2 2
8 2 2
9 2 2
10 3 2
11 1 3
12 1 3
13 2 3
14 3 3
15 3 3
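As a quick check of the two rules used in this thread, here is a base R sketch on a made-up vector whose blocks start with one, one and three 1s. The vector x and the helper names are assumptions for illustration only.

```r
# Hypothetical input: three blocks, starting with one, one and three 1s
x <- c(1, 2, 3, 1, 2, 1, 1, 1, 2)

# Rule "a 1 that is not preceded by a 1 starts a new block"
prev <- c(NA, x[-length(x)])
lag_based <- cumsum(x == 1 & (is.na(prev) | prev != 1))
lag_based
# 1 1 1 2 2 3 3 3 3

# Rule "a 1 that is followed by a 1 starts a new block"
nxt <- c(x[-1], NA)
lead_based <- cumsum(ifelse(x == 1, nxt + 1, x) - x == 1)
lead_based
# 0 0 0 0 0 1 2 2 2
```

On this input only the first rule recovers the three blocks, which suggests preferring it when blocks may start with any number of 1s.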

Filter ids when a maximum score is not observed in r

I need to filter ids that do not have the maximum score in any of their rows. Here is what my sample dataset looks like:
df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5),
score = c(0,1,2,0,1, 1,0,1,1, 0,1,2,3,3, 3,1,2,0,3, 0,1,0),
max.score = c(2,2,2,2,2, 1,1,1,1, 4,4,4,4,4, 3,3,3,3,3, 2,2,2))
> df
id score max.score
1 1 0 2
2 1 1 2
3 1 2 2
4 1 0 2
5 1 1 2
6 2 1 1
7 2 0 1
8 2 1 1
9 2 1 1
10 3 0 4
11 3 1 4
12 3 2 4
13 3 3 4
14 3 3 4
15 4 3 3
16 4 1 3
17 4 2 3
18 4 0 3
19 4 3 3
20 5 0 2
21 5 1 2
22 5 0 2
In this dataframe, I need to keep ids c(3,5) because no row for these ids reaches max.score. The desired output would be:
> df
id score max.score
1 3 0 4
2 3 1 4
3 3 2 4
4 3 3 4
5 3 3 4
6 5 0 2
7 5 1 2
8 5 0 2
Any ideas?
Thanks
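One possible approach, sketched in base R under the assumption that "having the max score" means at least one row with score == max.score for that id. The names has_max and result are illustrative.

```r
df <- data.frame(id = c(1,1,1,1,1, 2,2,2,2, 3,3,3,3,3, 4,4,4,4,4, 5,5,5),
                 score = c(0,1,2,0,1, 1,0,1,1, 0,1,2,3,3, 3,1,2,0,3, 0,1,0),
                 max.score = c(2,2,2,2,2, 1,1,1,1, 4,4,4,4,4, 3,3,3,3,3, 2,2,2))

# For every row: does any row of the same id reach max.score?
has_max <- ave(df$score == df$max.score, df$id, FUN = any)

# Keep only the ids that never reach it
result <- df[!has_max, ]
rownames(result) <- NULL
unique(result$id)
# 3 5
```

With dplyr loaded, the same idea can be written as df %>% group_by(id) %>% filter(!any(score == max.score)).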

dplyr: comparing values within a variable dependent on another variable

How can I compare values within a variable dependent on another variable with dplyr?
The df is based on choice data (long format) from a survey. It has one variable that indicates a participant's id, another that indicates the choice instance, and one that indicates which alternative was chosen.
In my data I have the feeling that a lot of people tend to get bored of the task and therefore stick to one alternative for every instance. I would therefore like to identify people who always selected the same option from a certain instance onwards till the end.
Here is an example df:
set.seed(0)
df <- tibble(
id = rep(1:5,each=12),
inst = rep(1:12,5),
alt = sample(1:3, size =60, replace=T),
)
That looks like the following:
id inst alt
1 1 1 3
2 1 2 1
3 1 3 2
4 1 4 2
5 1 5 3
6 1 6 1
7 1 7 3
8 1 8 3
9 1 9 2
10 1 10 2
11 1 11 1 <-
12 1 12 1 <-
13 2 1 1
14 2 2 3
...
I would like to create two new variables, count and count_alt. The new variable count should indicate how many times in a row the same value appeared in alt at the end of each id's instances. So for participant 1 (id==1) the count variable should be 2, since alternative 1 was chosen in the last two instances (11 & 12). The count_alt variable would take the value 1 (always the same as alt at inst == 12).
The new df should look like the following:
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
...
I would prefer to solve this with dplyr and not with a loop since I want to incorporate it into further data wrangling steps.
See if that solves it:
library(dplyr)
df %>%
group_by(id) %>%
mutate(
count = cumsum(alt != lag(alt, default = first(alt))), # run id; default has alt's type (a character default errors in recent dplyr)
count = sum(count == max(count)),
count_alt = alt[n()]
)
Output:
id inst alt count count_alt
1 1 1 3 2 1
2 1 2 1 2 1
3 1 3 2 2 1
4 1 4 2 2 1
5 1 5 3 2 1
6 1 6 1 2 1
7 1 7 3 2 1
8 1 8 3 2 1
9 1 9 2 2 1
10 1 10 2 2 1
11 1 11 1 2 1
12 1 12 1 2 1
13 2 1 1 1 2
14 2 2 3 1 2
15 2 3 2 1 2
16 2 4 3 1 2
17 2 5 2 1 2
18 2 6 3 1 2
19 2 7 3 1 2
20 2 8 2 1 2
21 2 9 3 1 2
22 2 10 3 1 2
23 2 11 1 1 2
24 2 12 2 1 2
25 3 1 1 1 3
26 3 2 1 1 3
27 3 3 2 1 3
28 3 4 1 1 3
29 3 5 2 1 3
30 3 6 3 1 3
31 3 7 2 1 3
32 3 8 2 1 3
33 3 9 2 1 3
34 3 10 2 1 3
35 3 11 1 1 3
36 3 12 3 1 3
37 4 1 3 1 1
38 4 2 3 1 1
39 4 3 1 1 1
40 4 4 3 1 1
41 4 5 2 1 1
42 4 6 3 1 1
43 4 7 2 1 1
44 4 8 3 1 1
45 4 9 2 1 1
46 4 10 2 1 1
47 4 11 3 1 1
48 4 12 1 1 1
49 5 1 2 2 2
50 5 2 3 2 2
51 5 3 3 2 2
52 5 4 2 2 2
53 5 5 3 2 2
54 5 6 2 2 2
55 5 7 1 2 2
56 5 8 1 2 2
57 5 9 1 2 2
58 5 10 1 2 2
59 5 11 2 2 2
60 5 12 2 2 2
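The same two quantities can also be read off a run-length encoding. The following is a minimal base R sketch for a single participant, using the alt values of id 1 from the question; applying it per id is left out here.

```r
# alt values of participant 1, in instance order (taken from the question)
alt <- c(3, 1, 2, 2, 3, 1, 3, 3, 2, 2, 1, 1)

# rle() collapses consecutive repeats into (value, length) pairs
runs <- rle(alt)

count <- tail(runs$lengths, 1)     # length of the final run
count_alt <- tail(runs$values, 1)  # alternative chosen in the final run

count
# 2
count_alt
# 1
```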

How to calculate recency in R

I have the following data:
set.seed(20)
round<-rep(1:10,2)
part<-rep(1:2, c(10,10))
game<-rep(rep(1:2,c(5,5)),2)
pay1<-sample(1:10,20,replace=TRUE)
pay2<-sample(1:10,20,replace=TRUE)
pay3<-sample(1:10,20,replace=TRUE)
decs<-sample(1:3,20,replace=TRUE)
previous_max<-c(0,1,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,0)
gamematrix<-cbind(part,game,round,pay1,pay2,pay3,decs,previous_max )
gamematrix<-data.frame(gamematrix)
Here is the output:
part game round pay1 pay2 pay3 decs previous_max
1 1 1 1 9 5 6 2 0
2 1 1 2 8 1 1 1 1
3 1 1 3 3 5 5 3 0
4 1 1 4 6 1 5 1 0
5 1 1 5 10 3 8 3 0
6 1 2 6 10 1 5 1 0
7 1 2 7 1 10 7 3 0
8 1 2 8 1 10 8 2 1
9 1 2 9 4 1 5 1 0
10 1 2 10 4 7 7 2 0
11 2 1 1 8 4 1 1 0
12 2 1 2 8 5 5 2 0
13 2 1 3 1 9 3 1 1
14 2 1 4 8 2 10 2 1
15 2 1 5 2 6 2 3 1
16 2 2 6 5 5 6 2 0
17 2 2 7 4 5 1 2 0
18 2 2 8 2 10 5 2 1
19 2 2 9 3 7 3 2 1
20 2 2 10 9 3 1 1 0
How can I calculate a new indicator variable "previous_max", which indicates whether, in the next round of the same game, the same participant chose the option with the maximal payoff from the previous round?
So I want something like the following:
Participant (part) 1:
In the first round of each game, previous_max is "0" (no previous round). In round 2, previous_max = "1", because in round 1 the maximal pay was max(pay1,pay2,pay3)=max(9,5,6)=9 (which is pay1), and in round 2 the participant's decision (decs) was 1 (the option with the maximal value in the previous round).
In round 3, previous_max=0, because the maximal value in round 2 was 8 (which is "pay1"), but the participant chose "3" (which is pay3).
Here's a solution using dplyr and purrr::map.
I would have preferred to use group_by rather than split, but max.col ignores groups and I don't know of a dplyr equivalent.
The output is slightly different from yours (rows 6 and 16, the first round of game 2), but I think it's because of your mistakes; please explain if not and I'll update my answer.
library(purrr)
library(dplyr)
gamematrix %>%
split(.$part) %>%
map(~ .x %>% mutate(
prev_max = as.integer(
decs ==
c(0,max.col(.[c("pay1","pay2","pay3")], ties.method = "first")[-n()]) # index of the max column, offset by one row; ties go to the first column instead of a random one
))) %>%
bind_rows
#    part game round pay1 pay2 pay3 decs prev_max
# 1 1 1 1 9 5 6 2 0
# 2 1 1 2 8 1 1 1 1
# 3 1 1 3 3 5 5 3 0
# 4 1 1 4 6 1 5 1 0
# 5 1 1 5 10 3 8 3 0
# 6 1 2 6 10 1 5 1 1
# 7 1 2 7 1 10 7 3 0
# 8 1 2 8 1 10 8 2 1
# 9 1 2 9 4 1 5 1 0
# 10 1 2 10 4 7 7 2 0
# 11 2 1 1 8 4 1 1 0
# 12 2 1 2 8 5 5 2 0
# 13 2 1 3 1 9 3 1 1
# 14 2 1 4 8 2 10 2 1
# 15 2 1 5 2 6 2 3 1
# 16 2 2 6 5 5 6 2 1
# 17 2 2 7 4 5 1 2 0
# 18 2 2 8 2 10 5 2 1
# 19 2 2 9 3 7 3 2 1
# 20 2 2 10 9 3 1 1 0
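If the first round of each game should always be 0, as the question describes, one option is to split by both part and game. The following base R sketch rebuilds the data from the printed table (rather than from set.seed, whose results can differ between R versions); the helper name add_prev_max is hypothetical.

```r
# Data as printed in the question
gamematrix <- data.frame(
  part  = rep(1:2, each = 10),
  game  = rep(rep(1:2, each = 5), 2),
  round = rep(1:10, 2),
  pay1  = c(9, 8, 3, 6, 10, 10, 1, 1, 4, 4, 8, 8, 1, 8, 2, 5, 4, 2, 3, 9),
  pay2  = c(5, 1, 5, 1, 3, 1, 10, 10, 1, 7, 4, 5, 9, 2, 6, 5, 5, 10, 7, 3),
  pay3  = c(6, 1, 5, 5, 8, 5, 7, 8, 5, 7, 1, 5, 3, 10, 2, 6, 1, 5, 3, 1),
  decs  = c(2, 1, 3, 1, 3, 1, 3, 2, 1, 2, 1, 2, 1, 2, 3, 2, 2, 2, 2, 1)
)

# Hypothetical helper: within one part/game block, compare each decision
# with the best-paying column of the previous round (0 for the first round)
add_prev_max <- function(d) {
  best <- max.col(as.matrix(d[c("pay1", "pay2", "pay3")]), ties.method = "first")
  d$prev_max <- as.integer(d$decs == c(0, head(best, -1)))
  d
}

blocks <- split(gamematrix, list(gamematrix$part, gamematrix$game))
result <- do.call(rbind, lapply(blocks, add_prev_max))
result <- result[order(result$part, result$round), ]  # restore the original row order
rownames(result) <- NULL
result$prev_max
# 0 1 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 1 0
```

With this grouping the result agrees with the previous_max column given in the question, including rows 6 and 16.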

expand.grid with unknown set of variables

So, expand.grid returns a df of all the combinations of the vectors passed.
df <- expand.grid(1:3, 1:3)
df <- expand.grid(1:3, 1:3, 1:3)
What I would like is a generalized function that takes 1 parameter (number of vectors) and returns the appropriate data frame.
combinations <- function(n) {
return(expand.grid(0, 1, ... n))
}
Such that
combinations(2) returns(expand.grid(1:3, 1:3))
combinations(3) returns(expand.grid(1:3, 1:3, 1:3))
combinations(4) returns(expand.grid(1:3, 1:3, 1:3, 1:3))
etc.
combinations <- function(n)
expand.grid(rep(list(1:3),n))
> combinations(2)
Var1 Var2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
> combinations(3)
Var1 Var2 Var3
1 1 1 1
2 2 1 1
3 3 1 1
4 1 2 1
5 2 2 1
6 3 2 1
7 1 3 1
8 2 3 1
9 3 3 1
10 1 1 2
11 2 1 2
12 3 1 2
13 1 2 2
14 2 2 2
15 3 2 2
16 1 3 2
17 2 3 2
18 3 3 2
19 1 1 3
20 2 1 3
21 3 1 3
22 1 2 3
23 2 2 3
24 3 2 3
25 1 3 3
26 2 3 3
27 3 3 3
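This works because expand.grid accepts a single list of vectors in place of separate arguments. As a small, hedged extension, the vector itself can be made a parameter; the argument name values is an assumption for illustration and is not part of the answer above.

```r
# n copies of the same vector, passed to expand.grid as one list
combinations <- function(n, values = 1:3) {
  expand.grid(rep(list(values), n))
}

res <- combinations(4)
dim(res)
# 81 4   (3^4 rows, one column per vector)

names(combinations(2, values = c("a", "b")))
# "Var1" "Var2"
```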
