I have data in which subjects completed multiple ratings per day over 6-7 days. The number of ratings per day varies. The data set includes subject ID, date, and the ratings. I would like to create a new variable that recodes the dates for each subject into "study day" --- so 1 for first day of ratings, 2 for second day of ratings, etc.
For example, I would like to take this:
id Date Rating
1 10/20/2018 2
1 10/20/2018 3
1 10/20/2018 5
1 10/21/2018 1
1 10/21/2018 7
1 10/21/2018 9
1 10/22/2018 4
1 10/22/2018 5
1 10/22/2018 9
2 11/15/2018 1
2 11/15/2018 3
2 11/15/2018 4
2 11/16/2018 3
2 11/16/2018 1
2 11/17/2018 0
2 11/17/2018 2
2 11/17/2018 9
And end up with this:
id Day Date Rating
1 1 10/20/2018 2
1 1 10/20/2018 3
1 1 10/20/2018 5
1 2 10/21/2018 1
1 2 10/21/2018 7
1 2 10/21/2018 9
1 3 10/22/2018 4
1 3 10/22/2018 5
1 3 10/22/2018 9
2 1 11/15/2018 1
2 1 11/15/2018 3
2 1 11/15/2018 4
2 2 11/16/2018 3
2 2 11/16/2018 1
2 3 11/17/2018 0
2 3 11/17/2018 2
2 3 11/17/2018 9
I was going to look into setting up some kind of loop, but I thought it would be worth asking if there is a more efficient way to pull this off. Are there any functions that would allow me to automate this sort of thing? Thanks very much for any suggestions.
Since you want to reset the count after every id , makes this question a bit different.
Using only base R, we can split the Date based on id and then create a count of each distinct group.
df$Day <- unlist(sapply(split(df$Date, df$id), function(x) match(x,unique(x))))
df
# id Date Rating Day
#1 1 10/20/2018 2 1
#2 1 10/20/2018 3 1
#3 1 10/20/2018 5 1
#4 1 10/21/2018 1 2
#5 1 10/21/2018 7 2
#6 1 10/21/2018 9 2
#7 1 10/22/2018 4 3
#8 1 10/22/2018 5 3
#9 1 10/22/2018 9 3
#10 2 11/15/2018 1 1
#11 2 11/15/2018 3 1
#12 2 11/15/2018 4 1
#13 2 11/16/2018 3 2
#14 2 11/16/2018 1 2
#15 2 11/17/2018 0 3
#16 2 11/17/2018 2 3
#17 2 11/17/2018 9 3
I don't know how I missed this but thanks to #thelatemail who reminded that this is basically the same as
library(dplyr)
df %>%
group_by(id) %>%
mutate(Day = match(Date, unique(Date)))
AND
df$Day <- as.numeric(with(df, ave(Date, id, FUN = function(x) match(x, unique(x)))))
If you want a slightly hacky dplyr version....you can use the date column and convert it to a numeric date then manipulate that number to give the desired result
library(tidyverse)
library(lubridate)
df <- data_frame(id=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2),
Date= c('10/20/2018', '10/20/2018', '10/20/2018', '10/21/2018', '10/21/2018', '10/21/2018',
'10/22/2018', '10/22/2018', '10/22/2018','11/15/2018', '11/15/2018', '11/15/2018',
'11/16/2018', '11/16/2018', '11/17/2018', '11/17/2018', '11/17/2018'),
Rating=c(2,3,5,1,7,9,4,5,9,1,3,4,3,1,0,2,9))
df %>%
group_by(id) %>%
mutate(
Date = mdy(Date),
Day = as.numeric(Date),
Day = Day-min(Day)+1)
# A tibble: 17 x 4
# Groups: id [2]
id Date Rating Day
<dbl> <date> <dbl> <dbl>
1 1 2018-10-20 2 1
2 1 2018-10-20 3 1
3 1 2018-10-20 5 1
4 1 2018-10-21 1 2
5 1 2018-10-21 7 2
6 1 2018-10-21 9 2
7 1 2018-10-22 4 3
8 1 2018-10-22 5 3
9 1 2018-10-22 9 3
10 2 2018-11-15 1 1
11 2 2018-11-15 3 1
12 2 2018-11-15 4 1
13 2 2018-11-16 3 2
14 2 2018-11-16 1 2
15 2 2018-11-17 0 3
16 2 2018-11-17 2 3
17 2 2018-11-17 9 3
Related
In my toy data, for each unique study, the numeric variables (sample and group) must have an order starting from 1. But:
For example, in study 1, we see that there are two unique sample values (1 & 3), so 3 must be replaced with 2.
For example, in study 2, we see that there is one unique group value (2), so it must be replaced with 1.
In study 3, both sample and group seem ok meaning their unique values are 1 and 2 (no replacing needed).
For this toy data, my desired output is shown below. But I appreciate a functional solution that can automatically replace any number of numeric variables in a data.frame that have lost their order just like I showed in my toy data.
m="
study sample group outcome
1 1 1 A
1 1 1 B
1 1 2 A
1 1 2 B
1 3 1 A
1 3 1 B
1 3 2 A
1 3 2 B
2 1 2 A
2 1 2 B
2 2 2 A
2 2 2 B
2 3 2 A
2 3 2 B
3 1 1 A
3 1 1 B
3 1 2 A
3 1 2 B
3 2 1 A
3 2 1 B
3 2 2 A
3 2 2 B"
data <- read.table(text=m, h=T)
Desired_output="
study sample group outcome
1 1 1 A
1 1 1 B
1 1 2 A
1 1 2 B
1 2 1 A
1 2 1 B
1 2 2 A
1 2 2 B
2 1 1 A
2 1 1 B
2 2 1 A
2 2 1 B
2 3 1 A
2 3 1 B
3 1 1 A
3 1 1 B
3 1 2 A
3 1 2 B
3 2 1 A
3 2 1 B
3 2 2 A
3 2 2 B"
You can do:
library(dplyr)
data %>%
group_by(study) %>%
mutate(across(tidyselect::vars_select_helpers$where(is.numeric),
function(x) as.numeric(as.factor(x)))) %>%
as.data.frame()
The resultant data frame looks like this:
study sample group outcome
1 1 1 1 A
2 1 1 1 B
3 1 1 2 A
4 1 1 2 B
5 1 2 1 A
6 1 2 1 B
7 1 2 2 A
8 1 2 2 B
9 2 1 1 A
10 2 1 1 B
11 2 2 1 A
12 2 2 1 B
13 2 3 1 A
14 2 3 1 B
15 3 1 1 A
16 3 1 1 B
17 3 1 2 A
18 3 1 2 B
19 3 2 1 A
20 3 2 1 B
21 3 2 2 A
22 3 2 2 B
Here is an alternative (not as elegant as #Allan Cameron +1 ) dplyr solution:
library(dplyr)
df %>%
group_by(study) %>%
mutate(x = n()/length(unique(sample)),
sample = rep(row_number(), each=x, length.out = n()),
y = length(unique(group)),
group = ifelse(y==1, 1, group)) %>%
select(-x, -y)
study sample group outcome
<int> <int> <dbl> <chr>
1 1 1 1 A
2 1 1 1 B
3 1 1 2 A
4 1 1 2 B
5 1 2 1 A
6 1 2 1 B
7 1 2 2 A
8 1 2 2 B
9 2 1 1 A
10 2 1 1 B
11 2 2 1 A
12 2 2 1 B
13 2 3 1 A
14 2 3 1 B
15 3 1 1 A
16 3 1 1 B
17 3 1 2 A
18 3 1 2 B
19 3 2 1 A
20 3 2 1 B
21 3 2 2 A
22 3 2 2 B
i need some help:
i got this df:
df <- data.frame(month = c(1,1,1,1,1,2,2,2,2,2),
day = c(1,2,3,4,5,1,2,3,4,5),
flow = c(2,5,7,8,5,4,6,7,9,2))
month day flow
1 1 1 2
2 1 2 5
3 1 3 7
4 1 4 8
5 1 5 5
6 2 1 4
7 2 2 6
8 2 3 7
9 2 4 9
10 2 5 2
but i want to know the day of min per month:
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
this repetition is not a problem, i will use pivot fuction
tks people!
We can use which.min to return the index of 'min'imum 'flow' per group and use that to get the corresponding 'day' to create the column with mutate
library(dplyr)
df <- df %>%
group_by(month) %>%
mutate(dayminflowofthemonth = day[which.min(flow)]) %>%
ungroup
-output
df
# A tibble: 10 x 4
# month day flow dayminflowofthemonth
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 2 1
# 2 1 2 5 1
# 3 1 3 7 1
# 4 1 4 8 1
# 5 1 5 5 1
# 6 2 1 4 5
# 7 2 2 6 5
# 8 2 3 7 5
# 9 2 4 9 5
#10 2 5 2 5
Another option using indexing inside dplyr pipeline:
library(dplyr)
#Code
newdf <- df %>% group_by(month) %>% mutate(Val=day[flow==min(flow)][1])
Output:
# A tibble: 10 x 4
# Groups: month [2]
month day flow Val
<dbl> <dbl> <dbl> <dbl>
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
Here is a base R option using ave
transform(
df,
dayminflowofthemonth = ave(day*(ave(flow,month,FUN = min)==flow),month,FUN = max)
)
which gives
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
One more base R approach:
df$dayminflowofthemonth <- by(
df,
df$month,
function(x) x$day[which.min(x$flow)]
)[df$month]
I have a question about a dataset with financial transactions:
Account_from Account_to Value
1 1 2 25.0
2 1 3 30.0
3 2 1 28.0
4 2 3 10.0
5 2 3 12.0
6 3 1 40.0
7 3 1 30.0
8 3 1 20.0
Each row represents a transaction. I would like to create an extra column with a variable containing the information of the number of interactions with each unique account.
That it would look like the following:
Account_from Account_to Value Count_interactions_out Count_interactions_in
1 1 2 25.0 2 2
2 1 3 30.0 2 2
3 2 1 28.0 2 1
4 2 3 10.0 2 1
5 2 3 12.0 2 1
6 3 1 40.0 1 2
7 3 1 30.0 1 2
8 3 1 20.0 1 2
Account 3 only interacts with account 1, therefore Count_interactions_out is 1. However, it receives interactions from account 1 and 2, therefore the count_interactions_in is 2.
How can I apply this to the whole dataset?
Thanks
Here's an approach using dplyr
library(dplyr)
financial.data %>%
group_by(Account_from) %>%
mutate(Count_interactions_out = nlevels(factor(Account_to))) %>%
ungroup() %>%
group_by(Account_to) %>%
mutate(Count_interactions_in = nlevels(factor(Account_from))) %>%
ungroup()
Here is a solution with base R, where ave() is used
df <- cbind(df,
with(df, list(
Count_interactions_out = ave(Account_to,Account_from,FUN = function(x) length(unique(x))),
Count_interactions_in = ave(Account_from,Account_to,FUN = function(x) length(unique(x)))[match(Account_from,Account_to,)])))
such that
> df
Account_from Account_to Value Count_interactions_out Count_interactions_in
1 1 2 25 2 2
2 1 3 30 2 2
3 2 1 28 2 1
4 2 3 10 2 1
5 2 3 12 2 1
6 3 1 40 1 2
7 3 1 30 1 2
8 3 1 20 1 2
or
df <- within(df, list(
Count_interactions_out <- ave(Account_to,Account_from,FUN = function(x) length(unique(x))),
Count_interactions_in <- ave(Account_from,Account_to,FUN = function(x) length(unique(x)))[match(Account_from,Account_to,)]))
such that
> df
Account_from Account_to Value Count_interactions_in Count_interactions_out
1 1 2 25 2 2
2 1 3 30 2 2
3 2 1 28 1 2
4 2 3 10 1 2
5 2 3 12 1 2
6 3 1 40 2 1
7 3 1 30 2 1
8 3 1 20 2 1
I have a long data frame like this one:
set.seed(17)
players<-rep(1:2, c(5,5))
decs<-sample(1:3,10,replace=TRUE)
world<-sample(1:2,10,replace=TRUE)
gamematrix<-cbind(players,decs,world)
gamematrix<-data.frame(gamematrix)
gamematrix
players decs world
1 1 1 1
2 1 3 1
3 1 2 2
4 1 3 2
5 1 2 2
6 2 2 2
7 2 1 2
8 2 1 1
9 2 3 2
10 2 1 2
I want to create for each player a new variable, that is based on the first appearance of the decs==3 variable, and the state of the world.
That is, if when the first appearance of "decs", the state of the world was "1", then the new variable should get the value of "6", otherwise, "7", as follows:
players decs world player_type
1 1 1 1 6
2 1 3 1 6
3 1 2 2 6
4 1 3 2 6
5 1 2 2 6
6 2 2 2 7
7 2 1 2 7
8 2 1 1 7
9 2 3 2 7
10 2 1 2 7
Any ideas how to do it?
This tidyverse approach might be a little cumbersome but it should give you what you want.
library(tidyverse)
left_join(
gamematrix,
gamematrix %>%
filter(decs == 3) %>%
group_by(players) %>%
slice(1) %>%
mutate(player_type = ifelse(world == 1, 6, 7)) %>%
select(players, player_type),
by = 'players'
)
# players decs world player_type
#1 1 1 1 6
#2 1 3 1 6
#3 1 2 2 6
#4 1 3 2 6
#5 1 2 2 6
#6 2 2 2 7
#7 2 1 2 7
#8 2 1 1 7
#9 2 3 2 7
#10 2 1 2 7
The idea is to filter you data for observations where decs == 3, extract the first element per 'players', add player_type subject to the state of the 'world' and finally merge with your original data.
An option is to use cumsum(decs==3) == 1 to find first occurrence of decs == 3 for a player. Now, dplyr::case_when can be used to assign player type.
library(dplyr)
gamematrix %>% group_by(players) %>%
mutate(player_type = case_when(
world[first(which(cumsum(decs==3)==1))] == 1 ~ 6L,
world[first(which(cumsum(decs==3)==1))] == 2 ~ 7L,
TRUE ~ NA_integer_))
# # A tibble: 10 x 4
# # Groups: players [2]
# players decs world player_type
# <int> <int> <int> <int>
# 1 1 1 1 6
# 2 1 3 1 6
# 3 1 2 2 6
# 4 1 3 2 6
# 5 1 2 2 6
# 6 2 2 2 7
# 7 2 1 2 7
# 8 2 1 1 7
# 9 2 3 2 7
# 10 2 1 2 7
We could use data.table
library(data.table)
setDT(gamematrix)[, player_type := c(7, 6)[any(decs == 3& world == 1) + 1],
by = players]
gamematrix
# players decs world player_type
# 1: 1 1 1 6
# 2: 1 3 1 6
# 3: 1 2 2 6
# 4: 1 3 2 6
# 5: 1 2 2 6
# 6: 2 2 2 7
# 7: 2 1 2 7
# 8: 2 1 1 7
# 9: 2 3 2 7
#10: 2 1 2 7
I have a data frame like this (with many more rows):
id act_l_n pas_l_n act_q_p pas_q_p act_l_p pas_l_p act_q_n pas_q_n
1 14 8 14 10 21 11 21 11
2 19 9 11 17 22 11 20 11
Every column name contains information about 3 variables separated by '_' (each has 2 levels named act/pas, l/q, n/p). Values are scores corresponding to each combination of variables (i.e. 1 of 8 conditions).
I need to move 3 variables to 3 separate columns, mark their levels by digits, and move corresponding value to separate column called "score". So from 1st row of current data frame I'd get something like this:
id score actpas lq pn
1 14 1 1 1
1 8 2 1 1
1 14 1 2 2
1 10 2 2 2
1 21 1 1 2
1 11 2 1 2
1 21 1 2 1
1 11 2 2 1
I've tried wrangling this with dplyr using gather and separate functions, but I can't really get what I need. Help with dplyr would be most appriciated!
If I understand well:
df<-read.table(textConnection(
"id,act_l_n,pas_l_n,act_q_p,pas_q_p,act_l_p,pas_l_p,act_q_n,pas_q_n
1,14,8,14,10,21,11,21,11
2,19,9,11,17,22,11,20,11"),
header=TRUE,sep=",")
library(tidyr)
library(dplyr)
gather(df,k,score,-id) %>% mutate(v1=1+as.integer(substr(k,1,3)=="pas")
,v2=1+as.integer(substr(k,5,5)=="q")
,v3=1+as.integer(substr(k,7,7)=="p")) %>%
select(-2) %>% arrange(id)
# id score v1 v2 v3
#1 1 14 1 1 1
#2 1 8 2 1 1
#3 1 14 1 2 2
#4 1 10 2 2 2
#5 1 21 1 1 2
#6 1 11 2 1 2
#7 1 21 1 2 1
#8 1 11 2 2 1
#9 2 19 1 1 1
#10 2 9 2 1 1
#11 2 11 1 2 2
#12 2 17 2 2 2
#13 2 22 1 1 2
#14 2 11 2 1 2
#15 2 20 1 2 1
#16 2 11 2 2 1