Find index of first specific result in each group in R

I have a dataset as shown below.
The outcome has no relationship with contact_date: when a subscriber responds to a cold call, we mark it as a successful contact attempt (1), otherwise (0). The count is how many times we have called the subscriber.
subscriber_id outcome contact_date queue multiple_number count
(int) (int) (date) (fctr) (int) (int)
1 1 1 2015-01-29 2 1 1
2 1 0 2015-02-21 2 1 2
3 1 0 2015-03-29 2 1 3
4 1 1 2015-04-30 2 1 4
5 2 0 2015-01-29 2 1 1
6 2 0 2015-02-21 2 1 2
7 2 0 2015-03-29 2 1 3
8 2 0 2015-04-30 2 1 4
9 2 1 2015-05-31 2 1 5
10 2 1 2015-08-25 5 1 6
11 2 0 2015-10-30 5 1 7
12 2 0 2015-12-14 5 1 8
13 3 1 2015-01-29 2 1 1
I would like to get the count value of the first outcome == 1 for each subscriber. Could you please tell me how I can get it? The final data set I would like is:
(Please note that some subscribers may not have any successful call; in that case, I would like to mark first_success as 0.)
subscriber_id first_success
1 1
2 5
3 1
...

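library(dplyr)
# Earliest successful call per subscriber; subscribers with no success are dropped, not set to 0
data %>% group_by(subscriber_id) %>% filter(outcome == 1) %>%
  slice(which.min(contact_date)) %>% data.frame() %>%
  select(subscriber_id, count)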
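The question also asks for first_success = 0 when a subscriber never has a successful call. A minimal sketch of one way to cover that case is shown here. It assumes count increases with each call, so the smallest count among successful rows is the first success. The toy data and subscriber 4 (who has no success) are made up for illustration.

```r
library(dplyr)

# Toy data modelled on the question; subscriber 4 is a hypothetical
# addition with no successful call.
data <- data.frame(
  subscriber_id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 4),
  outcome       = c(1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0),
  count         = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 1, 2)
)

first_success <- data %>%
  group_by(subscriber_id) %>%
  # smallest count among successful calls, or 0 if there is none
  summarise(first_success = if (any(outcome == 1)) min(count[outcome == 1]) else 0)

first_success$first_success
# 1 5 1 0
```

Because summarise() keeps every group, subscribers without a success stay in the result instead of being filtered out.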

Related

R define a new variable as count starting when condition is met

So I'm trying to add two new variables to my data frame: a variable named start, which is supposed to be a running count that begins at 0 and continues to the last row of each group, and a second variable named stop, which is practically the same but begins at 1. The count should start once a second variable, Var1, first takes a value > 0. It is also important that the count continues until the last row of the group (so it shouldn't stop if Var1 is 0 again) and that NAs are ignored, in the sense that counting continues.
Consider the following dataset as an example
ID Var1 start stop
1 0
1 1 0 1
1 4 1 2
1 2 2 3
1 NA 3 4
1 4 4 5
2 0
2 0
2 3 0 1
2 5 1 2
2 9 2 3
2 0 3 4
I don't really care what values start and stop take before Var1 > 0 for the first time, so whether it's 0 or NA is not important.
Thanks very much for the good answers in advance!!
A dirty solution to the problem that will probably work; just drop the extra columns I created as intermediate steps using select.
library(tidyverse)
df_example <- read_table("ID Var1 start stop
1 0
1 1 0 1
1 4 1 2
1 2 2 3
1 NA 3 4
1 4 4 5
2 0
2 0
2 3 0 1
2 5 1 2
2 9 2 3
2 0 3 4")
df_example %>%
  group_by(ID) %>%
  mutate(greater_1 = if_else(replace_na(Var1, 1) > 0, 1, 0),
         run_sum = cumsum(greater_1),
         to_fill = if_else(run_sum == 1, 1, NA_real_)) %>%
  fill(to_fill) %>%
  mutate(end2 = cumsum(to_fill %>% replace_na(0)),
         star2 = if_else(end2 - 1 > 0, end2 - 1, 0))
#> # A tibble: 12 x 9
#> # Groups: ID [2]
#> ID Var1 start stop greater_1 run_sum to_fill end2 star2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 NA NA 0 0 NA 0 0
#> 2 1 1 0 1 1 1 1 1 0
#> 3 1 4 1 2 1 2 1 2 1
#> 4 1 2 2 3 1 3 1 3 2
#> 5 1 NA 3 4 1 4 1 4 3
#> 6 1 4 4 5 1 5 1 5 4
#> 7 2 0 NA NA 0 0 NA 0 0
#> 8 2 0 NA NA 0 0 NA 0 0
#> 9 2 3 0 1 1 1 1 1 0
#> 10 2 5 1 2 1 2 1 2 1
#> 11 2 9 2 3 1 3 1 3 2
#> 12 2 0 3 4 0 3 1 4 3
Created on 2020-08-04 by the reprex package (v0.3.0)
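The same result can probably be reached with fewer helper columns. The sketch below is one possible shortening, not the answerer's method: it treats NA in Var1 as "not yet started" and rebuilds the example data by hand. The helper column name started is an assumption for illustration.

```r
library(dplyr)

# Example data from the question, without the hand-filled start/stop columns
df <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Var1 = c(0, 1, 4, 2, NA, 4, 0, 0, 3, 5, 9, 0)
)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # TRUE from the first row where Var1 > 0 onwards (NA counts as 0 here)
    started = cumsum(coalesce(Var1, 0) > 0) > 0,
    # running count starting at 1 on the first "started" row
    stop    = cumsum(started),
    # same count shifted down by one, floored at 0
    start   = pmax(stop - 1, 0)
  ) %>%
  ungroup()

res$stop
# 0 1 2 3 4 5 0 0 1 2 3 4
```

Rows before the first Var1 > 0 get 0 in both columns, which the question says is acceptable.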

R - Calculate Time Elapsed Since Last Events with Multiple Event Types and IDs

Similar questions have been asked before about how to calculate the number of observations since an event. I have a further requirement: how do I calculate the number of days since the last event of the same type, and also the number of days since the last event of any other type? I also have ids.
To illustrate, please see below. I am trying to do this in R using data.table, but with little success.
What I have:
date event id
2000-07-06 2 1
2000-07-07 1 1
2000-07-09 0 1
2000-07-10 0 1
2000-07-15 2 1
2000-07-16 1 1
2000-07-20 0 1
2000-07-21 1 1
2000-07-06 1 2
2000-07-07 2 2
2000-07-15 0 2
2000-07-16 0 2
2000-07-17 2 2
2000-07-18 1 2
and what I would like to have is as follows:
date event id days_since_event_1 days_since_event_2
2000-07-06 2 1 NA NA
2000-07-07 1 1 NA 1
2000-07-09 0 1 2 3
2000-07-10 0 1 3 4
2000-07-15 2 1 8 9
2000-07-16 1 1 9 1
2000-07-20 0 1 4 5
2000-07-21 1 1 5 6
2000-07-06 1 2 NA NA
2000-07-07 2 2 1 NA
2000-07-15 0 2 9 8
2000-07-16 0 2 10 9
2000-07-17 2 2 11 10
2000-07-18 1 2 12 1
The two events are mutually exclusive, that is, they cannot take place on the same day.
Hope to hear some good advice. All the best.
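The following uses the chron package to calculate the difference between the dates:
library(chron)
df$date <- chron(as.character(df$date),format=c(date="y-m-d"))
# Create the result columns up front so they can be filled row by row
df$days_since_event_1 <- NA_real_
df$days_since_event_2 <- NA_real_
for(j in unique(df$id)){
  # Row offsets back to the most recent event of each type (NA until one is seen)
  DaysSince1 <- NA
  DaysSince2 <- NA
  # Exact match on id; grep() would also match e.g. id 10 when j is 1
  RowsWithID <- which(df$id == j)
  for(i in RowsWithID){
    df$days_since_event_1[i] <- df$date[i]-df$date[i-DaysSince1]
    df$days_since_event_2[i] <- df$date[i]-df$date[i-DaysSince2]
    if(df$event[i]==1){DaysSince1<-1}
    else{DaysSince1<-DaysSince1+1}
    if(df$event[i]==2){DaysSince2<-1}
    else{DaysSince2<-DaysSince2+1}
  }
}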
This code gives the following results
> df
date event id days_since_event_1 days_since_event_2
1 00-07-06 2 1 NA NA
2 00-07-07 1 1 NA 1
3 00-07-09 0 1 2 3
4 00-07-10 0 1 3 4
5 00-07-15 2 1 8 9
6 00-07-16 1 1 9 1
7 00-07-20 0 1 4 5
8 00-07-21 1 1 5 6
9 00-07-06 1 2 NA NA
10 00-07-07 2 2 1 NA
11 00-07-15 0 2 9 8
12 00-07-16 0 2 10 9
13 00-07-17 2 2 11 10
14 00-07-18 1 2 12 1
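For larger data, the row-by-row loop can be slow. The following is a base R sketch of a vectorised alternative, offered under the assumptions that rows are already sorted by date within each id and that event has no NAs. The helper name days_since is hypothetical.

```r
df <- data.frame(
  date  = as.Date(c("2000-07-06", "2000-07-07", "2000-07-09", "2000-07-10",
                    "2000-07-15", "2000-07-16", "2000-07-20", "2000-07-21",
                    "2000-07-06", "2000-07-07", "2000-07-15", "2000-07-16",
                    "2000-07-17", "2000-07-18")),
  event = c(2, 1, 0, 0, 2, 1, 0, 1, 1, 2, 0, 0, 2, 1),
  id    = c(rep(1, 8), rep(2, 6))
)

# Days since the most recent earlier row of the given event type (one id only)
days_since <- function(date, event, type) {
  d <- as.numeric(date)
  # index of the latest row of this type up to and including each row (0 = none yet)
  idx <- cummax(ifelse(event == type, seq_along(d), 0))
  # shift by one row so that a row never counts itself
  prev_idx <- c(0, head(idx, -1))
  out <- rep(NA_real_, length(d))
  ok <- prev_idx > 0
  out[ok] <- d[ok] - d[prev_idx[ok]]
  out
}

# Apply the helper separately within each id and put the pieces back in row order
for (type in 1:2) {
  pieces <- lapply(split(df, df$id), function(g) days_since(g$date, g$event, type))
  df[[paste0("days_since_event_", type)]] <- unsplit(pieces, df$id)
}
```

On the example data this reproduces the days_since_event_1 and days_since_event_2 columns requested in the question.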
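To address your comment, you can do the following in base R to get the number of observations rather than days. No libraries needed.
# Create the result columns up front so they can be filled row by row
df$Obs_since_event_1 <- NA_real_
df$Obs_since_event_2 <- NA_real_
for(j in unique(df$id)){
  # Observations since the most recent event of each type (NA until one is seen)
  ObsSince1 <- NA
  ObsSince2 <- NA
  # Exact match on id; grep() would also match e.g. id 10 when j is 1
  RowsWithID <- which(df$id == j)
  for(i in RowsWithID){
    df$Obs_since_event_1[i] <- ObsSince1
    df$Obs_since_event_2[i] <- ObsSince2
    if(df$event[i]==1){ObsSince1<-1}
    else{ObsSince1<-ObsSince1+1}
    if(df$event[i]==2){ObsSince2<-1}
    else{ObsSince2<-ObsSince2+1}
  }
}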
You should get the following output
> df
date event id Obs_since_event_1 Obs_since_event_2
1 2000-07-06 2 1 NA NA
2 2000-07-07 1 1 NA 1
3 2000-07-09 0 1 1 2
4 2000-07-10 0 1 2 3
5 2000-07-15 2 1 3 4
6 2000-07-16 1 1 4 1
7 2000-07-20 0 1 1 2
8 2000-07-21 1 1 2 3
9 2000-07-06 1 2 NA NA
10 2000-07-07 2 2 1 NA
11 2000-07-15 0 2 2 1
12 2000-07-16 0 2 3 2
13 2000-07-17 2 2 4 3
14 2000-07-18 1 2 5 1
You could subset your dates for all rows with a specific event encoding, e.g.:
date.2 = DATAFRAME[which(DATAFRAME[,2]==2),1]
and then just do
DATAFRAME[which(DATAFRAME[,2]==2),5] = c(NA, as.numeric(diff(date.2)))
and so on. (diff returns one value fewer than there are dates, so the first matching row gets NA.)
There may be an easier way to do this, but it was the first thing that came to mind.
DATAFRAME is just the name of your data frame here.
Edit: if I see it correctly, you want NAs wherever the id and event columns differ from each other? Then you could just go on with:
DATAFRAME[which(DATAFRAME[,2] != DATAFRAME[,3]),c(4,5)] = NA
or something like that.

If a value appears in the row, all subsequent rows should take this value (with dplyr)

I'm just starting to learn R and I'm already facing the first bigger problem.
Let's take the following panel dataset as an example:
N=5
T=3
time<-rep(1:T, times=N)
id<- rep(1:N,each=T)
dummy<- c(0,0,1,1,0,0,0,1,0,0,0,1,0,1,0)
df<-as.data.frame(cbind(id, time,dummy))
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 2 1 1
5 2 2 0
6 2 3 0
7 3 1 0
8 3 2 1
9 3 3 0
10 4 1 0
11 4 2 0
12 4 3 1
13 5 1 0
14 5 2 1
15 5 3 0
I now want the dummy variable for all rows of a cross section to take the value 1 after the 1 for this cross section appears for the first time. So, what I want is:
id time dummy
1 1 1 0
2 1 2 0
3 1 3 1
4 2 1 1
5 2 2 1
6 2 3 1
7 3 1 0
8 3 2 1
9 3 3 1
10 4 1 0
11 4 2 0
12 4 3 1
13 5 1 0
14 5 2 1
15 5 3 1
So I guess I need something like:
df_new<-df %>%
group_by(id) %>%
???
I already tried to set all zeros to NA and use the na.locf function, but it didn't really work.
Anybody got an idea?
Thanks!
Use cummax:
df %>%
  group_by(id) %>%
  mutate(dummy = cummax(dummy))
# A tibble: 15 x 3
# Groups: id [5]
# id time dummy
# <dbl> <dbl> <dbl>
# 1 1 1 0
# 2 1 2 0
# 3 1 3 1
# 4 2 1 1
# 5 2 2 1
# 6 2 3 1
# 7 3 1 0
# 8 3 2 1
# 9 3 3 1
#10 4 1 0
#11 4 2 0
#12 4 3 1
#13 5 1 0
#14 5 2 1
#15 5 3 1
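To see why cummax does the job, it may help to look at it on a single vector. This is only a small illustration with made-up values: the cumulative maximum of a 0/1 vector switches to 1 at the first 1 and never goes back.

```r
# A single 0/1 series: the first 1 appears in position 2
dummy <- c(0, 1, 0, 0, 1, 0)

# Running maximum: stays at 1 from the first 1 onwards
filled <- cummax(dummy)
filled
# 0 1 1 1 1 1
```

Grouping by id before calling cummax simply restarts this running maximum for each cross section.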
Without additional packages you could do
transform(df, dummy = ave(dummy, id, FUN = cummax))

Indexing by multiple criteria between 2 data frames in R

I have the following data frame:
id day event
1 1 1
1 3 1
2 1 0
2 4 0
2 9 0
2 15 0
3 2 0
3 5 0
4 1 1
4 8 1
4 11 1
What I want is this: when an id has an event value of zero, all of its event values should become one, except the last one (by date). So the output should be the following:
id day event
1 1 1
1 3 1
2 1 1
2 4 1
2 9 1
2 15 0
3 2 1
3 5 0
4 1 1
4 8 1
4 11 1
Any help?
We could use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and group by 'id'. If any 'event' is 0 (!event) for that 'id', we replicate 1 for the length of the group minus one (.N-1) and concatenate a 0; otherwise we return the 'event' values unchanged. We assign (:=) the result to update the 'event' column.
library(data.table)
setDT(df1)[, event :=if(any(!event)) c(rep(1L, .N-1),0L) else event, by = id]
df1
# id day event
# 1: 1 1 1
# 2: 1 3 1
# 3: 2 1 1
# 4: 2 4 1
# 5: 2 9 1
# 6: 2 15 0
# 7: 3 2 1
# 8: 3 5 0
# 9: 4 1 1
#10: 4 8 1
#11: 4 11 1
Or using dplyr, we group by 'id' and update the 'event' column: take the lead of the replicated logical vector any(!event), with default 0 for the last row, and add the logical all(event), which is 1 only for groups that contain no zeros.
library(dplyr)
df1 %>%
  group_by(id) %>%
  mutate(event = lead(rep(any(!event), n()), default = 0) + all(event))
# id day event
# (int) (int) (dbl)
#1 1 1 1
#2 1 3 1
#3 2 1 1
#4 2 4 1
#5 2 9 1
#6 2 15 0
#7 3 2 1
#8 3 5 0
#9 4 1 1
#10 4 8 1
#11 4 11 1
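A more literal way to express the same rule might be easier to read. The sketch below is an alternative formulation, not the answerer's code: it assumes "last" means the latest day within each id and sorts accordingly before applying the rule.

```r
library(dplyr)

df1 <- data.frame(
  id    = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
  day   = c(1, 3, 1, 4, 9, 15, 2, 5, 1, 8, 11),
  event = c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

res <- df1 %>%
  arrange(id, day) %>%
  group_by(id) %>%
  # if the id has any zero: 1 for every row except the last one, which gets 0;
  # otherwise leave the events untouched
  mutate(event = if (any(event == 0)) as.numeric(row_number() < n()) else event) %>%
  ungroup()

res$event
# 1 1 1 1 1 0 1 0 1 1 1
```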

Extracting event rows from a data frame

I have this data frame:
df <-
ID var TIME value method
1 3 0 2 1
1 3 2 2 1
1 3 3 0 1
1 4 0 10 1
1 4 2 10 1
1 4 4 5 1
1 4 6 5 1
2 3 0 2 1
2 3 2 2 1
2 3 3 0 1
2 4 0 10 1
2 4 2 10 1
2 4 4 5 1
2 4 6 5 1
I want to extract the rows that have a new event in the value column. For example, for ID=1, var=3 has a value of 2 at TIME=0. This value stays the same at TIME=2, so I would take only the first row at TIME=0 and discard the second row. However, in the third row the value for var=3 has changed to zero, so I also have to extract this row. And so on for the rest of the variables. This has to be applied for every subject ID. For the above df, the result should be as follows:
dfevent <-
ID var TIME value method
1 3 0 2 1
1 3 3 0 1
1 4 0 10 1
1 4 4 5 1
2 3 0 2 1
2 3 3 0 1
2 4 0 10 1
2 4 4 5 1
Could anyone help me do this in R? I have a huge data set and I want to extract the rows at which a new event has occurred in the value of every var. I have 4 variables in the data frame numbered (3, 4, 5, 6, and 7). The above is an example for 2 variables (variable numbers 3 and 4).
This does it using dplyr
library(dplyr)
df %>%
  group_by(ID, var) %>%
  mutate(tf = ifelse(value == lag(value), 1, 0)) %>%
  filter(is.na(tf) | tf == 0) %>%
  select(-tf)
# ID var TIME value method
#1 1 3 0 2 1
#2 1 3 3 0 1
#3 1 4 0 10 1
#4 1 4 4 5 1
#5 2 3 0 2 1
#6 2 3 3 0 1
#7 2 4 0 10 1
#8 2 4 4 5 1
Basically, I created an extra variable that is 1 when the value is the same as in the preceding row within each unique ID/var combination. We keep the rows where it is 0 or NA (the first row of each group) and then drop this variable before returning the output.
Base solution (grouping by both ID and var, so that the first row of each var is always kept):
df[with(df, abs(ave(value, ID, var, FUN=function(x) c(1,diff(x)) ))) > 0,]
# ID var TIME value method
#1 1 3 0 2 1
#3 1 3 3 0 1
#4 1 4 0 10 1
#6 1 4 4 5 1
#8 2 3 0 2 1
#10 2 3 3 0 1
#11 2 4 0 10 1
#13 2 4 4 5 1
From the expected results, you may also try rleid from data.table
library(data.table)#data.table_1.9.5
setDT(df)[df[, .I[1L] , list(ID, var, rleid(value))]$V1]
# ID var TIME value method
#1: 1 3 0 2 1
#2: 1 3 3 0 1
#3: 1 4 0 10 1
#4: 1 4 4 5 1
#5: 2 3 0 2 1
#6: 2 3 3 0 1
#7: 2 4 0 10 1
#8: 2 4 4 5 1
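For readers unfamiliar with rleid, it gives every run of consecutive equal values its own id. A rough base R equivalent for a plain vector with at least two elements and no NAs is sketched below. The helper name run_id is made up for illustration and is not part of data.table.

```r
# Hypothetical helper: run ids for a vector of length >= 2 without NAs
run_id <- function(x) {
  # TRUE wherever a value differs from the one before it (first element always starts a run)
  starts <- c(TRUE, x[-1] != x[-length(x)])
  cumsum(starts)
}

value <- c(2, 2, 0, 10, 10, 5, 5)
ids <- run_id(value)
ids
# 1 1 2 3 3 4 4
```

Taking the first row of each (ID, var, run id) combination is what selects the rows where a new value begins.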
Or a similar approach to @thelatemail's, grouped by both ID and var:
setDT(df)[df[, .I[abs(c(1,diff(value)))>0], .(ID, var)]$V1]
Or
unique(setDT(df)[, id:=rleid(value)], by=c('ID', 'var', 'id'))

Resources