How to make a time-to-event variable in unbalanced panel data in R?

I have unbalanced panel data with a binary variable indicating whether the event occurred. I want to control for time dependency, and the way to do this is to control for the time that has elapsed since the event last occurred.
Here is a reproducible example, with a vector of what I am trying to achieve. Thanks!
id year onset time_since_event
1 1 1989 0 0
2 1 1990 0 1
3 1 1991 1 2
4 1 1992 0 0
5 1 1993 0 1
6 1 1994 0 2
7 2 1989 0 0
8 2 1990 1 1
9 2 1991 0 0
10 2 1992 1 1
11 2 1993 0 2
12 2 1994 0 3
13 3 1991 0 0
14 3 1992 0 1
15 3 1993 0 2
id <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3)
year <- c(1989,1990,1991,1992,1993,1994,1989,1990,1991,1992,1993,1994,1991,1992,1993)
onset <- c(0,0,1,0,0,0,0,1,0,1,0,0,0,0,0)
time_since_event <- c(0,1,2,0,1,2,0,1,0,1,2,3,0,1,2) # what I want to create
df <- data.frame(id, year, onset, time_since_event)

Try this:
id <- c(1,1,1,1,1,2,2,2,2,3,3)
year <- c(1989,1990,1991,1992,1993,1989,1990,1991,1992,1991,1992)
onset <- c(0,0,1,0,0,0,1,0,1,0,0)
period <- c(0, cumsum(onset)[-length(onset)])
time_since_event <- ave(year, id, period, FUN=function(x) x-x[1])
df <- data.frame(id, year, onset, time_since_event)
I created a variable called period, which numbers the stretches of rows leading up to each event. It doesn't matter that a period can run across ids, since we group by both id and period, so the count starts over whenever a new id or a new period begins.
The ave() function lets us compute values within each group. Here we process year within the groups defined by id and period. The function passed as FUN subtracts the first year of each group from every year in that group.
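The grouping logic can also be made explicit by restarting the period counter inside each id. The following is a minimal sketch of that variant, not the answerer's code: the helper name years_since_event is hypothetical, and it assumes the rows are already sorted by id and year.

```r
# Hypothetical helper: same idea as above, but the period counter is
# restarted inside each id instead of running over the whole vector.
years_since_event <- function(id, year, onset) {
  # Within each id, a row's period is the number of events on earlier rows.
  period <- ave(onset, id, FUN = function(x) c(0, cumsum(x)[-length(x)]))
  # Within each (id, period) block, count years from the block's first row.
  ave(year, id, period, FUN = function(x) x - x[1])
}

id    <- c(1,1,1,1,1,2,2,2,2,3,3)
year  <- c(1989,1990,1991,1992,1993,1989,1990,1991,1992,1991,1992)
onset <- c(0,0,1,0,0,0,1,0,1,0,0)
tse   <- years_since_event(id, year, onset)
print(data.frame(id, year, onset, tse))
```

Because the counter is a difference of years rather than a count of rows, a missing year inside a block still counts as elapsed time.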

Related

R - Expand dataframe to create panel data

I have a dataset where I observe individuals in different years (e.g., individual 1 is observed in 2012 and 2014, while individuals 2 and 3 are only observed in 2016). I would like to expand the data so that each individual has 3 rows (2012, 2014 and 2016), in order to create a panel dataset with an indicator for whether the individual is observed in that year.
My initial dataset is:
year individual_id rank
2012 1 11
2014 1 16
2016 2 76
2016 3 125
And I would like to get something like that:
year individual_id rank present
2012 1 11 1
2014 1 16 1
2016 1 . 0
2012 2 . 0
2014 2 . 0
2016 2 76 1
2012 3 . 0
2014 3 . 0
2016 3 125 1
So far I have tried to play with "expand":
bys researcher: egen count=count(year)
replace count=3-count+1
bys researcher: replace count=. if _n>1
expand count
which gives me 3 rows per individual. Unfortunately this just copies one of the initial rows, and I am unable to get from there to the final desired dataset.
Thanks in advance for your help!
You can use expand.grid to create a data frame of all combinations of your inputs. Then full join the two tables and add a condition that flags whether the individual was present in that year.
library(dplyr)
dt = data.frame(
  year = c(2012,2014,2016,2016),
  individual_id = c(1,1,2,3),
  rank = c(11,16,76,125)
)
exp = expand.grid(year = c(2012,2014,2016), individual_id = c(1:3))
dt %>%
  full_join(exp, by = c("year","individual_id")) %>%
  mutate(present = ifelse(!is.na(rank), 1, 0)) %>%
  arrange(individual_id, year)
year individual_id rank present
1 2012 1 11 1
2 2014 1 16 1
3 2016 1 NA 0
4 2012 2 NA 0
5 2014 2 NA 0
6 2016 2 76 1
7 2012 3 NA 0
8 2014 3 NA 0
9 2016 3 125 1
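For reference, the same expand-then-join idea can be sketched in base R without dplyr. This is an illustrative variant rather than part of the answer; the names grid and panel are introduced here, and it assumes, like the answer, that a missing rank means the individual was not observed.

```r
dt <- data.frame(
  year          = c(2012, 2014, 2016, 2016),
  individual_id = c(1, 1, 2, 3),
  rank          = c(11, 16, 76, 125)
)

# Every (year, individual) pair that should exist in the balanced panel
grid <- expand.grid(year = unique(dt$year),
                    individual_id = unique(dt$individual_id))

# Left join onto the grid: pairs that were never observed get rank = NA
panel <- merge(grid, dt, by = c("year", "individual_id"), all.x = TRUE)
panel$present <- as.integer(!is.na(panel$rank))

# Sort as in the desired output and reset the row names
panel <- panel[order(panel$individual_id, panel$year), ]
rownames(panel) <- NULL
print(panel)
```

Taking the years and ids from the data with unique() avoids hard-coding them, at the cost of missing any year in which nobody was observed.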

I am trying to get seasonal melt information from daily data, but cannot aggregate by season

I am trying to create a new table from an existing one.
I've selected the columns I need: Month, Year, and Temperature. There is one row for each day.
I've managed to add another column, with a 1 or 0 for each day above freezing.
I would now like to aggregate the rows so I have a row for each season and a column for each year, as well as an annual total row.
KAN_U <- kan_u_df %>%
  select(Year, MonthOfYear, AirTemperature.C.)
KAN_U$Melt <- as.numeric(KAN_U$AirTemperature.C. > 0)
head(KAN_U)
Year MonthOfYear AirTemperature.C. Melt
1 2009 4 -999.00 0
2 2009 4 -25.30 0
3 2009 4 -23.44 0
4 2009 4 -28.18 0
5 2009 4 -32.15 0
6 2009 4 -24.35 0
I would like my final table to look like this:
Total Winter Spring Summer Autumn
2009 10 0 2 7 1
2010 10 0 2 7 1
2011 10 0 2 7 1
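One possible way to get a table of this shape is to map each month to a season and cross-tabulate the melt days. The sketch below is only an illustration under several assumptions: the data frame is a made-up stand-in for KAN_U, the month-to-season lookup season_of_month is hypothetical (December, January and February as winter), and December is counted in its own calendar year.

```r
# Toy stand-in for KAN_U (values are made up for illustration)
KAN_U <- data.frame(
  Year              = c(2009, 2009, 2009, 2009, 2010, 2010, 2010),
  MonthOfYear       = c(1, 4, 7, 7, 7, 8, 10),
  AirTemperature.C. = c(-20.1, -999, 1.5, 0.3, 2.2, -0.4, 0.1)
)
KAN_U$Melt <- as.numeric(KAN_U$AirTemperature.C. > 0)

# Assumed month-to-season lookup, indexed by month number 1..12
season_of_month <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
                     "Summer", "Summer", "Autumn", "Autumn", "Autumn", "Winter")
KAN_U$Season <- factor(season_of_month[KAN_U$MonthOfYear],
                       levels = c("Winter", "Spring", "Summer", "Autumn"))

# Melt days per year (rows) and season (columns), plus an annual total
by_season  <- xtabs(Melt ~ Year + Season, data = KAN_U)
melt_table <- cbind(Total = rowSums(by_season), unclass(by_season))
print(melt_table)
```

The -999 placeholder rows count as non-melt days here because -999 > 0 is FALSE. Wrapping the result in t() would give seasons as rows and years as columns instead.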

Define a dummy variable based on binary code in R

Take the following patient data example from a hospital.
YEAR <- sample(1980:1995,15, replace=T)
Pat_ID <- sample(1:100,15)
sex <- c(1,0,1,0,1,0,0,1,0,0,0,0,1,0,0)
df1 <- data.frame(Pat_ID,YEAR,sex)
I want to introduce a dummy variable $PAIR_IDENTIFIER that takes a new value each time a new sex==1 appears. The problem is that there is no constant pattern in the sex variable.
Sometimes the next 1 appears at position i+2, sometimes at position i+3, and so on,
so $PAIR_IDENTIFIER <- c(1,1,2,2,3,3,3,4,4,4,4,4 .....)
You can do this simply with cumsum():
df1$PAIR_IDENTIFIER <- cumsum(df1$sex)
df1
# Pat_ID YEAR sex PAIR_IDENTIFIER
#1 54 1991 1 1
#2 100 1992 0 1
#3 6 1995 1 2
#4 99 1994 0 2
#5 42 1988 1 3
#6 65 1990 0 3
#7 53 1994 0 3
#8 96 1987 1 4
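As a quick check, the running sum can be applied to the sex vector from the question and compared with the identifier asked for there. This is a small illustrative sketch; the variable name pair_identifier is introduced here.

```r
sex <- c(1,0,1,0,1,0,0,1,0,0,0,0,1,0,0)

# Each 1 bumps the running total, so every row inherits the number
# of the most recent row with sex == 1.
pair_identifier <- cumsum(sex)
print(pair_identifier)
```

If the data started with one or more rows where sex is 0, those rows would get identifier 0, since no 1 has been seen yet.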

How to create Time-to-Event variable?

Dear all: I'm thinking of creating a "time to event" variable in R and need your expertise to get it done. Below you can see a small sample of my data. The time variable is in years; it starts at 0 and resets itself after Event = 1.
In the real data the observation period starts in 1989 but there are some countries (that had not ratified certain conventions before 1989) that come in later on, like the US in the example below. Whenever it starts, the first value for the "time to event" variable should be zero.
Thanks for all suggestions!
Country year Event **Time-to-event**
USA 2000 0 0
USA 2001 0 1
USA 2002 1 2
USA 2003 0 0
USA 2004 0 1
USA 2005 0 2
USA 2006 1 3
USA 2007 0 0
USA 2008 1 1
USA 2009 0 0
USA 2010 0 1
USA 2011 0 2
USA 2012 0 3
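We can use ave():
# i1 numbers the spells within each country: a new spell starts on the row after an event (Event drops from 1 to 0)
i1 <- with(df2, ave(Event, Country,
                    FUN = function(x) cumsum(c(TRUE, diff(x) < 0))))
# count rows within each spell, starting at 0
df2$Time_to_event <- with(df2, ave(i1, i1, Country, FUN = seq_along) - 1)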
df2$Time_to_event
#[1] 0 1 2 0 1 2 3 0 1 0 1 2 3
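The answer does not show how df2 is built. A self-contained version, assuming df2 simply holds the sample rows from the question, might look like this:

```r
# Assumed reconstruction of df2 from the sample table in the question
df2 <- data.frame(
  Country = rep("USA", 13),
  year    = 2000:2012,
  Event   = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

# Spell number within each country; increases on each 1 -> 0 drop in Event
i1 <- with(df2, ave(Event, Country,
                    FUN = function(x) cumsum(c(TRUE, diff(x) < 0))))
# Row counter within each spell, starting at 0
df2$Time_to_event <- with(df2, ave(i1, i1, Country, FUN = seq_along) - 1)
print(df2)
```

Note that diff(x) < 0 opens a new spell only when Event drops from 1 to 0, so two events in consecutive years would not reset the counter between them. The sample data contain no such case.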
Alternatively, since count_until(x) is always equal to rev(count_since(rev(x))), one might use something like this:
count_since <- function(trigger) {
  i <- seq_along(trigger)
  # rows since the most recent TRUE; 0 before the first TRUE
  (i - cummax(i * trigger)) * cummax(trigger)
}
count_until <- function(x) rev(count_since(rev(x)))
> count_until(1:10%%5==0)
[1] 4 3 2 1 0 4 3 2 1 0
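To see why the cummax() expression in count_since works, the intermediate vectors can be printed for a short trigger. This is only an illustrative walk-through; the names last_hit, seen and since are introduced here and are not part of the answer.

```r
trigger <- 1:10 %% 5 == 0   # TRUE at positions 5 and 10
i <- seq_along(trigger)

# Position of the most recent TRUE so far (0 before the first one)
last_hit <- cummax(i * trigger)
# 0 until the first TRUE has been seen, 1 from then on
seen <- cummax(trigger)

# Rows since the most recent TRUE, forced to 0 before the first TRUE
since <- (i - last_hit) * seen
print(rbind(i, last_hit, seen, since))
```

Reversing the input, applying the same computation and reversing the result turns "rows since the last TRUE" into "rows until the next TRUE", which is what count_until does.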

Merge/Append in R – how to add variables without generating more rows

I have two datasets – data A and data B. Data A contains 30.000 observations while data B has 10.000 observations. Both datasets have 156 countries – noted with their ISO–number.
I want to add some of the variables in data B to data A (let's say the variable Y*). However, I face problems when merging these two datasets.
Below you can see the samples of the datasets
Data A
Country ISO year X
A 1 1990 0
A 1 1991 0
A 1 1992 0
A 1 1993 0
A 1 1994 1
B 2 1990 0
B 2 1991 0
B 2 1992 0
B 2 1993 0
B 2 1994 1
Data B
Country ISO year Y*
A 1 1990 1
A 1 1994 0
B 2 1990 1
B 2 1992 0
So I am interested in getting the variable Y* into my data A. To be more precise, I want to add it by country and year.
Below you see the code that I use to add the Y* variable. I have used this code many times and it works perfectly. I cannot figure out why it doesn't work in this case.
variables <- c("Country", "year", "Y*")
newdata <- merge(DataA, DataB[,variables], by=c("Country","year"), all.x=TRUE)
When I run this code, I get "newdata" with the variable Y* but with 5 times more rows than Data A.
Question: Is there a relatively simple and efficient way of doing this properly? Is there something in the structure of dataset B that creates more rows? In any case, I am grateful for any suggestions that could solve this problem.
This is the outcome I want to get:
Country ISO year X Y*
A 1 1990 0 1
A 1 1991 0 0
A 1 1992 0 0
A 1 1993 0 0
A 1 1994 1 0
B 2 1990 0 1
B 2 1991 0 0
B 2 1992 0 0
B 2 1993 0 0
B 2 1994 1 0
Use merge(), then replace the NA values that Y* gets in rows with no match in DataB with 0:
z <- merge(DataA, DataB, by = intersect(names(DataA), names(DataB)), all = TRUE)
Alternatively, with dplyr (a non-syntactic column name such as Y* has to be wrapped in backticks):
library(dplyr)
left_join(DataA, DataB %>% select(Country, year, `Y*`), by = c("Country", "year"))
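A runnable sketch of the left join on the sample data is given below. It makes a few assumptions that are not in the question: the column is called Y rather than Y* to keep the name syntactic, and the guard on duplicated keys is added because a left join can only return more rows than DataA when a Country/year pair occurs more than once in DataB. Whether that is the cause in the original data cannot be told from the sample.

```r
DataA <- data.frame(
  Country = rep(c("A", "B"), each = 5),
  ISO     = rep(c(1, 2), each = 5),
  year    = rep(c(1990, 1991, 1992, 1993, 1994), times = 2),
  X       = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
)
DataB <- data.frame(
  Country = c("A", "A", "B", "B"),
  ISO     = c(1, 1, 2, 2),
  year    = c(1990, 1994, 1990, 1992),
  Y       = c(1, 0, 1, 0)   # stands in for the question's Y* column
)

# Guard: stop early if a Country/year key is duplicated in DataB
stopifnot(!any(duplicated(DataB[, c("Country", "year")])))

# Left join: keep every row of DataA, bring in Y where a match exists
newdata <- merge(DataA, DataB[, c("Country", "year", "Y")],
                 by = c("Country", "year"), all.x = TRUE)

# Rows without a match come back as NA; the desired output uses 0 instead
newdata$Y[is.na(newdata$Y)] <- 0
newdata <- newdata[order(newdata$Country, newdata$year), ]
print(newdata)
```

If the guard fails on the real data, inspecting the duplicated keys in DataB (or collapsing them first) would be the next step before merging.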
