about the equivalent R code for forvalues in Stata [duplicate] - r

This question already has an answer here:
about the equivalent command for forvalues
(1 answer)
Closed 4 years ago.
When we deal with the longitudinal recurrent events. I would like to detect the events which meet some condition.
data <- data.frame(id=c(rep(1, 4), rep(2, 3), rep(3, 3), rep(4,4)),
event=c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0,1,1,1,1),
eventcount=c(1, 2, 0, 0, 1, 2, 0, 1, 2, 3,1,2,3,4),
firstevent=c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0,1,0,0,0),
time=c(100, 250, 150, 300, 240, 400, 150, 200, 320, 360,100,210,220,235) )
When we want to detect events which occurred from 100 days to 150 days after the first event.
In Stata, we can use
gen event2=1 if id==id[_n-1]& time-time[_n-1]>100 & time-time[_n-1]<=150 & firstevent[_n-1]==1 & firstevent==0 & event==1
forvalues i = 2/4 {
replace event2=1 if id==id[_n-`i']& time-time[_n-`i']>100 &time-time[_n-`i']<=150 & firstevent[_n-`i']==1 & firstevent==0 & event==1
}
I would like to obtain the dataset below.
data_after <- data.frame(id=c(rep(1, 4), rep(2, 3), rep(3, 3), rep(4,4)),
event=c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0,1,1,1,1),
eventcount=c(1, 2, 0, 0, 1, 2, 0, 1, 2, 3,1,2,3,4),
firstevent=c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0,1,0,0,0),
time=c(100, 250, 150, 300, 240, 400, 150, 200, 320, 360,100,210,220,235),
afterevent=c(NA,1,NA,NA,NA,NA,NA,NA,1,NA,NA,1,1,1))
How can we write equivalent R code for this Stata code?
In the original data, we have more ids and more recurrent events.

This would correspond to your output:
library(dplyr)
data %>%
group_by(id) %>%
mutate(afterevent = ifelse(time <= (time[firstevent == 1] + 150) &
(time >= time[firstevent == 1] + 100), 1, NA))
Output:
# A tibble: 14 x 6
# Groups: id [4]
id event eventcount firstevent time afterevent
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 100 NA
2 1 1 2 0 250 1
3 1 0 0 0 150 NA
4 1 0 0 0 300 NA
5 2 1 1 1 240 NA
6 2 1 2 0 400 NA
7 2 0 0 0 150 NA
8 3 1 1 1 200 NA
9 3 1 2 0 320 1
10 3 0 3 0 360 NA
11 4 1 1 1 100 NA
12 4 1 2 0 210 1
13 4 1 3 0 220 1
14 4 1 4 0 235 1

Related

subset a time series around a specific value in a data.table

This might be a basic question but how can you subset a time series around a specific value in a data.table? I have data:
DT <- data.table(time = seq(1, 14), b = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1 ,0, 0))
Now I want to look for all rows with 1 in b and keep, let's say 2 rows, before and after the found observation. It would be nice to have a flexible window where you define the wanted time window before and after the detected observation. The result should look like this:
RES <- data.table(time = c(2, 3, 4, 5, 6, 10, 11, 12, 13, 14), b = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0))
Here is one (quick and dirty) option:
n <- 2L
DT[{rows <- rep(which(b==1), each = n*2L+1L) + -n:n; fifelse(rows %between% c(1,.N), rows, 0)}]
time b
1: 2 0
2: 3 0
3: 4 1
4: 5 0
5: 6 0
6: 10 0
7: 11 0
8: 12 1
9: 13 0
10: 14 0
To get the result in separate data.frames you can use lapply():
lapply(
DT[b==1, which=TRUE],
function(i) DT[{rows <- rep(i, each = n*2L+1L) + -n:n; fifelse(rows %between% c(1,.N), rows, 0)}]
)
# [[1]]
# time b
# 1: 2 0
# 2: 3 0
# 3: 4 1
# 4: 5 0
# 5: 6 0
#
# [[2]]
# time b
# 1: 10 0
# 2: 11 0
# 3: 12 1
# 4: 13 0
# 5: 14 0

Match household partners without loop

I am trying to avoid using a loop in my attempt to recode household partners' variable identification.
hldid denotes the household while persid the person in the household.
The variable partner indicates the persid of the partner and child indicate if the row is a child.
What is missing from the variable partner is the persid for both partners.
For instance for hldid == 1, the persid == 1 has a value of 0 for partner while it should be 2.
This is how the data looks like:
> test
hldid persid age sex relresp partner child
1 1 1 26 2 0 0 0
2 1 2 26 1 1 1 0
3 2 1 59 2 0 0 0
4 2 2 64 1 1 1 0
5 3 1 76 2 0 0 0
6 4 1 65 2 0 0 0
7 4 2 64 1 1 1 0
8 5 1 52 2 0 0 0
9 5 2 51 1 1 1 0
10 5 3 20 2 21 0 1
11 5 4 14 2 21 0 1
12 7 1 69 1 0 0 0
13 7 2 70 2 1 1 0
I managed to create a quite ugly loop, however it is too slow for the entire dataset.
test$partnerREC = test$partner
for(i in 1:13){
for(j in 1:13){
if(
test$hldid[i] == test$hldid[i+1] & # verify if household is the same
(test$persid[i] == test$partner[j])
)
{
test$partnerREC[i] = test$persid[j] # put the persid for each partner
}
}
}
> test
hldid persid age sex relresp partner child partnerREC
1 1 1 26 2 0 0 0 2
2 1 2 26 1 1 1 0 1
3 2 1 59 2 0 0 0 2
4 2 2 64 1 1 1 0 1
5 3 1 76 2 0 0 0 0
6 4 1 65 2 0 0 0 2
7 4 2 64 1 1 1 0 1
8 5 1 52 2 0 0 0 2
9 5 2 51 1 1 1 0 1
10 5 3 20 2 21 0 1 0
11 5 4 14 2 21 0 1 0
12 7 1 69 1 0 0 0 2
13 7 2 70 2 1 1 0 1
Any idea how I could use data.table for solving this?
test = structure(list(hldid = c(1, 1, 2, 2, 3, 4, 4, 5, 5, 5, 5, 7,
7), persid = c(1, 2, 1, 2, 1, 1, 2, 1, 2, 3, 4, 1, 2), age = c(26,
26, 59, 64, 76, 65, 64, 52, 51, 20, 14, 69, 70), sex = c(2, 1,
2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2), relresp = c(0, 1, 0, 1, 0,
0, 1, 0, 1, 21, 21, 0, 1), partner = c(0, 1, 0, 1, 0, 0, 1, 0,
1, 0, 0, 0, 1), child = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
0)), class = "data.frame", row.names = c(NA, -13L))
A possible solution:
library(data.table) # load the package
setDT(test) # convert 'test' to a 'data.table'
test[, partnerREC := persid[c(pmin(2,.N):1,rep(0,(pmax(.N,2)-2)))] *
(persid %in% 1:2) *
(.N != 1)
, by = hldid][]
which gives:
> test
hldid persid age sex relresp partner child partnerREC
1: 1 1 26 2 0 0 0 2
2: 1 2 26 1 1 1 0 1
3: 2 1 59 2 0 0 0 2
4: 2 2 64 1 1 1 0 1
5: 3 1 76 2 0 0 0 0
6: 4 1 65 2 0 0 0 2
7: 4 2 64 1 1 1 0 1
8: 5 1 52 2 0 0 0 2
9: 5 2 51 1 1 1 0 1
10: 5 3 20 2 21 0 1 0
11: 5 4 14 2 21 0 1 0
12: 7 1 69 1 0 0 0 2
13: 7 2 70 2 1 1 0 1
This solution is based on the assumption (derived from the example data) that only 'persid' 1 & 2 are partners, anything higher are children.
What this does:
Group by hldid
Change the order of the first two persons in a hould but only when there are more people in a household with
persid[c(pmin(2,.N):1,rep(0,(pmax(.N,2)-2)))] where pmin is used to assure that a vector of length 1 is constructed when a household has only one person.
Multiply that with (persid %in% 1:2)to get zores for children.
Multiply that with (.N != 1) to get a value of zero for households of one person.
df <- data.frame(matrix(data = NA, ncol = 7))
names(df) <- names(test)
for(id in unique(test$hldid)){
t <- test[test$hldid==id,]
t$partner[t$persid == t$partner[t$partner!=0]] <- t$persid[which(t$partner!=0)]
df <- rbind(df, t)
}
df <- df[-1,]
A base R solution is more complicated than Jaap's data.table solution.
I work with a copy.
test2 <- test
After running the code in the question, run the following.
test2$partnerREC <- test2$partner
sp <- split(test2, test2$hldid)
test2 <- lapply(sp, function(DF){
i <- with(DF, which(persid %in% partner))
j <- with(DF, which(partner %in% persid))
#cat("i:", i, "\tj:", j, "\n")
DF$partnerREC[i] <- DF$persid[j]
DF
})
test2 <- do.call(rbind, test2)
row.names(test2) <- NULL
Now compare both results.
identical(test, test2)
#[1] TRUE
You can get there through some dplyr steps to join the data on itself and update the value of partner when persid == partner.
test2 <- left_join(test, test %>% select(hldid, persid, partner) %>% filter(partner != 0), by=c("hldid")) %>%
filter(persid.x == partner.y) %>%
mutate(partner.x = persid.y)
This will give you the head of household with their partner id matched up but you'd have to rejoin this to the original data (I'm not sure the equivalent of SQL update in dplyr lingo).
1. Create your test data.frame
library(tidyverse)
test <- tribble(
~hldid, ~persid, ~age, ~sex, ~relresp, ~partner, ~child,
1, 1, 26, 2, 0, 0, 0,
1, 2, 26, 1, 1, 1, 0,
2, 1, 59, 2, 0, 0, 0,
2, 2, 64, 1, 1, 1, 0,
3, 1, 76, 2, 0, 0, 0,
4, 1, 65, 2, 0, 0, 0,
4, 2, 64, 1, 1, 1, 0,
5, 1, 52, 2, 0, 0, 0,
5, 2, 51, 1, 1, 1, 0,
5, 3, 20, 2, 21, 0, 1,
5, 4, 14, 2, 21, 0, 1,
7, 1, 69, 1, 0, 0, 0,
7, 2, 70, 2, 1, 1, 0)
2. arrange(), group_by(), and mutate() come to rescue
test %>%
# arrange the data in case the raw data did not
arrange(hldid, child, persid) %>%
# group each household
group_by(hldid) %>%
# match first and second household person as each other's partner
mutate(partnerREC = ifelse(persid == first(persid), nth(persid, 2), first(persid))) %>%
# correct partnerREC for child and single
mutate(partnerREC = ifelse(child == 1 | is.na(partnerREC), 0, partnerREC))
# un-group it
ungroup()
A loop solution using rcpp
Source the rcpp script
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector HHgrid(CharacterVector hid, NumericVector persid, NumericVector partner,
NumericVector partnerRec) {
int nrows = hid.size();
for (int i = 1; i < nrows - 1; i ++){
for (int j = 0; j < nrows - 1; j++){
if( (hid(i) == hid(i+1)) & ( persid(i) == partner(j) ) ){
partnerRec(i) = persid(j);
}
}
Rcout << i << std::endl;
}
return(partnerRec);
}
and run the function
HHgrid(hid = test$hldid, persid = test$persid, partner = test$partner, partnerRec = test$partnerRec)
There is just a slight issue with the first case (if you know how to fix it)

How to count the number of rows from the last non-zero value (or to the next, if the lattest greater than k)

I've got a data like below:
ex <- structure(list(id = 1:20, V1 = c(1, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA, -20L))
Now I need to create a new column that would count the number of rows from the latest non-zero value. One additional condition is that if that number is greater than k then we would count the number of rows before the next non-zero value (with - sign). So assuming k = 10 the output would look like below:
structure(list(id = 1:20, V1 = c(1, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), counter = c(0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -2, -1, 0, 1, 2)), class = "data.frame", row.names = c(NA, -20L))
What's the efficient way to do this using tidyverse packages, if possible?
A base R approach with ave, where we first calculate number of rows from the last non-zero value and then for values greater than k we reverse the index position and add the negative sign to get sequence in -2, -1 order.
ave(ex$V1, cumsum(ex$V1 != 0) , FUN = function(x) {
inds <- seq_along(x) - 1
ifelse(inds > k, -rev(inds) - 1, inds)
})
#[1] 0 1 2 3 0 1 2 3 4 5 6 7 8 9 10 -2 -1 0 1 2
Another option:
ex <- structure(list(id = 1:20, V1 = c(1, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)), class = "data.frame", row.names = c(NA, -20L))
library(tidyverse)
k_LIMIT <- 10
ex %>%
mutate(my_group_var = cumsum(V1)) %>%
group_by(my_group_var) %>%
mutate(want = ifelse((row_number() - 1) > k_LIMIT,
-(max((row_number() - 1)) - (row_number() - 1) + 1),
(row_number() - 1))) %>%
ungroup() %>%
select(-my_group_var)
# A tibble: 20 x 3
# id V1 want
# <int> <dbl> <dbl>
# 1 1 1 0
# 2 2 0 1
# 3 3 0 2
# 4 4 0 3
# 5 5 9 0
# 6 6 0 1
# 7 7 0 2
# 8 8 0 3
# 9 9 0 4
#10 10 0 5
#11 11 0 6
#12 12 0 7
#13 13 0 8
#14 14 0 9
#15 15 0 10
#16 16 0 -2
#17 17 0 -1
#18 18 1 0
#19 19 0 1
#20 20 0 2
Here is one option with data.table
library(data.table)
k <- 10
setDT(ex)[, grp := cumsum(V1 != 0)
][, counter := seq_len(.N) - 1, grp
][counter > k, counter := -as.numeric(rev(seq_len(.N))), grp
][, grp := NULL][]
#. id V1 counter
# 1: 1 1 0
# 2: 2 0 1
# 3: 3 0 2
# 4: 4 0 3
# 5: 5 9 0
# 6: 6 0 1
# 7: 7 0 2
# 8: 8 0 3
# 9: 9 0 4
#10: 10 0 5
#11: 11 0 6
#12: 12 0 7
#13: 13 0 8
#14: 14 0 9
#15: 15 0 10
#16: 16 0 -2
#17: 17 0 -1
#18: 18 1 0
#19: 19 0 1
#20: 20 0 2
You can use calculate step as difference versus lag id, additional condition on value could be added and then build sequences with conditions whether it is reverse or not
Steps <- ex%>%
rbind(c(nrow(ex)+1,1)) %>%
filter(sign(V1)!=0) %>%
mutate(step= id-c(1,(lag(id)[-1])))
c <- c()
k<-10
for (i in 1:nrow(Steps)){
c<- c(c,c(0:max(0,min(Steps[i,3], k)-1)))
if(Steps[i,3]> k){
c <- c(c, (k-Steps[i,3]+1):-1)
}
}
c<-c[ex$id]

Conditions including comparison of consequtive elements within columns

Could you please help me with the following task. Here is a template of my dataset:
Category <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3)
PrevRule <- c(-1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
UserRule <- c(2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1)
Correct <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1)
df <- data.frame(Category, PrevRule, UserRule, Correct)
I need to create an additional PP column according to the following rules:
i is row index. n is maximum rows for each Category.
For Category == 1:
If Correct[i:i+2] == 0 then PP[i+2:n] = UserRule[i+2]
Else PP = NA
For Category > 1:
Do PP = PrevRule until Correct[i:i+2] == 0
Then PP[i+2:n] = UserRule[i+2]
So, in the end the PP column should look like in the following table (column starting with NA)
Category PrevRule UserRule Correct PP
1 1 -1 2 0 NA
2 1 -1 2 0 NA
3 1 -1 2 0 2
4 1 -1 1 1 2
5 1 -1 1 1 2
6 1 -1 1 1 2
7 1 -1 1 1 2
8 1 -1 1 1 2
9 2 1 1 0 1
10 2 1 1 0 1
11 2 1 2 1 1
12 2 1 0 0 1
13 2 1 0 0 1
14 2 1 0 0 0
15 2 1 2 1 0
16 2 1 2 1 0
17 2 1 2 1 0
18 2 1 2 1 0
19 2 1 2 1 0
20 3 2 2 0 2
21 3 2 0 0 2
22 3 2 0 0 2
23 3 2 0 0 0
24 3 2 1 1 0
25 3 2 1 1 0
26 3 2 1 1 0
27 3 2 1 1 0
Any advice and suggestions would be greatly appreciated!
Thank you!
It was realy difficult to understand what you ment, but here is my solution. It is not perfect as it uses a while-Loop but it should work:
ret <- array(dim = nrow(df))
i <- 1
while(i < nrow(df) - 3){
if(Category[i] == 1 & all(Correct[i + 0:2] == 0)){
tmp <- max(which(Category==1)) #end of category 1
ret[(i + 2):tmp] <- UserRule[i+2]
#set index i to the index-value of the last element in category 1
i <- tmp
}else{
ret[i] <- NA #actual not necessary, because PP is NA per default.
}
print("From now on, I will only ask clear questions!")
if(Category[i] >= 2){
ret[i] <- PrevRule[i]
if(all(Correct[(i-2):i] == 0)){#3 consecutive 0 in Correct
tmp <- max(which(Category == Category[i])) #end of current category
ret[i:tmp] <- UserRule[i]
i <- tmp #set index i to the index-value of the last element in the current category
}
}
i <- i + 1
}
df$PP <- ret

data.table challenge: defining a strata variable for recurrent events

I am trying to create a variable called strata for recurrent events. The idea is to define a variable that counts events, but filling previous records. What defines the counting is the variable event as is shown below:
id event cov strata year
1 0 0 1 12
1 0 1 1 13
1 1 1 1 14
1 0 1 2 15
1 1 0 2 16
1 1 1 3 17
1 0 0 4 18
1 0 1 4 19
1 0 1 4 20
I have tried something like this at least for the event records:
id <- c(rep(1,9), rep(2,5), rep(3,7))
event <- c(0,0,1,0,1,1,0,0,0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1)
cov <- c(0,1,1,1,0,1,0,1,1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1)
year <- c(seq(12,20,1), seq(12,16,1), seq(12,18,1))
dat <- data.table(id,event, cov, year)
dat[,strata:=seq(.N),by="id"]
But it doesn't work.
Any ideas?
Thanks
Here is an updated solution.
dat <- data.table(id,event, cov, year)
dat[, strata := c(1L,head(cumsum(event)+1L,-1L)), by = id]

Resources