R - matching values in a dataframe column to another dataframes row names - r

I think this is a fairly challenging data manipulation problem in R, and have struggled constructing a function that can achieve this. The context is organizing basketball players who play different positions into a lineup together, subject to what position each player plays. For some clarity, here is an example of the dataframe I am working with, in two different forms:
dput(my_df)
structure(list(Name = c("C.J. McCollum", "DeMar DeRozan", "Jimmy Butler",
"Jonas Valanciunas", "Kevin Durant", "Markieff Morris", "Pascal Siakam",
"Pau Gasol"), Pos1 = c("PG", "SG", "SG", "C", "SF", "SF", "PF",
"C"), Pos2 = c("SG", "", "SF", "", "PF", "PF", "", "")), .Names = c("Name",
"Pos1", "Pos2"), class = "data.frame", row.names = c(18L, 33L,
62L, 68L, 78L, 92L, 106L, 111L))
my_df
Name Pos1 Pos2
18 C.J. McCollum PG SG
33 DeMar DeRozan SG
62 Jimmy Butler SG SF
68 Jonas Valanciunas C
78 Kevin Durant SF PF
92 Markieff Morris SF PF
106 Pascal Siakam PF
111 Pau Gasol C
dput(my_df2)
structure(list(Name = c("C.J. McCollum", "DeMar DeRozan", "Jimmy Butler",
"Jonas Valanciunas", "Kevin Durant", "Markieff Morris", "Pascal Siakam",
"Pau Gasol"), Pos1 = c("PG", "SG", "SG", "C", "SF", "SF", "PF",
"C"), Pos2 = c("SG", "", "SF", "", "PF", "PF", "", ""), PG = c(1,
0, 0, 0, 0, 0, 0, 0), SG = c(1, 1, 1, 0, 0, 0, 0, 0), SF = c(0,
0, 1, 0, 1, 1, 0, 0), PF = c(0, 0, 0, 0, 1, 1, 1, 0), C = c(0,
0, 0, 1, 0, 0, 0, 1), BackupG = c(1, 1, 1, 0, 0, 0, 0, 0), BackupF = c(0,
0, 1, 0, 1, 1, 1, 0), Man8 = c(1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("Name",
"Pos1", "Pos2", "PG", "SG", "SF", "PF", "C", "BackupG", "BackupF",
"Man8"), row.names = c(18L, 33L, 62L, 68L, 78L, 92L, 106L, 111L
), class = "data.frame")
my_df2
Name Pos1 Pos2 PG SG SF PF C BackupG BackupF Man8
18 C.J. McCollum PG SG 1 1 0 0 0 1 0 1
33 DeMar DeRozan SG 0 1 0 0 0 1 0 1
62 Jimmy Butler SG SF 0 1 1 0 0 1 1 1
68 Jonas Valanciunas C 0 0 0 0 1 0 0 1
78 Kevin Durant SF PF 0 0 1 1 0 0 1 1
92 Markieff Morris SF PF 0 0 1 1 0 0 1 1
106 Pascal Siakam PF 0 0 0 1 0 0 1 1
111 Pau Gasol C 0 0 0 0 1 0 0 1
In a basketball lineup, we want 1 player set for each of the 5 positions in basketball (PG, SG, PF, SF, C), we also want 1 backup guard (a PG or SG is a guard), 1 backup forward (a PF or FS is a forward), and an 8th player who can play any position. With this group of 8 players, we could construct the lineup in this way:
Name
PG C.J. McCollum
SG DeMar DeRozan
PF Kevin Durant
SF Markieff Morris
C Pau Gasol
Backup G Jimmy Butler
Backup F Pascal Siakam
8th Man Jonas Valanciunas
Ofcourse there is some flexibility with this (Kevin Durant and Markieff Morris could have been switched, in fact theres several players who could have switched spots in the 2nd dataframe). I would like to be able to organize my_df into this 2nd dataframes format in a fairly quick matter, something that takes the Pos1 and Pos2 columns from my_df, is able to check the rownames of the 2nd dataframe, and then fill in the player names.
There is a puzzle aspect to this however. Of note is that, not all players have a second position, but those players who do have a second position can be listed at either of the two positions. (for example, Jimmy Butler can be set as a SG, a SF, a Backup G, a backup F, or the 8th man, whereas Pau Gasol can only be set as a C, or as the 8th man). Additionally, while C.J. McCollum is listed as a PG and SG, he is the only player in my_df who is listed as a PG, and therefore must go in the PG row of the second dataframe.
Any thoughts are appreciated with this! I can provide more context if needed.
(edit: potentially editing my_df, adding Pos3, Pos4, Pos5 columns for whether a player can be a backup G, backup F, or 8th Man, may help as well, and is something I am currently working with).
Edit - see Simplify this grid such that each row and column has 1 value for a revised version of my question, which is a simpler problem to solve but will give me a solution to this question!

This approach is guaranteed to return a result if there is one, in fact it will return all viable combinations.
st<-as.matrix(my_df2[4:dim(my_df2)[2]]) # Make a numeric matrix
## allCombinationsAux may not be necessary if you are using a combinatorics library
allCombinationsAux<-function(z,nreg,x){
if(sum(nreg)>1){
innerLoop<-do.call(rbind,lapply(x[nreg&(z!=x)], test1,nreg&(z!=x),x))
ret<-cbind(z,innerLoop )
}
else{
ret<-x[nreg]
}
ret
}
## Find all the possible row combinations for the matrix
combs<-do.call(rbind,lapply(x,function(y) allCombinationsAux(y,y!=x,x)))
## Identify which combinations are valid
inds<-which(apply(combs,1,function(x) sum(diag(st[x,]))==8))
## Select valid matricies
validChoices<-lapply(inds,function(x) st[combs[x,],])
Make a matrix out of my_df2
Find all possible matrix substitutions
Iterate through all possible matrices testing if the diags are all 1
Select those matrices that are valid
To get the output to look like your example you can run
validChoices<-lapply(inds,function(x) {
matr<-st[combs[x,],]
retVal<-data.frame(Name=my_df2[combs[x,],"Name"])
rownames(retVal)<-colnames(matr)
retVal
})

Related

problem while changing col names with str_to_title

I have a data set that looks like this:
It can be build using codes:
df<- structure(list(`Med` = c("DOCETAXEL",
"BEVACIZUMAB", "CARBOPLATIN", "CETUXIMAB", "DOXORUBICIN", "IRINOTECAN"
), `2.4 mg` = c(0, 0, 0, 0, 1, 0), `PRIOR CANCER THERAPY` = c(4L,
3L, 3L, 3L, 3L, 3L), `PRIOR CANCER SURGERY` = c(0, 0, 0, 0, 0,
0), `PRIOR RADIATION THERAPY` = c(0, 0, 0, 0, 0, 0)), row.names = c(NA,
6L), class = "data.frame")
Now I would like to change col name that are not start with number to proper case. How should I do it? I thought I could use str_to_title. I have tried many ways can not get it to work. Here is the codes that I tried:
# try1:
df[,3:5] %>% setNames(str_to_title(colnames(df[,3:5])))
#try2:
df[,3:5] <- df[,3:5]%>% rename_with (str_to_title)
# try3:
colnames(df[,3:5])<- str_to_title(colnames(df[,3:5]))
What did I do wrong? there is no error message, just the col names did not get updated. Could anyone help me identify the issue, or maybe show me a better way if you have?
Here I have small data then I can find the col number. If I want it to auto correct the col names to proper case, how can I do that?
Thanks.
We can use
library(dplyr)
library(stringr)
df %>%
rename_at(3:5, ~ str_to_title(.))
-output
# Med 2.4 mg Prior Cancer Therapy Prior Cancer Surgery Prior Radiation Therapy
#1 DOCETAXEL 0 4 0 0
#2 BEVACIZUMAB 0 3 0 0
#3 CARBOPLATIN 0 3 0 0
#4 CETUXIMAB 0 3 0 0
#5 DOXORUBICIN 1 3 0 0
#6 IRINOTECAN 0 3 0 0
Or using rename_with
df %>%
rename_with(~ str_to_title(.), 3:5)

Weighted random sampling for Monte Carlo simulation in R

I would like to run a Monte Carlo simulation. I have a data.frame where rows are unique IDs which have a probability of association with one of the columns. The data entered into the columns can be treated as the weights for that probability. I want to randomly sample each row in the data.frame based on the weights listed for each row. Each row should only return one value per run. The data.frame structure looks like this:
ID, X2000, X2001, X2002, X2003, X2004
X11, 0, 0, 0.5, 0.5, 0
X33, 0.25, 0.25, 0.25, 0.25, 0
X55, 0, 0, 0, 0, 1
X77, 0.5, 0, 0, 0, 0.5
For weighting, "X11" should either return X2002 or X2003, "X33" should have an equal probability of returning X2000, X2001, X2002, or X2003, should be equal with no chance of returning X2004. The only possible return for "X55" should be X2004.
The output data I am interested in are the IDs and the column that was sampled for that run, although it would probably be simpler to return something like this:
ID, X2000, X2001, X2002, X2003, X2004
X11, 0, 0, 1, 0, 0
X33, 1, 0, 0, 0, 0
X55, 0, 0, 0, 0, 1
X77, 1, 0, 0, 0, 0
Your data.frame is transposed - the sample() function takes a probability vector. However, your probability vector is rowwise which means it's harder to extract from a data.frame.
To get around this - you can import your ID column as a row.name. This allows you to be able to access it during an apply() statement. Note the apply() will coerce the data.frame to a matrix which means only one data type is allowed. That's why the IDs needed to be rownames - otherwise we'd have a probability vector of characters instead of numerics.
mc_df <- read.table(
text =
'ID X2000 X2001 X2002 X2003 X2004
X11 0 0 0.5 0.5 0
X33 0.25 0.25 0.25 0.25 0
X55 0 0 0 0 1
X77 0.5 0 0 0 0.5'
, header = T
,row.names = 1)
From there, can use the apply function:
apply(mc_df, 1, function(x) sample(names(x), size = 200, replace = T, prob = x))
Or you could make it fancy
apply(mc_df, 1, function(x) table(sample(names(x), size = 200, replace = T, prob = x)))
$X11
X2002 X2003
102 98
$X33
X2000 X2001 X2002 X2003
54 47 64 35
$X55
X2004
200
$X77
X2000 X2004
103 97
Fancier:
apply(mc_df, 1, function(x) table(sample(as.factor(names(x)), size = 200, replace = T, prob = x)))
X11 X33 X55 X77
X2000 0 51 0 99
X2001 0 50 0 0
X2002 91 57 0 0
X2003 109 42 0 0
X2004 0 0 200 101
Or fanciest:
prop.table(apply(mc_df
, 1
, function(x) table(sample(as.factor(names(x)), size = 200, replace = T, prob = x)))
,2)
X11 X33 X55 X77
X2000 0.00 0.270 0 0.515
X2001 0.00 0.235 0 0.000
X2002 0.51 0.320 0 0.000
X2003 0.49 0.175 0 0.000
X2004 0.00 0.000 1 0.485

Search rows looking for 2 conditions (OR)

I have the following table, with ordered variables:
table <- data.frame(Ident = c("Id_01", "Id_02", "Id_03", "Id_04", "Id_05", "Id_06"),
X01 = c(NA, 18, 0, 14, 0, NA),
X02 = c(0, 16, 0, 17, 0, 53),
X03 = c(NA, 15, 20, 30, 0, 72),
X04 = c(0, 17, 0, 19, 0, NA),
X05 = c(NA, 29, 21, 23, 0, 73),
X06 = c(0, 36, 22, 19, 0, 55))
Ident X01 X02 X03 X04 X05 X06
Id_01 NA 0 NA 0 NA 0
Id_02 18 16 15 17 29 36
Id_03 0 0 20 0 21 22
Id_04 14 17 30 19 23 19
Id_05 0 0 0 0 0 0
Id_06 NA 53 72 NA 73 55
From a previous question, I have the following code provided from a user here, to search by row for one condition (1st and 2nd position > 0) and returning the position of the ocurrence (name of the variable for the specific position):
apply(table[-1], 1, function(x) {
i1 <- x > 0 & !is.na(x)
names(x)[which(i1[-1] & i1[-length(i1)])[1]]})
I'm looking to add a second condition to the apply code, so the conditions needs to be:
1st and 2nd ocurrence (consecutive) > 0
OR
1st and 3rd ocurrence > 0
Considering this change, the output of the evaluation for the table posted before should be:
For Id_01: never occurs (NA?)
For Id_02: 1st position (X01)
For Id_03: 3rd position (X03)
For Id_04: 1st position (X01)
For Id_05: never occurs (NA?)
For Id_06: 2nd position (X02)
Thanks in advance!
We can use lag and lead from dplyr
library(dplyr)
f1 <- function(x) {
i1 <- x > 0 & !is.na(x)
which((i1 & lag(i1, default = i1[1])) |
(i1 & lead(i1, n = 3, default = i1[1])))[1]
}
n1 <- apply(table[-1], 1, f1)
names(table)[-1][n1]
#[1] NA "X01" "X03" "X01" NA "X02"
Or use pmap
library(purrr)
n1 <- pmap_int(table[-1], ~ c(...) %>%
f1)
names(table)[-1][n1]

Row by row application in R [duplicate]

I have my data in the form of a data.table given below
structure(list(atp = c(1, 0, 1, 0, 0, 1), len = c(2, NA, 3, NA,
NA, 1), inv = c(593, 823, 668, 640, 593, 745), GU = c(36, 94,
57, 105, 48, 67), RUTL = c(100, NA, 173, NA, NA, 7)), .Names = c("atp",
"len", "inv", "GU", "RUTL"), row.names = c(NA, -6L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000000320788>)
I need to form 4 new columns csi_begin,csi_end, IRQ and csi_order. the value of csi_begin and csi_end when atp=1 depends directly on inv and gu values.
But when atp is not equal to 1 csi_begin and csi_end depends on inv and gu values and IRQ value of previous row
The value of IRQ depends on csi_order of that row if atp==1 else its 0 and csi_order value depends on two rows previous csi_begin value.
I have written the condition with the help of for loop.
Below is the code given
lostsales<-function(transit)
{
if (transit$atp==1)
{
transit$csi_begin[i]<-(transit$inv)[i]
transit$csi_end[i]<-transit$csi_begin[i]-transit$GU[i]
}
else
{
transit$csi_begin[i]<-(transit$inv)[i]+transit$IRQ[i-1]
transit$csi_end[i]<-transit$csi_begin[i]-transit$GU[i]
}
if (transit$csi_begin[i-2]!= NA)
{
transit$csi_order[i]<-transit$csi_begin[i-2]
}
else
{ transit$csi_order[i]<-0}
if (transit$atp==1)
{
transit$IRQ[i]<-transit$csi_order[i]-transit$RUTL[i]
}
else
{
transit$IRQ[i]<-0
}
}
Can anyone help me how to do efficient looping with data.tables using setkeys? As my data set is very large and I cannot use for loop else the timing would be very high.
Adding the desired outcome to your example would be very helpful, as I'm having trouble following the if/then logic. But I took a stab at it anyway:
library(data.table)
# Example data:
dt <- structure(list(atp = c(1, 0, 1, 0, 0, 1), len = c(2, NA, 3, NA, NA, 1), inv = c(593, 823, 668, 640, 593, 745), GU = c(36, 94, 57, 105, 48, 67), RUTL = c(100, NA, 173, NA, NA, 7)), .Names = c("atp", "len", "inv", "GU", "RUTL"), row.names = c(NA, -6L), class = c("data.table", "data.frame"), .internal.selfref = "<pointer: 0x0000000000320788>")
# Add a row number:
dt[,rn:=.I]
# Use this function to get the value from a previous (shiftLen is negative) or future (shiftLen is positive) row:
rowShift <- function(x, shiftLen = 1L) {
r <- (1L + shiftLen):(length(x) + shiftLen)
r[r<1] <- NA
return(x[r])
}
# My attempt to follow the seemingly circular if/then rules:
lostsales2 <- function(transit) {
# If atp==1, set csi_begin to inv and csi_end to csi_begin - GU:
transit[atp==1, `:=`(csi_begin=inv, csi_end=inv-GU)]
# Set csi_order to the value of csi_begin from two rows prior:
transit[, csi_order:=rowShift(csi_begin,-2)]
# Set csi_order to 0 if csi_begin from two rows prior was NA
transit[is.na(csi_order), csi_order:=0]
# Initialize IRQ to 0
transit[, IRQ:=0]
# If ATP==1, set IRQ to csi_order - RUTL
transit[atp==1, IRQ:=csi_order-RUTL]
# If ATP!=1, set csi_begin to inv + IRQ value from previous row, and csi_end to csi_begin - GU
transit[atp!=1, `:=`(csi_begin=inv+rowShift(IRQ,-1), csi_end=inv+rowShift(IRQ,-1)-GU)]
return(transit)
}
lostsales2(dt)
## atp len inv GU RUTL rn csi_begin csi_end csi_order IRQ
## 1: 1 2 593 36 100 1 593 557 0 -100
## 2: 0 NA 823 94 NA 2 NA NA 0 0
## 3: 1 3 668 57 173 3 668 611 593 420
## 4: 0 NA 640 105 NA 4 640 535 0 0
## 5: 0 NA 593 48 NA 5 593 545 668 0
## 6: 1 1 745 67 7 6 745 678 640 633
Is this output close to what you were expecting?

workflow for creating timevarying covariates in r

I have a huge data file in long format-parts of it supplied below. Each ID can have several rows, where status is the final status. However I need to do the analysis with time varying covariates and hence need to create two new time variables and update the status variable. I´ve been struggling with this for some time now and I cannot figure out how to do this efficiently as there can be as many as four rows per ID. The time varying variable is NUM.AFTER.DIAG. If NUM.AFTER.DIAG==0 then it is easy, where time1=0 and time2=STATUSDATE. However when NUM.AFTER.DIAG==1 then I need to make a new row where time1=0, time2=DOB-DATE.DIAG and NUM.AFTER.DIAG=0 and also make sure STATUS="B". The second row would then be time1=time2 from the previous row and time2=STATUSDATE-DATE.DIAG-time1 from this row. Equally if there are more rows then the different rows needs to be subtracted from each other. Also if NUM.AFTER.DIAG==0 but there are multiple rows then all extra rows can be deleted.
Any ideas for an efficient solution to this?
I´ve looked at john Fox unfold command, but it assumes that all the intervals are in wide format to begin with.
Edit: The table as requested. As for the censor variable: "D"=event (death)
structure(list(ID = c(187L, 258L, 265L, 278L, 281L, 281L, 283L,
283L, 284L, 291L, 292L, 292L, 297L, 299L, 305L, 305L, 311L, 311L,
319L, 319L, 319L, 322L, 322L, 329L, 329L, 333L, 333L, 333L, 334L,
334L), STATUS = c("D", "B", "B", "B", "B", "B", "D", "D", "B",
"B", "B", "B", "D", "D", "D", "D", "B", "B", "B", "B", "B", "D",
"D", "B", "B", "D", "D", "D", "D", "D"), STATUSDATE = structure(c(11153,
15034, 15034, 15034, 15034, 15034, 5005, 5005, 15034, 15034,
15034, 15034, 6374, 5005, 7562, 7562, 15034, 15034, 15034, 15034,
15034, 7743, 7743, 15034, 15034, 4670, 4670, 4670, 5218, 5218
), class = "Date"), DATE.DIAG = structure(c(4578, 4609, 4578,
4487, 4670, 4670, 4517, 4517, 4640, 4213, 4397, 4397, 4397, 4487,
4213, 4213, 4731, 4731, 4701, 4701, 4701, 4397, 4397, 4578, 4578,
4275, 4275, 4275, 4456, 4456), class = "Date"), DOB = structure(c(NA,
13010, NA, NA, -1082, -626, 73, 1353, 13679, NA, 1626, 3087,
-626, -200, 2814, 3757, 1930, 3787, 6740, 13528, 14167, 5462,
6557, 7865, 9235, -901, -504, -108, -535, -78), class = "Date"),
NUM.AFTER.DIAG = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 2, 3, 1, 2, 1, 2, 0, 0, 0, 0, 0)), .Names = c("ID",
"STATUS", "STATUSDATE", "DATE.DIAG", "DOB", "NUM.AFTER.DIAG"), row.names = c(NA,
30L), class = "data.frame")
EDIT: I did come up with a solution, although probably not very efficient.
u1<-ddply(p,.(ID),function(x) {
if (x$NUM.AFTER.DIAG==0){
x$time1<-0
x$time2<-x$STATUSDATE-x$DATE.DIAG
x<-x[1,]
}
else {
x<-rbind(x,x[1,])
x<-x[order(x$DOB),]
u<-max(x$NUM.AFTER.DIAG)
x$NUM.AFTER.DIAG<-0:u
x$time1[1]<-0
x$time2[1:(u)]<-x$DOB[2:(u+1)]-x$DATE.DIAG[2:(u+1)]
x$time2[u+1]<-x$STATUSDATE[u]-x$DATE.DIAG[u]
x$time1[2:(u+1)]<-x$time2[1:u]
x$STATUS[1:u]<-"B"
}
x
}
)
Ok, I've tried something, but I'm not sure I understand your transformation process entirely, so let me know if there are some mistakes. In general ddply will be slow (even when .parallel = TRUE), when there are many individuals, mainly because at the end it has to bring all the data sets of all individuals together and rbind (or rbind.fill) them, which takes forever for a multitude of data.frame objects.
So here's a suggestion, where dat.orig is your toy data set:
I would first split the task in two:
1) NUM.AFTER.DIAG == 0
2) NUM.AFTER.DIAG == 1
1) It seems that if NUM.AFTER.DIAG == 0, except of calculating time2 and extract first row if an ID occurs more than once (like ID 333), there is not much to do in part 1):
## erase multiple occurences
dat <- dat.orig[!(duplicated(dat.orig$ID) & dat.orig$NUM.AFTER.DIAG == 0), ]
dat0 <- dat[dat$NUM.AFTER.DIAG == 0, ]
dat0$time1 <- 0
dat0$time2 <- difftime(dat0$STATUSDATE, dat0$DATE.DIAG, unit = "days")
time.na <- is.na(dat0$DOB)
dat0$time1[time.na] <- dat0$time2[time.na] <- NA
> dat0
ID STATUS STATUSDATE DATE.DIAG DOB NUM.AFTER.DIAG time1 time2
1 187 D 2000-07-15 1982-07-15 <NA> 0 NA NA days
3 265 B 2011-03-01 1982-07-15 <NA> 0 NA NA days
4 278 B 2011-03-01 1982-04-15 <NA> 0 NA NA days
5 281 B 2011-03-01 1982-10-15 1967-01-15 0 0 10364 days
7 283 D 1983-09-15 1982-05-15 1970-03-15 0 0 488 days
10 291 B 2011-03-01 1981-07-15 <NA> 0 NA NA days
11 292 B 2011-03-01 1982-01-15 1974-06-15 0 0 10637 days
13 297 D 1987-06-15 1982-01-15 1968-04-15 0 0 1977 days
14 299 D 1983-09-15 1982-04-15 1969-06-15 0 0 518 days
15 305 D 1990-09-15 1981-07-15 1977-09-15 0 0 3349 days
17 311 B 2011-03-01 1982-12-15 1975-04-15 0 0 10303 days
26 333 D 1982-10-15 1981-09-15 1967-07-15 0 0 395 days
29 334 D 1984-04-15 1982-03-15 1968-07-15 0 0 762 days
2) is a little trickier, but all you actually have to do is insert one more row and calculate the time variables:
## create subset with relevant observations
dat.unfold <- dat[dat$NUM.AFTER.DIAG != 0, ]
## compute time differences
time1 <- difftime(dat.unfold$DOB, dat.unfold$DATE.DIAG, unit = "days")
time1[time1 < 0] <- 0
time2 <- difftime(dat.unfold$STATUSDATE, dat.unfold$DATE.DIAG, unit = "days")
## calculate indices for individuals
n.obs <- daply(dat.unfold, .(ID), function(z) max(z$NUM.AFTER.DIAG) + 1)
df.new <- data.frame(ID = rep(unique(dat.unfold$ID), times = n.obs))
rle.new <- rle(df.new$ID)
ind.last <- cumsum(rle.new$lengths)
ind.first <- !duplicated(df.new$ID)
ind.first.w <- which(ind.first)
ind.second <- ind.first.w + 1
ind2.to.last <- unlist(sapply(seq_along(ind.second),
function(z) ind.second[z]:ind.last[z]))
## insert time variables
df.new$time2 <- df.new$time1 <- NA
df.new$time1[ind.first] <- 0
df.new$time1[!ind.first] <- time1
df.new$time2[!ind.first] <- time2
df.new$time2[ind2.to.last - 1] <- time1
this gives me:
> df.new
ID time1 time2
1 258 0 8401
2 258 8401 10425
3 284 0 9039
4 284 9039 10394
5 319 0 2039
6 319 2039 8827
7 319 8827 9466
8 319 9466 10333
9 322 0 1065
10 322 1065 2160
11 322 2160 3346
12 329 0 3287
13 329 3287 4657
14 329 4657 10456
This should work for the STATUS variable and the other variables in similar fashion.
When both steps are working separately, you just have to do one rbind step at the end.

Resources