I have a dataset containing time periods during which an intervention is happening. We have two types of interventions. I have the start and end date of each intervention. I would now like to extract the time (in days) when there is no overlap between the two types and how much overlap there is.
Here's an example dataset:
library(data.table)
data <- data.table( id = seq(1,21),
type = as.character(c(1,2,2,2,2,2,2,2,1,1,1,1,1,2,1,2,1,1,1,1,1)),
start_dt = as.Date(c("2015-01-09", "2015-04-14", "2015-06-19", "2015-10-30", "2016-03-01", "2016-05-24",
"2016-08-03", "2017-08-18", "2017-08-18", "2018-02-01", "2018-05-07", "2018-08-09",
"2019-01-31", "2019-03-22", "2019-05-16", "2019-11-04", "2019-11-04", "2020-02-06",
"2020-05-28", "2020-08-25", "2020-12-14")),
end_dt = as.Date(c("2017-07-24", "2015-05-04", "2015-08-27", "2015-11-19", "2016-03-21", "2016-06-09",
"2017-07-18", "2019-02-21", "2018-01-23", "2018-04-25", "2018-07-29", "2019-01-15",
"2019-04-24", "2019-09-13", "2019-10-13", "2020-12-23", "2020-01-26", "2020-04-29",
"2020-08-19", "2020-11-16", "2021-03-07")))
> data
id type start_dt end_dt
1: 1 1 2015-01-09 2017-07-24
2: 2 2 2015-04-14 2015-05-04
3: 3 2 2015-06-19 2015-08-27
4: 4 2 2015-10-30 2015-11-19
5: 5 2 2016-03-01 2016-03-21
6: 6 2 2016-05-24 2016-06-09
7: 7 2 2016-08-03 2017-07-18
8: 8 2 2017-08-18 2019-02-21
9: 9 1 2017-08-18 2018-01-23
10: 10 1 2018-02-01 2018-04-25
11: 11 1 2018-05-07 2018-07-29
12: 12 1 2018-08-09 2019-01-15
13: 13 1 2019-01-31 2019-04-24
14: 14 2 2019-03-22 2019-09-13
15: 15 1 2019-05-16 2019-10-13
16: 16 2 2019-11-04 2020-12-23
17: 17 1 2019-11-04 2020-01-26
18: 18 1 2020-02-06 2020-04-29
19: 19 1 2020-05-28 2020-08-19
20: 20 1 2020-08-25 2020-11-16
21: 21 1 2020-12-14 2021-03-07
Here's a plot of the data for a better view of what I want to know:
library(ggplot2)
ggplot(data = data,
aes(x = start_dt, xend = end_dt, y = id, yend = id, color = type)) +
geom_segment(size = 2) +
xlab("") +
ylab("") +
theme_bw()
I'll describe the first part of the example: we have an intervention of type 1 from 2015-01-09 until 2017-07-24. From 2015-04-14, however, intervention type 2 is also happening. This means that we only have "pure" type 1 from 2015-01-09 to 2015-04-13, which is 95 days.
Then we have an overlapping period from 2015-04-14 to 2015-05-04, which is 21 days. Then we again have a period with only type 1 from 2015-05-05 to 2015-06-18, which is 45 days. In total, we now have had (95 + 45 =) 140 days of "pure" type 1 and 21 days of overlap. Then we continue like this for the entire time period.
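The day counts above include both the first and the last day of each period, which is why 1 is added to the date difference. A quick check in R (`inclusive_days` is a hypothetical helper, not part of any package):

```r
# Hypothetical helper: number of days from `from` to `to`, counting both endpoints
inclusive_days <- function(from, to) as.integer(as.Date(to) - as.Date(from)) + 1L

pure1_a   <- inclusive_days("2015-01-09", "2015-04-13")  # 95 days of pure type 1
overlap_a <- inclusive_days("2015-04-14", "2015-05-04")  # 21 days of overlap
pure1_b   <- inclusive_days("2015-05-05", "2015-06-18")  # 45 days of pure type 1
pure1_a + pure1_b                                        # 140
```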
I would like to know the total time (in days) of "pure" type 1, "pure" type 2 and overlap.
Alternatively, if also possible, I would like to organise the data so that all the separate time periods are extracted, meaning that the data would look something like this (type 3 = overlap):
> data_adjusted
id type start_dt end_dt
1: 1 1 2015-01-09 2015-04-13
2: 2 3 2015-04-14 2015-05-04
3: 3 1 2015-05-05 2015-06-18
4: 4 3 2015-06-19 2015-08-27
........
The time in days spent in each intervention type can then easily be calculated from data_adjusted.
I have found similar answers that use dplyr or that only mark overlapping time periods, but none for my specific case.
Is there an efficient way to calculate this using data.table?
This method expands the data to one row per date in the full range (a small "explosion"), so it may not scale well if your data gets large.
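library(data.table)
library(magrittr)  # provides %>%; data.table does not export it
# one row per calendar day from the first start to the last end
alldates <- data.table(date = seq(min(data$start_dt), max(data$end_dt), by = "day"))
# non-equi join: match each day to every intervention active on that day
data[alldates, on = .(start_dt <= date, end_dt >= date)] %>%
  # count active interventions per day and type (start_dt now holds the matched day)
  .[, .N, by = .(start_dt, type) ] %>%
  # drop days on which no intervention is active
  .[ !is.na(type), ] %>%
  # one column per type, `1` and `2`, NA when that type is inactive that day
  dcast(start_dt ~ type, value.var = "N") %>%
  # new run id whenever the per-type counts change from one day to the next
  .[, r := do.call(rleid, .SD), .SDcols = setdiff(colnames(.), "start_dt") ] %>%
  # collapse each run to type "1", "2" or "3" (both) with its first and last day
  .[, .(type = fcase(is.na(`1`[1]), "2", is.na(`2`[1]), "1", TRUE, "3"),
        start_dt = min(start_dt), end_dt = max(start_dt)), by = r ]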
# r type start_dt end_dt
# <int> <char> <Date> <Date>
# 1: 1 1 2015-01-09 2015-04-13
# 2: 2 3 2015-04-14 2015-05-04
# 3: 3 1 2015-05-05 2015-06-18
# 4: 4 3 2015-06-19 2015-08-27
# 5: 5 1 2015-08-28 2015-10-29
# 6: 6 3 2015-10-30 2015-11-19
# 7: 7 1 2015-11-20 2016-02-29
# 8: 8 3 2016-03-01 2016-03-21
# 9: 9 1 2016-03-22 2016-05-23
# 10: 10 3 2016-05-24 2016-06-09
# 11: 11 1 2016-06-10 2016-08-02
# 12: 12 3 2016-08-03 2017-07-18
# 13: 13 1 2017-07-19 2017-07-24
# 14: 14 3 2017-08-18 2018-01-23
# 15: 15 2 2018-01-24 2018-01-31
# 16: 16 3 2018-02-01 2018-04-25
# 17: 17 2 2018-04-26 2018-05-06
# 18: 18 3 2018-05-07 2018-07-29
# 19: 19 2 2018-07-30 2018-08-08
# 20: 20 3 2018-08-09 2019-01-15
# 21: 21 2 2019-01-16 2019-01-30
# 22: 22 3 2019-01-31 2019-02-21
# 23: 23 1 2019-02-22 2019-03-21
# 24: 24 3 2019-03-22 2019-04-24
# 25: 25 2 2019-04-25 2019-05-15
# 26: 26 3 2019-05-16 2019-09-13
# 27: 27 1 2019-09-14 2019-10-13
# 28: 28 3 2019-11-04 2020-01-26
# 29: 29 2 2020-01-27 2020-02-05
# 30: 30 3 2020-02-06 2020-04-29
# 31: 31 2 2020-04-30 2020-05-27
# 32: 32 3 2020-05-28 2020-08-19
# 33: 33 2 2020-08-20 2020-08-24
# 34: 34 3 2020-08-25 2020-11-16
# 35: 35 2 2020-11-17 2020-12-13
# 36: 36 3 2020-12-14 2020-12-23
# 37: 37 1 2020-12-24 2021-03-07
# r type start_dt end_dt
It drops the id field, and I don't know how to map the result back to your original data cleanly.
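To get the totals the question asks for, the run table can be summed per type. The sketch below copies only the first four runs from the output above by hand; on the full result (assigned to a variable of your choice) the last two lines apply unchanged. Both endpoints are counted, matching the question's 95/21/45-day arithmetic.

```r
library(data.table)

# First four runs from the output above (type "3" = overlap)
runs <- data.table(
  type     = c("1", "3", "1", "3"),
  start_dt = as.Date(c("2015-01-09", "2015-04-14", "2015-05-05", "2015-06-19")),
  end_dt   = as.Date(c("2015-04-13", "2015-05-04", "2015-06-18", "2015-08-27"))
)

# Inclusive length of each run (95, 21, 45, 70), then the total per type
runs[, days := as.integer(end_dt - start_dt) + 1L]
totals <- runs[, .(days = sum(days)), keyby = type]
totals  # type "1": 140 days, type "3": 91 days
```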
@r2evans's solution is more complete, but if you want to explore the use of foverlaps, you can start with something like this:
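library(data.table)
# split into one table per type (separate objects, so `data` itself is left intact)
dt1 <- data[type == "1"]
dt2 <- data[type == "2"]
# foverlaps() needs the second table keyed on its start and end columns
setkey(dt2, start_dt, end_dt)
# one row per overlapping (type 1, type 2) pair; nomatch = 0 drops rows without overlap
overlap <- foverlaps(dt1, dt2, type = "any", nomatch = 0)
# overlapping period = later of the two starts to earlier of the two ends; [] prints the result
overlap[, .(start_dt = max(start_dt, i.start_dt), end_dt = min(end_dt, i.end_dt)), by = 1:nrow(overlap)][, type := 3][]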
Output:
nrow start_dt end_dt type
1: 1 2015-04-14 2015-05-04 3
2: 2 2015-06-19 2015-08-27 3
3: 3 2015-10-30 2015-11-19 3
4: 4 2016-03-01 2016-03-21 3
5: 5 2016-05-24 2016-06-09 3
6: 6 2016-08-03 2017-07-18 3
7: 7 2017-08-18 2018-01-23 3
8: 8 2018-02-01 2018-04-25 3
9: 9 2018-05-07 2018-07-29 3
10: 10 2018-08-09 2019-01-15 3
11: 11 2019-01-31 2019-02-21 3
12: 12 2019-03-22 2019-04-24 3
13: 13 2019-05-16 2019-09-13 3
14: 14 2019-11-04 2020-01-26 3
15: 15 2020-02-06 2020-04-29 3
16: 16 2020-05-28 2020-08-19 3
17: 17 2020-08-25 2020-11-16 3
18: 18 2020-12-14 2020-12-23 3
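The sum of end_dt - start_dt over these overlaps is 1492 days. Counting both the first and the last day of each period, as the question does (2015-04-14 to 2015-05-04 is 21 days), gives 1510 days, i.e. one extra day for each of the 18 intervals.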
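Both totals can be checked directly from the 18 intervals printed above. This sketch hard-codes those intervals rather than re-running foverlaps, so it only illustrates the difference between the two ways of counting.

```r
library(data.table)

# The 18 overlap intervals from the output above
ov <- data.table(
  start_dt = as.Date(c("2015-04-14", "2015-06-19", "2015-10-30", "2016-03-01", "2016-05-24",
                       "2016-08-03", "2017-08-18", "2018-02-01", "2018-05-07", "2018-08-09",
                       "2019-01-31", "2019-03-22", "2019-05-16", "2019-11-04", "2020-02-06",
                       "2020-05-28", "2020-08-25", "2020-12-14")),
  end_dt   = as.Date(c("2015-05-04", "2015-08-27", "2015-11-19", "2016-03-21", "2016-06-09",
                       "2017-07-18", "2018-01-23", "2018-04-25", "2018-07-29", "2019-01-15",
                       "2019-02-21", "2019-04-24", "2019-09-13", "2020-01-26", "2020-04-29",
                       "2020-08-19", "2020-11-16", "2020-12-23"))
)

# end_dt - start_dt leaves out one endpoint of each interval
diff_days <- ov[, sum(as.integer(end_dt - start_dt))]        # 1492
# adding 1 per interval counts both endpoints, as in the question
incl_days <- ov[, sum(as.integer(end_dt - start_dt) + 1L)]   # 1510
```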
I'm trying to identify periods (episodes) of exposure to a drug from prescriptions. If two prescriptions are separated by more than 30 days, a new period of exposure starts. Prescriptions can overlap for some time or be consecutive. Gaps of 30 days or less do not start a new episode, even if several consecutive gaps add up to more than 30 days.
I have data like this:
id = c(rep(1,3), rep(2,6), rep(3,5))
start = as.Date(c("2017-05-10", "2017-07-28", "2017-11-23", "2017-01-27", "2017-10-02", "2018-05-14", "2018-05-25", "2018-11-26", "2018-12-28", "2016-01-01", "2016-03-02", "2016-03-20", "2016-04-25", "2016-06-29"))
end = as.Date(c("2017-07-27", "2018-01-28", "2018-03-03", "2017-04-27", "2018-05-13", "2018-11-14", "2018-11-25", "2018-12-27", "2019-06-28", "2016-02-15", "2016-03-05", "2016-03-24", "2016-04-29", "2016-11-01"))
library(data.table)
DT = data.table(id, start, end)
DT
id start end
1: 1 2017-05-10 2017-07-27
2: 1 2017-07-28 2018-01-28
3: 1 2017-11-23 2018-03-03
4: 2 2017-01-27 2017-04-27
5: 2 2017-10-02 2018-05-13
6: 2 2018-05-14 2018-11-14
7: 2 2018-05-25 2018-11-25
8: 2 2018-11-26 2018-12-27
9: 2 2018-12-28 2019-06-28
10: 3 2016-01-01 2016-02-15
11: 3 2016-03-02 2016-03-05
12: 3 2016-03-20 2016-03-24
13: 3 2016-04-25 2016-04-29
14: 3 2016-06-29 2016-11-01
I calculated the difference between each start and the previous end for the same id (last_diffdays):
DT[, last_diffdays := start-shift(end, n=1L), by = .(id)][is.na(last_diffdays), last_diffdays := 0][]
id start end last_diffdays
1: 1 2017-05-10 2017-07-27 0 days
2: 1 2017-07-28 2018-01-28 1 days
3: 1 2017-11-23 2018-03-03 -66 days
4: 2 2017-01-27 2017-04-27 0 days
5: 2 2017-10-02 2018-05-13 158 days
6: 2 2018-05-14 2018-11-14 1 days
7: 2 2018-05-25 2018-11-25 -173 days
8: 2 2018-11-26 2018-12-27 1 days
9: 2 2018-12-28 2019-06-28 1 days
10: 3 2016-01-01 2016-02-15 0 days
11: 3 2016-03-02 2016-03-05 16 days
12: 3 2016-03-20 2016-03-24 15 days
13: 3 2016-04-25 2016-04-29 32 days
14: 3 2016-06-29 2016-11-01 61 days
A negative value means the prescription overlaps the previous one; a positive value means it does not (1 means the two are consecutive). I think an ifelse/fcase statement here would be a bad idea, and I'm not comfortable writing one.
I think a good output for this job would be something like:
id start end last_diffdays noexp_days period
1: 1 2017-05-10 2017-07-27 0 days 0 1
2: 1 2017-07-28 2018-01-28 1 days 1 1
3: 1 2017-11-23 2018-03-03 -66 days 0 1
4: 2 2017-01-27 2017-04-27 0 days 0 1
5: 2 2017-10-02 2018-05-13 158 days 158 2
6: 2 2018-05-14 2018-11-14 1 days 1 2
7: 2 2018-05-25 2018-11-25 -173 days 0 2
8: 2 2018-11-26 2018-12-27 1 days 1 2
9: 2 2018-12-28 2019-06-28 1 days 1 2
10: 3 2016-01-01 2016-02-15 0 days 0 1
11: 3 2016-03-02 2016-03-05 16 days 16 1
12: 3 2016-03-20 2016-03-24 15 days 15 1
13: 3 2016-04-25 2016-04-29 32 days 32 2
14: 3 2016-06-29 2016-11-01 61 days 61 3
I manually calculated the days without exposure since the previous prescription (noexp_days).
I don't know if I'm on the right path, but I think I need to calculate the noexp_days variable and then compute cumsum(noexp_days > 30) + 1.
If there is a much better solution I'm missing, or another approach I haven't considered, I'd appreciate reading about it.
Thanks in advance for any help! :)
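Try:
library(data.table)
# days without exposure before each prescription; overlaps (negative gaps) count as 0
DT[, noexp_days := pmax(as.integer(last_diffdays), 0)]
# start a new period after every gap longer than 30 days, separately for each id
DT[, period := cumsum(noexp_days > 30) + 1, id]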
DT
# id start end last_diffdays noexp_days period
# 1: 1 2017-05-10 2017-07-27 0 days 0 1
# 2: 1 2017-07-28 2018-01-28 1 days 1 1
# 3: 1 2017-11-23 2018-03-03 -66 days 0 1
# 4: 2 2017-01-27 2017-04-27 0 days 0 1
# 5: 2 2017-10-02 2018-05-13 158 days 158 2
# 6: 2 2018-05-14 2018-11-14 1 days 1 2
# 7: 2 2018-05-25 2018-11-25 -173 days 0 2
# 8: 2 2018-11-26 2018-12-27 1 days 1 2
# 9: 2 2018-12-28 2019-06-28 1 days 1 2
#10: 3 2016-01-01 2016-02-15 0 days 0 1
#11: 3 2016-03-02 2016-03-05 16 days 16 1
#12: 3 2016-03-20 2016-03-24 15 days 15 1
#13: 3 2016-04-25 2016-04-29 32 days 32 2
#14: 3 2016-06-29 2016-11-01 61 days 61 3
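One assumption worth checking on real data: last_diffdays compares each start only with the immediately previous end. If a short prescription sits entirely inside a longer one, the next prescription can show a spurious gap even though exposure never stopped. A minimal sketch that compares against the latest end date seen so far (a running maximum via cummax) instead; the toy data and the column names naive_gap and prev_max_end are hypothetical:

```r
library(data.table)

# Hypothetical toy data: prescription 2 is nested inside prescription 1,
# and prescription 3 starts while prescription 1 is still running
rx <- data.table(
  id    = 1,
  start = as.Date(c("2020-01-01", "2020-01-10", "2020-03-01")),
  end   = as.Date(c("2020-06-30", "2020-01-20", "2020-04-01"))
)
setorder(rx, id, start)

# Naive gap: compare only with the immediately previous end date
rx[, naive_gap := as.integer(start - shift(end)), by = id]

# Robust gap: compare with the latest end date among all earlier prescriptions
rx[, prev_max_end := shift(cummax(as.integer(end))), by = id]
rx[, noexp_days := pmax(as.integer(start) - prev_max_end, 0L, na.rm = TRUE)]

# New period after every gap longer than 30 days, as in the answer above
rx[, period := cumsum(noexp_days > 30) + 1, by = id]
rx
```

Here the naive gap for the third prescription is 41 days, which would start a new period, while the running-maximum version correctly reports 0 days and keeps all three in period 1.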
We have written a package to analyse a large number of events in relation to time windows.
To do the analysis we need to establish a number of attributes of the windows and cross-references
between them.
This has been done using data.table in its native syntax. Examples of some of the steps are included in the reprex below.
We are now looking to re-frame this package using dplyr/dtplyr for readability and sharing with other
parties.
While I can write the 'queries' in dplyr syntax, I am not seeing a tidyverse way to apply updates to the underlying tables (adding columns, updating rows, etc.) without repeatedly creating and replacing copies.
When the data is large, the 'update in place' features of data.table are very desirable. Is there a way to take advantage of this in dplyr syntax? (I have hit barriers with immutable = FALSE and with attempts to use rows_update().)
library(data.table)
set.seed(123)  # seed the random number generator so the events are reproducible
#Create a table of events with timestamp and an event type (501 events randomly generated over the previous 30 days)
DT1 <- data.table(timeStamp = as.POSIXct('2021-03-25') - as.integer(runif(501)*60*1440*30),
eventType=c('A', 'B', 'C'))
setkey(DT1, timeStamp)
print(DT1)
#> timeStamp eventType
#> 1: 2021-02-23 00:42:37 A
#> 2: 2021-02-23 04:21:43 A
#> 3: 2021-02-23 05:23:51 C
#> 4: 2021-02-23 06:45:36 C
#> 5: 2021-02-23 08:34:32 B
#> ---
#> 497: 2021-03-24 11:32:09 A
#> 498: 2021-03-24 13:49:53 B
#> 499: 2021-03-24 14:26:55 C
#> 500: 2021-03-24 18:11:33 C
#> 501: 2021-03-24 20:13:51 A
#Create a table of time windows. One for each date represented with an early and late time for each
#Assign this a class (in this example the value of the most common eventType)
DT2 <- DT1[,keyby=.(date=lubridate::date(timeStamp)),
.(earlyTime = min(timeStamp - 1),
lateTime = max(timeStamp + 1),
as = sum(eventType == 'A'),
bs = sum(eventType == 'B'),
cs = sum(eventType == 'C'))][
,.(date,
earlyTime,
lateTime,
class=ifelse(as >= bs & as >= cs, 'A', ifelse(bs >= cs, 'B', 'C')))]
print(head(DT2))
#> date earlyTime lateTime class
#> 1: 2021-02-23 2021-02-23 00:42:36 2021-02-23 23:14:13 B
#> 2: 2021-02-24 2021-02-24 04:10:27 2021-02-24 21:28:14 B
#> 3: 2021-02-25 2021-02-25 03:38:29 2021-02-25 21:55:44 A
#> 4: 2021-02-26 2021-02-26 01:49:00 2021-02-26 23:40:51 B
#> 5: 2021-02-27 2021-02-27 00:18:40 2021-02-27 22:42:46 A
#> 6: 2021-02-28 2021-02-28 02:50:25 2021-02-28 22:44:44 A
#Give each row in DT2 a row number (so that we can readily cross-reference between rows)
DT2[order(lateTime), rn := .I]
#For each row, get the row number of the previous instance of this class
DT2[order(class, rn), prevOfClass := shift(rn, 1), by=.(class)]
print(head(DT2))
#> date earlyTime lateTime class rn prevOfClass
#> 1: 2021-02-23 2021-02-23 00:42:36 2021-02-23 23:14:13 B 1 NA
#> 2: 2021-02-24 2021-02-24 04:10:27 2021-02-24 21:28:14 B 2 1
#> 3: 2021-02-25 2021-02-25 03:38:29 2021-02-25 21:55:44 A 3 NA
#> 4: 2021-02-26 2021-02-26 01:49:00 2021-02-26 23:40:51 B 4 2
#> 5: 2021-02-27 2021-02-27 00:18:40 2021-02-27 22:42:46 A 5 3
#> 6: 2021-02-28 2021-02-28 02:50:25 2021-02-28 22:44:44 A 6 5
#For each row that is not a 'C' find the previous and next instances of a C type row
#Note that when we assigned rn we ensured that the rows were in ascending time order
#so rn can be used as a proxy for sorting by time
DT2[class=='C'][DT2[class != 'C'],
on=.(rn > rn),
by=.EACHI,
.(rn=i.rn, nextC = min(x.rn), prevC = min(x.prevOfClass))]
#> rn rn nextC prevC
#> 1: 1 1 8 NA
#> 2: 2 2 8 NA
#> 3: 3 3 8 NA
#> 4: 4 4 8 NA
#> 5: 5 5 8 NA
#> 6: 6 6 8 NA
#> 7: 7 7 8 NA
#> 8: 9 9 13 8
#> 9: 10 10 13 8
#> 10: 11 11 13 8
#> 11: 12 12 13 8
#> 12: 14 14 16 13
#> 13: 15 15 16 13
#> 14: 17 17 26 16
#> 15: 18 18 26 16
#> 16: 19 19 26 16
#> 17: 20 20 26 16
#> 18: 21 21 26 16
#> 19: 22 22 26 16
#> 20: 23 23 26 16
#> 21: 24 24 26 16
#> 22: 25 25 26 16
#> 23: 28 28 30 27
#> 24: 29 29 30 27
#> rn rn nextC prevC
#But I want to add this information as additional columns to the base table
DT2[DT2[class=='C'][DT2[class != 'C'],
on=.(rn > rn),
by=.EACHI,
.(rn=i.rn, nextC = min(x.rn), prevC = min(x.prevOfClass))],
on = .(rn),
':='(nextC=i.nextC, prevC = i.prevC)
]
print(DT2[,.(rn, date, class, prevOfClass, nextC, prevC)])
#> rn date class prevOfClass nextC prevC
#> 1: 1 2021-02-23 B NA 8 NA
#> 2: 2 2021-02-24 B 1 8 NA
#> 3: 3 2021-02-25 A NA 8 NA
#> 4: 4 2021-02-26 B 2 8 NA
#> 5: 5 2021-02-27 A 3 8 NA
#> 6: 6 2021-02-28 A 5 8 NA
#> 7: 7 2021-03-01 A 6 8 NA
#> 8: 8 2021-03-02 C NA NA NA
#> 9: 9 2021-03-03 A 7 13 8
#> 10: 10 2021-03-04 A 9 13 8
#> 11: 11 2021-03-05 B 4 13 8
#> 12: 12 2021-03-06 A 10 13 8
#> 13: 13 2021-03-07 C 8 NA NA
#> 14: 14 2021-03-08 A 12 16 13
#> 15: 15 2021-03-09 B 11 16 13
#> 16: 16 2021-03-10 C 13 NA NA
#> 17: 17 2021-03-11 A 14 26 16
#> 18: 18 2021-03-12 B 15 26 16
#> 19: 19 2021-03-13 A 17 26 16
#> 20: 20 2021-03-14 B 18 26 16
#> 21: 21 2021-03-15 A 19 26 16
#> 22: 22 2021-03-16 A 21 26 16
#> 23: 23 2021-03-17 A 22 26 16
#> 24: 24 2021-03-18 A 23 26 16
#> 25: 25 2021-03-19 B 20 26 16
#> 26: 26 2021-03-20 C 16 NA NA
#> 27: 27 2021-03-21 C 26 NA NA
#> 28: 28 2021-03-22 B 25 30 27
#> 29: 29 2021-03-23 A 24 30 27
#> 30: 30 2021-03-24 C 27 NA NA
#> rn date class prevOfClass nextC prevC
#What would be the best approach to this using dplyr / dtplyr syntax?
#In practice there are many hundreds of thousands of rows in the tables
#and...
#There are many more update and enrichments that need to be applied
#some of which add new columns, others will update just a few rows
#in a column
#So 'mutate in place/by reference' is highly desirable
Created on 2021-03-25 by the reprex package (v1.0.0)
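For reference, this is a minimal sketch of the data.table behaviour the question wants to keep; it is not a dplyr/dtplyr answer. It uses an update join (the same pattern as the nextC/prevC step above) and a subset update, then checks with address() that the table was modified without being copied. The tables dt and lookup are hypothetical.

```r
library(data.table)

dt <- data.table(rn = 1:5, class = c("A", "C", "B", "A", "C"))
addr_before <- address(dt)  # memory address of the table before any update

# Hypothetical lookup table holding values for a few rows only
lookup <- data.table(rn = c(1L, 3L), flag = c("x", "y"))

# Update join: adds `flag` by reference, filling only the matched rows (others get NA)
dt[lookup, on = .(rn), flag := i.flag]

# Modify a few rows of an existing column in place
dt[class == "C", class := "C*"]

identical(address(dt), addr_before)  # TRUE: dt was changed without being copied
```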
I have sales data for a product and would like to calculate the growth rate, where N_win and N_lose are the wins and losses over the period 1-19 March. I would also like to predict the growth rate and the wins and losses.
Date N_win N_lose tot1 tot2
1 2018-03-01 0 0 0 0
2 2018-03-02 1 0 1 1
3 2018-03-03 0 0 1 1
4 2018-03-04 1 0 2 2
5 2018-03-05 3 0 5 5
6 2018-03-06 0 0 5 5
7 2018-03-07 2 0 7 7
8 2018-03-08 4 0 11 11
9 2018-03-09 4 0 15 15
10 2018-03-10 5 0 20 20
11 2018-03-11 1 1 21 20
12 2018-03-12 24 1 45 44
13 2018-03-13 41 1 86 85
14 2018-03-14 17 2 103 101
15 2018-03-15 15 3 118 115
16 2018-03-16 15 6 133 127
17 2018-03-17 38 6 171 165
18 2018-03-18 67 6 238 232
I tried to apply this function, but it doesn't seem to work:
Growthrate = function(x1,x2, n){
gr = (x2/x1)^(1/n)-1
return(gr)
}
GR = NULL
for(i in 1:length(DF[,1])){
GR[i] = Growthrate(DF[i,2],DF[i+1,2], sum(i))
}
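The loop above passes sum(i) (which is just i) as the number of periods, reads DF[i+1, 2] past the last row, and divides by N_win, which contains zeros, so ratios come out as Inf or NaN. A minimal sketch, assuming the growth rate should be measured on the cumulative total tot1: day-over-day growth uses n = 1, and the average daily growth over the window uses the first non-zero day as the base. The data frame is re-typed from the table above, and the names daily_gr and avg_gr are hypothetical.

```r
# Cumulative wins (tot1) re-typed from the table above
DF <- data.frame(
  Date = seq(as.Date("2018-03-01"), as.Date("2018-03-18"), by = "day"),
  tot1 = c(0, 1, 1, 2, 5, 5, 7, 11, 15, 20, 21, 45, 86, 103, 118, 133, 171, 238)
)

# Compound growth rate between values x1 and x2 that are n periods apart
Growthrate <- function(x1, x2, n) (x2 / x1)^(1 / n) - 1

# Day-over-day growth (n = 1); undefined when the previous day is 0 or missing
prev <- c(NA, head(DF$tot1, -1))
DF$daily_gr <- ifelse(!is.na(prev) & prev > 0, Growthrate(prev, DF$tot1, 1), NA)

# Average daily growth from the first non-zero day to the last day
first  <- which(DF$tot1 > 0)[1]
last   <- nrow(DF)
avg_gr <- Growthrate(DF$tot1[first], DF$tot1[last], last - first)
```

Predicting future values is a separate modelling question; the compound rate avg_gr only summarises the observed window.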
I have a data frame in R called Ident, as follows:
Date coredata.Ident.
1 2017-09-01 <NA>
2 2017-09-03 <NA>
3 2017-09-04 <NA>
4 2017-09-05 0
5 2017-09-06 0
6 2017-09-07 0
7 2017-09-08 0
8 2017-09-10 0
9 2017-09-11 Doji
10 2017-09-12 <NA>
11 2017-09-13 0
12 2017-09-14 Bull.Engulfing
13 2017-09-15 0
14 2017-09-17 0
15 2017-09-18 Bear.Engulfing
16 2017-09-19 Doji
17 2017-09-20 Bear.Engulfing
18 2017-09-21 Bull.Engulfing
19 2017-09-22 0
20 2017-09-24 0
21 2017-09-25 Bear.Engulfing
22 2017-09-26 0
23 2017-09-27 0
24 2017-09-28 0
25 2017-09-29 0
I would like to assign the date that follows the first Bull.Engulfing to a variable called DateSelect1, the date that follows the second Bull.Engulfing to DateSelect2, and so on, so that every Bull.Engulfing has a date assigned to it.
So in this example, as there is a Bull.Engulfing on 2017-09-14 (line 12), DateSelect1 should be 2017-09-15, as it is the next row. Hope this makes sense.
TIA
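Assuming the input in the Note below, find the rows with a Bull.Engulfing, take the date on the following row, and assign each of those dates to a variable:
# rows that contain a Bull.Engulfing signal
idx <- which(DF$coredata.Ident. == "Bull.Engulfing")
# move to the following row, ignoring a match on the very last row
idx <- idx[idx < nrow(DF)] + 1
dates <- as.Date(DF$Date[idx])
for(i in seq_along(dates)) assign(paste0("DateSelect", i), dates[i])
DateSelect1
## [1] "2017-09-15"
DateSelect2
## [1] "2017-09-22"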
Note: The input in reproducible form is:
Lines <- "
Date coredata.Ident.
1 2017-09-01 <NA>
2 2017-09-03 <NA>
3 2017-09-04 <NA>
4 2017-09-05 0
5 2017-09-06 0
6 2017-09-07 0
7 2017-09-08 0
8 2017-09-10 0
9 2017-09-11 Doji
10 2017-09-12 <NA>
11 2017-09-13 0
12 2017-09-14 Bull.Engulfing
13 2017-09-15 0
14 2017-09-17 0
15 2017-09-18 Bear.Engulfing
16 2017-09-19 Doji
17 2017-09-20 Bear.Engulfing
18 2017-09-21 Bull.Engulfing
19 2017-09-22 0
20 2017-09-24 0
21 2017-09-25 Bear.Engulfing
22 2017-09-26 0
23 2017-09-27 0
24 2017-09-28 0
25 2017-09-29 0"
DF <- read.table(text = Lines)
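Creating numbered variables with assign() works, but a single named vector is usually easier to loop over and pass around. A sketch on a small hypothetical sample in the same shape as Ident (the name date_select is illustrative):

```r
# Small hypothetical sample with the same columns as Ident
DF <- data.frame(
  Date = as.Date(c("2017-09-13", "2017-09-14", "2017-09-15",
                   "2017-09-20", "2017-09-21", "2017-09-22")),
  coredata.Ident. = c("0", "Bull.Engulfing", "0", "Bear.Engulfing", "Bull.Engulfing", "0")
)

# Row after each Bull.Engulfing, ignoring a match on the last row
idx <- which(DF$coredata.Ident. == "Bull.Engulfing")
idx <- idx[idx < nrow(DF)] + 1

# One named Date vector instead of separate DateSelect1, DateSelect2, ... variables
date_select <- setNames(DF$Date[idx], paste0("DateSelect", seq_along(idx)))
date_select[["DateSelect1"]]
```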