I have a data.table that looks like this:
library(dplyr)
library(data.table)
dt <- data.table(ID = c("A001", "A002", "A003", "A004"),
                 start_time = c('2019-06-18 05:18:00', '2020-03-04 05:59:00',
                                '2019-05-10 19:00:00', '2020-01-06 22:42:00'),
                 end_time = c('2019-06-18 08:41:00', '2020-03-04 06:04:00',
                              '2019-05-10 19:08:00', '2020-01-07 03:10:00'))
   ID          start_time            end_time duration
1: A001 2019-06-18 05:18:00 2019-06-18 08:41:00 203 mins
2: A002 2020-03-04 05:59:00 2020-03-04 06:04:00 5 mins
3: A003 2019-05-10 19:00:00 2019-05-10 19:08:00 8 mins
4: A004 2020-01-06 22:42:00 2020-01-07 03:10:00 268 mins
Duration was simply calculated as
dt$start_time <- as.POSIXct(dt$start_time, tz='UTC')
dt$end_time <- as.POSIXct(dt$end_time, tz='UTC')
dt <- dt %>% mutate(duration = difftime(end_time, start_time, units = "mins"))
I need to duplicate the rows whose duration extends past the end of the hour in which they start (records that cover more than one clock hour). For those rows I need to set the start time to the beginning of each hour, the end time to the end of that hour (or to the original end time for the last row, i.e. the last viewing hour), and adjust the duration accordingly, so that the final output looks like:
dt_expected <- data.table(ID=c("A001","A001","A001","A001","A002","A002","A003","A004","A004","A004","A004","A004","A004"),
start_time=c('2019-06-18 05:18:00','2019-06-18 06:00:00','2019-06-18 07:00:00','2019-06-18 08:00:00', '2020-03-04 05:59:00', '2020-03-04 06:00:00', '2019-05-10 19:00:00',
'2020-01-06 22:42:00', '2020-01-06 23:00:00','2020-01-07 00:00:00','2020-01-07 01:00:00','2020-01-07 02:00:00','2020-01-07 03:00:00'),
end_time=c('2019-06-18 05:59:00','2019-06-18 06:59:00','2019-06-18 07:59:00','2019-06-18 08:41:00','2020-03-04 05:59:00','2020-03-04 06:04:00', '2019-05-10 19:08:00', '2020-01-06 22:59:00','2020-01-06 23:59:00','2020-01-07 00:59:00','2020-01-07 01:59:00', '2020-01-07 02:59:00','2020-01-07 03:10:00'),
duration = c(12,60,60,41,1,4,8,18,60,60,60,60,10))
Note that the record for ID A002 should also be split, since its duration falls in two different hours.
ID start_time end_time duration
1: A001 2019-06-18 05:18:00 2019-06-18 05:59:00 12
2: A001 2019-06-18 06:00:00 2019-06-18 06:59:00 60
3: A001 2019-06-18 07:00:00 2019-06-18 07:59:00 60
4: A001 2019-06-18 08:00:00 2019-06-18 08:41:00 41
5: A002 2020-03-04 05:59:00 2020-03-04 05:59:00 1
6: A002 2020-03-04 06:00:00 2020-03-04 06:04:00 4
7: A003 2019-05-10 19:00:00 2019-05-10 19:08:00 8
8: A004 2020-01-06 22:42:00 2020-01-06 22:59:00 18
9: A004 2020-01-06 23:00:00 2020-01-06 23:59:00 60
10: A004 2020-01-07 00:00:00 2020-01-07 00:59:00 60
11: A004 2020-01-07 01:00:00 2020-01-07 01:59:00 60
12: A004 2020-01-07 02:00:00 2020-01-07 02:59:00 60
13: A004 2020-01-07 03:00:00 2020-01-07 03:10:00 10
I think this is pretty close to what you're looking for.
This creates new rows of start and end times, one row for each hour, using map from purrr.
Then, for each ID, it determines start_time and end_time using pmin.
First, for the end_time, it takes the minimum of that row's end_time and that row's start_time rounded up to the next hour. For example, the first row for A001 gets an end_time of 6:00, the ceiling_date of 5:18 to the nearest hour, which is less than the 6:18 produced by the sequence from map. For the last row of A001, the end_time is 8:41, which is less than the ceiling_date value of 9:00.
The start_time takes the minimum of the previous row's end_time (floored to the hour) and that row's start_time. For example, the second row of A001 gets 6:00, the previous row's end_time, which is less than the 6:18 produced by the sequence from map.
Note that one row has 0 minutes for duration - the time fell right on the hour (19:00:00). These could be filtered out.
library(purrr)
library(dplyr)
library(tidyr)
library(lubridate)

dt %>%
  rowwise() %>%
  # one start_time per hour, from the original start up to the hour containing the end
  mutate(start_time = map(start_time, ~seq.POSIXt(., ceiling_date(end_time, "hour"), by = "hour"))) %>%
  unnest(start_time) %>%
  group_by(ID) %>%
  # end each piece at the next full hour, or at the original end_time if that comes first
  mutate(end_time = pmin(ceiling_date(start_time, unit = "hour"), end_time),
         # start each piece at the previous piece's end (floored to the hour), if that is earlier
         start_time = pmin(floor_date(lag(end_time, default = first(end_time)), unit = "hour"), start_time),
         duration = difftime(end_time, start_time, units = "mins"))
Output
ID start_time end_time duration
<chr> <dttm> <dttm> <drtn>
1 A001 2019-06-18 05:18:00 2019-06-18 06:00:00 42 mins
2 A001 2019-06-18 06:00:00 2019-06-18 07:00:00 60 mins
3 A001 2019-06-18 07:00:00 2019-06-18 08:00:00 60 mins
4 A001 2019-06-18 08:00:00 2019-06-18 08:41:00 41 mins
5 A002 2020-03-04 05:59:00 2020-03-04 06:00:00 1 mins
6 A002 2020-03-04 06:00:00 2020-03-04 06:04:00 4 mins
7 A003 2019-05-10 19:00:00 2019-05-10 19:00:00 0 mins
8 A003 2019-05-10 19:00:00 2019-05-10 19:08:00 8 mins
9 A004 2020-01-06 22:42:00 2020-01-06 23:00:00 18 mins
10 A004 2020-01-06 23:00:00 2020-01-07 00:00:00 60 mins
11 A004 2020-01-07 00:00:00 2020-01-07 01:00:00 60 mins
12 A004 2020-01-07 01:00:00 2020-01-07 02:00:00 60 mins
13 A004 2020-01-07 02:00:00 2020-01-07 03:00:00 60 mins
14 A004 2020-01-07 03:00:00 2020-01-07 03:10:00 10 mins
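If those zero-minute rows are unwanted, one more step drops them. A minimal sketch, assuming the pipeline's output has been saved to an object named result (a name not used in the original answer):
# drop the zero-length pieces created when a start falls exactly on the hour
result <- result %>% filter(duration > 0)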
I have some air pollution data measured by hour.
Datetime             PM2.5  Station.id
2020-01-01 00:00:00     10           1
2020-01-01 01:00:00     NA           1
2020-01-01 02:00:00     15           1
2020-01-01 03:00:00     NA           1
2020-01-01 04:00:00      7           1
2020-01-01 05:00:00     20           1
2020-01-01 06:00:00     30           1
2020-01-01 00:00:00     NA           2
2020-01-01 01:00:00     17           2
2020-01-01 02:00:00     21           2
2020-01-01 03:00:00     55           2
I have a very large amount of data collected from many stations. Using R, what is the most efficient way to remove a day when it has (1) a total of 18 hours of missing data AND (2) 8 hours of continuous missing data?
P.S. The original data may come either with the NA rows already removed or with NAs inserted.
The "most efficient" way will almost certainly use data.table. Something like this:
library(data.table)
setDT(your_data)

your_data[, date := as.IDate(Datetime)][,
  # keep a day unless it has both >= 18 missing hours in total
  # and a run of >= 8 consecutive missing hours
  if (!(sum(is.na(PM2.5)) >= 18 &
        # c(0, ...) guards against days with no NAs at all
        with(rle(is.na(PM2.5)), max(c(0, lengths[values]))) >= 8
  )) .SD,
  by = .(date, Station.id)
]
#          date Station.id            Datetime PM2.5
# 1: 2020-01-01          1 2020-01-01 00:00:00    10
# 2: 2020-01-01          1 2020-01-01 01:00:00    NA
# 3: 2020-01-01          1 2020-01-01 02:00:00    15
# 4: 2020-01-01          1 2020-01-01 03:00:00    NA
# 5: 2020-01-01          1 2020-01-01 04:00:00     7
# 6: 2020-01-01          1 2020-01-01 05:00:00    20
# 7: 2020-01-01          1 2020-01-01 06:00:00    30
Using this sample data:
your_data = fread(text = 'Datetime,PM2.5,Station.id
2020-01-01 00:00:00,10,1
2020-01-01 01:00:00,NA,1
2020-01-01 02:00:00,15,1
2020-01-01 03:00:00,NA,1
2020-01-01 04:00:00,7,1
2020-01-01 05:00:00,20,1
2020-01-01 06:00:00,30,1')
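To see what the run-length check does in isolation, here is a small sketch on a toy vector (values invented for illustration):
x <- c(10, NA, NA, NA, 15, NA, 7)
r <- rle(is.na(x))              # runs of missing (TRUE) and present (FALSE) values
max(c(0, r$lengths[r$values]))  # longest run of consecutive NAs: 3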
I have a dataset of hourly observations with the format %Y%m%d %H:%M, which gives values like 2020-03-01 01:00:00 for various days. How can I filter out a certain time interval? My goal is to keep the observations between 08:00 and 20:00.
You can extract the hour value from the column and keep the rows between 8 and 20 hours.
df$hour <- as.integer(format(df$datetime, '%H'))  # hour of day as an integer
result <- subset(df, hour >= 8 & hour <= 20)      # keep hours 8 through 20
result
# datetime hour
#9 2020-01-01 08:00:00 8
#10 2020-01-01 09:00:00 9
#11 2020-01-01 10:00:00 10
#12 2020-01-01 11:00:00 11
#13 2020-01-01 12:00:00 12
#14 2020-01-01 13:00:00 13
#15 2020-01-01 14:00:00 14
#16 2020-01-01 15:00:00 15
#17 2020-01-01 16:00:00 16
#18 2020-01-01 17:00:00 17
#19 2020-01-01 18:00:00 18
#20 2020-01-01 19:00:00 19
#21 2020-01-01 20:00:00 20
#33 2020-01-02 08:00:00 8
#34 2020-01-02 09:00:00 9
#35 2020-01-02 10:00:00 10
#...
#...
data
df <- data.frame(datetime = seq(as.POSIXct('2020-01-01 00:00:00', tz = 'UTC'),
as.POSIXct('2020-01-10 00:00:00', tz = 'UTC'), 'hour'))
Alternatively, with lubridate's hour() and dplyr's between():
between(hour(your_date_value), 8, 19)
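Folded into a complete pipeline, that condition might be used like this. A sketch only; note that the upper bound of 19 keeps hours 8 through 19 and so, unlike the subset() answer above, excludes the 20:00 observations:
library(dplyr)
library(lubridate)

result <- df %>% filter(between(hour(datetime), 8, 19))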
I am trying to split rows in an Excel file based on day and time. The data is from a study in which participants wear a tracking watch. Each row of the data set starts when a participant puts on the watch (variable 'Wear Time Start') and ends when they take off the device (variable 'Wear Time End').
I need to calculate how many hours each participant wore the device on each day (NOT per time period, i.e. per row).
Data set before split:
ID WearStart WearEnd
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00
2 01 2018-05-14 21:30:00 2018-05-15 02:00:00
3 01 2018-05-15 07:00:00 2018-05-16 22:30:00
4 01 2018-05-16 23:00:00 2018-05-16 23:40:00
5 01 2018-05-17 01:00:00 2018-05-19 15:00:00
6 02 ...
Some explanation about the data set before split: the data type of 'WearStart' and 'WearEnd' are POSIXlt.
Desired output after split:
ID WearStart WearEnd Interval
1 01 2018-05-14 09:00:00 2018-05-14 20:00:00 11
2 01 2018-05-14 21:30:00 2018-05-15 00:00:00 2.5
3 01 2018-05-15 00:00:00 2018-05-15 02:00:00 2
4 01 2018-05-15 07:00:00 2018-05-16 00:00:00 17
5 01 2018-05-16 00:00:00 2018-05-16 22:30:00 22.5
6 01 2018-05-16 23:00:00 2018-05-16 23:40:00 0.4
7 01 2018-05-17 01:00:00 2018-05-18 00:00:00 23
8 01 2018-05-18 00:00:00 2018-05-19 00:00:00 24
9 01 2018-05-19 00:00:00 2018-05-19 15:00:00 15
Then I need to accumulate hours based on day:
ID Wear_Day Total_Hours
1 01 2018-05-14 13.5
2 01 2018-05-15 19
3 01 2018-05-16 22.9
4 01 2018-05-17 23
5 01 2018-05-18 24
6 01 2018-05-19 15
So, I reworked the entire answer. Please review the code; I am pretty sure this is what you want.
Short summary
The problem is that you need to split rows whose start and end fall on different dates, and you need to do this recursively. So I split the dataframe into a list of 1-row dataframes. For each one I check whether start and end are on the same day. If not, I turn it into a 2-row dataframe with adjusted start and end times. This is then split up again into a list of 1-row dataframes, and so on.
In the end there is a nested list of 1-row dataframes in which start and end are on the same day, and this list is then recursively bound back together.
# Load Packages ---------------------------------------------------------------------------------------------------
library(tidyverse)
library(lubridate)
df <- tribble(
~ID, ~WearStart, ~WearEnd
, 01, "2018-05-14 09:00:00", "2018-05-14 20:00:00"
, 01, "2018-05-14 21:30:00", "2018-05-15 02:00:00"
, 01, "2018-05-15 07:00:00", "2018-05-16 22:30:00"
, 01, "2018-05-16 23:00:00", "2018-05-16 23:40:00"
, 01, "2018-05-17 01:00:00", "2018-05-19 15:00:00"
)
df <- df %>% mutate_at(vars(starts_with("Wear")), ymd_hms)
# Helper Functions ------------------------------------------------------------------------------------------------
endsOnOtherDay <- function(df){
as_date(df$WearStart) != as_date(df$WearEnd)
}
split1rowInto2Days <- function(df){
  df1 <- df
  df2 <- df
  # first piece ends one millisecond before midnight, second piece starts at midnight
  df1$WearEnd <- as_date(df1$WearStart) + days(1) - milliseconds(1)
  df2$WearStart <- as_date(df2$WearStart) + days(1)
  rbind(df1, df2)
}
splitDates <- function(df){
if (nrow(df) > 1){
return(df %>%
split(f = 1:nrow(df)) %>%
lapply(splitDates) %>%
reduce(rbind))
}
if (df %>% endsOnOtherDay()){
return(df %>%
split1rowInto2Days() %>%
splitDates())
}
df
}
# The actual Calculation ------------------------------------------------------------------------------------------
df %>%
splitDates() %>%
mutate(wearDuration = difftime(WearEnd, WearStart, units = "hours")
, wearDay = as_date(WearStart)) %>%
group_by(ID, wearDay) %>%
summarise(wearDuration_perDay = sum(wearDuration))
ID wearDay wearDuration_perDay
<dbl> <date> <drtn>
1 1 2018-05-14 13.50000 hours
2 1 2018-05-15 19.00000 hours
3 1 2018-05-16 23.16667 hours
4 1 2018-05-17 23.00000 hours
5 1 2018-05-18 24.00000 hours
6 1 2018-05-19 15.00000 hours
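To watch the recursion work on a single multi-day row, you can run the helper on row 5 of df:
df[5, ] %>% splitDates()
# A tibble: 3 x 3 -- one row per calendar day:
#   2018-05-17 01:00:00 -> 2018-05-17 23:59:59.999
#   2018-05-18 00:00:00 -> 2018-05-18 23:59:59.999
#   2018-05-19 00:00:00 -> 2018-05-19 15:00:00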
Here is my solution to your question using just base R functions:
#step 1: read data from file
d <- read.csv("dt.csv", header = TRUE)
d
ID WearStart WearEnd
1 1 2018-05-14 09:00:00 2018-05-14 20:00:00
2 1 2018-05-14 21:30:00 2018-05-15 02:00:00
3 1 2018-05-15 07:00:00 2018-05-16 22:30:00
4 1 2018-05-16 23:00:00 2018-05-16 23:40:00
5 1 2018-05-17 01:00:00 2018-05-19 15:00:00
6 2 2018-05-16 11:30:00 2018-05-16 11:40:00
7 2 2018-05-16 22:05:00 2018-05-22 22:42:00
#step 2: change class of WearStart and WearEnd to POSIXlt
d$WearStart <- as.POSIXlt(d$WearStart, tryFormats = "%Y-%m-%d %H:%M")
d$WearEnd <- as.POSIXlt(d$WearEnd, tryFormats = "%Y-%m-%d %H:%M")
#step 3: calculate the time interval (days and hours) covered by each record
timeInt <- function(d) {
  WearStartDay <- as.Date(d$WearStart)
  WearEndDay <- as.Date(d$WearEnd)
  # one entry per calendar day touched by this record
  Days <- seq(WearStartDay, WearEndDay, by = "day")
  N_FullBTWDays <- length(Days) - 2
  if (N_FullBTWDays >= 0) {
    # record crosses midnight: hours left on the start day ...
    sd <- d$WearStart
    sd_h <- 24 - sd$hour - 1
    sd_m <- (60 - sd$min)/60
    hours <- sd_h + sd_m
    # ... plus 24 hours for every full day in between ...
    hours <- c(hours, rep(24, N_FullBTWDays))
    # ... plus the hours used on the end day
    ed <- d$WearEnd
    hours <- c(hours, ed$hour + ed$min/60)
  } else {
    # record starts and ends on the same day
    hours <- as.numeric(difftime(d$WearEnd, d$WearStart, units = "hours"))
  }
  df <- data.frame(id = rep(d$ID, length(Days)), days = Days, hours = hours)
  return(df)
}
df <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df) <- c("id", "days", "hours")
for ( i in 1:nrow(d)) {
df <- rbind(df,timeInt(d[i,]))
}
   id       days      hours
1   1 2018-05-14 11.0000000
2   1 2018-05-14  2.5000000
3   1 2018-05-15  2.0000000
4   1 2018-05-15 17.0000000
5   1 2018-05-16 22.5000000
6   1 2018-05-16  0.6666667
7   1 2018-05-17 23.0000000
8   1 2018-05-18 24.0000000
9   1 2018-05-19 15.0000000
10  2 2018-05-16  0.1666667
11  2 2018-05-16  1.9166667
12  2 2018-05-17 24.0000000
13  2 2018-05-18 24.0000000
14  2 2018-05-19 24.0000000
15  2 2018-05-20 24.0000000
16  2 2018-05-21 24.0000000
17  2 2018-05-22 22.7000000
#daily usage of the device for each participant
res <- as.data.frame(tapply(df$hours, list(df$days,df$id), sum))
res[is.na(res)] <- 0
res$date <- rownames(res)
res
                  1         2       date
2018-05-14 13.50000  0.000000 2018-05-14
2018-05-15 19.00000  0.000000 2018-05-15
2018-05-16 23.16667  2.083333 2018-05-16
2018-05-17 23.00000 24.000000 2018-05-17
2018-05-18 24.00000 24.000000 2018-05-18
2018-05-19 15.00000 24.000000 2018-05-19
2018-05-20  0.00000 24.000000 2018-05-20
2018-05-21  0.00000 24.000000 2018-05-21
2018-05-22  0.00000 22.700000 2018-05-22
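If you prefer the long layout from the question (one row per ID and day) over this wide table, base R's aggregate() produces it directly from df:
#total hours per participant per day, in long format
res_long <- aggregate(hours ~ id + days, data = df, FUN = sum)
res_long <- res_long[order(res_long$id, res_long$days), ]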
I have a data frame with two date/time columns:
            BeginTime             EndTime Value
-----------------------------------------------------
1 2019-01-03 13:45:00 2019-01-03 15:30:00    41
2 2019-01-03 13:30:00 2019-01-03 14:30:00    20
3 2019-01-03 16:45:00 2019-01-03 17:00:00    23
That I need to transform into this:
Time Value
--------------------------------
1 2019-01-03 13:45:00 41
2 2019-01-03 14:00:00 41
3 2019-01-03 14:15:00 41
4 2019-01-03 14:30:00 41
5 2019-01-03 14:45:00 41
6 2019-01-03 15:00:00 41
7 2019-01-03 15:15:00 41
8 2019-01-03 13:30:00 20
9 2019-01-03 13:45:00 20
10 2019-01-03 14:00:00 20
11 2019-01-03 14:15:00 20
12 2019-01-03 16:45:00 23
But not sure how to do that. Any suggestions?
Code to create testdf
testdf <- data.frame(c("2019-01-03 13:45:00", "2019-01-03 13:30:00", "2019-01-03 16:45:00"),
c("2019-01-03 15:30:00", "2019-01-03 14:30:00", "2019-01-03 17:00:00"),
c(41,20,23))
colnames(testdf) <-c("BeginTime", "EndTime", "Value")
testdf$BeginTime <- as.POSIXct(testdf$BeginTime)
testdf$EndTime <- as.POSIXct(testdf$EndTime)
(I know there's probably a way to create the columns as POSIXct initially but this works)
We can use seq between the two columns to create a list of complete 15-minute intervals for each BeginTime - EndTime pair, and then use rep based on their lengths to get the value, i.e.
# one sequence of 15-minute stamps per row
l1 <- Map(function(x, y) seq(x, y, by = '15 mins'), testdf$BeginTime, testdf$EndTime)
# flatten the sequences and repeat each Value once per stamp
data.frame(Time = do.call(c, l1), value = rep(testdf$Value, lengths(l1)))
which gives,
Time value
1 2019-01-03 13:45:00 41
2 2019-01-03 14:00:00 41
3 2019-01-03 14:15:00 41
4 2019-01-03 14:30:00 41
5 2019-01-03 14:45:00 41
6 2019-01-03 15:00:00 41
7 2019-01-03 15:15:00 41
8 2019-01-03 15:30:00 41
9 2019-01-03 13:30:00 20
10 2019-01-03 13:45:00 20
11 2019-01-03 14:00:00 20
12 2019-01-03 14:15:00 20
13 2019-01-03 14:30:00 20
14 2019-01-03 16:45:00 23
15 2019-01-03 17:00:00 23
Using the tidyverse
testdf <- data.frame(c("2019-01-03 13:45:00", "2019-01-03 13:30:00", "2019-01-03 16:45:00"),
c("2019-01-03 15:30:00", "2019-01-03 14:30:00", "2019-01-03 17:00:00"),
c(41,20,23))
colnames(testdf) <-c("BeginTime", "EndTime", "Value")
testdf$BeginTime <- as.POSIXct(testdf$BeginTime)
testdf$EndTime <- as.POSIXct(testdf$EndTime)
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
testdf %>%
mutate(periods = map2(BeginTime,EndTime,seq,by = '15 mins')) %>%
unnest(periods)
#> # A tibble: 15 x 4
#> BeginTime EndTime Value periods
#> <dttm> <dttm> <dbl> <dttm>
#> 1 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 13:45:00
#> 2 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:00:00
#> 3 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:15:00
#> 4 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:30:00
#> 5 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 14:45:00
#> 6 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:00:00
#> 7 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:15:00
#> 8 2019-01-03 13:45:00 2019-01-03 15:30:00 41 2019-01-03 15:30:00
#> 9 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 13:30:00
#> 10 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 13:45:00
#> 11 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:00:00
#> 12 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:15:00
#> 13 2019-01-03 13:30:00 2019-01-03 14:30:00 20 2019-01-03 14:30:00
#> 14 2019-01-03 16:45:00 2019-01-03 17:00:00 23 2019-01-03 16:45:00
#> 15 2019-01-03 16:45:00 2019-01-03 17:00:00 23 2019-01-03 17:00:00
Created on 2020-01-07 by the reprex package (v0.3.0)
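To reduce this to just the two requested columns, one more select() can rename periods and drop the helper columns (a sketch continuing the pipeline above):
testdf %>%
  mutate(periods = map2(BeginTime, EndTime, seq, by = '15 mins')) %>%
  unnest(periods) %>%
  select(Time = periods, Value)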
I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not finished, i.e. count the rows with a start time less than or equal to the time I'm looking at and an end time greater than the time I'm looking at.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another is to aggregate in a non-equi join, which is available in the data.table package. A rough sketch of the first and a full example of the second follow.
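Coverage
This is only a rough sketch of the coverage() idea, not part of the original answer. It assumes whole-minute timestamps, maps each period onto a 1-minute integer grid starting at an assumed origin of 2017-01-01, drops the malformed row whose end precedes its start (ID 4), and reads the counts off at every 15th minute up to the last event's end:
library(IRanges)  # Bioconductor

origin <- lubridate::as_datetime("2017-01-01")           # assumed grid origin
ok <- periods$End.Time > periods$Start.Time              # drop end-before-start rows (ID 4)
# a period is active on minutes [Start, End); minute m corresponds to origin + (m - 1) mins
m_start <- as.integer(difftime(periods$Start.Time[ok], origin, units = "mins")) + 1
m_end   <- as.integer(difftime(periods$End.Time[ok],   origin, units = "mins"))
cov <- coverage(IRanges(m_start, m_end))                 # active-event count per minute
count15 <- as.vector(cov)[seq(1, length(cov), by = 15)]  # counts at the 15-minute grid points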
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
.(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert the datetime strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
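And since the OP asked for a tidyverse approach: the same count can be written, less efficiently but perhaps more readably, with purrr. A sketch, reusing seq15 and periods from above:
library(dplyr)
library(purrr)

# for each grid time, count the periods that have started but not yet ended
tibble(time = seq15) %>%
  mutate(count = map_int(time, ~ sum(periods$Start.Time <= .x & periods$End.Time > .x)))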