Say I have a POSIXct vector like
timestamps = seq(as.POSIXct("2021-01-23"), as.POSIXct("2021-01-24"), length.out = 6)
I would like to round these times up to the nearest hour of the day in a vector:
hours_of_day = c(2, 14, 20)
i.e., the following result:
timestamps result
1 2021-01-23 00:00:00 2021-01-23 02:00:00
2 2021-01-23 04:48:00 2021-01-23 14:00:00
3 2021-01-23 09:36:00 2021-01-23 14:00:00
4 2021-01-23 14:24:00 2021-01-23 20:00:00
5 2021-01-23 19:12:00 2021-01-23 20:00:00
6 2021-01-24 00:00:00 2021-01-24 02:00:00
Is there a vectorized (or otherwise fast) solution to this? I have a few million timestamps and need to apply it with several different hours_of_day vectors.
One way to simplify this problem is to (1) find the next value in hours_of_day for each lubridate::hour(timestamps) and then (2) compute result = lubridate::floor_date(timestamps, "day") + next_hour_of_day * 3600. But how can step 1 be vectorized?
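For step 1, one vectorized option is base R's findInterval(). It locates every decimal hour in the sorted hours_of_day vector in a single call instead of looping. This is a sketch: the function name round_up_to_hours is hypothetical. It also assumes a time zone with no daylight saving shift on the dates involved (UTC here), because it adds seconds to midnight.

```r
# Sketch: vectorized "round up to the next hour of the day" with base R only.
# Assumes whole hours in 0-23 and no DST shift on the dates involved (e.g. UTC).
round_up_to_hours <- function(timestamps, hours_of_day) {
  hours_of_day <- sort(hours_of_day)
  lt <- as.POSIXlt(timestamps)
  # time of day as decimal hours, e.g. 04:48:00 -> 4.8
  dec_hour <- lt$hour + lt$min / 60 + lt$sec / 3600
  # with left.open = TRUE, findInterval() returns i such that
  # hours_of_day[i] < dec_hour <= hours_of_day[i + 1], so i + 1 is the next hour
  idx <- findInterval(dec_hour, hours_of_day, left.open = TRUE) + 1
  # after the last hour of the day, roll over to the first hour of the next day
  target <- ifelse(idx > length(hours_of_day),
                   hours_of_day[1] + 24,
                   hours_of_day[idx])
  midnight <- as.POSIXct(trunc(lt, "days"))
  midnight + target * 3600
}

timestamps <- seq(as.POSIXct("2021-01-23", tz = "UTC"),
                  as.POSIXct("2021-01-24", tz = "UTC"), length.out = 6)
round_up_to_hours(timestamps, c(2, 14, 20))
```

With hours_of_day = c(2, 14, 20), this gives 02:00, 14:00, 14:00, 20:00, 20:00 and then 02:00 on the following day. A time that is exactly on one of the hours is kept as is, and anything after 20:00 rolls over to 02:00 the next day.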
Convert to POSIXlt with as.POSIXlt, which lets you extract hours, minutes and seconds, and calculate decimal hours. In an lapply/sapply combination, first check for each time which elements of the hours of the day vector are greater than or equal to it. Then use which.max to pick the first such element, i.e. the next hour of the day. If there is none, which.max returns 1, the first hour. Now create the new date-time using ISOdate, and add one day if it is earlier than the original time.
timestamps <- as.POSIXlt(timestamps)
h <- hours_of_day[sapply(lapply(with(timestamps, hour + min/60 + sec/3600),
`<=`, hours_of_day), which.max)]
r <- with(timestamps, ISOdate(1900 + year, mon + 1, mday, h,
tz=attr(timestamps, "tzone")[[1]]))
r[r < timestamps] <- r[r < timestamps] + 86400
Result
r
# [1] "2021-01-23 06:00:00 CET" "2021-01-23 06:00:00 CET"
# [3] "2021-01-23 14:00:00 CET" "2021-01-23 20:00:00 CET"
# [5] "2021-01-23 20:00:00 CET" "2021-01-24 06:00:00 CET"
# [7] "2021-01-25 06:00:00 CET" "2021-01-27 20:00:00 CET"
data.frame(timestamps, r)
# timestamps r
# 1 2021-01-23 00:00:00 2021-01-23 06:00:00
# 2 2021-01-23 04:48:00 2021-01-23 06:00:00
# 3 2021-01-23 09:36:00 2021-01-23 14:00:00
# 4 2021-01-23 14:24:00 2021-01-23 20:00:00
# 5 2021-01-23 19:12:00 2021-01-23 20:00:00
# 6 2021-01-24 00:00:00 2021-01-24 06:00:00
# 7 2021-01-24 23:59:00 2021-01-25 06:00:00
# 8 2021-01-27 20:00:00 2021-01-27 20:00:00
Note: I've added "2021-01-24 23:59:00 CET" to timestamps to demonstrate the date change, and "2021-01-27 20:00:00 CET" to show that an exact match is kept.
Benchmark
Tested on a length 1.4e6 vector.
# Unit: seconds
# expr min lq mean median uq max neval cld
# POSIX() 32.96197 33.06495 33.32104 33.16793 33.50057 33.83321 3 a
# lubridate() 47.36412 47.57762 47.75280 47.79113 47.94715 48.10316 3 b
Data:
timestamps <- structure(c(1611356400, 1611373680, 1611390960, 1611408240, 1611425520,
1611442800, 1611529140, 1611774000), class = c("POSIXct", "POSIXt"
))
hours_of_day <- c(6, 14, 20)
I would extract the hour component, use cut to bin it, and assign the binned hours back to the original:
hours_of_day = c(2, 14, 20)
library(lubridate)
library(magrittr) ## just for the pipe
new_hours = (hour(timestamps) + minute(timestamps) / 60 + second(timestamps) / 3600) %>% # decimal hours, so 14:24 -> 14.4
cut(breaks = c(0, hours_of_day), labels = hours_of_day, include.lowest = TRUE) %>%
as.character() %>%
as.integer() # note: times after the last hour of the day (here after 20:00) become NA
result = floor_date(timestamps, "hour")
hour(result) = new_hours
result
# [1] "2021-01-23 02:00:00 EST" "2021-01-23 14:00:00 EST" "2021-01-23 14:00:00 EST"
# [4] "2021-01-23 20:00:00 EST" "2021-01-23 20:00:00 EST" "2021-01-24 02:00:00 EST"
Building on the approach by @jay.sf, I wrote a floor counterpart and added support for NA values.
floor_date_to = function(timestamps, hours_of_day) {
# Handle NA with a temporary filler so the code below doesn't break
na_timestamps = is.na(timestamps)
timestamps[na_timestamps] = as.POSIXct("9999-12-31")
# Proceed as usual
timestamps = as.POSIXlt(timestamps)
hours_of_day = rev(hours_of_day) # floor-specific: which.max returns the first index, so search from the latest hour
nearest_hour = hours_of_day[sapply(lapply(with(timestamps, hour + min/60 + sec/3600), `<`, hours_of_day), function(x) which.max(-x))] # floor-specific: first hour that is not later than the time
rounded = with(timestamps, ISOdate(1900 + year, mon + 1, mday, nearest_hour, tz = attr(timestamps, "tzone")[1]))
rounded[rounded > timestamps] = rounded[rounded > timestamps] - 86400 # floor: step back one day
rounded[na_timestamps] = NA # restore NA before returning
return(rounded)
}
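A findInterval()-based sketch of the floor version avoids the per-element sapply/lapply. The name floor_to_hours is hypothetical and is not the function above. The sketch assumes UTC or another zone with no DST shift on the affected dates. It relies on findInterval() passing NA through, so no filler value is needed.

```r
# Sketch: vectorized "round down to the previous hour of the day", NA-safe.
# Assumes whole hours in 0-23 and no DST shift on the dates involved (e.g. UTC).
floor_to_hours <- function(timestamps, hours_of_day) {
  hours_of_day <- sort(hours_of_day)
  lt <- as.POSIXlt(timestamps)
  dec_hour <- lt$hour + lt$min / 60 + lt$sec / 3600
  # i such that hours_of_day[i] <= dec_hour < hours_of_day[i + 1]; NA stays NA
  idx <- findInterval(dec_hour, hours_of_day)
  # i == 0 means before the first hour: use the last hour of the previous day
  # (pmax keeps the indexing vector the same length as idx)
  target <- ifelse(idx == 0,
                   hours_of_day[length(hours_of_day)] - 24,
                   hours_of_day[pmax(idx, 1)])
  midnight <- as.POSIXct(trunc(lt, "days"))
  midnight + target * 3600
}

ts <- as.POSIXct(c("2021-01-23 00:00:00", "2021-01-23 04:48:00", "2021-01-23 14:00:00",
                   "2021-01-23 23:00:00", NA), tz = "UTC")
floor_to_hours(ts, c(6, 14, 20))
```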
Related
Problem
Currently, I have a large flight crew schedule dataset with a start and end time for each shift, and my goal is to identify whether an employee worked a night shift. A night shift is defined as any shift that includes some time between 01:00:00 and 05:59:59. I have looked at functions such as %overlaps%, but these do not seem to work with times of day alone. Some sample data (in UTC):
library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))
Desired output
Ideally, I would like to get the following output, with a Boolean variable indicating whether the employee worked a night shift:
start end night.shift
2018-09-19 23:30:00 | 2018-09-20 07:05:00 | TRUE
2018-09-19 17:00:00 | 2018-09-19 21:00:00 | FALSE
2018-09-22 04:30:00 | 2018-09-22 12:00:00 | TRUE
Thanks in advance!
Using seq.POSIXt
transform(df, night.shift=mapply(\(x, y) any(
as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %in%
seq.POSIXt(x, y, by='sec')),
start, end))
# start end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
Or, almost twice as fast, use %inrange% from the data.table package:
library(data.table)
transform(df, night.shift=mapply(\(x, y) any(
as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %inrange%
c(x, y)),
start, end))
# start end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
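The seq.POSIXt and %inrange% versions only test whether 01:00:00 or 05:59:59 falls inside the shift. A shift lying entirely inside the window (e.g. 02:00 to 04:00) therefore comes out FALSE. Below is a sketch of a general closed-interval overlap test, vectorized over all rows. The name night_overlap is hypothetical. The sketch assumes UTC timestamps and shifts that span at most two calendar days, so checking the window on the start date and the end date is enough.

```r
# Sketch: does each shift [start, end] overlap the night window 01:00:00-05:59:59?
# Assumes UTC (no DST) and shifts spanning at most two calendar days.
night_overlap <- function(start, end, from_sec = 1 * 3600, to_sec = 6 * 3600 - 1) {
  overlaps_on <- function(day) {
    # closed intervals [a1, b1] and [a2, b2] overlap iff a1 <= b2 and a2 <= b1
    start <= day + to_sec & day + from_sec <= end
  }
  start_day <- as.POSIXct(trunc(start, "days"))
  end_day <- as.POSIXct(trunc(end, "days"))
  overlaps_on(start_day) | overlaps_on(end_day)
}

start <- as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00",
                      "2018-09-22 04:30:00", "2018-09-22 02:00:00"), tz = "UTC")
end <- as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00",
                    "2018-09-22 12:00:00", "2018-09-22 04:00:00"), tz = "UTC")
night_overlap(start, end)
```

The fourth shift (02:00 to 04:00) is the case the endpoint checks miss. Here it is reported as TRUE.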
You can use interval() or %--% to create an Interval object and int_overlaps() to test if two intervals overlap.
library(dplyr)
library(lubridate)
df %>%
mutate(
night.shift = int_overlaps(
(date(start) + hms("01:00:00")) %--% (date(start) + hms("05:59:59")),
start %--% end
) | int_overlaps(
(date(end) + hms("01:00:00")) %--% (date(end) + hms("05:59:59")),
start %--% end
)
)
Another way is using %within% to check whether a date-time object falls within an interval.
df %>%
rowwise() %>%
mutate(
night.shift = any(outer(date(c(start, end)), hms(c("01:00:00", "05:59:59")), `+`) %within% (start %--% end))
) %>%
ungroup()
Output (the sample data here includes a fourth shift, 2018-09-22 04:30:00 to 2018-09-23 00:30:00)
# # A tibble: 4 × 3
# start end night.shift
# <dttm> <dttm> <lgl>
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE
# 4 2018-09-22 04:30:00 2018-09-23 00:30:00 TRUE
Reference
Utilities for creation and manipulation of Interval objects
This is super janky and not optimized, but it works (and was fun to figure out). You'll want to vectorize it if possible.
library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))
night <- interval( hms::as_hms(3600), hms::as_hms(21599), tz = "UTC")
print(night)
for (i in seq_len(nrow(df))) {
s = df$start[i]
f = df$end[i]
# time of day in seconds since midnight
start_seconds = hms::as_hms(60*60*hour(s) + 60*minute(s) + second(s))
end_seconds = hms::as_hms(60*60*hour(f) + 60*minute(f) + second(f))
shift <- interval(start_seconds, end_seconds, tz = "UTC") # named shift to avoid masking interval()
overlaps <- int_overlaps(night, shift)
print(overlaps)
}
I have a start and end date for individuals, and I need to determine whether the time elapsed from start to end is within 2 days, or 3 or more days. These dates are assigned to record IDs. How can I filter the records that ended within 2 days of the start date,
and the ones that ended 3 or more days later?
Record_id <- c("2245","6728","5122","9287")
Start <- c("2021-01-13 CST" ,"2021-01-21 CST" ,"2021-01-17 CST","2021-01-13 CST")
End <- c("2021-01-21 18:00:00 CST", "2021-01-22 16:00:00 CST", "2021-01-22 13:00:00 CST","2021-01-25 15:00:00 CST")
I tried using
elapsed.time <- DF$Start %--% DF$End
time.duration <- as.duration(elapsed.time)
but I am getting an error, apparently because the End dates contain hours. Thank you.
Here's a dplyr pipe that will include both constraints (2 and 3 days):
library(dplyr)
df %>%
mutate(across(Start:End, as.POSIXct)) %>%
mutate(d = difftime(End, Start, units = "days")) %>%
filter(!between(as.numeric(d), 2, 3))
# # A tibble: 4 x 4
# Record_id Start End d
# <chr> <dttm> <dttm> <drtn>
# 1 2245 2021-01-13 00:00:00 2021-01-21 18:00:00 8.750000 days
# 2 6728 2021-01-21 00:00:00 2021-01-22 16:00:00 1.666667 days
# 3 5122 2021-01-17 00:00:00 2021-01-22 13:00:00 5.541667 days
# 4 9287 2021-01-13 00:00:00 2021-01-25 15:00:00 12.625000 days
I included mutate(d = ...) so that we can see the actual differences. If you instead want to keep only the records between 2 and 3 days, use filter(between(...)) (without the !).
With the data you provided, every observation is either less than 2 days or more than 3 days, so nothing is filtered out. I'll widen the range so that we can see the filter in effect:
df %>%
mutate(across(Start:End, as.POSIXct)) %>%
mutate(d = difftime(End, Start, units = "days")) %>%
filter(!between(as.numeric(d), 1, 6))
# # A tibble: 2 x 4
# Record_id Start End d
# <chr> <dttm> <dttm> <drtn>
# 1 2245 2021-01-13 00:00:00 2021-01-21 18:00:00 8.750 days
# 2 9287 2021-01-13 00:00:00 2021-01-25 15:00:00 12.625 days
Data
df <- structure(list(Record_id = c("2245", "6728", "5122", "9287"), Start = c("2021-01-13 CST", "2021-01-21 CST", "2021-01-17 CST", "2021-01-13 CST"), End = c("2021-01-21 18:00:00 CST", "2021-01-22 16:00:00 CST", "2021-01-22 13:00:00 CST", "2021-01-25 15:00:00 CST")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
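If you want to label every record instead of filtering twice, here is a base R sketch that computes the elapsed days once and puts each record in a group. The names start_t, end_t, elapsed and status are hypothetical. The sketch reads all times as UTC: as.POSIXct() ignores the trailing "CST" text. Both columns share the same zone, so the differences are unaffected.

```r
Record_id <- c("2245", "6728", "5122", "9287")
Start <- c("2021-01-13 CST", "2021-01-21 CST", "2021-01-17 CST", "2021-01-13 CST")
End <- c("2021-01-21 18:00:00 CST", "2021-01-22 16:00:00 CST",
         "2021-01-22 13:00:00 CST", "2021-01-25 15:00:00 CST")
# as.POSIXct() picks one format per vector ("%Y-%m-%d" for Start,
# "%Y-%m-%d %H:%M:%S" for End); trailing characters such as " CST" are ignored
start_t <- as.POSIXct(Start, tz = "UTC")
end_t <- as.POSIXct(End, tz = "UTC")
elapsed <- as.numeric(difftime(end_t, start_t, units = "days"))
status <- ifelse(elapsed <= 2, "within 2 days",
                 ifelse(elapsed >= 3, "3+ days", "between 2 and 3 days"))
split(Record_id, status)
```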
I just converted the characters to date-times with lubridate and then took the difference in days with difftime(). I then filter for records that are within 2 days.
library(dplyr)
Record_id <- c("2245","6728","5122","9287")
Start <- c("2021-01-13 CST" ,"2021-01-21 CST" ,"2021-01-17 CST","2021-01-13 CST")
End <- c("2021-01-21 18:00:00 CST", "2021-01-22 16:00:00 CST", "2021-01-22 13:00:00 CST","2021-01-25 15:00:00 CST")
df <- tibble(x = Record_id, y = Start, z = End)
df %>%
mutate(across(y:z, lubridate::as_datetime)) %>%
mutate(diff = as.numeric(difftime(z, y, units = "days"))) %>%
filter(diff <= 2)
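The explicit units matter because subtracting two POSIXct values picks the difftime units automatically. When a difference is under one day, as.numeric() can return hours rather than days. A minimal illustration:

```r
x <- as.POSIXct("2021-01-21 00:00:00", tz = "UTC")
y <- as.POSIXct("2021-01-21 18:00:00", tz = "UTC")
y - x                                        # Time difference of 18 hours
as.numeric(y - x)                            # 18, i.e. hours, not days
as.numeric(difftime(y, x, units = "days"))   # 0.75
```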
How do you set 0:00 instead of 23:00 as the end of the day in hourly data? I run into this with period.apply and to.period, which both return days ending at 23:00. Here is an example:
library(xts)
x1 = xts(x = rnorm(120), order.by = seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:00:00"), by = "hour"))
The following both show periods ending at 23:00:
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
So when aggregating the hourly data to daily, does anyone have an idea how to make the day end at 0:00?
As already pointed out by another answer here, to.period with period = "days" works on the data with timestamps between 00:00:00 and 23:59:59.9999999 on the day in question. So 23:00:00 is seen as the last timestamp of the day in your data, and 00:00:00 belongs to the next day's "bin".
What you can do is shift all the timestamps back 1 hour, use to.period to get the daily data points from the hourly points, and then use align.time to get the timestamps aligned correctly.
(More generally, to.period is useful for generating OHLCV-type data. If you're generating, say, hourly bars from ticks, it makes sense to include all the ticks between 23:00:00 and 23:59:59.99999 in one bar. Then 00:00:00 to 00:59:59.9999... forms the next hourly bar, and so on.)
Here is an example:
> tail(x1["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -1.2760349
# 2018-02-01 19:00:00 -0.1496041
# 2018-02-01 20:00:00 -0.5989614
# 2018-02-01 21:00:00 -0.9691905
# 2018-02-01 22:00:00 -0.2519618
# 2018-02-01 23:00:00 -1.6081656
> head(x1["2018-02-02"])
# [,1]
# 2018-02-02 00:00:00 -0.3373271
# 2018-02-02 01:00:00 0.8312698
# 2018-02-02 02:00:00 0.9321747
# 2018-02-02 03:00:00 0.6719425
# 2018-02-02 04:00:00 -0.5597391
# 2018-02-02 05:00:00 -0.9810128
> head(x1["2018-02-03"])
# [,1]
# 2018-02-03 00:00:00 2.3746424
# 2018-02-03 01:00:00 0.8536594
# 2018-02-03 02:00:00 -0.2467268
# 2018-02-03 03:00:00 -0.1316978
# 2018-02-03 04:00:00 0.3079848
# 2018-02-03 05:00:00 0.2445634
x2 <- x1
.index(x2) <- .index(x1) - 3600
> tail(x2["2018-02-01"])
# [,1]
# 2018-02-01 18:00:00 -0.1496041
# 2018-02-01 19:00:00 -0.5989614
# 2018-02-01 20:00:00 -0.9691905
# 2018-02-01 21:00:00 -0.2519618
# 2018-02-01 22:00:00 -1.6081656
# 2018-02-01 23:00:00 -0.3373271
x.d2 <- to.period(x2, OHLC = FALSE, drop.date = FALSE, period = "days")
> x.d2
# [,1]
# 2018-01-31 23:00:00 0.12516594
# 2018-02-01 23:00:00 -0.33732710
# 2018-02-02 23:00:00 2.37464235
# 2018-02-03 23:00:00 0.51797747
# 2018-02-04 23:00:00 0.08955208
# 2018-02-05 22:00:00 0.33067734
x.d2 <- align.time(x.d2, n = 86400)
> x.d2
# [,1]
# 2018-02-01 0.12516594
# 2018-02-02 -0.33732710
# 2018-02-03 2.37464235
# 2018-02-04 0.51797747
# 2018-02-05 0.08955208
# 2018-02-06 0.33067734
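The same shift-back-one-hour idea can also be sketched in base R without xts, for example to take the last hourly value of each day with the day ending at 00:00. This sketch uses stand-in data: vals is simply 1 to 120, not the rnorm() values above, and the timestamps are assumed to be in UTC.

```r
times <- seq(as.POSIXct("2018-02-01 00:00:00", tz = "UTC"),
             as.POSIXct("2018-02-05 23:00:00", tz = "UTC"), by = "hour")
vals <- seq_along(times)   # stand-in values: 1, 2, ..., 120
# shift back one hour so 00:00 falls in the previous day's bin, then label
# each bin by the date whose midnight closes it
day_end <- as.Date(times - 3600, tz = "UTC") + 1
daily_last <- tapply(vals, day_end, function(v) v[length(v)])
daily_last
```

As in the xts output above, the labels run from 2018-02-01 to 2018-02-06. For example, the bin labelled 2018-02-02 ends with the 2018-02-02 00:00:00 observation.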
Want to convince yourself? Try something like this:
x3 <- rbind(x1, xts(x = matrix(c(1,2), nrow = 2), order.by = as.POSIXct(c("2018-02-01 23:59:59.999", "2018-02-02 00:00:00"))))
x3["2018-02-01 23/2018-02-02 01"]
# [,1]
# 2018-02-01 23:00:00.000 -1.6081656
# 2018-02-01 23:59:59.999 1.0000000
# 2018-02-02 00:00:00.000 -0.3373271
# 2018-02-02 00:00:00.000 2.0000000
# 2018-02-02 01:00:00.000 0.8312698
x3.d <- to.period(x3, OHLC = FALSE, drop.date = FALSE, period = "days")
x3.d <- align.time(x3.d, 86400)
> x3.d
# [,1]
# 2018-02-02 1.00000000
# 2018-02-03 -0.09832625
# 2018-02-04 -0.65075506
# 2018-02-05 -0.09423664
# 2018-02-06 0.33067734
Note that the value 2 at 2018-02-02 00:00:00 did not become the last observation of the bar labelled 2018-02-02 (00:00:00), which covers 2018-02-01 00:00:00 to 2018-02-01 23:59:59.9999. The value 1 at 23:59:59.999 did.
Of course, if you want the daily timestamp to mark the start of the day rather than the end (i.e. 2018-02-01 as the start of the first bar in x3.d above), you could shift the index back by one day. This is relatively safe in most time zones when your data doesn't involve weekend dates:
index(x3.d) = index(x3.d) - 86400
I say relatively safely because there are corner cases when a time zone shifts its clocks, e.g. with daylight saving time. Simply subtracting 86400 seconds can be a problem across a weekend on which daylight saving time starts or ends:
# e.g. bad: daylight saving time starts on this weekend in US Eastern time
z <- xts(x = 9, order.by = as.POSIXct("2018-03-12", tz = "America/New_York"))
> index(z) - 86400
[1] "2018-03-10 23:00:00 EST"
i.e. the timestamp is off by one hour, when you really want the midnight timestamp (00:00:00).
You could get around this problem using something much safer like this:
library(lubridate)
# right
> index(z) - days(1)
[1] "2018-03-11 EST"
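If you would rather stay in base R, seq() with by = "DSTday" steps in calendar days, so it should also land on midnight instead of 23:00. Here is a minimal sketch, assuming the America/New_York zone is available on your system:

```r
z <- as.POSIXct("2018-03-12", tz = "America/New_York")
z - 86400                                  # fixed 24 h step: lands at 23:00 on 2018-03-10
seq(z, by = "-1 DSTday", length.out = 2)[2] # calendar-day step back
```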
I don't think this is possible because 00:00 is the start of the day. From the manual:
These endpoints are aligned in POSIXct time to the zero second of the day at the beginning, and the 59.9999th second of the 59th minute of the 23rd hour of the final day
I think the solution here is to use minutes instead of hours. Using your example:
library(xts)
x1 = xts(x = rnorm(7200), order.by = seq(as.POSIXct("2018-02-01 00:00:00"), as.POSIXct("2018-02-05 23:59:00"), by = "min"))
to.period(x1, OHLC = FALSE, drop.date = FALSE, period = "days")
x1[endpoints(x1, 'days')]
As I failed to solve my problem with PHP/MySQL or Excel due to the data size, I'm now taking my very first steps with R and struggling a bit. The problem is this: I have a second-by-second CSV file with half a year of data that looks like this:
metering,timestamp
123,2016-01-01 00:00:00
345,2016-01-01 00:00:01
243,2016-01-01 00:00:02
101,2016-01-01 00:00:04
134,2016-01-01 00:00:06
As you can see, some seconds are missing every once in a while (don't ask me why the values are written before the timestamp, but that's how I received the data…). Now I'm trying to calculate the number of values (= seconds) that are missing.
So my idea was
to create a vector that is correct (includes all sec-by-sec timestamps),
match the given CSV file with that new vector, and
sum up all the timestamps with no value.
I managed to make step 1 happen with the following code:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
write.csv(RegularTimeSeries, file = "RegularTimeSeries.csv")
To see what I did, I also exported the vector to a CSV file, which looks like this:
"1",2016-01-01 00:00:00
"2",2016-01-01 00:00:01
"3",2016-01-01 00:00:02
"4",2016-01-01 00:00:03
"5",2016-01-01 00:00:04
"6",2016-01-01 00:00:05
"7",2016-01-01 00:00:06
Unfortunately I have no idea how to go on with steps 2 and 3. I found some very similar examples (http://www.r-bloggers.com/fix-missing-dates-with-r/, R: Insert rows for missing dates/times), but as a total R noob I struggled to translate these examples to my second-by-second data.
Some hints for the greenhorn would be very very helpful – thank you very much in advance :)
In the tidyverse,
library(dplyr)
library(tidyr)
# parse datetimes
df %>% mutate(timestamp = as.POSIXct(timestamp)) %>%
# complete sequence to full sequence from min to max by second
complete(timestamp = seq.POSIXt(min(timestamp), max(timestamp), by = 'sec'))
## # A tibble: 7 x 2
## timestamp metering
## <time> <int>
## 1 2016-01-01 00:00:00 123
## 2 2016-01-01 00:00:01 345
## 3 2016-01-01 00:00:02 243
## 4 2016-01-01 00:00:03 NA
## 5 2016-01-01 00:00:04 101
## 6 2016-01-01 00:00:05 NA
## 7 2016-01-01 00:00:06 134
If you want the number of NAs (i.e. the number of seconds with no data), add:
%>% tally(is.na(metering))
## # A tibble: 1 x 1
## n
## <int>
## 1 2
You can check which values of your RegularTimeSeries are in your broken time series using which and %in%. First create BrokenTimeSeries from your example:
RegularTimeSeries <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), as.POSIXct("2016-01-01 00:00:30", tz = "UTC"), by = "1 sec")
BrokenTimeSeries <- RegularTimeSeries[-c(3,6,9)] # remove some seconds
This will give you the indices of values within RegularTimeSeries that are not in BrokenTimeSeries:
> which(!(RegularTimeSeries %in% BrokenTimeSeries))
[1] 3 6 9
This will return the actual values:
> RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))]
[1] "2016-01-01 00:00:02 UTC" "2016-01-01 00:00:05 UTC" "2016-01-01 00:00:08 UTC"
Maybe I'm misunderstanding your problem, but you can count the number of missing seconds by simply subtracting the length of your broken time series from that of RegularTimeSeries, or by taking the length of either of the two resulting vectors above.
> length(RegularTimeSeries) - length(BrokenTimeSeries)
[1] 3
> length(which(!(RegularTimeSeries %in% BrokenTimeSeries)))
[1] 3
> length(RegularTimeSeries[which(!(RegularTimeSeries %in% BrokenTimeSeries))])
[1] 3
If you want to merge the files together to see the missing values you can do something like this:
# data frame with the regular time series, plus the broken one aligned to it
df <- data.frame(
RegularTimeSeries
)
df$BrokenTimeSeries <- df$RegularTimeSeries
df$BrokenTimeSeries[!(df$RegularTimeSeries %in% BrokenTimeSeries)] <- NA
resulting in:
> df[1:12,]
RegularTimeSeries BrokenTimeSeries
1 2016-01-01 00:00:00 2016-01-01 00:00:00
2 2016-01-01 00:00:01 2016-01-01 00:00:01
3 2016-01-01 00:00:02 <NA>
4 2016-01-01 00:00:03 2016-01-01 00:00:03
5 2016-01-01 00:00:04 2016-01-01 00:00:04
6 2016-01-01 00:00:05 <NA>
7 2016-01-01 00:00:06 2016-01-01 00:00:06
8 2016-01-01 00:00:07 2016-01-01 00:00:07
9 2016-01-01 00:00:08 <NA>
10 2016-01-01 00:00:09 2016-01-01 00:00:09
11 2016-01-01 00:00:10 2016-01-01 00:00:10
12 2016-01-01 00:00:11 2016-01-01 00:00:11
If all you want is the number of missing seconds, it can be done much more simply: find the number of seconds in your time range, then subtract the number of rows in your dataset. This could be done in R along these lines:
n.seconds <- as.numeric(difftime("2016-06-01 00:00:00", "2016-01-01 00:00:00", tz = "UTC", units = "secs"))
n.rows <- nrow(my.data.frame)
n.missing.values <- n.seconds - n.rows
Adjust the time range and the name of your data frame (my.data.frame) to match your data.
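One detail to watch: if both the first and the last second of the range are present in the data, the expected number of rows is the span in seconds plus one. Here is a small sketch using the five sample rows from the question:

```r
ts <- as.POSIXct(c("2016-01-01 00:00:00", "2016-01-01 00:00:01", "2016-01-01 00:00:02",
                   "2016-01-01 00:00:04", "2016-01-01 00:00:06"), tz = "UTC")
# both endpoints are present, so the expected count is the span in seconds plus one
n_expected <- as.numeric(difftime(max(ts), min(ts), units = "secs")) + 1
n_missing <- n_expected - length(ts)
n_missing
```

Here 7 rows are expected and 5 are present, so 2 seconds (00:00:03 and 00:00:05) are missing.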
Hope it helps
d <- (c("2016-01-01 00:00:01",
"2016-01-01 00:00:02",
"2016-01-01 00:00:03",
"2016-01-01 00:00:04",
"2016-01-01 00:00:05",
"2016-01-01 00:00:06",
"2016-01-01 00:00:10",
"2016-01-01 00:00:12",
"2016-01-01 00:00:14",
"2016-01-01 00:00:16",
"2016-01-01 00:00:18",
"2016-01-01 00:00:20",
"2016-01-01 00:00:22"))
d <- as.POSIXct(d)
# flag each timestamp that directly follows a gap of more than one second
gaps <- .POSIXct(rep(NA_real_, length(d))) # all-NA POSIXct vector to fill in
for (i in 2:length(d)) {
if (difftime(d[i], d[i-1], units = "secs") > 1) {
gaps[i] <- d[i]
}
}
gaps
[1] NA NA NA NA NA
[6] NA "2016-01-01 00:00:10 EST" "2016-01-01 00:00:12 EST" "2016-01-01 00:00:14 EST" "2016-01-01 00:00:16 EST"
[11] "2016-01-01 00:00:18 EST" "2016-01-01 00:00:20 EST" "2016-01-01 00:00:22 EST"
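A loop-free alternative uses diff() on the whole vector at once. This sketch assumes d is sorted and has no duplicate timestamps. A gap of g seconds hides g - 1 missing seconds.

```r
d <- as.POSIXct(c("2016-01-01 00:00:01", "2016-01-01 00:00:02", "2016-01-01 00:00:03",
                  "2016-01-01 00:00:04", "2016-01-01 00:00:05", "2016-01-01 00:00:06",
                  "2016-01-01 00:00:10", "2016-01-01 00:00:12", "2016-01-01 00:00:14",
                  "2016-01-01 00:00:16", "2016-01-01 00:00:18", "2016-01-01 00:00:20",
                  "2016-01-01 00:00:22"), tz = "UTC")
gaps <- as.numeric(diff(d), units = "secs")  # seconds between consecutive timestamps
which(gaps > 1) + 1                          # positions of timestamps that follow a gap
sum(gaps - 1)                                # total number of missing seconds
```

For this vector the sketch flags positions 7 to 13 and counts 9 missing seconds.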
I have a POSIXct vector containing AM hours, and I want to use it to replace values in a character column of a data frame. When I do the replacement, the column stays character and the POSIXct values are stored as their underlying numbers. I'm proceeding as follows:
class(data2014.im.t[,2])
[1] "character"
class(horas.am)
[1] "POSIXct" "POSIXt"
head(horas.am)
[1] "1970-01-01 09:00:00 COT" "1970-01-01 10:00:00 COT" "1970-01-01 11:00:00 COT" "1970-01-01 12:00:00 COT"
[5] "1970-01-01 01:00:00 COT" "1970-01-01 02:00:00 COT"
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
class(data2014.im.t[,2])
[1] "character"
head(data2014.im.t[,2])
[1] "50400" "54000" "57600" "104400" "64800" "68400"
Ideally I would like to have a POSIXct column containing the hours. Any thoughts?
You should do the conversion explicitly yourself:
#sample data
horas.am <- seq(as.POSIXct("2014-01-01 05:00:00"), length.out=10, by="2 hours")
data2014.im.t <- data.frame(a=1:10, b=rep("a",10), stringsAsFactors=FALSE)
class(data2014.im.t[,2])
# [1] "character"
class(horas.am)
# [1] "POSIXct" "POSIXt"
# NO:
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
# YES
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- as.character(horas.am)
data2014.im.t
# a b
# 1 1 2014-01-01 05:00:00
# 2 2 2014-01-01 07:00:00
# 3 3 2014-01-01 09:00:00
# 4 4 2014-01-01 11:00:00
# 5 5 2014-01-01 13:00:00
# 6 6 2014-01-01 15:00:00
# 7 7 2014-01-01 17:00:00
# 8 8 2014-01-01 19:00:00
# 9 9 2014-01-01 21:00:00
# 10 10 2014-01-01 23:00:00
class(data2014.im.t[,2])
# [1] "character"
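The question actually asks for a POSIXct column. A data frame column cannot mix classes, so for that the whole column has to be POSIXct. Here is a sketch that builds a complete POSIXct vector first and then replaces the column. The helper names rows and new_b are hypothetical, and UTC is assumed.

```r
data2014.im.t <- data.frame(a = 1:10, b = rep("a", 10), stringsAsFactors = FALSE)
horas.am <- seq(as.POSIXct("2014-01-01 05:00:00", tz = "UTC"), length.out = 10, by = "2 hours")
rows <- grep("a", data2014.im.t$b)
# start from an all-NA POSIXct vector so the column keeps its date-time class
new_b <- .POSIXct(rep(NA_real_, nrow(data2014.im.t)), tz = "UTC")
new_b[rows] <- horas.am[seq_along(rows)]
data2014.im.t$b <- new_b
class(data2014.im.t$b)
```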