Convert string to datetime object in R

I have a character vector of strings, times, and I want to convert those strings to datetime objects. I tried as.POSIXct but did not get the expected outcome: I want datetimes like 00:30, 01:30, etc.
Is there an easy way to do this?
> times
[1] "00:30" "01:30" "02:30" "03:30" "04:30" "05:30" "06:30" "07:30" "08:30" "09:30" "10:30" "11:30" "12:30" "13:30" "14:30"
[16] "15:30" "16:30" "17:30" "18:30" "19:30" "20:30" "21:30" "22:30" "23:30"
> times <- as.POSIXct(times, format = '%H:%M')
[1] "2020-03-11 00:30:00 CDT" "2020-03-11 01:30:00 CDT" "2020-03-11 02:30:00 CDT" "2020-03-11 03:30:00 CDT"
[5] "2020-03-11 04:30:00 CDT" "2020-03-11 05:30:00 CDT" "2020-03-11 06:30:00 CDT" "2020-03-11 07:30:00 CDT"
[9] "2020-03-11 08:30:00 CDT" "2020-03-11 09:30:00 CDT" "2020-03-11 10:30:00 CDT" "2020-03-11 11:30:00 CDT"
[13] "2020-03-11 12:30:00 CDT" "2020-03-11 13:30:00 CDT" "2020-03-11 14:30:00 CDT" "2020-03-11 15:30:00 CDT"
[17] "2020-03-11 16:30:00 CDT" "2020-03-11 17:30:00 CDT" "2020-03-11 18:30:00 CDT" "2020-03-11 19:30:00 CDT"
[21] "2020-03-11 20:30:00 CDT" "2020-03-11 21:30:00 CDT" "2020-03-11 22:30:00 CDT" "2020-03-11 23:30:00 CDT"

As previous comments and answers suggested, the POSIXct (i.e., datetime) class in R always stores dates along with times. If you convert from a character object with just times to that class, today's date is added by default (if you want another date, you could do, for example, this: as.POSIXct(paste("2020-01-01", times), format = "%Y-%m-%d %H:%M")).
However, this should almost never be a problem since you can use format(times, format = "%H:%M") or for ggplot2 scale_x_datetime to get just the times back. For plotting, this would look something like this:
times <- c("00:30", "01:30", "02:30", "03:30", "04:30", "05:30", "06:30", "07:30", "08:30", "09:30", "10:30", "11:30", "12:30", "13:30", "14:30",
"15:30", "16:30", "17:30", "18:30", "19:30", "20:30", "21:30", "22:30", "23:30")
library(tidyverse)
df <- tibble(
time_chr = times,
time = as.POSIXct(times, format = "%H:%M"),
value = rnorm(length(times))
)
df
#> # A tibble: 24 x 3
#> time_chr time value
#> <chr> <dttm> <dbl>
#> 1 00:30 2020-03-12 00:30:00 0.352
#> 2 01:30 2020-03-12 01:30:00 -0.547
#> 3 02:30 2020-03-12 02:30:00 -0.574
#> 4 03:30 2020-03-12 03:30:00 0.843
#> 5 04:30 2020-03-12 04:30:00 0.798
#> 6 05:30 2020-03-12 05:30:00 -0.620
#> 7 06:30 2020-03-12 06:30:00 0.213
#> 8 07:30 2020-03-12 07:30:00 1.21
#> 9 08:30 2020-03-12 08:30:00 0.370
#> 10 09:30 2020-03-12 09:30:00 0.497
#> # … with 14 more rows
ggplot(df, aes(x = time, y = value)) +
geom_line() +
scale_x_datetime(date_labels = "%H:%M")
Created on 2020-03-12 by the reprex package (v0.3.0)
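The round trip described in this answer (attach a date, then format the date away again) can be sketched in base R as follows. The fixed date 2020-01-01 and tz = "UTC" are assumptions chosen here so the result does not depend on the current day or the local time zone; the three-element vector is only illustrative.

```r
times <- c("00:30", "01:30", "23:30")

# Pin an explicit (arbitrary) date and time zone instead of relying on "today"
dt <- as.POSIXct(paste("2020-01-01", times),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

# Formatting with "%H:%M" drops the date again and returns the original strings
back <- format(dt, "%H:%M")
back
```

The values are still full datetimes internally; only the printed representation is reduced to hours and minutes.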

In base R you can use as.difftime to convert a string to a time object:
as.difftime(times, "%H:%M")
#Time differences in mins
# [1] 30 90 150 210 270 330 390 450 510 570 630 690 750 810 870
#[16] 930 990 1050 1110 1170 1230 1290 1350 1410
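Because a difftime is a number with a units attribute, it can be re-expressed in other units, which may help when only the elapsed time since midnight matters. A minimal base R sketch, using an illustrative two-element vector and explicitly requesting minutes:

```r
d <- as.difftime(c("00:30", "01:30"), format = "%H:%M", units = "mins")

# Same durations, expressed as a plain numeric vector in hours
hours <- as.numeric(d, units = "hours")
hours
```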
You can also use the hms package:
library(hms)
head(as_hms(paste0(times, ":00")))
#00:30:00
#01:30:00
#02:30:00
#03:30:00
#04:30:00
#05:30:00
or the lubridate package, as already suggested by @jpmam1:
library(lubridate)
hm(times)
# [1] "30M 0S" "1H 30M 0S" "2H 30M 0S" "3H 30M 0S" "4H 30M 0S"
# [6] "5H 30M 0S" "6H 30M 0S" "7H 30M 0S" "8H 30M 0S" "9H 30M 0S"
#[11] "10H 30M 0S" "11H 30M 0S" "12H 30M 0S" "13H 30M 0S" "14H 30M 0S"
#[16] "15H 30M 0S" "16H 30M 0S" "17H 30M 0S" "18H 30M 0S" "19H 30M 0S"
#[21] "20H 30M 0S" "21H 30M 0S" "22H 30M 0S" "23H 30M 0S"
as.POSIXct stores dates with your times. If you need it only for plotting, that causes no problem and you can use the answer from @JBGruber. Otherwise, storing dates where there are none should be avoided, or the dates should be set to values that are clearly wrong:
head(as.POSIXct(paste("9999-1-1", times)))
#[1] "9999-01-01 00:30:00 CET" "9999-01-01 01:30:00 CET"
#[3] "9999-01-01 02:30:00 CET" "9999-01-01 03:30:00 CET"
#[5] "9999-01-01 04:30:00 CET" "9999-01-01 05:30:00 CET"

Related

How to unify the time format in r?

I have a dataset test. In Excel the time column contained values like 30-06-22 23:55:00 and 1/7/2022 0:00 AM. I have no idea why there are two different formats in one column, and I can't change the format in Excel. Oddly, the format for the 1st to the 12th of each month differs from that of the remaining days. I therefore imported the data into R and tried to unify the formats with the parse_date_time() function. But the second format now shows up as 44568 in R, and I got Warning message: 20737 failed to parse. after running the code test$Time<- parse_date_time(test$Time, orders= c("%d-%m-%y %H%M%S","%d/%m/%Y %I:%M:%S %p" ))
I am confused about why the formats differ and how to unify them into a single format like 1/7/2022 0:00 AM (d/m/Y H:M AM/PM).
test<- c("30-06-22 20:35:00", "30-06-22 20:40:00", "30-06-22 20:45:00",
"30-06-22 20:50:00", "30-06-22 20:55:00", "30-06-22 21:00:00",
"30-06-22 21:05:00", "30-06-22 21:10:00", "30-06-22 21:15:00",
"30-06-22 21:20:00", "30-06-22 21:25:00", "30-06-22 21:30:00",
"30-06-22 21:35:00", "30-06-22 21:40:00", "30-06-22 21:45:00",
"30-06-22 21:50:00", "30-06-22 21:55:00", "30-06-22 22:00:00",
"30-06-22 22:05:00", "30-06-22 22:10:00", "30-06-22 22:15:00",
"30-06-22 22:20:00", "30-06-22 22:25:00", "30-06-22 22:30:00",
"30-06-22 22:35:00", "30-06-22 22:40:00", "30-06-22 22:45:00",
"30-06-22 22:50:00", "30-06-22 22:55:00", "30-06-22 23:00:00",
"30-06-22 23:05:00", "30-06-22 23:10:00", "30-06-22 23:15:00",
"30-06-22 23:20:00", "30-06-22 23:25:00", "30-06-22 23:30:00",
"30-06-22 23:35:00", "30-06-22 23:40:00", "30-06-22 23:45:00",
"30-06-22 23:50:00", "30-06-22 23:55:00", "44568", "44568.003472222219",
"44568.006944444445", "44568.010416666664", "44568.013888888891",
"44568.017361111109", "44568.020833333336", "44568.024305555555",
"44568.027777777781", "44568.03125", "44568.034722222219", "44568.038194444445",
"44568.041666666664", "44568.045138888891", "44568.048611111109",
"44568.052083333336", "44568.055555555555", "44568.059027777781",
"44568.0625", "44568.065972222219", "44568.069444444445", "44568.072916666664",
"44568.076388888891", "44568.079861111109", "44568.083333333336",
"44568.086805555555", "44568.090277777781", "44568.09375", "44568.097222222219",
"44568.100694444445", "44568.104166666664", "44568.107638888891",
"44568.111111111109", "44568.114583333336", "44568.118055555555",
"44568.121527777781", "44568.125", "44568.128472222219", "44568.131944444445",
"44568.135416666664", "44568.138888888891", "44568.142361111109",
"44568.145833333336", "44568.149305555555", "44568.152777777781",
"44568.15625", "44568.159722222219", "44568.163194444445", "44568.166666666664",
"44568.170138888891", "44568.173611111109", "44568.177083333336",
"44568.180555555555", "44568.184027777781", "44568.1875", "44568.190972222219",
"44568.194444444445", "44568.197916666664", "44568.201388888891",
"44568.204861111109")
We may do
library(parsedate)
library(dplyr)
v1 <- as.numeric(test)
v1 <- coalesce(openxlsx::convertToDateTime(v1), parse_date(test))
v1
Output:
[1] "2022-06-30 20:35:00 UTC" "2022-06-30 20:40:00 UTC" "2022-06-30 20:45:00 UTC" "2022-06-30 20:50:00 UTC" "2022-06-30 20:55:00 UTC" "2022-06-30 21:00:00 UTC"
[7] "2022-06-30 21:05:00 UTC" "2022-06-30 21:10:00 UTC" "2022-06-30 21:15:00 UTC" "2022-06-30 21:20:00 UTC" "2022-06-30 21:25:00 UTC" "2022-06-30 21:30:00 UTC"
[13] "2022-06-30 21:35:00 UTC" "2022-06-30 21:40:00 UTC" "2022-06-30 21:45:00 UTC" "2022-06-30 21:50:00 UTC" "2022-06-30 21:55:00 UTC" "2022-06-30 22:00:00 UTC"
[19] "2022-06-30 22:05:00 UTC" "2022-06-30 22:10:00 UTC" "2022-06-30 22:15:00 UTC" "2022-06-30 22:20:00 UTC" "2022-06-30 22:25:00 UTC" "2022-06-30 22:30:00 UTC"
[25] "2022-06-30 22:35:00 UTC" "2022-06-30 22:40:00 UTC" "2022-06-30 22:45:00 UTC" "2022-06-30 22:50:00 UTC" "2022-06-30 22:55:00 UTC" "2022-06-30 23:00:00 UTC"
[31] "2022-06-30 23:05:00 UTC" "2022-06-30 23:10:00 UTC" "2022-06-30 23:15:00 UTC" "2022-06-30 23:20:00 UTC" "2022-06-30 23:25:00 UTC" "2022-06-30 23:30:00 UTC"
[37] "2022-06-30 23:35:00 UTC" "2022-06-30 23:40:00 UTC" "2022-06-30 23:45:00 UTC" "2022-06-30 23:50:00 UTC" "2022-06-30 23:55:00 UTC" "2022-01-07 05:00:00 UTC"
[43] "2022-01-07 05:05:00 UTC" "2022-01-07 05:10:00 UTC" "2022-01-07 05:15:00 UTC" "2022-01-07 05:20:00 UTC" "2022-01-07 05:25:00 UTC" "2022-01-07 05:30:00 UTC"
[49] "2022-01-07 05:35:00 UTC" "2022-01-07 05:40:00 UTC" "2022-01-07 05:45:00 UTC" "2022-01-07 05:50:00 UTC" "2022-01-07 05:55:00 UTC" "2022-01-07 06:00:00 UTC"
[55] "2022-01-07 06:05:00 UTC" "2022-01-07 06:10:00 UTC" "2022-01-07 06:15:00 UTC" "2022-01-07 06:20:00 UTC" "2022-01-07 06:25:00 UTC" "2022-01-07 06:30:00 UTC"
[61] "2022-01-07 06:35:00 UTC" "2022-01-07 06:40:00 UTC" "2022-01-07 06:45:00 UTC" "2022-01-07 06:50:00 UTC" "2022-01-07 06:55:00 UTC" "2022-01-07 07:00:00 UTC"
[67] "2022-01-07 07:05:00 UTC" "2022-01-07 07:10:00 UTC" "2022-01-07 07:15:00 UTC" "2022-01-07 07:20:00 UTC" "2022-01-07 07:25:00 UTC" "2022-01-07 07:30:00 UTC"
[73] "2022-01-07 07:35:00 UTC" "2022-01-07 07:40:00 UTC" "2022-01-07 07:45:00 UTC" "2022-01-07 07:50:00 UTC" "2022-01-07 07:55:00 UTC" "2022-01-07 08:00:00 UTC"
[79] "2022-01-07 08:05:00 UTC" "2022-01-07 08:10:00 UTC" "2022-01-07 08:15:00 UTC" "2022-01-07 08:20:00 UTC" "2022-01-07 08:25:00 UTC" "2022-01-07 08:30:00 UTC"
[85] "2022-01-07 08:35:00 UTC" "2022-01-07 08:40:00 UTC" "2022-01-07 08:45:00 UTC" "2022-01-07 08:50:00 UTC" "2022-01-07 08:55:00 UTC" "2022-01-07 09:00:00 UTC"
[91] "2022-01-07 09:05:00 UTC" "2022-01-07 09:10:00 UTC" "2022-01-07 09:15:00 UTC" "2022-01-07 09:20:00 UTC" "2022-01-07 09:25:00 UTC" "2022-01-07 09:30:00 UTC"
[97] "2022-01-07 09:35:00 UTC" "2022-01-07 09:40:00 UTC" "2022-01-07 09:45:00 UTC" "2022-01-07 09:50:00 UTC" "2022-01-07 09:55:00 UTC"
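For readers without openxlsx or parsedate, the same idea can be sketched in base R: treat anything that parses as a number as an Excel serial date and everything else as a d-m-y string. This is a sketch under assumptions, not the answer's method: it assumes the workbook uses Excel's 1900 date system (where serial 25569 is 1970-01-01), that all values are UTC, and it uses a shortened illustrative input. Note that serial 44568 is 2022-01-07; if the real timestamp was 1 July 2022 (the row after 30 June), Excel probably swapped day and month on import, and that would need a separate correction.

```r
mixed <- c("30-06-22 23:55:00", "44568", "44568.003472222219")

# Values that parse as numbers are treated as Excel serial dates
num <- suppressWarnings(as.numeric(mixed))
is_serial <- !is.na(num)

secs <- rep(NA_real_, length(mixed))
# Days since 1899-12-30 -> seconds since 1970-01-01; round away floating-point noise
secs[is_serial] <- round((num[is_serial] - 25569) * 86400)
# The remaining values are parsed as day-month-year strings
secs[!is_serial] <- as.numeric(
  as.POSIXct(mixed[!is_serial], format = "%d-%m-%y %H:%M:%S", tz = "UTC")
)

out <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
formatted <- format(out, "%Y-%m-%d %H:%M:%S")
formatted
```

Rounding to whole seconds matters here: without it, 44568.003472222219 lands a fraction of a second before 00:05:00 and prints as 00:04:59.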

Time difference format %d-%m-%Y %H:%M days hours minutes in R

I'm trying to find the time interval of a site visit.
I formatted the date-time columns as follows:
as.POSIXct( StartedDateTime, format = "%d/%m/%Y %H:%M")
as.POSIXct( EndDateTime, format = "%d/%m/%Y %H:%M")
Data Sample:
VisitID <- c("1015799589", "1015808075", "1015814910", "1015816258",
"1015823399", "1015825771", "1015826824", "1015830050",
"1015838465", "1015840018", "1015842349", "1015843419")
StartedDateTime <- c("2019-11-27 22:02:00 GMT", "2019-11-27 19:36:00 GMT", "2019-11-28 08:33:00 GMT",
"2019-11-27 19:49:00 GMT", "2019-11-27 22:56:00 GMT", "2019-11-27 16:28:00 GMT",
"2019-07-04 09:48:00 BST", "2019-07-03 08:20:00 BST", "2019-07-02 02:57:00 BST",
"2019-07-02 02:28:00 BST", "2019-07-02 08:46:00 BST", "2019-07-02 04:22:00 BST")
EndDateTime <- c("2019-12-02 16:52:00 GMT", "2019-12-19 08:00:00 GMT", "2019-04-02 13:11:00 BST",
"2019-04-09 09:59:00 BST", "2019-12-04 09:00:00 GMT", "2019-12-04 09:00:00 GMT",
"2019-12-04 09:00:00 GMT", "2019-04-02 17:00:00 BST", "2019-04-02 17:00:00 BST",
"2019-04-12 14:00:00 BST", "2019-04-12 14:00:00 BST", "2019-04-03 08:00:00 BST")
I tried to find the time interval (some visits last for more than two days):
VisitDuration <- difftime(EndDateTime, StartedDateTime, units = "secs")
then
seconds_to_period(VisitDuration)
VisitDuration
"18d 12H 18M 0S" "8d 6H 27M 0S"
"4d 4H 43M 0S" "-3M 0S"
"1d 5H 31M 0S" "2d 8H 21M 0S"
"-32M 0S" "2d 3H 10M 0S"
I have two issues:
Whenever I try to plot the visit duration I get a very weird graph, and it isn't arranged chronologically.
I also wanted to plot the visit start and end times as lines in one graph to compare them, but with no luck.
Any suggestions for a better way to compare the dates and times?
The data set has about 30 thousand rows.

switch to DST: round_date() returns NAs

In 2013, the switch from Central European Time (CET) to Central European Summer Time (CEST) took place on Sunday 2013-03-31. Clocks are advanced by one hour from 2am to 3am, so basically there is no 2am.
start <- strptime("2013-03-31 01:00:00", format="%F %T", tz="CET")
times <- start + (0:5) * 60*15
times
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:15:00 CET"
[3] "2013-03-31 01:30:00 CET" "2013-03-31 01:45:00 CET"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:15:00 CEST"
Rounding the vector times to hours gives NAs. Even for times before 01:30, which aren't affected by the transition at all.
library(lubridate)
round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 03:00:00 CEST"
This seems to be a bug, or am I missing something? I am running:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.3.3
loaded via a namespace (and not attached):
[1] digest_0.6.4 memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.2 stringr_0.6.2
It looks like the culprit is ceiling_date which is called by round_date:
ceiling_date(times,"hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 04:00:00 CEST"
Looking at the code, it works by adding 1 to the hour, thereby creating a nonexistent time. It is definitely a bug.
base::round has support for times to do what you want though:
round(times,"hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET"
[3] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
It's an edge case and you could consider the behavior a bug. round_date uses ceiling_date and there this happens:
y <- floor_date(times - eseconds(1), "hour")
#[1] "2013-03-31 00:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
hour(y) <- hour(y) + 1
#[1] "2013-03-31 01:00:00 CET" NA NA NA NA "2013-03-31 04:00:00 CEST"
As you see it tries to increment 2013-03-31 01:00:00 CET by one hour and doesn't deal correctly with the time zones.
The root issue is probably in the "hour<-" POSIXct S4 method.
This has been fixed in master:
> times <- ymd_hms("2013-03-31 01:00:00 CET", "2013-03-31 01:15:00 CEST",
+ "2013-03-31 01:30:00 CEST", "2013-03-31 01:45:00 CEST",
+ "2013-03-31 03:00:00 CEST", "2013-03-31 03:15:00 CEST",
+ tz = "Europe/Amsterdam")
> round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
> ceiling_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 04:00:00 CEST"
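The failure mode discussed in these answers comes from incrementing the hour field of a wall-clock time. Doing the arithmetic on the underlying seconds since the epoch never produces a nonexistent local time. The sketch below is a base R illustration, not lubridate's implementation: ceil_hour is a hypothetical helper, "Europe/Vienna" is an assumed zone name standing in for the asker's CET, and the approach only works for zones whose UTC offset is a whole number of hours.

```r
start <- as.POSIXct("2013-03-31 01:00:00", tz = "Europe/Vienna")
times <- start + (0:5) * 60 * 15

# Hypothetical helper: round up to the next full hour on the epoch-seconds scale
ceil_hour <- function(x) {
  secs <- ceiling(as.numeric(x) / 3600) * 3600
  as.POSIXct(secs, origin = "1970-01-01", tz = attr(x, "tzone"))
}

res <- format(ceil_hour(times), "%H:%M")
res
```

For these six timestamps the result agrees with the fixed ceiling_date output shown above: 01:15, 01:30 and 01:45 all move to 03:00, because 02:00 does not exist on that day.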

melt.data.frame() changes how POSIXct columns are printed

Melting the dataframe t.wide changes how the column "time" (class POSIXct) is printed.
t.wide <- data.frame(product=letters[1:5],
result=c(2, 4, 0, 0, 1),
t1=as.POSIXct("2014-05-26") + seq(0, 10800, length.out=5),
t2=as.POSIXct("2014-05-27") + seq(0, 10800, length.out=5),
t3=as.POSIXct("2014-05-28") + seq(0, 10800, length.out=5))
library(reshape2)
t.long <- melt(t.wide, measure.vars=c("t1", "t2", "t3"), value.name="time")
t.long$time
[1] 1401055200 1401057900 1401060600 1401063300 1401066000 1401141600 1401144300
[8] 1401147000 1401149700 1401152400 1401228000 1401230700 1401233400 1401236100
[15] 1401238800
attr(,"class")
[1] "POSIXct" "POSIXt"
Strangely, if print() is called explicitly, the object is printed as expected (timestamps, not their numeric representation).
print(t.long$time)
[1] "2014-05-26 00:00:00 CEST" "2014-05-26 00:45:00 CEST" "2014-05-26 01:30:00 CEST"
[4] "2014-05-26 02:15:00 CEST" "2014-05-26 03:00:00 CEST" "2014-05-27 00:00:00 CEST"
[7] "2014-05-27 00:45:00 CEST" "2014-05-27 01:30:00 CEST" "2014-05-27 02:15:00 CEST"
[10] "2014-05-27 03:00:00 CEST" "2014-05-28 00:00:00 CEST" "2014-05-28 00:45:00 CEST"
[13] "2014-05-28 01:30:00 CEST" "2014-05-28 02:15:00 CEST" "2014-05-28 03:00:00 CEST"
Setting the attributes to the same value as before magically changes how the object is printed.
attributes(t.long$time) <- attributes(t.long$time)
t.long$time
[1] "2014-05-26 00:00:00 CEST" "2014-05-26 00:45:00 CEST" "2014-05-26 01:30:00 CEST"
[4] "2014-05-26 02:15:00 CEST" "2014-05-26 03:00:00 CEST" "2014-05-27 00:00:00 CEST"
[7] "2014-05-27 00:45:00 CEST" "2014-05-27 01:30:00 CEST" "2014-05-27 02:15:00 CEST"
[10] "2014-05-27 03:00:00 CEST" "2014-05-28 00:00:00 CEST" "2014-05-28 00:45:00 CEST"
[13] "2014-05-28 01:30:00 CEST" "2014-05-28 02:15:00 CEST" "2014-05-28 03:00:00 CEST"
Can anyone explain this behavior?
UPDATE:
I opened this as Issue #50 on the git repo hadley/reshape2.
UPDATE: FIXED
This issue has been fixed in the development version of reshape2.
Thanks @kevin-ushey!
I believe the reason is that, after the reshaping, R does not treat t.long$time as a classed object. For some reason the OBJECT flag (which indicates that the vector has a "class" attribute) in the SEXP header for your vector is not being set, even though the attribute itself is present. When you copy the attributes back to it, the OBJECT flag gets set and the correct print method is dispatched...
# No "OBJ" in SEXP header (the '[NAM(2),ATT]' part below)
.Internal(inspect( t.long$time ) )
##10359e548 14 REALSXP g0c6 [NAM(2),ATT] (len=15, tl=0) 1.40106e+09,...
# Now we have "OBJ" in the SEXP header, indicating a class attribute
# So the print method for POSIXct gets dispatched...
attributes(t.long$time) <- attributes(t.long$time)
.Internal(inspect( t.long$time ) )
##1118d7f50 14 REALSXP g0c6 [OBJ,NAM(2),ATT] (len=15, tl=0) 1.40106e+09,...
From the R Internals document...
The actual autoprinting is done by PrintValueEnv in file print.c. If the object to be printed has the S4 bit set and S4 methods dispatch is on, show is called to print the object. Otherwise, if the object bit is set (so the object has a "class" attribute), print is called to dispatch methods: for objects without a class the internal code of print.default is called.
Check the difference between..
print.default(t.long$time)
# [1] 1401058800 1401061500 1401064200 1401066900 1401069600 1401145200 1401147900 1401150600 1401153300 1401156000 1401231600 1401234300
#[13] 1401237000 1401239700 1401242400
#attr(,"class")
#[1] "POSIXct" "POSIXt"
print.POSIXct(t.long$time)
# [1] "2014-05-26 00:00:00 BST" "2014-05-26 00:45:00 BST" "2014-05-26 01:30:00 BST" "2014-05-26 02:15:00 BST" "2014-05-26 03:00:00 BST"
# [6] "2014-05-27 00:00:00 BST" "2014-05-27 00:45:00 BST" "2014-05-27 01:30:00 BST" "2014-05-27 02:15:00 BST" "2014-05-27 03:00:00 BST"
#[11] "2014-05-28 00:00:00 BST" "2014-05-28 00:45:00 BST" "2014-05-28 01:30:00 BST" "2014-05-28 02:15:00 BST" "2014-05-28 03:00:00 BST"
Now I can only speculate, but perhaps this is due to some internal code in reshape2 and is related to this warning..
One thing to watch is that if you copy attributes from one object to another you may (un)set the "class" attribute and so need to copy the object and S4 bits as well. There is a macro/function DUPLICATE_ATTRIB to automate this.
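The object bit can be observed from R code with is.object(). The sketch below shows only the normal, consistent behaviour in base R (setting the class through class<- also sets the bit); it does not reproduce the reshape2 bug, where the class attribute was present but the bit was not set. The variable names are illustrative.

```r
x <- as.POSIXct("2014-05-26 00:00:00", tz = "UTC")
has_bit_before <- is.object(x)    # classed object: bit is set

y <- unclass(x)                   # drops the class, keeps other attributes
has_bit_unclassed <- is.object(y) # no class, so the bit is cleared

class(y) <- c("POSIXct", "POSIXt")
has_bit_after <- is.object(y)     # setting the class sets the bit again

c(has_bit_before, has_bit_unclassed, has_bit_after)
```

On an affected vector such as t.long$time from the question, one would expect is.object() to report FALSE until the attributes are reassigned, which is consistent with autoprinting falling back to print.default.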

Converting irregular timestamp data to regularly spaced data using R

In a database I have data with associated timestamps. The timestamps are irregularly spaced and have a resolution of up to minutes. I want to make this data uniform with respect to timestamps (with seconds resolution) using R, with NA replaced by the previous value. Also, every timestamp should contain data for all the symbols. I have tried some time series packages for making the data uniform but have not been successful.
This is the code I have run so far
library("RPostgreSQL")
library(DBI)
library(sqldf)
drv <- dbDriver("PostgreSQL")
ch <- dbConnect(drv, dbname="derivativesData",
user="postgres", password="postgres")
companyFrame <- dbGetQuery(ch, "select * from derData")
companyFrame$trade_time
[1] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[3] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[5] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[7] "2011-06-01 09:00:00 IST" "2011-06-01 09:00:00 IST"
[9] "2011-06-01 09:00:00 IST" "2011-06-01 09:01:00 IST"
[11] "2011-06-01 09:01:00 IST" "2011-06-01 09:01:00 IST"
[13] "2011-06-01 09:02:00 IST" "2011-06-01 09:02:00 IST"
[15] "2011-06-01 09:02:00 IST" "2011-06-01 09:03:00 IST"
[17] "2011-06-01 09:04:00 IST" "2011-06-01 09:04:00 IST"
[19] "2011-06-01 09:05:00 IST" "2011-06-01 09:05:00 IST"
[21] "2011-06-01 09:06:00 IST" "2011-06-01 09:06:00 IST"
[23] "2011-06-01 09:06:00 IST" "2011-06-01 09:07:00 IST"
[25] "2011-06-01 09:08:00 IST" "2011-06-01 09:09:00 IST"
[27] "2011-06-01 09:10:00 IST" "2011-06-01 09:10:00 IST"
I want to convert this data into a uniform format with, say, 10-second resolution.
Here I will use a 10-minute resolution, as your times don't have any seconds...
With the following sample data :
R> time <- c("2011-06-01 09:00:00 IST", "2011-06-01 09:00:00 IST", "2011-06-01 09:01:00 IST",
+ "2011-06-01 09:06:00 IST", "2011-06-01 09:10:00 IST", "2011-06-01 09:15:00 IST")
You can first convert the strings to a POSIXlt date format :
R> time2 <- strptime(time, format="%Y-%m-%d %X")
R> time2
[1] "2011-06-01 09:00:00" "2011-06-01 09:00:00" "2011-06-01 09:01:00"
[4] "2011-06-01 09:06:00" "2011-06-01 09:10:00" "2011-06-01 09:15:00"
Then you could use the minute function from the lubridate package to alter the minute components of your dates and round them down to a 10-minute resolution, for example:
R> library(lubridate)
R> minute(time2) <- minute(time2) %/% 10 * 10
R> time2
[1] "2011-06-01 09:00:00 CEST" "2011-06-01 09:00:00 CEST"
[3] "2011-06-01 09:00:00 CEST" "2011-06-01 09:00:00 CEST"
[5] "2011-06-01 09:10:00 CEST" "2011-06-01 09:10:00 CEST"
Try the data.table package and its roll=TRUE feature. See ?data.table and the vignettes, where it talks about fast last observation carried forward.
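The "last observation carried forward" idea can also be sketched in base R, without data.table: build a regular grid with seq() and look up, for each grid point, the most recent observation at or before it. The data frame obs, its column names and its values are made up for illustration, and the sketch handles a single series only; with several symbols the same lookup would have to be done per symbol. It assumes the observations are sorted by time.

```r
obs <- data.frame(
  time  = as.POSIXct(c("2011-06-01 09:00:00", "2011-06-01 09:00:20",
                       "2011-06-01 09:00:50"), tz = "UTC"),
  value = c(10, 11, 12)
)

# Regular 10-second grid spanning the observed range
grid <- data.frame(time = seq(min(obs$time), max(obs$time), by = "10 sec"))

# Index of the last observation at or before each grid time
idx <- findInterval(as.numeric(grid$time), as.numeric(obs$time))
grid$value <- obs$value[idx]
grid
```

When several observations share a timestamp, findInterval picks the last of them, so the row order within a timestamp decides which value is carried forward.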
