Creating an xts object results in altered timestamps - r

Suppose I have:
R> str(data)
'data.frame': 4 obs. of 2 variables:
$ datetime: Factor w/ 4 levels "2011-01-05 09:30:00.001",..: 1 2 3 4
$ price : num 18.3 18.3 18.3 18.3
R> data
datetime price
1 2011-01-05 09:30:00.001 18.31
2 2011-01-05 09:30:00.321 18.33
3 2011-01-05 09:30:01.511 18.33
4 2011-01-05 09:30:02.192 18.34
When I try to load this into an xts object the timestamps are subtly altered:
R> x <- xts(data[-1], as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS')))
R> str(x)
An ‘xts’ object from 2011-01-05 09:30:00.000 to 2011-01-05 09:30:02.191 containing:
Data: num [1:4, 1] 18.3 18.3 18.3 18.3
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "price"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
NULL
R> x
price
2011-01-05 09:30:00.000 18.31
2011-01-05 09:30:00.321 18.33
2011-01-05 09:30:01.510 18.33
2011-01-05 09:30:02.191 18.34
You'll notice that the timestamps have been altered. The first entry now occurs at 09:30:00.000 instead of what the original data said, 09:30:00.001. The third and fourth rows are also incorrect.
What's causing this? Am I doing something fundamentally wrong? I've tried various incantations to get the data into an xts object and they all seem to exhibit this behavior.
EDIT: Add sessionInfo()
R> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xts_0.8-2 zoo_1.7-4
loaded via a namespace (and not attached):
[1] grid_2.13.1 lattice_0.19-30 tools_2.13.1
EDIT 2: If I modify my source data to be microsecond precision as follows:
datetime,price
2011-01-05 09:30:00.001000,18.31
2011-01-05 09:30:00.321000,18.33
2011-01-05 09:30:01.511000,18.33
2011-01-05 09:30:02.192000,18.34
And then load it so I have:
R> test
datetime price
1 2011-01-05 09:30:00.001000 18.31
2 2011-01-05 09:30:00.321000 18.33
3 2011-01-05 09:30:01.511000 18.33
4 2011-01-05 09:30:02.192000 18.34
And then, finally, convert it into an xts object and set the index format:
R> x <- xts(test[,-1], as.POSIXct(strptime(test$datetime, '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
[,1]
2011-01-05 09:30:00.000999 18.31
2011-01-05 09:30:00.321000 18.33
2011-01-05 09:30:01.510999 18.33
2011-01-05 09:30:02.191999 18.34
You can see the effect as well. I was hoping that adding the extra precision would help, but unfortunately it does not.
EDIT 3: Please see @DWin's answer for an end-to-end test case that reproduces this behavior.
EDIT 4: The behavior does not appear to be millisecond oriented. The following shows the same altered result of a microsecond resolution timestamp. If I change my input data to:
R> data
datetime price
1 2011-01-05 09:30:00.001001 18.31
2 2011-01-05 09:30:00.321001 18.33
3 2011-01-05 09:30:01.511001 18.33
4 2011-01-05 09:30:02.192005 18.34
And then create an xts object:
R> x <- xts(data[-1],
as.POSIXct(strptime(as.character(data$datetime), '%Y-%m-%d %H:%M:%OS')))
R> indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
R> x
price
2011-01-05 09:30:00.001000 18.31
2011-01-05 09:30:00.321001 18.33
2011-01-05 09:30:01.511001 18.33
2011-01-05 09:30:02.192004 18.34
EDIT 5: It would appear to be a floating point precision issue. Observe:
R> t <- as.POSIXct("2011-01-05 09:30:00.001001")
R> t
[1] "2011-01-05 09:30:00.001 CST"
R> as.numeric(t)
[1] 1294241400.0010008812
This exhibits the error behavior, and is consistent with the example in EDIT 4. However, using an example that didn't show the error:
R> t <- as.POSIXct("2011-01-05 09:30:01.511001")
R> t
[1] "2011-01-05 09:30:01.511001 CST"
R> as.numeric(t)
[1] 1294241401.5110011101
It seems as if xts or some underlying component is rounding down rather than to the nearest?
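The arithmetic behind this can be checked directly. The sketch below is a minimal base R illustration, assuming the epoch value 1294241400.001 that the answers report for the first timestamp; the variable names are made up for the example. It suggests that the nearest double lies just under the intended millisecond, so truncating to three decimals gives .000 while rounding gives .001.

```r
# Epoch seconds for the first timestamp (value taken from the thread).
secs <- 1294241400.001

# Fractional part as actually stored in the double.
frac <- secs - floor(secs)
sprintf("%.12f", frac)

# Truncating to whole milliseconds loses one; rounding recovers it.
floor(frac * 1000)   # 0
round(frac * 1000)   # 1
```

The same reasoning would explain why .321 survives: its stored value presumably falls on or just above the intended decimal.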

You have your times in a factor:
R> str(data)
'data.frame': 4 obs. of 2 variables:
$ datetime: Factor w/ 4 levels "2011-01-05 09:30:00.001",..: 1 2 3 4
[...]
That is not the best place to start. You need to convert to character. Hence instead of
x <- xts(data[-1], as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS')))
I would suggest
x <- xts(data[-1],
order.by=as.POSIXct(strptime(as.character(data$datetime),
'%Y-%m-%d %H:%M:%OS')))
In my experience, the as.character() around a factor is critical. Factors are powerful for modeling, but they are a bit of a nuisance when you get them accidentally from reading data. Use stringsAsFactors=FALSE to your advantage and avoid them on data import.
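As a small illustration of why the conversion matters in general (a sketch with made-up values, not the OP's data frame): a factor stores integer level codes, and any function that reaches for the underlying numbers sees those codes rather than the text.

```r
f <- factor(c("2011-01-05 09:30:00.321", "2011-01-05 09:30:00.001"))

as.integer(f)     # level codes (2 1), not timestamps
as.character(f)   # the original strings, safe to pass to strptime()
```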
Edit: So this appears to point to the strptime/strftime implementations. To make matters more interesting, R takes some of these from the operating system and reimplements some in src/main/datetime.c.
Also, pay attention to how small an increment you can add to a time value before R treats the result as identical to the original. On my 64-bit Linux system, this happens at 10^-7:
R> now <- Sys.time()
R> sapply(seq(1, 8), FUN=function(x) identical(now, now+1/10^x))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
R>
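That threshold is in line with the spacing of double-precision numbers near a present-day epoch value. A rough sketch, assuming IEEE 754 doubles with a 52-bit fraction and using the epoch value from the question in place of now:

```r
now <- 1294241400                      # epoch seconds, from the question
spacing <- 2^(floor(log2(now)) - 52)   # gap between adjacent doubles here
spacing                                # about 2.4e-07

now + 1e-6 == now   # FALSE: the increment is larger than the gap
now + 1e-7 == now   # TRUE: under half the gap, so it rounds back to now
```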

It seems the problem is only in printing. Using the OP's original data:
ind <- as.POSIXct(strptime(data$datetime, '%Y-%m-%d %H:%M:%OS'))
as.numeric(ind)*1e6 # as expected
# [1] 1294241400001000 1294241400321000 1294241401511000 1294241402192000
ind # wrong
# [1] "2011-01-05 09:30:00.000 CST" "2011-01-05 09:30:00.321 CST"
# [3] "2011-01-05 09:30:01.510 CST" "2011-01-05 09:30:02.191 CST"
x <- xts(data[-1], ind)
x # wrong
# price
# 2011-01-05 09:30:00.000 18.31
# 2011-01-05 09:30:00.321 18.33
# 2011-01-05 09:30:01.510 18.33
# 2011-01-05 09:30:02.191 18.34
as.numeric(index(x))*1e6 # but the underlying index values are as expected
# [1] 1294241400001000 1294241400321000 1294241401511000 1294241402192000
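If only the display is affected, one workaround is to round to the nearest millisecond yourself before formatting. The helper below is a hypothetical sketch (fmt_ms is not part of xts or base R), and it assumes UTC for display so that the output does not depend on the local time zone.

```r
# Format a POSIXct value rounded (not truncated) to milliseconds.
fmt_ms <- function(x) {
  ms <- round(as.numeric(x) * 1000)   # nearest whole millisecond
  whole <- as.POSIXct(ms %/% 1000, origin = "1970-01-01", tz = "UTC")
  sprintf("%s.%03d", format(whole, "%Y-%m-%d %H:%M:%S"), as.integer(ms %% 1000))
}

t1 <- as.POSIXct(1294241400.001, origin = "1970-01-01", tz = "UTC")
fmt_ms(t1)
# [1] "2011-01-05 15:30:00.001"
```

The time reads 15:30 rather than 09:30 because the sketch prints in UTC while the thread's output is in CST.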

I post this just so people who want to explore it can have a reproducible example which shows that it happens on more than just the OP's system. Applying as.character to the factor does not keep it from occurring.
dat <- read.table(textConnection(" datetime\tprice
1\t2011-01-05 09:30:00.001\t18.31
2\t2011-01-05 09:30:00.321\t18.33
3\t2011-01-05 09:30:01.511\t18.33
4\t2011-01-05 09:30:02.192\t18.34"), header =TRUE, sep="\t")
as.character(dat$datetime)
#[1] "2011-01-05 09:30:00.001" "2011-01-05 09:30:00.321" "2011-01-05 09:30:01.511"
#[4] "2011-01-05 09:30:02.192"
strptime(as.character(dat$datetime), '%Y-%m-%d %H:%M:%OS')
#[1] "2011-01-05 09:30:00" "2011-01-05 09:30:00" "2011-01-05 09:30:01"
#[4] "2011-01-05 09:30:02"
as.POSIXct(strptime(as.character(dat$datetime),
'%Y-%m-%d %H:%M:%OS'))
#[1] "2011-01-05 09:30:00 EST" "2011-01-05 09:30:00 EST" "2011-01-05 09:30:01 EST"
#[4] "2011-01-05 09:30:02 EST"
x <- xts(dat[-1],
order.by=as.POSIXct(strptime(as.character(dat$datetime),
'%Y-%m-%d %H:%M:%OS')))
x
#### price
2011-01-05 09:30:00 18.31
2011-01-05 09:30:00 18.33
2011-01-05 09:30:01 18.33
2011-01-05 09:30:02 18.34
indexFormat(x) <- '%Y-%m-%d %H:%M:%OS6'
x
price
2011-01-05 09:30:00.000999 18.31
2011-01-05 09:30:00.321000 18.33
2011-01-05 09:30:01.510999 18.33
2011-01-05 09:30:02.191999 18.34
sessionInfo()
R version 2.13.1 RC (2011-07-03 r56263)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid splines stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] xts_0.8-2 zoo_1.7-4 sculpt3d_0.2-2 RGtk2_2.20.12
[5] rgl_0.92.798 survey_3.24 hexbin_1.26.0 spam_0.23-0
[9] xtable_1.5-6 polspline_1.1.5 Ryacas_0.2-10 XML_3.4-0
[13] rms_3.3-1 Hmisc_3.8-3 survival_2.36-9 sos_1.3-0
[17] brew_1.0-6 lattice_0.19-30
loaded via a namespace (and not attached):
[1] cluster_1.14.0 tools_2.13.1

Related

Why does Rcpp corrupt xts object?

Say I have an xts object and return the index via an Rcpp function. Touching the xts object in this way seems to corrupt the xts object.
It can be fixed by forcing a deep copy.
While I do have a work-around, I don't understand why the problem exists, or why my hack is required.
Using the suggested code from Dirk's Rcpp Gallery, the xts object is corrupted once touched.
// [[Rcpp::export]]
DatetimeVector xtsIndex(NumericMatrix X) {
DatetimeVector v(NumericVector(X.attr("index")));
return v;
}
require(xts)
xx <- xts(1:10, order.by = seq.Date(Sys.Date(), by = "day", length.out = 10))
xtsIndex(xx)
...
> print(xx)
Error in Ops.POSIXt(.index(x), 86400) :
'%/%' not defined for "POSIXt" objects
Tweaking the code to force a deep copy prevents the corruption.
// [[Rcpp::export]]
DatetimeVector xtsIndex_deep(NumericMatrix X) {
DatetimeVector v = clone(NumericVector(X.attr("index")));
return v;
}
> xtsIndex_deep(xx)
[1] "2021-05-13 UTC" "2021-05-14 UTC" "2021-05-15 UTC" "2021-05-16 UTC" "2021-05-17 UTC"
[6] "2021-05-18 UTC" "2021-05-19 UTC" "2021-05-20 UTC" "2021-05-21 UTC" "2021-05-22 UTC"
> xx
[,1] [,2]
2021-05-13 1 10
2021-05-14 2 9
2021-05-15 3 8
2021-05-16 4 7
2021-05-17 5 6
2021-05-18 6 5
2021-05-19 7 4
2021-05-20 8 3
2021-05-21 9 2
2021-05-22 10 1
What's going on?
I cannot reproduce that. With a simpler attribute extraction function all is well and xx is not altered:
> cppFunction("SEXP xtsIndex(NumericMatrix X) { SEXP s = X.attr(\"index\"); return s; } ")
> xx <- xts(1:10, order.by = seq.Date(Sys.Date(), by = "day", length.out = 10))
> head(xx)
[,1]
2021-05-13 1
2021-05-14 2
2021-05-15 3
2021-05-16 4
2021-05-17 5
2021-05-18 6
>
> xtsIndex(xx)
[1] 1620864000 1620950400 1621036800 1621123200 1621209600 1621296000
[7] 1621382400 1621468800 1621555200 1621641600
attr(,"tzone")
[1] "UTC"
attr(,"tclass")
[1] "Date"
>
> head(xx)
[,1]
2021-05-13 1
2021-05-14 2
2021-05-15 3
2021-05-16 4
2021-05-17 5
2021-05-18 6
>
The function xtsIndex will create a copy on input (as our xts object contains an integer sequence as data, the NumericMatrix will surely be a copied object, but it retains the attribute we can extract).
Note, however, how the Date sequence from xx is now displayed in units of a POSIXct or Datetime. This looks like a possible error from a coercion which xts (or possibly Rcpp, but I think it is xts) may do here. You are probably better off starting with a POSIXct time object even if it is daily data.
Doing so also allows us to properly type the extractor function for Datetime:
> cppFunction("DatetimeVector xtsIndex(NumericMatrix X) {
return DatetimeVector(wrap(X.attr(\"index\"))); } ")
> xx <- xts(1:10, order.by = as.POSIXct(seq.Date(Sys.Date(), by = "day", length.out = 10)))
> head(xx)
[,1]
2021-05-12 19:00:00 1
2021-05-13 19:00:00 2
2021-05-14 19:00:00 3
2021-05-15 19:00:00 4
2021-05-16 19:00:00 5
2021-05-17 19:00:00 6
> head(xtsIndex(xx))
[1] "2021-05-12 19:00:00 CDT" "2021-05-13 19:00:00 CDT" "2021-05-14 19:00:00 CDT"
[4] "2021-05-15 19:00:00 CDT" "2021-05-16 19:00:00 CDT" "2021-05-17 19:00:00 CDT"
> head(xx)
[,1]
2021-05-12 19:00:00 1
2021-05-13 19:00:00 2
2021-05-14 19:00:00 3
2021-05-15 19:00:00 4
2021-05-16 19:00:00 5
2021-05-17 19:00:00 6
>

How to create a sequence of times with different intervals based on a start time

I want to generate a sequence of times from a starting time, where the gaps between consecutive times are 15 minutes, then 30 minutes, then four gaps of 20 minutes. I'm very new to R, so I don't really know where to start with this.
start_time <- "2019-08-13 12:00"
intervals <- c(0, 15, 30, rep(20,4))
seconds <- cumsum(intervals)*60
lapply(seconds, function(x)strptime(start_time, format="%Y-%m-%d %H:%M") + x)
# [[1]]
# [1] "2019-08-13 12:00:00 +08"
#
# [[2]]
# [1] "2019-08-13 12:15:00 +08"
#
# [[3]]
# [1] "2019-08-13 12:45:00 +08"
#
# [[4]]
# [1] "2019-08-13 13:05:00 +08"
#
# [[5]]
# [1] "2019-08-13 13:25:00 +08"
#
# [[6]]
# [1] "2019-08-13 13:45:00 +08"
#
# [[7]]
# [1] "2019-08-13 14:05:00 +08"
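The same result can also be kept as a single POSIXct vector instead of a list, since adding a numeric vector of seconds to a POSIXct value is vectorised. A minimal sketch, assuming UTC (the output above was produced in a +08 zone):

```r
start_time <- as.POSIXct("2019-08-13 12:00", format = "%Y-%m-%d %H:%M", tz = "UTC")
intervals <- c(0, 15, 30, rep(20, 4))

# cumsum() turns the gaps into offsets from the start; * 60 converts to seconds.
times <- start_time + cumsum(intervals) * 60
format(times, "%H:%M")
# [1] "12:00" "12:15" "12:45" "13:05" "13:25" "13:45" "14:05"
```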
If you want to work with date and time, the answer by @Adam Quek is OK. Alternatively, if you want to work with times only, you can use the "times" class found in the chron package.
For example, you can set your starting time to 9 o’clock by typing:
starting<-times("09:00:00")
Then you can add 15 minutes by:
starting+times("00:15:00")
[1] 09:15:00
You can also add a fraction of a day, with the number of minutes you want to add in the numerator and the number of minutes in one day (60*24=1440) in the denominator:
starting+15/1440
[1] 09:15:00
Therefore, you can create your sequence by:
minutesToAdd<-c(0,15,30,rep(20,4))
starting + cumsum(minutesToAdd/1440)
[1] 09:00:00 09:15:00 09:45:00 10:05:00 10:25:00 10:45:00 11:05:00

Why does dplyr convert POSIXct objects

I have a date-time object of class POSIXct. I need to adjust the values by adding several hours. I understand that I can do this using basic addition. For example, I can add 5 hours to a POSIXct object like so:
x <- as.POSIXct("2009-08-02 18:00:00", format="%Y-%m-%d %H:%M:%S")
x
[1] "2009-08-02 18:00:00 PDT"
x + (5*60*60)
[1] "2009-08-02 23:00:00 PDT"
Now I have a data frame in which some times are ok and some are bad.
> df
set_time duration up_time
1 2009-05-31 14:10:00 3 2009-05-31 11:10:00
2 2009-08-02 18:00:00 4 2009-08-02 23:00:00
3 2009-08-03 01:20:00 5 2009-08-03 06:20:00
4 2009-08-03 06:30:00 2 2009-08-03 11:30:00
Note that the first data frame entry has an 'up_time' less than the 'set_time'. So in this context a 'good' time is one where the set_time < up_time. And a 'bad' time is one in which set_time > up_time. I want to leave the good entries alone and fix the bad entries. The bad entries should be fixed by creating an 'up_time' that is equal to the 'set_time' + duration. I do this with the following dplyr pipe:
df1 <- tbl_df(df) %>% mutate(up_time = ifelse(set_time > up_time, set_time +
(duration*60*60), up_time))
df1
# A tibble: 4 x 3
set_time duration up_time
<dttm> <dbl> <dbl>
1 2009-05-31 14:10:00 3. 1243815000.
2 2009-08-02 18:00:00 4. 1249279200.
3 2009-08-03 01:20:00 5. 1249305600.
4 2009-08-03 06:30:00 2. 1249324200.
Up time has been coerced to numeric:
> str(df1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables:
$ set_time: POSIXct, format: "2009-05-31 14:10:00" "2009-08-02 18:00:00" "2009-08-03 01:20:00" "2009-08-03 06:30:00"
$ duration: num 3 4 5 2
$ up_time : num 1.24e+09 1.25e+09 1.25e+09 1.25e+09
I can convert it back to the desired POSIXct format using:
> as.POSIXct(df1$up_time,origin="1970-01-01")
[1] "2009-05-31 17:10:00 PDT" "2009-08-02 23:00:00 PDT" "2009-08-03 06:20:00 PDT" "2009-08-03 11:30:00 PDT"
But I feel like this last step shouldn't be necessary. How can I avoid having the class of my variable changed?
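The coercion most likely comes from base ifelse() rather than from dplyr itself: ifelse() builds its result from the test vector and drops the class of the yes/no values. The sketch below uses a two-row stand-in for df (UTC assumed, not the original PDT) and sidesteps ifelse() by assigning into the existing POSIXct column.

```r
df <- data.frame(
  set_time = as.POSIXct(c("2009-05-31 14:10:00", "2009-08-02 18:00:00"), tz = "UTC"),
  duration = c(3, 4),
  up_time  = as.POSIXct(c("2009-05-31 11:10:00", "2009-08-02 23:00:00"), tz = "UTC")
)

# ifelse() returns a bare numeric vector here, which is the coercion seen above.
class(ifelse(df$set_time > df$up_time, df$set_time, df$up_time))

# Assigning into the existing POSIXct column keeps its class.
bad <- df$set_time > df$up_time
df$up_time[bad] <- df$set_time[bad] + df$duration[bad] * 3600
df$up_time
```

Within a dplyr pipe, dplyr::if_else() is documented as type-preserving and may be a drop-in replacement for ifelse() in the mutate() call.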

Class changing when assigning rows from vector. R 3.10, Windows 7, 64bit

I have a POSIXct class vector containing am hours and I want to replace the values in a data frame containing a character class column. When I do the replacement the class changes to character. I'm proceeding as follows:
class(data2014.im.t[,2])
[1] "character"
class(horas.am)
[1] "POSIXct" "POSIXt"
head(horas.am)
[1] "1970-01-01 09:00:00 COT" "1970-01-01 10:00:00 COT" "1970-01-01 11:00:00 COT" "1970-01-01 12:00:00 COT"
[5] "1970-01-01 01:00:00 COT" "1970-01-01 02:00:00 COT"
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
class(data2014.im.t[,2])
[1] "character"
head(data2014.im.t[,2])
[1] "50400" "54000" "57600" "104400" "64800" "68400"
Evidently I would like to have a POSIXct column containing hours. Any thoughts?
You should explicitly do the conversion yourself:
#sample data
horas.am <- seq(as.POSIXct("2014-01-01 05:00:00"), length.out=10, by="2 hours")
data2014.im.t <- data.frame(a=1:10, b=rep("a",10), stringsAsFactors=FALSE)
class(data2014.im.t[,2])
# [1] "character"
class(horas.am)
# [1] "POSIXct" "POSIXt"
# NO:
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- horas.am
# YES
data2014.im.t[grep("([a])", data2014.im.t[,2]), 2] <- as.character(horas.am)
data2014.im.t
# a b
# 1 1 2014-01-01 05:00:00
# 2 2 2014-01-01 07:00:00
# 3 3 2014-01-01 09:00:00
# 4 4 2014-01-01 11:00:00
# 5 5 2014-01-01 13:00:00
# 6 6 2014-01-01 15:00:00
# 7 7 2014-01-01 17:00:00
# 8 8 2014-01-01 19:00:00
# 9 9 2014-01-01 21:00:00
# 10 10 2014-01-01 23:00:00
class(data2014.im.t[,2])
# [1] "character"
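If the goal is a POSIXct column rather than character strings, one option is to keep the times in a separate column that is POSIXct from the start. A minimal sketch with made-up data, a hypothetical column name hora, and UTC assumed:

```r
df <- data.frame(a = 1:3, b = c("a", "x", "a"), stringsAsFactors = FALSE)
horas <- as.POSIXct(c("2014-01-01 05:00:00", "2014-01-01 07:00:00"), tz = "UTC")

# Start with an all-NA POSIXct column, then fill only the matching rows.
df$hora <- as.POSIXct(rep(NA_real_, nrow(df)), origin = "1970-01-01", tz = "UTC")
df$hora[grep("a", df$b)] <- horas
class(df$hora)
```

Because the target column is already POSIXct, the partial assignment does not fall back to character.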

Why are my csv file dates not parsing via mdy (lubridate)

I'm hoping someone can shed some light on why lubridate isn't parsing my dates correctly. I'm reading in a fairly large csv file to a data frame, so my issue isn't necessarily reproducible, but I'll show my steps:
require("lubridate")
pricingAddy = "C/DailyData.csv"
pricingData = as.data.frame(read.csv(pricingAddy, header = TRUE, stringsAsFactors = FALSE))
sampleHead = head(pricingData)
sampleHead
Symbol TradeDate PX_OPEN PX_HIGH PX_LOW PX_LAST PX_VOLUME MOV_AVG_20D MOV_AVG_50D MOV_AVG_100D MOV_AVG_200D
1 A 1/2/2014 57.10 57.100 56.15 56.21 1916160 56.0765 53.7096 51.5385 47.7321
2 A 1/3/2014 56.39 57.345 56.26 56.92 1866651 56.2435 53.8276 51.6432 47.8032
3 A 1/6/2014 57.40 57.700 56.56 56.64 1777472 56.4005 53.9474 51.7404 47.8781
4 A 1/7/2014 56.95 57.630 56.93 57.45 1463208 56.5315 54.0740 51.8498 47.9591
5 A 1/8/2014 57.33 58.540 57.17 58.39 2659468 56.6980 54.2044 51.9641 48.0454
6 A 1/9/2014 58.40 58.680 57.87 58.41 1757647 56.8515 54.3428 52.0803 48.1284
mdy(sampleHead["TradeDate"])
[1] NA
Warning message:
All formats failed to parse. No formats found.
dts = c("1/2/2014", "1/3/2014", "1/6/2014", "1/7/2014", "1/8/2014", "1/9/2014")
sampleHead["TradeDate"] == dts
TradeDate
1 TRUE
2 TRUE
3 TRUE
4 TRUE
5 TRUE
6 TRUE
mdy(dts)
[1] "2014-01-02 UTC" "2014-01-03 UTC" "2014-01-06 UTC" "2014-01-07 UTC" "2014-01-08 UTC" "2014-01-09 UTC"
Any takers? I haven't seen this before. Thanks in advance...
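One likely cause, offered as an assumption since the thread has no confirmed answer: sampleHead["TradeDate"] with single brackets returns a one-column data frame, not a character vector, and the parser then sees one deparsed string instead of six dates. The sketch below shows the difference using base R's as.Date() on a two-row stand-in for the data.

```r
sampleHead <- data.frame(TradeDate = c("1/2/2014", "1/3/2014"),
                         stringsAsFactors = FALSE)

class(sampleHead["TradeDate"])     # "data.frame"
class(sampleHead[["TradeDate"]])   # "character"

# Parsing the extracted character vector works as expected.
d <- as.Date(sampleHead[["TradeDate"]], format = "%m/%d/%Y")
format(d)
# [1] "2014-01-02" "2014-01-03"
```

With lubridate loaded, the equivalent call would be mdy(sampleHead$TradeDate) or mdy(sampleHead[["TradeDate"]]).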
