I inherited some R code that analyses simulation results. At one point, that code calls the xts package's to.monthly function with indexAt = 'yearmon' to summarize some values in a zoo object.
That code normally runs without issue. Recently, however, when analysing simulations over much older data, the call to to.monthly generated some disturbing Warning messages like this:
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I culled my data down to the minimum size that still exhibits this Warning. Start with this R code:
library(xts)
z = structure(c(-1062503.35419463, -1080996.55425821, -1099783.92018741,
-1122831.06978888, -1138804.79976585, -1158620.33101501, -1163717.44859603,
-1183250.17288897, -1212428.97863421, -1234981.23171341, -1253605.89670471,
-1269885.84780747, -1272023.98376509, -1284471.17954946, -1313114.61914572,
-1334861.551294, -1349971.87378146, -1360596.77251109, -1363047.71977556,
-1383840.30131117, -1407963.97518998, -1427010.7195352, -1451908.36211767,
-1464563.94519573, -1470017.67402451, -1503642.02732151, -1529231.67395429,
-1560593.79655716, -1582052.24505653, -1595391.99583389), index = structure(c(1111985820,
1112072340, 1112158740, 1112245140, 1112331540, 1112392740, 1112587140,
1112673540, 1112759880, 1112846340, 1112932200, 1112993940, 1113191940,
1113278340, 1113364560, 1113451080, 1113537540, 1113598740, 1113796560,
1113883140, 1113969540, 1114055940, 1114142220, 1114203540, 1114401480,
1114487940, 1114574280, 1114660740, 1114747080, 1114808340), class = c("POSIXct",
"POSIXt")), class = "zoo")
class(z)
head(z)
tail(z)
Then execute this call to to.monthly:
to.monthly(z, indexAt = 'yearmon', name = "Monthly")
On my machine that generates this output:
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
Monthly.Open Monthly.High Monthly.Low Monthly.Close
Apr 2005 -1062503 -1062503 -1138805 -1138805
Apr 2005 -1158620 -1158620 -1595392 -1595392
Note the Warning messages, followed by the result of to.monthly, which is a zoo object with the index entry "Apr 2005" duplicated.
I spent some time executing the code in to.monthly line by line, and determined that the bug actually happens inside to.monthly's call to to.period.
In particular, I found that the xx local variable inside to.period is initially calculated correctly, but after the line
indexClass(xx) <- indexAt
is executed, the index entries of xx become non-unique.
That behavior sure looks like a bug in the xts package's to.period function to me.
I would love to hear from someone who knows how to.monthly/to.period/yearmon really work: please either confirm that this is a bug, or explain why it is not and suggest a workaround.
I found this possibly related report on the xts github page (which I do not fully understand).
Concerning my machine:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
...
other attached packages:
...
xts_0.10-0
zoo_1.8-0
When I startup Rgui, I see this Warning message about xts:
Warning: package ‘xts’ was built under R version 3.4.2
This looks like a bug, unrelated to #158. The problem is that the index of z is POSIXct in your local timezone. You aggregate to monthly, which doesn't have a timezone (so xts sets the timezone attribute to "UTC").
But the change in timezone occurs on the POSIXct index, which changes the local time before the index is converted to "yearmon". So, depending on your local timezone's offset from UTC, this may convert the first (last) observation in a month into the last (first) observation of the prior (next) month.
To illustrate:
Sys.setenv(TZ = "America/Chicago")
debugonce(xts:::`indexClass<-.xts`)
to.monthly(z, indexAt="yearmon", name="monthly")
# <snip>
# Browse[2]>
# debug: attr(attr(x, "index"), "tzone") <- "UTC"
# Browse[2]> print(x) # When timezone is "America/Chicago"
# monthly.Open monthly.High monthly.Low monthly.Close
# 2005-03-31 22:59:00 -1062503 -1062503 -1138805 -1138805
# 2005-04-29 15:59:00 -1158620 -1158620 -1595392 -1595392
# Browse[2]>
# debug: attr(attr(x, "index"), "tclass") <- value
# Browse[2]> print(x) # When timezone is "UTC"
# monthly.Open monthly.High monthly.Low monthly.Close
# 2005-04-01 04:59:00 -1062503 -1062503 -1138805 -1138805
# 2005-04-29 20:59:00 -1158620 -1158620 -1595392 -1595392
# Warning message:
# timezone of object (UTC) is different than current timezone ().
You can see that the call to attr(attr(x, "index"), "tzone") <- "UTC" pushed the last observation in March into the first day of April (note that the debugger lists the next call it will evaluate above my calls to print(x)).
Thanks for narrowing it down to the indexClass<- call. That made it a lot easier for me to debug!
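The month shift itself can be reproduced without xts. The sketch below uses only base R and takes one timestamp from the index of z above (the last observation of the first monthly bar). The "America/Chicago" timezone is the same assumption as in the debugging session; other offsets from UTC will move different observations.

```r
# Last observation of the first monthly bar, taken from the index of z.
t <- as.POSIXct(1112331540, origin = "1970-01-01", tz = "America/Chicago")

# The same instant falls in two different calendar months,
# depending on which timezone is used to print it.
local_month <- format(t, "%Y-%m", tz = "America/Chicago")
utc_month   <- format(t, "%Y-%m", tz = "UTC")

print(c(local = local_month, utc = utc_month))
```

Month labels built from local time in this way (for example as a grouping key for tapply) would keep that observation in March, which may be a usable stopgap until the conversion is fixed in xts.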
I have downloaded the DJI historical data from Yahoo as a csv for further analysis in R. Out of curiosity getSymbols("^DJI") didn't seem to work, but I digress.
The point is that I don't know how to turn this csv file into a time series format.
Here is the output and problem so far:
> DJI = read.csv("^DJI.csv")
> head(DJI)
Date Open High Low Close Adj.Close Volume
1 1/29/1985 1277.72 1295.49 1266.89 1292.62 1292.62 13560000
2 1/30/1985 1297.37 1305.10 1278.93 1287.88 1287.88 16820000
3 1/31/1985 1283.24 1293.40 1272.64 1286.77 1286.77 14070000
4 2/1/1985 1276.94 1286.11 1269.77 1277.72 1277.72 10980000
5 2/4/1985 1272.08 1294.94 1268.99 1290.08 1290.08 11630000
6 2/5/1985 1294.06 1301.13 1278.60 1285.23 1285.23 13800000
> chartSeries(DJI)
Error in try.xts(x, error = "chartSeries requires an xtsible object") :
chartSeries requires an xtsible object
So the {quantmod} function chartSeries requires an object that can be converted to xts, but the Date column in DJI is not immediately recognized as a date:
> DJI = as.Date(DJI$Date)
Error in charToDate(x) :
character string is not in a standard unambiguous format
EDIT after the answer below:
> head(DJI)
Open High Low Close Adj.Close Volume
1985-01-29 1277.72 1295.49 1266.89 1292.62 1292.62 13560000
1985-01-30 1297.37 1305.10 1278.93 1287.88 1287.88 16820000
1985-01-31 1283.24 1293.40 1272.64 1286.77 1286.77 14070000
1985-02-01 1276.94 1286.11 1269.77 1277.72 1277.72 10980000
1985-02-04 1272.08 1294.94 1268.99 1290.08 1290.08 11630000
1985-02-05 1294.06 1301.13 1278.60 1285.23 1285.23 13800000
> is.ts(DJI)
[1] FALSE
To convert the dates you need a format statement...
DJI$Date <- as.Date(DJI$Date,format="%m/%d/%Y")
chartSeries needs an object it can convert to xts; for a data frame, that means the dates must be row names rather than a separate column. You should therefore also do
rownames(DJI) <- DJI$Date
DJI$Date <- NULL #to remove the column
chartSeries(DJI)
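As a small base R illustration of why the format argument matters, the sketch below parses three of the date strings shown in the question. The vector name dates is only for illustration. Without a format, as.Date only tries the "%Y-%m-%d" and "%Y/%m/%d" layouts, so month/day/year strings are rejected.

```r
# Dates as they appear in the Yahoo CSV (month/day/year).
dates <- c("1/29/1985", "2/1/1985", "2/4/1985")

# With an explicit format the strings parse cleanly.
parsed <- as.Date(dates, format = "%m/%d/%Y")
print(parsed)

# Without a format the call fails; capture the message instead of stopping.
no_format <- tryCatch(as.Date(dates), error = function(e) conditionMessage(e))
print(no_format)
```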
In zoo an NA yearqtr is converted to the string "NA QNA" (which is not NA). For example
library(zoo)
qq <- as.yearqtr(c('2015 Q1', NA))
is.na(as.character(qq)) == is.na(qq) # returns TRUE FALSE
In contrast, with a base Date we have:
dd <- as.Date(c('2015-1-1', NA))
is.na(as.character(dd)) == is.na(dd) # returns TRUE TRUE
My impression is that the date behavior is the expected behavior. Should I report this to zoo? (And if so, what is the best way to do so? Email maintainer?)
Thanks for pointing out this bug. And yes, the simplest way to report such problems is by e-mail to the maintainer (= me).
I've just fixed the problem in the development version of zoo (1.8-0 to be) on R-Forge. After running install.packages("zoo", repos="http://R-Forge.R-project.org") you should get the expected behavior:
library("zoo")
qq <- as.yearqtr(c("2015 Q1", NA))
as.character(qq)
## [1] "2015 Q1" NA
is.na(as.character(qq)) == is.na(qq)
## [1] TRUE TRUE
A new CRAN release is planned within the next few days or next week.
I want to subset a range of quarterly data held inside an xts object.
I see the documentation says "xts provides facilities for indexing based on any of the current time-based classes. These include yearqtr"
However I have tried the following, which do produce a range of data but not the dates I request.
a = as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
a["1983"] # Returns 1983Q2 - 1984Q1 ?
a["1983-01/"] # Begins in 1983Q2 ?
a["1981-01/1983-03"] # Returns 1981Q2 - 1983Q2 ?
a[as.yearqtr("1981 Q2")] # Correct
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
Looks like a timezone issue. The xts index is always a POSIXct object, even if the index class is something else. Like a Date classed index, the yearqtr (and yearmon) classed index should have the timezone set to "UTC".
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4), tzone="UTC")
> a["1983"]
[,1]
1983 Q1 1.4877302
1983 Q2 -0.4594768
1983 Q3 -0.1906189
1983 Q4 -1.1518943
Warning message:
timezone of object (UTC) is different than current timezone ().
You can safely ignore the warning. If it really bothers you, you can set your R session's timezone to "UTC" via:
> Sys.setenv(TZ="UTC")
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
> a["1983"]
[,1]
1983 Q2 1.84636890
1983 Q3 -0.06872544
1983 Q4 -2.29822631
1984 Q1 -1.46025131
This will never work:
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
It looks like you're trying to do something like: a["1981 Q1/1983 Q3"], which isn't supported because "YYYY Qq" is not an ISO8601 format.
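If you want to start from quarter labels anyway, one option is to translate them into an ISO 8601 range string first. The helper below is hypothetical (quarter_range is not part of xts or zoo) and assumes both bounds are written exactly as "YYYY Qq".

```r
# Hypothetical helper: turn two "YYYY Qq" labels into an
# ISO 8601 "YYYY-MM/YYYY-MM" range string.
quarter_range <- function(from, to) {
  parse_q <- function(s) {
    parts <- as.integer(strsplit(s, " Q", fixed = TRUE)[[1]])
    list(year = parts[1], quarter = parts[2])
  }
  f <- parse_q(from)
  t <- parse_q(to)
  sprintf("%04d-%02d/%04d-%02d",
          f$year, (f$quarter - 1L) * 3L + 1L,  # first month of the start quarter
          t$year, t$quarter * 3L)              # last month of the end quarter
}

rng <- quarter_range("1981 Q1", "1983 Q3")
print(rng)
```

With a built using tzone = "UTC" as shown earlier, a[rng] should then select 1981 Q1 through 1983 Q3.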
I installed the quantmod package and I'm trying to import a csv file with 1 minute intraday data. Here is a sample GAZP.csv file:
"D";"T";"Open";"High";"Low";"Close";"Vol"
20130902;100100;132.2000000;133.0500000;131.9200000;132.5000000;131760
20130902;100200;132.3700000;132.5700000;132.2500000;132.2900000;66090
20130902;100300;132.3600000;132.5000000;132.2600000;132.4700000;37500
I've tried:
> getSymbols('GAZP',src='csv')
Error in `colnames<-`(`*tmp*`, value = c("GAZP.Open", "GAZP.High", "GAZP.Low", :
length of 'dimnames' [2] not equal to array extent
> getSymbols.csv('GAZP',src='csv')
> # or
> getSymbols.csv('GAZP',env,dir="c:\\!!",extension="csv")
Error in missing(verbose) : 'missing' can only be used for arguments
How should I properly use the getSymbols.csv command to read such data?
@Vladimir, if you do not insist on using the getSymbols function from the quantmod package, you can import your csv file (assuming it is in your working directory) as a zoo object with the line:
GAZP=read.zoo("GAZP.csv",sep=";",header=TRUE,index.column=list(1,2),FUN = function(D,T) as.POSIXct(paste(D, T), format="%Y%m%d %H%M%S"))
and convert it to a xts object if you want.
GAZP.xts <- as.xts(GAZP)
> GAZP
Open High Low Close Vol
2013-09-02 10:01:00 132.20 133.05 131.92 132.50 131760
2013-09-02 10:02:00 132.37 132.57 132.25 132.29 66090
2013-09-02 10:03:00 132.36 132.50 132.26 132.47 37500
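The timestamp handling can also be checked in base R alone. The sketch below reads the three sample rows from the question as inline text and builds the timestamp with the same paste and format approach. Two choices here are assumptions for illustration: D and T are read as character so that a time such as 093000 would keep its leading zero, and tz = "UTC" is set only to make the result reproducible.

```r
# The sample rows from the question, inlined so the example is self-contained.
csv_text <- '"D";"T";"Open";"High";"Low";"Close";"Vol"
20130902;100100;132.2000000;133.0500000;131.9200000;132.5000000;131760
20130902;100200;132.3700000;132.5700000;132.2500000;132.2900000;66090
20130902;100300;132.3600000;132.5000000;132.2600000;132.4700000;37500'

# Read date and time as character to preserve any leading zeros.
raw <- read.table(text = csv_text, sep = ";", header = TRUE,
                  colClasses = c(D = "character", T = "character"))

# Combine the two columns into one POSIXct timestamp.
raw$stamp <- as.POSIXct(paste(raw$D, raw$T),
                        format = "%Y%m%d %H%M%S", tz = "UTC")

print(raw[, c("stamp", "Open", "Close", "Vol")])
```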
R's base strptime function is giving me output I do not expect.
This works as expected:
strptime(20130203235959, "%Y%m%d%H%M%S")
# yields "2013-02-03 23:59:59"
This too:
strptime(20130202240000, "%Y%m%d%H%M%S")
# yields "2013-02-03"
...but this does not. Why?
strptime(20130203000000, "%Y%m%d%H%M%S")
# yields NA
UPDATE
The value 20130204000000 showed up in a log I generated on a Mac 10.7.5 system using the command:
➜ ~ echo `date +"%Y%m%d%H%M%S"`
20130204000000
UPDATE 2
I even tried lubridate, which seems to be the recommendation:
> parse_date_time(c(20130205000001), c("%Y%m%d%H%M%S"))
1 parsed with %Y%m%d%H%M%S
[1] "2013-02-05 00:00:01 UTC"
> parse_date_time(c(20130205000000), c("%Y%m%d%H%M%S"))
1 failed to parse.
[1] NA
...and then funnily enough, it printed out "00:00:00" when I added enough seconds to now() to reach midnight:
> now() + new_duration(13000)
[1] "2013-02-10 00:00:00 GMT"
I should use character and not numeric when I parse my dates:
> strptime(20130203000000, "%Y%m%d%H%M%S") # No!
[1] NA
> strptime("20130203000000", "%Y%m%d%H%M%S") # Yes!
[1] "2013-02-03"
The reason for this seems to be that my numeric value gets cast to character, and I used one too many digits:
> as.character(201302030000)
[1] "201302030000"
> as.character(2013020300000)
[1] "2013020300000"
> as.character(20130203000000)
[1] "2.0130203e+13" # This causes the error: it doesn't fit "%Y%m%d%H%M%S"
> as.character(20130203000001)
[1] "20130203000001" # And this is why anything other than 000000 worked.
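If the value really does arrive as a number, it can be turned into a fixed-notation string before parsing. This is a minimal sketch: sprintf("%.0f", ...) is one way to avoid scientific notation, and tz = "UTC" is an assumption added only to make the result reproducible.

```r
x <- 20130203000000  # numeric, as in the failing call

# Format without scientific notation, then parse the resulting string.
safe <- sprintf("%.0f", x)
parsed <- strptime(safe, "%Y%m%d%H%M%S", tz = "UTC")

print(safe)
print(format(parsed, "%Y-%m-%d %H:%M:%S"))
```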
A quick lesson in figuring out the type you need from the docs: In R, execute help(strptime) and see a popup similar to the image below.
The red arrow points to the main argument to the function, but does not specify the type (which is why I just tried numeric).
The green arrow points to the type, which is in the document's title.
you are essentially asking for the "zeroeth" second, which obviously doesn't exist :)
# last second of february 3rd
strptime(20130203235959, "%Y%m%d%H%M%S")
# first second of february 4th -- notice this 'rounds up' to feb 4th
# even though it says february 3rd
strptime(20130203240000, "%Y%m%d%H%M%S")
# no such second
strptime(20130204000000, "%Y%m%d%H%M%S")
# 2nd second of february 4th
strptime(20130204000001, "%Y%m%d%H%M%S")