I want to subset a range of quarterly data held inside an xts object.
I see the documentation says "xts provides facilities for indexing based on any of the current time-based classes. These include yearqtr"
However, I have tried the following calls, which do return a range of data, but not for the dates I request.
a = as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
a["1983"] # Returns 1983Q2 - 1984Q1 ?
a["1983-01/"] # Begins in 1983Q2 ?
a["1981-01/1983-03"] # Returns 1981Q2 - 1983Q2 ?
a[as.yearqtr("1981 Q2")] # Correct
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
This looks like a timezone issue. The xts index is always stored as a POSIXct object, even if the index class is something else. As with a Date-classed index, a yearqtr (or yearmon) classed index should have its timezone set to "UTC".
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4), tzone="UTC")
> a["1983"]
[,1]
1983 Q1 1.4877302
1983 Q2 -0.4594768
1983 Q3 -0.1906189
1983 Q4 -1.1518943
Warning message:
timezone of object (UTC) is different than current timezone ().
You can safely ignore the warning. If it really bothers you, you can set your R session's timezone to "UTC" via:
> Sys.setenv(TZ="UTC")
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
> a["1983"]
[,1]
1983 Q2 1.84636890
1983 Q3 -0.06872544
1983 Q4 -2.29822631
1984 Q1 -1.46025131
This will never work:
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
It looks like you're trying to do something like: a["1981 Q1/1983 Q3"], which isn't supported because "YYYY Qq" is not an ISO8601 format.
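For a quarter-to-quarter range, one option is to build a logical mask from the index rather than combining two yearqtr values with `/`, which R treats as ordinary arithmetic division. The sketch below assumes the UTC-indexed series from above and that `index(a)` returns yearqtr values for this object; the names `idx` and `sub` are only illustrative.

```r
library(xts)  # also attaches zoo, which provides as.yearqtr()

# Quarterly series with a UTC index, as suggested above
a <- as.xts(ts(rnorm(20), start = c(1980, 1), freq = 4), tzone = "UTC")

# Compare the yearqtr index against both endpoints and subset with the mask
idx <- index(a)
sub <- a[idx >= as.yearqtr("1981 Q1") & idx <= as.yearqtr("1983 Q3")]
sub
```

An ISO8601 range string such as a["1981-01/1983-09"] should select the same quarters once the index timezone is "UTC".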
I am using data.table::fread to import a large dataset with 7.5 million rows and 56 columns.
I specify variable classes for certain variables to be read as character using colClasses argument.
Data looks fine after importing, and column classes are correctly made.
However, problems start when I filter the data. Unfortunately, I can't construct a reproducible example because I don't know exactly what the problem is.
Basically, the wrong rows come back when I filter with logical comparisons.
The code below shows the problem.
First of all, the 'id' column is read as character, and str/glimpse/class/mode all confirm it. So why does this happen:
mydata[mydata$id == 01005845, year]
[1] 2015 2014 2013 2012
mydata[mydata$id == "01005845", year]
[1] 2011 2010 2009 2008 2007 2006 2005
To further test I rechecked the data class for these specific observations, which still shows as character:
typeof(mydata[mydata$id == 01005845, id])
[1] "character"
glimpse(mydata[mydata$id == 01005845, id])
chr [1:4] "01005845" "01005845" "01005845" "01005845"
mydata[mydata$id == 01005845, id] == 01005845
TRUE TRUE TRUE TRUE
This does not make any sense to me, because for some other ids, which are also character, I don't get this weird result.
All in all, this inconsistency between character and numeric comparison breaks my analysis. My filters do not work correctly and the output is badly affected.
I appreciate all your help.
Take care :)
My guess is that this is because 01005845 != "01005845"
If you type 01005845 in the console, you get
>01005845
#[1] 1005845
01005845 is a number, and the 0 at the beginning of a number carries no significance, so it is stripped off. This issue is not specific to data.table or fread; it is a general consequence of how numbers are handled.
Since the id column is character, you should use
mydata[id == "01005845", year]
This small example demonstrates the issue
library(data.table)
df <- data.table(a = c('2011', '02011', '2011', '2012', '02011'), b = letters[1:5])
df[a == 02011, ]
# a b
#1: 2011 a
#2: 2011 c
df[a == "02011", ]
# a b
#1: 02011 b
#2: 02011 e
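The same behaviour can be seen in base R without data.table. When a character vector is compared with a number, R converts the number to a string first, and that string no longer has the leading zero. A minimal sketch (the vector `ids` is made up for illustration):

```r
ids <- c("01005845", "1005845")

num_as_text <- as.character(01005845)  # "1005845": the leading zero is gone
match_numeric <- ids == 01005845       # compares against "1005845"
match_string <- ids == "01005845"      # compares against the padded string

# If an id only exists as a number, it can be padded back to 8 digits
padded <- sprintf("%08d", 1005845L)    # "01005845"
```

Quoting the id, or padding it with sprintf() before comparing, keeps the comparison in character space.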
I inherited some R code that analyses simulation results. At one point, that code calls the xts package's to.monthly function with indexAt = 'yearmon' to summarize some values in a zoo.
That code normally runs without issue. Recently, however, when analysing simulations over much older data, the call to to.monthly generated some disturbing Warning messages like this:
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I culled my data down to the minimum size that still exhibits this Warning. Start with this R code:
library(xts)
z = structure(c(-1062503.35419463, -1080996.55425821, -1099783.92018741,
-1122831.06978888, -1138804.79976585, -1158620.33101501, -1163717.44859603,
-1183250.17288897, -1212428.97863421, -1234981.23171341, -1253605.89670471,
-1269885.84780747, -1272023.98376509, -1284471.17954946, -1313114.61914572,
-1334861.551294, -1349971.87378146, -1360596.77251109, -1363047.71977556,
-1383840.30131117, -1407963.97518998, -1427010.7195352, -1451908.36211767,
-1464563.94519573, -1470017.67402451, -1503642.02732151, -1529231.67395429,
-1560593.79655716, -1582052.24505653, -1595391.99583389), index = structure(c(1111985820,
1112072340, 1112158740, 1112245140, 1112331540, 1112392740, 1112587140,
1112673540, 1112759880, 1112846340, 1112932200, 1112993940, 1113191940,
1113278340, 1113364560, 1113451080, 1113537540, 1113598740, 1113796560,
1113883140, 1113969540, 1114055940, 1114142220, 1114203540, 1114401480,
1114487940, 1114574280, 1114660740, 1114747080, 1114808340), class = c("POSIXct",
"POSIXt")), class = "zoo")
class(z)
head(z)
tail(z)
Then execute this call to to.monthly:
to.monthly(z, indexAt = 'yearmon', name = "Monthly")
On my machine that generates this output:
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
Warning in zoo(xx, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
Monthly.Open Monthly.High Monthly.Low Monthly.Close
Apr 2005 -1062503 -1062503 -1138805 -1138805
Apr 2005 -1158620 -1158620 -1595392 -1595392
Note the Warning messages, followed by the result of to.monthly, which is a zoo that has the duplicate position of "Apr 2005".
I spent some time executing the code in to.monthly line by line, and determined that the bug actually happens inside to.monthly's call to to.period.
In particular, I found that the xx local variable inside to.period is initially calculated correctly, but after the line
indexClass(xx) <- indexAt
is executed, the positions of xx become non-unique.
That behavior sure looks like a bug in the xts package's to.period function to me.
I would love to hear from someone who knows how to.monthly/to.period/yearmon really work. Please either confirm that this is a bug, or explain why it is not and give me a workaround.
I found this possibly related report on the xts github page (which I do not fully understand).
Concerning my machine:
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
...
other attached packages:
...
xts_0.10-0
zoo_1.8-0
When I startup Rgui, I see this Warning message about xts:
Warning: package ‘xts’ was built under R version 3.4.2
This looks like a bug, unrelated to #158. The problem is that the index of z is POSIXct in your local timezone. You aggregate to monthly, which doesn't have a timezone (so xts sets the timezone attribute to "UTC").
But the change in timezone occurs on the POSIXct index, which changes the local time before the index is converted to "yearmon". So, depending on your local timezone's offset from UTC, this may convert the first (last) observation in a month into the last (first) observation of the prior (next) month.
To illustrate:
Sys.setenv(TZ = "America/Chicago")
debugonce(xts:::`indexClass<-.xts`)
to.monthly(z, indexAt="yearmon", name="monthly")
# <snip>
# Browse[2]>
# debug: attr(attr(x, "index"), "tzone") <- "UTC"
# Browse[2]> print(x) # When timezone is "America/Chicago"
# monthly.Open monthly.High monthly.Low monthly.Close
# 2005-03-31 22:59:00 -1062503 -1062503 -1138805 -1138805
# 2005-04-29 15:59:00 -1158620 -1158620 -1595392 -1595392
# Browse[2]>
# debug: attr(attr(x, "index"), "tclass") <- value
# Browse[2]> print(x) # When timezone is "UTC"
# monthly.Open monthly.High monthly.Low monthly.Close
# 2005-04-01 04:59:00 -1062503 -1062503 -1138805 -1138805
# 2005-04-29 20:59:00 -1158620 -1158620 -1595392 -1595392
# Warning message:
# timezone of object (UTC) is different than current timezone ().
You can see that the call to attr(attr(x, "index"), "tzone") <- "UTC" pushed the last observation in March into the first day of April (note that the debugger lists the next call it will evaluate above my calls to print(x)).
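The shift can also be reproduced in base R, without xts, by formatting one timestamp in both timezones. This sketch takes the fifth index value from the question's data (the last March observation) and assumes the "America/Chicago" zone used above; `obs`, `month_local` and `month_utc` are illustrative names.

```r
# Last March observation from the example index (seconds since the epoch)
obs <- as.POSIXct(1112331540, origin = "1970-01-01", tz = "America/Chicago")

local_stamp <- format(obs, "%Y-%m-%d %H:%M", tz = "America/Chicago")  # "2005-03-31 22:59"
utc_stamp <- format(obs, "%Y-%m-%d %H:%M", tz = "UTC")                # "2005-04-01 04:59"

# The month label depends on which timezone is used for the conversion
month_local <- format(obs, "%Y-%m", tz = "America/Chicago")
month_utc <- format(obs, "%Y-%m", tz = "UTC")
```

The same instant falls in March locally and in April in UTC, which is how two rows end up labelled "Apr 2005".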
Thanks for narrowing it down to the indexClass<- call. That made it a lot easier for me to debug!
When I try to create an xts object using an existing column with years as characters, the xts index automatically includes today's month and day instead of only the year I specified. Is there any way to include only the year?
Here's my code:
global_totals_ts <- xts(global_totals_m[,-1], as.Date(ts_index, format = "%Y"))
and the output that I get is:
Christians Muslims Hindus Agnostics Buddhists
1900-05-17 557754602 200318122 202973290 3028610 126956371
1910-05-17 611362430 222347113 223383337 3368564 138064000
1950-05-17 870653646 338066461 323138775 129261500 175510794
1970-05-17 1229448027 570772699 462980539 544290164 234957917
2000-05-17 1987502477 1292170756 822391937 660693376 452314303
2005-05-17 2130604801 1427056087 893077485 669224713 477436475
I want the following output:
Christians Muslims Hindus Agnostics Buddhists
1900 557754602 200318122 202973290 3028610 126956371
1910 611362430 222347113 223383337 3368564 138064000
1950 870653646 338066461 323138775 129261500 175510794
1970 1229448027 570772699 462980539 544290164 234957917
2000 1987502477 1292170756 822391937 660693376 452314303
2005 2130604801 1427056087 893077485 669224713 477436475
thanks very much!
Date objects will always have days (because dates have days).
One alternative is to keep it as a date, but floor it by year. Then the dates are always the first day of the year, so that, for instance, group_by() operations will be done by year.
library(lubridate)
global_totals_ts <- xts(global_totals_m[,-1], floor_date(as.Date(ts_index, format = "%Y"), "year"))
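The same flooring can be sketched in base R by building the 1 January dates directly. This assumes ts_index is a character vector of four-digit years, as the question describes; the values below are copied from the question's output, and `year_start` is an illustrative name.

```r
# Assumed shape of ts_index: years as character strings
ts_index <- c("1900", "1910", "1950", "1970", "2000", "2005")

# With format = "%Y" alone, as.Date() fills in the current month and day,
# so append an explicit month and day instead
year_start <- as.Date(paste0(ts_index, "-01-01"))

# The year can still be recovered for display or grouping
year_label <- format(year_start, "%Y")
```

The resulting vector can be passed as the order.by argument in place of the as.Date(ts_index, format = "%Y") call.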
I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a begin date and end date this way:
The start period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far I've got this close to solving the problem:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? thanks
UPDATE:
Managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
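Before attaching such a sequence to real prices, it may be worth checking where it ends. The sketch below assumes one observation per calendar day, which is what seq(..., by = "1 day") produces; `dates` and `last_date` are illustrative names.

```r
# 1700 consecutive calendar days starting on 25 January 2009
dates <- seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700)

# Date of the 1700th observation under that assumption
last_date <- dates[length(dates)]
last_date  # 2013-09-20
```

Since the question expects the last value on 14 May 2013, the series probably does not have one value per calendar day, in which case the actual dates would have to come from the data source.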
R stores Date-classed objects as the integer offset from "1970-01-01", but the as.Date.numeric function takes an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the .numeric suffix, since R automatically dispatches to that method when you call the generic, as.Date, with an integer argument. I only wrote it out because as.Date.numeric has different arguments than as.Date.character.
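A quick way to see the dispatch is to compare the generic call, the explicit method call, and plain date arithmetic. This is a small sketch using the same origin as above; the variable names are illustrative.

```r
offsets <- 1:10

via_generic <- as.Date(offsets, origin = "2009-01-24")          # dispatches on the numeric input
via_method <- as.Date.numeric(offsets, origin = "2009-01-24")   # explicit method call
via_addition <- as.Date("2009-01-24") + offsets                 # plain Date arithmetic

all(via_generic == via_method)
all(via_generic == via_addition)
```

All three give the same ten dates, starting at 2009-01-25.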
I am using the following code:
dates<-seq(as.Date("1991/1/4"),as.Date("2010/3/1"),"days")
However, I would like to only have working days, how can it be done?
(Assuming that 1991/1/4 is a Monday, I would like to exclude: 1991/6/4 and 1991/7/4.
And that for each week.)
Thank you for your help.
Would this work for you? (note, it requires the timeDate package to be installed)
# install.packages('timeDate')
require(timeDate)
# A ’timeDate’ Sequence
tS <- timeSequence(as.Date("1991/1/4"), as.Date("2010/3/1"))
tS
# Subset weekdays
tW <- tS[isWeekday(tS)]; tW
dayOfWeek(tW)
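You are entering your dates incorrectly. 1991/1/4 being a Monday implies the YYYY/DD/MM input mode, and to read that you need a format string in as.Date, namely format="%Y/%d/%m". Note that the code here uses "%Y/%m/%d", which reads the strings as year/month/day and therefore starts on Friday, 4 January 1991; swap the last two fields for day-first input.
So the full solution, assuming you want to exclude weekends, is: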
X <- seq( as.Date("1991/1/4", format="%Y/%m/%d"), as.Date("2010/3/1", format="%Y/%m/%d"),"days")
weekdays.X <- X[ ! weekdays(X) %in% c("Saturday", "Sunday") ]
# negation easier since only two cases in exclusion
# probably do not want to print that vector to screen.
str(weekdays.X)
Regarding your comment I am unable to reproduce. I get:
> table(weekdays(weekdays.X) )
Friday Monday Thursday Tuesday Wednesday
1000 1000 999 999 999
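For day-first input, a base R sketch could look like the following. It assumes the question's strings mean year/day/month (so "1991/1/4" is 1 April 1991), and it filters on the "%u" weekday number rather than weekdays(), because weekday names depend on the locale. The names `start`, `end`, `days` and `workdays` are illustrative.

```r
# Parse the question's strings day-first (assumption: "1991/1/4" is 1 April 1991)
start <- as.Date("1991/1/4", format = "%Y/%d/%m")
end <- as.Date("2010/3/1", format = "%Y/%d/%m")
days <- seq(start, end, by = "day")

# "%u" gives the ISO weekday number, "1" (Monday) to "7" (Sunday)
start_weekday <- format(start, "%u")
workdays <- days[!format(days, "%u") %in% c("6", "7")]
str(workdays)
```

With this parsing the sequence starts on a Monday, and 1991/6/4 and 1991/7/4 (6 and 7 April 1991) are dropped as a Saturday and a Sunday.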
I came to this question while looking up business day functions, and since the OP requested "business days" instead of "weekdays", and timeDate also has the isBizday function, this answer uses that.
library(timeDate)
# A timeDate sequence: a short example period with three London holidays
date.sequence <- timeSequence(as.Date("1991-12-15"), as.Date("1992-01-15"))
date.sequence
# Holidays in the period. The locale was not specified by the OP in the question
# or in the profile, so this assumes holidayLONDON as an example; timeDate also
# supports holidayNERC, holidayNYSE, holidayTSX and holidayZURICH.
years.included <- unique(as.integer(format(x = date.sequence, format = "%Y")))
holidays <- holidayLONDON(years.included)
# Subset business days
business.days <- date.sequence[isBizday(date.sequence, holidays)]
business.days
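If timeDate is not available, the same idea can be sketched in base R with a hand-written holiday vector. The three dates below are an assumption for illustration (Christmas Day, Boxing Day and New Year's Day for this period), not the output of holidayLONDON; `days`, `holiday_dates` and `biz_days` are illustrative names.

```r
# Same short example period as above
days <- seq(as.Date("1991-12-15"), as.Date("1992-01-15"), by = "day")

# Hand-written holidays (assumed for illustration)
holiday_dates <- as.Date(c("1991-12-25", "1991-12-26", "1992-01-01"))

# Keep Monday to Friday ("%u" is 1..5) and drop the listed holidays
is_weekday <- !format(days, "%u") %in% c("6", "7")
biz_days <- days[is_weekday & !days %in% holiday_dates]
biz_days
```

Maintaining the holiday vector by hand is the main drawback compared with the calendar functions in timeDate.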