How to use .indexyear in xts package - r

I'm trying to use the .indexyear function in the xts package, but can't get my head around how it's supposed to be used.
Below is some code; you can see that .indexyear returns 112, 113, 114, 115 for the years 2012, 2013, 2014, 2015. I want to check whether a certain year exists in the xts object's index, so how do I make 2012 %in% .indexyear(a) equal TRUE?
Code
Browse[1]> index(a)
[1] "2012-12-30 00:00:00 CET" "2013-12-30 00:00:00 CET" "2014-12-30 00:00:00 CET" "2015-12-30 01:00:00 CET"
Browse[1]> .indexyear(a)
[1] 112 113 114 115
Browse[1]> 2014 %in% .index(a) # should actually be TRUE!
[1] FALSE
Browse[1]> 113 %in% .indexyear(a)
[1] TRUE

The .index* functions basically wrap the components of the POSIXlt class. So see the Details section of ?POSIXlt, which says:
'year' years since 1900.
So you need to add 1900 to the output of .indexyear to get what you want.
a <- structure(1:4, .Dim = c(4L, 1L), index = structure(c(1356847200, 1388383200,
1419919200, 1451458800), tzone = "", tclass = c("POSIXct", "POSIXt")),
class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"),
tclass = c("POSIXct", "POSIXt"), .indexTZ = "UTC", tzone = "UTC")
2014 %in% (.indexyear(a)+1900)
# [1] TRUE
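To see where the offset comes from without xts, here is a minimal base-R sketch. The vector name idx is made up for illustration, and the timestamps are pinned to UTC so the result does not depend on the local time zone. It shows the raw POSIXlt year component next to two ways of getting calendar years:

```r
# Hypothetical index values, pinned to UTC to avoid time-zone surprises
idx <- as.POSIXct(c("2012-12-30", "2013-12-30", "2014-12-30", "2015-12-30"),
                  tz = "UTC")

lt <- as.POSIXlt(idx)

raw_years <- lt$year          # years since 1900, as documented in ?POSIXlt
cal_years <- lt$year + 1900   # calendar years

# format() returns the calendar year as text; convert it for numeric tests
fmt_years <- as.integer(format(idx, "%Y"))

2014 %in% raw_years   # FALSE
2014 %in% cal_years   # TRUE
```

The mon component of POSIXlt is zero-based in the same way (0 to 11), so a similar adjustment is needed when testing for calendar months.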

Related

R Group ID's with overlapping time intervals

I have a large dataset with multiple groups within the dataset of IDs with Start & Stop datetimes. What I'm trying to do is within each group identify where a subgroup occurred. A subgroup within a group would be when two ID's overlap with their START & END datetime columns. Below is script to create a sample dataset in R for one group. What I want to do is within each group create a column called, "Grp" that groups those subgroups with overlapping START & END datetimes.
What I have...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
What I want is...
structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1508379300,
1508363100, 1490918400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1509031800,
1509062400, 1492247700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Grp = c(1,2,2,1)), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END","Grp"))
I've tried using lubridate's interval, and finding an overlap that way, but no luck. Any help would be greatly appreciated.
After sorting by START, the condition for a new group is that the END of the previous row is less than the START of the next row:
head(df1$END, -1) < tail(df1$START,-1)
df1 <- structure(list(ID = c(1,2,3,4), START = structure(c(1490904000, 1490918400,
1508363100, 1508379300), tzone = "UTC", class = c("POSIXct",
"POSIXt")), END = structure(c(1492050600, 1492247700,
1509062400, 1509031800), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), class = "data.frame", row.names = c(NA, -4L), .Names = c("ID","START",
"END"))
df1
ID START END
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00
df1a <- df1[ order(df1$START), ]
df1a$grp <- cumsum( c( 1, head(df1a$END, -1) < tail(df1a$START,-1) ))
df1a
#---------------
ID START END grp
1 1 2017-03-30 20:00:00 2017-04-13 02:30:00 1
2 2 2017-03-31 00:00:00 2017-04-15 09:15:00 1
3 3 2017-10-18 21:45:00 2017-10-27 00:00:00 2
4 4 2017-10-19 02:15:00 2017-10-26 15:30:00 2
Here's a function that wraps the same logic (written in response to a comment):
grp_overlaps <- function(endings, beginnings){
cumsum(c( 1, head(endings, -1) < tail(beginnings, -1) )) }
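The pairwise test above compares each START only with the END of the row immediately before it. If an early interval is long enough to span several later ones, that test can split one overlapping cluster into several groups. The sketch below is a variation, not part of the original answer: it compares each START with the running maximum of all earlier ENDs. The function name grp_overlaps_running is hypothetical, and times are converted with as.numeric because cummax is not defined for POSIXct.

```r
# Sketch: group overlapping intervals using the running maximum END.
# Returns group ids in the ORIGINAL row order of the inputs.
grp_overlaps_running <- function(starts, ends) {
  o <- order(starts)                 # process rows in START order
  s <- as.numeric(starts)[o]
  e <- as.numeric(ends)[o]
  running_end <- cummax(e)           # latest END seen so far
  # A row opens a new group when it starts after every earlier row has ended
  new_grp <- c(TRUE, tail(s, -1) > head(running_end, -1))
  grp <- integer(length(s))
  grp[o] <- cumsum(new_grp)          # map ids back to the original order
  grp
}

# The question's data, in the row order of the desired output
START <- as.POSIXct(c(1490904000, 1508379300, 1508363100, 1490918400),
                    origin = "1970-01-01", tz = "UTC")
END   <- as.POSIXct(c(1492050600, 1509031800, 1509062400, 1492247700),
                    origin = "1970-01-01", tz = "UTC")
grp_overlaps_running(START, END)
```

For a multi-group dataset, the same function could be applied per group, for example with ave() or a split/apply step; that part is left out here.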

Date conversion as.yearmon

I have a dataframe that looks like this:
library(zoo)
head(monthly.station6)
[,1]
1995-02-28 00:00:00 2.07
1995-03-01 00:00:00 5.70
1995-04-30 01:00:00 0.65
1995-05-31 01:00:00 1.03
1995-06-30 01:00:00 0.77
1995-07-31 01:00:00 0.39
I am applying this code: monthly.station6[,0] <- as.yearmon(monthly.station6[,0]) to try to convert this into a year month format, but I think the fact that the date column is [,0] is preventing it? Not sure where I am going wrong, any help would be appreciated!
head(monthly.station6)
[,1]
Feb 1995 2.07
Mar 1995 5.70
Apr 1995 0.65
May 1995 1.03
Jun 1995 0.77
Jul 1995 0.39
as requested dput(head(monthly.station6)):
structure(c(2.07, 5.7, 0.65, 1.03, 0.77, 0.39), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts",
"zoo"), index = structure(c(793929600, 794016000, 799200000,
801878400, 804470400, 807148800), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L))
1) index The object's class is "xts", not "data.frame". Use index (or time) to modify monthly.station6:
library(xts)
index(monthly.station6) <- as.yearmon(index(monthly.station6))
2) aggregate.zoo Another possibility is to use aggregate.zoo. That will return a "zoo" object so convert it back to "xts" :
library(xts)
as.xts(aggregate(monthly.station6, as.yearmon))
3) fortify.zoo Since the question mentions data.frame, if what you really wanted was a data.frame then the first statement after library will create a data.frame with a first column of Index and the second will perform the conversion to "yearmon":
library(xts)
DF <- fortify.zoo(monthly.station6)
transform(DF, Index = as.yearmon(Index))
Note: If you want just year then it cannot be an xts object but you could represent it as a data.frame. Using DF from (3):
transform(DF, Index = as.numeric(format(Index, "%Y")))

Scatterplot of two xts time series

I've got two xts time series. A small sample of them:
ts1
[,1]
2009-05-06 00:00:00 38.414
2009-05-06 00:15:00 45.079
2009-05-06 00:30:00 38.878
2009-05-06 00:45:00 49.889
2009-05-06 01:00:00 41.270
2009-05-06 01:15:00 41.050
2009-05-06 01:30:00 38.951
2009-05-06 01:45:00 39.854
2009-05-06 02:00:00 37.803
2009-05-06 02:15:00 42.930
ts2
[,1]
2009-05-06 00:00:00 406.887
2009-05-06 00:15:00 413.298
2009-05-06 00:30:00 409.353
2009-05-06 00:45:00 412.312
2009-05-06 01:00:00 409.353
2009-05-06 01:15:00 415.271
2009-05-06 01:30:00 416.257
2009-05-06 01:45:00 416.257
2009-05-06 02:00:00 416.257
2009-05-06 02:15:00 419.216
Now I want to create a scatterplot of ts1 against ts2. According to the documentation on CRAN (and an example I found on Stack Overflow), it should work like this: plot(ts1, ts2). But I get an error.
plot(ts1,ts2)
# Error in plot(xycoords$x, xycoords$y, type = type, axes = FALSE, ann = FALSE, :
# object 'xycoords' not found
What's going wrong? It works great with a normal ts and the ~ sign, but this doesn't work in xts. I also tried plot(ts1[, 1], ts2[, 1]).
The easiest thing to do is to call plot.zoo directly, instead of allowing the plot generic to dispatch to plot.xts.
ts1 <-
structure(c(38.414, 45.079, 38.878, 49.889, 41.27, 41.05, 38.951,
39.854, 37.803, 42.93), .Dim = c(10L, 1L), index = structure(c(1241586000,
1241586900, 1241587800, 1241588700, 1241589600, 1241590500, 1241591400,
1241592300, 1241593200, 1241594100), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")
ts2 <-
structure(c(406.887, 413.298, 409.353, 412.312, 409.353, 415.271,
416.257, 416.257, 416.257, 419.216), .Dim = c(10L, 1L),
index = structure(c(1241586000, 1241586900, 1241587800, 1241588700,
1241589600, 1241590500, 1241591400, 1241592300, 1241593200, 1241594100),
tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
plot.zoo(ts1, ts2)

error while indexing by rows data.table with column containing interval object from package lubridate

I have data like this (changed to protect original data):
View(dose_merged)
SUBJECT_Blinded PACKID SACDPDAT SACRTDAT treatment_interval TS_SDAT TS_EDAT SD_SDAT SD_EDAT
1 1201201 10096 2012-04-25 2010-04-22 58 NA NA 2011-01-03 2013-01-02
2 1101401 10595 2012-01-03 2010-02-31 28 NA NA 2011-01-03 2013-01-02
3 1201001 10971 2011-11-04 2010-02-03 60 NA NA 2011-01-03 2013-01-02
4 1201001 12592 2012-03-01 2010-02-25 55 NA NA 2011-01-03 2013-01-02
With columns types in data table:
> mapply(class, dose_merged)
$SUBJECT_Blinded
[1] "numeric"
$PACKID
[1] "numeric"
$SACDPDAT
[1] "POSIXct" "POSIXt"
$SACRTDAT
[1] "POSIXct" "POSIXt"
$treatment_interval
[1] "Interval"
attr(,"package")
[1] "lubridate"
$SD_SDAT
[1] "POSIXct" "POSIXt"
$SD_EDAT
[1] "POSIXct" "POSIXt"
I am trying to index by rows, for example:
dose_merged[10:15,]
Then, I get error message:
Error in format(x@start, tz = x@tzone, usetz = TRUE) :
trying to get slot "start" from an object (class "Interval") that is not an S4 object
What's going on? :)
The first 4 rows of dput(dose_merged):
structure(list(SUBJECT_Blinded = c(2222001, 2201001, 2201001,
2222022), PACKID = c(10096, 10595, 10971, 12592), SACDPDAT = structure(c(1335304800,
1325545200, 1320361200, 1330556400), class = c("POSIXct", "POSIXt"
), tzone = ""), SACRTDAT = structure(c(1340316000, 1327964400,
1325545200, 1335304800), class = c("POSIXct", "POSIXt"), tzone = ""),
treatment_interval = structure(c(58, 28, 60, 55), class = structure("Interval", package = "lubridate")),
TS_SDAT = structure(c(NA_real_, NA_real_, NA_real_, NA_real_
), class = c("POSIXct", "POSIXt"), tzone = ""), TS_EDAT = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_), class = c("POSIXct", "POSIXt"
), tzone = ""), SD_SDAT = structure(c(1325545200, 1325545200,
1325545200, 1325545200), class = c("POSIXct", "POSIXt"), tzone = ""),
SD_EDAT = structure(c(1357081200, 1357081200, 1357081200,
1357081200), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("SUBJECT_Blinded",
"PACKID", "SACDPDAT", "SACRTDAT", "treatment_interval", "TS_SDAT",
"TS_EDAT", "SD_SDAT", "SD_EDAT"), sorted = "SUBJECT_Blinded", class = c("data.table",
"data.frame"), row.names = c(NA, -4L), .internal.selfref = <pointer: 0x0000000000300788>)

quantmod periodReturn function - how to handle NA Values?

I am using the quantmod function periodReturn, it yields the right results for the column with useable values.
This is the function: periodReturn(timeseries, period='weekly', type='log')
This is the input:
dax_data.csv nikkei_data.csv spx_data.csv
1990-01-04 01:00:00 NA 38713 NA
1990-01-05 01:00:00 NA 38275 NA
1990-01-08 01:00:00 NA 38295 NA
1990-01-09 01:00:00 NA 37951 NA
1990-01-10 01:00:00 NA 37697 NA
1990-01-11 01:00:00 NA 38170 NA
This is the output:
weekly.returns
1999-11-26 01:00:00 NA
1999-12-03 01:00:00 0.026679863
1999-12-10 01:00:00 -0.003482017
1999-12-17 01:00:00 0.041124348
1999-12-22 01:00:00 0.021583488
1999-12-30 01:00:00 0.069259912
I want to use all three columns (ldo).
How do I tell periodReturn to just NA all the rows without data and start as soon as one exists?
Here is the dput of the data to make this reproducible:
dput(head(timeseries))
structure(c(NA, NA, NA, NA, NA, NA, 38713, 38275, 38295, 37951,
37697, 38170, NA, NA, NA, NA, NA, NA), .Dim = c(6L, 3L), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "", class = c("xts",
"zoo"), index = structure(c(631411200, 631497600, 631756800,
631843200, 631929600, 632016000), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dimnames = list(NULL, c("dax_data.csv", "nikkei_data.csv",
"spx_data.csv")))
Instead of passing timeseries directly, pass only the rows in which every column is non-NA:
timeseries[apply(!is.na(timeseries), 1, all), ]
periodReturn does not work for multi-column timeseries data. Hence, we have to apply it to all columns and combine the output
weekly_return = do.call(merge.xts,lapply(colnames(timeseries),function(x){
z = periodReturn(timeseries[,x],period = "weekly",type="log");
colnames(z) = x;
return(z)
} ))
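The row filter from the first suggestion can be checked on a plain matrix without quantmod. This is a small base-R sketch: the matrix m stands in for the data part of timeseries, and the dax and spx values are invented for illustration. It also shows that complete.cases gives the same row selection.

```r
# Toy matrix standing in for the data in timeseries (dax/spx values invented)
m <- cbind(dax    = c(NA, NA, 100, 101),
           nikkei = c(38713, 38275, 38295, 37951),
           spx    = c(NA, 350, 352, 351))

keep_apply    <- apply(!is.na(m), 1, all)  # TRUE where every column has a value
keep_complete <- complete.cases(m)         # base-R shortcut for the same test

# Keep only the fully populated rows
m_full <- m[keep_apply, , drop = FALSE]
m_full
```

Filtering this way discards every date on which any one series is missing, whereas the per-column lapply approach above computes each series' returns from that column's own data.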
