Date conversion as.yearmon - r

I have a dataframe that looks like this:
library(zoo)
head(monthly.station6)
[,1]
1995-02-28 00:00:00 2.07
1995-03-01 00:00:00 5.70
1995-04-30 01:00:00 0.65
1995-05-31 01:00:00 1.03
1995-06-30 01:00:00 0.77
1995-07-31 01:00:00 0.39
I am applying this code: monthly.station6[,0] <- as.yearmon(monthly.station6[,0]) to try to convert the dates into a year-month format, but I think the fact that the date column is [,0] is preventing it. I'm not sure where I am going wrong; any help would be appreciated! This is the result I am after:
head(monthly.station6)
[,1]
Feb 1995 2.07
Mar 1995 5.70
Apr 1995 0.65
May 1995 1.03
Jun 1995 0.77
Jul 1995 0.39
As requested, dput(head(monthly.station6)):
structure(c(2.07, 5.7, 0.65, 1.03, 0.77, 0.39), .indexCLASS = c("POSIXct",
"POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts",
"zoo"), index = structure(c(793929600, 794016000, 799200000,
801878400, 804470400, 807148800), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L))

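1) index The object's class is "xts", not "data.frame", so the dates are stored in its index rather than in a column (which is why [,0] does not select them). Use index (or time) to modify monthly.station6: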
library(xts)
index(monthly.station6) <- as.yearmon(index(monthly.station6))
2) aggregate.zoo Another possibility is to use aggregate.zoo. That will return a "zoo" object, so convert it back to "xts":
library(xts)
as.xts(aggregate(monthly.station6, as.yearmon))
3) fortify.zoo Since the question mentions a data.frame: if what you really wanted was a data.frame, the first statement after library creates a data.frame whose first column is Index, and the second converts that column to "yearmon":
library(xts)
DF <- fortify.zoo(monthly.station6)
transform(DF, Index = as.yearmon(Index))
Note: If you want just year then it cannot be an xts object but you could represent it as a data.frame. Using DF from (3):
transform(DF, Index = as.numeric(format(Index, "%Y")))
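As a quick check of what as.yearmon does to timestamps like the ones in the question, here is a minimal sketch. It assumes the zoo package is installed; the two sample timestamps and the UTC time zone are illustrative choices, not taken from the asker's object.

```r
library(zoo)

# Two illustrative timestamps (assumed values; UTC avoids local-time shifts)
x <- as.POSIXct(c("1995-02-28 00:00:00", "1995-03-01 00:00:00"), tz = "UTC")

# as.yearmon drops the day and time, keeping only year and month
ym <- as.yearmon(x)

# Year and month can be read back with format()
yr <- format(ym, "%Y")   # year as character
mo <- format(ym, "%m")   # month number as character
```

Printing ym shows month-year labels in the style of the desired output above; the month abbreviations depend on the session's locale.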

Related

Get highest value of X rows

I'm looking for a function that calculates the highest value over the prior X periods of an xts object. The function would return a vector of those values.
I believe there are multiple ways to calculate this, but surprisingly I could not find it covered in a prior SO question. I am hoping there is a package with a function already defined for this. If there is none, maybe someone knows how to tackle it.
The example below shows what the vector of highest values over the last 3 periods would look like for the xts object XTS1.
library('xts')
XTS1 <- structure(c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "", tclass = c("POSIXct", "POSIXt"), tzone = "", class = c("xts", "zoo"), .CLASS = structure("double", class = "CLASS"), formattable = structure(list(formatter = "formatC", format = structure(list(format = "f", digits = 2), .Names = c("format", "digits")), preproc = "percent_preproc", postproc = "percent_postproc"), .Names = c("formatter", "format", "preproc", "postproc")), index = structure(c(1413981900, 1413982800, 1413983700, 1413984600, 1413985500, 1413986400, 1413987300, 1413988200, 1413989100, 1413990000), tzone = "", tclass = c("POSIXct", "POSIXt")), .Dim = c(10L, 1L))
#DESIRED OUTPUT
[,1] GetHighest(3)
2014-10-22 08:45:00 12 NA
2014-10-22 09:00:00 7 12
2014-10-22 09:15:00 7 12
2014-10-22 09:30:00 22 12
2014-10-22 09:45:00 24 22
2014-10-22 10:00:00 30 24
2014-10-22 10:15:00 26 30
2014-10-22 10:30:00 23 30
2014-10-22 10:45:00 27 30
2014-10-22 11:00:00 30 27
You could use rollapply from zoo.
So it would look something like this:
GetHighest_3 = rollapply(data = XTS1, width = 3, FUN = max)
Then combine it:
cbind(XTS1, GetHighest_3)
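The only problem I see is that it will probably return NA for the first 2 values, not only the first value, since it has a width of 3. Note also that each window includes the current row, whereas the desired output uses only the prior rows, so the result would still need to be shifted by one period.
I wasn't able to test the code, since I don't have access to R right now, so there might be some misspellings.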
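To make the target behaviour concrete, here is a minimal base R sketch that reproduces the desired column from the question. It is not the rollapply approach above; prior_max is a hypothetical helper name, and the values are copied from XTS1 into a plain numeric vector.

```r
# Values from XTS1 in the question, as a plain numeric vector
v <- c(12, 7, 7, 22, 24, 30, 26, 23, 27, 30)

# Hypothetical helper: maximum of up to n values strictly before position i
prior_max <- function(v, n) {
  vapply(seq_along(v), function(i) {
    if (i == 1) return(NA_real_)       # the first row has no prior values
    max(v[max(1, i - n):(i - 1)])      # window is shorter near the start
  }, numeric(1))
}

GetHighest_3 <- prior_max(v, 3)
GetHighest_3
# NA 12 12 12 22 24 30 30 30 27
```

The result can be turned back into a time series with xts(GetHighest_3, order.by = index(XTS1)).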

How to use .indexyear in xts package

I'm trying to use the .indexyear function in the xts package, but can't get my head around how it's supposed to be used.
Below is some code; you can see that .indexyear returns 112, 113, 114, 115 for the years 2012, 2013, 2014, 2015. I want to check whether a certain year exists in the xts object's index, so how do I make 2012 %in% .indexyear(a) equal TRUE?
Code
Browse[1]> index(a)
[1] "2012-12-30 00:00:00 CET" "2013-12-30 00:00:00 CET" "2014-12-30 00:00:00 CET" "2015-12-30 01:00:00 CET"
Browse[1]> .indexyear(a)
[1] 112 113 114 115
Browse[1]> 2014 %in% .index(a) # should actually be TRUE!
[1] FALSE
Browse[1]> 113 %in% .indexyear(a)
[1] TRUE
The .index* functions basically wrap the components of the POSIXlt class. So see the Details section of ?POSIXlt, which says:
'year' years since 1900.
So you need to add 1900 to the output of .indexyear to get what you want.
a <- structure(1:4, .Dim = c(4L, 1L), index = structure(c(1356847200, 1388383200,
1419919200, 1451458800), tzone = "", tclass = c("POSIXct", "POSIXt")),
class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"),
tclass = c("POSIXct", "POSIXt"), .indexTZ = "UTC", tzone = "UTC")
2014 %in% (.indexyear(a)+1900)
# [1] TRUE
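The same offset can be seen in base R without xts. This is a small sketch with an illustrative timestamp (an assumed value, not taken from the object a):

```r
# An illustrative timestamp (assumed value)
d <- as.POSIXct("2014-12-30 00:00:00", tz = "UTC")

# POSIXlt stores the year as an offset from 1900
lt <- as.POSIXlt(d)
year_offset   <- lt$year          # 114
calendar_year <- lt$year + 1900   # 2014

2014 %in% calendar_year
```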

change specific hours of xts object with POSIXct index

I have a data frame which looks like this
df = data.frame (time = c("2013-12-23 00:00:00", "2013-12-23 00:13:00", "2013-12-23 00:14:00", "2013-12-23 00:14:01",
"2013-12-24 00:00:00", "2013-12-24 00:12:00", "2013-12-24 00:15:00", "2013-12-24 00:16:00"),
value = c(1, 2, 3, 4, 5, 6, 7, 8))
I transform this data frame to an xts object and use the POSIXct format for the index
df = as.xts(as.numeric(as.character(df[,"value"])), order.by = as.POSIXct(df[,"time"]))
What I now need is to change all the indices whose time is 00:00:00 to 22:00:00.
All other time indices must stay as they are.
The resulting object should look like this:
>df
[,1]
2013-12-23 00:13:00 2
2013-12-23 00:14:00 3
2013-12-23 00:14:01 4
2013-12-23 22:00:00 1
2013-12-24 00:12:00 6
2013-12-24 00:15:00 7
2013-12-24 00:16:00 8
2013-12-24 22:00:00 5
Thanks for your help! Pat
We could use sub to replace '00:00:00' with '22:00:00' in the original dataset and then do the xts conversion:
df$time <- as.POSIXct(sub('00:00:00', '22:00:00', df$time),
                      format='%Y-%m-%d %H:%M:%S')
library(xts)
xts(df$value, order.by=df$time)
# [,1]
#2013-12-23 00:13:00 2
#2013-12-23 00:14:00 3
#2013-12-23 00:14:01 4
#2013-12-23 22:00:00 1
#2013-12-24 00:12:00 6
#2013-12-24 00:15:00 7
#2013-12-24 00:16:00 8
#2013-12-24 22:00:00 5
Here's a function that will shift the zero-hour of an xts object by n seconds.
shiftZeroHour <- function(x, n=1) {
  stopifnot(is.xts(x))
  # find zero hour
  plt <- as.POSIXlt(index(x), tz=indexTZ(x))
  isZeroHour <- plt$hour == 0 & plt$min == 0 & plt$sec == 0
  # shift zero hour index values
  .index(x)[isZeroHour] <- .index(x)[isZeroHour] + n
  # ensure index is ordered properly
  as.xts(x)
}
Here is how to use it with your sample data:
xdf <- structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Dim = c(8L, 1L),
index = structure(c(1387778400, 1387779180, 1387779240, 1387779241,
1387864800, 1387865520, 1387865700, 1387865760), tzone = "",
tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
shiftZeroHour(xdf, 60*60*22)
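The core of that function, finding the midnight timestamps and adding a number of seconds, can be sketched in base R on a plain POSIXct vector. The timestamps and the UTC time zone below are illustrative assumptions, not the asker's index:

```r
# Illustrative timestamps (assumed values), fixed to UTC for reproducibility
times <- as.POSIXct(c("2013-12-23 00:00:00", "2013-12-23 00:13:00"), tz = "UTC")

# Flag the entries whose time of day is exactly 00:00:00
lt <- as.POSIXlt(times)
is_midnight <- lt$hour == 0 & lt$min == 0 & lt$sec == 0

# Shift only those entries by 22 hours (in seconds)
times[is_midnight] <- times[is_midnight] + 22 * 60 * 60

format(times, "%H:%M:%S")
# "22:00:00" "00:13:00"
```

The shifted vector is no longer in time order, which is why the function above finishes with as.xts(x) to reorder the index.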

Scatterplot of two xts time series

I've got two xts time series. A small sample of them:
ts1
[,1]
2009-05-06 00:00:00 38.414
2009-05-06 00:15:00 45.079
2009-05-06 00:30:00 38.878
2009-05-06 00:45:00 49.889
2009-05-06 01:00:00 41.270
2009-05-06 01:15:00 41.050
2009-05-06 01:30:00 38.951
2009-05-06 01:45:00 39.854
2009-05-06 02:00:00 37.803
2009-05-06 02:15:00 42.930
ts2
[,1]
2009-05-06 00:00:00 406.887
2009-05-06 00:15:00 413.298
2009-05-06 00:30:00 409.353
2009-05-06 00:45:00 412.312
2009-05-06 01:00:00 409.353
2009-05-06 01:15:00 415.271
2009-05-06 01:30:00 416.257
2009-05-06 01:45:00 416.257
2009-05-06 02:00:00 416.257
2009-05-06 02:15:00 419.216
Now I want to create a scatterplot of ts1 against ts2. According to the documentation on CRAN (and an example I found on Stack Overflow that does it the same way), it should work like this: plot(ts1, ts2). But I get an error.
plot(ts1,ts2)
# Error in plot(xycoords$x, xycoords$y, type = type, axes = FALSE, ann = FALSE, :
# object 'xycoords' not found
What's going wrong? It works great with a normal ts using the ~ sign, but this doesn't work with xts. I also tried plot(ts1[, 1], ts2[, 1]).
The easiest thing to do is to call plot.zoo directly, instead of allowing the plot generic to dispatch to plot.xts.
ts1 <-
structure(c(38.414, 45.079, 38.878, 49.889, 41.27, 41.05, 38.951,
39.854, 37.803, 42.93), .Dim = c(10L, 1L), index = structure(c(1241586000,
1241586900, 1241587800, 1241588700, 1241589600, 1241590500, 1241591400,
1241592300, 1241593200, 1241594100), tzone = "", tclass = c("POSIXct",
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")
ts2 <-
structure(c(406.887, 413.298, 409.353, 412.312, 409.353, 415.271,
416.257, 416.257, 416.257, 419.216), .Dim = c(10L, 1L),
index = structure(c(1241586000, 1241586900, 1241587800, 1241588700,
1241589600, 1241590500, 1241591400, 1241592300, 1241593200, 1241594100),
tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts", "zoo"),
.indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
.indexTZ = "", tzone = "")
plot.zoo(ts1, ts2)

quantmod periodReturn function - how to handle NA Values?

I am using the quantmod function periodReturn; it yields the right results for the column with usable values.
This is the function: periodReturn(timeseries, period='weekly', type='log')
This is the input:
dax_data.csv nikkei_data.csv spx_data.csv
1990-01-04 01:00:00 NA 38713 NA
1990-01-05 01:00:00 NA 38275 NA
1990-01-08 01:00:00 NA 38295 NA
1990-01-09 01:00:00 NA 37951 NA
1990-01-10 01:00:00 NA 37697 NA
1990-01-11 01:00:00 NA 38170 NA
This is the output:
weekly.returns
1999-11-26 01:00:00 NA
1999-12-03 01:00:00 0.026679863
1999-12-10 01:00:00 -0.003482017
1999-12-17 01:00:00 0.041124348
1999-12-22 01:00:00 0.021583488
1999-12-30 01:00:00 0.069259912
I want to use all three columns (ldo).
How do I tell periodReturn to just NA all the rows without data and start as soon as one exists?
Here is the dput of the data to make this reproducible:
dput(head(timeseries))
structure(c(NA, NA, NA, NA, NA, NA, 38713, 38275, 38295, 37951,
37697, 38170, NA, NA, NA, NA, NA, NA), .Dim = c(6L, 3L), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "", class = c("xts",
"zoo"), index = structure(c(631411200, 631497600, 631756800,
631843200, 631929600, 632016000), tzone = "", tclass = c("POSIXct",
"POSIXt")), .Dimnames = list(NULL, c("dax_data.csv", "nikkei_data.csv",
"spx_data.csv")))
Instead of passing timeseries as the argument, pass only the rows in which every column has a value:
timeseries[apply(!is.na(timeseries), 1, all), ]
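To see what that row filter keeps, here is a small base R sketch on a made-up matrix (the values and column names are illustrative only):

```r
# Small illustrative matrix (made-up values) with NAs in some rows
m <- matrix(c(NA, 10, 11,
               1,  2,  3,
               4, NA,  6,
               7,  8,  9),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# TRUE only for rows in which every column is non-missing
keep <- apply(!is.na(m), 1, all)

# drop = FALSE keeps the matrix shape even if a single row remains
complete <- m[keep, , drop = FALSE]
```

Note that in the six rows shown in the question every row contains at least one NA, so on that head of the data this filter would drop all rows; it only helps from the point where all three series have values.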
periodReturn does not work for multi-column time series data. Hence, we have to apply it to each column and combine the output:
weekly_return = do.call(merge.xts, lapply(colnames(timeseries), function(x) {
  # weekly log returns for one column, keeping the column name
  z = periodReturn(timeseries[, x], period = "weekly", type = "log")
  colnames(z) = x
  return(z)
}))
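For reference, a log return is the difference of logged prices between consecutive periods, and leading NAs simply carry through. The sketch below shows only that definition in base R on made-up values that are assumed to be already one per period; it is not quantmod's implementation and does no weekly aggregation.

```r
# Made-up prices, one per period, with leading NAs before the series starts
prices <- c(NA, NA, 100, 110, 121)

# log return for period t is log(p_t) - log(p_{t-1}); the first period has none
log_ret <- c(NA, diff(log(prices)))
log_ret
```

Here the first three entries are NA (no price, or no previous price), and the last two both equal log(1.1).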
