R attach dates to time series

I have a spreadsheet in Excel whose first column contains dates and whose subsequent columns hold the prices of different securities on those dates.
I saved the Excel file as a CSV and then imported it into R using
prices=read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv",header = TRUE, sep = ",")
This creates the correct time series data
x<-ts(prices[,2])
but does not have the dates attached.
However the dates refer to working days. So although in general they represent Monday-Friday this is not always the case because of holidays etc.
How then can I create a time series where the dates are read in from the first column of the csv file? I cannot find an example in R where this is done.

As you didn't give any data, here is a made-up data.frame:
R> DF <- data.frame(date="2011-05-15", time=c("08:25:00", "08:45:00",
+ "09:05:11"), val=rnorm(3, 100, 5))
R> DF
date time val
1 2011-05-15 08:25:00 99.5926
2 2011-05-15 08:45:00 95.8724
3 2011-05-15 09:05:11 96.6436
R> DF <- within(DF, posix <- as.POSIXct(paste(date, time)))
R> DF
date time val posix
1 2011-05-15 08:25:00 99.5926 2011-05-15 08:25:00
2 2011-05-15 08:45:00 95.8724 2011-05-15 08:45:00
3 2011-05-15 09:05:11 96.6436 2011-05-15 09:05:11
R>
I used within(), but you can use other means to assign new columns. The key is that paste() allows you to combine columns, and you could use other R functions to modify the data as needed.
The key advantage of having dates and times parsed in a suitable type (like POSIXct) is that other functions can then use it. Here is zoo:
R> z <- with(DF, zoo(val, order.by=posix))
R> summary(z)
Index z
Min. :2011-05-15 08:25:00.00 Min. :95.9
1st Qu.:2011-05-15 08:35:00.00 1st Qu.:96.3
Median :2011-05-15 08:45:00.00 Median :96.6
Mean :2011-05-15 08:45:03.67 Mean :97.4
3rd Qu.:2011-05-15 08:55:05.50 3rd Qu.:98.1
Max. :2011-05-15 09:05:11.00 Max. :99.6
R>
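Applied to the question's case (dates in the first column of a CSV), the same idea can be sketched as follows. This is a minimal sketch under assumptions: the inline text stands in for the real file, and the column names Date, SecA and SecB as well as the "%Y-%m-%d" date format are made up for illustration; adjust the format string to whatever the spreadsheet actually exported.

```r
library(zoo)

# Hypothetical stand-in for the CSV file; column names and values are assumptions
csv_text <- "Date,SecA,SecB
2011-05-13,100.2,50.1
2011-05-16,101.0,49.8
2011-05-17,100.7,50.3"

prices <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Parse the first column into a proper Date class
prices$Date <- as.Date(prices$Date, format = "%Y-%m-%d")

# Irregular (working-day) index: zoo orders the price columns by the parsed dates
z <- zoo(as.matrix(prices[, -1]), order.by = prices$Date)
z
```

Because zoo allows an irregular index, weekends and holidays simply do not appear in it. With a real file, read.zoo("file.csv", header = TRUE, sep = ",", format = "%Y-%m-%d") is a possible shortcut for the same steps.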

Related

bfastts for monthly data

I am working with data collected monthly. In my dataset, there are some months where no data was collected and thus, there is no entry in my data. I have previously used bfastts for similar occurrences when data was collected daily, so that I may have NA values in my data. How may I do the same for monthly data, using bfastts or some other function?
eg. below if needed
2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891
I wish to have NA fields for December to March.
The question did not specify what class of object is desired but here are three. zoo supports an irregularly spaced index so it does not need to insert NA's but ts does not and converting from zoo to ts automatically inserts NA's. Convert the ts object back to zoo again or to a data frame to get a zoo or data frame object with NA's.
The zoo and data frame objects use yearmon class for the index which internally represents year/month as year + fraction where fraction is 0, 1/12, ..., 11/12 for Jan, Feb, ..., Dec and displays in meaningful form. as.Date can be used to convert yearmon objects to Date objects although in this case yearmon probably makes more sense since it directly represents year and month without day.
If you want to go in the other direction and remove NA's use na.omit(z_na) or na.omit(DF_na).
library(zoo)
# zoo object - no NA's
z <- read.zoo(DF, FUN = as.yearmon)
# ts object with NA's
tt <- as.ts(z)
# zoo object with NA's
z_na <- as.zoo(tt)
# data.frame with NA's
DF_na <- fortify.zoo(tt)
Note: the input data frame DF used above is created as follows:
Lines <- "2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891"
DF <- read.table(text = Lines)
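For comparison, here is a base R sketch of the same NA-filling idea without zoo: build the complete monthly sequence and left-join the observed values onto it. The column names date and value and the object names full and DF_full are assumptions introduced for this illustration.

```r
Lines <- "2006-06-01 2.260121
2006-07-01 2.306800
2006-08-01 2.246624
2006-09-01 1.724565
2006-11-01 1.630561
2007-05-01 2.228918
2007-06-01 2.228918
2007-07-01 2.22891"

# Column names are assumptions made for this sketch
DF <- read.table(text = Lines, col.names = c("date", "value"),
                 stringsAsFactors = FALSE)
DF$date <- as.Date(DF$date)

# Every first-of-month between the first and last observation
full <- data.frame(date = seq(min(DF$date), max(DF$date), by = "month"))

# Left join: months without an observation get NA in value
DF_full <- merge(full, DF, by = "date", all.x = TRUE)
DF_full
```

The result has one row per calendar month, so any gap in the original data shows up as an NA row.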

Recode "date & time variable" into two separate variables

I'm a PhD student (not that experienced in R), and I'm trying to recode a string variable, called RecordedDate into two separate variables: a Date variable and a Time variable. I am using RStudio.
An example of values are:
8/6/2018 18:56
7/26/2018 10:43
7/28/2018 8:36
I would like to use the first part of the value (example: 8/6/2018) to reformat this into a date variable, and the second part of the value (example: 18:56) into a time variable.
I'm thinking the first step would be to create code that can break this up into two variables, based on some rule. I'm thinking maybe I can separate everything before the "space" into the Date variable, and everything after the "space" into the Time variable. I am not able to figure this out.
Then, I'm looking for code that would change the Date from a "string" variable to a "date" type variable. I’m not sure if this is correct, but I’m thinking something like:
better_date <- as.Date(Date, "%m/%d/%Y")
Finally, I would like to change the Time variable to a "time" type format (if this exists). Not sure how to do this part either, but something that indicates hours and minutes. This part is less important than getting the date variable.
Two immediate ways:
1. strsplit() on the white space
2. The proper way: parse, and then format back out.
Only 2. will guarantee you do not end up with hour 27 or minute 83 ...
Examples:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> strsplit(data, " ")
[[1]]
[1] "8/6/2018" "18:56"
[[2]]
[1] "7/26/2018" "10:43"
[[3]]
[1] "7/28/2018" "8:36"
R>
And:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> df <- data.frame(data)
R> df$pt <- anytime::anytime(df$data) ## anytime package used
R> df$time <- format(df$pt, "%H:%M")
R> df$day <- format(df$pt, "%Y-%m-%d")
R> df
data pt time day
1 8/6/2018 18:56 2018-08-06 18:56:00 18:56 2018-08-06
2 7/26/2018 10:43 2018-07-26 10:43:00 10:43 2018-07-26
3 7/28/2018 8:36 2018-07-28 00:00:00 00:00 2018-07-28
R>
I often collect data in a data.frame (or data.table) and then add column by column.
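Note that the third row in the transcript above shows 00:00 rather than 8:36. As an alternative sketch in base R (an assumption of this example, not the approach used above), the same parse-then-format idea can be written with an explicit format string, which should also handle the single-digit hour. The tz = "UTC" setting is an assumption chosen to keep the example reproducible.

```r
data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")

# Parse with an explicit month/day/year hour:minute format
pt <- as.POSIXct(data, format = "%m/%d/%Y %H:%M", tz = "UTC")

df <- data.frame(data = data, pt = pt, stringsAsFactors = FALSE)
df$day  <- as.Date(df$pt)           # Date class, usable in date arithmetic
df$time <- format(df$pt, "%H:%M")   # character "HH:MM"; base R has no time-only class
df

# Parsing validates the input: an impossible time becomes NA instead of slipping through
bad <- as.POSIXct("7/28/2018 27:83", format = "%m/%d/%Y %H:%M", tz = "UTC")
is.na(bad)
```

Keeping the parsed pt column around means the date and time pieces can always be regenerated in another format later.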

Finding a more elegant way to aggregate hourly data to mean hourly data using zoo

I have a chunk of data logging temperatures from a few dozen devices every hour for over a year. The data are stored as a zoo object. I'd very much like to summarize those data by looking at the average values for every one of the 24 hours in a day (1am, 2am, 3am, etc.). So that for each device I can see what its average value is for all the 1am times, 2am times, and so on. I can do this with a loop but sense that there must be a way to do this in zoo with an artful use of aggregate.zoo. Any help?
require(zoo)
# random hourly data over 30 days for five series
x <- matrix(rnorm(24 * 30 * 5),ncol=5)
# Assign hourly data with a real time and date
x.DateTime <- as.POSIXct("2014-01-01 01:00", format = "%Y-%m-%d %H:%M") +
seq(0, by = 3600, length.out = nrow(x)) # one timestamp per row of x
# make a zoo object
x.zoo <- zoo(x, x.DateTime)
#plot(x.zoo)
# what I want:
# the average value for each series at 1am, 2am, 3am, etc. so that
# the dimensions of the output are 24 (hours) by 5 (series)
# If I were just working on x I might do something like:
res <- matrix(NA,ncol=5,nrow=24)
for(i in 1:nrow(res)){
res[i,] <- apply(x[seq(i,nrow(x),by=24),],2,mean)
}
res
# how can I avoid the loop and write an aggregate statement in zoo that
# will get me what I want?
Calculate the hour for each time point and then aggregate by that:
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)
dim(ag)
## [1] 24 5
ADDED
Alternately use hours from chron or hour from data.table:
library(chron)
ag <- aggregate(x.zoo, hours, mean)
This is quite similar to the other answer but takes advantage of the fact that the by=... argument to aggregate.zoo(...) can be a function which will be applied to time(x.zoo):
as.hour <- function(t) as.numeric(format(t,"%H"))
result <- aggregate(x.zoo,as.hour,mean)
identical(result,ag) # ag from G. Grothendieck answer
# [1] TRUE
Note that this produces a result identical to the other answer, but not the same as yours. This is because your dataset starts at 1:00am, not midnight, so your loop produces a matrix wherein the 1st row corresponds to 1:00am and the last row corresponds to midnight. These solutions produce zoo objects wherein the first row corresponds to midnight.
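The row offset described here can be checked directly. The following is a small sketch under assumptions: a fixed seed, UTC timestamps, and exactly one timestamp per row of x. The name ag_shifted is introduced only for this illustration; the other names mirror the question.

```r
library(zoo)

set.seed(1)                                   # assumption: fixed seed for reproducibility
x <- matrix(rnorm(24 * 30 * 5), ncol = 5)     # 30 days of hourly data, five series
tt <- as.POSIXct("2014-01-01 01:00", tz = "UTC") +
  seq(0, by = 3600, length.out = nrow(x))     # starts at 1am, one stamp per row
x.zoo <- zoo(x, tt)

# Loop from the question: row 1 is 1am, ..., row 24 is midnight
res <- matrix(NA_real_, nrow = 24, ncol = 5)
for (i in seq_len(24)) {
  res[i, ] <- colMeans(x[seq(i, nrow(x), by = 24), ])
}

# aggregate.zoo: rows are ordered by hour 0, 1, ..., 23
hr <- as.numeric(format(time(x.zoo), "%H"))
ag <- aggregate(x.zoo, hr, mean)

# Move the midnight row to the end so both results use the same row order
ag_shifted <- coredata(ag)[c(2:24, 1), ]
isTRUE(all.equal(unname(ag_shifted), res, check.attributes = FALSE))
```

If the comparison holds, the two approaches differ only in where the midnight row sits.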

Mean hour-of-day and imputation...would this be easier with time calculations?

I'm working with a data set and am imputing NAs for times. I have a simplified example below where I am creating a new column that includes the original data and imputed values for NAs (i.e., the mean of the time of day). The code works fine, but I am so weak with dates that I was wondering if there is an easier way to calculate the mean time of day from date/time values?
arrivals <- data.frame(
ships=c("Glory","Discover","Intrepid","Enchantment","Summit"),
times=c("8:00","10:00","11:42",NA,"9:20"), stringsAsFactors=FALSE)
sumtime <- sapply(strsplit(as.character(arrivals$times),":"),
function(x) as.numeric(x[1])*60 + as.numeric(x[2]))
avgtime <- paste(trunc((mean(sumtime, na.rm=TRUE)/60)),":",
trunc(mean(sumtime, na.rm=TRUE)%%60), sep="")
arrivals$times2 <- arrivals$times
arrivals$times2[is.na(arrivals$times)] <- avgtime
You can use the chron package to convert your times column to a numeric representation that you can take the average of:
library(chron)
Arrivals <- arrivals[,c("ships","times")]
# Will give some warnings due to the missing value
Arrivals$times <- chron(times.=paste(Arrivals$times, ":00", sep=""))
Arrivals$times[is.na(Arrivals$times)] <- mean(Arrivals$times,na.rm=TRUE)
Arrivals
ships times
1 Glory 08:00:00
2 Discover 10:00:00
3 Intrepid 11:42:00
4 Enchantment 09:45:30
5 Summit 09:20:00
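If adding a package is not an option, the question's minutes-since-midnight approach can be tidied up in base R. This is only a sketch: the helper to_minutes is a hypothetical name introduced here, and unlike the chron result above it drops the seconds (09:45:30 becomes 9:45). Using sprintf() with "%02d" keeps minutes zero-padded, which the paste()-based version in the question does not do.

```r
arrivals <- data.frame(
  ships = c("Glory", "Discover", "Intrepid", "Enchantment", "Summit"),
  times = c("8:00", "10:00", "11:42", NA, "9:20"),
  stringsAsFactors = FALSE)

# Hypothetical helper: "H:MM" strings to minutes since midnight (NA stays NA)
to_minutes <- function(hhmm) {
  parts <- strsplit(hhmm, ":")
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

mins <- to_minutes(arrivals$times)
avg  <- floor(mean(mins, na.rm = TRUE))   # whole minutes, seconds dropped

# Zero-pad the minutes so that e.g. 9 hours 5 minutes prints as "9:05"
avgtime <- sprintf("%d:%02d", as.integer(avg %/% 60), as.integer(avg %% 60))

arrivals$times2 <- arrivals$times
arrivals$times2[is.na(arrivals$times)] <- avgtime
arrivals
```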

What is the best method to bin intraday volume figures from a stock price timeseries using XTS / ZOO etc in R?

For instance, let's say you have ~10 years of daily 1 min data for the volume of instrument x as follows (in xts format) from 9:30am to 4:30pm :
Date.Time Volume
2001-01-01 09:30:00 1200
2001-01-01 09:31:00 1110
2001-01-01 09:32:00 1303
All the way through to:
2010-12-20 16:28:00 3200
2010-12-20 16:29:00 4210
2010-12-20 16:30:00 8303
I would like to:
Get the average volume at each minute for the entire series (ie average volume over all 10 years at 9:30, 9:31, 9:32...16:28, 16:29, 16:30)
How should I best go about:
Aggregating the data into one minute buckets
Getting the average of those buckets
Reconstituting those "average" buckets back to a single xts/zoo time series?
I've had a good poke around with aggregate, sapply, period.apply functions etc, but just cannot seem to "bin" the data correctly.
It's easy enough to solve this with a loop, but very slow. I'd prefer to avoid a programmatic solution and use a function that takes advantage of C++ architecture (ie xts based solution)
Can anyone offer some advice / a solution?
Thanks so much in advance.
First let's create some test data:
library(xts) # also pulls in zoo
library(timeDate)
library(chron) # includes times class
# test data
x <- xts(1:3, timeDate(c("2001-01-01 09:30:00", "2001-01-01 09:31:00",
"2001-01-02 09:30:00")))
1) aggregate.zoo. Now try converting it to times class and aggregating using this one-liner:
aggregate(as.zoo(x), times(format(time(x), "%H:%M:%S")), mean)
1a) aggregate.zoo (variation). Or use this variation, which converts the shorter aggregate series to times to avoid having to do it on the longer original series:
ag <- aggregate(as.zoo(x), format(time(x), "%H:%M:%S"), mean)
zoo(coredata(ag), times(time(ag)))
2) tapply. An alternative would be tapply which is likely faster:
ta <- tapply(coredata(x), format(time(x), "%H:%M:%S"), mean)
zoo(unname(ta), times(names(ta)))
EDIT: simplified (1) and added (1a) and (2)
Here is a solution with ddply, but you can probably also use sqldf, tapply, aggregate, by, etc.
# Sample data
minutes <- 10 * 60
days <- 250 * 10
d <- seq.POSIXt(
ISOdatetime( 2011,01,01,09,00,00, "UTC" ),
by="1 min", length=minutes
)
d <- outer( d, (1:days) * 24*3600, `+` )
d <- sort(d)
library(xts)
d <- xts( round(100*rlnorm(length(d))), d )
# Aggregate
library(plyr)
d <- data.frame(
minute=format(index(d), "%H:%M"),
value=coredata(d)
)
d <- ddply(
d, "minute",
summarize,
value=mean(value, na.rm=TRUE)
)
# Convert to zoo or xts
zoo(x=d$value, order.by=d$minute) # The index does not have to be a date or time
xts(x=d$value, order.by=as.POSIXct(sprintf("2012-01-01 %s:00",d$minute), format="%Y-%m-%d %H:%M:%S") ) # format must be named; the second positional argument of as.POSIXct is tz
