Converting date to numeric but limiting to number of days in year - r

I want to create around 10,000 Date objects between 2008-01-01 and 2010-12-31. I wrote the code below for that, but what I actually want is the day of the year: days 1-366 in 2008 (a leap year, because of 2008-02-29), with the count restarting at 1 on 2009-01-01. I could generate the dates for 2008, 2009 and 2010 separately, but that is not convenient. I was reading about lubridate but could not figure it out. I could also filter 1 to 366, then 367 to 731, but that is not efficient either. Does anyone know a better way to do it?
set.seed(123)
tim1=sample(365*3+1,10000,replace = TRUE) ### the plus 1 accounts for Feb 29 in 2008
dat1=as.Date(tim1,origin="2007-12-31") # then 1 will be 2008-01-01

You can create a vector of all the target dates and sample from it. To create the vector, there is seq.Date, the seq method for objects of class "Date".
start <- as.Date("2008-01-01")
end <- as.Date("2010-12-31")
s <- seq(start, end, by = "days")
The vector s includes all days between start and end. Now sample from it.
set.seed(123)
dat1 <- sample(s, 10000, TRUE)
Transform the sample into day-of-the-year. See help("strptime")
as.numeric(format(dat1, format = "%j"))
In the end, remove s, it's no longer needed.
rm(s) # tidy up
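As a quick sanity check on the `%j` conversion, here is a minimal sketch on a few hand-picked dates (the vector `d` is an illustration, not part of the original answer). It also shows a second base R route, `as.POSIXlt()$yday`, which is zero-based and therefore needs 1 added.

```r
# Hand-picked dates around the leap day and the year boundaries
d <- as.Date(c("2008-01-01", "2008-02-29", "2008-12-31",
               "2009-01-01", "2010-12-31"))

# Day of the year via format(): "%j" gives "001".."366" as text
doy_fmt <- as.integer(format(d, "%j"))

# Same thing via POSIXlt: yday counts from 0, so add 1
doy_lt <- as.POSIXlt(d)$yday + 1L

print(doy_fmt)
print(identical(doy_fmt, doy_lt))
```

Both vectors restart at 1 on each 1 January, and 2008-12-31 comes out as 366 because 2008 is a leap year.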
Edit.
The following two functions do what the question asks for but with two different methods.
f1 is the code above wrapped in a function; f2 uses ave/seq_along/match and is a bit more complicated. The tests show f2 to be twice as fast as f1.
f1 <- function(start_date, end_date, n){
  start <- as.Date(start_date)
  end <- as.Date(end_date)
  s <- seq(start, end, by = "days")
  y <- sample(s, n, replace = TRUE)
  as.numeric(format(y, format = "%j"))
}
f2 <- function(start_date, end_date, n){
  start <- as.Date(start_date)
  end <- as.Date(end_date)
  s <- seq(start, end, by = "days")
  y <- sample(s, n, replace = TRUE)
  z <- ave(as.integer(s), lubridate::year(s), FUN = seq_along)
  z[match(y, s)]
}
set.seed(123)
x1 <- f1("2008-01-01", "2010-12-31", 100)
set.seed(123)
x2 <- f2("2008-01-01", "2010-12-31", 100)
all.equal(x1, x2)
#[1] TRUE
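To see what the `ave`/`seq_along` step in f2 does, here is a small sketch on a four-day sequence that straddles a year boundary (the sequence and the names `yr`, `z`, `doy` are made up for illustration). It suggests that the per-year counter only equals the true day of the year when the sequence starts on 1 January, as it does in the calls above.

```r
# Four consecutive days across the 2008/2009 boundary
s <- seq(as.Date("2008-12-30"), as.Date("2009-01-02"), by = "day")
yr <- format(s, "%Y")

# ave() applies seq_along within each year, so the counter restarts per year
z <- ave(seq_along(s), yr, FUN = seq_along)

# The true day of the year, for comparison
doy <- as.integer(format(s, "%j"))

print(z)
print(doy)
```

Here `z` counts positions within each year of the sequence, while `doy` is the calendar day of the year; the two differ for the 2008 dates because the sequence does not start on 2008-01-01.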
Now the tests.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1("2008-01-01", "2010-12-31", 1e4),
f2 = f2("2008-01-01", "2010-12-31", 1e4),
times = 50
)
print(mb, order = "median")
ggplot2::autoplot(mb)

Related

How to calculate the historical monthly volatility from daily returns in R?

First I created an xts object, which contains 36 time series showing daily prices from 1980-01-02 to 2020-10-06.
ENERGY_data$time <- as.Date(ENERGY_data$time, format = "%Y/%m/%d")
ENERGY_xts <- ENERGY_data[order(ENERGY_data$time), ]
ENERGY_xts <- as.xts(ENERGY_xts[, 2:37], order.by=ENERGY_xts$time)
Then I calculated the continuously compounded daily returns by using the PerformanceAnalytics function CalculateReturns()
ENERGY_returns.cc <- CalculateReturns(ENERGY_xts, method="compound")
Now I would like to calculate the volatility for each month going from 1980-01-02 to 2020-10-06 on the basis of this formula:
[Image: monthly volatility formula]
Could you please give me some hints (in terms of coding)?
Take a look at this function and please note that I simulated returns, since you didn't provide yours.
library(xts)
set.seed(123)
returns <- matrix(rnorm(30*365*5, 0.0001, 0.0002), ncol = 30)
timeindex <- seq.Date(from = as.Date('2000-01-01'), length.out = 365*5, by = 'days')
test_xts <- xts(returns, order.by = timeindex)
calcFrenchVolOneAsset <- function(x){
  # drop the time index first: arithmetic on xts objects aligns by date, which
  # would multiply each return by itself instead of by the previous day's return
  x <- as.numeric(coredata(x))
  first_part_of_formula <- sum(x^2)
  second_part_of_formula <- 2*sum(x[-1]*x[-length(x)])
  res <- sqrt(first_part_of_formula + second_part_of_formula)
  return(res)
}
calcFrenchVolMultipleAssets <- function(x){
  x <- coredata(x) # plain matrix, for the same reason as above
  first_part_of_formula <- colSums(x^2)
  second_part_of_formula <- 2*colSums(x[-1, , drop = FALSE]*x[-nrow(x), , drop = FALSE])
  res <- sqrt(first_part_of_formula + second_part_of_formula)
  return(res)
}
# test for the first month and the first asset
calcFrenchVolOneAsset(test_xts['2000-01', 1])
calcFrenchVolMultipleAssets(test_xts['2000-01', 1])
# apply monthly and on columns
monthly_vols <- apply.monthly(test_xts, calcFrenchVolMultipleAssets)
head(monthly_vols[, c(1:5)])
# output as posted with the original answer; it was produced by multiplying the
# xts objects directly (without coredata()), so a rerun prints different values
e1 e1.1 e1.2 e1.3 e1.4
2000-01-31 0.002030192 0.002402946 0.001717494 0.001888513 0.002322648
2000-02-29 0.001983995 0.002343783 0.001789346 0.001671332 0.001824278
2000-03-31 0.001910535 0.002429689 0.001709092 0.002492223 0.002068032
2000-04-30 0.001765052 0.002114554 0.001946232 0.002160436 0.002139949
2000-05-31 0.002269842 0.002476424 0.001626455 0.002030027 0.002400690
2000-06-30 0.002082933 0.001905620 0.001681579 0.001992082 0.002010535
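As a hand-checkable illustration of the quantity the two functions above are written to compute, the square root of the sum of squared returns plus twice the sum of products of consecutive returns, here is a minimal base R sketch. The three returns in `r` are made up, and plain numeric vectors are used so that no time index is involved.

```r
r <- c(0.01, 0.02, 0.03)  # made-up daily returns for one month

# Sum of squared returns: 0.0001 + 0.0004 + 0.0009
sum_sq <- sum(r^2)

# Sum of products of consecutive returns: 0.01*0.02 + 0.02*0.03
cross <- sum(head(r, -1) * tail(r, -1))

vol <- sqrt(sum_sq + 2 * cross)
print(vol)
```

With these numbers the sum of squares is 0.0014 and the cross term is 0.0008, so the result is sqrt(0.003). Note that the expression under the square root can become negative when consecutive returns have opposite signs, in which case `sqrt` returns NaN.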

Subsetting time series data by 3 days and saving in the list

I want to subset a time series into overlapping 3-day windows and save each subset in a list. Suppose the first subset runs from day 1 to day 3; then the second runs from day 2 to day 4, and so on, so that every subset holds 3 days of data. Note that this is time-stamped 10-minute data. The number of subsets saved in the list depends on the total number of days available in the data.
I have tried to make a reproducible example.
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
a <- as.data.frame(matrix(, nrow = length(time_10), ncol = 4))
names(a)<- c("time_10","var1","var2","var3")
a$time_10 <- time_10
a$var1 <- runif(nrow(a), min=20, max=70)
a$var2 <- runif(nrow(a), min=10, max=50)
a$var3 <- runif(nrow(a), min= 3, max=10)
head(a)
The attached image gives an idea of how the time series should be subset and saved in a list; D1, D2 and D3 stand for day 1, day 2 and day 3. A for loop or any other more efficient method is appreciated. Note: inside the loop, each 3-day subset should be saved in the list under the index Subset_n, where n is the subset number.
This appears to work for your case. What I do is extract the date and roll a 3 day window (ww) from beginning to the end. For each step, I subset a data.frame based on days in the window and store it into a list.
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
var1 <- runif(length(time_10), min=20, max=70)
var2 <- runif(length(time_10), min=50, max=90)
var3 <- runif(length(time_10), min=50, max=90)
a <- as.data.frame(matrix(, nrow = length(time_10), ncol = 4))
names(a)<- c("time_10","var1","var2","var3")
a$time_10 <- time_10
a$var1 <- var1
a$var2 <- var2
a$var3 <- var3
date <- strptime(a$time_10, format = "%Y-%m-%d")
td10 <- sort(unique(date))
ww <- 3
out <- vector("list", length(td10) - ww + 1) # preallocate one slot per complete window
for (i in 1:length(td10)) {
  bb <- i:(i + ww - 1) # this is the bounding box
  if (max(bb) > length(td10)) {
    message("End of time series reached, exiting.")
    break # return() is only valid inside a function
  }
  out[[i]] <- a[date %in% td10[bb], ]
}
# check ranges of dates for each subset
lapply(out, FUN = function(x) range(x$time_10))
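The same rolling-window idea can be sketched without an explicit loop. The example below is a minimal illustration on a small made-up data frame (`toy`, with one observation every 8 hours over 5 days in UTC), not the original data; the names `udays`, `starts` and `subsets` are assumptions chosen for the sketch.

```r
# Toy data: one observation every 8 hours over 5 days (UTC avoids DST effects)
time_10 <- seq(ISOdatetime(2001, 2, 1, 0, 0, 0, tz = "UTC"),
               ISOdatetime(2001, 2, 5, 16, 0, 0, tz = "UTC"),
               by = "8 hours")
toy <- data.frame(time_10, var1 = seq_along(time_10))

day <- as.Date(toy$time_10, tz = "UTC")  # calendar day of each row
udays <- sort(unique(day))               # the distinct days, in order
ww <- 3                                  # window width in days

# One window per possible start day; each keeps the rows of ww consecutive days
starts <- seq_len(length(udays) - ww + 1)
subsets <- lapply(starts, function(i) toy[day %in% udays[i:(i + ww - 1)], ])
names(subsets) <- paste0("Subset_", starts)

# Date range covered by each subset
print(lapply(subsets, function(x) range(as.Date(x$time_10, tz = "UTC"))))
```

With 5 days of data and a 3-day window this gives 3 subsets, named Subset_1 to Subset_3 as the question asks.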
I believe the following code does what is asked for. It uses function days from package lubridate to make date/time arithmetic easier.
days3 <- lubridate::days(3)
d1 <- a$time_10[1]
d2 <- a$time_10[nrow(a)] - lubridate::days(2)
res <- lapply(seq(d1, d2, by = "1 days"), function(d){
i <- which(d <= a$time_10 & a$time_10 < d + days3)
a[i, ]
})
Edit.
I find the number of rows in each dataframe of res cumbersome, making it difficult to check whether the code above did produce the expected result. Here is a way to check it.
check <- lapply(res, function(DF) lubridate::day(DF$time_10))
check <- sapply(check, function(x) rle(x)$values)
head(check, 3)
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 2 3 4
#
#[[3]]
#[1] 3 4 5
rm(check) # tidy up
Data.
I will repost the data creation code, since the original wasn't reproducible.
set.seed(8893)
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
var1 <- runif(length(time_10), min=20, max=70)
var2 <- runif(length(time_10), min=50, max=90)
var3 <- runif(length(time_10), min=50, max=90)
a <- data.frame(time_10, var1, var2, var3)

Return number of Business days since deadline as integer and add Business days to date with dplyr

I want one function to return the number of business days since a specific date and another to add business days to a date, accounting for NAs.
However, my solution is sloppy and there should be a more elegant way.
library(dplyr)
library(timeDate)
library(RQuantLib)
library(lubridate)
item <- c("a", "b")
date1 = as.Date(c("2017-11-30", "2017-11-01"))
date2 = as.Date(c("2017-12-01", "2017-11-16"))
d <- data.frame(item, date1, date2, stringsAsFactors=F)
line3 <- c("c", "2017-12-03", NA)
line4 <- c("d", NA, "2017-12-03")
d <- rbind(d, line3, line4)
This function works, but it runs very slowly across multiple items and is not very legible.
bizDeadline <- function(x, nBizDys = 10) {
  output <- Reduce(rbind, Map((function(x, howMuch = 15) {
    x <- as.Date(x, origin = "1960-01-01")
    days <- x + 1:(howMuch * 2)
    Deadline <- days[isBizday(as.timeDate(days))][howMuch]
    data.frame(DateIn = x, Deadline, DayOfWeek = weekdays(Deadline),
               TimeDiff = difftime(Deadline, x))
  }), x, howMuch = nBizDys))
  output$Deadline
}
This would be ideal to exclude holidays and weekends.
d %>% mutate(deadline = bizDeadline(date1, 10))
d$DaysOverdue <- NA
This works with a loop, but doesn't work in a vectorized mutate.
i = 1
for(i in 1:nrow(d)){
d$DaysOverdue[i] = businessDaysBetween("UnitedStates", d$date1[i], today())
}
This function from RQuantLib seems not to be vectorized
d %>% mutate(od = businessDaysBetween("UnitedStates", date1, today()))
Any better solutions?
I recommend the Vectorize function in base R, which is an easy way to vectorize a function. P.S. The vectorized function still cannot deal with NA, so only the first two rows are used below.
businessDaysBetween_vec <- Vectorize(businessDaysBetween,vectorize.args = c('from', 'to'))
d[1:2,] %>% mutate(od = businessDaysBetween_vec("UnitedStates", date1, today()))
#Checking and comparing speed of solution
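foo_loop <- function() {
  for(i in 1:2){
    d$DaysOverdue[i] = businessDaysBetween("UnitedStates", d$date1[i], today())
  }
}
require(microbenchmark)
require(ggplot2)
# the vectorized function needs the same arguments as in the mutate() call above
res <- microbenchmark(businessDaysBetween_vec("UnitedStates", d$date1[1:2], today()), foo_loop(), times = 1e5)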
autoplot(res)
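One way to combine Vectorize with NA handling is to guard against NA inside the scalar function before vectorizing it. The sketch below uses a hypothetical base R helper, `count_weekdays`, as a stand-in for `businessDaysBetween`; it only skips Saturdays and Sundays and knows nothing about holidays, so it illustrates the pattern rather than replacing RQuantLib.

```r
# Hypothetical scalar helper: number of Mon-Fri days in (from, to].
# Holidays are ignored; an NA in either argument gives NA.
count_weekdays <- function(from, to) {
  # defensive: make sure both arguments are Dates
  from <- as.Date(from, origin = "1970-01-01")
  to <- as.Date(to, origin = "1970-01-01")
  if (is.na(from) || is.na(to)) return(NA_integer_)
  if (to <= from) return(0L)
  days <- seq(from + 1, to, by = "day")
  # "%u" is the ISO weekday number: 6 = Saturday, 7 = Sunday
  sum(!format(days, "%u") %in% c("6", "7"))
}

# Vectorize over both date arguments, as in the answer above
count_weekdays_vec <- Vectorize(count_weekdays, vectorize.args = c("from", "to"))

from <- as.Date(c("2017-11-30", "2017-11-01", NA))
to <- as.Date("2017-12-08")  # fixed end date so the result is reproducible
res_days <- count_weekdays_vec(from, to)
print(res_days)
```

Because the NA check lives in the scalar function, the vectorized version can be applied to the whole column, for example inside `mutate()`, and simply returns NA for rows with a missing date.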

Holt Winters Forecast with Multiple Input Variables

For context, I'm a novice R user, so please forgive any incorrect terminology/processes. I am actively trying to improve my coding ability, but recently have become stumped.
I have the following data set where A * B * C = Output:
Date A B C Output
1/1/2013 177352 0.908329198 0.237047935 38187
1/2/2013 240724 0.852033865 0.237273592 48666
1/3/2013 243932 0.908380204 0.237039845 52524
1/4/2013 221485 0.820543152 0.236356733 42955
1/5/2013 202590 0.818066045 0.240900973 39925
1/6/2013 238038 0.770057722 0.247344561 45339
1/7/2013 271511 0.794258796 0.241252029 52026
1/8/2013 283434 0.807817693 0.233810703 53534
1/9/2013 275016 0.843220031 0.243769917 56530
1/10/2013 255266 0.797791324 0.238562428 48583
1/11/2013 226564 0.815791564 0.236153417 43648
1/12/2013 214366 0.800066242 0.237961133 40812
1/13/2013 256946 0.764845532 0.237640186 46702
1/14/2013 282298 0.816537843 0.234257528 53998
I have a few years' worth of data and I'm trying to forecast Output, using A, B, and C. However, when I model out A, B, and C individually, the Output becomes very skewed. If I forecast just Output then I lose the input factors.
What is the best package/code to accomplish this task? I've tried Googling and searching on here numerous different methods, but haven't found the solution I'm looking for.
Here is some of the code:
DataSet1[,"Date"] <- mdy(DataSet[,"Date"])
DataSet1
TotalSet <- ts(DataSet1, frequency = 365, start =c(2013,1))
DataA <- ts(DataSet1$A, frequency = 365, start = c(2013,1))
DataB <- ts(DataSet1$B, frequency = 365, start = c(2013,1))
DataC <- ts(DataSet1$C, frequency = 365, start = c(2013,1))
OutputData <- ts(DataSet$Output, frequency = 365, start = c(2013,1))
ADecompose <- decompose(DataA)
BDecompose <- decompose(DataB)
CDecompose <- decompose(DataC)
OutputDecompose <- decompose(OutputData)
DataAHW <- HoltWinters(DataA, seasonal = "mult")
DataBHW <- HoltWinters(DataB, seasonal = "mult")
DataCHW <- HoltWinters(DataC, seasonal = "mult")
OutputDataHW <- HoltWinters(OutputData, seasonal = "mult")
FC.A <- forecast.HoltWinters(DataAHW)
FC.B <- forecast.HoltWinters(DataBHW)
FC.C <- forecast.HoltWinters(DataCHW)
FC.Output <- forecast.HoltWinters(OutputDataHW)
plot(ForecastVisits)
plot(ForecastCPV)
plot(ForecastRPC)
plot(ForecastRevenue)
Here is another model I built for the Output. I've plugged A, B, and C into it individually and then combined them in Excel. I'm sure there is a more appropriate way to handle this, but given my lack of experience I am reaching out for help.
dataset <- testData
##FORECAST
forecastingFuntion <- function(dataset, lenghtOfForecast)
{
dataset[,"Date"] <- mdy(dataset[,"Date"])
myts <- ts(dataset[,"Output"], start = c(2013,1), frequency = 365)
hwModel <- HoltWinters(myts, seasonal = "mult")
future <- data.frame(predict(hwModel, n.ahead = lenghtOfForecast, level = 0.9))
fittedValues <- data.frame(as.numeric(hwModel$fitted[,"xhat"]))
names(fittedValues) <- "fit"
futureDates <- c()
predicitedValues <- rbind(fittedValues, future)
for(i in 1: lenghtOfForecast)
{
futureDateSingle <- data.frame(dataset[nrow(dataset),"Date"] + days(i))
futureDates <- rbind(futureDates, futureDateSingle)
}
names(futureDates) <- "Date"
dates <- data.frame(dataset[366:(nrow(dataset)),"Date"])
names(dates) <- "Date"
dates <- rbind(dates, futureDates)
predictedData <- data.frame(predicitedValues, dates)
names(predictedData) <- c("predictedValues","Date")
finalData2 <- mergeData <- merge(predictedData, dataset, all.x = T, all.y = F, by = "Date")
finalData2
}
finalData2 <- forecastingFuntion(testData, 612)
rm(list=setdiff(ls(), c("finalData2")))
write.csv(finalData2, file="B2BForecastVisits.csv")
Thanks!

Looking up tickers for different time periods in a loop with quantmod

I'm able to loop through and calculate the overnight/over-weekend returns for a list of tickers when the time period is the same for every ticker, but am having trouble when the time period I want to look up is different for each ticker.
For example, with:
symbols <- c("AAPL", "GOOG"," MSFT")
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers, dates)
example.df
tickers dates
1 AAPL 2015-01-04
2 GOOG 2015-01-05
3 MSFT 2015-01-06
I'd want the overnight return for AAPL between 2015-01-04 and 2015-01-05, for GOOG between 2015-01-05 and 2015-01-06, etc. If it was a Friday, I'd want the next Monday.
I can get what I'm looking for by looking up each individual ticker like this:
library(quantmod)
library(dplyr)
# date range accounts for weekends
getSymbols("AAPL", from = "2016-01-04", to = "2016-01-08")
data <- as.data.frame(AAPL)
colnames(data) <- c("open","high","low","close","volume","adj.")
# overnight return calculation
data$overnight.return <- data$open / lag(data$close, default = 0) - 1
data$overnight.return <- paste(round(data$overnight.return * 100, 3), "%",sep= "")
# the overnight/over-weekend returns for the specified date
data.df.final <- slice(data, 2)
Of course that's terribly slow.
Here's as far as I was able to get trying to make a loop out of it:
# needs to be a loop itself and inside the other 'for' loop somehow I think
symbol.list <- example.df[,1]
start <- data[,2]
end <- data[,2] + days(3)
results <- NULL
for (i in symbol.list) {
data <- getSymbols(Symbols = i,
src = "yahoo",
from = start, to = end,
auto.assign = FALSE)
if (inherits(data, 'try-error')) next
colnames(data) <- c("open","high","low","close","volume","adj.")
data <- as.data.frame(data)
data <- cbind(date = rownames(data), data)
data$overnightRtn <- as.vector(data$open / lag(data$close, default = 0) - 1)
data$overnightRtn <- paste(round(data$overnightRtn * 100, 3), "%")
data <- slice(data, 2)
results <- bind_rows(results, data)
}
How can I add the date looping aspect to the above ticker loop?
Maybe this is what you are looking for. Note that I loop over an index rather than over the list itself, so I can refer to every element of your data frame (it is not optimized, but it does the job you described):
symbols <- c("AAPL", "GOOG"," MSFT") ## " MSFT" has an extra space
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers=symbols, dates) ## there was an error here in your example.
symbol.list <- trimws(example.df[,1])
start <- as.Date(example.df[,2])
end <- as.Date(example.df[,2]) + 3 ## same as + days(3), without needing lubridate
results <- NULL
for (i in 1:NROW(symbol.list)) {
try(dataX <- getSymbols(Symbols = symbol.list[i],
src = "yahoo",
from = start[i], to = end[i],
auto.assign = FALSE),silent=T)
if (!exists("dataX")) {cat("Error in ",i,"\n");next}
colnames(dataX) <- c("open","high","low","close","volume","adj.")
dataX <- as.data.frame(dataX)
dataX <- cbind(date = rownames(dataX), dataX)
dataX$overnightRtn <- as.vector(dataX$open / lag(dataX$close, default = 0) - 1)
dataX$overnightRtn <- paste(round(dataX$overnightRtn * 100, 3), "%")
data2 <- slice(dataX, 2);rm(dataX)
results <- if (is.null(results)) data2 else rbind(results, data2)
}
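The overnight return itself is just the open divided by the previous close, minus 1. A minimal offline sketch on made-up prices (the data frame `prices` and its values are hypothetical, no download involved) shows the calculation. It uses NA for the missing first previous close, whereas the `lag(..., default = 0)` used in the loops divides by zero there and yields Inf for the first row, which is harmless only because row 2 is the one kept.

```r
# Made-up prices for three trading days (2015-01-02 is a Friday)
prices <- data.frame(
  date  = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06")),
  open  = c(100, 102, 99),
  close = c(101, 100, 103)
)

# Previous day's close; the first row has none, so use NA
prev_close <- c(NA, head(prices$close, -1))

# Overnight (or over-weekend) return: today's open vs. the previous close
prices$overnightRtn <- prices$open / prev_close - 1

print(prices)
```

The second row is the over-weekend return from Friday's close of 101 to Monday's open of 102, and the third is 99 / 100 - 1 = -0.01.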

Resources