Converting date to numeric but limiting to number of days in year

I want to create around 10,000 Date objects between 2008-01-01 and 2010-12-31. I wrote the code for that, but what I actually want is the day of the year: the days should run from 1 to 366 in 2008 (a leap year, because of 2008-02-29) and then restart at 1 on 2009-01-01. I could generate the dates separately for 2008, then 2009, then 2010, but that is not convenient. I was reading about lubridate but could not figure it out. I could also filter on 1 to 366, then 367 to 731, but that would not be efficient either. Does anyone know a better way to do it?
set.seed(123)
tim1 <- sample(365 * 3 + 1, 10000, replace = TRUE) # the + 1 accounts for 2008-02-29
dat1 <- as.Date(tim1, origin = "2007-12-31") # so that 1 maps to 2008-01-01

You can create a vector of all the target dates and sample from it. To create the vector, there is seq.Date, the seq method for objects of class "Date".
start <- as.Date("2008-01-01")
end <- as.Date("2010-12-31")
s <- seq(start, end, by = "days")
The vector s includes all days between start and end. Now sample from it.
set.seed(123)
dat1 <- sample(s, 10000, TRUE)
Transform the sample into day-of-the-year. See help("strptime")
as.numeric(format(dat1, format = "%j"))
In the end, remove s, it's no longer needed.
rm(s) # tidy up
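Since the question mentions lubridate: the same day-of-year number can also be obtained without going through character strings. The following is only a sketch using base R; it assumes nothing beyond the sampling code above, and lubridate::yday(dat1) should give the same values if that package is preferred.

```r
set.seed(123)
s <- seq(as.Date("2008-01-01"), as.Date("2010-12-31"), by = "days")
dat1 <- sample(s, 10000, TRUE)

# Day of year via formatting, as in the answer above
doy_format <- as.integer(format(dat1, "%j"))

# Day of year via POSIXlt; the yday field is zero-based, hence the + 1
doy_posix <- as.POSIXlt(dat1)$yday + 1L

identical(doy_format, doy_posix)
```

Both versions restart at 1 on every 1 January, and only dates in 2008 can reach 366.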
Edit.
The following two functions do what the question asks for but with two different methods.
f1 is the code above wrapped in a function; f2 uses ave/seq_along/match and is a bit more complicated. The tests show f2 to be twice as fast as f1.
f1 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
as.numeric(format(y, format = "%j"))
}
f2 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
z <- ave(as.integer(s), lubridate::year(s), FUN = seq_along)
z[match(y, s)]
}
set.seed(123)
x1 <- f1("2008-01-01", "2010-12-31", 100)
set.seed(123)
x2 <- f2("2008-01-01", "2010-12-31", 100)
all.equal(x1, x2)
#[1] TRUE
Now the tests.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1("2008-01-01", "2010-12-31", 1e4),
f2 = f2("2008-01-01", "2010-12-31", 1e4),
times = 50
)
print(mb, order = "median")
ggplot2::autoplot(mb)
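One caveat about the ave/seq_along idea used in f2, shown here as a small base R sketch (it groups by format(s, "%Y") instead of lubridate::year, which is an assumption of this example): the counter is the position within each year of the sequence, so it matches the day of the year only when the sequence starts on 1 January.

```r
# A sequence that starts two days before the end of a leap year
s <- seq(as.Date("2008-12-30"), as.Date("2009-01-02"), by = "days")

# Running position within each calendar year, as in f2
pos_in_year <- ave(seq_along(s), format(s, "%Y"), FUN = seq_along)

# Actual day of the year, as in f1
day_of_year <- as.integer(format(s, "%j"))

pos_in_year   # 1 2 1 2
day_of_year   # 365 366 1 2
```

With the start date from the question (2008-01-01) the two agree, which is why all.equal(x1, x2) returns TRUE above.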

Related

How to calculate the historical monthly volatility from daily returns in R?

First I created an xts object, which contains 36 time series showing daily prices from 1980-01-02 to 2020-10-06.
ENERGY_data$time <- as.Date(ENERGY_data$time, format = "%Y/%m/%d")
ENERGY_xts <- ENERGY_data[order(ENERGY_data$time), ]
ENERGY_xts <- as.xts(ENERGY_xts[, 2:37], order.by=ENERGY_xts$time)
Then I calculated the continuously compounded daily returns by using the PerformanceAnalytics function CalculateReturns()
ENERGY_returns.cc <- CalculateReturns(ENERGY_xts, method="compound")
Now I would like to calculate the volatility for each month going from 1980-01-02 to 2020-10-06 on the basis of this formula:
[Image: monthly volatility formula]
Could you please give me some hints (in terms of coding)?

Take a look at this function and please note that I simulated returns, since you didn't provide yours.
library(xts)
set.seed(123)
returns <- matrix(rnorm(30*365*5, 0.0001, 0.0002), ncol = 30)
timeindex <- seq.Date(from = as.Date('2000-01-01'), length.out = 365*5, by = 'days')
test_xts <- xts(returns, order.by = timeindex)
calcFrenchVolOneAsset <- function(x){
ndays <- nrow(x)
first_part_of_formula <- sum(x^2)
second_part_of_formula <- 2*sum(x[-1]*x[-nrow(x)])
res <- sqrt(first_part_of_formula + second_part_of_formula)
return(res)
}
calcFrenchVolMultipleAssets <- function(x){
ndays <- nrow(x)
first_part_of_formula <- colSums(x^2)
second_part_of_formula <- 2*colSums(x[-1, ]*x[-nrow(x), ])
res <- sqrt(first_part_of_formula + second_part_of_formula)
return(res)
}
# test for the first month and the first asset
calcFrenchVolOneAsset(test_xts['2000-01', 1])
calcFrenchVolMultipleAssets(test_xts['2000-01', 1])
# apply monthly and on columns
monthly_vols <- apply.monthly(test_xts, calcFrenchVolMultipleAssets)
head(monthly_vols[, c(1:5)])
                    e1        e1.1        e1.2        e1.3        e1.4
2000-01-31 0.002030192 0.002402946 0.001717494 0.001888513 0.002322648
2000-02-29 0.001983995 0.002343783 0.001789346 0.001671332 0.001824278
2000-03-31 0.001910535 0.002429689 0.001709092 0.002492223 0.002068032
2000-04-30 0.001765052 0.002114554 0.001946232 0.002160436 0.002139949
2000-05-31 0.002269842 0.002476424 0.001626455 0.002030027 0.002400690
2000-06-30 0.002082933 0.001905620 0.001681579 0.001992082 0.002010535
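One point that may be worth checking in the functions above: arithmetic between two xts objects is aligned on the time index, so x[-1] * x[-nrow(x)] multiplies values that share a date rather than values from consecutive days. Judging from the code, the intended quantity seems to be sqrt(sum(r_i^2) + 2 * sum(r_i * r_(i+1))). The sketch below drops the index with coredata() before multiplying; calc_vol_lagged is a hypothetical name, not part of the original answer.

```r
library(xts)

# Hypothetical helper: same formula, computed on the plain matrix so that
# row i is multiplied by row i + 1 instead of by itself.
calc_vol_lagged <- function(x) {
  m <- coredata(x)                                        # drop the time index
  n <- nrow(m)
  cross <- m[-1, , drop = FALSE] * m[-n, , drop = FALSE]  # consecutive-day products
  sqrt(colSums(m^2) + 2 * colSums(cross))
}

# Tiny example: four days with returns 1, 2, 3, 4
idx <- as.Date("2000-01-01") + 0:3
x <- xts(matrix(c(1, 2, 3, 4), ncol = 1), order.by = idx)

nrow(x[-1] * x[-nrow(x)])  # xts keeps only the dates present in both operands
calc_vol_lagged(x)         # sqrt(30 + 2 * (1*2 + 2*3 + 3*4)) = sqrt(70)
```

The helper can be passed to apply.monthly in the same way as calcFrenchVolMultipleAssets; the numbers will differ from the output printed above.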

Subsetting time series data by 3 days and saving in the list

I want to subset a time series into overlapping 3-day windows and save each subset in a list. Suppose the first subset runs from day 1 to day 3; then the second subset runs from day 2 to day 4, and so on, so that every subset holds 3 days of data. Note that this is time-stamped 10-minute data. The number of subsets saved in the list depends on the total number of days available in the data.
I have tried to make a reproducible example:
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
a <- as.data.frame(matrix(, nrow = length(time_10), ncol = 4))
names(a)<- c("time_10","var1","var2","var3")
a$time_10 <- time_10
a$var1 <- runif(nrow(a), min=20, max=70)
a$var2 <- runif(nrow(a), min=10, max=50)
a$var3 <- runif(nrow(a), min= 3, max=10)
head(a)
The attached image gives an idea of how to subset the time series and save it in a list; there, D1, D2 and D3 stand for day 1, day 2 and day 3 respectively. A for loop or any other optimal method is appreciated. Note: inside the for loop, the 3-day subsetting keeps happening and each subset is saved in a list under the index Subset_n, where n is the subset number.

This appears to work for your case. What I do is extract the date and roll a 3 day window (ww) from beginning to the end. For each step, I subset a data.frame based on days in the window and store it into a list.
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
var1 <- runif(length(time_10), min=20, max=70)
var2 <- runif(length(time_10), min=50, max=90)
var3 <- runif(length(time_10), min=50, max=90)
a <- as.data.frame(matrix(, nrow = length(time_10), ncol = 4))
names(a)<- c("time_10","var1","var2","var3")
a$time_10 <- time_10
a$var1 <- var1
a$var2 <- var2
a$var3 <- var3
date <- strptime(a$time_10, format = "%Y-%m-%d")
td10 <- sort(unique(date))
ww <- 3
out <- vector("list", length(td10) - round(ww/2)) # preallocate a list
for (i in 1:length(td10)) {
bb <- i:(i + ww - 1) # this is the bounding box
if (max(bb) > length(td10)) {
message("End of time series reached, exiting.")
break # leave the loop; return() is only valid inside a function
}
out[[i]] <- a[date %in% td10[bb], ]
}
# check ranges of dates for each subset
lapply(out, FUN = function(x) range(x$time_10))
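The rolling window above can also be sketched without an explicit loop. The example below is a minimal version under some assumptions of its own: a shorter six-day series, a true 10-minute step, and an explicit UTC time zone so that the calendar day of each row is unambiguous. The Subset_n names follow the wording of the question.

```r
set.seed(1)
time_10 <- seq(ISOdatetime(2001, 2, 1, 0, 0, 0, tz = "UTC"),
               ISOdatetime(2001, 2, 6, 23, 50, 0, tz = "UTC"),
               by = "10 min")
a <- data.frame(time_10, var1 = runif(length(time_10), min = 20, max = 70))

day  <- as.Date(a$time_10, tz = "UTC")  # calendar day of each row
days <- sort(unique(day))
ww   <- 3                               # window width in days

# One data frame per window: days i, i + 1, ..., i + ww - 1
out <- lapply(seq_len(length(days) - ww + 1), function(i) {
  a[day %in% days[i:(i + ww - 1)], ]
})
names(out) <- paste0("Subset_", seq_along(out))

sapply(out, nrow)  # 3 days * 144 rows per day in every window
```

Six days of data give four windows, since the last full window starts on day 4.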

I believe the following code does what is asked for. It uses the function days from package lubridate to make date/time arithmetic easier.
days3 <- lubridate::days(3)
d1 <- a$time_10[1]
d2 <- a$time_10[nrow(a)] - lubridate::days(2)
res <- lapply(seq(d1, d2, by = "1 days"), function(d){
i <- which(d <= a$time_10 & a$time_10 < d + days3)
a[i, ]
})
Edit.
I find the number of rows in each dataframe of res cumbersome, making it difficult to check whether the code above did produce the expected result. Here is a way to check it.
check <- lapply(res, function(DF) lubridate::day(DF$time_10))
check <- sapply(check, function(x) rle(x)$values)
head(check, 3)
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 2 3 4
#
#[[3]]
#[1] 3 4 5
rm(check) # tidy up
Data.
I will repost the data creation code, since the original wasn't reproducible.
set.seed(8893)
time_10 <- seq(ISOdatetime(2001,2,1,0,0,0), ISOdatetime(2001,3,31,0,0,0), by=(200))
var1 <- runif(length(time_10), min=20, max=70)
var2 <- runif(length(time_10), min=50, max=90)
var3 <- runif(length(time_10), min=50, max=90)
a <- data.frame(time_10, var1, var2, var3)

Return number of Business days since deadline as integer and add Business days to date with dplyr

I want a function that returns the number of business days since a specific date, and one that adds business days to a date, accounting for NAs.
However, my solution is sloppy and there should be a more elegant way.
library(dplyr)
library(timeDate)
library(RQuantLib)
library(lubridate)
item <- c("a", "b")
date1 = as.Date(c("2017-11-30", "2017-11-01"))
date2 = as.Date(c("2017-12-01", "2017-11-16"))
d <- data.frame(item, date1, date2, stringsAsFactors=F)
line3 <- c("c", "2017-12-03", NA)
line4 <- c("d", NA, "2017-12-03")
d <- rbind(d, line3, line4)
This function works, but it runs very slowly across multiple items and is not very legible.
bizDeadline <- function(x, nBizDys = 10) {
output <- Reduce(rbind, Map((function(x, howMuch = 15) {
x <- as.Date(x, origin = "1960-01-01")
days <- x + 1:(howMuch * 2)
Deadline <- days[isBizday(as.timeDate(days))][howMuch]
data.frame(DateIn = x, Deadline, DayOfWeek = weekdays(Deadline), TimeDiff = difftime(Deadline,
x))
}), x, howMuch = nBizDys))
output$Deadline
}
This would be ideal to exclude holidays and weekends.
d %>% mutate(deadline = bizDeadline(date1, 10))
d$DaysOverdue <- NA
This works with a loop, but doesn't work in a vectorized mutate().
i = 1
for(i in 1:nrow(d)){
d$DaysOverdue[i] = businessDaysBetween("UnitedStates", d$date1[i], today())
}
This function from RQuantLib does not seem to be vectorized:
d %>% mutate(od = businessDaysBetween("UnitedStates", date1, today()))
Any better solutions?

I recommend using the Vectorize function in R; it is an easy way to vectorize a function. P.S. The vectorized function cannot deal with NA.
businessDaysBetween_vec <- Vectorize(businessDaysBetween,vectorize.args = c('from', 'to'))
d[1:2,] %>% mutate(od = businessDaysBetween_vec("UnitedStates", date1, today()))
#Checking and comparing speed of solution
foo_loop <- function() {
for(i in 1:2){
d$DaysOverdue[i] = businessDaysBetween("UnitedStates", d$date1[i], today())
}
}
require(microbenchmark)
require(ggplot2)
res <- microbenchmark(businessDaysBetween_vec("UnitedStates", d$date1[1:2], today()), foo_loop(), times = 1e5)
autoplot(res)
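Since the vectorized function cannot deal with NA, one possible workaround is to call the scalar function only for complete pairs and leave NA elsewhere. The sketch below shows the pattern in base R. Note that weekdays_between is a hypothetical stand-in, not RQuantLib's businessDaysBetween: it counts Monday to Friday only and knows nothing about holidays.

```r
# Hypothetical scalar function: number of Mon-Fri days in (from, to].
# Holidays are NOT handled; this only stands in for a scalar-only API.
weekdays_between <- function(from, to) {
  if (to <= from) return(0L)
  days <- seq(from + 1, to, by = "day")
  sum(!format(days, "%u") %in% c("6", "7"))  # %u: 1 = Monday ... 7 = Sunday
}

# NA-safe wrapper: evaluate only the rows where both dates are present
weekdays_between_vec <- function(from, to) {
  to  <- rep(to, length.out = length(from))
  out <- rep(NA_integer_, length(from))
  ok  <- which(!is.na(from) & !is.na(to))
  out[ok] <- vapply(ok, function(i) weekdays_between(from[i], to[i]), integer(1))
  out
}

from <- as.Date(c("2017-11-30", NA, "2017-12-01"))
weekdays_between_vec(from, as.Date("2017-12-08"))  # 6 NA 5
```

The same wrapper shape should work around businessDaysBetween("UnitedStates", from[i], to[i]), and the result can then be used inside mutate().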

Holt Winters Forecast with Multiple Input Variables

For context, I'm a novice R user, so please forgive any incorrect terminology/processes. I am actively trying to improve my coding ability, but recently have become stumped.
I have the following data set where A * B * C = Output:
Date       A       B            C            Output
1/1/2013   177352  0.908329198  0.237047935  38187
1/2/2013   240724  0.852033865  0.237273592  48666
1/3/2013   243932  0.908380204  0.237039845  52524
1/4/2013   221485  0.820543152  0.236356733  42955
1/5/2013   202590  0.818066045  0.240900973  39925
1/6/2013   238038  0.770057722  0.247344561  45339
1/7/2013   271511  0.794258796  0.241252029  52026
1/8/2013   283434  0.807817693  0.233810703  53534
1/9/2013   275016  0.843220031  0.243769917  56530
1/10/2013  255266  0.797791324  0.238562428  48583
1/11/2013  226564  0.815791564  0.236153417  43648
1/12/2013  214366  0.800066242  0.237961133  40812
1/13/2013  256946  0.764845532  0.237640186  46702
1/14/2013  282298  0.816537843  0.234257528  53998
I have a few years' worth of data and I'm trying to forecast Output using A, B, and C. However, when I model A, B, and C individually, the Output becomes very skewed. If I forecast just Output, I lose the input factors.
What is the best package/code to accomplish this task? I've tried Googling and searching here for numerous different methods, but haven't found the solution I'm looking for.
Here is some of the code:
DataSet1[,"Date"] <- mdy(DataSet[,"Date"])
DataSet1
TotalSet <- ts(DataSet1, frequency = 365, start =c(2013,1))
DataA <- ts(DataSet1$A, frequency = 365, start = c(2013,1))
DataB <- ts(DataSet1$B, frequency = 365, start = c(2013,1))
DataC <- ts(DataSet1$C, frequency = 365, start = c(2013,1))
OutputData <- ts(DataSet$Output, frequency = 365, start = c(2013,1))
ADecompose <- decompose(DataA)
BDecompose <- decompose(DataB)
CDecompose <- decompose(DataC)
OutputDecompose <- decompose(OutputData)
DataAHW <- HoltWinters(DataA, seasonal = "mult")
DataBHW <- HoltWinters(DataB, seasonal = "mult")
DataCHW <- HoltWinters(DataC, seasonal = "mult")
OutputDataHW <- HoltWinters(OutputData, seasonal = "mult")
# forecast() from the forecast package dispatches to the HoltWinters method
FC.A <- forecast(DataAHW)
FC.B <- forecast(DataBHW)
FC.C <- forecast(DataCHW)
FC.Output <- forecast(OutputDataHW)
plot(FC.A)
plot(FC.B)
plot(FC.C)
plot(FC.Output)
Here is another model I built for the Output. I've plugged A, B, and C into it individually and then combined them in Excel. I'm sure there is a more appropriate way to handle this, but given my lack of experience I am reaching out for help.
dataset <- testData
##FORECAST
forecastingFuntion <- function(dataset, lenghtOfForecast)
{
dataset[,"Date"] <- mdy(dataset[,"Date"])
myts <- ts(dataset[,"Output"], start = c(2013,1), frequency = 365)
hwModel <- HoltWinters(myts, seasonal = "mult")
future <- data.frame(predict(hwModel, n.ahead = lenghtOfForecast, level = 0.9))
fittedValues <- data.frame(as.numeric(hwModel$fitted[,"xhat"]))
names(fittedValues) <- "fit"
futureDates <- c()
predicitedValues <- rbind(fittedValues, future)
for(i in 1: lenghtOfForecast)
{
futureDateSingle <- data.frame(dataset[nrow(dataset),"Date"] + days(i))
futureDates <- rbind(futureDates, futureDateSingle)
}
names(futureDates) <- "Date"
dates <- data.frame(dataset[366:(nrow(dataset)),"Date"])
names(dates) <- "Date"
dates <- rbind(dates, futureDates)
predictedData <- data.frame(predicitedValues, dates)
names(predictedData) <- c("predictedValues","Date")
finalData2 <- mergeData <- merge(predictedData, dataset, all.x = T, all.y = F, by = "Date")
finalData2
}
finalData2 <- forecastingFuntion(testData, 612)
rm(list=setdiff(ls(), c("finalData2")))
write.csv(finalData2, file="B2BForecastVisits.csv")
Thanks!
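Two details the function above relies on can be seen on a small example: a seasonal HoltWinters fit has no fitted values for the first seasonal period (hence the dataset[366:...] offset), and predict() only returns interval bounds when prediction.interval = TRUE is set. This is a sketch on synthetic data; the weekly frequency and the fixed smoothing parameters are assumptions chosen to keep the example small and deterministic.

```r
set.seed(42)
n <- 56                                              # eight weeks of daily data
season <- rep(c(1, 1.1, 1.2, 1.1, 1, 0.8, 0.7), 8)   # weekly pattern
x <- ts((100 + 0.5 * seq_len(n)) * season * runif(n, 0.97, 1.03),
        frequency = 7)

# Fixed smoothing parameters, so no numerical optimisation is involved
hw <- HoltWinters(x, alpha = 0.3, beta = 0.1, gamma = 0.1, seasonal = "mult")

nrow(hw$fitted)   # n - 7: the first period is used for start values

# Without prediction.interval = TRUE only the point forecast is returned
fc <- predict(hw, n.ahead = 14, prediction.interval = TRUE, level = 0.9)
colnames(fc)
```

With frequency = 365, as in the question, the same offset is 365 rows, which is why the fitted values line up with row 366 onwards.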

Looking up tickers for different time periods in a loop with quantmod

I'm able to loop through and calculate the overnight/over-weekend returns for a list of tickers when the time period is the same for every ticker, but am having trouble when the time period I want to look up is different for each ticker.
For example, with:
symbols <- c("AAPL", "GOOG"," MSFT")
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers, dates)
example.df
tickers dates
1 AAPL 2015-01-04
2 GOOG 2015-01-05
3 MSFT 2015-01-06
I'd want the overnight return for AAPL between 2015-01-04 and 2015-01-05, for GOOG between 2015-01-05 and 2015-01-06, etc. If it was a Friday, I'd want the next Monday.
I can get what I'm looking for by looking up each individual ticker like this:
library(quantmod)
library(dplyr)
# date range accounts for weekends
getSymbols("AAPL", from = "2016-01-04", to = "2016-01-08")
data <- as.data.frame(AAPL)
colnames(data) <- c("open","high","low","close","volume","adj.")
# overnight return calculation
data$overnight.return <- data$open / lag(data$close, default = 0) - 1
data$overnight.return <- paste(round(data$overnight.return * 100, 3), "%",sep= "")
# the overnight/over-weekend returns for the specified date
data.df.final <- slice(data, 2)
Of course that's terribly slow.
Here's as far as I was able to get trying to make a loop out of it:
# needs to be a loop itself and inside the other 'for' loop somehow I think
symbol.list <- example.df[,1]
start <- data[,2]
end <- data[,2] + days(3)
results <- NULL
for (i in symbol.list) {
data <- getSymbols(Symbols = i,
src = "yahoo",
from = start, to = end,
auto.assign = FALSE)
if (inherits(data, 'try-error')) next
colnames(data) <- c("open","high","low","close","volume","adj.")
data <- as.data.frame(data)
data <- cbind(date = rownames(data), data)
data$overnightRtn <- as.vector(data$open / lag(data$close, default = 0) - 1)
data$overnightRtn <- paste(round(data$overnightRtn * 100, 3), "%")
data <- slice(data, 2)
results <- bind_rows(results, data)
}
How can I add the date looping aspect to the above ticker loop?

Maybe this is what you are looking for. Note that I'm looping over an index rather than over the list itself, so that I can refer to every element of your data frame (it is not optimized, but it does the job you described):
symbols <- c("AAPL", "GOOG"," MSFT") ## " MSFT" has an extra space
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers=symbols, dates) ## there was an error here in your example.
symbol.list <- trimws(example.df[,1])
start <- as.Date(example.df[,2])
end <- as.Date(example.df[,2]) + days(3)
results <- NULL
for (i in 1:NROW(symbol.list)) {
try(dataX <- getSymbols(Symbols = symbol.list[i],
src = "yahoo",
from = start[i], to = end[i],
auto.assign = FALSE),silent=T)
if (!exists("dataX")) {cat("Error in ",i,"\n");next}
colnames(dataX) <- c("open","high","low","close","volume","adj.")
dataX <- as.data.frame(dataX)
dataX <- cbind(date = rownames(dataX), dataX)
dataX$overnightRtn <- as.vector(dataX$open / lag(dataX$close, default = 0) - 1)
dataX$overnightRtn <- paste(round(dataX$overnightRtn * 100, 3), "%")
data2 <- slice(dataX, 2);rm(dataX)
results <- if (is.null(results)) data2 else rbind(results, data2)
}
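The overnight return computed inside the loop can be checked in isolation on made-up prices. The sketch below uses base R only; the prices and dates are invented for illustration. It uses NA rather than default = 0 for the missing previous close, so the first row comes out as NA instead of Inf.

```r
# Toy prices, invented for illustration
prices <- data.frame(
  open  = c(100, 102, 101),
  close = c(101, 100, 103),
  row.names = c("2015-01-05", "2015-01-06", "2015-01-07")
)

# Previous day's close; there is none for the first row
prev_close <- c(NA, head(prices$close, -1))

# Overnight return: today's open relative to yesterday's close
prices$overnightRtn <- prices$open / prev_close - 1
prices$overnightPct <- sprintf("%.3f%%", 100 * prices$overnightRtn)

prices[2, ]  # the row the loop keeps with slice(dataX, 2)
```

In the loop above the first row is discarded by slice(dataX, 2), so the Inf produced by default = 0 never reaches the results.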
