I am trying to obtain the daily arithmetic and daily geometric average returns for each year for the AAPL stock data using R. My attempt uses the periodReturn function in the last few lines, but it does not work and gives the error: '...' used in an incorrect context.
How can I modify my code to get the desired output? Any help would be much appreciated.
# Get historical price data (daily)
getSymbols('AAPL', from = "2005-01-01")
AAPLdaily <- as.data.frame(AAPL)
head(AAPLdaily)
?to.period
# AAPLweekly <- to.weekly(to.weekly(AAPL, indexAt = 'endof'))
# AAPLweekly <- as.data.frame(AAPLweekly)
# Better to do it in one step like this:
AAPLweekly <- as.data.frame( to.weekly(AAPL, indexAt = 'endof') )
head(AAPLweekly)
AAPLmonthly <- as.data.frame( to.monthly(AAPL, indexAt = 'endof') )
head(AAPLmonthly)
AAPLyearly <- as.data.frame( to.yearly(AAPL, indexAt = 'endof') )
AAPLyearly
# Another way to do this
AAPLweekly1 <- as.data.frame(to.period(AAPL, period = 'weeks', indexAt = 'endof'))
head(AAPLweekly1)
AAPLmonthly1 <- as.data.frame(to.period(AAPL, period = 'months', indexAt = 'endof'))
head(AAPLmonthly1)
AAPLyearly1 <- as.data.frame(to.period(AAPL, period = 'years', indexAt = 'endof'))
head(AAPLyearly1)
########## Another possible method #########
# Change to data.frames
AAPL = as.data.frame(AAPL)
head(AAPL)
# Get Dates
dates <- as.Date(row.names(AAPL))
head(dates)
# Create a column in the AAPL data frame with the dates
AAPL$dates <- as.Date(row.names(AAPL))
?aggregate
?substr
# Last Day of month
lastDayofMonth <- aggregate(AAPL["dates"], list(month = substr(AAPL$dates, 1, 7)), max)
head(lastDayofMonth)
AAPLmonth <- AAPL[dates %in% lastDayofMonth$dates, ]
head(AAPLmonth)
# Last day of year
lastDayofYear <- aggregate(AAPL["dates"], list(year = substr(AAPL$dates, 1, 4)), max)
head(lastDayofYear)
AAPLyear <- AAPL[dates %in% lastDayofYear$dates, ]
AAPLmonth
AAPLdaily <- as.data.frame( to.daily(AAPL, indexAt = 'endof') )
AAPLdaily
dailyReturn(AAPLdaily)
periodReturn(AAPL,
period='daily',
subset=NULL,
type='arithmetic',
leading=TRUE,
...
)
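A note on the error message: in R, a literal ... is only meaningful inside a function whose own signature contains ...; in help pages it is a placeholder for further arguments. Typing it into a call at top level, as in the periodReturn call above, triggers this error. A minimal base-R sketch, where paste is only a stand-in and nothing is specific to quantmod:

```r
# Base-R illustration (assumption: paste() is used purely as a stand-in).
# A literal `...` is not valid in a call made outside a function that has `...`.
msg <- tryCatch(
  paste("a", "b", ...),
  error = function(e) conditionMessage(e)
)
msg

# Dropping the placeholder makes the same call valid.
ok <- paste("a", "b")
ok
```

By the same logic, removing the ... line (and the comma before it) from the periodReturn call should get rid of this particular error.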
If what you are asking for is the yearly, monthly, weekly arithmetic/geometric return all you have to do is:
library(quantmod)
getSymbols('AAPL', from = '2010-01-01')
ROC(AAPL[endpoints(AAPL, on = 'years'), "AAPL.Adjusted"], type = 'discrete')
2012-12-31 0.32566879
2013-12-31 0.08069493
2014-12-31 0.40622488
2015-12-31 -0.03013708
2016-12-30 0.12480425
2017-09-20 0.36428706
For the geometric (log) return, change the type argument of ROC to 'continuous':
ROC(AAPL[endpoints(AAPL, on = 'years'), "AAPL.Adjusted"], type = 'continuous')
2012-12-31 0.28191708
2013-12-31 0.07760429
2014-12-31 0.34090873
2015-12-31 -0.03060053
2016-12-30 0.11760902
2017-09-20 0.31063199
For other periods, change the on argument of endpoints to 'months' or 'weeks'.
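For the daily averages per year that the question asks about, one possible sketch is to compute daily simple returns and then average them within each calendar year. The prices below are simulated and every name (dates, prices, ret, yearly_avgs) is made up for illustration; with real data the adjusted close column would take the place of prices.

```r
# Simulated daily prices (assumption: stands in for AAPL's adjusted close).
set.seed(1)
dates  <- seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day")
prices <- 100 * cumprod(1 + rnorm(length(dates), 0.0005, 0.01))

# Daily simple returns, labelled with the year of the day they end on.
ret <- prices[-1] / prices[-length(prices)] - 1
yr  <- format(dates[-1], "%Y")

# Arithmetic mean: plain average of the daily returns in each year.
arith_mean <- tapply(ret, yr, mean)

# Geometric mean: the constant daily return that compounds to the same total.
geom_mean <- tapply(ret, yr, function(r) prod(1 + r)^(1 / length(r)) - 1)

yearly_avgs <- data.frame(
  year       = names(arith_mean),
  arithmetic = as.numeric(arith_mean),
  geometric  = as.numeric(geom_mean)
)
yearly_avgs
```

The geometric figure should never exceed the arithmetic one, which is a quick sanity check on the output.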
First I created an xts object, which contains 36 time series showing daily prices from 1980-01-02 to 2020-10-06.
ENERGY_data$time <- as.Date(ENERGY_data$time, format = "%Y/%m/%d")
ENERGY_xts <- ENERGY_data[order(ENERGY_data$time), ]
ENERGY_xts <- as.xts(ENERGY_xts[, 2:37], order.by=ENERGY_xts$time)
Then I calculated the continuously compounded daily returns by using the PerformanceAnalytics function CalculateReturns()
ENERGY_returns.cc <- CalculateReturns(ENERGY_xts, method="compound")
Now I would like to calculate the volatility for each month going from 1980-01-02 to 2020-10-06 on the basis of this formula:
[Image: monthly volatility formula]
Could you please give me some hints (in terms of coding)?
Take a look at this function and please note that I simulated returns, since you didn't provide yours.
library(xts)
set.seed(123)
returns <- matrix(rnorm(30*365*5, 0.0001, 0.0002), ncol = 30)
timeindex <- seq.Date(from = as.Date('2000-01-01'), length.out = 365*5, by = 'days')
test_xts <- xts(returns, order.by = timeindex)
calcFrenchVolOneAsset <- function(x){
ndays <- nrow(x)
first_part_of_formula <- sum(x^2)
second_part_of_formula <- 2*sum(x[-1]*x[-nrow(x)])
res <- sqrt(first_part_of_formula + second_part_of_formula)
return(res)
}
calcFrenchVolMultipleAssets <- function(x){
ndays <- nrow(x)
first_part_of_formula <- colSums(x^2)
second_part_of_formula <- 2*colSums(x[-1, ]*x[-nrow(x), ])
res <- sqrt(first_part_of_formula + second_part_of_formula)
return(res)
}
# test for the first month and the first asset
calcFrenchVolOneAsset(test_xts['2000-01', 1])
calcFrenchVolMultipleAssets(test_xts['2000-01', 1])
# apply monthly and on columns
monthly_vols <- apply.monthly(test_xts, calcFrenchVolMultipleAssets)
head(monthly_vols[, c(1:5)])
e1 e1.1 e1.2 e1.3 e1.4
2000-01-31 0.002030192 0.002402946 0.001717494 0.001888513 0.002322648
2000-02-29 0.001983995 0.002343783 0.001789346 0.001671332 0.001824278
2000-03-31 0.001910535 0.002429689 0.001709092 0.002492223 0.002068032
2000-04-30 0.001765052 0.002114554 0.001946232 0.002160436 0.002139949
2000-05-31 0.002269842 0.002476424 0.001626455 0.002030027 0.002400690
2000-06-30 0.002082933 0.001905620 0.001681579 0.001992082 0.002010535
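One caveat worth checking before relying on these numbers: as far as I know, arithmetic between two xts objects first aligns them on their time index, so x[-1] * x[-nrow(x)] would multiply same-date values rather than consecutive ones. A hedged sketch that strips the index with as.matrix before forming the lagged product (calc_vol_lagged and r are hypothetical names, and the toy returns are made up):

```r
# Variant that works on the raw numbers, so row i is paired with row i + 1.
calc_vol_lagged <- function(x) {
  m <- as.matrix(x)   # drops any time index, keeps one column per asset
  n <- nrow(m)
  own   <- colSums(m^2)
  cross <- 2 * colSums(m[-1, , drop = FALSE] * m[-n, , drop = FALSE])
  sqrt(own + cross)   # NaN if the cross term outweighs the squared term
}

# Toy check with four made-up daily returns for a single asset.
r <- matrix(c(0.01, 0.02, -0.01, 0.03), ncol = 1)
calc_vol_lagged(r)
# 0.03
```

Because it accepts anything as.matrix can convert, it could be passed to apply.monthly in place of calcFrenchVolMultipleAssets.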
I want to create about 10,000 Date objects between 2008-01-01 and 2010-12-31. I wrote the code for that, but what I actually want is the day of the year: days should run from 1 to 366 in 2008 (a leap year, because of 2008-02-29) and then restart at 1 on 2009-01-01. I could generate the dates separately for 2008, 2009, and 2010, but that would not be convenient. I was reading about lubridate but could not figure it out. I could also filter 1 to 366, then 367 to 731, but that would not be efficient either. Does anyone know a better way to do it?
set.seed(123)
tim1=sample(365*3+1,10000,replace = TRUE) ### that plus 1 from feb 29 in 2008
dat1=as.Date(tim1,origin="2007-12-31") # then 1 will be 2008-01-01
You can create a vector of all the target dates and sample from it. To create the vector, there is seq.Date, the seq method for objects of class "Date".
start <- as.Date("2008-01-01")
end <- as.Date("2010-12-31")
s <- seq(start, end, by = "days")
The vector s includes all days between start and end. Now sample from it.
set.seed(123)
dat1 <- sample(s, 10000, TRUE)
Transform the sample into day-of-the-year. See help("strptime")
as.numeric(format(dat1, format = "%j"))
In the end, remove s, it's no longer needed.
rm(s) # tidy up
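The day of the year can also be read from the yday field of a POSIXlt object, which counts from 0, so 1 is added. A tiny base-R check with hand-picked dates (d and doy are illustrative names):

```r
# Hand-picked dates around the leap day and the year boundary.
d <- as.Date(c("2008-02-29", "2008-12-31", "2009-01-01"))

# yday is zero-based, so shift it by one to get 1..366.
doy <- as.POSIXlt(d)$yday + 1L
doy
# 60 366 1
```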
Edit.
The following two functions do what the question asks for but with two different methods.
f1 is the code above wrapped in a function; f2 uses ave/seq_along/match and is a bit more complicated. The tests show function f2 to be twice as fast as f1.
f1 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
as.numeric(format(y, format = "%j"))
}
f2 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
z <- ave(as.integer(s), lubridate::year(s), FUN = seq_along)
z[match(y, s)]
}
set.seed(123)
x1 <- f1("2008-01-01", "2010-12-31", 100)
set.seed(123)
x2 <- f2("2008-01-01", "2010-12-31", 100)
all.equal(x1, x2)
#[1] TRUE
Now the tests.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1("2008-01-01", "2010-12-31", 1e4),
f2 = f2("2008-01-01", "2010-12-31", 1e4),
times = 50
)
print(mb, order = "median")
ggplot2::autoplot(mb)
I have a monthly dataset of performance (in terms of %) of different sectors in a company in the form
Date |Sector |Value
2016-01-01 |Sect 1 |-20
2016-02-01 |Sect 1 |10
2016-01-01 |Sect 2 |23
2016-02-01 |Sect 1 |10
The data has 20 sectors and monthly data through June 2018. Now I want to forecast Value for the next month. I used the code below:
combine_ts <- function(data, h=1, frequency= 12, start= c(2016,5),
end=c(2018,6))
{
results <- list()
sectgrowthsub <- data[!duplicated(sectgrowthdf2[,2]),]
sectgrowthts <- ts(sectgrowthsub[,3], frequency = frequency, start = start,
end = end)
for (i in 1:(nrow(sectgrowthsub))) {
results[[i]] <- data.frame(Date =
format(as.Date(time(forecast(auto.arima(sectgrowthts), h)$mean)), "%b-%y"),
SectorName = rep(sectgrowthsub[,2], h),
PointEstimate = forecast(auto.arima(sectgrowthts),
h=h)$mean[i])
}
return(data.table::rbindlist(results))
}
fore <- combine_ts(sectgrowthsub)
The problem in this case is that Value forecast is the same for all the Sectors.
Help is much appreciated
I took the liberty of simplifying the problem a little bit and removed the function to better show the process of modeling groups separately:
library(magrittr)
library(forecast)
dat <- data.frame(value = c(rnorm(36, 5),
rnorm(36, 50)),
group = rep(1:2, each = 36))
# make a list where each element is a group's timeseries
sect_list <- dat %>%
split(dat$group) %>%
lapply(function(x) ts(x[["value"]], frequency = 12, start = 1))
# then forecast on each groups timeseries
fc <- lapply(sect_list, function(x) { data.frame(PointEstimate = forecast(x, h=1)$mean ) }) %>%
do.call(rbind, .) # turn into one big data.frame
fc
PointEstimate
1 5.120082
2 49.752510
Let me know if you get hung up on any parts of this.
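To see the same split-then-fit idea without extra packages, here is a rough base-R sketch. It swaps auto.arima for a fixed AR(1) model from stats::arima purely to keep the example self-contained; the data are simulated, and sales, forecast_one, and per_sector are hypothetical names.

```r
# Simulated monthly values for three sectors (assumption: stands in for the real data).
set.seed(42)
sales <- data.frame(
  Date   = rep(seq(as.Date("2016-01-01"), by = "month", length.out = 30), times = 3),
  Sector = rep(c("Sect 1", "Sect 2", "Sect 3"), each = 30),
  Value  = c(rnorm(30, -5, 2), rnorm(30, 10, 2), rnorm(30, 25, 2))
)

# Fit one model per sector and return its one-step-ahead point forecast.
forecast_one <- function(d) {
  d   <- d[order(d$Date), ]
  y   <- ts(d$Value, frequency = 12, start = c(2016, 1))
  fit <- arima(y, order = c(1, 0, 0))   # fixed AR(1), not an automatic search
  data.frame(
    Sector        = d$Sector[1],
    PointEstimate = as.numeric(predict(fit, n.ahead = 1)$pred)
  )
}

# split() gives one data frame per sector, so each fit sees only its own series.
per_sector <- do.call(rbind, lapply(split(sales, sales$Sector), forecast_one))
per_sector
```

The key difference from the loop in the question is that the model is fitted inside the per-sector function, so each sector gets its own forecast.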
I'm able to loop through and calculate the overnight/over-weekend returns for a list of tickers when the time period is the same for every ticker, but am having trouble when the time period I want to look up is different for each ticker.
For example, with:
symbols <- c("AAPL", "GOOG"," MSFT")
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers, dates)
example.df
tickers dates
1 AAPL 2015-01-04
2 GOOG 2015-01-05
3 MSFT 2015-01-06
I'd want the overnight return for AAPL between 2015-01-04 and 2015-01-05, for GOOG between 2015-01-05 and 2015-01-06, etc. If it was a Friday, I'd want the next Monday.
I can get what I'm looking for by looking up each individual ticker like this:
library(quantmod)
library(dplyr)
# date range accounts for weekends
getSymbols("AAPL", from = "2016-01-04", to = "2016-01-08")
data <- as.data.frame(AAPL)
colnames(data) <- c("open","high","low","close","volume","adj.")
# overnight return calculation
data$overnight.return <- data$open / lag(data$close, default = 0) - 1
data$overnight.return <- paste(round(data$overnight.return * 100, 3), "%",sep= "")
# the overnight/over-weekend returns for the specified date
data.df.final <- slice(data, 2)
Of course that's terribly slow.
Here's as far as I was able to get trying to make a loop out of it:
# needs to be a loop itself and inside the other 'for' loop somehow I think
symbol.list <- example.df[,1]
start <- data[,2]
end <- data[,2] + days(3)
results <- NULL
for (i in symbol.list) {
data <- getSymbols(Symbols = i,
src = "yahoo",
from = start, to = end,
auto.assign = FALSE)
if (inherits(data, 'try-error')) next
colnames(data) <- c("open","high","low","close","volume","adj.")
data <- as.data.frame(data)
data <- cbind(date = rownames(data), data)
data$overnightRtn <- as.vector(data$open / lag(data$close, default = 0) - 1)
data$overnightRtn <- paste(round(data$overnightRtn * 100, 3), "%")
data <- slice(data, 2)
results <- bind_rows(results, data)
}
How can I add the date looping aspect to the above ticker loop?
Maybe this is what you are looking for. Note that I loop over an index rather than the list itself, so I can refer to every element of your data frame. It is not optimized, but it does the job you described:
library(quantmod)
library(dplyr)
library(lubridate) ## for days()
symbols <- c("AAPL", "GOOG"," MSFT") ## " MSFT" has an extra space
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers=symbols, dates) ## there was an error here in your example.
symbol.list <- trimws(example.df[,1])
start <- as.Date(example.df[,2])
end <- as.Date(example.df[,2]) + days(3)
results <- NULL
for (i in 1:NROW(symbol.list)) {
try(dataX <- getSymbols(Symbols = symbol.list[i],
src = "yahoo",
from = start[i], to = end[i],
auto.assign = FALSE), silent = TRUE)
if (!exists("dataX")) {cat("Error in ",i,"\n");next}
colnames(dataX) <- c("open","high","low","close","volume","adj.")
dataX <- as.data.frame(dataX)
dataX <- cbind(date = rownames(dataX), dataX)
dataX$overnightRtn <- as.vector(dataX$open / lag(dataX$close, default = 0) - 1)
dataX$overnightRtn <- paste(round(dataX$overnightRtn * 100, 3), "%")
data2 <- slice(dataX, 2);rm(dataX)
results <- if (is.null(results)) data2 else rbind(results, data2)
}
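One detail in the return calculation may be worth a second look: with default = 0, the first row divides by zero and yields Inf, which is harmless here only because that row is discarded by slice. A small base-R sketch with made-up prices (px and prev_close are illustrative names) that uses NA for the missing previous close instead:

```r
# Made-up open/close prices for three consecutive trading days.
px <- data.frame(
  date  = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06")),
  open  = c(100, 102, 101),
  close = c(101, 100, 103)
)

# Previous day's close; the first day has none, so it is NA rather than 0.
prev_close <- c(NA, head(px$close, -1))

# Overnight (or over-weekend) return: today's open versus yesterday's close.
px$overnight <- px$open / prev_close - 1
px
```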
I'm trying to aggregate a data frame to obtain a table with weekly averages of a variable. I found that the following package provides a nice solution, and I've been using it to aggregate data yearly and monthly. However, the function that aggregates data weekly is not working as described. Does anyone have an idea how I can fix this?
For instance, following the manual:
require(TSAgg)
#Load the data:
data(foo)
##Format the data using the timeSeries function.
foo.ts<-timeSeries(foo[,1], "%d/%m/%Y %H:%M",foo[,3])
##Aggregate the data into 6-month blocks using mean
(mean.month <- monthsAgg(foo.ts,mean,6))
#Aggregate the data into weeks, using 7 days and mean:
(foo.week<-daysAgg(foo.ts,mean,7) )
The last command doesn't work. The function is the following:
daysAgg <-
function (data, process, multiple = NULL, na.rm = FALSE)
{
if (is.null(multiple)) {
multiple = 1
}
if (multiple == 1) {
day <- aggregate(data[, 8:length(data)], list(day = data$day,
month = data$month, year = data$year), process, na.rm = na.rm)
days <- ymd(paste(day$year, day$month, day$day))
data2 <- data.frame(date = days, data = day[, 4:length(day)])
names(data2) <- c("Date", names(data[8:length(data)]))
return(data2)
}
temp <- data
day <- aggregate(list(data[, 8:length(data)], count = 1),
list(day = data$day, month = data$month, year = data$year),
process, na.rm = na.rm)
days <- ymd(paste(day$year, day$month, day$day))
data <- data.frame(date = days, day[, 5:length(day) - 1],
count = day[length(day)])
days = paste(multiple, "days")
all.dates <- seq.Date(as.Date(data$date[1]), as.Date(data$date[length(data[,
1])]), by = "day")
dates <- data.frame(date = all.dates)
aggreGated <- merge(dates, data, by = "date", all.x = TRUE)
aggreGated$date <- rep(seq.Date(as.Date(data$date[1]), as.Date(data$date[length(data[,
1])]), by = days), each = multiple, length = length(all.dates))
results <- aggregate(list(aggreGated[2:length(aggreGated)]),
list(date = aggreGated$date), process, na.rm = TRUE)
results <- subset(results, results$count != 0)
results <- results[, -length(results)]
names(results) <- c("Date", names(temp[8:length(temp)]))
return(results)
}
The problem in the code stems from its use of the function ymd, which attaches " UTC" to the end of every date it outputs. You can override it by redefining ymd as
ymd <- function(x) {
as.Date(x, "%Y %m %d")
}
before you call daysAgg.
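If the package route keeps causing trouble, weekly means can also be sketched in base R by cutting the dates into 7-day blocks and aggregating. This is an alternative, not what TSAgg does internally; obs, block, and weekly are illustrative names and the data are simulated.

```r
# Simulated daily observations covering exactly four weeks.
set.seed(7)
obs <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "day", length.out = 28),
  value = rnorm(28)
)

# Label each row with the 7-day block it falls in, counted from the first date.
obs$block <- cut(obs$date, breaks = "7 days")

# Mean of value within each block.
weekly <- aggregate(value ~ block, data = obs, FUN = mean)
weekly
```

Using breaks = "week" instead would align the blocks to calendar weeks rather than to the first observation.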