I have a monthly dataset of performance (in terms of %) of different sectors in a company in the form
Date |Sector |Value
2016-01-01 |Sect 1 |-20
2016-02-01 |Sect 1 |10
2016-01-01 |Sect 2 |23
2016-02-01 |Sect 1 |10
the data has 20 Sectors and monthly data till June 2018. Now I want to forecast Value for the next month. I used the below code:
combine_ts <- function(data, h=1, frequency= 12, start= c(2016,5),
end=c(2018,6))
{
results <- list()
sectgrowthsub <- data[!duplicated(sectgrowthdf2[,2]),]
sectgrowthts <- ts(sectgrowthsub[,3], frequency = frequency, start = start,
end = end)
for (i in 1:(nrow(sectgrowthsub))) {
results[[i]] <- data.frame(Date =
format(as.Date(time(forecast(auto.arima(sectgrowthts), h)$mean)), "%b-%y"),
SectorName = rep(sectgrowthsub[,2], h),
PointEstimate = forecast(auto.arima(sectgrowthts),
h=h)$mean[i])
}
return(data.table::rbindlist(results))
}
fore <- combine_ts(sectgrowthsub)
The problem in this case is that Value forecast is the same for all the Sectors.
Help is much appreciated
I took the liberty of simplifying the problem a little bit and removed the function to better show the process of modeling groups separately:
library(magrittr)
library(forecast)
dat <- data.frame(value = c(rnorm(36, 5),
rnorm(36, 50)),
group = rep(1:2, each = 36))
# make a list where each element is a group's timeseries
sect_list <- dat %>%
split(dat$group) %>%
lapply(function(x, frequency, start) {
ts(x[["value"]], frequency = 12, start = 1 ) })
# then forecast on each groups timeseries
fc <- lapply(sect_list, function(x) { data.frame(PointEstimate = forecast(x, h=1)$mean ) }) %>%
do.call(rbind, .) # turn into one big data.frame
fc
PointEstimate
1 5.120082
2 49.752510
Let me know if you get hung up on any parts of this.
Related
I want to create date object between 2008-01-01 and 2010-12-31 around 10K of them. I wrote the code for that but I actually want to keep days 1-366 in 2008 because of 2008-02-29 (leap year) I want them to restart after 366 then become 1 on 2009-01-01. I can do this as create only for 2008 then 2009 then 2010 but it won't be convenient. I was reading about lubridate but could not figure it out. I can also filter 1 to 366 then 367-731 but that's not gonna be efficient as well. Anyone knows a better way to do it?
set.seed(123)
tim1=sample(365*3+1,10000,replace = TRUE) ### that plus 1 from feb 29 in 2008
dat1=as.Date(tim1,origin="2007-12-31") # then 1 will be 2008-01-01
You can create a vector of all the target dates and sample from it. To create the vector, there is seq.Date, the seq method for objects of class "Date".
start <- as.Date("2008-01-01")
end <- as.Date("2010-12-31")
s <- seq(start, end, by = "days")
The vector s includes all days between start and end. Now sample from it.
set.seed(123)
dat1 <- sample(s, 10000, TRUE)
Transform the sample into day-of-the-year. See help("strptime")
as.numeric(format(dat1, format = "%j"))
In the end, remove s, it's no longer needed.
rm(s) # tidy up
Edit.
The following two functions do what the question asks for but with two different methods.
f1 is the code above wrapped in a function, f2 uses ave/seq_along/match and is a bit more complicated. The tests show function f2 to be twice as fast than f1
f1 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
as.numeric(format(y, format = "%j"))
}
f2 <- function(start_date, end_date, n){
start <- as.Date(start_date)
end <- as.Date(end_date)
s <- seq(start, end, by = "days")
y <- sample(s, n, replace = TRUE)
z <- ave(as.integer(s), lubridate::year(s), FUN = seq_along)
z[match(y, s)]
}
set.seed(123)
x1 <- f1("2008-01-01", "2010-12-31", 100)
set.seed(123)
x2 <- f2("2008-01-01", "2010-12-31", 100)
all.equal(x1, x2)
#[1] TRUE
Now the tests.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1("2008-01-01", "2010-12-31", 1e4),
f2 = f2("2008-01-01", "2010-12-31", 1e4),
times = 50
)
print(mb, order = "median")
ggplot2::autoplot(mb)
Just for fun, I am trying to create a basic savings calculator. My current code is:
value <- function(years,apr,initial,investment) {
df <- as.data.frame(matrix(nrow = years, ncol = 2))
colnames(df) <- c("year","value")
df$years <- c(1:years)
for (i in 1:years) {
current_value <-(last_value+investment)*apr
}
#repeating calculation for the data frame
print(df)
What I am trying to do is have the calculator create a table that displays the value each year. I've adapted my code from an old homework assignment, so I am not concerned with how to make the data frame. However, I do not know how to make the formula for the summation.
I am trying to model
Current Value = (Cumulative Value + Investment)*(Annual Percentage Rate)
As an example, let's say initial value is 10, investment is 10, and the APR is 1.05
(10+10)*(1.05)=21
(21+10)*(1.05)=32.55
(32.55+10)*(1.05)=44.68
and so on.
Year is there to number the rows accordingly.
We can use Reduce with accumulate = TRUE
calc_fun <- function(years,apr,initial,investment) {
value <- Reduce(function(x, y) (x + investment) * y, rep(apr, year), initial,
accumulate = TRUE)
data.frame(year = 0:year, value)
}
calc_fun(3, 1.05, 10, 10)
# year value
#1 0 10.0000
#2 1 21.0000
#3 2 32.5500
#4 3 44.6775
Using for loop we can do
calc_fun1 <- function(years,apr,initial,investment) {
value <- numeric(years + 1)
value[1] <- initial
for (i in 1:years) value[i + 1] <- (value[i] + investment) * apr
data.frame(year = 0:year, value)
}
I am trying to obtain the daily arithmetic and daily geometric averages for each year, for the APPL stock data using R. My implementation on this will be the periodReturn function in the last few lines, but it doesn't seem to work, and an error: '...' used in an incorrect context is given.
How can I modify my code such that I can get the desired output? Some help will be deeply appreciated.
# Get historical price data (daily)
getSymbols('AAPL', from = "2005-01-01")
AAPLdaily <- as.data.frame(AAPL)
head(AAPLdaily)
?to.period
# AAPLweekly <- to.weekly(to.weekly(AAPL, indexAt = 'endof'))
# AAPLweekly <- as.data.frame(AAPLweekly)
# Better to do it in one step like this:
AAPLweekly <- as.data.frame( to.weekly(AAPL, indexAt = 'endof') )
head(AAPLweekly)
AAPLmonthly <- as.data.frame( to.monthly(AAPL, indexAt = 'endof') )
head(AAPLmonthly)
AAPLyearly <- as.data.frame( to.yearly(AAPL, indexAt = 'endof') )
AAPLyearly
# Another way to do this
AAPLweekly1 <- as.data.frame(to.period(AAPL, period = 'weeks', indexAt = 'endof'))
head(AAPLweekly1)
AAPLmonthly1 <- as.data.frame(to.period(AAPL, period = 'months', indexAt = 'endof'))
head(AAPLmonthly1)
AAPLyearly1 <- as.data.frame(to.period(AAPL, period = 'years', indexAt = 'endof'))
head(AAPLyearly1)
########## Another possible method #########
# Change to data.frames
AAPL = as.data.frame(AAPL)
head(AAPL)
# Get Dates
dates <- as.Date(row.names(AAPL))
head(dates)
# Create a cloumn in APPL data frame with the dates
AAPL$dates <- as.Date(row.names(AAPL))
?aggregate
?substr
# Last Day of month
lastDayofMonth <- aggregate(AAPL["dates"], list(month = substr(AAPL$dates, 1, 7)), max)
head(lastDayofMonth)
AAPLmonth <- AAPL[dates %in% lastDayofMonth$dates, ]
head(AAPLmonth)
# Last day of year
lastDayofYear <- aggregate(AAPL["dates"], list(month = substr(AAPL$dates, 1, 4)), max)
head(lastDayofYear)
AAPLyear <- AAPL[dates %in% lastDayofYear$dates, ]
AAPLmonth
AAPLdaily <- as.data.frame( to.daily(AAPL, indexAt = 'endof') )
AAPLdaily
dailyReturn(AAPLdaily)
periodReturn(AAPL,
period='daily',
subset=NULL,
type='arithmetic',
leading=TRUE,
...
)
If what you are asking for is the yearly, monthly, weekly arithmetic/geometric return all you have to do is:
getSymbols('AAPL',from= '2010-01-01')
ROC(AAPL[endpoints(AAPL,on = 'years'),"AAPL.Adjusted"],type='discrete’)
2012-12-31 0.32566879
2013-12-31 0.08069493
2014-12-31 0.40622488
2015-12-31 -0.03013708
2016-12-30 0.12480425
2017-09-20 0.36428706
for the geometric (log) return change the ROC argument to ‘continuous’:
ROC(AAPL[endpoints(AAPL,on = 'years'),"AAPL.Adjusted"],type='continuous’)
2012-12-31 0.28191708
2013-12-31 0.07760429
2014-12-31 0.34090873
2015-12-31 -0.03060053
2016-12-30 0.11760902
2017-09-20 0.31063199
For other periods change the endpoints argument to months or weeks.
For context, I'm a novice R user, so please forgive any incorrect terminology/processes. I am actively trying to improve my coding ability, but recently have become stumped.
I have the following data set where A * B * C = Output:
Date A B C Output
1/1/2013 177352 0.908329198 0.237047935 38187
1/2/2013 240724 0.852033865 0.237273592 48666
1/3/2013 243932 0.908380204 0.237039845 52524
1/4/2013 221485 0.820543152 0.236356733 42955
1/5/2013 202590 0.818066045 0.240900973 39925
1/6/2013 238038 0.770057722 0.247344561 45339
1/7/2013 271511 0.794258796 0.241252029 52026
1/8/2013 283434 0.807817693 0.233810703 53534
1/9/2013 275016 0.843220031 0.243769917 56530
1/10/2013 255266 0.797791324 0.238562428 48583
1/11/2013 226564 0.815791564 0.236153417 43648
1/12/2013 214366 0.800066242 0.237961133 40812
1/13/2013 256946 0.764845532 0.237640186 46702
1/14/2013 282298 0.816537843 0.234257528 53998
I have a few years worth of data and I'm trying for forecast Output, using A, B, and C. However, when I model out A, B, and C individually, the Output becomes very skewed. If I forecast just Output then I lose the input factors.
What is the best package/code to accomplish this task? I've tried Googling and searching on here numerous different methods, but haven't found the solution I'm looking for.
Here is some of the code:
DataSet1[,"Date"] <- mdy(DataSet[,"Date"])
DataSet1
TotalSet <- ts(DataSet1, frequency = 365, start =c(2013,1))
DataA <- ts(DataSet1$A, frequency = 365, start = c(2013,1))
DataB <- ts(DataSet1$B, frequency = 365, start = c(2013,1))
DataC <- ts(DataSet1$C, frequency = 365, start = c(2013,1))
OutputData <- ts(DataSet$Output, frequency = 365, start = c(2013,1))
ADecompose <- decompose(DataA)
BDecompose <- decompose(DataB)
CDecompose <- decompose(DataC)
OutputDecompose <- decompose(OutputData)
DataAHW <- HoltWinters(DataA, seasonal = "mult")
DataBHW <- HoltWinters(DataB, seasonal = "mult")
DataCHW <- HoltWinters(DataC, seasonal = "mult")
OutputDataHW <- HoltWinters(OutputData, seasonal = "mult")
FC.A <- forecast.HoltWinters(DataAHW)
FC.B <- forecast.HoltWinters(DataBHW)
FC.C <- forecast.HoltWinters(DataCHW)
FC.Output <- forecast.HoltWinters(OutputDataHW)
plot(ForecastVisits)
plot(ForecastCPV)
plot(ForecastRPC)
plot(ForecastRevenue)
Here is another model I built for the Output and I've plugged A, B, and C into it individually then combined them in excel. I'm sure there is a more appropriate way to handle this, but given my lack of experience I am reaching out for help
dataset <- testData
##FORECAST
forecastingFuntion <- function(dataset, lenghtOfForecast)
{
dataset[,"Date"] <- mdy(dataset[,"Date"])
myts <- ts(dataset[,"DataSet$Output"], start = c(2013,1), frequency = 365)
hwModel <- HoltWinters(myts, seasonal = "mult")
future <- data.frame(predict(hwModel, n.ahead = lenghtOfForecast, level = 0.9))
fittedValues <- data.frame(as.numeric(hwModel$fitted[,"xhat"]))
names(fittedValues) <- "fit"
futureDates <- c()
predicitedValues <- rbind(fittedValues, future)
for(i in 1: lenghtOfForecast)
{
futureDateSingle <- data.frame(dataset[nrow(dataset),"Date"] + days(i))
futureDates <- rbind(futureDates, futureDateSingle)
}
names(futureDates) <- "Date"
dates <- data.frame(dataset[366:(nrow(dataset)),"Date"])
names(dates) <- "Date"
dates <- rbind(dates, futureDates)
predictedData <- data.frame(predicitedValues, dates)
names(predictedData) <- c("predictedValues","Date")
finalData2 <- mergeData <- merge(predictedData, dataset, all.x = T, all.y = F, by = "Date")
finalData2
}
finalData2 <- forecastingFuntion(testData, 612)
rm(list=setdiff(ls(), c("finalData2")))
write.csv(finalData2, file="B2BForecastVisits.csv")
Thanks!
I'm able to loop through and calculate the overnight/over-weekend returns for a list of tickers when the time period is the same for every ticker, but am having trouble when the time period I want to look up is different for each ticker.
For example, with:
symbols <- c("AAPL", "GOOG"," MSFT")
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers, dates)
example.df
tickers dates
1 AAPL 2015-01-04
2 GOOG 2015-01-05
3 MSFT 2015-01-06
I'd want the overnight return for AAPL between 2015-01-04 and 2015-01-05, for GOOG between 2015-01-05 and 2015-01-06, etc. If it was a Friday, I'd want the next Monday.
I can can get what I'm looking for by looking up each individual ticker like this:
library(quantmod)
library(dplyr)
# date range accounts for weekends
getSymbols("AAPL", from = "2016-01-04", to = "2016-01-08")
data <- as.data.frame(AAPL)
colnames(data) <- c("open","high","low","close","volume","adj.")
# overnight return calculation
data$overnight.return <- data$open / lag(data$close, default = 0) - 1
data$overnight.return <- paste(round(data$overnight.return * 100, 3), "%",sep= "")
# the overnight/over-weekend returns for the specified date
data.df.final <- slice(data, 2)
Of course that's terribly slow.
Here's as far as I was able to get trying to make a loop out of it:
# needs to be a loop itself and inside the other 'for' loop somehow I think
symbol.list <- example.df[,1]
start <- data[,2]
end <- data[,2] + days(3)
results <- NULL
for (i in symbol.list) {
data <- getSymbols(Symbols = i,
src = "yahoo",
from = start, to = end,
auto.assign = FALSE)
if (inherits(data, 'try-error')) next
colnames(data) <- c("open","high","low","close","volume","adj.")
data <- as.data.frame(data)
data <- cbind(date = rownames(data), data)
data$overnightRtn <- as.vector(data$open / lag(data$close, default = 0) - 1)
data$overnightRtn <- paste(round(data$overnightRtn * 100, 3), "%")
data <- slice(data, 2)
results <- bind_rows(results, data)
}
How can I add the date looping aspect to the above ticker loop?
maybe this is what you are looking for. See that I'm using an index, not the actual list, so I can refer to every element of your data frame (it is not optimized, but it is doing the job you described in the function):
symbols <- c("AAPL", "GOOG"," MSFT") ## " MSFT" has an extra space
dates <- as.Date(c("2015-01-04", "2015-01-05", "2015-01-06"))
example.df <- data.frame(tickers=symbols, dates) ## there was an error here in your example.
symbol.list <- trimws(example.df[,1])
start <- as.Date(example.df[,2])
end <- as.Date(example.df[,2]) + days(3)
results <- NULL
for (i in 1:NROW(symbol.list)) {
try(dataX <- getSymbols(Symbols = symbol.list[i],
src = "yahoo",
from = start[i], to = end[i],
auto.assign = FALSE),silent=T)
if (!exists("dataX")) {cat("Error in ",i,"\n");next}
colnames(dataX) <- c("open","high","low","close","volume","adj.")
dataX <- as.data.frame(dataX)
dataX <- cbind(date = rownames(dataX), dataX)
dataX$overnightRtn <- as.vector(dataX$open / lag(dataX$close, default = 0) - 1)
dataX$overnightRtn <- paste(round(dataX$overnightRtn * 100, 3), "%")
data2 <- slice(dataX, 2);rm(dataX)
results <- if (is.null(results)) data2 else rbind(results, data2)
}