Time Series Analysis and R Holt Winters - r

I have a seasonal (7 days interval) time series, daily data for 30 days.
What is the best approach for a reasonable forecast?
The time series contains orders made with an app; it shows a seasonality of 1 week (lower sales at the beginning of the week).
I tried the Holt-Winters approach with this code:
(m <- HoltWinters(ts,seasonal = "mult"))
plot(m)
plot(fitted(m))
but it gives me an error like: Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),seasonal) :
time series has no or less than 2 periods
What do you suggest?
EDIT:
data here

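You must first create a ts object with the seasonal period set. The error most likely means the series was built without a frequency, so it has frequency 1 and HoltWinters cannot find two full seasonal periods. Assuming your data is called df: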
ts <- ts(df$install, frequency = 7)
(m <- HoltWinters(ts,seasonal = "mult"))
plot(m)
plot(fitted(m))
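To see why the frequency matters, here is a minimal sketch on made-up data (the vector `orders` and its weekly pattern are hypothetical, not the data from the question). A series built without `frequency` has frequency 1, so the seasonal start-up step fails; the same values with `frequency = 7` contain more than the two full periods that HoltWinters needs.

```r
# Hypothetical daily order counts with a weekly pattern (30 days)
set.seed(1)
weekly_pattern <- c(0.6, 0.7, 0.9, 1.0, 1.1, 1.3, 1.4)
orders <- 1000 * rep(weekly_pattern, length.out = 30) + rnorm(30, sd = 20)

# Without a frequency the series has frequency 1 and no seasonal period
no_freq <- ts(orders)
msg <- tryCatch(
  {
    HoltWinters(no_freq, seasonal = "mult")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(frequency(no_freq))
print(msg)

# With frequency = 7 the series spans more than two full weekly periods
weekly <- ts(orders, frequency = 7)
print(frequency(weekly))
print(length(weekly) >= 2 * frequency(weekly))
```

Checking `frequency()` on the object before fitting is a quick way to catch this kind of problem.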
Then you can forecast, for example 10 steps ahead:
predict(m, n.ahead = 10)
Time Series:
Start = c(4, 5)
End = c(5, 7)
Frequency = 7
fit
[1,] 1028.8874
[2,] 1178.4244
[3,] 1372.5466
[4,] 1165.2337
[5,] 866.6185
[6,] 711.6965
[7,] 482.2550
[8,] 719.0593
[9,] 807.6147
[10,] 920.3250
The question of which method is best is hard to answer in general. Usually one compares the out-of-sample accuracy of different models and chooses the one with the best result.
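An out-of-sample comparison could look roughly like the sketch below. It uses hypothetical data, holds out the last week, and compares Holt-Winters against a seasonal naive forecast by mean absolute error. The smoothing parameters are fixed only to keep the example reproducible; that is an assumption, not a recommendation.

```r
# Hypothetical daily series: 30 days with a weekly pattern (not the question's data)
set.seed(42)
weekly_pattern <- c(0.6, 0.7, 0.9, 1.0, 1.1, 1.3, 1.4)
orders <- 1000 * rep(weekly_pattern, length.out = 30) + rnorm(30, sd = 30)

# Hold out the last 7 days as the test set
h <- 7
train <- ts(orders[1:(30 - h)], frequency = 7)
test <- orders[(30 - h + 1):30]

# Model 1: Holt-Winters. The smoothing parameters are fixed here (an
# assumption for reproducibility); leave them out to let R estimate them.
hw <- HoltWinters(train, seasonal = "mult", alpha = 0.3, beta = FALSE, gamma = 0.3)
hw_fc <- as.numeric(predict(hw, n.ahead = h))

# Model 2: seasonal naive, i.e. repeat the last observed week
naive_fc <- tail(as.numeric(train), 7)

# Mean absolute error on the held-out week
mae <- function(actual, forecast) mean(abs(actual - forecast))
accuracy <- c(holt_winters = mae(test, hw_fc), seasonal_naive = mae(test, naive_fc))
print(accuracy)
```

With only 30 observations a single held-out week is a noisy estimate, so a small difference between the two errors should not be over-interpreted.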

You can use df$data to keep the dates that correspond to each day in the ts series.
ts_series <- ts(df$install, frequency = 7)
ts_dates <- as.Date(df$data, format = "%d/%m/%Y")
In a similar way, dates for the forecasted values can be kept in another sequence:
m <- HoltWinters(ts_series, seasonal = "mult")
predict_values <- predict(m, 10)
predict_dates <- seq.Date(tail(ts_dates, 1) + 1, length.out = 10, by = "day")
With the date sequences, the daily series can be plotted with properly formatted dates on the x axis. More control over the x-axis ticks can be obtained with the axis.Date function.
plot(ts_dates, ts_series, type = "o"
, ylim = c(0, 4000)
, xlim = c(ts_dates[1], tail(predict_dates, 1))
, xlab = "Date", ylab = "install", las = 1)
lines(predict_dates, predict_values, lty = 2, col = "blue", lwd = 2)
grid()
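As a sketch of the axis.Date control mentioned above, the example below suppresses the default x axis and draws one labelled tick per week. The dates and values are hypothetical and only serve to show the calls.

```r
# Hypothetical dates and values, only to demonstrate the axis control
dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 30)
values <- 1000 + 200 * sin(2 * pi * seq_along(dates) / 7)

# Suppress the default x axis, then draw one tick per week
plot(dates, values, type = "o", xaxt = "n", xlab = "Date", ylab = "install")
weekly_ticks <- seq(dates[1], tail(dates, 1), by = "week")
axis.Date(1, at = weekly_ticks, format = "%d %b")
```

The `format` string follows the usual `strftime` conventions, so other label styles such as `"%Y-%m-%d"` can be swapped in.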

Related

Fit double logistic function to a time series

For the following time series data:
#1. dates of 15 day frequency:
dates = seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15) #96 times observation
#2. water content in crops corresponding to the times given.
water <- c(0.5702722, 0.5631781, 0.5560839, 0.5555985, 0.5519783, 0.5463459,
0.5511598, 0.546652, 0.5361545, 0.530012, 0.5360571, 0.5396569,
0.5683526, 0.6031535, 0.6417821, 0.671358, 0.7015542, 0.7177007,
0.7103561, 0.7036985, 0.6958607, 0.6775161, 0.6545367, 0.6380155,
0.6113306, 0.5846186, 0.5561815, 0.5251135, 0.5085149, 0.495352,
0.485819, 0.4730029, 0.4686458, 0.4616468, 0.4613918, 0.4615532,
0.4827496, 0.5149105, 0.5447824, 0.5776764, 0.6090217, 0.6297454,
0.6399422, 0.6428941, 0.6586344, 0.6507473, 0.6290631, 0.6011123,
0.5744375, 0.5313527, 0.5008027, 0.4770338, 0.4564025, 0.4464508,
0.4309046, 0.4351668, 0.4490393, 0.4701232, 0.4911582, 0.5162941,
0.5490387, 0.5737573, 0.6031149, 0.6400073, 0.6770058, 0.7048311,
0.7255012, 0.739107, 0.7338938, 0.7265202, 0.6940718, 0.6757214,
0.6460862, 0.6163091, 0.5743775, 0.5450822, 0.5057753, 0.4715266,
0.4469859, 0.4303232, 0.4187793, 0.4119401, 0.4201316, 0.426369,
0.4419331, 0.4757525, 0.5070846, 0.5248457, 0.5607567, 0.5859825,
0.6107531, 0.6201754, 0.6356589, 0.6336177, 0.6275579, 0.6214981)
I want to fit a double-logistic function curve to the data.
I found some examples and packages that can be of help,
https://greenbrown.r-forge.r-project.org/man/FitDoubleLogElmore.html
and an example here - Indexes overlap error when using dplyr to run a function.
However, the examples given only consider annual time series.
I have tried to fit the function as:
x <- ts(water, start = c(2016,17), end = c(2020, 16), frequency = 24)
smooth.water = FitDoubleLogBeck(x, weighting = T, hessian = F, plot = T, ninit = 10)
plot(water)
plot(smooth.water$predicted)
plot(water- smooth.water$predicted)
However, this function does not seem to fit the entire time series. How can I run the function to fit the entire time series? Also, I noticed that the output differs between runs, and I am not sure why.
FitDoubleLogBeck can deal only with 1-year data, so you need to analyze the data year by year. To do that, take a 1-year window and fit each year separately.
As for the different results on different runs: the algorithm chooses its initial parameters randomly. The graph of a double logistic curve is bell shaped. However, you are applying the algorithm to "sine"-like data while it expects a "bell". It then treats the water data as a cloud of points, so the results are meaningless and very sensitive to the initial parameter setting.
Code:
set.seed(123)
par(mfrow = c(1, 3))
# water vector taken from question above
x <- ts(water, start = c(2016,17), end = c(2020, 16), frequency = 24)
res <- sapply((2017:2019), function(year) {
x2 <- as.vector(window(x, start=c(year, 1), end=c(year, 24)))
smooth.water2 = FitDoubleLogBeck(x2, weighting = T, hessian = F, plot = T, ninit = 10)
title(main = year)
c(year = year, smooth.water2$params)
})
t(res)
Output:
year mn mx sos rsp eos rau
[1,] 2017 -0.7709318 0.17234293 16.324163 -0.6133117 6.750885 -0.7618376
[2,] 2018 -0.8900971 0.09398673 7.529345 0.6701200 17.319465 0.8277409
[3,] 2019 -4.7669470 -0.34648434 15.930455 -0.2570877 10.690043 -0.2267284
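The "bell" shape the answer refers to can be seen by evaluating a double-logistic curve directly. The sketch below uses a generic form with the same parameter names as the output columns; the exact formula inside FitDoubleLogBeck may differ, and the parameter values are hypothetical.

```r
# Generic double-logistic curve. Parameter names follow the output columns
# above; the exact formula used inside FitDoubleLogBeck may differ.
double_logistic <- function(t, mn, mx, sos, rsp, eos, rau) {
  rise <- 1 / (1 + exp(-rsp * (t - sos)))  # increase around sos
  fall <- 1 / (1 + exp(rau * (t - eos)))   # decrease around eos
  mn + (mx - mn) * (rise + fall - 1)
}

# 24 observations per year, as in the question (frequency = 24)
t <- 1:24
y <- double_logistic(t, mn = 0.45, mx = 0.70, sos = 7, rsp = 1, eos = 18, rau = 1)
plot(t, y, type = "b", xlab = "time step within year", ylab = "water")
```

A year of data that starts low, peaks once, and ends low matches this shape; a window that starts near a peak or contains two half-cycles does not, which is consistent with the unstable parameters shown above.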

R - Date Format in plots

My plot is displaying weird numbers instead of dates. When converting them I find today's date. The issue is that I'm studying past dates. I would like to display on the chart below, the dates corresponding to the values.
This is the Data Frame:
I already converted dates using:
dates <- lubridate::mdy(rv_data_USDC_DAILY$Date)
This is my code for the plot:
par(mfrow = c(1, 2))
plot.ts(x=df_peg$date, y=df_peg$BUSDPEG, type="l", main = "", col="#1F4690", ylab="USD")
abline(h=0)
mtext("BUSD Deviations from Peg")
hist(df_peg$BUSDPEG, xlim = c(-0.01,0.01), main="", col="#D61C4E")
Sample of Data:
date BUSDPEG USDCPEG DAIPEG GEMINIPEG HUSDPEG PAXPEG STASISPEG
1 2022-06-29 -2.383111e-03 0.00010 -0.0044920717 2.048424e-03 -1.543622e-04 1.639760e-02 -0.04452180
2 2022-06-28 -1.414367e-03 0.00005 -0.0020128418 1.531139e-03 -3.612265e-04 1.710005e-02 -0.03907985
Thanks
You can simply use the plot function instead of plot.ts when the x values are dates. It works very well. Here is an example:
date <- seq(from = as.Date("2001-01-01"), to = as.Date("2020-01-01"), by = "quarter")
returns <- rnorm(n = length(date))
plot(x = date, y = returns, type = "l")
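Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The rows are hypothetical, and the month/day/year string format is an assumption based on the use of `lubridate::mdy` in the question; base `as.Date` is used here so the example needs no extra packages.

```r
# Hypothetical rows in the same layout as the sample data (newest first)
df_peg <- data.frame(
  date = c("06/29/2022", "06/28/2022", "06/27/2022"),
  BUSDPEG = c(-0.0024, -0.0014, 0.0003)
)

# Base R counterpart of lubridate::mdy(): parse month/day/year strings
df_peg$date <- as.Date(df_peg$date, format = "%m/%d/%Y")

# Sort by date so the line is drawn from oldest to newest
df_peg <- df_peg[order(df_peg$date), ]

# plot() recognises the Date class and labels the x axis with dates
plot(df_peg$date, df_peg$BUSDPEG, type = "l", col = "#1F4690",
     xlab = "Date", ylab = "USD")
abline(h = 0)
```

The key point is that the parsed dates are stored back into the data frame column that is passed to `plot`, rather than in a separate variable that the plot never uses.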

I am unable to get the psd range in R package psd to extend to a frequency of 1.5Hz

I have a timeseries for which I need PSD values using R. The data was sampled at non uniform intervals but I did a spline interpolation with the predict command to interpolate readings at exactly 0.01 seconds. I could obtain amplitude values from spec.pgram quite correctly but they are not psd values. However the psd values from the pspectrum command of the psd package are only between 0 and 0.5Hz while my area of interest extends to about 1.2Hz. The time series is: here
Note that your time points are not equidistant. For the sake of this answer, we'll assume a frequency of 12 samples per second.
You have to specify the frequency for psd::pspectrum. Assuming your data is loaded as a data.frame called x:
out <- pspectrum(x[, 2], x.frqsamp = 12)
plot(out)
The pspectrum function also has a more detailed plot:
out <- pspectrum(x[, 2], x.frqsamp = 12, plot = TRUE)
Alternative
You can also use stats::spectrum, but it will require you to create a ts object:
our_ts <- ts(data = x[, 2],
start = 0,
frequency = 12)
plot(stats::spectrum(our_ts))
EDIT: Given new dataset (freq = 100)
x <- read.csv("test2.csv", header = F)
out <- pspectrum(x[, 2], x.frqsamp = 100)
out$freq[which.max(out$spec)]
# [1] 0.265708
our_ts <- ts(data = x[, 2], start = 4, frequency = 100)
out2 <- stats::spectrum(our_ts)
out2$freq[which.max(out2$spec)]
# [1] 0.2777778
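The 0 to 0.5 range seen in the question is what one would expect when the samples are treated as unit-spaced: the frequency axis of a spectrum estimate stops at half the sampling frequency. The sketch below illustrates this with base R's `spec.pgram` on a synthetic 1.2 Hz sine (hypothetical data, not the series from the question).

```r
# Synthetic 1.2 Hz sine sampled every 0.01 s (100 Hz) for 10 s
fs <- 100
time <- seq(0, by = 1 / fs, length.out = 1000)
signal <- sin(2 * pi * 1.2 * time)

# Treating the samples as unit-spaced: the frequency axis stops at 0.5
spec_unit <- spec.pgram(ts(signal), taper = 0, plot = FALSE)

# Declaring the sampling frequency: the axis runs up to fs / 2, in Hz
spec_hz <- spec.pgram(ts(signal, frequency = fs), taper = 0, plot = FALSE)

print(max(spec_unit$freq))
print(max(spec_hz$freq))
print(spec_hz$freq[which.max(spec_hz$spec)])
```

Once the sampling frequency is declared, the peak is reported in Hz and frequencies around 1.2 Hz fall well inside the axis.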

R Language, how to create a timeseries from dataframe with datetime and value columns

I have a simple data frame, observations_df, with two columns DateTime and Value:
2002-03-28T19:30:00, 23.53, ...
How to create a time series from data frame observations_df and show time series in graph?
Tutorials are very rich and complex. I have tried different approaches unsuccessfully.
Does this answer your question?
library(tidyverse)
df <- tibble(
date = Sys.Date() + 0:10,
value = runif(n = 11)
)
df %>%
ggplot(aes(x = date, y = value)) +
geom_line()
First you may use ts to create a time series "ts" object.
dat.ts <- ts(dat[,2], start=dat[1,1], end=dat[nrow(dat),1])
# Time Series:
# Start = 18263
# End = 18292
# Frequency = 1
# [1] 0.16804153 0.80751640 0.38494235 0.32773432 0.60210067 0.60439405 0.12463344 0.29460092
# [9] 0.57760992 0.63097927 0.51201590 0.50502391 0.53403535 0.55724944 0.86791949 0.82970869
# [17] 0.11144915 0.70368836 0.89748826 0.27973255 0.22820188 0.01532989 0.12898156 0.09338193
# [25] 0.23688501 0.79114741 0.59973157 0.91014771 0.56042455 0.75570477
Then, actually there is a plot method and you could just do plot(dat.ts). However, "ts" objects store dates numerically, and we want to read "real" dates on the x-axis and therefore probably want to do a manual axis labeling with somewhat "thinned out" elements from the date column using e.g. modulo %%.
labs <- dat$datetime[as.numeric(substr(dat$datetime, 9, 10)) %% 7 == 0]
plot(dat.ts, xaxt="n", main="My Title", col=2, xlab="time", ylab="value")
axis(1, labels=F, at=dat$datetime, tck=-.01)
axis(1, labels=F, at=labs, tck=-.03)
mtext(as.character(labs), 1, 1, at=labs)
legend("topleft", lty=1, legend="time series xy", col=2)
Toy data used
set.seed(3)
dat <- data.frame(datetime=as.Date(seq(1:30), "2020-01-01"),
value=runif(30))
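For timestamps in the format shown in the question (for example 2002-03-28T19:30:00), the strings first need to be parsed into date-times. The following is a minimal base R sketch; the rows are hypothetical and the UTC time zone is an assumption.

```r
# Hypothetical rows in the layout described in the question
observations_df <- data.frame(
  DateTime = c("2002-03-28T19:30:00", "2002-03-28T20:00:00", "2002-03-28T20:30:00"),
  Value = c(23.53, 23.10, 22.87)
)

# Parse the ISO 8601 strings; the UTC time zone is an assumption
observations_df$DateTime <- as.POSIXct(
  observations_df$DateTime, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"
)

# Plot the values against the parsed timestamps
plot(observations_df$DateTime, observations_df$Value, type = "l",
     xlab = "time", ylab = "value")
```

Keeping the timestamps as a POSIXct column avoids the numeric date axis that a plain "ts" object produces.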

How to create custom indicator? Slope of the line of 50 day EMA

I have been creating a few technical indicators using Quantmod's NewTa function.
I've been trying to create a custom indicator that ideally should be charted using ChartSeries. This indicator should show the slope of the line of the 50 day EMA of the adjusted closing price.
getSymbols("NOVO-B.CO")
p <- na.omit('NOVO-B.CO')
FiftyEMA <- function(x){
MA <- removeNA((EMA(p[,6],n=50)))
}
SlopeFiftyEMA <- function(x){
run=(FiftyEMA(y)/FiftyEMA(x))
}
Slope.Indicator <- newTA(SlopeFiftyEMA,legend.name = "50 Day EMA Slope of Line Indicator")
Slope.Indicator()
This gives me the error: Error in get.current.chob() : improperly set or missing graphics device
I also tried a new code that gives me an actual INDICATOR! Please let me know what you think (if you think it looks correct or not):
First I export the data to excel: (the stock data is still denoted as p)
write.csv(p,"data")
import data
x <- data[,1]
y <- data[,7]
MA <- removeNA(EMA(y,n=50))
length(MA)
length of MA = 1923
l=1:1923
SlopeFiftyEMA <- function(x){
(diff(MA)/diff(l))
}
Slope.Indicator <- newTA(SlopeFiftyEMA,legend.name = "50 Day EMA Slope of Line Indicator")
twelvemonths="last 12 months"
chartSeries(p,subset = twelvemonths,theme = 'white',up.col = 'blue',dn.col = 'grey',name ="Custom Indicators")
Slope.Indicator()
Any Input anyone? Last time I posted there was no indicator
Thanks in advance!
Your first error seems to occur because you don't call chartSeries before calling Slope.Indicator(). Your code is also a bit messy: for example, y is never defined (maybe you introduce it later, in the imported data).
The approach presented here will plot the slope of the MA according to linear regression, using chart_Series (arguably cleaner plots than the original chartSeries). Two types of slopes are computed, including the one you proposed, which is the differences of the EMA.
getSymbols(c("NOVO-B.CO"))
x <- `NOVO-B.CO`
x[, c(1:4, 6)] <- na.locf(x[, c(1:4, 6)])
x$EMA <- EMA(Cl(x), n = 50)
x <- merge(x, rollSFM(Ra = x[, "EMA"], Rb = 1:NROW(x), n = 20))
x <- merge(x, setNames(diff(x$EMA), "diff1"))
chart_Series(x, subset = "2016/")
add_TA(x$EMA, on = 1, col = "purple")
# Plot the slope of the MA:
add_TA(x$beta, col = "green")
# Plot the 1 lag diff of the moving average:
add_TA(x$diff1, lty = 2)
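To make the "difference of the EMA" slope concrete without any packages, here is a small base R sketch. The `simple_ema` helper is hypothetical and seeds the average with the first value, which is not how `TTR::EMA` initialises, so early values will differ from the code above. The price series is made up.

```r
# Simple EMA seeded with the first value (an assumption; TTR::EMA
# initialises differently, so early values will not match it exactly)
simple_ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# Hypothetical prices rising by exactly 1 per day
price <- as.numeric(1:200)
ema50 <- simple_ema(price, n = 50)

# One-step slope: change in the EMA per day
slope <- diff(ema50)
print(tail(slope, 1))
```

For a steadily rising series the one-step slope of the EMA approaches the slope of the prices themselves, which is a handy sanity check for an indicator of this kind.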
