Seasonal Theil-Sen estimates with incomplete data - r

I'm new to time series, but am trying to do some simple analyses for monthly water quality values across multiple lakes. I have 20 water bodies, and monthly Dissolved Oxygen values for 2 years and 7 months. I am able to successfully run a nonparametric Mann-Kendall and Theil-Sen estimates for each water body across the time period, but am struggling to figure out how to run similar tests while taking into account seasonality.
Seasonal Mann-Kendall tests seem to work, but seasonal Theil-Sen estimates fail when using sea.sens.slope() from the trend package. I receive the error "number of items to replace is not a multiple of replacement length". I think I'm getting this because I only have 7 months of data in the final year rather than a full 12.
Eg:
library(trend)
library(Kendall)
library(dplyr)
#Sample Dataframe with 20 columns (waterbodies) and 31 months
df <- as.data.frame(matrix(rnorm(31*20,10,1),ncol=20))
#Creating a function to run MK and sen's Slopes on each water body
series.func <- function(x) {
  c(mk = MannKendall(x), ss = sens.slope(x))
}
#running the function on each column
# 31 monthly values starting in January 2019 end in July 2021
results <- df %>%
  ts(start = c(2019, 1), end = c(2021, 7), frequency = 12) %>%
  lapply(., series.func)
This gives me a list with MK and Theil-Sen results for each water body. However, when I attempt the same thing with the seasonal Mann-Kendall test and seasonal Sen's slope, I get the error I mentioned:
#Same function but with seasonal MK and Theil Sen
series.func <- function(x) {
  c(smk = smk.test(x), seass = sea.sens.slope(x))
}
#running the function on each column
# 31 monthly values starting in January 2019 end in July 2021
results <- df %>%
  ts(start = c(2019, 1), end = c(2021, 7), frequency = 12) %>%
  lapply(., series.func)
My questions are:
Is it possible to run sea.sens.slope() when I only have 7 months of data for my third year?
If it is possible, what exactly determines what a "season" is? My data is tropical, and there aren't 4 distinct "seasons".
Thank you!
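One way to check the incomplete-year hypothesis is to trim the series to complete years before calling the seasonal functions. The sketch below uses only base R on simulated data (the series x and its January 2019 start are assumptions for illustration), and it is not a confirmed fix for sea.sens.slope(). It also shows how a ts object defines a "season": with frequency = 12, cycle() labels each observation by its position within the year, so each calendar month is its own group regardless of climate. As far as I can tell, that is the grouping the seasonal tests work from.

```r
set.seed(1)
# Hypothetical series: 31 monthly values starting January 2019 (ends July 2021)
x <- ts(rnorm(31, 10, 1), start = c(2019, 1), frequency = 12)

# Position of each observation within the year (1 = January, ..., 12 = December)
season_counts <- table(as.integer(cycle(x)))
season_counts  # months 1-7 have 3 observations, months 8-12 have 2

# Keep complete years only: January 2019 to December 2020
x_full <- window(x, end = c(2020, 12))
length(x_full)  # 24 observations, 2 per month

# With the trend package loaded, the seasonal estimator could then be tried
# on the trimmed series, e.g. sea.sens.slope(x_full)
```

Trimming discards the last 7 months, so this is mainly useful for confirming whether the unequal season lengths cause the error.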

Related

Time series daily data modeling

I am looking to forecast my time series. I have daily data for the period 2021-Jan-1 to 2022-Jul-1, so I have a column with one observation for each day.
What I have tried so far:
library(zoo)
d1 <- zoo(data, seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1))
tsdata <- ts(d1, frequency = 365)
ddata <- decompose(tsdata, "multiplicative")
I get the following error here:
Error in decompose(tsdata, "multiplicative") :
time series has no or less than 2 periods
From what I have read, this seems to be because I do not have two full years of data. Is that correct? I have tried treating it as weekly data as well:
series <- ts(data, frequency = 52, start = c(2021, 1))
I get the same issue.
How can I decompose the series without extending my dataset to two years, which I cannot do?
Also, when I actually try to forecast it, the forecast is not good enough:
[Image: plot with forecast]
My data somewhat resembles a bell curve over that period. Is there a better-fitting time series model I can apply instead?
A weekly frequency for daily data should have frequency = 7, not 52. It's possible that this fix to your code will produce a model with a seasonal term.
I don't think you'll be able to produce a time series model with annual seasonality with less than 2 years of data.
You can either produce a model with only weekly seasonality (I expect this is what most folks would recommend), or if you truly believe in the annual seasonal pattern exhibited in your data, your "forecast" can be a seasonal naive forecast that is simply last year's value for that particular day. I wouldn't recommend this, because it just seems risky, and I don't really see the same trajectory in your screenshot over 2022 that's apparent in 2021.
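The two-period requirement can be checked directly. This is a small sketch on simulated data (x is a hypothetical stand-in for the asker's column) that uses the same date range as the question: with frequency = 365 the series covers about 1.5 periods and decompose() stops with an error, while with frequency = 7 it covers many periods and the decomposition runs.

```r
set.seed(1)
days <- seq(as.Date("2021-01-01"), as.Date("2022-07-01"), by = "day")
n <- length(days)            # 547 daily observations
x <- rnorm(n, mean = 10)     # hypothetical data

# Annual cycle: 547 observations are fewer than 2 * 365, so decompose() fails
yearly <- ts(x, frequency = 365)
res <- try(decompose(yearly), silent = TRUE)
failed <- inherits(res, "try-error")
failed

# Weekly cycle: 547 observations cover about 78 periods, so decompose() runs
weekly <- ts(x, frequency = 7)
dec <- decompose(weekly)
dec$figure   # one estimated seasonal effect per position in the week
```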
decompose requires at least two full cycles, and a full cycle must represent 1 time unit. The ts class can't use the Date class for its times anyway. To use frequency 7, the times must be 1/7th apart (1, 1+1/7, 1+2/7, etc.) so that 1 cycle (7 days) covers 1 unit. Then just label the plot appropriately rather than using those times on the X axis. In the code below, use %Y in place of %y if the years start in 19?? and end in 20??, so that tapply maintains the order.
# test data
set.seed(123)
s <- seq(from = as.Date("2021-01-01"), to = as.Date("2022-07-01"), by = 1)
data <- rnorm(length(s))
# one cycle = 7 days = 1 time unit
tsdata <- ts(data, frequency = 7)
ddata <- decompose(tsdata, "multiplicative")
# suppress the default X axis, then label the first day of each month
plot(ddata, xaxt = "n")
m <- tapply(time(tsdata), format(s, "%y/%m"), head, 1)
axis(1, m, names(m))

How to create a baseline forecast with sales data that includes (non-promotional and promotional) sales?

I have a df with a sales column and another column (Base/Promo) coding 1 or 0. I'd like to create a baseline forecast.
So I have to exclude/ignore the promotional sales.
By doing this there will be multiple gaps in my time-series...
Is there a method to forecast baseline sales based on sales data with promotional weeks included?
Here is a reproducible df:
df <- data.frame(Sales = c(50,75,450,56,65,790,45,59,49,63,750,65,49,57,695,834,76,58,69,71,540,830,43),
                 Promo = c(0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1))
df
I hope you can help with this problem, as I'm out of ideas. Many thanks!!!
Let's assume you have weekly data, my_sales, and that the promotion indicators are in promo.
my_sales <- ts(c(50,75,450,56,65,790,45,59,49,63,750,65,49,57,695,834,76,58,69,71,
                 540,830,43), frequency = 52, start = c(2020, 1))
promo <- as.data.frame(c(0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1))
colnames(promo) <- "xreg"
We fit an ARIMA model that includes promotions as follows:
library(forecast)
fit <- auto.arima(my_sales, xreg = as.matrix(promo))
To forecast, you need to know the future promotions. So I just created this future_promo example for the next 11 weeks, based on your example:
future_promo = as.data.frame(c(0,0,0,1,0,0,1,0,0,0,0))
colnames(future_promo) <- "xreg"
fc <- forecast(fit, xreg = as.matrix(future_promo))
plot(fc)
fc
fc$mean is your 11-weeks-ahead point forecast.
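If the goal is a baseline (non-promotional) forecast, one option in the same spirit is to forecast with the future promotion indicator set to zero everywhere. The sketch below is an illustration under stated assumptions, not the answer's exact method: it uses base R's arima() instead of auto.arima(), and it assumes white-noise errors (order (0, 0, 0)), which reduces the model to a regression on the promo dummy. The names sales, promo, and baseline are chosen for this example.

```r
sales <- c(50,75,450,56,65,790,45,59,49,63,750,65,49,57,695,834,76,58,69,71,
           540,830,43)
promo <- c(0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1)

# Assumption: ARIMA(0,0,0) errors, i.e. a plain regression on the promo dummy
fit <- arima(sales, order = c(0, 0, 0), xreg = promo)

# Baseline: forecast 11 weeks ahead as if no promotions were planned
baseline <- predict(fit, n.ahead = 11, newxreg = rep(0, 11))
baseline$pred
```

With this simple error structure, every baseline value equals the fitted intercept. A richer ARIMA order would let the baseline vary over the horizon.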

Error with large frequency from stl - Time Series analysis

I am studying Time Series with R.
Using the stl() command I get the error
Error in stl(avatar_ts, s.window = "periodic") :
series is not periodic or has less than two periods
Below is my code.
avatar_ts = ts(avatar_data$Gross, start = c(2009,12), frequency=365)
st_decompose_method = stl(avatar_ts, s.window="periodic")
If I change the frequency to 12 or 50, the stl() function runs.
However, my dataset has daily observations from 2009-12-18 to 2010-11-18, a total of 318 observations.
(It is essentially daily data, but a few weeks are missing, since it is a daily movie profit dataset.)
How can I use the stl() function with frequency of 365 for the time series data?
The unhelpful answer here is that you can use stl() with frequency = 365 on daily data only if you have at least two years of observations.
You set s.window to "periodic", but you have less than one year of data, which is insufficient.
See the example with simulated data below.
First, with your number of observations:
x <- rnorm(318)
d <- ts(x, start = c(2009, 12), frequency = 365)
st_decompose_method <- stl(d, s.window = "periodic")
This will give the error:
Error in stl(d, s.window = "periodic") :
series is not periodic or has less than two periods
Now with at least two years of data (731 observations), so a minimum of two periods:
x <- rnorm(731)
d <- ts(x, start = c(2009, 12), frequency = 365)
st_decompose_method <- stl(d, s.window = "periodic")
Which will work.
Note, by the way, that your ts specification is incorrect: with frequency = 365, start = c(2009, 12) means the 12th day of 2009, not December 2009.
To start on the actual first day of the data, you could do the following:
days <- seq(as.Date("2009-12-18"), as.Date("2010-11-18"), by = "day")
avatar_ts <- ts(avatar_data$Gross,
                start = c(2009, as.numeric(format(days[1], "%j"))),
                frequency = 365)
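A quick way to confirm the start specification is to inspect start() on the resulting series. The sketch below substitutes simulated values for avatar_data$Gross (the gross vector is a hypothetical stand-in, with one value per calendar day and no gaps).

```r
set.seed(1)
days <- seq(as.Date("2009-12-18"), as.Date("2010-11-18"), by = "day")
gross <- rnorm(length(days))   # hypothetical stand-in for avatar_data$Gross

# "%j" gives the day of the year; 18 December 2009 is day 352
start_day <- as.numeric(format(days[1], "%j"))
avatar_ts <- ts(gross, start = c(2009, start_day), frequency = 365)
start(avatar_ts)   # year 2009, day 352
```

Because ts assumes regularly spaced observations, the missing weeks in the real data would need to be present as NA rows for the dates to line up with the series.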

Weekly and Yearly Seasonality in R

I have daily electric load data from 1-1-2007 till 31-12-2016. I use ts() function to load the data like so
ts_load <- ts(data, start = c(2007,1), end = c(2016,12),frequency = 365)
I want to remove the yearly and weekly seasonality from my data. To decompose the data and remove the seasonality, I use the following code:
decompose_load = decompose(ts_load, "additive")
deseasonalized = ts_load - decompose_load$seasonal
My question is: am I doing it right? Is this the right way to remove the yearly seasonality? And what is the right way to remove the weekly seasonality?
A few points:
A ts series must have regularly spaced points and the same number of points in each cycle. In the question a frequency of 365 is specified, but some years, i.e. leap years, have 366 points. In particular, if you want one cycle to be a year, then you can't use daily or weekly data without adjustment, since different years have different numbers of days and the number of weeks in a year is not an integer.
decompose does not handle multiple seasonalities. If by weekly you mean removing the effect of Monday, of Tuesday, etc., and by yearly you mean removing the effect of being the 1st day of the year, the 2nd day of the year, etc., then you are asking for multiple seasonalities.
end = c(2016, 12) means the 12th day of 2016, not December 2016, since frequency is 365.
The msts function in the forecast package can handle multiple and non-integer seasonalities.
Staying with base R, another approach is to approximate it by a linear model avoiding all the above problems (but ignoring correlations) and we will discuss that.
Assuming the data shown reproducibly in the Note at the end, we define the day of week, dow, and day of year, doy, variables and regress on those with an intercept and trend. The last line of code then constructs just the intercept plus trend plus residuals to deseasonalize. This isn't absolutely necessary, but we have used scale to remove the mean of trend so that the three terms defining data.ds are mutually orthogonal. Whether or not we do this, the third term will be orthogonal to the other 2 by the properties of linear models.
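# centered linear trend
trend <- scale(seq_along(d), TRUE, FALSE)
# day of week and day of year, treated as categorical seasonal effects
dow <- format(d, "%a")
doy <- format(d, "%j")
fm <- lm(data ~ trend + dow + doy)
# deseasonalized series: intercept + trend + residuals
data.ds <- coef(fm)[1] + coef(fm)[2] * trend + resid(fm)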
Note
Test data used in reproducible form:
set.seed(123)
d <- seq(as.Date("2007-01-01"), as.Date("2016-12-31"), "day")
n <- length(d)
trend <- 1:n
seas_week <- rep(1:7, length = n)
seas_year <- rep(1:365, length = n)
noise <- rnorm(n)
data <- trend + seas_week + seas_year + noise
You can use the dsa function in the dsa package to adjust a daily time series. Its advantage over the regression solution is that it allows the seasonal effect to change over time, which is usually the case.
To use that function, your data should be in the xts format (from the xts package), because then leap years are not ignored.
The code will then look something like this:
install.packages(c("xts", "dsa"))
data = rnorm(365.25*10, 100, 1)
data_xts <- xts::xts(data, seq.Date(as.Date("2007-01-01"), by="days", length.out = length(data)))
sa = dsa::dsa(data_xts, fourier_number = 24)
# the fourier_number is used to model monthly recurring seasonal patterns in the regARIMA part
data_adjusted <- sa$output[,1]

Simple DLNM in R

What I am trying to do is find the relative risk of mortality at the 10th, 50th and 90th percentiles of diurnal temperature range and its additive effects at lags of 0, 1, 3 and 5 days. I'm doing this for a subset of months, May-Sept (see the subset call below for mortality; temperature is already subsetted when read in). I have code that runs (below), but no matter what city and what lag I use, I get an RR of essentially 1.0, so I believe that something is off or that I am missing an argument somewhere. If anyone has more experience with these problems than I do, your help would be greatly appreciated.
library('dlnm')
library('splines')
mortdata <- read.table('STLmort.txt', sep="\t", header=T)
morts <- subset(mortdata, Month %in% 5:9)
deaths <- morts$AllMort
tempdata <- read.csv('STLRanges.csv',sep=',',header=T)
temp <- tempdata$Trange
HI <- tempdata$HIrange
#basis.var <- onebasis(1:5, knots=3)
#mklagbasis(maxlag=5, type="poly", degree=3)
basis.temp <- crossbasis(temp, vardegree = 3, lag = 5)
summary(basis.temp)
model <- glm(deaths ~ basis.temp, family = quasipoisson())
pred.temp <- crosspred(basis.temp, model, at = quantile(temp, c(.10, .50, .90), na.rm = TRUE), cumul = TRUE)
plot(pred.temp, "slices", var = quantile(temp, c(.10, .50, .90), na.rm = TRUE), lag = c(0, 1, 5))
The problem is that you did not include any time variables to control for the long-term and seasonal trends in the time series when using DLNM.
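A minimal sketch of what such time controls could look like, using simulated data and the splines package rather than the asker's files. The variable names, the 5-year span, and the choice of 7 degrees of freedom per year for the spline of time are all assumptions for illustration. In the DLNM model, basis.temp would take the place of temp.

```r
library(splines)

set.seed(42)
n_years <- 5
n <- n_years * 365
time <- seq_len(n)

# Hypothetical simulated exposure and daily death counts with a seasonal cycle
temp <- 10 + 8 * sin(2 * pi * time / 365) + rnorm(n)
deaths <- rpois(n, lambda = exp(3 + 0.1 * cos(2 * pi * time / 365)))
dow <- factor(time %% 7)   # day-of-week indicator

# A natural spline of time absorbs long-term and seasonal trends;
# the day-of-week factor absorbs weekly patterns
model <- glm(deaths ~ temp + ns(time, df = 7 * n_years) + dow,
             family = quasipoisson())
summary(model)$coefficients["temp", ]
```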
