Deweatherize time series - math

Has anyone tried to deweatherize time series data? By "deweatherize" I mean removing the weather effects from the data. We are having difficulty incorporating that variable into the time series. Does anyone have experience with how to use such explanatory variables in a time series, for example the economy, seasonal effects, and so on?
Bollinger Bands is one technique we are considering to solve the problem. We are still researching, but I wanted to hear from other folks.

I couldn't add a comment in the box, which is why I am answering here.
The time series contains time-dependent sales, and now we are trying to add weather to that series.
For example, a weather variable of 1.06 means we had 6% more sales than expected because of weather.
There's no lag on this weather variable since we are trying to deweatherize historical data.
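A minimal sketch of what "deweatherizing" could look like under the multiplicative interpretation above; the vectors sales and weather.index are hypothetical placeholders, not the original data:

# Hypothetical placeholder data: observed sales and the per-period weather index,
# where 1.06 means weather added 6% to that period's sales.
sales <- c(100, 212, 98, 105)
weather.index <- c(1.00, 1.06, 0.97, 1.00)

# Dividing out the index removes the weather effect from the history
deweatherized <- sales / weather.index

# The deweatherized series can then be modeled with the other drivers
# (economy, seasonality, ...), and a weather forecast multiplied back in later.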

Related

How to determine the most significant predictors - multivariate forecasting

I would like to create a forecasting model with time series in R. I have a target time series 'Sales' that I would like to forecast. I also have several time series that represent, for example, GDP or advertising spend. Unfortunately, I have a lot of independent time series and I don't know how to figure out which are the most significant. Ideally I would identify the most important ones before building the model.
I have already worked with classification problems, where I have always used the Pearson correlation coefficient. This is not possible with time series, right? How can I determine the correlation for time series and use it to find suitable series that describe my target series?
I tried to use the corr.test() function in R, but I think that's not right.
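One common screening idea (a sketch, not a definitive recipe) is to remove the trend from each series first, for example by differencing, and then look at cross-correlations with the target; shared trends otherwise inflate any Pearson-style measure. The vectors below are placeholders standing in for Sales and one candidate predictor:

# Placeholder series standing in for Sales and a candidate such as GDP
set.seed(1)
sales <- cumsum(rnorm(100, mean = 1))
gdp   <- cumsum(rnorm(100, mean = 1))

# Differencing removes the common trend before measuring correlation
d.sales <- diff(sales)
d.gdp   <- diff(gdp)

# Significant spikes indicate a lead/lag relationship between the series
ccf(d.gdp, d.sales, lag.max = 12)

Repeating this for each candidate and keeping those with clearly significant cross-correlations is a reasonable first filter; prewhitening (fitting an ARIMA model to the predictor and filtering both series with it) is a more rigorous variant of the same idea.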

Detecting seasonality in time series and applying the cross-correlation function

I have a question about using R's ccf() function. I have two time series that represent snow water equivalent on the surface and groundwater head under the ground. I want to find out the "propagation" time from the surface to the ground, so I think that cross-correlating the two time series can help me detect the "lag" time between them.
It seems that the ccf() function is a proper way to determine the lag between two time series. But according to the mathematical concept of cross-correlation, it requires stationarity of the input data, and both of my time series are seasonal, because intuitively we know that snow occurs in winter. Data with seasonality is considered non-stationary, so I think I might need to decompose the data so that it becomes stationary. I then used both the stl() and decompose() functions to detect whether there is a seasonal pattern, but both of them gave me this error message:
Error in decompose(swefoothill):
time series has no or less than 2 periods
which is pretty self-explanatory: neither time series shows a clear seasonality. But that doesn't mean that my data are not seasonal. So I want to ask: under these circumstances, is it okay to run ccf() directly on both time series? I did a sample analysis and the cross-correlation figure looks like this:
I'm observing a cyclic pattern here; am I doing it wrong? Thanks a lot for your help!
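For what it's worth, that particular error usually means the series were not constructed with a seasonal frequency (or cover fewer than two full cycles), not that the data lack seasonality. A rough sketch of one possible workflow, using simulated placeholder data in place of the real SWE and groundwater series:

# Placeholder daily series over 4 years; the real swe / gw.head vectors would go here
set.seed(1)
n <- 4 * 365
swe     <- pmax(0, sin(2 * pi * (1:n) / 365)) * 100 + rnorm(n, sd = 5)
gw.head <- pmax(0, sin(2 * pi * ((1:n) - 60) / 365)) * 10 + rnorm(n, sd = 1)

# Give the series an explicit annual frequency so stl() can see the seasonality
swe.ts  <- ts(swe, frequency = 365)
head.ts <- ts(gw.head, frequency = 365)

swe.stl  <- stl(swe.ts, s.window = "periodic")
head.stl <- stl(head.ts, s.window = "periodic")

# Cross-correlate the deseasonalized remainders; the lag of the peak
# is a candidate estimate of the propagation time
swe.rem  <- swe.stl$time.series[, "remainder"]
head.rem <- head.stl$time.series[, "remainder"]
ccf(swe.rem, head.rem, lag.max = 180)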

Forecasting Hospital Bed Demand Using Daily Observations

Basically, my task for the next 3 months is to forecast bed demand and a couple of other variables in a hospital's emergency department. The data is 5 years worth of daily observations of these variables. The data is complete with no missing values.
The goal is to improve the prediction accuracy of the current tool, which is an Excel workbook.
I have not taken any time series or optimization courses in college thus far, so imagine my horror when I realised I had no clue how to approach this project and that I would be working entirely alone. I was told no one in the department has any experience and no one would be able to help me.
I'm using RStudio, but I'm not very proficient since it was self-taught.
From working through questions asked on here, as well as YouTube tutorials, to learn the appropriate syntax and functions, what I have managed to find out is:
1) My data is a time series and I should apply forecasting models to predict future values based on the historical data I have.
2) Daily observations of a long time series have weekly and annual seasonality, so I should define the data as a multi-seasonal time series.
I first tried defining my data with ts(), then msts(). One of the answers here mentioned zoo() would be more appropriate for daily observations, so I tried that too. The forecasting models I've tried are snaive, ets, auto.arima and TBATS.
I would like to present the plots of the values/forecasts based on day-of-the-week rather than all 365 days of the year, which is the only output I have been able to plot. I tried using frequency = 365 and 7, and start = c(2014, 1) and end = c(2018, 365), but I haven't had any luck.
I would really appreciate any advice and help I could get from anyone. Thank you!
Without looking at your data, have you tried getting started with some basic ARIMA modeling and seeing what results you get from that? It's a fairly friendly way to get started with time series forecasting, depending on your data. I was forecasting by the hour, but the frequency can be adjusted to whatever you need to forecast in. As you mentioned, you are looking to change the frequency. Sometimes it's easier to see a pattern at larger time intervals, and you can aggregate your data at larger time intervals.
For example, this converts daily observations to monthly.
library(xts)

# Build a daily date index matching the observations
dates <- seq(as.Date('2012-01-01'), as.Date('2019-03-31'), by = 'days')
beds$date.formatted <- dates

# Convert to an xts object indexed by date
beds.xts <- xts(x = beds$beds.count, order.by = as.POSIXct(beds$date.formatted))

# Sum the daily counts within each month
end.month <- endpoints(beds.xts, 'months')
beds.month <- period.apply(beds.xts, end.month, sum)

# Back to a data frame, then a monthly ts object for plotting and modeling
beds.monthly.df <- data.frame(date = index(beds.month), coredata(beds.month))
colnames(beds.monthly.df) <- c('Date', 'Beds')
beds.monthly <- ts(beds.monthly.df$Beds, start = c(2012, 1), end = c(2019, 3), frequency = 12)
plot(beds.monthly)
I'm not sure if that answers your question, but as you mentioned you are self-taught and starting out, I can share a script with you to help you get started with an example, and maybe this would help. It goes through the whole process: checking that you have read your data in as a time series, what time series data is, how to check for non-stationarity and seasonal trends, plots that are useful for this, modeling, prediction, plotting actual vs. predicted, accuracy, and further issues with the data that could be hindering your model. The video tutorial series is scripted in Python, but you can follow the same end-to-end ARIMA forecasting process using the equivalent R script for the tutorial: https://code.datasciencedojo.com/rebeccam/tutorials/blob/master/Time%20Series/r_time_series_example.R
https://tutorials.datasciencedojo.com/time-series-python-reading-data/
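As a follow-up to the day-of-the-week part of the question: one possibility (a sketch with made-up placeholder data, not part of the original answer) is to summarise the daily counts by weekday for plotting, and to use msts() with both a weekly and an annual period for modeling, as already attempted:

library(forecast)

# Placeholder daily data standing in for the real bed counts
dates <- seq(as.Date('2014-01-01'), as.Date('2018-12-31'), by = 'days')
beds  <- data.frame(date = dates, beds.count = rpois(length(dates), 50))

# Distribution of demand by day of week
beds$weekday <- weekdays(beds$date)
boxplot(beds.count ~ weekday, data = beds, main = 'Bed demand by day of week')

# Multi-seasonal series (weekly + annual) and a TBATS forecast
beds.msts <- msts(beds$beds.count, seasonal.periods = c(7, 365.25))
fit <- tbats(beds.msts)
plot(forecast(fit, h = 28))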

What are some R packages for dealing with multivariate time series for data sets with multiple observations?

I am trying to figure out how to approach a data problem that includes observations of multiple equipment units' pressure and temperature measures. The measures are available for a few years as daily or nearly daily values.
This seems like a time series problem (multivariate) and I have found some quality examples. However, because the data set consists of multiple measures taken for each equipment unit, I am a bit stumped on how to proceed. Should I fit a separate time series for each piece of equipment? This seems intuitively wrong, but I am really not sure which package or even approach I can use to work through this.
I would very much appreciate a recommendation or link to some resources.
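One pragmatic starting point (only a sketch, and the package choice is my assumption) is to keep the data in long format with the equipment unit as a key and let a framework such as tsibble/fable fit one model per unit; the data below are invented placeholders:

library(tsibble)
library(fable)
library(dplyr)

# Invented long-format data: one row per unit per day
df <- expand.grid(unit = c("A", "B", "C"),
                  date = seq(as.Date("2020-01-01"), by = "day", length.out = 400))
df$pressure <- rnorm(nrow(df), mean = 100, sd = 5)

# One ARIMA model per equipment unit, fitted in a single call
units_ts <- df %>% as_tsibble(key = unit, index = date)
fits <- units_ts %>% model(arima = ARIMA(pressure))
fc <- fits %>% forecast(h = "14 days")

If the units are expected to influence one another, a vector autoregression on the jointly observed series (for example via the vars package) is the usual multivariate alternative.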

In R, how can you use Holt-Winters smoothing for a financial ("business-day")-based time series?

For example, a stock price time series has an irregular time index.
You don't, for the reasons I gave you in response to your previous question today: because HoltWinters needs a regular ts object, you cannot (easily) use it on irregular time series.
You can approximate it by, say, sampling every Wednesday and creating 52-week years from that. But there is no way around the basic fact that "business day"-based series are irregular.
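A rough sketch of that weekly-sampling approximation, using a made-up business-day price series (the data and names are placeholders):

library(xts)

# Made-up business-day price series
set.seed(1)
dates  <- seq(as.Date("2015-01-01"), as.Date("2019-12-31"), by = "day")
dates  <- dates[!weekdays(dates) %in% c("Saturday", "Sunday")]   # weekdays() labels are locale-dependent
prices <- xts(100 + cumsum(rnorm(length(dates))), order.by = dates)

# Keep one observation per week (Wednesdays) to get a regular series
wednesdays <- prices[weekdays(index(prices)) == "Wednesday"]
weekly.ts  <- ts(coredata(wednesdays), frequency = 52)

# HoltWinters now gets the regular ts it requires
fit <- HoltWinters(weekly.ts)
plot(fit)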
As Dirk said, there is no solid way to do this. Even if it runs (with gamma = FALSE), it will use a fixed gain on each observation; that is, it will ignore the fact that a weekend is 3 times longer than your other delta times.
It gets worse with intraday data. I think your best bet is to implement the Holt-Winters filter yourself. It's actually not all that hard...
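A bare-bones sketch of what "implement it yourself" might look like for the level-and-trend (Holt) part of the recursion; making alpha and beta depend on the gap between timestamps is then the irregular-spacing extension. The function name and data below are my own placeholders:

# Placeholder hand-rolled Holt (level + trend) filter
holt_filter <- function(y, alpha = 0.2, beta = 0.1) {
  n <- length(y)
  level <- numeric(n); trend <- numeric(n); fitted <- numeric(n)
  level[1] <- y[1]; trend[1] <- y[2] - y[1]
  for (t in 2:n) {
    fitted[t] <- level[t - 1] + trend[t - 1]                                    # one-step-ahead forecast
    level[t]  <- alpha * y[t] + (1 - alpha) * fitted[t]                          # update level
    trend[t]  <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]    # update trend
  }
  list(level = level, trend = trend, fitted = fitted)
}

# Usage on a made-up price series; an irregular index is where the gains
# would be scaled by the actual time gap between observations
prices <- 100 + cumsum(rnorm(250))
out <- holt_filter(prices)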
