I recently started using Tableau with its R integration, and I'm using the forecast graphs.
I have 6 years of hourly data with multiple seasonalities: daily, weekly and yearly.
library(forecast); data <- msts(.arg1, seasonal.periods=c(24, 7 * 24, 365 * 24))
I've applied the above in Tableau. It takes 8 hours to complete and the results are not good. Previously I used the ts() function, which gave good results on daily data with frequency=365, but on hourly data the results are poor.
Some of the seasonal patterns may be getting missed. I know tbats() can do the job, but I need to improve the results in Tableau.
Dates are notoriously difficult. The biggest issue is that you're not accounting for leap years, and at least one will occur in any six-year window. Holidays make life even more complicated, since some holidays fall on different days of the week depending on the year, which can change observations.
Take a step back. What kind of data do you have? What do you want to learn about it? That will inform the best approach.
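To make the leap-year point concrete, here is a minimal sketch that counts the hours in a six-year window and compares that with what a fixed 365-day year assumes. The 2014 to 2019 window, the UTC time zone, and the random placeholder series are assumptions for illustration, not details from the question. The msts() call only runs if the forecast package is installed.

```r
# Hypothetical six-year hourly window (UTC avoids daylight-saving gaps).
hours <- seq(as.POSIXct("2014-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2019-12-31 23:00:00", tz = "UTC"),
             by = "hour")

n_actual    <- length(hours)   # includes 29 Feb 2016
n_naive     <- 6 * 365 * 24    # what a fixed 365-day year assumes
drift_hours <- n_actual - n_naive

# An average year of 365.25 days keeps the yearly period from drifting.
periods <- c(day = 24, week = 7 * 24, year = 365.25 * 24)

# Placeholder data only; replace rnorm() with the real hourly series.
if (requireNamespace("forecast", quietly = TRUE)) {
  y <- forecast::msts(rnorm(n_actual), seasonal.periods = unname(periods))
}
```

With one leap day in the window the series is a full day longer than 6 * 365 * 24 hours, so a yearly period of 365 * 24 slowly slides out of phase with the calendar.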
Basically, my task for the next 3 months is to forecast bed demand and a couple of other variables in a hospital's emergency department. The data is 5 years worth of daily observations of these variables. The data is complete with no missing values.
The goal is to improve the prediction accuracy of the current tool, which is an Excel workbook.
I have not taken any time series or optimization courses in college so far, so imagine my horror when I realised I had no clue how to approach this project and that I would be working entirely alone. I was told no one in the department has any experience and no one would be able to help me.
I'm using RStudio, but I'm not very proficient since it was self-taught.
From trying out the questions asked on here as well as YouTube tutorials to learn the appropriate syntax and functions, what I have managed to find out is:
1) My data is a time series and I should apply forecasting models to predict future values based on the historical data I have.
2) Daily observations over a long time span have weekly and annual seasonality, so I should define the data as a multi-seasonal time series.
I first tried defining my data with ts(), then msts(). One of the answers here mentioned zoo() would be more appropriate for daily observations, so I tried that too. The forecasting models I've tried are snaive, ets, auto.arima and TBATS.
I would like to present the plots of the values/forecasts by day of the week rather than across all 365 days of the year, which is the only output I could plot. I tried using frequency = 365 and 7, and start = c(2014, 1) and end = c(2018, 365), but I haven't had any luck.
I would really appreciate any advice and help I could get from anyone. Thank you!
Without looking at your data, have you tried starting with some basic ARIMA modeling and seeing what results you get from that? It's a fairly friendly way to get started with time series forecasting, depending on your data. I was forecasting by the hour, but the frequency can be adjusted to whatever interval you need to forecast at. As you mentioned, you are looking to change the frequency. Sometimes it's easier to see a pattern at larger time intervals, and you can aggregate your data to those intervals.
For example, this converts daily observations to monthly.
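library(xts)
# One date per daily observation (assumes beds has exactly one row per day in this range)
dates <- seq(as.Date('2012-01-01'), as.Date('2019-03-31'), by='days')
beds$date.formatted <- dates
# Index the daily counts by date (neds.count is the column name as posted; use your own)
beds.xts <- xts(x=beds$neds.count, order.by=as.POSIXct(paste(beds$date.formatted)))
# Find the last observation of each month and sum the counts within each month
end.month <- endpoints(beds.xts, 'months')
beds.month <- period.apply(beds.xts, end.month, sum)
beds.monthly.df <- data.frame(date=index(beds.month), coredata(beds.month))
colnames(beds.monthly.df) <- c('Date', 'Sessions')
# Build a monthly ts object from the aggregated data frame
beds.monthly <- ts(beds.monthly.df$Sessions, start=c(2012,1), end=c(2019,3), frequency=12)
plot(beds.monthly)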
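For the day-of-week view asked about in the question, one option is to skip ts() for the plot and group the daily values by weekday directly. This is only a sketch: the data frame below is simulated, and the column names date, dow and count are made up for the example.

```r
# Simulated stand-in for five years of daily counts (not real data).
set.seed(1)
dates <- seq(as.Date("2014-01-01"), as.Date("2018-12-31"), by = "day")

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday),
# which does not depend on the locale the way weekdays() does.
dow <- factor(format(dates, "%u"), levels = as.character(1:7),
              labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

beds <- data.frame(
  date  = dates,
  dow   = dow,
  count = rpois(length(dates), lambda = 40 + 5 * (as.integer(dow) >= 6))
)

# Average count per weekday, plus the spread of the daily values.
dow_mean <- tapply(beds$count, beds$dow, mean)
boxplot(count ~ dow, data = beds, xlab = "Day of week", ylab = "Daily count")
```

The same grouping works on forecast output if the forecast dates are converted to weekdays in the same way.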
I'm not sure if that answers your question, but since you mentioned you are self-taught and starting out, I can share a script with an example to help you get started. It goes through the whole process: checking that you have read your data in as a time series, what time series data is, how to check for non-stationarity and seasonal trends, plots that are useful for this, modeling, prediction, plotting actual vs. predicted values, accuracy, and further issues with the data that could be hindering your model. The video tutorial series is scripted in Python, but you can follow the end-to-end process of forecasting with ARIMA using the equivalent R script for this tutorial: https://code.datasciencedojo.com/rebeccam/tutorials/blob/master/Time%20Series/r_time_series_example.R
Video tutorial series: https://tutorials.datasciencedojo.com/time-series-python-reading-data/
I am writing a machine learning project (I am quite new to this) and now I have gotten a little stuck as to what to do next.
I have two fairly small datasets. One has timestamps for when each output happened, and the other has the corresponding input timestamps. Both are in the format year/month/day/hour/minute/second.
I have done quite a bit of feature engineering and split these columns, and I have looked at the differences between the nearest inputs and the nearest outputs to understand the time lags better and to see the downtime. I have done a lot of visualizations to see where I can go from here, and now I am quite stuck. There aren't any obvious patterns that I can see.
I do not need to do time series forecasting, and am now trying to do anomaly detection on what I have.
My issue is that I have no idea what I should do with this next, maybe you have some advice on what algorithms I can apply?
I am also unsure whether I can connect each input to its output timestamp. Are there any standard ways to do that?
I mainly want to see patterns and deviations in the data. I have tried looking at the scrap data that is generated. I do not really know which models or experiments are good ones to try in my case.
Are there any data mining methods you could advise me to use?
It sounds like you are on the right track!
Here are some ideas to consider:
Is there a trend by day of week? Are weekends peak or not?
Does the hour of the day combined with day of week make a difference?
Have you looked at volume in combination with other variables? A spike in traffic on Wednesday night at 2am could be a red flag.
Basically I'd try to code in seasonality, hour, day of week, month, year, etc. into your data.
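As a rough illustration of coding those calendar fields into the data, the sketch below derives hour, day of week, month and year from a timestamp column using base R. The three timestamps and the column names are invented for the example, and UTC is assumed.

```r
# Hypothetical event timestamps (UTC assumed).
ts_utc <- as.POSIXct(c("2023-03-01 02:15:00",
                       "2023-03-04 14:30:00",
                       "2023-03-06 09:00:00"), tz = "UTC")

features <- data.frame(
  timestamp = ts_utc,
  hour      = as.integer(format(ts_utc, "%H")),
  dow       = as.integer(format(ts_utc, "%u")),  # 1 = Monday ... 7 = Sunday
  month     = as.integer(format(ts_utc, "%m")),
  year      = as.integer(format(ts_utc, "%Y"))
)

# Weekend flag and the gap (in seconds) since the previous event.
features$is_weekend <- features$dow >= 6
features$gap_secs   <- c(NA, diff(as.numeric(ts_utc)))
```

Columns like these can then be fed to whatever anomaly detection method is chosen, or simply tabulated to check whether weekends or particular hours stand out.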
Further reading: How to use machine learning for anomaly detection and condition monitoring; Mahalanobis distance.
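A minimal sketch of Mahalanobis-distance anomaly flagging with base R's mahalanobis() follows. The two features (hour of day and gap between events), the simulated values, the planted outlier, and the 99.9% chi-squared cutoff are all assumptions chosen for the example. The cutoff also relies on the features being roughly multivariate normal.

```r
set.seed(42)

# Simulated features (hypothetical): hour of day and gap between events in seconds.
X <- cbind(hour = rnorm(200, mean = 12, sd = 3),
           gap  = rnorm(200, mean = 60, sd = 10))
X <- rbind(X, c(hour = 2, gap = 300))   # one planted outlier in row 201

# Squared Mahalanobis distance of every row from the overall centre.
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Flag rows beyond the 99.9% quantile of a chi-squared distribution
# with one degree of freedom per feature.
cutoff    <- qchisq(0.999, df = ncol(X))
anomalies <- which(d2 > cutoff)
```

Because the mean and covariance are estimated from data that includes the outliers, strongly contaminated data may call for a robust covariance estimate instead.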
I have around 4 years of US retail data. I aggregated it by (year, week of the year), built some models, and checked the quantity forecast. The performance was not up to the mark. Now I am trying to aggregate the data by week without considering the year (since all years show almost the same behavior in the US market, and holidays and events fall on the same dates every year). So I end up with only 52 rows of data. I have around 35 features that I derived earlier, so stepAIC is giving an infinity error. How do I deal with this issue? Can anyone suggest other good methods for choosing important features instead? Unfortunately I cannot give more information about the data. Thanks in advance.
I have three years of daily revenue data. There is fairly constant growth per year, but the data is highly seasonal, with huge peaks in Q4 (Black Friday, the pre-Christmas frenzy, etc.) and intra-week seasonality (high revenue on Monday, less and less during the week, lowest on Saturday, starting to pick up again on Sunday).
Instead of using a boring spreadsheet with linear forecasting, I'd like an R script that takes three years' worth of daily data as input and applies an algorithm to forecast daily revenue for the next 6 months. I'd love for the input to be just a CSV file with dates and revenue numbers.
I heard ARIMA is good, but an economist friend of mine who has seen my data thinks that forecasting with Kalman Filters would yield very good results.
Could someone post a script to show me how to apply either the ARIMA algo or the Kalman Filter algo to forecast my data? Thanks!
While R certainly has tools that implement these analyses, they are power tools, and it would probably be best if you read up on them and how they work ... (Venables and Ripley's Modern Applied Statistics in S might be a reasonable starting point, although I don't know if it discusses Kalman filters). In the meantime:
??arima
??kalman
?arima
?KalmanLike
Or, having installed the sos package:
library("sos")
findFn("arima forecast")
findFn("kalman forecast")
Or just Google "kalman filter R" (!!) -- I did and found that the first 8 (!) hits looked highly useful (the 9th was an introduction to Kalman filters in MATLAB :-) )
Others may feel differently, but I will generally spend more effort helping someone work their way through an analysis when I can see that they have tried tackling it for themselves ...
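As a starting point for reading those help pages, here is a small sketch using only base R's stats package: arima() for a seasonal ARIMA model and StructTS(), which fits a structural model via the Kalman filter. The series is simulated, the model orders are arbitrary choices rather than recommendations, and only the weekly pattern is modelled. The Q4 peaks described in the question would need additional terms.

```r
set.seed(7)

# Simulated daily revenue (not real data): weekly pattern, mild trend, noise.
n       <- 7 * 60
weekly  <- rep(c(130, 115, 105, 100, 95, 80, 90), length.out = n)
revenue <- ts(weekly + 0.05 * seq_len(n) + rnorm(n, sd = 5), frequency = 7)

# Seasonal ARIMA with a weekly period (orders chosen arbitrarily for the sketch).
fit_arima <- arima(revenue, order = c(0, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 7))
fc_arima  <- predict(fit_arima, n.ahead = 14)

# Basic structural model (level + slope + seasonal), fitted via the Kalman filter.
fit_bsm <- StructTS(revenue, type = "BSM")
fc_bsm  <- predict(fit_bsm, n.ahead = 14)
```

Both predict() calls return a list with point forecasts in $pred and standard errors in $se, which is enough to compare the two approaches on a hold-out period.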
This should be solved using regression. You would have 6 dummy variables for the day-of-the-week effects (the seventh day is the baseline). You would have 11 monthly dummy variables for the seasonality (again with one month as the baseline). You would have dummy variables for each of the holidays.
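The dummy-variable regression described here could be sketched in base R as follows. Everything about the data is made up: the revenue series is simulated, make_features() is a hypothetical helper, and the holiday list covers only two fixed-date holidays lumped into a single dummy for brevity. Declaring day of week and month as factors makes lm() create the 6 and 11 dummy columns automatically.

```r
set.seed(3)
dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")

# Hypothetical helper: calendar features for a vector of dates.
make_features <- function(d, holidays) {
  data.frame(
    t       = as.numeric(d - as.Date("2020-01-01")),            # linear trend
    dow     = factor(format(d, "%u"), levels = as.character(1:7)),
    month   = factor(format(d, "%m"), levels = sprintf("%02d", 1:12)),
    holiday = as.integer(format(d, "%m-%d") %in% holidays)
  )
}

holidays <- c("12-25", "01-01")   # hypothetical fixed-date holiday list
train <- make_features(dates, holidays)

# Simulated revenue (not real data): trend, weekday decline, Q4 lift, noise.
train$revenue <- 100 + 0.02 * train$t - 3 * as.integer(train$dow) +
  20 * (train$month %in% c("11", "12")) + rnorm(nrow(train), sd = 5)

fit <- lm(revenue ~ t + dow + month + holiday, data = train)

# Forecast roughly the next six months.
future      <- seq(max(dates) + 1, by = "day", length.out = 182)
forecast_df <- data.frame(
  date    = future,
  revenue = predict(fit, newdata = make_features(future, holidays))
)
```

Holidays that move between weekdays or dates each year would need one row per actual holiday date rather than the month-day strings used here.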
In R, how can you use Holt-Winters smoothing for a financial ("business-day")-based time series?
(For example, a stock data time series has an irregular time index).
You don't, for the reasons I gave you in response to your previous question today: because HoltWinters needs ts, you cannot (easily) use it on irregular time series.
You can approximate it by, say, sampling every Wednesday and creating 52-week years from that. But there is no way around the basic fact that "business day"-based series are irregular.
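A rough sketch of that approximation: keep only the Wednesday observations and declare 52 of them per year. The prices are simulated, market holidays that fall on a Wednesday are ignored, and the smoothing parameters are fixed at arbitrary values so that the example does not depend on the optimiser.

```r
set.seed(11)
days <- seq(as.Date("2015-01-01"), as.Date("2019-12-31"), by = "day")

# Keep every Wednesday ("%u" is the ISO weekday number, 3 = Wednesday).
wed   <- days[format(days, "%u") == "3"]
price <- 100 + cumsum(rnorm(length(wed)))   # simulated values, not real data

# Treat each year as exactly 52 weekly observations. This is an
# approximation: a calendar year contains 52 or 53 Wednesdays.
weekly <- ts(price, frequency = 52)

# Arbitrary fixed smoothing parameters; omit them to let
# HoltWinters() estimate them from the data.
fit <- HoltWinters(weekly, alpha = 0.3, beta = 0.1, gamma = 0.1)
fc  <- predict(fit, n.ahead = 4)
```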
As Dirk said, there is no solid way to do this. Even if it runs (with gamma=FALSE), it will use a fixed gain on each observation. That is, it will ignore the fact that the gap across a weekend is 3 times longer than your other time deltas.
It gets worse with intraday data. I think your best bet is to implement the Holt-Winters filter yourself. It's actually not all that hard...
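To show what a gap-aware gain might look like, here is a sketch of the simplest case: level-only exponential smoothing where the gain depends on the time elapsed since the previous observation. This is not a full Holt-Winters filter (there is no trend or seasonal component), and both the function name smooth_irregular() and the time constant tau are hypothetical choices made for the example.

```r
# Level-only exponential smoothing whose gain grows with the time gap.
# tau (in days) is a hypothetical tuning parameter.
smooth_irregular <- function(times, values, tau = 5) {
  stopifnot(length(times) == length(values), length(values) >= 1)
  level    <- numeric(length(values))
  level[1] <- values[1]
  for (i in seq_along(values)[-1]) {
    dt    <- as.numeric(times[i] - times[i - 1])  # gap in days for Date input
    alpha <- 1 - exp(-dt / tau)                   # longer gap -> larger gain
    level[i] <- level[i - 1] + alpha * (values[i] - level[i - 1])
  }
  level
}

# Thursday, Friday, then Monday and Tuesday: the Friday -> Monday step
# is a 3-day gap, the other steps are 1 day.
times  <- as.Date(c("2024-01-04", "2024-01-05", "2024-01-08", "2024-01-09"))
values <- c(10, 12, 9, 11)
smoothed <- smooth_irregular(times, values)
```

With this form, an observation after a weekend moves the level further than one after a single day, which is the behaviour a fixed-gain HoltWinters() call cannot provide.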