I have around 4 years of US retail data. I aggregated it by (year, week of the year), built some models, and checked the quantity forecast. The performance was not up to the mark. Now I am trying to aggregate the data by week of the year without considering the year (all years show almost the same behavior in the US market, and holidays and events fall on the same dates every year). So I end up with only 52 rows of data. I have around 35 features that I derived earlier, so stepAIC gives an infinity error. How do I deal with this issue? Can anyone suggest other good methods for choosing important features instead? Unfortunately I cannot give more information about the data. Thanks in advance.
This is more of a general question:
I'm currently looking for any guidelines on how to deal with too many observations of daily stock returns in a given dataset. I have already removed outliers, but I still have far too many observations. I'm using R.
I want to drastically reduce the number of observations for my further research without ruining the correlation in my dataset, but I'm not sure how.
Any suggestions are welcome.
Best regards
I am trying to figure out how to approach a data problem that includes observations of multiple equipment units' pressure and temperature measures. The measures are available for a few years as daily or nearly daily values.
This seems like a time series problem (multivariate) and I have found some quality examples. However, because the data set consists of multiple measures taken for each equipment unit, I am a bit stumped on how to proceed. Should I fit a separate time series for each piece of equipment? This seems intuitively wrong, but I am really not sure which package or even approach I can use to work through this.
I would very much appreciate a recommendation or link to some resources.
R noob here. I am running acf() in R to check autocorrelation in my data before running other tests.
Now I am running into 2 problems. I have time-series data for 26 years (1990-2016).
Problem 1. For some of my variables a couple of years have missing data (1995-1997). For these specific variables I would like to start the acf at the year 1998. Is this possible?
Problem 2. One variable has multiple years with missing data throughout the time-series, but only for odd years. Is it possible to do an acf for only even years?
I could manually adjust the data but would prefer to keep it as one dataset.
Thank you!
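One possible approach to both problems, without editing the dataset by hand, is to subset the series just before calling acf(). The following is a minimal base R sketch, assuming each variable is an annual ts object; the series x and y and their values are synthetic stand-ins, not the real data.

```r
# Synthetic annual series for 1990-2016 (hypothetical stand-ins for two variables)
set.seed(1)
x <- ts(rnorm(27), start = 1990, frequency = 1)
y <- ts(rnorm(27), start = 1990, frequency = 1)
yrs <- as.numeric(time(x))          # calendar years 1990, 1991, ..., 2016

x[yrs >= 1995 & yrs <= 1997] <- NA  # mimic Problem 1: 1995-1997 missing
y[yrs %% 2 == 1] <- NA              # mimic Problem 2: odd years missing

# Problem 1: start the ACF at 1998 by windowing the series
x_from_1998 <- window(x, start = 1998)
acf_1998 <- acf(x_from_1998, plot = FALSE)

# Problem 2: keep only the even years.
# Note that lag 1 now means two calendar years, not one.
y_even <- as.numeric(y)[yrs %% 2 == 0]
acf_even <- acf(y_even, plot = FALSE)
```

The original objects stay untouched, so everything can remain in one dataset. acf() also accepts na.action = na.pass, which computes the autocorrelations from the available pairs instead of failing on NA values; whether that is appropriate depends on how the gaps arose.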
I started using Tableau with its R integration, and I'm using the predicted graphs.
I have 6 years of hourly data with multiple seasonalities: daily, weekly and yearly.
library(forecast); data <- msts(.arg1, seasonal.periods = c(24, 7 * 24, 365 * 24))
I've applied the above in Tableau. It takes 8 hours to complete and does not give good results. Previously I used the ts() function, which gave good results with f=365 on daily data, but on hourly data it does not.
Some seasonal patterns may be getting missed. I know tbats() can do the job, but I need to improve it in Tableau.
Dates are notoriously difficult. The biggest issue is that you're not accounting for leap years, which will occur in any six-year window. Holidays make life even more complicated, since some holidays fall on different days of the week depending on the year, which can change observations.
Take a step back. What kind of data do you have? What do you want to learn about it? That will inform the best approach.
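To make the leap-year point concrete, here is a small base R sketch that counts the days in a six-year window and compares the result with a fixed 365 * 24 yearly period. The 2010-2015 window is an arbitrary assumption for illustration, not taken from the question, and daylight saving shifts are ignored.

```r
# Hypothetical six-year window: 2010-01-01 up to (not including) 2016-01-01
start <- as.Date("2010-01-01")
end   <- as.Date("2016-01-01")

n_days <- as.numeric(end - start)        # includes the leap day in 2012
hours_in_window <- n_days * 24

# Yearly period assumed in the question vs. the average over this window
assumed_period <- 365 * 24
average_period <- hours_in_window / 6

# How far a fixed 365 * 24 cycle has slipped by the end of the window
drift_hours <- hours_in_window - 6 * assumed_period
```

With a fixed 365 * 24 period, the yearly pattern slips by one day for every leap day in the data. A common workaround is to use an average year length such as 365.25 * 24 as the yearly period; whether that improves the fit here would need checking against the data.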
I have three years of daily revenue data. There is fairly constant growth per year, but the data is highly seasonal, with huge peaks in Q4 (Black Friday, the pre-Christmas frenzy, etc.) and intra-week seasonality (high revenue on Monday, less and less during the week, lowest on Saturday, starting to pick up on Sunday).
Instead of using a boring spreadsheet with linear forecasting, I'd like an R script that takes three years' worth of daily data as input and applies an algorithm to forecast daily revenue for the next 6 months. I'd love for the input to be just a CSV file with dates and revenue numbers.
I heard ARIMA is good, but an economist friend of mine who has seen my data thinks that forecasting with Kalman Filters would yield very good results.
Could someone post a script to show me how to apply either the ARIMA algo or the Kalman Filter algo to forecast my data? Thanks!
While R certainly has tools that implement these analyses, they are power tools, and it would probably be best if you read up on them and how they work ... (Venables and Ripley's Modern Applied Statistics in S might be a reasonable starting point, although I don't know if it discusses Kalman filters). In the meantime:
??arima
??kalman
?arima
?KalmanLike
Or, having installed the sos package:
library("sos")
findFn("arima forecast")
findFn("kalman forecast")
Or just Google "kalman filter R" (!!) -- I did and found that the first 8 (!) hits looked highly useful (the 9th was an introduction to Kalman filters in MATLAB :-) )
Others may feel differently, but I will generally spend more effort helping someone work their way through an analysis when I can see that they have tried tackling it for themselves ...
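As a starting point while reading up, here is a minimal sketch that uses only base R's arima() and predict(). The data are simulated, the file and column names in the comment are hypothetical, and the model orders are placeholder assumptions rather than a recommendation for any particular series.

```r
# With real data, something like the following would replace the simulation
# (hypothetical file and column names):
#   revenue <- read.csv("revenue.csv")$revenue

# Simulated stand-in: three years of daily revenue with trend and a weekly pattern
set.seed(42)
n <- 3 * 365
day_index <- seq_len(n)
weekly_effect <- c(300, 200, 100, 0, -100, -300, -200)[(day_index - 1) %% 7 + 1]
revenue <- 1000 + 0.5 * day_index + weekly_effect + rnorm(n, sd = 50)

# Treat the week as the seasonal period
y <- ts(revenue, frequency = 7)

# Seasonal ARIMA(0,1,1)(0,1,1)[7]; the orders are an assumption for illustration
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 7))

# Forecast roughly six months (182 days) ahead
fc <- predict(fit, n.ahead = 182)
forecast_values <- as.numeric(fc$pred)
```

This sketch only captures the weekly cycle; the Q4 peaks described in the question would need extra terms, for example regressors passed through the xreg argument of arima(). For a state-space alternative, ?StructTS in base R is another place to look.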
This should be solved using regression. You would have 6 dummy variables for the day-of-the-week effects, 11 monthly dummy variables for the seasonality, and a dummy variable for each of the holidays.
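The dummy-variable regression described above could be sketched in base R as follows. Everything here is an assumption for illustration: the data are simulated, make_features() is a hypothetical helper, and only three fixed-date holidays are included (moving holidays such as Black Friday would need a date lookup).

```r
set.seed(123)

# Hypothetical helper: build trend, weekday, month and holiday columns from dates
make_features <- function(d, trend_start = 1) {
  fixed_holidays <- c("01-01" = "new_year", "07-04" = "july_4", "12-25" = "christmas")
  h <- unname(fixed_holidays[format(d, "%m-%d")])  # NA on non-holidays
  h[is.na(h)] <- "none"
  data.frame(
    date    = d,
    trend   = seq(trend_start, length.out = length(d)),
    dow     = factor(format(d, "%u"), levels = as.character(1:7)),    # 1 = Monday
    month   = factor(format(d, "%m"), levels = sprintf("%02d", 1:12)),
    holiday = factor(h, levels = c("none", unname(fixed_holidays)))
  )
}

# Simulated three years of daily revenue (stand-in for the CSV input)
dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")
train <- make_features(dates)
train$revenue <- 1000 + 0.5 * train$trend +
  ifelse(train$dow == "1", 200, 0) +
  ifelse(train$month %in% c("11", "12"), 500, 0) +
  rnorm(nrow(train), sd = 50)

# lm() expands each factor into (levels - 1) dummy variables:
# 6 for weekday, 11 for month, one per holiday
fit <- lm(revenue ~ trend + dow + month + holiday, data = train)

# Forecast the next 182 days, continuing the trend index
future_dates <- seq(max(dates) + 1, by = "day", length.out = 182)
future <- make_features(future_dates, trend_start = nrow(train) + 1)
future$forecast <- predict(fit, newdata = future)
```

Fixing the factor levels inside the helper keeps the training and forecast frames compatible even when the forecast window does not contain every month or holiday.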