I have 2 years of hourly data.I want to check seasonality .
1.Decomposing the series shows seasonality.But since Decomposition is not enough
what else can i use to check seasonality in R?
2.I tried hourly seasonality , I am not sure on the period of seasonality.How to determine the frequency in R?
Frequency is the number of observations per unit of time. But, in my short experience, the unit of time depends on the event you are studying. For example, if you have monthly data of a yearly seasonal event (like the flowering of some plants) and you sampled 5 times each month, frequency will be 5*12. I suggest you decompose your time series and and check for seasonality there. You can use ts, stl and plot.stl. Try to adjust the parameters as best as you can but also try to check what happens when you change them.
Please read through below link, if you feel to keep multiple seasonal periods in data, you can also paste sample of your data here for further suggestions
https://robjhyndman.com/hyndsight/seasonal-periods/
Related
I have a 3-year hourly sales dataset(9-6 pm, Monday to Saturday only) and would like to make predictions either for a day or a week ahead with linear regression. The dataset has excluded all the national holidays since the store is not open. This time-series data presents strong intra-day and intra-week seasonalities and shows a high peak around national holidays. So I extracted the following variables for feature engineering:
(1) time-related features: timestamp("2021-02-01 09:00:00"), hour, weekday, month, year
(2) one-week-lag variable
(3) trend variable by decomposing "trend" from historical data and add as a new predictor
decomp_ts <- decompose(ts)
data$trend <- decomp_ts$trend
(4) holiday dummy variables indicating the day before holidays
The model works fine, but I encounter two questions when deploying the model with the real-time data.
(1) I wanted to use predict() with a future dataset of a week ahead as input for the "newdata" argument. But my mind is twisted as I am not sure how to do with the trend variable. Should I run an additional prediction on trend for the next week and add this info back to the future dataset to predict the sales?
(2) How would you suggest generating the one-week lag data considering the missing values caused by holidays? In my case, a store may open only two days in Christmas week, then the one-week-lag variable for the next week will contain missing values for the days when the store is not open.
I look forward to any suggestions.
I am trying to plot a decomposed time series, but running into an error:
Error in decompose(ts_ret) : time series has no or less than 2 periods`.
I am forcing the time series to a fixed period that is higher than 2.
Why does the ts think the period is less than 2?
Shouldn't the period be set automatically based on the time intervals in the data? (which are daily)
rm(list=ls())
library(jsonlite)
library(xts)
item.id<-18
eve.url<-paste0("http://eve-marketdata.com/api/item_history2.json?char_name=demo®ion_ids=10000002&type_ids=",item.id,"&days=100")
eve.data<-data.frame(fromJSON(txt=eve.url))$emd.row
eve.data$date<-as.POSIXct(eve.data$date,format="%Y-%m-%d",tz="EST")
xxx<-xts(as.numeric(eve.data[,"avgPrice"]),eve.data$date)
colnames(xxx)<-"trit"
ts_ret<-ts(xxx,frequency=52) #but Im setting the periods here.....
plot(decompose(ts_ret))
As #ufelder pointed out my dataset was too small to look at seasonal decomposition because I only had a few months of data (measured hourly), but not an entire seasons worth (which is 4 months). To fix this I had to modify the period of the dataset to once per day by using ts(xxx,frequency=365) so decompose would compare across days, not seasons.
I have a data set for couple of years for prices wherein the prices are very low over the weekends as compared to weekdays.I want to load this as time series object in R. What could be the best approach so that I can capture the weekly seasonality (weekends price dips) of the data.
Thanks!
Use the ts function with frequency=7.
I'm having trouble doing time series for my data set. Most examples have quarterly or monthly frequencies but my issue comes with data that is collect annually or every two years. Consider my code:
data<-data.frame(year=seq(1978,2012,2), number=runif(18,100,500))
time<-ts(data$number, start=1978, frequency=.5)
decomp<-decompose(time)
Error in decompose(time) : time series has no or less than 2 periods
How do I make R recognize time series values from data that is collected over an annual basis? Thanks!
Seasonal decomposition only makes sense with intra-yearly data, because you have seasons within years. So, trying to calculate seasonal effects with decompose on data collected every two years you get the error.
How does the ts() function use its frequency parameter? What is the effect of assigning wrong values as frequency?
I am trying to use 1.5 years of website usage data to build a time series model so that I can forecast the usage for coming periods. I am using data at daily level. What should be the frequency here - 7 or 365 or 365.25?
The frequency is "the" period at which seasonal cycles repeat. I use "the" in scare quotes since, of course, there are often multiple cycles in time series data. For instance, daily data often exhibit weekly patterns (a frequency of 7) and yearly patterns (a frequency of 365 or 365.25 - the difference often does not matter).
In your case, I would assume that weekly patterns dominate, so I would assign frequency=7. If your data exhibits additional patterns, e.g., holiday effects, you can use specialized methods accounting for multiple seasonalities, or work with dummy coding and a regression-based framework.
Here, the frequency parameter is not a frequency that you can observe in the data of your time series. Instead, you have to specify the frequency at which samples of the time series were taken. In your case, this is simply 1 day, or 1.
The value you give here will influence the results you get later when running analysis operations (examples are average requests per time unit or fourier transformation to get the (real) frequencies in the data). E.g. if you wanted to get all your results in the unit of hours instead of in days, you would pass 24 instead of 1 as frequency, because your data samples were taken in a frequency of 24 hours.