I am trying to analyze time series data from Wooldridge's econometrics book: weekly data on the New York Stock Exchange, beginning in January 1976 and ending in 1989.
I have never worked with the ts() function before, but I understand the general syntax. What I have difficulty with is how to define the frequency, since every fourth year has 366 instead of 365 days. The book already states that for holidays or weekends, the following day on which the stock exchange was open was used.
So how exactly do I deal with this problem when creating a time series object?
Here is a screenshot of the first rows of the data frame:
[Screenshot: first rows of the nyse data frame]
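A minimal sketch, assuming the weekly values sit in one numeric column: a ts object stores only a start time and a frequency, not calendar dates, so each weekly observation simply counts as one step and the book's "next open day" substitution needs no special handling. The vector below is synthetic; with the real data you would pass the relevant column instead. Two common choices for the frequency are shown.

```r
set.seed(1)
# Synthetic stand-in for the weekly series (the length 690 is arbitrary).
weekly_values <- rnorm(690)

# Option 1: a fixed 52 observations per year. Simple, but because a year has
# 52 weeks plus one or two days, the implied calendar drifts slowly.
nyse_52 <- ts(weekly_values, start = c(1976, 1), frequency = 52)

# Option 2: the average number of weeks per year (365.25 / 7, about 52.18),
# which absorbs the extra day in ordinary years and two in leap years.
nyse_avg <- ts(weekly_values, start = c(1976, 1), frequency = 365.25 / 7)

tsp(nyse_52)    # start time, end time, frequency
tsp(nyse_avg)
```

Option 1 is what most seasonal tools expect (an integer period); option 2 keeps the time axis aligned with calendar years over a 14-year span.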
Related
I have a dataset from 2015 covering every day of the year. It records actions that happen on any given day; some days have more actions than others, so some days have many more entries than others.
I am trying to write a function that creates an individual dataset for each day of the year, without having to write 365 lines like this:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(), but I do not want a summary per day; for graphing purposes it is important that I can see every observation on a given day.
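One way to get one data frame per day without writing 365 subset() calls is base R's split(), which returns a named list of data frames and keeps every row. The small `dataset` below is a made-up stand-in with an assumed `date` column of class Date.

```r
# Made-up stand-in for `dataset`: one row per action, several per day.
dataset <- data.frame(
  date   = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02",
                     "2015-01-03", "2015-01-03")),
  action = c("a", "b", "a", "c", "b")
)

# split() groups the rows by date without summarising them; the list
# names are the dates formatted as "YYYY-MM-DD".
by_day <- split(dataset, dataset$date)

names(by_day)
by_day[["2015-01-01"]]   # every action recorded on 1 January
```

Each element can then be plotted in a loop, e.g. `lapply(by_day, function(d) plot(...))`, instead of creating 365 separate variables.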
I have a 3-year hourly sales dataset (9 am to 6 pm, Monday to Saturday only) and would like to make predictions either a day or a week ahead with linear regression. The dataset excludes all national holidays, since the store is closed on those days. The series shows strong intra-day and intra-week seasonality and a high peak around national holidays. So I created the following variables for feature engineering:
(1) time-related features: timestamp("2021-02-01 09:00:00"), hour, weekday, month, year
(2) one-week-lag variable
(3) a trend variable, obtained by decomposing the historical data and adding the "trend" component as a new predictor:
decomp_ts <- decompose(ts)
data$trend <- decomp_ts$trend
(4) holiday dummy variables indicating the day before holidays
The model works fine, but two questions came up when deploying it on real-time data.
(1) I want to use predict() with a dataset covering the week ahead as the "newdata" argument, but I am not sure how to handle the trend variable. Should I run a separate prediction of the trend for the next week and add it to the future dataset before predicting sales?
(2) How would you suggest generating the one-week-lag variable, given the missing values caused by holidays? For example, a store may open on only two days in Christmas week, so the one-week-lag variable for the following week will contain missing values for the days when the store was closed.
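For question (1), one possible approach, sketched under simplifying assumptions: forecast the trend component separately, then pass it to predict() through newdata. Everything here is synthetic: 54 hourly slots per week (9 hours x 6 days), a single factor `slot` standing in for the hour and weekday features, and a straight-line extrapolation of the trend, which is only one of many ways to forecast it.

```r
set.seed(42)
slots_per_week <- 54                     # 9 opening hours x 6 days (assumed)
n_hist <- slots_per_week * 20            # 20 weeks of synthetic history

hist_df <- data.frame(t = seq_len(n_hist))
hist_df$slot  <- factor((hist_df$t - 1) %% slots_per_week)  # hour-of-week stand-in
hist_df$sales <- 100 + 0.05 * hist_df$t +
  10 * sin(2 * pi * hist_df$t / slots_per_week) + rnorm(n_hist)

# Trend from classical decomposition; it is NA at both ends of the series.
sales_ts <- ts(hist_df$sales, frequency = slots_per_week)
hist_df$trend <- as.numeric(decompose(sales_ts)$trend)

# Step 1: forecast the trend itself (here a straight line in t).
trend_fit <- lm(trend ~ t, data = hist_df)       # NA rows are dropped
future_df <- data.frame(t = n_hist + seq_len(slots_per_week))
future_df$slot  <- factor((future_df$t - 1) %% slots_per_week,
                          levels = levels(hist_df$slot))
future_df$trend <- predict(trend_fit, newdata = future_df)

# Step 2: feed the forecast trend into the sales model via newdata.
sales_fit <- lm(sales ~ trend + slot, data = hist_df)
future_df$sales_hat <- predict(sales_fit, newdata = future_df)
```

Because the decomposition trend is itself a moving average of sales, an alternative worth comparing is to drop it and use a plain time index such as `t` as the predictor, which avoids the two-step forecast.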
I look forward to any suggestions.
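For question (2), a sketch that computes the one-week lag by looking up the timestamp exactly seven days earlier instead of shifting rows. Because closed days are missing from the data, a fixed row shift would pair each hour with the wrong one; a timestamp lookup yields NA wherever the store was closed a week earlier, so those gaps can then be handled explicitly. The opening hours, the UTC time zone and the closures on 24 and 25 December 2021 are assumptions for this toy data.

```r
# Toy hourly data: slots starting 09:00-17:00, Monday to Saturday, with two
# hypothetical holiday closures. UTC avoids daylight-saving shifts here.
all_hours <- seq(as.POSIXct("2021-12-13 00:00", tz = "UTC"),
                 as.POSIXct("2021-12-31 23:00", tz = "UTC"), by = "hour")
hr   <- as.integer(format(all_hours, "%H"))
wd   <- as.POSIXlt(all_hours)$wday                      # 0 = Sunday
day  <- as.Date(all_hours, tz = "UTC")
open <- hr >= 9 & hr <= 17 & wd != 0 &
  !(day %in% as.Date(c("2021-12-24", "2021-12-25")))

sales <- data.frame(timestamp = all_hours[open])
sales$sales <- seq_len(nrow(sales))                     # dummy values

# Match each timestamp to the one exactly 7 days earlier (compared as
# numeric seconds); unmatched hours, i.e. closed a week ago, become NA.
week_secs <- 7 * 24 * 3600
idx <- match(as.numeric(sales$timestamp) - week_secs,
             as.numeric(sales$timestamp))
sales$sales_lag7 <- sales$sales[idx]
```

In this toy data the lag is NA for the first week (no history) and for 31 December, because the store was closed on 24 December.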
I have daily rainfall data covering 36 years. I want to analyze it as a time series, but my data is still in a data frame. How do I convert the data frame into a time series? The year, month and day are stored in separate variables; how do I combine them into a single date column?
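A minimal sketch of combining separate year, month and day columns into one Date column, assuming a layout like the made-up data frame below (column names and values are hypothetical).

```r
# Hypothetical layout: one column each for year, month, day and rainfall.
rain <- data.frame(
  year     = c(1985, 1985, 1985),
  month    = c(1, 1, 1),
  day      = c(1, 2, 3),
  rainfall = c(0, 12.5, 3.2)   # made-up values
)

# Paste the parts together and parse them as a single Date column.
rain$date <- as.Date(paste(rain$year, rain$month, rain$day, sep = "-"),
                     format = "%Y-%m-%d")

# Keep only the date and the measurement, in chronological order.
rain <- rain[order(rain$date), c("date", "rainfall")]
```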
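You could use a time series package for that, such as fpp, i.e. install.packages('fpp'). Since you don't give any example code, I can't really help you properly, but it's quite easy.
Use ts(your_data, start = c(first_year, 1), frequency = 365), where your_data and first_year are placeholders. For start = you put the time of the first observation (the first year, plus the day or month within that year), and for frequency = you put the number of observations per year, e.g. 365 for daily data. The 36 years are not the frequency; the length of the series follows from the number of observations.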
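A small illustration of the frequency argument for daily data, using a synthetic 36-year series with an assumed start on 1 January 1981. With frequency = 365 the leap days make the time axis drift by about one day every four years; frequency = 365.25 is a common alternative when that matters.

```r
set.seed(7)
# Synthetic daily series covering 36 years from an assumed start date.
dates    <- seq(as.Date("1981-01-01"), as.Date("2016-12-31"), by = "day")
rainfall <- rgamma(length(dates), shape = 0.5, scale = 8)

# frequency = observations per year (365 for daily data);
# start = c(first year, day of the year of the first observation).
rain_ts <- ts(rainfall, start = c(1981, 1), frequency = 365)

frequency(rain_ts)
start(rain_ts)
```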
You might want to check out https://robjhyndman.com/. He has a free online book that walks you through using his package and also provides useful background on time series analysis.
Hope this helps.
I have a data set containing daily energy usage from 01 Jan 2016 through 07 Nov 2017. One of the fields is a non-working-day flag (nwd) with values 0 and 1: 1 marks a non-working day and 0 a working day.
The structure of the data looks like this:
Date,usage,avgtemp,nwd
2016-01-01,105986,28.5,1
2016-01-02,105548,29.2,1
.
.
.
2017-11-07,98457,23.5,0
I created a data frame with these values without any problems. I then created two more data frames, one with nwd = 1 (non-working days) and one with nwd = 0 (working days).
I am trying to create a time series (using the zoo or xts package; I am open to either) for each of these two data frames so that I can run non-stationarity tests (ADF/PP) on them and then fit ARIMA models to forecast usage.
Can I use a time series for data sets like these, which are not quite regular? Each series will have gaps: the working-day series may have fewer than 5 consecutive days in a week if there are holidays in between, and the same applies to the non-working-day series.
I cannot aggregate to a weekly level because I need forecasts at a daily level, and possibly at a half-hourly level later. I might also want to do ARDL modelling later with avgtemp as one of the regressors.
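A minimal sketch with the zoo package, on made-up rows in the question's layout (Date, usage, avgtemp, nwd), where 2016-01-06 is treated as a hypothetical holiday. zoo indexes each value by its real date, so the gaps are kept rather than filled. Functions such as stats::arima() model the values in order and ignore the date spacing, so fitting them to the working-day values treats consecutive working days as equally spaced; that is a modelling assumption, not something zoo decides for you.

```r
library(zoo)

# Made-up rows; 2016-01-02/03 are a weekend, 2016-01-06 a hypothetical holiday.
energy <- data.frame(
  Date    = as.Date(c("2016-01-02", "2016-01-03", "2016-01-04", "2016-01-05",
                      "2016-01-06", "2016-01-07", "2016-01-08")),
  usage   = c(70100, 69800, 120000, 118000, 72500, 121500, 119800),
  avgtemp = c(29.2, 28.1, 27.0, 26.4, 27.5, 28.0, 26.9),
  nwd     = c(1, 1, 0, 0, 1, 0, 0)
)

working     <- subset(energy, nwd == 0)
non_working <- subset(energy, nwd == 1)

# Each series is indexed by its actual dates, so gaps are preserved.
z_work <- zoo(working$usage, order.by = working$Date)
z_nwd  <- zoo(non_working$usage, order.by = non_working$Date)

diff(index(z_work))   # 1, 2, 1 days: the holiday leaves a 2-day gap

# Plain values for tests or models that work on an ordered vector.
work_values <- coredata(z_work)
```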
P.S.
I found a post that is somewhat similar to mine, but I can't get it working based on the responses there:
how to convert data frame into time series in R
I'm having trouble creating a time series for my data set. Most examples use quarterly or monthly frequencies, but my data are collected annually or every two years. Consider my code:
data<-data.frame(year=seq(1978,2012,2), number=runif(18,100,500))
time<-ts(data$number, start=1978, frequency=.5)
decomp<-decompose(time)
Error in decompose(time) : time series has no or less than 2 periods
How do I make R recognize data collected annually (or every two years) as a time series? Thanks!
Seasonal decomposition only makes sense for sub-annual data, because seasons occur within a year. decompose() requires a frequency greater than 1 and at least two full periods of data; with one observation every two years (frequency = 0.5) there is no seasonal cycle to estimate, which is why you get the error.
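The biennial series itself is a valid ts object; only the seasonal decomposition is impossible. A sketch that rebuilds the question's data (with a seed added for reproducibility), creates the series with deltat = 2 (equivalent to frequency = 0.5), and estimates just a trend instead. The linear trend is one assumed choice among several; a smoother or a non-seasonal exponential smoothing model would also fit here.

```r
set.seed(1)
data <- data.frame(year = seq(1978, 2012, 2), number = runif(18, 100, 500))

# One observation every two years: deltat = 2 means frequency = 0.5.
biennial <- ts(data$number, start = 1978, deltat = 2)

frequency(biennial)
time(biennial)          # 1978, 1980, ..., 2012

# With no seasons to remove, model only the trend, e.g. a straight line.
trend_fit <- lm(number ~ year, data = data)
coef(trend_fit)
```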