Using lm() with lag() on time series object in R - r

I have a time series object (ts / mts) called "mydata".
(The dates go from 1980 to 2014)
class(mydata) [1] "mts" "ts" "matrix"
colnames(mydata) [1] "inflation" "unemployment"
equation1 = lm(inflation ~ unemployment + lag(unemployment, 1), data = mydata)
Two questions:
1. Have I specified the lag() correctly? I seem to get lots of NA's.
2. How do I get the residuals to keep the same dates as the time-series?
(i.e: "1981 to 2014" instead of just "1 to 34")

Try printing both unemployment and the lagged unemployment series to see whether anything unusual is happening; otherwise the function specification looks fine to me.
You can use cbind(mydata, equation1$residuals) to bind the residuals to the rest of your time series so that they share the same time index.
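One thing worth checking: as far as I understand lm(), it turns the mts into a plain data frame, so lag() only shifts the time attribute of an ordinary vector and leaves the values untouched. The lagged term then duplicates unemployment, which would explain the NA's. Below is a minimal sketch of one way around this in base R. It assumes made-up random data in place of mydata, and the column name unemployment_lag1 is hypothetical. Note that stats::lag(x, -1) is the form that moves values one period later.

```r
# Hypothetical stand-in for "mydata": annual series, 1980 to 2014
set.seed(1)
mydata <- ts(cbind(inflation = rnorm(35), unemployment = rnorm(35)),
             start = 1980)

# Align the series with last year's unemployment; the overlap is 1981 to 2014
aligned <- ts.intersect(mydata, stats::lag(mydata[, "unemployment"], -1))
colnames(aligned) <- c("inflation", "unemployment", "unemployment_lag1")

# Fit on the aligned data, where the lag is a genuinely different column
fit <- lm(inflation ~ unemployment + unemployment_lag1,
          data = as.data.frame(aligned))

# Put the residuals back on the time axis of the aligned data
res <- ts(residuals(fit), start = start(aligned),
          frequency = frequency(aligned))
time(res)
```

Packages such as dynlm or dyn provide a formula interface for lagged terms if you would rather not align the series by hand.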

Related

Converting dataset into Time Series

I have a data set with two attributes: Year (2016, 2017, 2018) and Month (JAN to DEC). It contains the average sales value for every month of 2016, 2017 and 2018. When I import it, R reports the data set as a "data.frame", but I want it to be a "ts". So I ran this command
data.ts<- as.ts(myData)
to convert my data into "ts". The result is as follows:
class(data.ts)
[1] "mts" "ts" "matrix"
Now I want my data set to be "ts" only, meaning that class(data.ts) should return just "ts". How can I convert my data to "ts" only? And do "mts" and "matrix" matter or not?
Also, when I plot my data using the command
plot(data.ts)
It shows a plot with Time on the x-axis and both Year and Sales on the y-axis. Instead, I want a graph with the Year on the x-axis and the monthly Sales values on the y-axis.
How do I arrange my data so that it is already a "ts" when I import it? Or is there another way to do it? Also, how do I arrange the data set so that the Year is shown on the x-axis by default? I'm really confused, as all the videos I have seen on YouTube start with data that is already a "ts", and their plots show the Year on the x-axis. I hope I have made myself clear. Any help would be appreciated.
How can I plot the graph such that Year is on the x-axis?
Reorder the data into a single variable:
data=as.matrix(data)
data= as.data.frame(t(data))
names(data)=c('x2016','x2017','x2018')
series=c(data$x2016,data$x2017,data$x2018)
Then set the time index according to the start point and the frequency of the data. In your case it looks like monthly data starting in 2016, hence:
data.ts=ts(series ,start=c(2016,1), frequency=12)
plot(data.ts)
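The same idea can be sketched end to end, assuming a hypothetical wide table (sales_wide) with one row per year, a Year column and twelve month columns; the real layout of the imported file may differ. Passing a single numeric vector to ts() yields a plain "ts" rather than an "mts".

```r
# Hypothetical wide table: one row per year, one column per month
sales_wide <- data.frame(
  Year = 2016:2018,
  matrix(1:36, nrow = 3, byrow = TRUE,
         dimnames = list(NULL, toupper(month.abb)))
)

# Drop the Year column, then read the values row by row
# (JAN 2016, FEB 2016, ..., DEC 2018)
series <- as.vector(t(as.matrix(sales_wide[, -1])))

# A single vector gives a univariate series
sales_ts <- ts(series, start = c(min(sales_wide$Year), 1), frequency = 12)
class(sales_ts)

# The time axis now runs over the years
plot(sales_ts, xlab = "Year", ylab = "Sales")
```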

Correct imputation for a zooreg object?

My objective is to impute NAs in a zooreg time series object. The pattern of the time series is cyclic. My code is:
#load libraries required
library("zoo")
# create sequence every 15 minutes from 1st Jan to 20th Jan, 2018
timeStamp <- seq.POSIXt(from=as.POSIXct('2018-01-01 00:00:00', tz="UTC"), to=as.POSIXct('2018-01-20 23:45:00', tz="UTC"), by = "15 min")
# data which increases from 12am to 12pm, then decreases till 12 am of next day, for 20 days
readings <- rep(c(seq(1,48,1), seq(48,1,-1)), 20)
dF <- data.frame(timeStamp=timeStamp, readings=readings)
# create a regular zooreg object, frequency is the number of readings per day (4 readings * 24 hours)
readingsZooReg <- zooreg(dF$readings, order.by = dF$timeStamp, frequency = 4*24)
plot(readingsZooReg)
# force some data to be NAs
window(readingsZooReg, start = as.POSIXct("2018-01-14 00:00:00", tz="UTC"), end = as.POSIXct("2018-01-16 23:45:00", tz="UTC")) <- NA
plot(readingsZooReg)
# plot imputed values
plot(na.approx(readingsZooReg))
The plots are:
Full time series, NAs added, Imputed time series
I'm purposely using zoo here, since the time series I work on are irregular (e.g. solar, oil wells, etc.)
1) Is my usage of "zooreg" correct? Or would a "zoo" object suffice ?
2) Is my frequency variable right?
3) Why won't na.approx work? I've also tried na.StructTS, but the R script hangs.
4) Is there a solution using any other package? xts, ts, etc?
Your current example time series is a regular time series.
(An irregular time series would have differing time distances between observations.)
E.g.:
10:00:10, 10:00:20, 10:00:30, 10:00:40, 10:00:50 (regular spaced)
10:00:10, 10:00:17, 10:00:33, 10:00:37, 10:00:50 (irregular spaced)
If you really need to handle irregularly spaced time series, zoo is your go-to package. Otherwise you can also use other time series classes such as xts and ts.
About the frequency:
You usually set the frequency of a time series to the number of observations after which you expect patterns to repeat (in your example this could be 96). In real life the period is often 1 day, 1 week, 1 month, ... but it can also differ from these, like 1.5 days. (E.g. if you have daily recurring patterns and 1-minute observations, you would set the frequency to 1440.)
na.approx from zoo works perfectly. It does exactly what it is expected to do: a linear interpolation between the value 1 just before the gap and the value 1 just after the gap gives a straight line at 1. Of course that is probably not the result you expected, because it does not account for seasonality. That is why G. Grothendieck suggests na.StructTS as the method to choose (this method is usually better at accounting for seasonality).
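To see why linear interpolation flattens the gap, here is a small sketch using base R's approx() on the same cyclic readings as in the question. It is a stand-in for na.approx, not a call to zoo itself, and the integer index idx is an assumption made to keep the example free of packages.

```r
# Same cyclic shape as in the question: 1..48 then 48..1, for 20 days
readings <- rep(c(seq(1, 48), seq(48, 1)), 20)
idx <- seq_along(readings)

# Blank out days 14 to 16 (96 readings per day)
gap <- (13 * 96 + 1):(16 * 96)
readings_na <- readings
readings_na[gap] <- NA

# Linear interpolation across the missing stretch
known <- !is.na(readings_na)
filled <- approx(idx[known], readings_na[known], xout = idx)$y

# The last value before the gap and the first after it are both 1,
# so every interpolated value is 1 and the daily cycle is lost
range(filled[gap])
```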
The best choice in this specific case, if you are not bound to zoo, would be na_seadec from the imputeTS package (a package solely dedicated to time series imputation).
I have added an example, with nice plots, using the imputeTS package:
library(imputeTS)
yourTS <- ts(coredata(readingsZooReg), frequency = 96)
ggplot_na_distribution(yourTS)
imputedTS <- na_seadec(yourTS)
ggplot_na_imputations(yourTS, imputedTS)
Usually imputeTS also works perfectly with zoo time series as input. I only converted it to ts because something about your zoo object seems odd, which is also why na.StructTS from zoo itself breaks. Maybe somebody with better knowledge can help out here.
Beware: if you really do have irregular time series, do not use packages or imputation functions other than those from zoo, because they all assume the data to be regularly spaced and will give results accordingly.

Convert multivariate XTS to TS in R

I wish to compute the wavelet transform of a multivariate time series dataset. I plan to use the wavethresh package and specifically the modwt() function. The help file for this function specifies that the object be either "A univariate or multivariate time series. Numeric vectors, matrices and data frames are also accepted."
Currently my dataset is in xts zoo format where the time is in 15 min intervals and I wish to convert it to ts but I am having great difficulty.
I have tried the following:
modwtCoeff <- modwt(as.ts(wideRawXTS,
                          start = head(index(wideRawXTS), 1),
                          end = tail(index(wideRawXTS), 1),
                          frequency = 1),
                    filter = "la8",
                    n.levels = "10",
                    boundary = "periodic",
                    fast = TRUE)
> class(wideRawXTS)
[1] "xts" "zoo"
where head(index(wideRawXTS,1),1) returns "2017-01-20 16:30:00 GMT" and tail(index(wideRawXTS,1),1) returns "2017-02-03 16:00:00 GMT"
I receive the following error as a result of the lines above:
Error in ts(coredata(x), frequency = frequency(x), ...) :
formal argument "frequency" matched by multiple actual arguments
The error lies in the xts to ts conversion as I removed the modwt wrapper function and I still get the same error. After further Googling I came across this article https://www.r-bloggers.com/preventing-argument-use-in-r/ but I don't fully get it. My guess is that I possibly need to decompose the conversion into individual steps to avoid errors from using some arguments in the as.ts function.
Can someone give me a bit of direction as to where I am going wrong in the conversion? In order to provide a reproducible example here is a link to a dput of the wideRawXTS object.
The general formula for a frequency is:
frequency = number_of_events / time_interval
As your data have 1343 rows over a time interval of 14 days, the frequency depends on your time unit.
Time unit: Day
In this case, the frequency is:
1343/14 = 95.93 => 96
That means you make 96 measurements per day.
Time unit: Hour
In this case, the frequency is:
1343/(14*24) = 3.99 => 4
That means you make 4 measurements per hour.
Time unit: 15 Minute
In this case, the frequency is:
1343/(14*24*4) = 0.999 => 1
That means you make one measurement every 15 minutes.
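The arithmetic above can be checked from the timestamps themselves. The error message in the question shows that as.ts() on an xts object already passes frequency to ts(), so one option is to call ts() directly on the plain data. The sketch below assumes a hypothetical 15-minute index and random values in place of wideRawXTS; with the real object, the values would come from coredata(wideRawXTS).

```r
# Hypothetical 15-minute index spanning the same period as in the question
idx <- seq(as.POSIXct("2017-01-20 16:30:00", tz = "GMT"),
           as.POSIXct("2017-02-03 16:00:00", tz = "GMT"),
           by = "15 min")

# Seconds between consecutive observations
step_secs <- median(as.numeric(diff(idx), units = "secs"))

# Observations per day, taking one day as the time unit
freq_per_day <- 86400 / step_secs

# Random two-column matrix standing in for the xts data
set.seed(1)
values <- matrix(rnorm(length(idx) * 2), ncol = 2,
                 dimnames = list(NULL, c("a", "b")))

# Build the ts directly, supplying frequency exactly once
wide_ts <- ts(values, frequency = freq_per_day)
frequency(wide_ts)
```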

Simple time series analysis with R: aggregating and subsetting

I want to convert monthly data into quarterly averages. These are my 2 datasets:
gas <- UKgas
dd <- UKDriverDeaths
I was able to accomplish (I think) for the dd data as so:
dd.zoo <- zoo(dd)
ddq <- aggregate(dd.zoo, as.yearqtr, mean)
However I cannot figure out how to do this with the gas data...any help?
Follow-up
When I try to subset the data based on date (1969-1984) the resulting data does not include 1969 Q1 and instead includes 1985 Q1...any suggestions on how to fix this? I was just trying to subset as gas[1969:1984].
Originally I did not plan to post an answer, as it looks like you did not check your UKgas dataset beforehand to see that it is already a quarterly time series.
But the follow-up question is worth answering. A "ts" object comes with many handy generic functions. We can use window to easily subset a time series. To extract the section between the first quarter of 1969 and the final quarter of 1984, we can use
window(UKgas, start = c(1969,1), end = c(1984,4))
The result will still be a quarterly time series.
On the other hand, if we use "[" for subsetting, we lose the object class:
class(UKgas[1:12])
#[1] "numeric"
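For completeness, a short sketch of both steps in base R, using only the built-in datasets from the question: aggregate() on a "ts" object accepts nfrequency, so the monthly series can be averaged by quarter without converting to zoo, and window() keeps the class and time index. The names ddq and gas_sub are only illustrative.

```r
# Quarterly means of the monthly driver deaths series
ddq <- aggregate(UKDriverDeaths, nfrequency = 4, FUN = mean)
frequency(ddq)

# UKgas is already quarterly; subset by date and keep the ts class
gas_sub <- window(UKgas, start = c(1969, 1), end = c(1984, 4))
class(gas_sub)
start(gas_sub)
end(gas_sub)
```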

Time Series Decomposition of weekly data

I am totally new to R and have just started using it. I have three years of weekly data. I want to decompose this time series into trend, seasonal and other components. I have the following questions:
Which function should I use: ts() or decompose()?
How do I deal with the leap year situation?
Please correct me if I am wrong: the frequency is 52.
Thanks in Advance. I would really appreciate any kind of help.
Welcome to R!
Yes, the frequency is 52.
If the data is not already classed as time-series, you will need both ts() and decompose(). To find the class of the dataset, use class(data). And if it returns "ts", your data is already a time-series as far as R is concerned. If it returns something else, like "data.frame", then you will need to change it to time-series. Assign a variable to ts(data) and check the class again to make sure.
There is a monthly time-series dataset sunspot.month already loaded into R that you can practice on. Here's an example. You can also read the help file for decompose by writing ?decompose
> class(sunspot.month)
[1] "ts"
> decomp <- decompose(sunspot.month)
> summary(decomp)
         Length Class  Mode
x        2988   ts     numeric
seasonal 2988   ts     numeric
trend    2988   ts     numeric
random   2988   ts     numeric
figure     12   -none- numeric
type        1   -none- character
> names(decomp)
[1] "x" "seasonal" "trend" "random" "figure" "type"
> plot(decomp) # to see the plot of the decomposed time-series
The call to names indicates that you can also access the individual component data. This can be done with the $ operator. For example, if you want to look at the seasonal component only, use decomp$seasonal.
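For the weekly case, a minimal sketch might look like the following. It assumes simulated data (a trend, a yearly cycle and noise) in place of the real three years of observations, and the start year 2015 is made up. Treating the year as exactly 52 weeks is an approximation, since 52 weeks cover only 364 days.

```r
# Simulated stand-in for three years of weekly data
set.seed(42)
n_weeks <- 3 * 52
week <- seq_len(n_weeks)
weekly_values <- 100 + 0.1 * week + 10 * sin(2 * pi * week / 52) + rnorm(n_weeks)

# Declare the weekly seasonality when building the series
weekly_ts <- ts(weekly_values, start = c(2015, 1), frequency = 52)

# decompose() needs at least two full periods; three years is enough
decomp <- decompose(weekly_ts)

# One seasonal effect per week of the year
length(decomp$figure)
plot(decomp)
```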