Convert multivariate XTS to TS in R

I wish to compute the wavelet transform of a multivariate time series dataset. I plan to use the wavethresh package and specifically the modwt() function. The help file for this function specifies that the object be either "A univariate or multivariate time series. Numeric vectors, matrices and data frames are also accepted."
Currently my dataset is in xts zoo format where the time is in 15 min intervals and I wish to convert it to ts but I am having great difficulty.
I have tried the following:
modwtCoeff <- modwt(as.ts(wideRawXTS,
+ start = head(index(wideRawXTS), 1),
+ end = tail(index(wideRawXTS), 1),
+ frequency = 1),
+ filter = "la8",
+ n.levels = "10",
+ boundary = "periodic",
+ fast = TRUE)
> class(wideRawXTS)
[1] "xts" "zoo"
where head(index(wideRawXTS), 1) returns "2017-01-20 16:30:00 GMT" and tail(index(wideRawXTS), 1) returns "2017-02-03 16:00:00 GMT"
I receive the following error as a result of the lines above:
Error in ts(coredata(x), frequency = frequency(x), ...) :
formal argument "frequency" matched by multiple actual arguments
The error lies in the xts to ts conversion as I removed the modwt wrapper function and I still get the same error. After further Googling I came across this article https://www.r-bloggers.com/preventing-argument-use-in-r/ but I don't fully get it. My guess is that I possibly need to decompose the conversion into individual steps to avoid errors from using some arguments in the as.ts function.
Can someone give me a bit of direction as to where I am going wrong in the conversion? In order to provide a reproducible example here is a link to a dput of the wideRawXTS object.

The general formula for a frequency is:
frequency = number_of_events / time_interval
Your data have 1343 rows over a time interval of 14 days, so the frequency depends on which time unit you choose.
Time unit: Day
In this case, the frequency is:
1343/14 = 95.93 => 96
That means you make 96 measurements per day.
Time unit: Hour
In this case, the frequency is:
1343/(14*24) = 3.99 => 4
That means you make 4 measurements per hour.
Time unit: 15 Minute
In this case, the frequency is:
1343/(14*24*4) = 0.999 => 1
That means you make one measurement every 15 minutes.
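As a rough illustration of how this ties back to the error: the message in the question shows as.ts() already calling ts(coredata(x), frequency = frequency(x), ...), so a second frequency argument collides with it. One way around that, sketched below in base R only, is to build the ts from the raw values and supply frequency exactly once. The matrix `values` and its column names are hypothetical stand-ins for coredata(wideRawXTS), and the choice of "day" as the time unit is an assumption.

```r
# Rebuild the 15-minute index described in the question (base R only).
idx <- seq(as.POSIXct("2017-01-20 16:30:00", tz = "GMT"),
           as.POSIXct("2017-02-03 16:00:00", tz = "GMT"),
           by = "15 min")
n_obs <- length(idx)                 # number of rows in the series

# Hypothetical stand-in for coredata(wideRawXTS): a plain numeric matrix.
values <- matrix(rnorm(n_obs * 2), nrow = n_obs,
                 dimnames = list(NULL, c("series_a", "series_b")))

obs_per_day <- round(n_obs / 14)     # time unit: day

# Build the ts directly; frequency is supplied exactly once.
wide_ts <- ts(values, frequency = obs_per_day)
print(n_obs)
print(frequency(wide_ts))
```

With the real object, the equivalent call would presumably be ts(coredata(wideRawXTS), frequency = 96), which bypasses as.ts() and its own frequency argument.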

Related

Correct imputation for a zooreg object?

My objective is to impute NAs in a zooreg time series object. The pattern of the time series is cyclic. My code is:
#load libraries required
library("zoo")
# create sequence every 15 minutes from 1st Jan to 20th Jan, 2018
timeStamp <- seq.POSIXt(from=as.POSIXct('2018-01-01 00:00:00', tz="UTC"), to=as.POSIXct('2018-01-20 23:45:00', tz="UTC"), by = "15 min")
# data which increases from 12am to 12pm, then decreases till 12 am of next day, for 20 days
readings <- rep(c(seq(1,48,1), seq(48,1,-1)), 20)
dF <- data.frame(timeStamp=timeStamp, readings=readings)
# create a regular zooreg object, frequency is 1 day( 4 readings * 24 hours)
readingsZooReg <- zooreg(dF$readings, order.by = dF$timeStamp, frequency = 4*24)
plot(readingsZooReg)
# force some data to be NAs
window(readingsZooReg, start = as.POSIXct("2018-01-14 00:00:00", tz="UTC"), end = as.POSIXct("2018-01-16 23:45:00", tz="UTC")) <- NA
plot(readingsZooReg)
# plot imputed values
plot(na.approx(readingsZooReg))
The plots are:
Full time series, NAs added, Imputed time series
I'm purposely using zoo here, since the time series I work on are irregular(eg. solar, oil wells, etc)
1) Is my usage of "zooreg" correct? Or would a "zoo" object suffice ?
2) Is my frequency variable right?
3) Why won't na.approx work? I've also tried na.StructTS, but the R script hangs.
4) Is there a solution using any other package? xts, ts, etc?
Your current example time series is a regular time series.
(An irregular time series would have different time distances between observations.)
E.g.:
10:00:10, 10:00:20, 10:00:30, 10:00:40, 10:00:50 (regularly spaced)
10:00:10, 10:00:17, 10:00:33, 10:00:37, 10:00:50 (irregularly spaced)
If you really need to handle irregularly spaced time series, zoo is your go-to package. Otherwise you can also use other time series classes such as xts and ts.
About the frequency:
You usually set the frequency of a time series to the number of observations after which you expect patterns to repeat (in your example this could be 96). In real life the period is often 1 day, 1 week, 1 month, ... but it can also differ from these, e.g. 1.5 days. (E.g. if you have daily recurring patterns and 1-minute observations, you would set the frequency to 1440.)
na.approx from zoo works perfectly. It does exactly what it is expected to do: a linear interpolation between the value 1 just before the gap and the value 1 just after the gap gives a straight line at 1. Of course that is probably not the result you expected, because it does not account for seasonality. That is why G. Grothendieck suggests na.StructTS as the method to choose (this method is usually better at accounting for seasonality).
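To see why linear interpolation flattens the gap, here is a minimal base R sketch. It uses approx() as a stand-in for zoo's na.approx (an assumption made so that no package is needed) on the same readings as in the question, where the values on either side of the gap are both 1.

```r
# Same cyclic readings as in the question: 96 values per day for 20 days.
readings <- rep(c(seq(1, 48), seq(48, 1)), 20)

# Blank out days 14 to 16 (positions 1249 to 1536).
gap <- (13 * 96 + 1):(16 * 96)
with_gap <- readings
with_gap[gap] <- NA

# Linear interpolation over the observation index.
idx <- seq_along(with_gap)
ok  <- !is.na(with_gap)
filled <- approx(idx[ok], with_gap[ok], xout = idx)$y

# Values bordering the gap, and the range of the interpolated values.
print(c(with_gap[min(gap) - 1], with_gap[max(gap) + 1]))
print(range(filled[gap]))
```

Because both neighbours of the gap have the same value, every interpolated point takes that value too, and the daily cycle is lost.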
If you are not bound to zoo, the best choice in this specific case would be na_seadec from the imputeTS package (a package solely dedicated to time series imputation).
I have added an example for you, including some nice plots from the imputeTS package:
library(imputeTS)
yourTS <- ts(coredata(readingsZooReg), frequency = 96)
ggplot_na_distribution(yourTS)
imputedTS <- na_seadec(yourTS)
ggplot_na_imputations(yourTS, imputedTS)
Usually imputeTS also works perfectly with zoo time series as input. I only converted to ts because something about your zoo object seems odd, which is also why na.StructTS from zoo itself breaks. Maybe somebody with better knowledge can help out here.
Beware: if you really do have irregular time series, do not use packages or imputation functions other than those from zoo, because they all assume the data to be regularly spaced and will give results accordingly.

Reading Time series data in R

I am trying to import time series data in R with the code below. The data run from 1-7-2014 to 30-4-2017, which makes 1035 data points. But when I use the code below it gives 1093 observations.
series <- ts(data1, start=c(2014,7,1), end=c(2017,4,30), frequency = 365)
Can someone help me in understanding where am I going wrong?
ts doesn't accept start and end in this form. Each must be either a single number or a vector of two integers. In the second case these are the year and the day number within that year, counted from 1 January.
With the help of lubridate you can use the following. decimal_date converts the date to a decimal year, which is a suitable single-number value for ts.
library(lubridate)
series <- ts(data1, start=decimal_date(as.Date("2014-07-01")), end=decimal_date(as.Date("2017-04-30") + 1), frequency = 365)
> length(series)
[1] 1035
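The 1093 in the question can be traced, as far as I can tell, to ts() reading only the first two elements of start and end, so c(2014, 7, 1) is taken as "7th observation of 2014" rather than 1 July. The sketch below reproduces that and shows a base R alternative using the day-of-year number; `data1` here is a hypothetical stand-in for the asker's data.

```r
data1 <- seq_len(1035)   # hypothetical stand-in for the question's data

# Only the first two elements of start/end are used: c(year, position).
series_wrong <- ts(data1, start = c(2014, 7, 1), end = c(2017, 4, 30),
                   frequency = 365)
print(length(series_wrong))   # the question reports 1093

# Base R alternative: express 1 July as a day-of-year number.
day_of_year <- as.integer(format(as.Date("2014-07-01"), "%j"))
series_ok <- ts(data1, start = c(2014, day_of_year), frequency = 365)
print(day_of_year)
print(length(series_ok))
```

Leaving out end lets ts() take the length from the data itself, so nothing is recycled to fill a longer span.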

Understand frequency parameter while converting xts to ts object in R

What is the meaning of the frequency below? When I converted my xts object to a ts object and printed the ts object, I got the information below.
My data is hourly data, but I cannot understand how the frequency below is calculated. I want to make sure my ts object treats my data as hourly data.
Time Series:
Start = 1
End = 15548401
Frequency = 0.000277777777777778 (how this is equivalent to hourly frequency?)
So, my dataframe initially looks like this:
y
1484337600 19.22819
1484341200 19.28906
1484344800 19.28228
1484348400 19.21669
1484352000 19.32759
1484355600 19.21833
1484359200 19.20626
1484362800 19.28737
1484366400 19.20651
1484370000 19.18424
It has epoch times and values. Epoch times are row.names in this dataframe.
Now, I converted into xts object using --
xts_dataframe <- xts(x = dataframe$y,
order.by = as.POSIXct(as.numeric(row.names(dataframe)), origin="1970-01-01"))
ts_dataframe <- as.ts(xts_dataframe)
What am I doing wrong? Basically I want to convert my initial dataframe to a ts() object, as I need to fit an ARIMA model to it. The data are hourly. I'm having a really hard time working with it.
The frequency is equivalent to 1/deltat, where deltat is the fraction of the sampling period between successive observations. ?frequency gives the example that deltat would be "1/12 for monthly data".
Here the index is a POSIXct time measured in seconds, so for hourly data deltat is 3600, since there are 3600 seconds in an hour. Since frequency = 1 / deltat, that means frequency = 1 / 3600, or 0.0002777778.
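A small base R sketch of that arithmetic, using the first rows shown in the question. The last step assumes the seasonal cycle of interest is one day (24 hourly values); that choice is not stated in the question and is only one reasonable option before fitting ARIMA.

```r
# The first rows from the question: epoch seconds and values.
epoch <- c(1484337600, 1484341200, 1484344800, 1484348400, 1484352000)
y     <- c(19.22819, 19.28906, 19.28228, 19.21669, 19.32759)

# Spacing of the index in seconds, and the frequency that follows from it.
deltat_seconds <- unique(diff(epoch))
print(deltat_seconds)
print(1 / deltat_seconds)

# Assumption: one seasonal cycle = one day = 24 hourly observations.
hourly_ts <- ts(y, frequency = 24)
print(frequency(hourly_ts))
```

Building the ts from the values alone drops the epoch index, so the time axis then counts days from 1 rather than calendar time.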

Weekly time series in R

I need to process five years of weekly data. I used the following command to create a time series from that:
my.ts <- ts(x[,3], start = c(2009,12), freq=52)
When plotting the series it looks good. However, the time points of the observations are stored as:
time(my.ts)
# Time Series:
# Start = c(2009, 12)
# End = c(2014, 26)
# Frequency = 52
# [1] 2009.212 2009.231 2009.250 2009.269 2009.288 2009.308 2009.327 ...
I expected to see proper dates instead (which should be aligned with a Calendar). What shall I do?
That is how the "ts" class works.
The zoo package can represent time series with dates (and other indexes):
library(zoo)
z <- zooreg(1:3, start = as.Date("2009-12-01"), deltat = 7)
giving:
> z
2009-12-01 2009-12-08 2009-12-15
1 2 3
> time(z)
[1] "2009-12-01" "2009-12-08" "2009-12-15"
The xts package and a number of other packages can also represent time series with dates although they do it by converting to POSIXct internally whereas zoo maintains the original class.
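For reference, the decimal values printed by time(my.ts) in the question follow from year + (position - 1) / frequency. The base R sketch below shows that arithmetic on toy data and, separately, builds a vector of weekly dates with seq(). The start date `first_week` is purely hypothetical and would need to be replaced with the real date of the first observation.

```r
# Toy series with the same start and frequency as in the question.
my_ts <- ts(1:3, start = c(2009, 12), frequency = 52)
print(time(my_ts))            # decimal years
print(2009 + (12 - 1) / 52)   # how the first time value is formed

# Hypothetical calendar date of the first observation; adjust to the real data.
first_week <- as.Date("2009-03-16")
week_dates <- seq(first_week, by = "week", length.out = length(my_ts))
print(week_dates)
```

A date vector like week_dates could then serve as the index of a zoo object, in line with the zoo suggestion above.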

Calculating a daily mean in R

Say I have the following matrix:
x1 = 1:288
x2 = matrix(x1,nrow=96,ncol=3)
Is there an easy way to get the mean of rows 1:24,25:48,49:72,73:96 for column 2?
Basically I have a one year time series and I have to average some data every 24 hours.
There is.
Suppose we have the days :
Days <- rep(1:4,each=24)
you could do easily
tapply(x2[,2],Days,mean)
If you have a dataframe with a Date variable, you can use that one. You can do that for all variables at once, using aggregate :
x2 <- as.data.frame(cbind(x2,Days))
aggregate(x2[,1:3],by=list(Days),mean)
Take a look at the help files of these functions to start with. Also do a search here, there are quite some other interesting answers on this problem :
Aggregating daily content
Compute means of a group by factor
PS : If you're going to do a lot of timeseries, you should take a look at the zoo package (on CRAN : http://cran.r-project.org/web/packages/zoo/index.html )
1) ts. Since this is a regularly spaced time series, convert it to a ts series and then aggregate it from frequency 24 to frequency 1:
aggregate(ts(x2[, 2], freq = 24), 1, mean)
giving:
Time Series:
Start = 1
End = 4
Frequency = 1
[1] 108.5 132.5 156.5 180.5
2) zoo. Here it is using zoo. The zoo package can also handle irregularly spaced series (if we needed to extend this). Below day.hour is the day number (1, 2, 3, 4) plus the hour as a fraction of the day so that floor(day.hour) is just the day number:
library(zoo)
day.hour <- seq(1, length = length(x2[, 2]), by = 1/24)
z <- zoo(x2[, 2], day.hour)
aggregate(z, floor, mean)
## 1 2 3 4
## 108.5 132.5 156.5 180.5
If zz is the output from aggregate then coredata(zz) and time(zz) are the values and times, respectively, as ordinary vectors.
A compact and computationally fast way of doing this is to reshape the vector into a matrix with one column per day and calculate the column means.
colMeans(matrix(x2[,2],nrow=24))
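As a quick cross-check, the base R approaches above can be run side by side on the matrix from the question. This sketch only restates calls already shown in the answers and compares their results.

```r
x2 <- matrix(1:288, nrow = 96, ncol = 3)
Days <- rep(1:4, each = 24)

# Three of the approaches from the answers, reduced to plain numeric vectors.
by_tapply   <- as.numeric(tapply(x2[, 2], Days, mean))
by_ts       <- as.numeric(aggregate(ts(x2[, 2], frequency = 24), 1, mean))
by_colMeans <- colMeans(matrix(x2[, 2], nrow = 24))

print(by_tapply)
print(isTRUE(all.equal(by_tapply, by_ts)))
print(isTRUE(all.equal(by_tapply, by_colMeans)))
```

The colMeans version relies on the number of rows being an exact multiple of 24; the tapply version does not, as long as Days is built to match.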
