I'm relatively new to R so please bear with me. I'm trying to get to grips with basic irregular time-series analysis.
This is what my data file looks like; it has some 40k lines. The spacing is not always exactly 20 seconds.
Time, Avg
04/03/2015 00:00:23,20.24
04/03/2015 00:00:43,20.38
04/03/2015 00:01:03,20.53
04/03/2015 00:01:23,20.54
04/03/2015 00:01:43,20.53
data <- read.zoo("data.csv",sep=",",tz='',header=T,format='%d/%m/%Y %H:%M:%S')
I'm happy to aggregate by minutes
data <- to.minutes(as.xts(data))
Using the "open" column as an example
head(data[,1])
as.xts(data).Open
2015-03-04 00:00:43 20.24
2015-03-04 00:01:43 20.53
2015-03-04 00:02:43 20.47
2015-03-04 00:03:43 20.38
2015-03-04 00:04:43 20.05
2015-03-04 00:05:43 19.84
data <- data[,1]
And here is where it all falls apart for me
fit <- stl(data, t.window=15, s.window="periodic", robust=TRUE)
Error in stl(data, t.window = 15, s.window = "periodic", robust = TRUE) :
series is not periodic or has less than two periods
I've googled the error message, but it's not really clear to me. Is period = frequency? For my dataset I would expect the seasonal component to be weekly.
frequency(data) <- 52
fit <- stl(data, t.window=15, s.window="periodic", robust=TRUE)
Error in na.fail.default(as.ts(x)) : missing values in object
?
head(as.ts(data))
[1] 20.24 NA NA NA NA NA
Uh, what?
What am I doing wrong? How do I have to prepare the xts object to be able to properly pass it to stl()?
Thank you.
I extract the numeric values of xts_object and build a ts object for the stl function. Note, however, that the time stamps of xts_object are completely ignored in this case.
stl(ts(as.numeric(xts_object), frequency=52), s.window="periodic", robust=TRUE)
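A minimal sketch of the same idea on synthetic data, in case it helps to see it end to end. The series, the names x and y, and the choice of a daily cycle (1440 one-minute observations) are assumptions for illustration; a weekly cycle on one-minute data would be frequency = 10080 and would need more than two weeks of data. The NA values in the question presumably come from as.ts() trying to place the timestamps on a regular grid, which is why the values are extracted with as.numeric() here.

```r
library(xts)

# Synthetic stand-in for the aggregated series: 3 days of 1-minute values
set.seed(1)
n <- 3 * 1440
idx <- as.POSIXct("2015-03-04 00:00:00", tz = "UTC") + 60 * (0:(n - 1))
x <- xts(20 + sin(2 * pi * (0:(n - 1)) / 1440) + rnorm(n, sd = 0.1),
         order.by = idx)

# Drop the time index and tell ts() how many observations make one cycle.
# 1440 = minutes per day (assumed daily cycle, not taken from the question).
y <- ts(as.numeric(x), frequency = 1440)

# stl() needs more than two full periods and no missing values
fit <- stl(y, s.window = "periodic", robust = TRUE)
head(fit$time.series)
```

Because the time index is discarded, this only makes sense if the observations are (close to) evenly spaced and gap-free after aggregation.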
I have 3 years of daily data in a column and need to write the code in R to convert the data-frame into a time series object but I am unsure of the coding. I attach the raw data. I was wondering whether to set the frequency to monthly or leave it daily, or whether to adapt the raw data to make it more user friendly in R. Any advice/help would be appreciated.
Thanks
Martin.
I couldn't get the code to load up. I then changed the frequency to just a year and 1 and it accepted the data but it is not giving the full picture.
This is the R code
install.packages("readxl")
install.packages("forecast")
install.packages("tseries")
library(readxl)
library(forecast)
library(tseries)
asb <- read_excel("C://Users//BCCAMNHY//OneDrive - Birmingham City Council//HomeFiles//My Documents//DATA ANALYST TRAINING//PROJECT 4//PROJECT DOCUMENTS//ASB_311022.xlsx")
View(asb)
class(
asbtime=ts(asb$`ASB Submitted`,start = min(asb$`Date for R`,end = max(asb$`Date for R`),frequency = 12)
class(asbtime)
library(forecast)
library(tseries)
plot(asbtime)
acf(asbtime)
pacf(asbtime)
adf.test(asbtime)
gdpmodel=auto.arima(gdptime,ic="aic",trace = TRUE) ## dont understand this line of code
acf(ts(asb$residuals)) # not sure if this code should be changed to asb$asb submitted
pacf(ts(asb$residuals))# as above
myasbforecast=forecast(asbmodel,level = c(95),h=10*4) ##### Don't understand this line of code. Want a monthly or daily forecast - think ideally monthly
mygdpforecast
plot(asbforecast)
Box.test(myasbforecast$resid, lag=5, type= "Ljung-Box")
Box.test(mygdpforecast$resid, lag=15, type= "Ljung-Box")
Box.test(myasbforecast$resid, lag=25, type= "Ljung-Box")
An extract of the raw data is:
Submitted Count of Submitted
01/03/2019 1
02/03/2019 0
03/03/2019 0
04/03/2019 0
05/03/2019 1
06/03/2019 0
07/03/2019 1
08/03/2019 2
09/03/2019 0
10/03/2019 0
11/03/2019 27
12/03/2019 54
13/03/2019 52
14/03/2019 46
15/03/2019 44
In your example, the names of the data columns do not match those used in the code. It is probably just an oversight, but check it anyway.
IMHO, this will be enough to convert the data into a ts object:
asbtime=ts(asb$`Count of Submitted`, start=2019, frequency = 365)
plot(forecast(asbtime), xlab = "year", ylab="Submitted")
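One possible refinement, sketched under assumptions: the extract starts on 1 March 2019, whereas start = 2019 places the first observation on 1 January. Passing the day of the year as the second element of start keeps the time axis aligned. The vector counts below copies the extract from the question; first_day and doy are hypothetical helper names.

```r
# Counts copied from the extract in the question (1 to 15 March 2019)
counts <- c(1, 0, 0, 0, 1, 0, 1, 2, 0, 0, 27, 54, 52, 46, 44)
first_day <- as.Date("2019-03-01")

# Day of the year of the first observation
doy <- as.integer(format(first_day, "%j"))

# Daily series whose first observation sits at day `doy` of 2019
asbtime <- ts(counts, start = c(2019, doy), frequency = 365)
start(asbtime)
```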
Both maxMATX and maxZIM return no observations, which I am very confused about.
Here is the code
library(tseries)
# tseries provides the financial data, hence we need to load it
data.ZIM <- get.hist.quote("ZIM")
data.MATX <- get.hist.quote("MATX")
data.ZIM <- data.ZIM[Sys.Date()-0:364]
data.MATX <- data.MATX[Sys.Date()-0:364]
head(data.ZIM)
head(data.MATX)
min(data.ZIM$Close)
max(data.ZIM$Close)
minZIM=data.ZIM[data.ZIM$Close==24.34]
maxZIM=data.ZIM[data.ZIM$Close==88.62]
data.ZIM[data.ZIM$Close==88.62]
minZIM
maxZIM
min(data.MATX$Close)
max(data.MATX$Close)
minMATX=data.MATX[data.MATX$Close==60.07,]
maxMATX=data.MATX[data.MATX$Close==121.47,]
minMATX
maxMATX
I was trying to extract the data with tseries and ran into difficulty when trying to print the matching row (specifically, I was trying to find the dates on which the 52-week low and high occurred).
Use which.min and which.max to find indexes of minimum and maximum close and use those to look up the time.
library(tseries)
data.ZIM <- get.hist.quote("ZIM", start = Sys.Date() - 364)
tmin <- time(data.ZIM)[which.min(data.ZIM$Close)]; tmin
## [1] "2021-03-31"
data.ZIM[tmin]
## Open High Low Close
## 2021-03-31 24.75 24.99 24.15 24.34
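The same lookup works for the maximum with which.max. A small offline sketch, using made-up prices rather than real quotes (the object names are hypothetical), also hints at one likely reason the == comparison in the question returned nothing: printed prices are rounded, so the stored value need not equal the typed literal exactly.

```r
library(zoo)

# Synthetic closing prices (made-up values, not real quotes)
dates <- as.Date("2021-01-01") + 0:4
close <- zoo(c(30.1, 24.34, 55.0, 88.620003, 70.2), order.by = dates)

# Look up the dates by position instead of comparing against typed values
tmin <- time(close)[which.min(coredata(close))]
tmax <- time(close)[which.max(coredata(close))]
close[tmax]

# Exact comparison against the rounded, printed value finds no rows
n_matches <- sum(coredata(close) == 88.62)
```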
I am learning time series analysis with R and came across two functions, to.period() and period.apply(). I understand that both return periodic data at the frequency given by the period, and the only difference I can see is the OHLC output option in to.period().
Other than the OHLC output, when should each of these functions be used?
to.period and all the to.minutes, to.weekly, to.quarterly are indeed meant for OHLC data.
If you take the function to.period it will take the open from the first day of the period, the close of the last day of the period and the highest high / lowest low of the specified period. These functions work very well together with the quantmod / tidyquant / quantstrat packages. See code example 1.
If you give the to.period non-OHLC data, but a timeseries with 1 data column, you still get a sort of OHLC back. See code example 2.
Now period.apply is more interesting. Here you can supply your own functions to be applied to the data. Especially in combination with endpoints, this can be a powerful function for time series data if you want to aggregate with your own function over different time periods. The index is mostly specified with endpoints, since endpoints lets you create the index you need to move to a higher time level (from days to weeks, and so on). See code examples 3 and 4.
Remember to use matrix functions with period.apply if you have more than one column of data, since an xts object is basically a matrix plus an index. See code example 5.
More info is available in this DataCamp course.
library(xts)
data(sample_matrix)
zoo.data <- zoo(rnorm(31)+10,as.Date(13514:13744,origin="1970-01-01"))
# code example 1
to.quarterly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# same as to.quarterly
to.period(sample_matrix, period = "quarters")
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# code example 2
to.period(zoo.data, period = "quarters")
zoo.data.Open zoo.data.High zoo.data.Low zoo.data.Close
2007-03-31 9.039875 11.31391 7.451139 10.35057
2007-06-30 10.834614 11.31391 7.451139 11.28427
2007-08-19 11.004465 11.31391 7.451139 11.30360
# code example 3 using base standard deviation in the chosen period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), sd)
2007-03-31 2007-06-30 2007-08-19
1.026825 1.052786 1.071758
# code example 4: self-defined function summing x + x for the period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), function(x) sum(x + x) )
2007-03-31 2007-06-30 2007-08-19
1798.7240 1812.4736 993.5729
# code example 5
period.apply(sample_matrix, endpoints(sample_matrix, on = "quarters"), colMeans)
Open High Low Close
2007-03-31 50.15493 50.24838 50.05231 50.14677
2007-06-30 48.47278 48.56691 48.36606 48.45318
I intend to perform a time series analysis on my data set. I have imported the data (monthly data from January 2015 to December 2017) from a csv file, and my code in RStudio is as follows:
library(timetk)
library(tidyquant)
library(timeSeries)
library(tseries)
library(forecast)
mydata1 <- read.csv("mydata.csv", as.is=TRUE, header = TRUE)
mydata1
date pkgrev
1 1/1/2015 39103770
2 2/1/2015 27652952
3 3/1/2015 30324308
4 4/1/2015 35347040
5 5/1/2015 31093119
6 6/1/2015 20670477
7 7/1/2015 24841570
mydata2 <- mydata1 %>%
mutate(date = mdy(date))
mydata2
date pkgrev
1 2015-01-01 39103770
2 2015-02-01 27652952
3 2015-03-01 30324308
4 2015-04-01 35347040
5 2015-05-01 31093119
6 2015-06-01 20670477
7 2015-07-01 24841570
class(mydata2)
[1] "data.frame"
It is when running this piece of code that things get a little weird (for me at least):
mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12)
mydata2_ts
date pkgrev
Jan 2015 16436 39103770
Feb 2015 16467 27652952
Mar 2015 16495 30324308
Apr 2015 16526 35347040
May 2015 16556 31093119
Jun 2015 16587 20670477
Jul 2015 16617 24841570
I don't really understand the values in the date column! It seems the dates have been converted into numeric format.
class(mydata2_ts)
[1] "mts" "ts" "matrix"
Now, running the following code gives me an error:
stlRes <- stl(mydata2_ts, s.window = "periodic")
Error in stl(mydata2_ts, s.window = "periodic") :
only univariate series are allowed
What is wrong with my process?
You got this error because you passed a data set with two variables (date + pkgrev) to stl(), which only accepts a univariate time series.
To solve this problem, create a univariate ts object without the date variable. In your case, use mydata2$pkgrev (or mydata2[["pkgrev"]]) instead of mydata2 in your code mydata2_ts <- ts(mydata2, start=c(2015,1), freq=12). The ts object already carries the temporal information, because you specify the start date and frequency in the call.
If you would like to create a new data frame with both the values of the ts object and the corresponding dates, I would suggest the following code:
# as.Date() on a ts time index relies on the zoo package being loaded
mydata3 <- data.frame(date = as.Date(time(mydata2_ts)), pkgrev = as.numeric(mydata2_ts))
However, for the purpose of STL decomposition, the first argument should be a ts object, i.e., mydata2_ts.
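Put together, the fix might look like the sketch below. The revenue figures are synthetic (generated for illustration, not the asker's data), and pkgrev_ts is a hypothetical name chosen to avoid overwriting the multivariate object from the question.

```r
# Synthetic monthly revenue for Jan 2015 to Dec 2017 (made-up numbers)
set.seed(42)
dates <- seq(as.Date("2015-01-01"), by = "month", length.out = 36)
mydata2 <- data.frame(
  date   = dates,
  pkgrev = 3e7 + 5e6 * sin(2 * pi * (0:35) / 12) + rnorm(36, sd = 1e6)
)

# Pass only the numeric column, so the result is a univariate "ts";
# the dates are implied by start and frequency
pkgrev_ts <- ts(mydata2$pkgrev, start = c(2015, 1), frequency = 12)

# stl() now accepts the series (36 months, more than two full periods)
stlRes <- stl(pkgrev_ts, s.window = "periodic")
head(stlRes$time.series)
```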
Using the zoo package (and help from SO) I have created a time series from the following:
z <- read.zoo("D:\\Futures Data\\BNVol3.csv", sep = ",", header = TRUE, index = 1:2,
tz="", format = "%d/%m/%Y %H:%M")
This holds data in the following format:(Intra-day from 07:00 to 20.50)
2012-10-01 14:50:00 2012-10-01 15:00:00 2012-10-01 15:10:00 2012-10-01 15:20:00
8638 9014 9402 9505
I want to "deseasonalize" the intra-day component of this data so that 1 day is considered a complete seasonal cycle. (I am using the day component because not all days will run from 07.00 to 20.50 due to bank holidays etc., although running from 07.00 to 20.50 is usually the standard. I assume that if I used the 84 intra-day points as 1 seasonal cycle, then at some point the deseasonalizing would begin to get thrown off track.)
I have tried to use the decompose method but this has not worked.
x <- Decompose(z)
Not sure "zoo" and decompose method are compatible but I thought "zoo" and "ts" were designed to be. Is there another way to do this?
Thanks in advance for any help.