Having trouble with R's time series objects - r

I have a column of 84 monthly expenditures from 1/2004 - 12/2010, which in Excel looks like...
12247815.55
11812697.14
13741176.13
21372260.37
27412419.28
42447077.96
55563235.3
45130678.8
54579583.53
43406197.32
34318334.64
25321371.4
...(74 more entries)
I am trying to run an stl() from the forecast package on this series, and so I load the data:
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE),
start=c(2004,1),
end=c(2010,12),
frequency = 12)
(If I do header=FALSE it will absorb the first entry - 122...- as the header for the second column, and name the first column's header 'X')
But instead of my environment being populated with a Time Series Object from 2004 to 2011 (as it has said before) it simply says ts[1:84, 1].
Probably related is the fact that,
fit <- stl(d)
throws
Error in stl(d) : only univariate series are allowed.
despite the fact that
head(d)
[1] 12247816 11812697 13741176 21372260 27412419 42447078
and
d
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
2004 12247816 11812697 13741176 21372260 27412419 42447078 55563235 45130679 54579584 43406197
("years 2005-2010 look exactly the same, and all rows have columns for Jan-Dec; it just doesn't fit on here neatly - just trying to show the object has taken the ts labeling structure.")
What am I doing wrong? As far as I know this is the same way I have been building my time series objects in the past...

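read.csv returns a data frame, not a vector. Even if it has only one column, it is still a data frame, and ts() converts it to a one-column matrix (hence ts[1:84, 1]). To get a plain vector, extract the column with [,1]: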
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE)[,1],
start=c(2004,1),
end=c(2010,12),
frequency = 12)
Also, please check your facts. stl is in the stats package, not the forecast package. This is easily checked by using help(stl).
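The difference can be reproduced without the CSV file. The sketch below uses a made-up one-column data frame df in place of read.csv(...); the values are synthetic, not the asker's. Note that stl() also requires an s.window argument; "periodic" is used here only as an illustrative choice.

```r
# Synthetic stand-in for read.csv(...): one column of 84 monthly values
set.seed(1)
df <- data.frame(
  expenditure = 3e7 + 2e7 * sin(2 * pi * (1:84) / 12) + rnorm(84, sd = 1e6)
)

# Passing the whole data frame yields a ts that still has dimensions 84 x 1
d_df <- ts(df, start = c(2004, 1), end = c(2010, 12), frequency = 12)
is.matrix(d_df)    # TRUE, which is what stl() rejects

# Extracting the column first yields a plain univariate series
d_vec <- ts(df[, 1], start = c(2004, 1), end = c(2010, 12), frequency = 12)
is.matrix(d_vec)   # FALSE

# s.window = "periodic" is an assumption made for this sketch
fit <- stl(d_vec, s.window = "periodic")
```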

Related

R : DSA package (Daily seasonal adjustment) for time series, problem with dsa function and February 29th

I have a time series starting on January 30th 2018 and ending on June 14th 2020 that I would like to seasonally adjust, taking Indian holidays into account. Based on my research, I wanted to use the DSA package and its dsa function, since it can handle daily time series (unlike X-13, for instance).
First of all, I import my csv file with 2 columns and 867 rows and convert it into an xts object (the form required by the dsa function).
library(dsa)
library(xts)
daily_demand_df = read.table(file = file.path(file_path,"Demand data - daily.csv"),
sep = ";", row.names = NULL, header = FALSE,
encoding = 'utf-8', skip = 1,
colClasses = c("character", "character"),
col.names = c("Date","Demand_Value"))
daily_demand_df$Demand_Value <- as.numeric(gsub(',', '.', daily_demand_df$Demand_Value))
daily_demand_df$Date<-as.Date(daily_demand_df$Date, format = "%d/%m/%Y")
daily_demand_df <- daily_demand_df[order(daily_demand_df$Date),]
rownames(daily_demand_df) <- 1:nrow(daily_demand_df)
head(x = daily_demand_df)
Date Demand_Value
1 2018-01-29 3242.5
2 2018-01-30 3269.5
3 2018-01-31 3276.9
4 2018-02-01 3274.1
5 2018-02-02 3291.3
6 2018-02-03 3286.1
daily_demand_timeserie <- xts(x = daily_demand_df$Demand_Value,
order.by = daily_demand_df$Date,
frequency=365.2425)
Then I tried to apply the dsa function (without the holidays effect for now) on the time series, but I got the following error:
adjusted <- dsa(series = daily_demand_timeserie)
Error in xts::xts(s1, order.by = xts::last(times, n = length(s1))) : NROW(x) must match length(order.by)
I explored the source code of the dsa function to understand where the problem might lie, and found that at some point February 29th is removed from the time series. I then modified the source code to print the lengths of s1 and of the time series used inside dsa; it returned 869 and 868, hence the length mismatch.
Does anyone know how to solve this issue?
Thank you in advance, here is the documentation that I used.
Link for the theoretical paper on DSA
DSA reference manual
DSA function source code
First of all, you do not need to specify the frequency argument in your xts() call. But I do not think that this caused the problem.
I changed a few things in the package, so I hope the problem no longer occurs.
In general, check whether you have missing values at the beginning and end of the series, because these can cause problems in how dsa handles and interpolates missing data.
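One way to run that check before building the xts object is sketched below in base R. The data frame is a small made-up stand-in for daily_demand_df (same column names, invented values), and the names observed, trimmed and missing_days are introduced only for this illustration. Whether trimming is the right remedy for a given dsa version is an assumption.

```r
# Hypothetical stand-in for daily_demand_df: NAs at both ends, one calendar day absent
dates <- seq(as.Date("2018-01-29"), by = "day", length.out = 10)
daily_demand_df <- data.frame(
  Date = dates[-5],  # drop one day on purpose
  Demand_Value = c(NA, 3269.5, 3276.9, 3274.1, 3286.1, 3290.0, 3301.2, 3295.4, NA)
)

# Positions of observed values; keep only the span between first and last
observed <- which(!is.na(daily_demand_df$Demand_Value))
trimmed <- daily_demand_df[min(observed):max(observed), ]

# Calendar days inside that span that have no row at all
all_days <- seq(min(trimmed$Date), max(trimmed$Date), by = "day")
missing_days <- all_days[!all_days %in% trimmed$Date]
```

Here trimmed no longer starts or ends with NA, and missing_days lists the gaps that would still need to be filled or interpolated.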

Subset xts object using variables for start and end periods

I have an xts object called 'usagexts' with dates from 01 Oct 15 to 31 Mar 18. I want to create 3 subsets of this object for the periods 01 Oct 15 to 31 Mar 16, 01 Oct 16 to 31 Mar 17 and 01 Oct 17 to 31 Mar 18 without hardcoding the dates, as these will change as time goes on.
The object structure is like so :
dateperiod,usageval
2015-10-01,21542
2015-10-02,21572
2015-10-03,21342
...
...
2018-03-31,20942
I have another data frame called 'periodvalues' like so :-
startdate,enddate, periodtext
2015-10-01,2016-03-31,1510_1603
2016-10-01,2017-03-31,1610_1703
2017-10-01,2018-03-31,1710_1803
I want to be able to create 3 xts objects like so :-
usagexts_1510_1603 -> xts object containing usage details for relevant period
usagexts_1610_1703 -> xts object containing usage details for relevant period
usagexts_1710_1803 -> xts object containing usage details for relevant period
I only got as far as creating a list of size 3 containing the periodtext from the above data frame. I was trying to somehow specify the start and end period for the xts object using the "objectname fromdate/todate" structure through variables but it didn't work - something like so :
usagexts_1610_1703 <- usagexts[var1/var2]
The LHS came from the list and the variables on the RHS came from variable definitions made earlier.
Expected results should be like so :
usagexts_1510_1603 <- usagexts["2015-10-01/2016-03-31"]
usagexts_1610_1703 <- usagexts["2016-10-01/2017-03-31"]
usagexts_1710_1803 <- usagexts["2017-10-01/2018-03-31"]
Any assistance on that shall be highly valued.
Best regards
Deepak
If var1 and var2 are variables, then the filter string can be specified using paste as:
usagexts[paste(var1, var2, sep="/")]
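Applied to every row of periodvalues, the same idea might look like the sketch below. It assumes the xts package is installed; usagexts is filled with made-up values, and the names ranges and subsets are introduced only for this example. It stores the results in a named list rather than in three separate variables.

```r
library(xts)

# Made-up stand-in for usagexts: one value per day over the full date range
dates <- seq(as.Date("2015-10-01"), as.Date("2018-03-31"), by = "day")
usagexts <- xts(seq_along(dates), order.by = dates)

periodvalues <- data.frame(
  startdate  = c("2015-10-01", "2016-10-01", "2017-10-01"),
  enddate    = c("2016-03-31", "2017-03-31", "2018-03-31"),
  periodtext = c("1510_1603", "1610_1703", "1710_1803"),
  stringsAsFactors = FALSE
)

# Build one "start/end" range string per row, then subset with each string
ranges <- paste(periodvalues$startdate, periodvalues$enddate, sep = "/")
subsets <- lapply(ranges, function(r) usagexts[r])
names(subsets) <- paste0("usagexts_", periodvalues$periodtext)
```

Each period is then available as, for example, subsets[["usagexts_1610_1703"]]. If separate objects are really needed, list2env(subsets, envir = .GlobalEnv) would create them, though a list is usually easier to loop over.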

How to add only one observation at a time amongst several observations in R?

Say I have observations for several periods for financial data, how can I create a function in R that only adds one observation at a time throughout my dataset so that I can compare how a single observation impacts my original data?
Say for instance that I have something like this:
Apple Microsoft Tesla Amazon
2010 0.8533719 0.8078440 0.2620114 0.1869552
2011 0.7462573 0.5127501 0.5452448 0.1369686
2012 0.7580671 0.5062639 0.7847919 0.8362821
2013 0.3154078 0.6960258 0.7303597 0.6057027
2014 0.4741735 0.3906580 0.4515726 0.1396147
2015 0.4230036 0.4728911 0.1262413 0.7495193
2016 0.2396552 0.5001825 0.6732861 0.8535837
2017 0.2007575 0.8875209 0.5086837 0.2211072
#And I define my original covariance matrix as follows:
cov.m <- cov(x[1:5,])
#I would like to add only one new observation at a time, so the results should be:
cov(x[1:5,]), cov(x[1:6,]), cov(x[1:7,]), cov(x[1:8,])
I have tried using rbind and a repeat loop, but it seems like I still have to define every row to include in rbind, which is quite tedious if I want to test on say 100+ different observations as I then manually need to specify all the observations, and I would have no use for the repeat loop in that case either.
Does this get you closer to your expected output?
lapply(5:nrow(x), function(y) cov(x[1:y, ]))
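A self-contained version of that one-liner is sketched below. The matrix x is filled with random numbers rather than the values from the question, and the names ends, cov_list and delta are introduced only for this illustration.

```r
# Toy data with the same shape as in the question (8 years x 4 stocks)
set.seed(42)
x <- matrix(
  runif(32), nrow = 8,
  dimnames = list(2010:2017, c("Apple", "Microsoft", "Tesla", "Amazon"))
)

# One covariance matrix per expanding window: rows 1:5, 1:6, 1:7, 1:8
ends <- 5:nrow(x)
cov_list <- lapply(ends, function(n) cov(x[1:n, ]))
names(cov_list) <- paste0("rows_1_to_", ends)

# Change of each matrix relative to the original 5-row estimate
delta <- lapply(cov_list, function(m) m - cov_list[[1]])
```

The delta list shows how much each additional observation moves the covariance estimate away from the baseline.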

R - addtable2plot() not displaying dataframe after plot()

So I'm having problems in R because I'm trying to add a dataframe into the blank space of a plot using addtable2plot() but it's not displaying the desired dataframe.
My plot is the forecast of a time series model which I called model, so the plot is given by
plot(forecast(model),6)
which yields
The dataframe is given by df<-data.frame(forecast(model,6))[,1:3] with output
Point.Forecast Lo.80 Hi.80
Jun 2017 174.3482 157.4225 191.2738
Jul 2017 174.3574 155.0521 193.6627
Aug 2017 172.4448 151.0009 193.8887
Sep 2017 175.8619 152.4541 199.2697
Oct 2017 179.7774 154.5395 205.0152
Nov 2017 176.8982 149.9368 203.8597
and the way I'm trying to display the dataframe onto the plot is addtable2plot(.5,8,df,bty="o",display.rownames=TRUE,hlines=TRUE,vlines=TRUE,title="My forecast") but this is not displaying the dataframe.
It's strange because the example coming from the documentation works perfect as you can see
So it looks like addtable2plot() doesn't work together with plot() but I can't find a reference where this is mentioned explicitly or an alternative to what I want to achieve.
Any help is appreciated!
You need to set your table coordinates (x, y) in the units of the plot axes. Something around (2000, 200) in your case.
addtable2plot(2000,200,df,bty="o",display.rownames=TRUE,hlines=TRUE,vlines=TRUE,title="My forecast")
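If it is unclear which coordinates fall inside the plot region, par("usr") reports the axis limits of the current plot. The sketch below uses a made-up monthly series y instead of the forecast object and only computes a position near the top-left corner; the names usr, x_pos and y_pos are illustrative, and the addtable2plot call is left as a comment because it needs the plotrix package and the df from the question.

```r
pdf(NULL)  # null graphics device so the sketch also runs without a display

# Made-up monthly series standing in for the forecast plot
y <- ts(150 + 0.5 * (1:60), start = c(2012, 6), frequency = 12)
plot(y)

# Axis limits of the current plot: c(x_min, x_max, y_min, y_max)
usr <- par("usr")

# A point 5% in from the left edge and 5% down from the top edge
x_pos <- usr[1] + 0.05 * (usr[2] - usr[1])
y_pos <- usr[4] - 0.05 * (usr[4] - usr[3])

# With plotrix loaded, the table could then be placed at that point:
# addtable2plot(x_pos, y_pos, df, bty = "o", display.rownames = TRUE)

invisible(dev.off())
```

For a time series plot the x axis is in decimal years, which is why a value such as 0.5 lands far outside the visible region.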

Select a value from time series by date in R

How can I select the value from a time series that corresponds to a given date?
I create a monthly time series object with command:
producers.price <- ts(producers.price, start=2012+0/12, frequency=12)
Then I try the following:
value <- producers.price[as.Date("01.2015", "%m.%Y")]
But this doesn't do what I want; value is
[1] NA
instead of 10396.8212805739, where producers.price is:
producers.price <- structure(c(7481.52109434237, 6393.18959031561, 6416.63065650718,
5672.08354710121, 7606.24186413516, 5201.59247092013, 6488.18361474813,
8376.39182893415, 9199.50916585545, 8261.87133079494, 8293.8195347453,
8233.13630279516, 7883.17272003961, 7537.21001580393, 6566.60260432381,
7119.99345843556, 8086.40101607729, 9125.11104610046, 10134.0228610828,
10834.5732454454, 9410.35031874371, 9559.36933274129, 9952.38679679724,
10390.3628690951, 11134.8432864557, 11652.0075507499, 12626.9616107684,
12140.6698452193, 11336.8315981684, 10526.0309052316, 10632.1492109584,
8341.26367412737, 9338.95688558448, 9732.80173656971, 10724.5525831506,
11272.2273444623, 10396.8212805739, 10626.8428853062, 11701.0802817581,
NA), .Tsp = c(2012, 2015.25, 12), class = "ts")
So, I had/have a similar problem and was looking all over to solve it. My solution is not as great as I'd have wanted it to be, but it works. I tried it out with your data and it seems to give the right result.
Explanation
It turns out that a "ts" object in R is stored as a plain sequence indexed from 1, not by its time values. E.g. if you have a yearly time series that starts in 1950 and ends in 1960, the value for 1950 is ts[1] and the value for 1960 is ts[11].
Based on this logic, you need to subtract the start of the series from the target date, multiply by the frequency, and add 1 to get the position of that value.
This code in R gives you the result you expect.
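library(zoo)  # provides as.yearmon
# round() guards against floating-point error in the month difference
producers.price[round((as.yearmon("2015-01") - as.yearmon("2012-01")) * 12) + 1]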
If you need help with the time calculations, check this answer: Get the difference between dates in terms of weeks, months, quarters, and years.
You will need the zoo and lubridate packages.
Hope it helps :)
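The same index arithmetic can also be done in base R without any packages. The sketch below uses a toy monthly series s whose values equal their positions (not the data from the question), and ts_index is a hypothetical helper written only for this illustration.

```r
# Toy monthly series starting January 2012; each value equals its position
s <- ts(1:40, start = c(2012, 1), frequency = 12)

# Position of a given (year, month) in a monthly "ts" object
ts_index <- function(series, year, month) {
  start_ym <- start(series)  # c(start year, start month)
  (year - start_ym[1]) * frequency(series) + (month - start_ym[2]) + 1
}

i <- ts_index(s, 2015, 1)  # 37: three full years after the start, plus 1
s[i]
```

Because the calculation uses whole years and months, it avoids the rounding issues that can arise when subtracting fractional time values.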
1) window.ts
The window.ts function is used to subset a "ts" time series by a time window. The window command produces a time series with one data point and the [[1]] makes it a straight numeric value:
window(producers.price, start = 2015 + 0/12, end = 2015 + 0/12)[[1]]
## [1] 10396.82
2) zoo We can alternately convert it to zoo and subscript it by a yearmon class variable and then use [[1]] or coredata to convert it to a plain number or we can use window.zoo much as we did with window.ts :
library(zoo)
as.zoo(producers.price)[as.yearmon("2015-01")][[1]]
## [1] 10396.82
coredata(as.zoo(producers.price)[as.yearmon("2015-01")])
## [1] 10396.82
window(as.zoo(producers.price), 2015 + 0/12 )[[1]]
## [1] 10396.82
coredata(window(as.zoo(producers.price), 2015 + 0/12 ))
## [1] 10396.82
3) xts The four lines in (2) also work if library(zoo) is replaced with library(xts) and as.zoo is replaced with as.xts.
Looking for a simple command, one line and no library needed?
You might try this.
as.numeric(window(producers.price, start = c(2015, 1), end = c(2015, 1)))
