I have one time series object that looks as follows (sorry, I don't know how to format it any more nicely):
            Jan          Feb          Mar          Apr          May          Jun          Jul          Aug          Sep
2010 0.051495184  0.012516017  0.029767280  0.046781229  0.041615717  0.002205329  0.056919026 -0.026339813  0.078932572 ...
It contains monthly data from 2010m01 to 2014m12.
And one that looks like this:
Time Series:
Start = 673
End = 732
Frequency = 1
[1] 0.01241940 0.01238126 0.01234626 0.01227542 ...
They have the same number of observations. However, when I try to subtract them I get the error:
Error in .cbind.ts(list(e1, e2), c(deparse(substitute(e1))[1L], deparse(substitute(e2))[1L]), :
not all series have the same frequency
Can anyone tell me what I can do to subtract the two?
Thanks in advance.
Edit:
str() gives:
Time-Series [1:60] from 2010 to 2015: 0.0515 0.0125 0.0298 0.0468 0.0416 ...
and
Time-Series [1:60] from 673 to 732: 0.0124 0.0124 0.0123 0.0123 0.0122 ...
There is no frequency<- function, but you can change the frequency of time-series objects using the ts function:
> x <- ts(1:10, frequency = 4, start = c(1959, 2))
> frequency(x) <- 12
Error in frequency(x) <- 12 : could not find function "frequency<-"
> y <- ts(x, frequency=12)
> frequency(y)
[1] 12
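Applied to the two series in the question, a minimal sketch: rebuild the frequency-1 series on the same time base as the monthly one, and the subtraction then works. (The object names a and b are hypothetical, since the question does not give them, and rnorm() stands in for the real data.)

```r
# Hypothetical names: 'a' is the monthly series (2010-2014),
# 'b' is the frequency-1 series starting at index 673.
a <- ts(rnorm(60), frequency = 12, start = c(2010, 1))  # stand-in data
b <- ts(rnorm(60), start = 673)                         # frequency = 1

# Rebuild b with a's frequency and start, then subtract:
b2 <- ts(as.numeric(b), frequency = frequency(a), start = start(a))
d  <- a - b2
str(d)  # Time-Series [1:60] from 2010 to 2015
```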
I have a dataset that has daily prices from Jan 1 2009 to Jan 1 2019 and I want to transform it into a time series. When I use monthly data, the ts() function works as expected:
> head(monthlyts)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1999 0.8811 0.8854 0.9251 0.8940 0.8746 0.8521 0.8522 0.8799 0.9143 0.8951 0.9123 0.8862
2000 0.8665 0.8934 0.8900 0.8709 0.8463 0.8185 0.8319 0.8266 0.8677 0.8697 0.8346 0.8575
but when I try it with daily prices the printed output looks completely different:
> head(dailyts)
Time Series:
Start = 2009
End = 2009.01368925394
Frequency = 365.25
Price
[1,] 0.8990
[2,] 0.8990
[3,] 0.9014
[4,] 0.9004
[5,] 0.9041
[6,] 0.8986
The code I'm using for both is the same so I'm not sure what the issue is.
monthlyts <- ts(mprices['Price'], frequency=12, start=c(2009,1))
dailyts <- ts(dprices['Price'], frequency=365.25, start=c(2009,1))
There's no change in the data either; both .csv files are downloaded from the same website and cover the same timeframe, just one monthly and one daily.
Any ideas on how to get the daily time series properly?
Here's some test data that's representative of the problem
data <- as.data.frame(sample(seq(from=0, to=1, by=0.0001), size = 730, replace = TRUE))
colnames(data) <- 'data'
datats <- ts(data, frequency=365, start=c(2009,1))
head(datats)
I expected it to output two rows of data labelled 2009 and 2010, with 365 columns in each row.
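For what it's worth, print.ts only uses the calendar-style matrix layout for monthly (frequency 12) and quarterly (frequency 4) series; for other frequencies, including 365 or 365.25, it falls back to the plain Start/End/Frequency display, so the daily object above is likely built correctly even though it prints differently. For daily data, a date-indexed class is often easier to work with. A sketch using the xts package (assuming it is installed), with stand-in data:

```r
library(xts)

set.seed(1)
prices <- runif(730)                                             # stand-in daily prices
days   <- seq(as.Date("2009-01-01"), by = "day", length.out = 730)

# An xts object keeps real calendar dates instead of fractional years:
dailyx <- xts(prices, order.by = days)
head(dailyx, 3)
```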
I have monthly sales data for an item covering two years. The first column is the row index [1:24], and the second column is the sales figure. I want to use STL to find the seasonal and trend components. I tried:
ts_data<- ts(mydata[,-1],frequency = 12,start=c(2016,1))
ts_data
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016 1250 760 2590 7990 2070 6770 4760 4270 2550 6070 4580 2350
2017 1510 4140 2450 3010 1070 1230 850 490 170 1970 890 1871
I can see the data is now a time-series object with proper timestamps. But,
ts.stl <- stl(ts_data,"periodic")
is giving me the well-known error,
Error in stl(ts_data, "periodic") :
series is not periodic or has less than two periods
I can see the data is periodic and spans two years, so what is the reason for the error?
Surprisingly, the same data works with frequency=10 and I am able to generate the components. But with frequency 10 the series extends into 2018, which is wrong and completely unacceptable for my case.
ts_data<- ts(mydata[,-1],frequency = 10,start=c(2016,1))
ts.stl <- stl(ts_data,"periodic")
plot(ts.stl)
ts.stl
Call:
stl(x = ts_data, s.window = "periodic")
Components
Time Series:
Start = c(2016, 1)
End = c(2018, 4)
Frequency = 10
seasonal trend remainder
2016.0 -699.45750 2963.3599 -1013.90241
2016.1 -920.17520 3160.1418 -1479.96656
2016.2 -864.22434 3356.9236 97.30074
2016.3 2246.22241 3521.8163 2221.96126
2016.4 -810.00737 3686.7091 -806.70170
2016.5 1872.04426 3827.6781 1070.27765
2016.6 -50.90412 3968.6471 842.25700
2016.7 -174.47317 4057.6799 386.79324
2016.8 -1183.04223 4146.7128 -413.67053
2016.9 584.01391 3922.6366 1563.34950
2017.0 -699.45750 3698.5604 1580.89707
2017.1 -920.17520 3391.5549 -121.37975
2017.2 -864.22434 3084.5495 -710.32512
2017.3 2246.22241 2728.3775 -834.59993
2017.4 -810.00737 2372.2056 887.80179
2017.5 1872.04426 2129.8152 -991.85943
2017.6 -50.90412 1887.4248 -766.52066
2017.7 -174.47317 1704.6805 -300.20736
2017.8 -1183.04223 1521.9363 511.10594
2017.9 584.01391 1341.6322 -1435.64615
2018.0 -699.45750 1161.3282 -291.87070
2018.1 -920.17520 999.3977 1890.77746
2018.2 -864.22434 837.4673 916.75706
2018.3 2246.22241 689.0683 -1064.29067
How can I find the components using frequency=12 in STL? Please guide me.
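For context (not stated in the question): stl() refuses to fit unless the series is strictly longer than two full periods, i.e. it needs more than 2 * frequency observations. With frequency = 12, the 24 monthly points are exactly two periods and just fail the check, while frequency = 10 passes because 24 > 20. A quick sketch of the threshold, using stand-in data:

```r
# Exactly two periods (24 observations, frequency 12): stl() errors out
x24 <- ts(rnorm(24), frequency = 12, start = c(2016, 1))
try(stl(x24, s.window = "periodic"))   # "series is not periodic or has less than two periods"

# One more observation crosses the threshold (25 > 2 * 12) and the fit runs
x25 <- ts(rnorm(25), frequency = 12, start = c(2016, 1))
fit <- stl(x25, s.window = "periodic")
```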
I did a forecast in R of an stl fit to a subset of time series data using the code below. The only difference between Scenario 1 and 2 is that I mistakenly set both the original time series and the subset time series to start=c(2015,12) in Scenario 2. The forecast results for the two scenarios are different, even though both scenarios use the same start date for the subset data that goes into the forecast. I do not understand why the original time series start date impacts the forecast results.
# Scenario 1
ts.vision = ts(data=vision$ADJ_ILR, frequency = 12,start=c(2015,1), end=c(2017,12))
ts.vision.sub <- window(ts.vision, start=c(2015, 12))
ts.vision.fit <- stl(ts.vision.sub, t.window=15, s.window="periodic", robust=TRUE)
forecast(ts.vision.fit, h=12)
# Scenario 2
ts.vision = ts(data=vision$ADJ_ILR, frequency = 12,start=c(2015,12), end=c(2017,12))
ts.vision.sub <- window(ts.vision, start=c(2015, 12))
ts.vision.fit <- stl(ts.vision.sub, t.window=15, s.window="periodic", robust=TRUE)
forecast(ts.vision.fit, h=12)
Here is a similar situation where I have re-written the two scenarios using the nottem dataset available in base R:
# Scenario 1
ts.nottem = ts(nottem, frequency = 12,start=c(1920,1), end=c(1939,12))
ts.nottem.sub <- window(ts.nottem, start=c(1937, 12))
ts.nottem.fit <- stl(ts.nottem.sub, t.window=15, s.window="periodic", robust=TRUE)
forecast(ts.nottem.fit, h=12)
#Scenario 2
ts.nottem = ts(nottem, frequency = 12,start=c(1937,12), end=c(1939,12))
ts.nottem.sub <- window(ts.nottem, start=c(1937, 12))
ts.nottem.fit <- stl(ts.nottem.sub, t.window=15, s.window="periodic", robust=TRUE)
forecast(ts.nottem.fit, h=12)
I do not understand why the forecast results are different even though the subset time series data is the same between Scenario 1 & 2.
Here is your example with some built-in data:
library(forecast)
ts.1 <- ts(data=nottem[1:36], frequency = 12, start=c(2015,1), end=c(2017,12))
ts.1.sub <- window(ts.1, start=c(2015, 12))
ts.1.fit <- stl(ts.1.sub, t.window=15, s.window="periodic", robust=TRUE)
ts.1.fc <- forecast(ts.1.fit, h=12)
ts.2 <- ts(data=nottem[1:36], frequency = 12, start=c(2015,12), end=c(2017,12))
ts.2.sub <- window(ts.2, start=c(2015, 12))
ts.2.fit <- stl(ts.2.sub, t.window=15, s.window="periodic", robust=TRUE)
ts.2.fc <- forecast(ts.2.fit, h=12)
#all.equal(ts.1.fc, ts.2.fc) # NO!
I omit the long angry messages! What's happening? Take a look at the two subset objects:
head(ts.1.sub)
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 2015 39.8
#> 2016 44.2 39.8 45.1 47.0 54.1
head(ts.2.sub)
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 2015 40.6
#> 2016 40.8 44.4 46.7 54.1 58.5
They aren't the same: in scenario 1, window() keeps the 12th through 36th values of the original data, while in scenario 2 it keeps the 1st through 25th.
This is the normal output from the test:
attach(airquality)
pw <- pairwise.wilcox.test(Ozone, Month, p.adj = "bonf")
pw
data: Ozone and Month
May Jun Jul Aug
Jun 1.0000 - - -
Jul 0.0003 0.1414 - -
Aug 0.0012 0.2591 1.0000 -
Sep 1.0000 1.0000 0.0074 0.0325
I recently had to conduct a test with 10 levels of a factor. While the lower triangular format of the pairwise.wilcox.test output is useful and concise, I thought it would be convenient to arrange it in a similar way to the Tukey HSD output, where each pairwise combination is listed along with its associated p value. This was my attempt to do this:
pw.df <- as.data.frame(pw$p.value)
pw.diff <- vector("character")
pw.pval <- vector("numeric")
for (i in 1:ncol(pw.df)) {
  for (j in i:length(pw.df)) {
    pw.diff <- c(pw.diff, paste(colnames(pw.df)[i], "-", rownames(pw.df)[j]))
    pw.pval <- c(pw.pval, pw.df[j, i])
  }
}
# order them by ascending p value
v <- order(pw.pval,decreasing = F)
pw.df <- data.frame(pw.diff[v],pw.pval[v])
# display those that are significant at the 5% level
pw.df[pw.df$pw.pval<0.05,]
pw.diff.v. pw.pval.v.
1 May - Jul 0.000299639
2 May - Aug 0.001208078
3 Jul - Sep 0.007442604
4 Aug - Sep 0.032479550
If anyone has some tips/tricks/advice on how to make this easier and/or more elegant I would be grateful.
I would use the reshape or reshape2 package for this task, specifically the melt() command. The object returned by pairwise.wilcox.test contains the data of interest in the third slot, so something like melt(pw[[3]]) should do the trick:
X1 X2 value
1 Jun May 1.000000000
2 Jul May 0.000299639
3 Aug May 0.001208078
4 Sep May 1.000000000
5 Jun Jun NA
....
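To go from this melted long format all the way to the Tukey-style listing in the question, one can drop the NA cells and sort by p value; a sketch assuming reshape2 is installed:

```r
library(reshape2)

pw <- pairwise.wilcox.test(airquality$Ozone, airquality$Month, p.adj = "bonf")

# melt() turns the lower-triangular p-value matrix into long format
long <- melt(pw$p.value, varnames = c("group1", "group2"), value.name = "p")
long <- long[!is.na(long$p), ]   # drop the empty upper triangle and diagonal
long <- long[order(long$p), ]    # ascending p value
subset(long, p < 0.05)           # significant pairs only
```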
I've written a function that takes a data.frame of 1-minute interval data and aggregates it into higher intervals, e.g. 1 minute becomes 5 minutes, 60 minutes, etc. The data set may contain gaps (jumps in time), so the function must accommodate these bad-data occurrences. I've written the following code, which appears to work, but the performance is absolutely terrible on large data sets.
I'm hoping that someone could provide some suggestions on how I might be able to speed this up. See below.
compressMinute = function(interval, DAT) {
  # Start a new bar at every row whose minute is a multiple of the interval
  intervalFilter = which(DAT$time$min %% interval == 0)
  barSet = NULL
  for (x in intervalFilter) {
    barEndTime = DAT$time[x] + 60 * interval
    barIntervals = DAT[x, ]
    x = x + 1
    # && short-circuits, so DAT[x, "time"] is never read past the last row
    while (x <= nrow(DAT) && DAT[x, "time"] < barEndTime) {
      barIntervals = rbind(barIntervals, DAT[x, ])
      x = x + 1
    }
    # Aggregate the collected 1-minute rows into one OHLCV bar
    bar = data.frame(date   = barIntervals[1, "date"],
                     time   = barIntervals[1, "time"],
                     open   = barIntervals[1, "open"],
                     high   = max(barIntervals$high),
                     low    = min(barIntervals$low),
                     close  = tail(barIntervals, 1)$close,
                     volume = sum(barIntervals$volume))
    if (is.null(barSet)) {
      barSet = bar
    } else {
      barSet = rbind(barSet, bar)
    }
  }
  return(barSet)
}
EDIT:
Below is a row of my data. Each row represents a 1-minute interval; I am trying to aggregate these into arbitrary buckets, i.e. 5 minutes, 15 minutes, 60 minutes, 240 minutes, etc.
date time open high low close volume
2005-09-06 2005-09-06 16:33:00 1297.25 1297.50 1297.25 1297.25 98
You probably want to re-use existing facilities, specifically the POSIXct time types, as well as existing packages.
For example, look at the xts package --- it already has a generic function to.period() as well as convenience wrappers to.minutes(), to.minutes3(), to.minutes10(), ....
Here is an example from the help page:
R> example(to.minutes)
t.mn10R> data(sample_matrix)
t.mn10R> samplexts <- as.xts(sample_matrix)
t.mn10R> to.monthly(samplexts)
samplexts.Open samplexts.High samplexts.Low samplexts.Close
Jan 2007 50.0398 50.7734 49.7631 50.2258
Feb 2007 50.2245 51.3234 50.1910 50.7709
Mar 2007 50.8162 50.8162 48.2365 48.9749
Apr 2007 48.9441 50.3378 48.8096 49.3397
May 2007 49.3457 49.6910 47.5180 47.7378
Jun 2007 47.7443 47.9413 47.0914 47.7672
t.mn10R> to.monthly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
Jan 2007 50.0398 50.7734 49.7631 50.2258
Feb 2007 50.2245 51.3234 50.1910 50.7709
Mar 2007 50.8162 50.8162 48.2365 48.9749
Apr 2007 48.9441 50.3378 48.8096 49.3397
May 2007 49.3457 49.6910 47.5180 47.7378
Jun 2007 47.7443 47.9413 47.0914 47.7672
t.mn10R> str(to.monthly(samplexts))
An ‘xts’ object from Jan 2007 to Jun 2007 containing:
Data: num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "samplexts.Open" "samplexts.High" "samplexts.Low" "samplexts.Close"
Indexed by objects of class: [yearmon] TZ:
xts Attributes:
NULL
t.mn10R> str(to.monthly(sample_matrix))
num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" ...
..$ : chr [1:4] "sample_matrix.Open" "sample_matrix.High" "sample_matrix.Low" "sample_matrix.Close"
R>
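For the original 1-minute-to-5-minute task specifically, here is a sketch (assuming xts is installed) with made-up OHLCV data; to.period() buckets by actual timestamps, so gaps in the data are handled automatically:

```r
library(xts)

set.seed(1)
# Stand-in 1-minute OHLCV bars, with a deliberate gap after the 30th minute
idx <- as.POSIXct("2005-09-06 16:00:00") + 60 * c(0:29, 45:59)
px  <- cumsum(rnorm(length(idx))) + 1297
bars <- xts(cbind(Open = px, High = px + 0.25, Low = px - 0.25,
                  Close = px, Volume = 100), order.by = idx)

# Aggregate to 5-minute bars; to.period(bars, "minutes", k = 5) is equivalent
bars5 <- to.minutes5(bars)
head(bars5, 3)
```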