I need some guidance regarding how changepoints work in time series. I am trying to detect some changepoints using R, and the package called "changepoint" (https://cran.r-project.org/web/packages/changepoint/changepoint.pdf).
There are options for how to detect when the variance (cpt.var) and the mean (cpt.mean) change, but what I'm trying to look for is when the time series changes trend.
Maybe I'm confused with what changepoints really are, but is there any way to get this information?
I am showing the result of using cpt.var() function, and I have added some arrows, showing what I would like to achieve.
Is there any way to achieve this? I guess should be somehow like inflection points...
I would appreciate any light on this.
Thanks beforehand,
Jon
EDIT
I have tried with the approach of using diff(), but is not detecting the change correctly:
The data I am using is the following:
[1] 10.695 10.715 10.700 10.665 10.830 10.830 10.800 11.070 11.145 11.270 11.015 11.060 10.945 10.965 10.780 10.735 10.705 10.680 10.600 10.335 10.220 10.125
[23] 10.370 10.595 10.680 11.000 10.980 11.065 11.060 11.355 11.445 11.415 11.350 11.310 11.330 11.360 11.445 11.335 11.275 11.300 11.295 11.470 11.445 11.325
[45] 11.300 11.260 11.200 11.210 11.230 11.240 11.300 11.250 11.285 11.215 11.260 11.395 11.410 11.235 11.320 11.475 11.470 11.685 11.740 11.740 11.700 11.905
[67] 11.720 12.230 12.285 12.505 12.410 11.995 12.110 12.005 11.915 11.890 11.820 11.730 11.700 11.660 11.685 11.615 11.360 11.425 11.185 11.275 11.265 11.375
[89] 11.310 11.250 11.050 10.880 10.775 10.775 10.805 10.755 10.595 10.700 10.585 10.510 10.290 10.255 10.395 10.290 10.425 10.405 10.365 10.010 10.305 10.185
[111] 10.400 10.700 10.725 10.875 10.750 10.760 10.905 10.680 10.670 10.895 10.790 10.990 10.925 10.980 10.975 11.035 10.895 10.985 11.035 11.295 11.245 11.535
[133] 11.510 11.430 11.450 11.390 11.520 11.585
And when I do diff() I get this data:
[1] 0.020 -0.015 -0.035 0.165 0.000 -0.030 0.270 0.075 0.125 -0.255 0.045 -0.115 0.020 -0.185 -0.045 -0.030 -0.025 -0.080 -0.265 -0.115 -0.095 0.245
[23] 0.225 0.085 0.320 -0.020 0.085 -0.005 0.295 0.090 -0.030 -0.065 -0.040 0.020 0.030 0.085 -0.110 -0.060 0.025 -0.005 0.175 -0.025 -0.120 -0.025
[45] -0.040 -0.060 0.010 0.020 0.010 0.060 -0.050 0.035 -0.070 0.045 0.135 0.015 -0.175 0.085 0.155 -0.005 0.215 0.055 0.000 -0.040 0.205 -0.185
[67] 0.510 0.055 0.220 -0.095 -0.415 0.115 -0.105 -0.090 -0.025 -0.070 -0.090 -0.030 -0.040 0.025 -0.070 -0.255 0.065 -0.240 0.090 -0.010 0.110 -0.065
[89] -0.060 -0.200 -0.170 -0.105 0.000 0.030 -0.050 -0.160 0.105 -0.115 -0.075 -0.220 -0.035 0.140 -0.105 0.135 -0.020 -0.040 -0.355 0.295 -0.120 0.215
[111] 0.300 0.025 0.150 -0.125 0.010 0.145 -0.225 -0.010 0.225 -0.105 0.200 -0.065 0.055 -0.005 0.060 -0.140 0.090 0.050 0.260 -0.050 0.290 -0.025
[133] -0.080 0.020 -0.060 0.130 0.065
What I get is the next results:
> cpt =cpt.mean(diff(vector), method="PELT")
> (cpt.pts <- attributes(cpt)$cpts)
[1] 137
Appearly does not make sense... Any clue?
In R, there are many packages available for time series changepoint detection. changepoint is definitely a very useful one. A partial list of the packages is summarized in CRAN Task View:
Change point detection is provided in strucchange (using linear regression models), and in trend (using nonparametric tests). The changepoint package provides many popular changepoint methods, and ecp does nonparametric changepoint detection for univariate and multivariate series. changepoint.np implements the nonparametric PELT algorithm, while changepoint.mv detects changepoints in multivariate time series. InspectChangepoint uses sparse projection to estimate changepoints in high-dimensional time series. robcp provides robust change-point detection using Huberized cusum tests, and Rbeast provides Bayesian change-point detection and time series decomposition.
Here is also a great blog comparing several alternative packages: https://www.marinedatascience.co/blog/2019/09/28/comparison-of-change-point-detection-methods/. Another impressive comparison is from Dr. Jonas Kristoffer LindelΓΈv who developed the mcp package: https://lindeloev.github.io/mcp/articles/packages.html.
Below I used your sample time series to generate some quick results using the Rbeast package developed by myself (chosen here apparently for ego of self-promoting as well as perceived relvance). Rbeast is a Baysian changepoint detection algorithm and it can estimate the probability of changepoint occurrence. It can also be used for decomposing time series into seasonality and trend, but apparently, your time series is trend-only, so in the beast function below, season='none' is specified.
y = c(10.695,10.715,10.700,10.665,10.830,10.830,10.800,11.070,11.145,11.270,11.015,11.060,10.945,10.965,10.780,10.735,10.705,
10.680,10.600,10.335,10.220,10.125,10.370,10.595,10.680,11.000,10.980,11.065,11.060,11.355,11.445,11.415,11.350,11.310,11.330,
11.360,11.445,11.335,11.275,11.300,11.295,11.470,11.445,11.325,11.300,11.260,11.200,11.210,11.230,11.240,11.300,11.250,11.285,
11.215,11.260,11.395,11.410,11.235,11.320,11.475,11.470,11.685,11.740,11.740,11.700,11.905,11.720,12.230,12.285,12.505,12.410,
11.995,12.110,12.005,11.915,11.890,11.820,11.730,11.700,11.660,11.685,11.615,11.360,11.425,11.185,11.275,11.265,11.375,11.310,
11.250,11.050,10.880,10.775,10.775,10.805,10.755,10.595,10.700,10.585,10.510,10.290,10.255,10.395,10.290,10.425,10.405,10.365,
10.010,10.305,10.185,10.400,10.700,10.725,10.875,10.750,10.760,10.905,10.680,10.670,10.895,10.790,10.990,10.925,10.980,10.975,
11.035,10.895,10.985,11.035,11.295,11.245,11.535 ,11.510,11.430,11.450,11.390,11.520,11.585)
library(Rbeast)
out=beast(y, season='none')
plot(out)
print(out)
In the figure above, dashed vertical lines mark the most likely locations of changepoints; the green curve of Pr(tcp) shows the point-wise probability of changepoint occurrence over time. The order_t curve gives the estimated mean order of the piecewise polynomials needed to adequately fit the trend (the 0-th order is constant and the 1st order is linear): An average order toward 0 means that the trend is more likely to be flat and an order close to 1 means that the trend is linear. The output can be also printed as some ascii outputs, as shown below. Again, it says that the time series is most likely to have 8 changepoints; their most probable locations are given in out$trend$cp.
Result for time series #1 (total number of time series in 'out': 1)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ SEASONAL CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
No seasonal/periodic component present (i.e., season='none')
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ TREND CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
An ascii plot of the probability dist for number of chgpts(ncp)
---------------------------------------------------------------
Pr(ncp=0 )=0.000|* |
Pr(ncp=1 )=0.000|* |
Pr(ncp=2 )=0.000|* |
Pr(ncp=3 )=0.000|* |
Pr(ncp=4 )=0.000|* |
Pr(ncp=5 )=0.000|* |
Pr(ncp=6 )=0.055|***** |
Pr(ncp=7 )=0.074|****** |
Pr(ncp=8 )=0.575|******************************************** |
Pr(ncp=9 )=0.240|******************* |
Pr(ncp=10)=0.056|***** |
---------------------------------------------------------------
Max ncp : 10 | A parameter you set (e.g., maxTrendKnotNum) |
Mode ncp: 8 | Pr(ncp= 8)=0.57; there is a 57.5% probability|
| that the trend componet has 8 chngept(s). |
Avg ncp : 8.17 | Sum[ncp*Pr(ncp)] |
---------------------------------------------------------------
List of most probable trend changepoints (avg number of changpts: 8.17)
--------------------------------.
tcp# |time (cp) |prob(cpPr)|
-----|---------------|----------|
1 |8.0000 | 0.92767|
2 |112.0000 | 0.91433|
3 |68.0000 | 0.84213|
4 |21.0000 | 0.80188|
5 |32.0000 | 0.78171|
6 |130.0000 | 0.76938|
7 |101.0000 | 0.66404|
8 |62.0000 | 0.61171|
--------------------------------'
If the signal isn't too noisy, you could use diff to detect changepoints in slope instead of mean:
library(changepoint)
set.seed(1)
slope <- rep(sample(10,10)-5,sample(100,10))
sig <- cumsum(slope)+runif(n=length(slope),min = -1, max = 1)
cpt =cpt.mean(diff(sig),method="PELT")
# Show change point
(cpt.pts <- attributes(cpt)$cpts)
#> [1] 58 109 206 312 367 440 447 520 599
plot(sig,type="l")
lines(x=cpt.pts,y=sig[cpt.pts],type="p",col="red",cex=2)
Another option which seems to work better with the data you provided is to use piecewise linear segmentation:
library(ifultools)
changepoints <- linearSegmentation(x=1:length(data),y=data,angle.tolerance = 90,n.fit=10,plot=T)
changepoints
#[1] 13 24 36 58 72 106
In order to replicate the results of a previous study, I am trying to apply a method of factor analysis of a matrix that is described in Horst (1965) as "basic structure with simultaneous factor solution".
How would I approach this method in R?
Given a matrix m, and providing for instance that I extract two factors, I have tried applying the following:
fa(r = cor(m), rotate = 'none', factors = 2)
but I don't think this approach is right.
Just found out.
Library(psych)
Principal(r= cor(m), rotate = " none ", nfactor= 2)
Does the job. Horst refers to what is also called an eigen value decomposition. It can also be done using eigen() and attaining the same result.
.. not really.. loadings seem pretty close but looking at the maths I am not sure the method described below is akin to eigen value decomposition in fact, looking more closely, the method is applied directly on the raw data and no product momentum calculations are required..
.. I am trying (slowly) to work out the maths myself and to understand what the computation instruction describes.
For your information, here is the standardized matrix that is used for the calculation carried out in the example in the original textbook:
0.444 0.627 1.458 1.754 2.967 2.585 0.970 0.616 0.853
2.648 2.563 1.950 -1.341 -1.015 -0.700 0.904 0.976 0.150
-0.104 -0.159 0.049 0.510 -0.378 -0.468 2.217 2.378 2.291
-0.970 -1.216 -1.129 -0.079 -0.378 -0.645 0.287 0.312 -2.266
-1.164 -1.060 -1.485 -1.878 -0.021 -0.530 -1.483 0.190 -0.429
-0.956 -1.122 -0.938 -1.282 -0.779 0.121 0.447 -1.565 -0.429
0.198 -0.242 -0.055 0.021 0.526 -1.528 -0.575 -1.244 -0.114
-0.035 -0.485 1.129 -0.014 -0.894 -0.316 -1.421 -0.705 -0.349
-1.050 0.786 -0.048 0.101 -0.354 -0.433 -0.298 -0.377 -0.256
0.298 0.197 -0.010 0.558 0.253 0.464 -0.284 -0.240 -0.031
0.568 0.367 -0.429 0.811 -0.007 0.786 -0.250 0.081 0.541
0.125 -0.256 -0.492 0.839 0.079 0.665 -0.513 -0.422 0.039
here are the computation instructions and examples
... I was wondering if this is just a standard approach in factor analysis or in pricipal component analysis.. and if so, which one? The introduction says that this method is rank reduction type solution in the sense that the major product of the factor score and factor loading matrices yields a residual which is precisely of rank equal to that of the original matrix less the number of factors.
This particular type of analysis is "direct" in the sense that is carried out directly on the raw data (at best it is the normalized matrix).
I have multiple xts objects that look something like this:
t x0 x1 x2 x3 x4
2000-01-01 0.397 0.262 -0.039 0.440 0.186
2000-01-02 0.020 -0.219 0.197 0.301 -0.300
2000-01-03 0.010 0.064 -0.034 0.214 -0.451
2000-01-04 -0.002 -0.483 -0.251 0.023 0.153
2000-01-05 0.451 0.375 0.566 0.258 -0.092
2000-01-06 0.411 0.219 0.108 0.137 -0.087
2000-01-07 0.111 -0.039 0.187 -0.271 0.346
2000-01-08 0.131 -0.078 0.163 -0.057 -0.313
2000-01-09 -0.409 -0.022 0.211 -0.297 0.217
.
.
.
Now, I am interested in finding out how well the average of the other variables explains each single x variable during each period π (e.g. monthly).
Or in other words, I'm interested in building a time series of the r-squared in the following regression:
xi,t = π½0 + π½1 avgi,t + πi,t
where avgi,t is be the average of all other variables at time t. And running that for each i π n variables and observations t π π (e.g. π is a month). I would like to be able to run this with any period π, where I believe xts might be able to help, since there are functions such as endpoints or apply.monthly?
I have years and years of data, and multiple of those xts objects, so the question is, what would be a sensible way to run those regressions, collecting the time series of r-squared values? A for-loop should be able to take care of doing that for each separate xts object, but I really don't think a for-loop would be very wise for each single object?