R principal() get eigenvalues of factors


I've done a PCA and the result looks something like this:
RC1 RC14 RC2 RC5 RC3 RC9 RC6 RC7 RC16 RC11 RC19 RC12 RC26 RC8 RC10 RC4 RC20 …
SS loadings 3.199 3.161 3.001 2.958 2.928 2.908 2.793 2.786 2.727 2.723 2.696 2.558 2.544 2.540 2.515 2.499 2.494 …
Proportion Var 0.005 0.005 0.005 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.004 …
Cumulative Var 0.005 0.010 0.014 0.019 0.023 0.027 0.032 0.036 0.040 0.044 0.048 0.052 0.056 0.060 0.063 0.067 0.071 …
As you can see the factors (RC1, RC14, etc.) aren't in the correct order.
To get the eigenvalues I can use fit$values and I'll get a list like this
[1] 4.9880983 4.3804479 3.4831868 3.4637441 3.1826873 2.9171613 2.7109790 2.7069910 2.6505181 2.5475078 2.5339040
[12] 2.5167436 2.4434298 2.4023438 2.3648536 2.3065183 2.2927025 2.2779793 2.2523245 2.2436222 2.2073776 2.1823970
[23] 2.1626319 2.1487751 2.1274126 2.0963421 2.0918373 2.0728735 2.0603362 2.0470462 2.0355974 2.0202679 2.0170792
[34] 2.0013015 1.9891380 1.9874788 …
Now I want the eigenvalues of those factors. Because the factors are not ordered, how can I match each factor to its eigenvalue? I guess RC1 has an eigenvalue of 4.9880983, but does RC14 have an eigenvalue of 4.3804479 or 2.4023438?

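You could install the factoextra package (the name is all lowercase), which has a lot of great tools. It lists each eigenvalue beside its PC axis ID, so there won't be any confusion. As far as I know, get_eigenvalue() handles fits from prcomp(), princomp() and FactoMineR-style PCA objects, so a fit from psych::principal() may first need to be refit with one of those.
library(factoextra)
eig.val <- get_eigenvalue(fit)
eig.val[1:8,] # prints the first 8 axes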
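One caveat about the matching itself: the RC prefix denotes rotated components, whereas fit$values holds the eigenvalues of the unrotated solution, so there is generally no one-to-one match between the two. The quantity that belongs to each rotated component is its "SS loadings" value, i.e. the column sum of squared rotated loadings. Below is a minimal base-R sketch of that relationship, assuming a varimax rotation. The mtcars data and k = 3 are arbitrary choices for illustration and do not come from the question.

```r
# Eigen-decomposition of a correlation matrix (built-in data, illustration only)
R <- cor(mtcars)
e <- eigen(R)
k <- 3  # number of components kept; arbitrary for this sketch

# Unrotated loadings: eigenvectors scaled by the square root of the eigenvalues
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
ss_unrot <- colSums(L^2)  # reproduces the first k eigenvalues

# Varimax rotation redistributes the variance across the components
L_rot <- unclass(varimax(L)$loadings)
ss_rot <- colSums(L_rot^2)  # per-component "SS loadings" after rotation

print(rbind(eigenvalue = e$values[1:k], ss_unrot = ss_unrot, ss_rot = ss_rot))
```

The rotation is orthogonal, so the total sum(ss_rot) equals the total of the first k eigenvalues, but the individual values differ. With a psych fit, the analogous per-component numbers should be obtainable from the squared entries of fit$loadings.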

Related

Changepoints detection in time series in R

I need some guidance regarding how changepoints work in time series. I am trying to detect some changepoints using R, and the package called "changepoint" (https://cran.r-project.org/web/packages/changepoint/changepoint.pdf).
There are options for how to detect when the variance (cpt.var) and the mean (cpt.mean) change, but what I'm trying to look for is when the time series changes trend.
Maybe I'm confused with what changepoints really are, but is there any way to get this information?
I am showing the result of using cpt.var() function, and I have added some arrows, showing what I would like to achieve.
Is there any way to achieve this? I guess it should be something like inflection points...
I would appreciate any light on this.
Thanks beforehand,
Jon
EDIT
I have tried the approach of using diff(), but it is not detecting the change correctly:
The data I am using is the following:
[1] 10.695 10.715 10.700 10.665 10.830 10.830 10.800 11.070 11.145 11.270 11.015 11.060 10.945 10.965 10.780 10.735 10.705 10.680 10.600 10.335 10.220 10.125
[23] 10.370 10.595 10.680 11.000 10.980 11.065 11.060 11.355 11.445 11.415 11.350 11.310 11.330 11.360 11.445 11.335 11.275 11.300 11.295 11.470 11.445 11.325
[45] 11.300 11.260 11.200 11.210 11.230 11.240 11.300 11.250 11.285 11.215 11.260 11.395 11.410 11.235 11.320 11.475 11.470 11.685 11.740 11.740 11.700 11.905
[67] 11.720 12.230 12.285 12.505 12.410 11.995 12.110 12.005 11.915 11.890 11.820 11.730 11.700 11.660 11.685 11.615 11.360 11.425 11.185 11.275 11.265 11.375
[89] 11.310 11.250 11.050 10.880 10.775 10.775 10.805 10.755 10.595 10.700 10.585 10.510 10.290 10.255 10.395 10.290 10.425 10.405 10.365 10.010 10.305 10.185
[111] 10.400 10.700 10.725 10.875 10.750 10.760 10.905 10.680 10.670 10.895 10.790 10.990 10.925 10.980 10.975 11.035 10.895 10.985 11.035 11.295 11.245 11.535
[133] 11.510 11.430 11.450 11.390 11.520 11.585
And when I do diff() I get this data:
[1] 0.020 -0.015 -0.035 0.165 0.000 -0.030 0.270 0.075 0.125 -0.255 0.045 -0.115 0.020 -0.185 -0.045 -0.030 -0.025 -0.080 -0.265 -0.115 -0.095 0.245
[23] 0.225 0.085 0.320 -0.020 0.085 -0.005 0.295 0.090 -0.030 -0.065 -0.040 0.020 0.030 0.085 -0.110 -0.060 0.025 -0.005 0.175 -0.025 -0.120 -0.025
[45] -0.040 -0.060 0.010 0.020 0.010 0.060 -0.050 0.035 -0.070 0.045 0.135 0.015 -0.175 0.085 0.155 -0.005 0.215 0.055 0.000 -0.040 0.205 -0.185
[67] 0.510 0.055 0.220 -0.095 -0.415 0.115 -0.105 -0.090 -0.025 -0.070 -0.090 -0.030 -0.040 0.025 -0.070 -0.255 0.065 -0.240 0.090 -0.010 0.110 -0.065
[89] -0.060 -0.200 -0.170 -0.105 0.000 0.030 -0.050 -0.160 0.105 -0.115 -0.075 -0.220 -0.035 0.140 -0.105 0.135 -0.020 -0.040 -0.355 0.295 -0.120 0.215
[111] 0.300 0.025 0.150 -0.125 0.010 0.145 -0.225 -0.010 0.225 -0.105 0.200 -0.065 0.055 -0.005 0.060 -0.140 0.090 0.050 0.260 -0.050 0.290 -0.025
[133] -0.080 0.020 -0.060 0.130 0.065
These are the results I get:
> cpt =cpt.mean(diff(vector), method="PELT")
> (cpt.pts <- attributes(cpt)$cpts)
[1] 137
Apparently this does not make sense... Any clue?

In R, there are many packages available for time series changepoint detection. changepoint is definitely a very useful one. A partial list of the packages is summarized in CRAN Task View:
Change point detection is provided in strucchange (using linear regression models), and in trend (using nonparametric tests). The changepoint package provides many popular changepoint methods, and ecp does nonparametric changepoint detection for univariate and multivariate series. changepoint.np implements the nonparametric PELT algorithm, while changepoint.mv detects changepoints in multivariate time series. InspectChangepoint uses sparse projection to estimate changepoints in high-dimensional time series. robcp provides robust change-point detection using Huberized cusum tests, and Rbeast provides Bayesian change-point detection and time series decomposition.
Here is also a great blog comparing several alternative packages: https://www.marinedatascience.co/blog/2019/09/28/comparison-of-change-point-detection-methods/. Another impressive comparison is from Dr. Jonas Kristoffer Lindeløv who developed the mcp package: https://lindeloev.github.io/mcp/articles/packages.html.
Below I use your sample time series to generate some quick results with the Rbeast package, which I developed myself (chosen here, admittedly, partly for self-promotion and partly for its perceived relevance). Rbeast is a Bayesian changepoint detection algorithm that can estimate the probability of changepoint occurrence. It can also decompose a time series into seasonality and trend, but your time series appears to be trend-only, so season='none' is specified in the beast call below.
y = c(10.695,10.715,10.700,10.665,10.830,10.830,10.800,11.070,11.145,11.270,11.015,11.060,10.945,10.965,10.780,10.735,10.705,
10.680,10.600,10.335,10.220,10.125,10.370,10.595,10.680,11.000,10.980,11.065,11.060,11.355,11.445,11.415,11.350,11.310,11.330,
11.360,11.445,11.335,11.275,11.300,11.295,11.470,11.445,11.325,11.300,11.260,11.200,11.210,11.230,11.240,11.300,11.250,11.285,
11.215,11.260,11.395,11.410,11.235,11.320,11.475,11.470,11.685,11.740,11.740,11.700,11.905,11.720,12.230,12.285,12.505,12.410,
11.995,12.110,12.005,11.915,11.890,11.820,11.730,11.700,11.660,11.685,11.615,11.360,11.425,11.185,11.275,11.265,11.375,11.310,
11.250,11.050,10.880,10.775,10.775,10.805,10.755,10.595,10.700,10.585,10.510,10.290,10.255,10.395,10.290,10.425,10.405,10.365,
10.010,10.305,10.185,10.400,10.700,10.725,10.875,10.750,10.760,10.905,10.680,10.670,10.895,10.790,10.990,10.925,10.980,10.975,
11.035,10.895,10.985,11.035,11.295,11.245,11.535 ,11.510,11.430,11.450,11.390,11.520,11.585)
library(Rbeast)
out=beast(y, season='none')
plot(out)
print(out)
In the figure above, dashed vertical lines mark the most likely locations of changepoints; the green curve of Pr(tcp) shows the point-wise probability of changepoint occurrence over time. The order_t curve gives the estimated mean order of the piecewise polynomials needed to adequately fit the trend (order 0 is constant and order 1 is linear): an average order close to 0 means that the trend is more likely to be flat, and an order close to 1 means that the trend is linear. The output can also be printed as ASCII text, as shown below. Again, it says that the time series is most likely to have 8 changepoints; their most probable locations are given in out$trend$cp.
Result for time series #1 (total number of time series in 'out': 1)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ SEASONAL CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
No seasonal/periodic component present (i.e., season='none')
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ TREND CHANGEPOINTS +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
An ascii plot of the probability dist for number of chgpts(ncp)
---------------------------------------------------------------
Pr(ncp=0 )=0.000|* |
Pr(ncp=1 )=0.000|* |
Pr(ncp=2 )=0.000|* |
Pr(ncp=3 )=0.000|* |
Pr(ncp=4 )=0.000|* |
Pr(ncp=5 )=0.000|* |
Pr(ncp=6 )=0.055|***** |
Pr(ncp=7 )=0.074|****** |
Pr(ncp=8 )=0.575|******************************************** |
Pr(ncp=9 )=0.240|******************* |
Pr(ncp=10)=0.056|***** |
---------------------------------------------------------------
Max ncp : 10 | A parameter you set (e.g., maxTrendKnotNum) |
Mode ncp: 8 | Pr(ncp= 8)=0.57; there is a 57.5% probability|
| that the trend componet has 8 chngept(s). |
Avg ncp : 8.17 | Sum[ncp*Pr(ncp)] |
---------------------------------------------------------------
List of most probable trend changepoints (avg number of changpts: 8.17)
--------------------------------.
tcp# |time (cp) |prob(cpPr)|
-----|---------------|----------|
1 |8.0000 | 0.92767|
2 |112.0000 | 0.91433|
3 |68.0000 | 0.84213|
4 |21.0000 | 0.80188|
5 |32.0000 | 0.78171|
6 |130.0000 | 0.76938|
7 |101.0000 | 0.66404|
8 |62.0000 | 0.61171|
--------------------------------'

If the signal isn't too noisy, you could use diff to detect changepoints in slope instead of mean:
library(changepoint)
set.seed(1)
slope <- rep(sample(10,10)-5,sample(100,10))
sig <- cumsum(slope)+runif(n=length(slope),min = -1, max = 1)
cpt =cpt.mean(diff(sig),method="PELT")
# Show change point
(cpt.pts <- attributes(cpt)$cpts)
#> [1] 58 109 206 312 367 440 447 520 599
plot(sig,type="l")
lines(x=cpt.pts,y=sig[cpt.pts],type="p",col="red",cex=2)
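To see why the diff trick works, here is a noise-free base-R sketch with made-up slopes and segment lengths: the first difference of a piecewise-linear signal is piecewise constant, so a change in slope shows up as a change in the mean of diff(sig).

```r
# Piecewise-linear signal with slopes 2, -1, 3 on segments of length 5, 4, 6
# (values are invented purely for illustration)
slope <- rep(c(2, -1, 3), times = c(5, 4, 6))
sig <- cumsum(slope)

# The first difference recovers the slope of each step
d <- diff(sig)

# Without noise, the kinks are exactly where the difference changes level;
# + 1L maps positions in d back to positions in sig
kinks <- which(diff(d) != 0) + 1L
print(kinks)
```

With real, noisy data the differences are dominated by noise, which is why a mean-change detector on diff() can miss slope changes. As far as I know, the cpts slot always ends with the series length, so a single value equal to length(diff(vector)), such as the 137 in the question, would mean that no interior changepoint was found.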
Another option which seems to work better with the data you provided is to use piecewise linear segmentation:
library(ifultools)
# data is the series from the question
changepoints <- linearSegmentation(x=seq_along(data),y=data,angle.tolerance = 90,n.fit=10,plot=TRUE)
changepoints
#[1] 13 24 36 58 72 106

Binning by equal standard deviation R

I have a vector containing some data, in particular
tau_3[p_3<3]
[1] 7.837 7.813 6.276 8.669 7.001 6.032 6.897 5.967 9.417 8.251 7.892 8.752 9.873 9.461 8.591 7.697 8.372 9.324 9.135 7.807
[21] 10.034 10.701 9.315 6.979 9.843 8.742 8.829 7.406 8.588 6.803 7.462 8.379 8.075 8.294 8.218
which has to be studied with respect to another set of datapoints
>p_3[p_3<3]
[1] 0.020 0.021 0.022 0.023 0.024 0.026 0.028 0.014 0.029 0.030 0.033 0.035 0.037 0.040 0.042 0.044 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085
[25] 0.090 0.100 0.110 0.120 0.130 0.150 0.160 0.190 0.200 0.230 0.240
I would like to divide the pressure p_3 data (the subset given above) in such a way that each bin has, more or less, the same standard deviation for the decay time \tau_3 data that it contains. In particular, I should end up with a vector containing the breaks for such binned data.
I don't know of any package that could do this and I've been scratching my head on how to do it for hours. If you could give me a solution I would be very grateful.
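A possible starting point, sketched in base R: a small helper that reports the standard deviation of tau_3 within each pressure bin for a candidate set of breaks, so different break vectors can be compared. This does not solve the equal-standard-deviation problem by itself. The helper name bin_sd, the choice of k = 5 bins and the equal-count quantile breaks are all assumptions made for illustration.

```r
tau <- c(7.837, 7.813, 6.276, 8.669, 7.001, 6.032, 6.897, 5.967, 9.417, 8.251,
         7.892, 8.752, 9.873, 9.461, 8.591, 7.697, 8.372, 9.324, 9.135, 7.807,
         10.034, 10.701, 9.315, 6.979, 9.843, 8.742, 8.829, 7.406, 8.588, 6.803,
         7.462, 8.379, 8.075, 8.294, 8.218)
p <- c(0.020, 0.021, 0.022, 0.023, 0.024, 0.026, 0.028, 0.014, 0.029, 0.030,
       0.033, 0.035, 0.037, 0.040, 0.042, 0.044, 0.050, 0.055, 0.060, 0.065,
       0.070, 0.075, 0.080, 0.085, 0.090, 0.100, 0.110, 0.120, 0.130, 0.150,
       0.160, 0.190, 0.200, 0.230, 0.240)

# Hypothetical helper: count and standard deviation of tau per pressure bin
bin_sd <- function(p, tau, breaks) {
  bins <- cut(p, breaks = breaks, include.lowest = TRUE)
  data.frame(bin = levels(bins),
             n   = as.vector(table(bins)),
             sd  = as.vector(tapply(tau, bins, sd)))
}

# Candidate breaks: equal-count bins from the quantiles of p (k is arbitrary)
k <- 5
breaks <- quantile(p, probs = seq(0, 1, length.out = k + 1))
res <- bin_sd(p, tau, breaks)
print(res)
```

From here one could adjust the breaks, by hand or with an optimiser, to reduce the spread of res$sd across bins.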

How to calculate normalised ratio indices in all possible band combinations and then correlation with environmental variables in R?

I want to calculate normalised ratio and simple ratio indices in all possible band combinations and then want to correlate with environmental variables in R. Then I want to identify the best combination giving the highest correlation. The implementation is available in hsdar package in R. But it is very slow for a large dataset. I am attaching a small dataset here
Environmental_variable WV_400 WV_401 WV_402 WV_403 WV_404 WV_405
95.512 0.035 0.034 0.034 0.034 0.034 0.034
97.900 0.047 0.047 0.047 0.046 0.046 0.046
92.897 0.004 0.004 0.006 0.008 0.009 0.009
94.209 0.011 0.012 0.013 0.016 0.017 0.017
87.472 0.010 0.010 0.011 0.014 0.015 0.015
91.109 0.010 0.011 0.013 0.015 0.016 0.016
92.830 0.024 0.025 0.026 0.028 0.029 0.029
I am giving one example code from hsdar package for reference
library(hsdar)
data(spectral_data)
## Calculate normalised ratio indices in all possible combinations
nri_WV <- nri(spectral_data, recursive = TRUE)
## Build glm-models
glmnri <- glm.nri(nri_WV ~ chlorophyll, preddata = spectral_data)
## Return best 10 models
BM <- nri_best_performance(glmnri, n = 10, coefficient = "p.value")
Any help in the form of R code as a fast alternative to the hsdar package is highly appreciated.
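One way this could be vectorised in base R is sketched below on the small dataset from the question. It assumes the usual definition of the normalised ratio index for bands i and j, (b_i - b_j) / (b_i + b_j), and ranks band pairs by the absolute Pearson correlation with the environmental variable. The column name env and the object names are made up for illustration.

```r
dat <- data.frame(
  env    = c(95.512, 97.900, 92.897, 94.209, 87.472, 91.109, 92.830),
  WV_400 = c(0.035, 0.047, 0.004, 0.011, 0.010, 0.010, 0.024),
  WV_401 = c(0.034, 0.047, 0.004, 0.012, 0.010, 0.011, 0.025),
  WV_402 = c(0.034, 0.047, 0.006, 0.013, 0.011, 0.013, 0.026),
  WV_403 = c(0.034, 0.046, 0.008, 0.016, 0.014, 0.015, 0.028),
  WV_404 = c(0.034, 0.046, 0.009, 0.017, 0.015, 0.016, 0.029),
  WV_405 = c(0.034, 0.046, 0.009, 0.017, 0.015, 0.016, 0.029)
)

bands <- as.matrix(dat[, -1])
pairs <- combn(ncol(bands), 2)  # every unordered band pair, one per column

# Compute the index for all pairs at once (rows = samples, columns = pairs)
b1 <- bands[, pairs[1, ], drop = FALSE]
b2 <- bands[, pairs[2, ], drop = FALSE]
nri <- (b1 - b2) / (b1 + b2)

# Correlate every index column with the environmental variable;
# a pair of identical bands gives a constant index and therefore NA
r <- as.vector(suppressWarnings(cor(nri, dat$env)))

res <- data.frame(band_i = colnames(bands)[pairs[1, ]],
                  band_j = colnames(bands)[pairs[2, ]],
                  r = r)
res <- res[order(-abs(res$r)), ]  # strongest correlation first, NA last
print(head(res, 3))
```

A simple ratio index can be handled the same way by replacing the nri line with b1 / b2. For thousands of bands the number of pairs grows quadratically, so processing the pairs in chunks may be needed to keep memory use manageable.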

Convert column headers into new columns

My data frame consists of time series financial data from many public companies. I purposely set companies' weights as their column headers while cleaning the data, and I also calculated log returns for each of them in order to calculate weighted returns in the next step.
Here is an example. There are four companies: A, B, C and D, and their corresponding weights in the portfolio are 0.4, 0.3, 0.2 and 0.1 respectively. So the current data set looks like:
df1 <- data.frame(matrix(vector(),ncol=9, nrow = 4))
colnames(df1) <- c("Date","0.4","0.4.Log","0.3","0.3.Log","0.2","0.2.Log","0.1","0.1.Log")
df1[1,] <- c("2004-10-29","103.238","0","131.149","0","99.913","0","104.254","0")
df1[2,] <- c("2004-11-30","104.821","0.015","138.989","0.058","99.872","0.000","103.997","-0.002")
df1[3,] <- c("2004-12-31","105.141","0.003","137.266","-0.012","99.993","0.001","104.025","0.000")
df1[4,] <- c("2005-01-31","107.682","0.024","137.08","-0.001","99.782","-0.002","105.287","0.012")
df1
Date 0.4 0.4.Log 0.3 0.3.Log 0.2 0.2.Log 0.1 0.1.Log
1 2004-10-29 103.238 0 131.149 0 99.913 0 104.254 0
2 2004-11-30 104.821 0.015 138.989 0.058 99.872 0.000 103.997 -0.002
3 2004-12-31 105.141 0.003 137.266 -0.012 99.993 0.001 104.025 0.000
4 2005-01-31 107.682 0.024 137.08 -0.001 99.782 -0.002 105.287 0.012
I want to create new columns that contain company weights so that I can calculate weighted returns in my next step:
Date 0.4 0.4.W 0.4.Log 0.3 0.3.W 0.3.Log 0.2 0.2.W 0.2.Log 0.1 0.1.W 0.1.Log
1 2004-10-29 103.238 0.400 0.000 131.149 0.300 0.000 99.913 0.200 0.000 104.254 0.100 0.000
2 2004-11-30 104.821 0.400 0.015 138.989 0.300 0.058 99.872 0.200 0.000 103.997 0.100 -0.002
3 2004-12-31 105.141 0.400 0.003 137.266 0.300 -0.012 99.993 0.200 0.001 104.025 0.100 0.000
4 2005-01-31 107.682 0.400 0.024 137.080 0.300 -0.001 99.782 0.200 -0.002 105.287 0.100 0.012

We can try
v1 <- grep("^[0-9.]+$", names(df1), value = TRUE)
df1[paste0(v1, ".W")] <- as.list(as.numeric(v1))
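A follow-up sketch, assuming numeric columns and a two-company subset of the example for brevity: it adds the weight columns, reorders them to sit next to their price columns as in the desired output, and computes a weighted return per row. The column name weighted is made up for illustration.

```r
# Two-company subset of the example, built with numeric columns
df1 <- data.frame(
  Date      = c("2004-10-29", "2004-11-30"),
  `0.4`     = c(103.238, 104.821),
  `0.4.Log` = c(0, 0.015),
  `0.3`     = c(131.149, 138.989),
  `0.3.Log` = c(0, 0.058),
  check.names = FALSE
)

# Columns whose names are purely numeric are the weights
v1 <- grep("^[0-9.]+$", names(df1), value = TRUE)
df1[paste0(v1, ".W")] <- as.list(as.numeric(v1))

# Reorder to price, weight, log return for each company
ord <- c("Date", as.vector(rbind(v1, paste0(v1, ".W"), paste0(v1, ".Log"))))
df1 <- df1[ord]

# Weighted return per row: sum over companies of weight * log return
df1$weighted <- as.vector(as.matrix(df1[paste0(v1, ".Log")]) %*% as.numeric(v1))
print(df1)
```

Note that the df1 built in the question holds character columns, because each row was filled from a character vector, so the numeric columns would need converting with as.numeric() before any arithmetic.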

interpolate data series with R

I am having trouble interpolating the values of two data series. I have a reference time in the first column. The second column holds the times at which the P130 values (third column) were recorded. I want to interpolate new values of P130 at the reference times.
The reference time and timeP130 share the same first and last values, and both have variable step sizes, so there is no pattern.
Reference_time timeP130 P130 results
0.0001 0.0001 0.2194 0.2194
0.000694 0.003 0.25 0.22552
0.00138889 0.0035 0.26 0.23164
0.00208333 0.006 0.24 0.23776
0.00277778 0.009 0.245 0.24388
0.003 0.009 0.255 0.25
0.00416667 0.0125 0.27 ETC
0.00486111 0.015 0.21
0.00555556 0.018 0.20
0.00625 0.0208 0.2194
0.00694444 0.021 0.2194
0.00763889 0.0211 0.2194
0.00833333 0.0215 0.2194
0.00902778 0.022 0.2195
0.00972222 0.0327 0.2591
0.0104167 0.0433 0.3664
0.0111111 0.0839 0.4068
0.0118056 2.5 0.4087
0.0125 0.27
0.0141944
0.0158889
0.0165833
0.0182778
2.5 0.4087
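One possible approach is linear interpolation with base R's approx(), sketched here on the first few rows of the table. The variable names are made up, and linear interpolation over time is an assumption about what is wanted, so the numbers need not coincide with the results column shown above.

```r
# First few (time, value) pairs of the P130 series from the table
time_p130 <- c(0.0001, 0.003, 0.0035, 0.006)
p130      <- c(0.2194, 0.25, 0.26, 0.24)

# Reference times at which P130 values are wanted
ref_time <- c(0.0001, 0.000694, 0.00138889, 0.00208333, 0.00277778, 0.003)

# Linear interpolation of P130 at the reference times;
# ties = mean averages P130 where the same time appears more than once
res <- approx(x = time_p130, y = p130, xout = ref_time, ties = mean)
out <- data.frame(Reference_time = res$x, P130_interp = res$y)
print(out)
```

By default approx() returns NA for reference times outside the range of timeP130; passing rule = 2 would instead carry the nearest end value outward.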
