I was running a PCA on 35 rasters of environmental data (CliMond). Everything works fine, and I use this command in R to perform the PCA on the raster stack:
pca <- princomp(na.omit(values(s)), cor = TRUE)
All the rasters look fine, but each component explains exactly the same proportion of variance (0.029), so that in the end they sum to 1. This seems strange to me: I am used to results where, say, the first three PCA axes explain 50% of the variance and each subsequent component explains less and less. Is my result correct, or do I need to modify the princomp() call? This is what the output looks like:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.029
Cumulative Var 0.029 0.057 0.086 0.114 0.143 0.171 0.200 0.229 0.257
Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16 Comp.17
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.029
Cumulative Var 0.286 0.314 0.343 0.371 0.400 0.429 0.457 0.486
Comp.18 Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.029
Cumulative Var 0.514 0.543 0.571 0.600 0.629 0.657 0.686 0.714
Comp.26 Comp.27 Comp.28 Comp.29 Comp.30 Comp.31 Comp.32 Comp.33
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.029
Cumulative Var 0.743 0.771 0.800 0.829 0.857 0.886 0.914 0.943
Comp.34 Comp.35
SS loadings 1.000 1.000
Proportion Var 0.029 0.029
Cumulative Var 0.971 1.000
However, the eigenvalues (standard deviations here) are decreasing:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
4.129756e+00 2.608465e+00 1.679122e+00 1.514034e+00 1.380337e+00 1.196104e+00
Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12
8.529524e-01 7.653760e-01 5.784449e-01 5.179734e-01 4.731845e-01 3.555741e-01
Comp.13 Comp.14 Comp.15 Comp.16 Comp.17 Comp.18
2.995183e-01 2.672110e-01 2.287449e-01 1.907395e-01 1.776953e-01 1.665386e-01
Comp.19 Comp.20 Comp.21 Comp.22 Comp.23 Comp.24
1.547910e-01 1.389805e-01 1.216129e-01 1.167798e-01 1.106441e-01 8.282240e-02
Comp.25 Comp.26 Comp.27 Comp.28 Comp.29 Comp.30
7.930042e-02 7.283533e-02 6.605863e-02 6.310874e-02 4.944323e-02 4.318459e-02
Comp.31 Comp.32 Comp.33 Comp.34 Comp.35
3.272761e-02 2.591280e-02 2.395581e-02 1.479677e-02 1.191348e-07
I hope this is a relevant question! Thanks in advance for answering.
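For reference, the actual proportion of variance per component can be recovered directly from the reported standard deviations (a minimal sketch, assuming pca is the princomp object from the command above):

```r
# Proportion of variance per component, computed from the component
# standard deviations; these decrease, unlike the printed
# "Proportion Var" row of the loadings table.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)
cumsum(prop_var)
```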
Following Taylor and Tibshirani (2015), I'm applying the selectiveInference package in R after a lasso logit fit with glmnet. Specifically, I'm interested in inference for the lasso with a fixed lambda.
Below I report the code:
First, I standardized the X matrix (as suggested in the package documentation: https://cran.r-project.org/web/packages/selectiveInference/selectiveInference.pdf).
Then I fit glmnet and extracted the beta coefficients for a lambda previously chosen by LOOCV.
X.std <- std(X1[1:2833, ])
fit <- glmnet(X.std, Y[1:2833], alpha = 1, family = "binomial")
fit$lambda
lambda <- 0.00431814
n <- 2833
beta_hat <- coef(fit, x = X.std, y = Y[1:2833], s = lambda / n, exact = TRUE)
beta_hat
out <- fixedLassoInf(X.std, Y[1:2833], beta_hat, lambda, family = "binomial")
out
After I run the code, this is what I get. I understand it is something related to the KKT conditions, and that the problem is specific to the lasso logit: when I try family = "gaussian", I get no warnings or errors.
Warning message:
In fixedLogitLassoInf(x, y, beta, lambda, alpha = alpha, type = type, :
Solution beta does not satisfy the KKT conditions (to within specified tolerances)
> res
Call:
fixedLassoInf(x = X.std, y = Y[1:2833], beta = b, lambda = lam,
family = c("binomial"))
Testing results at lambda = 0.004, with alpha = 0.100
Var Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
1 58.558 6.496 0.000 46.078 124.807 0.049 0.050
2 -8.008 -2.815 0.005 -13.555 -3.106 0.049 0.049
3 -18.514 -6.580 0.000 -31.262 -14.153 0.049 0.048
4 -1.070 -0.390 0.447 -22.976 19.282 0.050 0.050
5 -0.320 -1.231 0.610 -0.660 1.837 0.050 0.000
6 -0.448 -1.906 0.619 -2.378 5.056 0.050 0.050
7 -47.732 -9.624 0.000 -161.370 -44.277 0.050 0.050
8 -39.023 -8.378 0.000 -54.988 -31.510 0.050 0.048
10 23.827 1.991 0.181 -20.151 42.867 0.049 0.049
11 -2.454 -0.522 0.087 -269.951 9.345 0.050 0.050
12 0.045 0.018 0.993 -Inf -14.962 0.000 0.050
13 -18.647 -1.143 0.156 -149.623 25.464 0.050 0.050
14 -3.508 -1.140 0.305 -8.444 7.000 0.049 0.049
15 -0.620 -0.209 0.846 -3.486 46.045 0.050 0.050
16 -3.960 -1.288 0.739 -6.931 47.641 0.049 0.050
17 -8.587 -3.010 0.023 -42.700 -2.474 0.050 0.049
18 2.851 0.986 0.031 2.745 196.728 0.050 0.050
19 -6.612 -1.258 0.546 -14.967 37.070 0.049 0.000
20 -11.621 -2.291 0.021 -29.558 -2.536 0.050 0.049
21 -76.957 -0.980 0.565 -186.701 483.180 0.049 0.050
22 -13.556 -5.053 0.000 -126.367 -13.274 0.050 0.049
23 -4.836 -0.388 0.519 -109.667 125.933 0.050 0.050
24 11.355 0.898 0.492 -55.335 30.312 0.050 0.049
25 -1.118 -0.146 0.919 -4.439 232.172 0.049 0.050
26 -7.776 -1.298 0.200 -17.540 8.006 0.050 0.049
27 0.678 0.234 0.515 -42.265 38.710 0.050 0.050
28 32.938 1.065 0.335 -77.314 82.363 0.050 0.049
Does anyone know how to resolve this warning?
In particular, I would like to understand which "tolerances" I should specify.
Thanks for the help.
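The tolerances referred to in the warning correspond to the tol.beta and tol.kkt arguments of fixedLassoInf(). Loosening them is one way to make the check pass (a sketch, assuming the objects from the question are in scope; whether looser tolerances are appropriate for your data is a separate judgment call):

```r
# fixedLassoInf() verifies the KKT conditions up to two tolerances:
#   tol.beta - tolerance for deciding which coefficients count as nonzero
#   tol.kkt  - tolerance for the KKT optimality check itself
# The package defaults are tol.beta = 1e-5 and tol.kkt = 0.1; loosening
# them may silence the warning at the cost of a less strict check.
out <- fixedLassoInf(X.std, Y[1:2833], beta_hat, lambda,
                     family = "binomial",
                     tol.beta = 1e-4, tol.kkt = 0.2)
```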
I am trying to fit a structural equation model that tests the structure of latent variables underlying a Big Five dataset found on Kaggle. More specifically, I would like to replicate a finding suggesting that common method variance (e.g., response biases) inflates the often-observed high intercorrelations between the manifest variables/items of the Big Five (Chang, Connelly, & Geeza, 2012).
big5_CFAmodel_cmv <-'EXTRA =~ EXT1 + EXT2 + EXT3 + EXT4 + EXT5 + EXT7 + EXT8 + EXT9 + EXT10
AGREE =~ AGR1 + AGR2 + AGR4 + AGR5 + AGR6 + AGR7 + AGR8 + AGR9 + AGR10
EMO =~ EST1 + EST2 + EST3 + EST5 + EST6 + EST7 + EST8 + EST9 + EST10
OPEN =~ OPN1 + OPN2 + OPN3 + OPN5 + OPN6 + OPN7 + OPN8 + OPN9 + OPN10
CON =~ CSN1 + CSN2 + CSN3 + CSN4 + CSN5 + CSN6 + CSN7 + CSN8 + CSN9
CMV =~ EXT1 + EXT2 + EXT3 + EXT4 + EXT5 + EXT7 + EXT8 + EXT9 + EXT10 + AGR1 + AGR2 + AGR4 + AGR5 + AGR6 + AGR7 + AGR8 + AGR9 + AGR10 + CSN1 + CSN2 + CSN3 + CSN4 + CSN5 + CSN6 + CSN7 + CSN8 + CSN9 + EST1 + EST2 + EST3 + EST5 + EST6 + EST7 + EST8 + EST9 + EST10 + OPN1 + OPN2 + OPN3 + OPN5 + OPN6 + OPN7 + OPN8 + OPN9 + OPN10 '
big5_CFA_cmv <- cfa(model = big5_CFAmodel_cmv,
data = big5, estimator = "MLR")
Here is my full code on Github. Now I get a warning from lavaan:
lavaan WARNING:
The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue
(= -4.921738e-07) is smaller than zero. This may be a symptom that
the model is not identified.
But when I run summary(big5_CFA_cmv, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE), lavaan appears to have ended normally and produced seemingly good fit statistics.
lavaan 0.6-8 ended normally after 77 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 150
Used Total
Number of observations 498 500
Model Test User Model:
Standard Robust
Test Statistic 2459.635 2262.490
Degrees of freedom 885 885
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.087
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 9934.617 8875.238
Degrees of freedom 990 990
P-value 0.000 0.000
Scaling correction factor 1.119
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.824 0.825
Tucker-Lewis Index (TLI) 0.803 0.805
Robust Comparative Fit Index (CFI) 0.830
Robust Tucker-Lewis Index (TLI) 0.810
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -31449.932 -31449.932
Scaling correction factor 1.208
for the MLR correction
Loglikelihood unrestricted model (H1) -30220.114 -30220.114
Scaling correction factor 1.105
for the MLR correction
Akaike (AIC) 63199.863 63199.863
Bayesian (BIC) 63831.453 63831.453
Sample-size adjusted Bayesian (BIC) 63355.347 63355.347
Root Mean Square Error of Approximation:
RMSEA 0.060 0.056
90 Percent confidence interval - lower 0.057 0.053
90 Percent confidence interval - upper 0.063 0.059
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA 0.058
90 Percent confidence interval - lower 0.055
90 Percent confidence interval - upper 0.061
Standardized Root Mean Square Residual:
SRMR 0.061 0.061
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
EXTRA =~
EXT1 1.000 0.455 0.372
EXT2 1.010 0.323 3.129 0.002 0.459 0.358
EXT3 0.131 0.301 0.434 0.664 0.059 0.049
EXT4 1.393 0.430 3.240 0.001 0.633 0.532
EXT5 0.706 0.168 4.188 0.000 0.321 0.263
EXT7 1.001 0.183 5.477 0.000 0.455 0.323
EXT8 1.400 0.545 2.570 0.010 0.637 0.513
EXT9 1.468 0.426 3.446 0.001 0.667 0.505
EXT10 1.092 0.335 3.258 0.001 0.497 0.387
AGREE =~
AGR1 1.000 0.616 0.486
AGR2 0.721 0.166 4.349 0.000 0.444 0.374
AGR4 1.531 0.205 7.479 0.000 0.944 0.848
AGR5 0.999 0.141 7.085 0.000 0.615 0.568
AGR6 1.220 0.189 6.464 0.000 0.752 0.661
AGR7 0.743 0.155 4.795 0.000 0.458 0.406
AGR8 0.836 0.126 6.614 0.000 0.515 0.502
AGR9 1.292 0.209 6.176 0.000 0.796 0.741
AGR10 0.423 0.124 3.409 0.001 0.261 0.258
EMO =~
EST1 1.000 0.856 0.669
EST2 0.674 0.063 10.626 0.000 0.577 0.485
EST3 0.761 0.059 12.831 0.000 0.651 0.580
EST5 0.646 0.081 7.970 0.000 0.552 0.444
EST6 0.936 0.069 13.542 0.000 0.801 0.661
EST7 1.256 0.128 9.805 0.000 1.075 0.880
EST8 1.298 0.131 9.888 0.000 1.111 0.883
EST9 0.856 0.071 11.997 0.000 0.733 0.602
EST10 0.831 0.085 9.744 0.000 0.711 0.545
OPEN =~
OPN1 1.000 0.593 0.518
OPN2 0.853 0.106 8.065 0.000 0.506 0.492
OPN3 1.064 0.205 5.186 0.000 0.631 0.615
OPN5 1.012 0.124 8.161 0.000 0.600 0.654
OPN6 1.039 0.204 5.085 0.000 0.616 0.553
OPN7 0.721 0.089 8.115 0.000 0.428 0.481
OPN8 0.981 0.077 12.785 0.000 0.582 0.474
OPN9 0.550 0.106 5.187 0.000 0.326 0.332
OPN10 1.269 0.200 6.332 0.000 0.753 0.772
CON =~
CSN1 1.000 0.779 0.671
CSN2 1.151 0.128 8.997 0.000 0.897 0.665
CSN3 0.567 0.068 8.336 0.000 0.442 0.437
CSN4 1.054 0.107 9.867 0.000 0.821 0.669
CSN5 0.976 0.083 11.749 0.000 0.760 0.593
CSN6 1.393 0.133 10.464 0.000 1.085 0.779
CSN7 0.832 0.082 10.175 0.000 0.648 0.583
CSN8 0.684 0.077 8.910 0.000 0.532 0.500
CSN9 0.938 0.075 12.535 0.000 0.730 0.574
CMV =~
EXT1 1.000 0.815 0.666
EXT2 1.074 0.091 11.863 0.000 0.875 0.683
EXT3 1.112 0.159 7.001 0.000 0.907 0.749
EXT4 0.992 0.090 11.067 0.000 0.809 0.679
EXT5 1.194 0.108 11.064 0.000 0.974 0.798
EXT7 1.253 0.069 18.133 0.000 1.021 0.725
EXT8 0.733 0.109 6.706 0.000 0.597 0.481
EXT9 0.857 0.105 8.136 0.000 0.698 0.529
EXT10 1.010 0.088 11.446 0.000 0.824 0.641
AGR1 0.047 0.142 0.328 0.743 0.038 0.030
AGR2 0.579 0.173 3.336 0.001 0.472 0.397
AGR4 -0.144 0.167 -0.859 0.390 -0.117 -0.105
AGR5 0.154 0.143 1.075 0.282 0.125 0.116
AGR6 -0.156 0.161 -0.971 0.332 -0.127 -0.112
AGR7 0.581 0.178 3.270 0.001 0.474 0.421
AGR8 0.224 0.123 1.820 0.069 0.183 0.178
AGR9 -0.043 0.145 -0.299 0.765 -0.035 -0.033
AGR10 0.540 0.137 3.935 0.000 0.440 0.436
CSN1 -0.109 0.143 -0.761 0.446 -0.089 -0.077
CSN2 -0.289 0.150 -1.931 0.054 -0.235 -0.175
CSN3 -0.064 0.114 -0.561 0.575 -0.052 -0.052
CSN4 0.041 0.166 0.246 0.806 0.033 0.027
CSN5 0.009 0.132 0.065 0.948 0.007 0.005
CSN6 -0.307 0.181 -1.694 0.090 -0.251 -0.180
CSN7 -0.206 0.132 -1.555 0.120 -0.168 -0.151
CSN8 0.102 0.137 0.741 0.459 0.083 0.078
CSN9 0.016 0.151 0.107 0.915 0.013 0.010
EST1 -0.063 0.167 -0.375 0.708 -0.051 -0.040
EST2 0.136 0.109 1.248 0.212 0.110 0.093
EST3 -0.103 0.165 -0.625 0.532 -0.084 -0.075
EST5 0.117 0.125 0.932 0.351 0.095 0.076
EST6 0.002 0.158 0.010 0.992 0.001 0.001
EST7 -0.253 0.239 -1.058 0.290 -0.206 -0.169
EST8 -0.216 0.243 -0.888 0.375 -0.176 -0.140
EST9 0.159 0.136 1.168 0.243 0.129 0.106
EST10 0.331 0.135 2.462 0.014 0.270 0.207
OPN1 -0.025 0.150 -0.169 0.866 -0.021 -0.018
OPN2 0.042 0.127 0.332 0.740 0.034 0.033
OPN3 -0.088 0.110 -0.799 0.424 -0.072 -0.070
OPN5 0.208 0.139 1.499 0.134 0.170 0.185
OPN6 -0.012 0.116 -0.102 0.919 -0.010 -0.009
OPN7 0.146 0.126 1.156 0.248 0.119 0.133
OPN8 -0.140 0.135 -1.036 0.300 -0.114 -0.093
OPN9 -0.074 0.103 -0.723 0.470 -0.060 -0.062
OPN10 0.035 0.138 0.250 0.802 0.028 0.029
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
EXTRA ~~
AGREE -0.096 0.036 -2.692 0.007 -0.342 -0.342
EMO -0.089 0.050 -1.782 0.075 -0.228 -0.228
OPEN -0.013 0.025 -0.534 0.594 -0.048 -0.048
CON -0.060 0.042 -1.440 0.150 -0.170 -0.170
CMV -0.063 0.081 -0.783 0.434 -0.171 -0.171
AGREE ~~
EMO -0.003 0.057 -0.059 0.953 -0.006 -0.006
OPEN 0.068 0.040 1.712 0.087 0.186 0.186
CON 0.085 0.047 1.818 0.069 0.177 0.177
CMV 0.239 0.046 5.185 0.000 0.476 0.476
EMO ~~
OPEN 0.040 0.042 0.957 0.338 0.079 0.079
CON 0.229 0.050 4.542 0.000 0.343 0.343
CMV 0.250 0.066 3.810 0.000 0.358 0.358
OPEN ~~
CON 0.058 0.044 1.308 0.191 0.125 0.125
CMV 0.098 0.069 1.412 0.158 0.202 0.202
CON ~~
CMV 0.185 0.072 2.576 0.010 0.291 0.291
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.EXT1 0.754 0.059 12.680 0.000 0.754 0.503
.EXT2 0.804 0.065 12.443 0.000 0.804 0.489
.EXT3 0.658 0.084 7.843 0.000 0.658 0.449
.EXT4 0.537 0.059 9.162 0.000 0.537 0.379
.EXT5 0.545 0.049 11.184 0.000 0.545 0.366
.EXT7 0.892 0.080 11.107 0.000 0.892 0.450
.EXT8 0.907 0.117 7.740 0.000 0.907 0.589
.EXT9 0.971 0.099 9.763 0.000 0.971 0.556
.EXT10 0.867 0.081 10.666 0.000 0.867 0.525
.AGR1 1.207 0.109 11.087 0.000 1.207 0.750
.AGR2 0.790 0.085 9.293 0.000 0.790 0.561
.AGR4 0.439 0.079 5.592 0.000 0.439 0.355
.AGR5 0.708 0.066 10.721 0.000 0.708 0.602
.AGR6 0.803 0.075 10.670 0.000 0.803 0.621
.AGR7 0.628 0.056 11.266 0.000 0.628 0.495
.AGR8 0.664 0.059 11.168 0.000 0.664 0.631
.AGR9 0.548 0.056 9.726 0.000 0.548 0.474
.AGR10 0.647 0.059 10.934 0.000 0.647 0.636
.EST1 0.935 0.080 11.644 0.000 0.935 0.571
.EST2 1.026 0.077 13.359 0.000 1.026 0.724
.EST3 0.869 0.070 12.409 0.000 0.869 0.689
.EST5 1.196 0.075 15.912 0.000 1.196 0.773
.EST6 0.826 0.067 12.380 0.000 0.826 0.562
.EST7 0.453 0.059 7.653 0.000 0.453 0.304
.EST8 0.457 0.065 7.044 0.000 0.457 0.289
.EST9 0.862 0.067 12.860 0.000 0.862 0.581
.EST10 0.986 0.074 13.395 0.000 0.986 0.579
.OPN1 0.964 0.098 9.828 0.000 0.964 0.735
.OPN2 0.792 0.070 11.309 0.000 0.792 0.750
.OPN3 0.670 0.085 7.903 0.000 0.670 0.635
.OPN5 0.413 0.039 10.466 0.000 0.413 0.490
.OPN6 0.866 0.099 8.780 0.000 0.866 0.696
.OPN7 0.574 0.048 11.944 0.000 0.574 0.725
.OPN8 1.181 0.094 12.627 0.000 1.181 0.784
.OPN9 0.863 0.083 10.424 0.000 0.863 0.894
.OPN10 0.376 0.051 7.358 0.000 0.376 0.395
.CSN1 0.774 0.079 9.836 0.000 0.774 0.574
.CSN2 1.082 0.099 10.961 0.000 1.082 0.595
.CSN3 0.837 0.072 11.594 0.000 0.837 0.820
.CSN4 0.817 0.067 12.117 0.000 0.817 0.542
.CSN5 1.063 0.077 13.728 0.000 1.063 0.646
.CSN6 0.856 0.089 9.613 0.000 0.856 0.442
.CSN7 0.850 0.065 13.025 0.000 0.850 0.688
.CSN8 0.817 0.057 14.298 0.000 0.817 0.721
.CSN9 1.079 0.077 13.982 0.000 1.079 0.667
EXTRA 0.207 0.141 1.467 0.142 1.000 1.000
AGREE 0.380 0.101 3.744 0.000 1.000 1.000
EMO 0.732 0.104 7.075 0.000 1.000 1.000
OPEN 0.352 0.098 3.603 0.000 1.000 1.000
CON 0.606 0.089 6.792 0.000 1.000 1.000
CMV 0.665 0.203 3.269 0.001 1.000 1.000
R-Square:
Estimate
EXT1 0.497
EXT2 0.511
EXT3 0.551
EXT4 0.621
EXT5 0.634
EXT7 0.550
EXT8 0.411
EXT9 0.444
EXT10 0.475
AGR1 0.250
AGR2 0.439
AGR4 0.645
AGR5 0.398
AGR6 0.379
AGR7 0.505
AGR8 0.369
AGR9 0.526
AGR10 0.364
EST1 0.429
EST2 0.276
EST3 0.311
EST5 0.227
EST6 0.438
EST7 0.696
EST8 0.711
EST9 0.419
EST10 0.421
OPN1 0.265
OPN2 0.250
OPN3 0.365
OPN5 0.510
OPN6 0.304
OPN7 0.275
OPN8 0.216
OPN9 0.106
OPN10 0.605
CSN1 0.426
CSN2 0.405
CSN3 0.180
CSN4 0.458
CSN5 0.354
CSN6 0.558
CSN7 0.312
CSN8 0.279
CSN9 0.333
However, there are some negative factor loadings on the common method variance (CMV) factor. Additionally, extraversion seems to correlate negatively with CMV.
What does this mean? And can I trust the fit statistics, or is my model misspecified?
First, let me clear up your misinterpretation of the warning message. It refers to the covariance matrix of the estimated parameters (i.e., vcov(big5_CFA_cmv), from which SEs are calculated as the square roots of the variances on the diagonal), not to the estimates themselves. Redundancy among estimates can indicate a lack of identification, which you can check empirically by saving the model-implied covariance matrix and fitting the same model to it.
MI_COV <- lavInspect(big5_CFA_cmv, "cov.ov")
summary(cfa(model = big5_CFAmodel_cmv,
            sample.cov = MI_COV,
            sample.nobs = nobs(big5_CFA_cmv)))
If your estimates change, that is evidence that your model is not identified. If the estimates remain the same, the empirical check is inconclusive (i.e., it might still not be identified, but the optimizer just found the same local solution that seemed stable enough to stop searching the parameter space; criteria for inferring convergence are not perfect).
Regarding your model specification, I doubt it is identified, because your CMV factor (on which all indicators load) is allowed to correlate with the trait factors (which are also allowed to correlate with one another). That contradicts the definition of a "method factor": something about the way the data were measured that is unrelated to what is being measured. Even when traits are orthogonal to methods, empirical identification becomes tenuous when traits and/or methods are allowed to correlate among themselves. Multitrait-multimethod (MTMM) models are notorious for such problems, as are many bifactor models (which typically have one trait and many method factors; your model resembles this, but reversed).
What does this mean?
Your negative (and most of your positive) CMV loadings are not significant. Varying around 0 (in both directions) is consistent with the null hypothesis that they are zero. More noteworthy (and related to my concern above) is that the CMV loadings are significant for all EXT indicators but only a few others (three AGR indicators and an EST indicator). The correlations between CMV and the traits really complicate the interpretation, as does the use of reference indicators. Before you interpret anything, I would recommend fixing all factor variances to 1 using std.lv = TRUE and making CMV orthogonal to the traits: EXTRA + AGREE + EMO + OPEN + CON ~~ 0*CMV.
However, I still anticipate problems from estimating so many model parameters with a relatively small sample of 500 (498 after listwise deletion). That is not nearly a large enough sample to reliably estimate 50*51/2 = 1275 (co)variances.
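The recommended respecification could look like this (a sketch, assuming the big5 data frame and the model string from the question are in scope; the appended line fixes all trait-CMV covariances to zero, and std.lv = TRUE identifies the factors by unit variance instead of reference indicators):

```r
# Same measurement model as before, plus an orthogonality constraint on CMV
big5_CFAmodel_cmv_orth <- paste(
  big5_CFAmodel_cmv,                            # original factor definitions
  "EXTRA + AGREE + EMO + OPEN + CON ~~ 0*CMV",  # CMV uncorrelated with traits
  sep = "\n"
)

big5_CFA_cmv_orth <- cfa(model = big5_CFAmodel_cmv_orth,
                         data = big5, estimator = "MLR",
                         std.lv = TRUE)  # fix all factor variances to 1
```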
Using princomp() I get an object with a "loadings" component. This "loadings" component is a composite object, and I would prefer to have the information in an ordinary matrix so that I can handle and manipulate it freely. In particular, I would like to extract the loadings in matrix format, but I cannot find a way to do so. Is this possible, and if so, how?
I want to use princomp() because it accepts a covariance matrix as input, which is easier to provide since my dataset is very large (7,000,000 x 650).
For a reproducible example, please see below:
> data(mtcars)
>
> princomp_mtcars = princomp(mtcars)
>
> loadings = princomp_mtcars$loadings
>
> names(loadings)
NULL
>
> class(loadings)
[1] "loadings"
>
> loadings
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
mpg 0.982 0.144
cyl 0.228 -0.239 0.794 0.425 0.189 0.132 0.145
disp -0.900 -0.435
hp -0.435 0.899
drat 0.133 -0.227 0.939 0.184
wt -0.128 0.244 0.127 -0.187 -0.156 0.391 0.830
qsec -0.886 0.214 0.190 0.255 0.103 -0.204
vs -0.177 -0.103 -0.684 0.303 0.626
am 0.136 -0.205 0.201 0.572 -0.163 0.733
gear 0.130 0.276 -0.335 0.802 -0.217 -0.156 0.204 -0.191
carb -0.103 0.269 0.855 0.284 -0.165 -0.128 -0.240
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.091 0.091 0.091 0.091 0.091 0.091 0.091 0.091 0.091 0.091 0.091
Cumulative Var 0.091 0.182 0.273 0.364 0.455 0.545 0.636 0.727 0.818 0.909 1.000
>
Your advice will be appreciated.
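For what it's worth, the "loadings" class mainly affects printing; the underlying object is already a numeric matrix, so stripping the class (or subsetting with empty indices) yields a plain matrix. A minimal sketch using the mtcars example above:

```r
data(mtcars)
princomp_mtcars <- princomp(mtcars)

# The "loadings" object is a numeric matrix with a special print method;
# unclass() strips the class, leaving an ordinary matrix.
loadings_matrix <- unclass(princomp_mtcars$loadings)
class(loadings_matrix)
dim(loadings_matrix)   # 11 x 11 for mtcars

# Equivalent alternative: subsetting with empty indices also drops the class
loadings_matrix2 <- princomp_mtcars$loadings[, ]
```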
Test1 Test2 Test3 Test4 Test5
0.048 0.552 0.544 1.0000 0.604
0.518 0.914 0.948 0.0520 0.596
0.868 0.944 0.934 0.3720 0.340
0.934 0.974 0.896 0.1530 0.090
0.792 0.464 0.096 0.7050 0.362
0.001 0.868 0.050 0.3690 0.380
0.814 0.286 0.162 0.0040 0.092
0.146 0.966 0.044 0.4660 0.726
0.970 0.862 0.001 0.4420 0.020
0.490 0.824 0.634 0.5720 0.018
0.378 0.974 0.002 0.0005 0.004
0.878 0.594 0.532 0.0420 0.366
1.000 1.000 1.000 0.3550 1.000
1.000 1.000 1.000 0.3170 1.000
1.000 1.000 1.000 0.3900 1.000
0.856 0.976 0.218 0.0005 0.001
1.000 1.000 1.000 0.4590 1.000
1.000 1.000 1.000 0.5770 1.000
0.640 0.942 0.766 0.0005 0.320
1.000 1.000 1.000 0.0830 1.000
0.260 0.968 0.032 0.0480 0.300
0.150 0.480 0.108 0.0005 0.008
0.686 0.150 0.400 0.0200 0.060
0.002 0.346 0.004 0.0005 0.098
I would like to make a 24 x 5 heatmap of these values, with colors set by my own thresholds. For example, I want all 1.000 values to be black; values 0.500-0.999 very light blue; values 0.300-0.499 slightly darker blue; 0.15-0.299 slightly darker; 0.10-0.149 slightly darker; 0.05-0.099 slightly darker; 0.025-0.049 slightly darker; and 0-0.024 the darkest. I have tried ggplot and heatmap(), but I can't figure out how to set the colors myself. Thanks!
The scale_fill_gradient2() function from ggplot2 should work just fine for this. The syntax for this portion of the plot would be
p + scale_fill_gradient2(low = 'black', high = 'dodgerblue', mid = 'blue',
                         midpoint = 0.500, limits = c(0, 1))
where p is the plot from your ggplot() + geom_tile() call [or whatever heatmap function you're using; note that geom_tile() colors tiles through the fill aesthetic].
The assumption here is that you don't mind a continuous scale from black to dodger blue, with values shifting along that gradient. If you instead want hard thresholds, first create a dummy variable that assigns each value to a bin, then use scale_fill_manual() with values = c(the colors you want) and breaks = c(the bins you created). Something like this [simplified, without the full number of buckets you've outlined]:
p <- ggplot(dataset, aes(x = x, y = y, fill = binned.var)) + geom_tile() +
  scale_fill_manual(values = c("darkblue", "blue", "dodgerblue", "black"),
                    breaks = c("bin1", "bin2", "bin3", "bin4"),
                    labels = c("0-0.024", "0.025-0.049", "0.05-0.099", "0.10-0.149"))
Hope that helps.
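The binning step itself can be done with cut(). A sketch under the assumption that the 24 x 5 table from the question is in a data frame df with columns Test1..Test5 (the breakpoints follow the thresholds listed there, darkest for the smallest values and black for 1.000):

```r
library(ggplot2)
library(reshape2)  # for melt(); tidyr::pivot_longer() works equally well

# df: 24 x 5 data frame of values as in the question
long <- melt(as.matrix(df), varnames = c("Row", "Test"))

# Bin each value at the thresholds from the question
long$bin <- cut(long$value,
                breaks = c(0, 0.024, 0.049, 0.099, 0.149,
                           0.299, 0.499, 0.999, 1),
                include.lowest = TRUE)

# One color per bin, from darkest blue up to black for 1.000
ggplot(long, aes(x = Test, y = Row, fill = bin)) +
  geom_tile() +
  scale_fill_manual(values = c("#08306b", "#2171b5", "#4292c6", "#6baed6",
                               "#9ecae1", "#c6dbef", "#deebf7", "black"))
```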
I have made a reproducible example where I am having trouble with pvclust. My goal is to pick the ideal clusters in a hierarchical cluster dendrogram. I've heard of pvclust but can't figure out how to use it. If anyone has other suggestions for determining the ideal clusters, that would also be really helpful.
My code is provided.
library(pvclust)
employee<- c('A','B','C','D','E','F','G','H','I',
'J','K','L','M','N','O','P',
'Q','R','S','T',
'U','V','W','X','Y','Z')
salary<-c(20,30,40,50,20,40,23,05,56,23,15,43,53,65,67,23,12,14,35,11,10,56,78,23,43,56)
testing90<-cbind(employee,salary)
testing90<-as.data.frame(testing90)
head(testing90)
testing90$salary<-as.numeric(testing90$salary)
row.names(testing90)<-testing90$employee
testing91<-data.frame(testing90[,-1])
head(testing91)
row.names(testing91)<-testing90$employee
d<-dist(as.matrix(testing91))
hc<-hclust(d,method = "ward.D2")
hc
plot(hc)
par(cex=0.6, mar=c(5, 8, 4, 1))
plot(hc, xlab="", ylab="", main="", sub="", axes=FALSE)
par(cex=1)
title(xlab="Publishers", main="Hierarchical Cluster of Publishers by eCPM")
axis(2)
fit<-pvclust(d, method.hclust="ward.D2", nboot=1000, method.dist="eucl")
An error came up stating:
Error in names(edges.cnt) <- paste("r", 1:rl, sep = "") :
'names' attribute [2] must be the same length as the vector [0]
A solution would be to force your object d into a matrix.
From the helpfile of pvclust:
data numeric data matrix or data frame.
Note that a dist object stores only the lower triangle of the distances; when you force it into a matrix with as.matrix(), it is expanded ("reflected" across the diagonal) into a full symmetric matrix. You can inspect the object that will actually be used with the call:
as.matrix(d)
This would be the call you are looking for:
pvclust(as.matrix(d), method.hclust="ward.D2", nboot=1000, method.dist="eucl")
#Bootstrap (r = 0.5)... Done.
#Bootstrap (r = 0.58)... Done.
#Bootstrap (r = 0.69)... Done.
#Bootstrap (r = 0.77)... Done.
#Bootstrap (r = 0.88)... Done.
#Bootstrap (r = 1.0)... Done.
#Bootstrap (r = 1.08)... Done.
#Bootstrap (r = 1.19)... Done.
#Bootstrap (r = 1.27)... Done.
#Bootstrap (r = 1.38)... Done.
#
#Cluster method: ward.D2
#Distance : euclidean
#
#Estimates on edges:
#
# au bp se.au se.bp v c pchi
#1 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#2 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#3 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#4 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#5 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#6 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#7 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#8 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#9 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#10 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#11 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#12 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#13 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#14 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#15 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#16 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#17 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#18 1.000 1.000 0.000 0.000 0.000 0.000 0.000
#19 0.853 0.885 0.022 0.003 -1.126 -0.076 0.058
#20 0.854 0.885 0.022 0.003 -1.128 -0.073 0.069
#21 0.861 0.897 0.022 0.003 -1.176 -0.090 0.082
#22 0.840 0.886 0.024 0.003 -1.100 -0.106 0.060
#23 0.794 0.690 0.023 0.005 -0.658 0.162 0.591
#24 0.828 0.686 0.020 0.005 -0.716 0.232 0.704
#25 1.000 1.000 0.000 0.000 0.000 0.000 0.000
Note that this fixes your call, but the validity of the clustering method and the quality of your data are for you to judge. Thanks for providing a reproducible example.
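To actually pick clusters from the result, pvclust ships plotting and selection helpers. A minimal sketch, assuming fit is the pvclust object returned by the corrected call above (alpha is the approximately-unbiased p-value threshold; 0.95 is a common choice):

```r
# Plot the dendrogram annotated with AU/BP p-values at each edge
plot(fit)

# Draw rectangles around clusters whose AU p-value exceeds 0.95,
# i.e. clusters strongly supported by the bootstrap
pvrect(fit, alpha = 0.95)

# Extract those supported clusters as a list of member labels
pvpick(fit, alpha = 0.95)
```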