I am trying to compute omega estimates after exploratory factor analysis to estimate the reliability of the components I've found. Using the omega() function from the psych package I get this output:
Output for omega function
Alpha: 0.8
G.6: 0.86
Omega Hierarchical: 0.37
Omega H asymptotic: 0.43
Omega Total 0.86
Schmid Leiman Factor loadings greater than 0.2
g F1* F2* F3* h2 u2 p2
EMS1 0.30 0.71 0.59 0.41 0.15
EMS3 -0.21 0.64 0.53 0.47 0.05
EMS4 0.62 0.41 0.59 0.04
EMS7 0.34 0.62 0.50 0.50 0.23
EMS8 0.36 0.42 0.32 0.68 0.40
EMS9 0.57 0.33 0.67 0.00
EMS10 0.39 0.20 0.80 0.11
EMS11 0.72 0.51 0.49 0.02
EMS12 0.68 0.46 0.54 0.02
EMS15 0.54 -0.24 0.41 0.59 0.02
EMS16 0.22 0.77 0.63 0.37 0.08
EMS19 0.65 0.52 0.48 0.01
EMS20 0.27 0.53 0.36 0.64 0.21
EMS21 0.62 0.40 0.60 0.04
EMS23 0.63 0.42 0.58 0.07
EMS24 0.68 0.45 0.55 1.02
EMS25 0.73 0.56 0.44 0.95
EMS27 0.45 0.20 0.25 0.75 0.83
EMS28 0.78 0.59 0.41 1.02
EMS34 0.26 0.31 0.48 0.34 0.66 0.20
With eigenvalues of:
g F1* F2* F3*
2.5 3.4 2.9 0.0
general/max 0.73 max/min = Inf
mean percent general = 0.27 with sd = 0.36 and cv of 1.33
Explained Common Variance of the general factor = 0.28
The degrees of freedom are 133 and the fit is 0.8
The number of observations was 601 with Chi Square = 471.81 with prob < 1.9e-39
The root mean square of the residuals is 0.04
The df corrected root mean square of the residuals is 0.05
RMSEA index = 0.066 and the 10 % confidence intervals are 0.059 0.072
BIC = -379.21
Compare this with the adequacy of just a general factor and no group factors
The degrees of freedom for just the general factor are 170 and the fit is 5.4
The number of observations was 601 with Chi Square = 3195.63 with prob < 0
The root mean square of the residuals is 0.22
The df corrected root mean square of the residuals is 0.24
RMSEA index = 0.173 and the 10 % confidence intervals are 0.167 0.177
BIC = 2107.87
Measures of factor score adequacy
g F1* F2* F3*
Correlation of scores with factors 0.9 0.94 0.93 0
Multiple R square of scores with factors 0.8 0.89 0.86 0
Minimum correlation of factor score estimates 0.6 0.78 0.73 -1
Total, General and Subset omega for each subset
g F1* F2* F3*
Omega total for total scores and subscales 0.86 0.82 0.85 NA
Omega general for total scores and subscales 0.37 0.08 0.34 NA
Omega group for total scores and subscales 0.58 0.75 0.51 NA
Warning messages:
1: In fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, :
A loading greater than abs(1) was detected. Examine the loadings carefully.
2: In fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, :
An ultra-Heywood case was detected. Examine the results carefully
3: In cov2cor(t(w) %*% r %*% w) :
diag(.) had 0 or NA entries; non-finite result is doubtful
This is how I am calling the function:
omega(df[,items],nfactors=3)
After searching for guidance, I could not find out why omega was not computed for the 3rd factor. I am not sure whether it is related to one of the warning messages shown above.
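This could be an identification issue rather than a bug. Note that omega() fits an exploratory factor model followed by a Schmid-Leiman transformation (omegaSem() is the confirmatory variant). In your output the third group factor has an eigenvalue of 0.0 and no item loads on it, so there is no item subset for it and you wouldn't expect omega to be calculated for it.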
I'm running a multinomial logistic regression. The outcome has four categories and there are three predictors: Male (= 1 for male), the number of books in the home on a 5-point scale, and a continuous measure of motivation to read. Here are the essential aspects of the code. I'm following the "K" parameterization in https://mc-stan.org/docs/stan-users-guide/multi-logit.html. Thanks.
DOWELL <- as.factor(DOWELL)
Canadareg2 <- data.frame(Canadareg2)
n <- nrow(Canadareg2)
f <- as.formula("DOWELL ~ Male + booksHome + motivRead")
m <- model.matrix(f,Canadareg2)
data.list <- list(n=nrow(Canadareg2),
k=length(unique(Canadareg2[,1])),
d=ncol(m),x=m, Male=Male, booksHome=booksHome,
motivRead=motivRead,DOWELL=as.numeric(Canadareg2[,1]))
ReadMultiNom <- "
data {
  int<lower = 2> k; // The variable has at least two categories
  int<lower = 1> d; // number of columns in the model matrix x (intercept + predictors)
  int<lower = 0> n; // number of observations
  vector[n] Male;
  vector[n] booksHome;
  vector[n] motivRead;
  int <lower=1, upper=k> DOWELL[n];
  matrix[n, d] x;
}
parameters {
  matrix[d, k] beta; // rows follow the columns of x, columns follow the outcome categories
}
transformed parameters {
  matrix[n, k] x_beta= x * beta;
}
model {
  to_vector(beta) ~ normal(0,2); // vectorizes beta and assigns same prior
  for (i in 1:n) {
    DOWELL[i] ~ categorical_logit(x_beta[i]');
  }
}
generated quantities {
  int <lower=1, upper=k> DOWELL_rep[n];
  vector[n] log_lik;
  for (i in 1:n) {
    DOWELL_rep[i] = categorical_logit_rng(x_beta[i]');
    log_lik[i] = categorical_logit_lpmf(DOWELL[i] |x_beta[i]');
  }
}
"
nChains = 4
nIter= 10000
thinSteps = 10
burnInSteps = floor(nIter/2)
DOWELL = data.list$DOWELL
MultiNomRegFit = stan(data=data.list,model_code=ReadMultiNom,
chains=nChains,control = list(adapt_delta = 0.99),
iter=nIter,warmup=burnInSteps,thin=thinSteps)
Everything runs beautifully and all convergence criteria are met. However, I am struggling to interpret the betas. I'm not sure where the Male effects are located. That is, the output seems to cover only the two other predictors, but even then, one of them is a 5-point scale. It would seem to me that each beta would have three elements, e.g. beta(1,1,1), beta(1,2,1), etc. Here is the output.
Inference for Stan model: 7be33c603bd35d82ad7f6b200ccee16f.
## 4 chains, each with iter=10000; warmup=5000; thin=10;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## beta[1,1] -2.07 0.03 1.05 -4.16 -2.80 -2.08 -1.36 0.00 1705 1
## beta[1,2] 1.16 0.03 1.04 -0.97 0.47 1.18 1.84 3.25 1652 1
## beta[1,3] 1.06 0.03 1.07 -1.03 0.36 1.06 1.76 3.15 1703 1
## beta[1,4] -0.01 0.03 1.07 -2.17 -0.70 -0.01 0.68 2.06 1657 1
## beta[2,1] 0.51 0.03 1.04 -1.52 -0.18 0.52 1.20 2.51 1618 1
## beta[2,2] -0.01 0.03 1.04 -2.08 -0.72 -0.02 0.67 2.07 1636 1
## beta[2,3] -0.31 0.03 1.03 -2.31 -1.00 -0.31 0.38 1.70 1617 1
## beta[2,4] -0.26 0.03 1.04 -2.28 -0.94 -0.26 0.44 1.72 1648 1
## beta[3,1] 0.26 0.03 1.01 -1.73 -0.40 0.27 0.95 2.17 1525 1
## beta[3,2] 0.03 0.03 1.01 -2.01 -0.65 0.05 0.70 1.93 1525 1
## beta[3,3] 0.02 0.03 1.01 -2.04 -0.64 0.04 0.70 1.95 1522 1
## beta[3,4] -0.30 0.03 1.01 -2.36 -0.99 -0.29 0.38 1.59 1528 1
## beta[4,1] 0.18 0.02 0.98 -1.70 -0.49 0.14 0.85 2.17 1561 1
## beta[4,2] -0.09 0.02 0.98 -1.98 -0.77 -0.14 0.58 1.89 1568 1
## beta[4,3] -0.12 0.02 0.98 -2.01 -0.80 -0.15 0.56 1.85 1566 1
## beta[4,4] 0.11 0.02 0.99 -1.79 -0.56 0.07 0.79 2.10 1559 1
##
## Samples were drawn using NUTS(diag_e) at Tue Oct 18 09:58:48 2022.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
I'm not quite sure what is going on, so any advice would be appreciated.
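One way to read the output, sketched under assumptions: since beta is declared as matrix[d, k] and x is the model matrix, row i of beta should correspond to column i of m and column j to outcome category j. The toy data frame below is hypothetical and only shows how model.matrix() orders its columns; the numbers in beta_mean are the posterior means copied from the output above. Because the K parameterization is identified only through the prior, differences against a reference category (category 1 here, an arbitrary choice) are usually easier to interpret than the raw values.

```r
# Hypothetical toy data, used only to show how model.matrix() orders its columns.
toy <- data.frame(
  DOWELL    = factor(c(1, 2, 3, 4)),
  Male      = c(0, 1, 0, 1),
  booksHome = c(1, 3, 5, 2),
  motivRead = c(0.2, -1.1, 0.7, 1.5)
)
m <- model.matrix(DOWELL ~ Male + booksHome + motivRead, toy)
row_names <- colnames(m)  # intercept first, then one column per numeric predictor

# Posterior means copied from the stan output above: beta[row, column].
beta_mean <- matrix(
  c(-2.07,  1.16,  1.06, -0.01,
     0.51, -0.01, -0.31, -0.26,
     0.26,  0.03,  0.02, -0.30,
     0.18, -0.09, -0.12,  0.11),
  nrow = 4, byrow = TRUE,
  dimnames = list(row_names, paste0("cat", 1:4))
)

# Log-odds contrasts relative to category 1 (each column minus column 1).
contrast <- beta_mean - beta_mean[, "cat1"]
print(round(contrast, 2))
```

Under this reading the Male effects sit in the second row, beta[2, ]. booksHome enters as a single numeric slope because it is not a factor; declaring it as a factor would add one row per non-reference level.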
I'm trying to do PCA analysis on some data. I'm not given the raw data, just the correlation matrix in this way:
Tmax Tmin P H PT V Vmax
Tmax 1.00 0.70 -0.08 -0.41 -0.09 -0.23 -0.08
Tmin 0.70 1.00 -0.30 0.07 0.14 -0.03 -0.01
P -0.08 -0.30 1.00 -0.18 -0.13 -0.29 -0.25
H -0.41 0.07 -0.18 1.00 0.32 -0.15 -0.19
PT -0.09 0.14 -0.13 0.32 1.00 0.11 0.07
V -0.23 -0.03 -0.29 -0.15 0.11 1.00 0.83
Vmax -0.08 -0.01 -0.25 -0.19 0.07 0.83 1.00
For this I'm trying to use the princomp() function since it has the covmat option so I can introduce data as a correlation matrix. For the pca analysis I'm using the following code:
pca_prim <- princomp(covmat=Primavera, cor = T, scores = TRUE)
I need the scores in order to plot a biplot in following steps but the scores vector I get is null:
biplot(pca_prim)
Error in biplot.princomp(pca_prim) : object 'pca_prim' has no scores
pca_prim$scores
NULL
I can't seem to find what the problem is in order to get the scores. Any suggestions?
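A likely explanation, offered as a sketch rather than a definitive answer: component scores are projections of the individual observations, so they cannot be computed from a correlation matrix alone, and scores = TRUE has no effect when only covmat is supplied. The loadings are still available, so the variables can be plotted in the plane of the first two components. The matrix below is retyped from the question; scaling the loadings by sdev is one common choice, not the only one.

```r
vars <- c("Tmax", "Tmin", "P", "H", "PT", "V", "Vmax")
Primavera <- matrix(c(
   1.00,  0.70, -0.08, -0.41, -0.09, -0.23, -0.08,
   0.70,  1.00, -0.30,  0.07,  0.14, -0.03, -0.01,
  -0.08, -0.30,  1.00, -0.18, -0.13, -0.29, -0.25,
  -0.41,  0.07, -0.18,  1.00,  0.32, -0.15, -0.19,
  -0.09,  0.14, -0.13,  0.32,  1.00,  0.11,  0.07,
  -0.23, -0.03, -0.29, -0.15,  0.11,  1.00,  0.83,
  -0.08, -0.01, -0.25, -0.19,  0.07,  0.83,  1.00
), nrow = 7, byrow = TRUE, dimnames = list(vars, vars))

pca_prim <- princomp(covmat = Primavera, cor = TRUE)
is.null(pca_prim$scores)  # no raw observations were supplied, so there are no scores

# Variable coordinates: loadings scaled by the component standard deviations.
coords <- unclass(pca_prim$loadings)[, 1:2] %*% diag(pca_prim$sdev[1:2])

# Loadings-only plot of the variables on the first two components.
plot(coords, type = "n", xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = "Comp.1", ylab = "Comp.2")
arrows(0, 0, coords[, 1], coords[, 2], length = 0.08)
text(coords, labels = rownames(coords), pos = 3)
```

A full biplot with observation points would require the raw data.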
I have a data set which I divided into the training and testing set after first recoding qualitative variables to integers. I ran PCA analysis using the psych package.
For the training set, I ran the below code:
train.scale<-scale(trainagain[,-1:-2])
pcafit<-principal(train.scale,nfactors = 11, rotate="Varimax")
It extracted the components as below:
RC1 RC4 RC3 RC5 RC2 RC6 RC7 RC8 RC9 RC11 RC10
SS loadings 2.44 1.92 1.90 1.72 1.65 1.46 1.40 1.15 1.10 1.01 1.01
Proportion Var 0.10 0.08 0.08 0.07 0.07 0.06 0.06 0.05 0.05 0.04 0.04
Cumulative Var 0.10 0.18 0.26 0.33 0.40 0.46 0.52 0.57 0.61 0.66 0.70
Proportion Explained 0.15 0.11 0.11 0.10 0.10 0.09 0.08 0.07 0.07 0.06 0.06
Cumulative Proportion 0.15 0.26 0.37 0.48 0.58 0.66 0.75 0.81 0.88 0.94 1.00
For the test set, I ran the below code:
str(testagain)
testagain.scores<-data.frame(predict(pcafit,testagain[,c(-1:-2)]))
The str(testagain) shows that my data structure is similar to trainagain, with all contents being integers. However, for the testagain.scores, the contents are all NaN.
How can I get "predict" to work? To my knowledge, I am following:
# S3 method for psych
predict(object, data,old.data,options=NULL,missing=FALSE,impute="none",...)
from:
https://www.rdocumentation.org/packages/psych/versions/2.0.7/topics/predict.psych
I think I might have stumbled across the solution: remove any feature/column whose value is exactly the same across all samples.
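Why a constant column produces NaN can be illustrated with base R, as a sketch under the assumption that the new data are standardized by their own column standard deviations: a zero-variance column has sd 0, so standardizing divides 0 by 0. The data frame and the name zero_var are made up for illustration.

```r
# Hypothetical data: column b is identical across all rows.
test_df <- data.frame(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5), c = c(2, 4, 1, 3))

scaled <- scale(test_df)
scaled[, "b"]  # every entry is (5 - 5) / 0

# Flag zero-variance columns and drop them before scoring.
zero_var <- vapply(test_df, function(col) sd(col) == 0, logical(1))
clean_df <- test_df[, !zero_var, drop = FALSE]
names(clean_df)
```

The same columns would also have to be dropped from the training data before refitting, so that both sets contain the same variables.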
pc_unrotate = principal(correlate1,nfactors = 4,rotate = "none")
print(pc_unrotate)
output:
Principal Components Analysis
Call: principal(r = correlate1, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
ProdQual 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecom 0.31 0.71 0.31 0.28 0.78 0.223 2.1
TechSup 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
CompRes 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
ProdLine 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
SalesFImage 0.38 0.75 0.31 0.23 0.86 0.141 2.1
ComPricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
WartyClaim 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
OrdBilling 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
DelSpeed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4
PC1 PC2 PC3 PC4
SS loadings 3.43 2.55 1.69 1.09
Proportion Var 0.31 0.23 0.15 0.10
Cumulative Var 0.31 0.54 0.70 0.80
Proportion Explained 0.39 0.29 0.19 0.12
Cumulative Proportion 0.39 0.68 0.88 1.00
Mean item complexity = 1.9
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
Fit based upon off diagonal values = 0.97
Now I need to get the scores. I tried pc_unrotate$scores, but it returns NULL.
I ran names(pc_unrotate) and found that the scores element is missing. What can I do to get the PCA scores?
Add argument scores=TRUE to the principal() function call: https://www.rdocumentation.org/packages/psych/versions/1.9.12.31/topics/principal
pc_unrotate = principal(correlate1,nfactors = 4,rotate = "none", scores = TRUE)
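One caveat, stated as an assumption: the printed output says "based upon correlation matrix", which suggests correlate1 is a correlation matrix rather than the raw data. If so, scores cannot be computed whatever the value of the scores argument, because they are projections of the individual rows. The base-R sketch below uses the built-in USArrests data as a stand-in to show that unrotated component scores are the standardized raw data multiplied by the eigenvectors of the correlation matrix.

```r
raw <- USArrests                          # stand-in for the raw data behind correlate1
z   <- scale(raw)                         # standardize each column
eig <- eigen(cor(raw), symmetric = TRUE)  # the correlation matrix alone gives loadings only

# Unrotated component scores: standardized data times the eigenvectors.
scores <- z %*% eig$vectors[, 1:2]

# Same values, up to the sign of each component, as prcomp() on the raw data.
pc <- prcomp(raw, scale. = TRUE)
all.equal(abs(unname(scores)), abs(unname(pc$x[, 1:2])))
```

In practice this would mean passing the raw data frame to principal() instead of its correlation matrix.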
I am aware that Cronbach's alpha has been extensively discussed here and elsewhere, but I cannot find a detailed interpretation of the output table.
psych::alpha(questionaire)
Reliability analysis
Call: psych::alpha(x = diagnostic_test)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.69 0.73 1 0.14 2.7 0.026 0.6 0.18 0.12
lower alpha upper 95% confidence boundaries
0.64 0.69 0.74
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Score1 0.69 0.73 0.86 0.14 2.7 0.027 0.0136 0.12
Score2 0.68 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score3 0.69 0.73 0.87 0.14 2.7 0.027 0.0136 0.12
Score4 0.67 0.72 0.86 0.14 2.5 0.028 0.0136 0.11
Score5 0.68 0.73 0.87 0.14 2.7 0.027 0.0134 0.12
Score6 0.69 0.73 0.91 0.15 2.7 0.027 0.0138 0.12
Score7 0.69 0.73 0.85 0.15 2.7 0.027 0.0135 0.12
Score8 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.12
Score9 0.68 0.73 0.92 0.14 2.7 0.027 0.0141 0.12
Score10 0.68 0.72 0.90 0.14 2.6 0.027 0.0137 0.12
Score11 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score12 0.67 0.71 0.87 0.13 2.5 0.029 0.0135 0.11
Score13 0.67 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score14 0.68 0.72 0.86 0.14 2.6 0.028 0.0138 0.11
Score15 0.67 0.72 0.86 0.14 2.5 0.028 0.0134 0.11
Score16 0.68 0.72 0.88 0.14 2.6 0.028 0.0135 0.12
score 0.65 0.65 0.66 0.10 1.8 0.030 0.0041 0.11
Item statistics
n raw.r std.r r.cor r.drop mean sd
Score1 286 0.36 0.35 0.35 0.21 0.43 0.50
Score2 286 0.37 0.36 0.36 0.23 0.71 0.45
Score3 286 0.34 0.34 0.34 0.20 0.73 0.44
Score4 286 0.46 0.46 0.46 0.33 0.35 0.48
Score5 286 0.36 0.36 0.36 0.23 0.73 0.44
Score6 286 0.29 0.32 0.32 0.18 0.87 0.34
Score7 286 0.33 0.32 0.32 0.18 0.52 0.50
Score8 286 0.42 0.41 0.41 0.28 0.36 0.48
Score9 286 0.32 0.36 0.36 0.22 0.90 0.31
Score10 286 0.37 0.40 0.40 0.26 0.83 0.37
Score11 286 0.48 0.47 0.47 0.34 0.65 0.48
Score12 286 0.49 0.49 0.49 0.37 0.71 0.46
Score13 286 0.46 0.44 0.44 0.31 0.44 0.50
Score14 286 0.44 0.43 0.43 0.30 0.43 0.50
Score15 286 0.48 0.47 0.47 0.35 0.61 0.49
Score16 286 0.39 0.39 0.39 0.26 0.25 0.43
score 286 1.00 1.00 1.00 1.00 0.60 0.18
Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
3: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
As far as I know, r.cor stands for the item-total correlation, or biserial correlation. I have seen that this is usually interpreted together with the corresponding p-value.
1. What is the exact interpretation of r.cor and r.drop?
2. How can the p-value be calculated ?
1. Although this is more of a question for Cross Validated, here is a detailed explanation of the ‘Item statistics’ section:
raw.r: correlation between the item and the total score of the scale (i.e., the item-total correlation). The problem with raw.r is that the item itself is included in the total, so the item is partly correlated with itself and the value is inflated (r.cor and r.drop address this; see ?alpha for details)
r.drop: correlation between the item and the total computed without that item (i.e., the item-rest or corrected item-total correlation); a low value indicates that the item does not correlate well with the rest of the scale
r.cor: item-total correlation corrected for item overlap and scale reliability
mean and sd: mean and standard deviation of the item itself
2. You should not use the p-values corresponding to these correlation coefficients to guide your decisions. I would suggest not bothering to calculate them.
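The difference between raw.r and r.drop can be reproduced by hand. The sketch below uses simulated 0/1 items (made-up data, not the questionnaire from the question) and base R only: raw_r correlates each item with the total that includes it, while r_drop correlates it with the total of the remaining items.

```r
set.seed(42)
# Simulated binary items driven by a common ability: 200 respondents, 6 items.
ability <- rnorm(200)
items <- as.data.frame(
  sapply(1:6, function(j) as.integer(ability + rnorm(200) > 0))
)
names(items) <- paste0("Score", 1:6)

total <- rowSums(items)

raw_r  <- sapply(items, function(x) cor(x, total))      # item included in the total
r_drop <- sapply(items, function(x) cor(x, total - x))  # item removed from the total

round(cbind(raw_r, r_drop), 2)
```

For positively related items raw_r is always the larger of the two, which is why r.drop is the safer figure when judging whether an item belongs to the scale.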