I'm using the glca package to run a latent class analysis. I want to see how covariates (other than the indicators used to construct the latent classes) affect the probability of class assignment. I understand this part of the model is a multinomial logistic regression, so my question is: is there a way to change the base (reference) latent class? For example, my model is currently a 4-class model, and by default the output shows the effect of covariates on class prevalence relative to Class 4 (the base category). I want to change this base category to, for example, Class 2.
My code is as follows
fc <- item(intrst, respect, expert, inclu, contbt, secure, pay, bonus, benft, innov, learn, rspons, promote, wlb, flex) ~ atenure + super + sal + minority + female + age40 + edu + d_bpw + d_skill
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1)
and I get the following output.
> coef(lca4_cov)
Class 1 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 1.507537 0.410477 0.356744 1.151 0.24991
atenure 0.790824 -0.234679 0.102322 -2.294 0.02183 *
super 1.191961 0.175600 0.028377 6.188 6.29e-10 ***
sal 0.937025 -0.065045 0.035490 -1.833 0.06686 .
minority 2.002172 0.694233 0.060412 11.492 < 2e-16 ***
female 1.210653 0.191160 0.059345 3.221 0.00128 **
age40 1.443603 0.367142 0.081002 4.533 5.89e-06 ***
edu 1.069771 0.067444 0.042374 1.592 0.11149
d_bpw 0.981104 -0.019077 0.004169 -4.576 4.78e-06 ***
d_skill 1.172218 0.158898 0.036155 4.395 1.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 2 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 3.25282 1.17952 0.43949 2.684 0.00729 **
atenure 0.95131 -0.04992 0.12921 -0.386 0.69926
super 1.16835 0.15559 0.03381 4.602 4.22e-06 ***
sal 1.01261 0.01253 0.04373 0.287 0.77450
minority 0.72989 -0.31487 0.08012 -3.930 8.55e-05 ***
female 0.45397 -0.78971 0.07759 -10.178 < 2e-16 ***
age40 1.26221 0.23287 0.09979 2.333 0.01964 *
edu 1.29594 0.25924 0.05400 4.801 1.60e-06 ***
d_bpw 0.97317 -0.02720 0.00507 -5.365 8.26e-08 ***
d_skill 1.16223 0.15034 0.04514 3.330 0.00087 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Class 3 / 4 :
Odds Ratio Coefficient Std. Error t value Pr(>|t|)
(Intercept) 0.218153 -1.522557 0.442060 -3.444 0.000575 ***
atenure 0.625815 -0.468701 0.123004 -3.810 0.000139 ***
super 1.494112 0.401532 0.031909 12.584 < 2e-16 ***
sal 1.360924 0.308164 0.044526 6.921 4.72e-12 ***
minority 0.562590 -0.575205 0.081738 -7.037 2.07e-12 ***
female 0.860490 -0.150253 0.072121 -2.083 0.037242 *
age40 1.307940 0.268453 0.100376 2.674 0.007495 **
edu 1.804949 0.590532 0.054522 10.831 < 2e-16 ***
d_bpw 0.987353 -0.012727 0.004985 -2.553 0.010685 *
d_skill 1.073519 0.070942 0.045275 1.567 0.117163
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would appreciate it if anyone could point me to code or references that address this problem. Thanks in advance.
Try using the decreasing option, which changes the order in which the classes are labelled (the last class stays the reference category):
lca4_cov <- glca(fc, data = bpw, nclass = 4, seed = 1, decreasing = TRUE)
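If the decreasing option does not put the class you want into the reference position, you can also recompute the contrasts by hand: in a multinomial logit with Class 4 as the base, the log-odds of Class k versus Class 2 are simply the Class k / 4 coefficients minus the Class 2 / 4 coefficients. A minimal sketch using the female row from the output above (point estimates only; standard errors cannot be obtained by simple differencing, since they require the coefficient covariance matrix):

b_female_c1_vs_c4 <- 0.191160                               # 'female', Class 1 / 4 (from the output above)
b_female_c2_vs_c4 <- -0.78971                               # 'female', Class 2 / 4
b_female_c1_vs_c2 <- b_female_c1_vs_c4 - b_female_c2_vs_c4  # 'female', Class 1 vs Class 2
exp(b_female_c1_vs_c2)                                      # odds ratio with Class 2 as reference, roughly 2.67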
Related
I am running a linear regression in R. The output shows some factor levels twice (equity & Equity, and loan & Loan), once in lowercase and once capitalized. In the dataset they are always written in lowercase, yet they appear in two different forms when I run the regression. I could not find an answer online, so maybe some of you can help me out? Any ideas are highly appreciated!
Model1 <- lm(Lifetime_CO2 ~ signatory + as.factor(Finance_Type), data = Data_dup)
summary(Model1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.351 4.397 20.550 < 2e-16 ***
signatory 7.378 1.732 4.259 2.10e-05 ***
as.factor(Finance_Type)equity -29.059 4.640 -6.263 4.18e-10 ***
as.factor(Finance_Type)Equity 14.549 38.971 0.373 0.708914
as.factor(Finance_Type)government grant -81.284 22.784 -3.568 0.000365 ***
as.factor(Finance_Type)insurance -2.810 16.397 -0.171 0.863948
as.factor(Finance_Type)loan -25.183 4.422 -5.695 1.32e-08 ***
as.factor(Finance_Type)Loan 14.549 27.731 0.525 0.599852
as.factor(Finance_Type)refinancing bond -9.728 19.878 -0.489 0.624578
as.factor(Finance_Type)refinancing equity -40.601 27.731 -1.464 0.143252
as.factor(Finance_Type)refinancing loan -26.889 5.344 -5.031 5.09e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You can convert upper-case characters in the Finance_Type column to lower-case, or vice versa.
By the way, as.factor() is not needed unless you want to re-order levels of a categorical variable.
Data_dup$Finance_Type <- tolower(Data_dup$Finance_Type)
Model1 <- lm(Lifetime_CO2 ~ signatory + Finance_Type, data = Data_dup)
summary(Model1)
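If you want to verify that the duplicate spellings have collapsed, a quick check (sketch) is to tabulate the column after the conversion:

table(Data_dup$Finance_Type)   # each finance type should now appear as a single lowercase level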
I'm using 'gamlss' from the package 'gamlss' (version 5.4-1) in R for a generalized additive model for location scale and shape.
My model looks like this
propvoc3 = gamlss(proporcion.voc ~ familiaridad * proporcion)
When I try to look at the ANOVA table, I get this output:
Mu link function: identity
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.625e-01 9.476e-02 5.936 1.9e-06 ***
familiaridaddesconocido -1.094e-01 1.059e-01 -1.032 0.31042
proporcionmayor 4.375e-01 1.340e-01 3.265 0.00281 **
proporcionmenor 1.822e-17 1.340e-01 0.000 1.00000
familiaridaddesconocido:proporcionmayor -3.281e-01 1.708e-01 -1.921 0.06464 .
familiaridaddesconocido:proporcionmenor 5.469e-01 1.708e-01 3.201 0.00331 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
Is there a way to get the values by variable rather than by every individual term?
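One way to test a whole variable rather than its individual terms is to compare nested fits with a likelihood-ratio test on the global deviance. This is only a sketch, assuming deviance() returns the global deviance of a gamlss fit; the interaction adds two parameters here (the two familiaridaddesconocido:proporcion* rows), so the test has 2 degrees of freedom:

library(gamlss)

propvoc3  <- gamlss(proporcion.voc ~ familiaridad * proporcion)   # full model, as above
propvoc3b <- gamlss(proporcion.voc ~ familiaridad + proporcion)   # same model without the interaction

dev_diff <- deviance(propvoc3b) - deviance(propvoc3)              # likelihood-ratio statistic
pchisq(dev_diff, df = 2, lower.tail = FALSE)                      # p-value for the interaction as a whole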
I am trying to figure out how to calculate the marginal effects of my model using the clogit function in the survival package. The margins package does not seem to work with this type of model, but it does work with multinom and mclogit. However, I am investigating the effects of choice characteristics, not individual characteristics, so it needs to be a conditional logit model. The mclogit function works with the margins package, but those results are widely different from the results using the clogit function; why is that? Any help calculating the marginal effects from the clogit function would be greatly appreciated.
mclogit output:
Call:
mclogit(formula = cbind(selected, caseID) ~ SysTEM + OWN + cost +
ENVIRON + NEIGH + save, data = atl)
Estimate Std. Error z value Pr(>|z|)
SysTEM 0.139965 0.025758 5.434 5.51e-08 ***
OWN 0.008931 0.026375 0.339 0.735
cost -0.103012 0.004215 -24.439 < 2e-16 ***
ENVIRON 0.675341 0.037104 18.201 < 2e-16 ***
NEIGH 0.419054 0.031958 13.112 < 2e-16 ***
save 0.532825 0.023399 22.771 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 18380
Residual Deviance: 16670
Number of Fisher Scoring iterations: 4
Number of observations: 8364
clogit output:
Call:
coxph(formula = Surv(rep(1, 25092L), selected) ~ SysTEM + OWN +
cost + ENVIRON + NEIGH + save + strata(caseID), data = atl,
method = "exact")
n= 25092, number of events= 8364
coef exp(coef) se(coef) z Pr(>|z|)
SysTEM 0.133184 1.142461 0.034165 3.898 9.69e-05 ***
OWN -0.015884 0.984241 0.036346 -0.437 0.662
cost -0.179833 0.835410 0.005543 -32.442 < 2e-16 ***
ENVIRON 1.186329 3.275036 0.049558 23.938 < 2e-16 ***
NEIGH 0.658657 1.932195 0.042063 15.659 < 2e-16 ***
save 0.970051 2.638079 0.031352 30.941 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
SysTEM 1.1425 0.8753 1.0685 1.2216
OWN 0.9842 1.0160 0.9166 1.0569
cost 0.8354 1.1970 0.8264 0.8445
ENVIRON 3.2750 0.3053 2.9719 3.6091
NEIGH 1.9322 0.5175 1.7793 2.0982
save 2.6381 0.3791 2.4809 2.8053
Concordance= 0.701 (se = 0.004 )
Rsquare= 0.103 (max possible= 0.688 )
Likelihood ratio test= 2740 on 6 df, p=<2e-16
Wald test = 2465 on 6 df, p=<2e-16
Score (logrank) test = 2784 on 6 df, p=<2e-16
margins output for mclogit
margins(model2A)
SysTEM OWN cost ENVIRON NEIGH save
0.001944 0.000124 -0.001431 0.00938 0.00582 0.0074
margins output for clogit
margins(model2A)
Error in match.arg(type) :
'arg' should be one of “risk”, “expected”, “lp”
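One workaround, if margins will not accept the clogit (coxph) object, is to compute average marginal effects by hand. For a conditional logit, the derivative of the choice probability with respect to an alternative's own attribute x is beta_x * P_ij * (1 - P_ij), where P_ij is the within-choice-set probability. The sketch below assumes the clogit fit is stored as model2B (a made-up name), that the data frame is atl, and that caseID identifies the choice sets:

library(survival)

lp  <- predict(model2B, type = "lp")                              # linear predictor; a constant shift cancels below
p   <- ave(exp(lp), atl$caseID, FUN = function(z) z / sum(z))     # softmax within each choice set
ame <- sapply(coef(model2B), function(b) mean(b * p * (1 - p)))   # average own-attribute marginal effects
ame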
I have run a quasipoisson GLM with the following code:
Output3 <- glm(GCN ~ DHSI + N + P, data = PondsTask2, family = quasipoisson(link = "log"))
and received this output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.69713 0.56293 -3.015 0.00272 **
DHSI 3.44795 0.74749 4.613 0.00000519 ***
N -0.59648 0.36357 -1.641 0.10157
P -0.01964 0.37419 -0.052 0.95816
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
DHSI is statistically significant, but the other two variables are not. How do I go about dropping variables until I have the minimum model?
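A common way to do this by hand is drop1() with an F test (appropriate for a quasi family, which has no AIC): remove the least significant term, refit, and repeat until only significant terms remain. A sketch:

drop1(Output3, test = "F")               # effect of removing each term from the full model

# if P has the largest p-value, drop it and test again
Output3b <- update(Output3, . ~ . - P)
drop1(Output3b, test = "F")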
This is the R code for my logistic regression model:
hrlogis1 <- glm(Attrition ~ . - Age - DailyRate - Department - Education
                - EducationField - HourlyRate - JobLevel
                - JobRole - MonthlyIncome - MonthlyRate
                - PercentSalaryHike - PerformanceRating
                - StandardHours - StockOptionLevel,
                family = binomial(link = "logit"), data = hrtrain)
where:
Attrition is the dependent variable and the rest are the independent variables.
Below is the summary of the model:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.25573 0.84329 1.489 0.136464
BusinessTravelTravel_Frequently 1.86022 0.47410 3.924 8.72e-05 ***
BusinessTravelTravel_Rarely 1.28273 0.44368 2.891 0.003839 **
DistanceFromHome 0.03869 0.01138 3.400 0.000673 ***
EnvironmentSatisfaction -0.36484 0.08714 -4.187 2.83e-05 ***
GenderMale 0.52556 0.19656 2.674 0.007499 **
JobInvolvement -0.59407 0.13259 -4.480 7.45e-06 ***
JobSatisfaction -0.37315 0.08671 -4.303 1.68e-05 ***
MaritalStatusMarried 0.23408 0.26993 0.867 0.385848
MaritalStatusSingle 1.37647 0.27511 5.003 5.63e-07 ***
NumCompaniesWorked 0.16439 0.04034 4.075 4.59e-05 ***
OverTimeYes 1.67531 0.20054 8.354 < 2e-16 ***
RelationshipSatisfaction -0.23865 0.08726 -2.735 0.006240 **
TotalWorkingYears -0.12385 0.02360 -5.249 1.53e-07 ***
TrainingTimesLastYear -0.15522 0.07447 -2.084 0.037124 *
WorkLifeBalance -0.30969 0.13025 -2.378 0.017427 *
YearsAtCompany 0.06887 0.04169 1.652 0.098513 .
YearsInCurrentRole -0.10812 0.04880 -2.216 0.026713 *
YearsSinceLastPromotion 0.14006 0.04452 3.146 0.001657 **
YearsWithCurrManager -0.09343 0.04984 -1.875 0.060834 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now I want to remove the terms that are not significant; in this case, "MaritalStatusMarried" is not significant.
MaritalStatus is a variable (column) with two levels, "Married" and "Single".
How about:
data$MaritalStatus[data[, num] == "Married"] <- NA
(where num is the number of the MaritalStatus column in the data)
The values for "Married" will be replaced with NA, and then you can run the glm model again.
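If you prefer not to count columns by hand, num can be looked up by name (a small base-R sketch):

num <- which(names(data) == "MaritalStatus")    # column index of MaritalStatus
data$MaritalStatus[data[, num] == "Married"] <- NA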