This looks like an easy error message to solve, but I've been struggling with it the entire morning and I don't see why I get it; maybe one of you is more familiar with this problem. I made a test dataset to illustrate my question:
> test
PatientID Age Gender Date Group var1 var2 var3
1 1 70 Male 1/1/2015 A_0 0.30 4 117
2 1 70 Male 1/6/2015 A_1 0.70 9 90
3 2 52 Female 1/1/2015 A_0 1.00 1 87
4 2 52 Female 1/8/2015 A_1 2.00 11 103
5 3 49 Male 1/3/2015 A_0 0.25 14 111
6 3 49 Male 1/8/2015 A_1 0.30 5 50
7 4 36 Female 1/3/2015 A_0 0.70 7 82
8 4 36 Female 1/6/2015 A_1 0.80 8 133
> library(broom)
> lapply(names(test)[6:9], function(n) {
+ linear <- lm(n ~ Group + Age + Gender + Date, data = test)
+ lapply(linear, glance)
+ })
Error in model.frame.default(formula = n ~ Group + Age + Gender + Date, :
variable lengths differ (found for 'Group')
When I run the same regression on one of the variables directly, I don't get the error message:
> summary(lm(var1 ~ Group + Age + Gender + Date))
Call:
lm(formula = var1 ~ Group + Age + Gender + Date)
Residuals:
1 2 3 4 5 6 7 8
-0.03321 -0.06578 0.03321 0.09672 0.19571 -0.09672 -0.19571 0.06578
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.19491 0.68201 -1.752 0.2219
GroupA_1 0.93650 0.26436 3.543 0.0713 .
Age 0.04157 0.01234 3.368 0.0780 .
GenderMale -1.38185 0.25128 -5.499 0.0315 *
Date1/3/2015 0.59406 0.32438 1.831 0.2085
Date1/6/2015 -0.50393 0.23247 -2.168 0.1625
Date1/8/2015 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2304 on 2 degrees of freedom
Multiple R-squared: 0.9536, Adjusted R-squared: 0.8375
F-statistic: 8.216 on 5 and 2 DF, p-value: 0.112
I did some diagnostics to check whether these factors really have only one level, but that does not seem to be the case:
> l <- sapply(test, is.factor)
> l
PatientID Age Gender Date Group var1 var2 var3
FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
> m <- test[, names(which(l))]
> m
Gender Date Group
1 Male 1/1/2015 A_0
2 Male 1/6/2015 A_1
3 Female 1/1/2015 A_0
4 Female 1/8/2015 A_1
5 Male 1/3/2015 A_0
6 Male 1/8/2015 A_1
7 Female 1/3/2015 A_0
8 Female 1/6/2015 A_1
> ifelse(n <- sapply(m, nlevels) == 1, "DROP", "NODROP")
Gender Date Group
"NODROP" "NODROP" "NODROP"
So it would be really great if somebody had a suggestion to overcome this seemingly easy error.
New script after the suggestion, shown below:
attach(test)
lapply(names(test)[6:8], function(n) {
linear <-as.formula(paste(n, "~ Group + Age + Gender + Date"))
glance(lm(linear, na.action=na.omit))
})
I also now incorporated handling of missing values.
Thank you very very much for the suggestion, because I wasn't seeing it clearly anymore this morning!!
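For reference, the original call failed because n is a length-1 character string, so lm() saw a response of length 1 against predictors of length 8; building a formula from the string fixes that (note also the corrected index 6:8 - the data frame has only eight columns, so names(test)[6:9] included an NA). The same loop can also be written without attach() by passing the data explicitly; a minimal sketch, using reformulate() as a convenience for the paste()/as.formula() step:
library(broom)
results <- lapply(names(test)[6:8], function(n) {
  # builds e.g. var1 ~ Group + Age + Gender + Date from the column name
  f <- reformulate(c("Group", "Age", "Gender", "Date"), response = n)
  glance(lm(f, data = test, na.action = na.omit))
})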
I have run this regression without any problems, and I get a coefficient for each interaction between econ_sit and educ_cat. econ_sit is a continuous variable, and educ_cat is a categorical variable with levels 1 to 6. How can I plot the coefficients only for the interaction terms in a good way?
model_int_f <- felm(satis_gov_sc ~ econ_sit*factor(educ_cat) + factor(benefit) + econ_neth + age + gender + pol_sof
| factor(wave) + factor(id) # Respondent and time fixed effects
| 0
| id, # Cluster standard errors on each respondent
data = full1)
summary(model_int_f)
Call:
felm(formula = satis_gov_sc ~ econ_sit * factor(educ_cat) + factor(benefit) + econ_neth + age + gender + pol_sof | factor(wave) + factor(id) | 0 | id, data = full1)
Residuals:
Min 1Q Median 3Q Max
-0.58468 -0.04464 0.00000 0.04728 0.78470
Coefficients:
Estimate Cluster s.e. t value Pr(>|t|)
econ_sit 0.1411692 0.0603100 2.341 0.01928 *
factor(educ_cat)2 0.0525580 0.0450045 1.168 0.24292
factor(educ_cat)3 0.1229048 0.0576735 2.131 0.03313 *
factor(educ_cat)4 0.1244146 0.0486455 2.558 0.01057 *
factor(educ_cat)5 0.1245556 0.0520246 2.394 0.01669 *
factor(educ_cat)6 0.1570034 0.0577240 2.720 0.00655 **
factor(benefit)2 -0.0030380 0.0119970 -0.253 0.80010
factor(benefit)3 0.0026064 0.0072590 0.359 0.71957
econ_neth 0.0642726 0.0131940 4.871 1.14e-06 ***
age 0.0177453 0.0152661 1.162 0.24512
gender 0.1088780 0.0076137 14.300 < 2e-16 ***
pol_sof 0.0006003 0.0094504 0.064 0.94935
econ_sit:factor(educ_cat)2 -0.0804820 0.0653488 -1.232 0.21816
econ_sit:factor(educ_cat)3 -0.0950652 0.0793818 -1.198 0.23114
econ_sit:factor(educ_cat)4 -0.1259772 0.0692072 -1.820 0.06877 .
econ_sit:factor(educ_cat)5 -0.1469749 0.0654870 -2.244 0.02485 *
econ_sit:factor(educ_cat)6 -0.1166243 0.0693709 -1.681 0.09279 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1161 on 11159 degrees of freedom
(23983 observations deleted due to missingness)
Multiple R-squared(full model): 0.8119 Adjusted R-squared: 0.717
Multiple R-squared(proj model): 0.00657 Adjusted R-squared: -0.4946
F-statistic(full model, *iid*):8.557 on 5630 and 11159 DF, p-value: < 2.2e-16
F-statistic(proj model): 55.38 on 17 and 5609 DF, p-value: < 2.2e-16
This is what my data looks like:
$ id : num 1 1 1 1 2 2 2 2 3 3 3 3
$ wave : chr "2013" "2015" "2016" "2017" ...
$ satis_gov_sc: num 0.5 0.4 0.4 0.6 0.6 0.5 0.6 0.7 0.7 0.7 ...
$ econ_sit : num NA NA 0.708 0.75 0.708 ...
$ educ_cat : num 5 5 5 5 5 6 6 6 6 6 ...
$ benefit : num 3 3 3 3 3 3 3 3 3 3 ...
$ econ_neth : num NA 0.6 0.6 0.7 0.7 0.5 0.4 0.6 0.8 0.7 ...
$ age : num 58 60 61 62 63 51 53 54 55 56 ...
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ pol_sof : num 1 1 1 0.8 1 1 1 1 0.8 1 ...
I've tried to run a simple plot_model with the following code:
plot_model(model_int_f, type = "pred", terms = c("econ_sit", "educ_cat"))
However, I only get an error, because felm objects are not compatible with "pred":
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "felm"
Any suggestions on how to plot the interaction terms?
Thanks in advance!
felm does not have a predict method, so it is not compatible with plot_model. You could use another fixed-effects library.
Here's an example using fixest. As you did not provide a sample of your data, I have used data(iris).
library(fixest); library(sjPlot)
res <- feols(Sepal.Length ~ Sepal.Width + Petal.Length:Species | Species, data = iris, cluster = 'Species')
plot_model(res, type = "pred", terms = c("Petal.Length", "Species"))
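If you'd rather keep the felm fit, another option is to pull the interaction rows out of its coefficient table and plot them manually; a sketch (column positions follow the summary printout above, where column 1 is Estimate and column 2 is Cluster s.e.):
library(ggplot2)
ct <- coef(summary(model_int_f))
ints <- ct[grepl("econ_sit:", rownames(ct)), ]  # keep only the interaction terms
df_p <- data.frame(term = rownames(ints),
                   est = ints[, 1],  # Estimate
                   se = ints[, 2])   # Cluster s.e.
ggplot(df_p, aes(x = term, y = est,
                 ymin = est - 1.96 * se, ymax = est + 1.96 * se)) +
  geom_pointrange() +
  coord_flip()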
library(mice)  # needed below for mice() and complete()
data <- read.csv("Documents/ABA/dataset.csv")
df <- subset(data, select=c(k7, n3, n2a, d1a1x, k17, bmgc23g, m1a_corruption_pos,
j30_permit_pos, bmge1, lcu, j30_instability_pos,
bmgc25))
#filtering dataset for selected variable
impute <- df[c("k7","k17","d1a1x","bmgc23g", "m1a_corruption_pos",
"j30_permit_pos", "bmge1", "lcu", "j30_instability_pos",
"bmgc25")]
tempData <- mice(impute, m=5, maxit=10, method="pmm", seed=500)
impdat <- complete(tempData, action="long", include=FALSE)
May I know what is wrong, or how it can be fixed?
This is correct! First, you used mice(., m=5) (the default) to impute your data set five times. Using complete(., action='long'), you combined all five imputations in a long format. To distinguish the individual imputations, two variables are added: .imp, which distinguishes between the five imputations, and .id, which holds the initial row names.
library(mice)
imp <- mice(nhanes, m=5)
nhanes_imp <- complete(imp, action='long')
nhanes_imp
# .imp .id age bmi hyp chl
# 1 1 1 1 29.6 1 187
# 2 1 2 2 22.7 1 187
# 3 1 3 1 29.6 1 187
# [...]
# 26 2 1 1 22.7 1 118
# 27 2 2 2 22.7 1 187
# 28 2 3 1 30.1 1 187
# [...]
# 51 3 1 1 27.2 1 131
# 52 3 2 2 22.7 1 187
# 53 3 3 1 24.9 1 187
# [...]
# 76 4 1 1 22.0 1 113
# 77 4 2 2 22.7 1 187
# 78 4 3 1 22.0 1 187
# [...]
# 101 5 1 1 35.3 1 187
# 102 5 2 2 22.7 1 187
# 103 5 3 1 35.3 1 187
# [...]
Naturally, your imputed data set has five times as many rows as your initial one.
nrow(nhanes_imp) / nrow(nhanes)
# [1] 5
You should never use complete without action='long' (see my older answer there).
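For comparison, complete() without action='long' returns a single completed data set (the first imputation, by default), which is easy to mistake for "the" imputed data:
nrow(complete(imp))
# [1] 25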
Continue by pooling your calculations. For instance, for OLS you may use the pool() function, which comes with mice and combines the lm results across the five imputed versions (Rubin's rules).
fit <- with(data=imp, expr=lm(bmi ~ hyp + chl))
summary(pool(fit))
# term estimate std.error statistic df p.value
# 1 (Intercept) 21.38468643 4.58030244 4.6688372 16.64367 0.0002323604
# 2 hyp -1.89607759 2.18239135 -0.8688073 19.00235 0.3957936019
# 3 chl 0.03942668 0.02449571 1.6095343 15.72940 0.1273825300
If we mistakenly run OLS on the stacked data without pooling the imputed data sets, the number of observations is blown up to five times its actual size. Hence the degrees of freedom are too large, and the variance and all statistics depending on it are underestimated:
summary(lm(bmi ~ hyp + chl, nhanes_imp))
# Call:
# lm(formula = bmi ~ hyp + chl, data = nhanes_imp)
#
# Residuals:
# Min 1Q Median 3Q Max
# -6.9010 -2.7027 0.3682 3.0993 8.4682
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 21.165549 1.794706 11.793 < 2e-16 ***
# hyp -1.920889 0.907041 -2.118 0.0362 *
# chl 0.040573 0.009444 4.296 3.51e-05 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 3.934 on 122 degrees of freedom
# Multiple R-squared: 0.1346, Adjusted R-squared: 0.1205
# F-statistic: 9.492 on 2 and 122 DF, p-value: 0.0001475
I have a multinomial logit model with two individual-specific variables (first and age).
I would like to conduct the Hausman-McFadden test (hmftest) to check whether IIA holds.
My dataset looks like this:
head(df)
mode choice first age
1 both 1 0 24
2 pre 1 1 23
3 both 1 2 53
4 post 1 3 43
5 no 1 1 55
6 both 1 2 63
I reshaped it for mlogit to:
mode choice first age idx
1 TRUE 1 0 24 1:both
2 FALSE 1 0 24 1:no
3 FALSE 1 0 24 1:post
4 FALSE 1 0 24 1:pre
5 FALSE 1 1 23 2:both
6 FALSE 1 1 23 2:no
7 FALSE 1 1 23 2:post
8 TRUE 1 1 23 2:pre
9 TRUE 1 2 53 3:both
10 FALSE 1 2 53 3:no
~~~ indexes ~~~~
id1 id2
1 1 both
2 1 no
3 1 post
4 1 pre
5 2 both
6 2 no
7 2 post
8 2 pre
9 3 both
10 3 no
indexes: 1, 2
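For reference, this wide-to-long reshaping is typically done with mlogit's data utilities; a minimal sketch, assuming the wide data frame is called df:
library(mlogit)
# each row of df becomes one row per alternative, with mode
# turned into a TRUE/FALSE indicator of the chosen alternative
df_mlogit <- mlogit.data(df, choice = "mode", shape = "wide")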
My original (full) model runs as follows:
full <- mlogit(mode ~ 0 | first + age, data = df_mlogit, reflevel = "no")
leading to the following result:
Call:
mlogit(formula = mode ~ 0 | first + age, data = df_mlogit, reflevel = "no",
method = "nr")
Frequencies of alternatives:choice
no both post pre
0.2 0.4 0.2 0.2
nr method
18 iterations, 0h:0m:0s
g'(-H)^-1g = 8.11E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):both 2.0077e+01 1.0441e+04 0.0019 0.9985
(Intercept):post -4.1283e-01 1.4771e+04 0.0000 1.0000
(Intercept):pre 5.3346e-01 1.4690e+04 0.0000 1.0000
first1:both -4.0237e+01 1.1059e+04 -0.0036 0.9971
first1:post -8.9168e-01 1.4771e+04 -0.0001 1.0000
first1:pre -6.6805e-01 1.4690e+04 0.0000 1.0000
first2:both -1.9674e+01 1.0441e+04 -0.0019 0.9985
first2:post -1.8975e+01 1.5683e+04 -0.0012 0.9990
first2:pre -1.8889e+01 1.5601e+04 -0.0012 0.9990
first3:both -2.1185e+01 1.1896e+04 -0.0018 0.9986
first3:post 1.9200e+01 1.5315e+04 0.0013 0.9990
first3:pre 1.9218e+01 1.5237e+04 0.0013 0.9990
age:both 2.1898e-02 2.9396e-02 0.7449 0.4563
age:post 9.3377e-03 2.3157e-02 0.4032 0.6868
age:pre -1.2338e-02 2.2812e-02 -0.5408 0.5886
Log-Likelihood: -61.044
McFadden R^2: 0.54178
Likelihood ratio test : chisq = 144.35 (p.value = < 2.22e-16)
To test for IIA, I exclude one alternative from the model (here "pre") and run the model as follows:
part <- mlogit(mode ~ 0 | first + age, data = df_mlogit, reflevel = "no",
alt.subset = c("no", "post", "both"))
leading to
Call:
mlogit(formula = mode ~ 0 | first + age, data = df_mlogit, alt.subset = c("no",
"post", "both"), reflevel = "no", method = "nr")
Frequencies of alternatives:choice
no both post
0.25 0.50 0.25
nr method
18 iterations, 0h:0m:0s
g'(-H)^-1g = 6.88E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):both 1.9136e+01 6.5223e+03 0.0029 0.9977
(Intercept):post -9.2040e-01 9.2734e+03 -0.0001 0.9999
first1:both -3.9410e+01 7.5835e+03 -0.0052 0.9959
first1:post -9.3119e-01 9.2734e+03 -0.0001 0.9999
first2:both -1.8733e+01 6.5223e+03 -0.0029 0.9977
first2:post -1.8094e+01 9.8569e+03 -0.0018 0.9985
first3:both -2.0191e+01 1.1049e+04 -0.0018 0.9985
first3:post 2.0119e+01 1.1188e+04 0.0018 0.9986
age:both 2.1898e-02 2.9396e-02 0.7449 0.4563
age:post 1.9879e-02 2.7872e-02 0.7132 0.4757
Log-Likelihood: -27.325
McFadden R^2: 0.67149
Likelihood ratio test : chisq = 111.71 (p.value = < 2.22e-16)
However, when I conduct the hmftest, the following error occurs:
> hmftest(full, part)
Error in solve.default(diff.var) :
system is computationally singular: reciprocal condition number = 4.34252e-21
Does anyone have an idea where the problem might be?
I believe the issue here is that hmftest checks whether the probability ratio of two alternatives depends only on the characteristics of those alternatives.
Since there are only individual-level variables here, the test won't work in this case.
I am building a random intercept model in R using the glmer function, with the second-level variable being country. When I run my model, however, it only includes 24 countries and 27,005 observations, although there are 60 countries and 75,047 observations in the data.
I can provide other info if necessary, but I'm wondering whether anyone has an initial idea why this might be. I cannot find anything online.
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
Family: binomial ( logit )
Formula: serve ~ age + sex + income + religion + proud + trusting + outgoing + (1 | country)
Data: WVS
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
30102.4 30250.1 -15033.2 30066.4 26987
Scaled residuals:
Min 1Q Median 3Q Max
-4.2087 -0.8943 0.4331 0.6737 3.8525
Random effects:
Groups Name Variance Std.Dev.
country (Intercept) 0.6272 0.7919
Number of obs: 27005, groups: country, 24
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.188730 0.181939 1.037 0.299584
age -0.004503 0.001229 -3.666 0.000247 ***
sexmale 0.672997 0.028757 23.403 < 2e-16 ***
income -0.005812 0.007070 -0.822 0.411024
religionRather important 0.117421 0.049464 2.374 0.017604 *
religionVery important 0.269977 0.048460 5.571 2.53e-08 ***
proud2 -0.210176 0.033430 -6.287 3.23e-10 ***
proud3 -0.306502 0.054530 -5.621 1.90e-08 ***
proud4 -0.601837 0.099568 -6.044 1.50e-09 ***
trusting2 0.134689 0.055366 2.433 0.014987 *
trusting3 0.195169 0.056104 3.479 0.000504 ***
trusting4 0.309589 0.054498 5.681 1.34e-08 ***
trusting5 0.294739 0.059784 4.930 8.22e-07 ***
outgoing2 -0.160543 0.062618 -2.564 0.010352 *
outgoing3 -0.119559 0.062781 -1.904 0.056861 .
outgoing4 0.120816 0.060180 2.008 0.044689 *
outgoing5 0.238158 0.063453 3.753 0.000175 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here is a sample of the data:
conscription serve country sex education income religion immigrant proud trusting outgoing age
1 1 Yes ALG male 3 5 Very important 0 1 2 2 -15.7403361
2 1 Yes ALG female 3 6 Rather important 0 2 4 2 -12.7403361
3 1 Yes ALG female 3 6 Very important 0 1 3 3 -10.7403361
4 1 Yes ALG female 3 5 Very important 0 1 3 4 -8.7403361
5 1 Yes ALG female 2 7 Very important 0 1 4 4 -1.7403361
6 1 Yes ALG male 4 5 Very important 0 1 3 4 -0.7403361
7 1 Yes ALG male 3 7 Very important 0 1 2 2 4.2596639
8 1 Yes ALG female 2 2 Rather important 0 1 3 4 7.2596639
9 1 Yes ALG male 1 5 Rather important 0 1 3 2 22.2596639
11 1 Yes ALG female 4 5 Very important 0 1 3 1 -13.7403361
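For what it's worth, the usual first suspect for this symptom is missing data: glmer follows R's default na.action and silently drops every row with an NA in any model variable. A quick diagnostic sketch (variable names taken from the formula above):
vars <- c("serve", "age", "sex", "income", "religion",
          "proud", "trusting", "outgoing", "country")
sum(complete.cases(WVS[, vars]))  # rows glmer can actually use; compare with 27005
colSums(is.na(WVS[, vars]))       # which variables account for the dropped rows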
I am fitting a Cox model to some data that is structured as follows:
str(test)
'data.frame': 147 obs. of 8 variables:
$ AGE : int 71 69 90 78 61 74 78 78 81 45 ...
$ Gender : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
$ RACE : Factor w/ 5 levels "","BLACK","HISPANIC",..: 5 2 5 5 5 5 5 5 5 1 ...
$ SIDE : Factor w/ 2 levels "L","R": 1 1 2 1 2 1 1 1 2 1 ...
$ LESION.INDICATION: Factor w/ 12 levels "CLAUDICATION",..: 1 11 4 11 9 1 1 11 11 11 ...
$ RUTH.CLASS : int 3 5 4 5 4 3 3 5 5 5 ...
$ LESION.TYPE : Factor w/ 3 levels "","OCCLUSION",..: 3 3 2 3 3 3 2 3 3 3 ...
$ Primary : int 1190 1032 166 689 219 840 1063 115 810 157 ...
The RUTH.CLASS variable is actually a factor, and I've converted it as such:
> test$RUTH.CLASS <- as.factor(test$RUTH.CLASS)
> summary(test$RUTH.CLASS)
3 4 5 6
48 56 35 8
great.
After fitting the model:
library(survival)
stent.surv <- Surv(test$Primary)
> cox.ruthclass <- coxph(stent.surv ~ RUTH.CLASS, data=test )
>
> summary(cox.ruthclass)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS, data = test)
n= 147, number of events= 147
coef exp(coef) se(coef) z Pr(>|z|)
RUTH.CLASS4 0.1599 1.1734 0.1987 0.804 0.42111
RUTH.CLASS5 0.5848 1.7947 0.2263 2.585 0.00974 **
RUTH.CLASS6 0.3624 1.4368 0.3846 0.942 0.34599
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4 1.173 0.8522 0.7948 1.732
RUTH.CLASS5 1.795 0.5572 1.1518 2.796
RUTH.CLASS6 1.437 0.6960 0.6762 3.053
Concordance= 0.574 (se = 0.026 )
Rsquare= 0.045 (max possible= 1 )
Likelihood ratio test= 6.71 on 3 df, p=0.08156
Wald test = 7.09 on 3 df, p=0.06902
Score (logrank) test = 7.23 on 3 df, p=0.06478
> levels(test$RUTH.CLASS)
[1] "3" "4" "5" "6"
When I fit more variables in the model, similar things happen:
cox.fit <- coxph(stent.surv ~ RUTH.CLASS + LESION.INDICATION + LESION.TYPE, data=test )
>
> summary(cox.fit)
Call:
coxph(formula = stent.surv ~ RUTH.CLASS + LESION.INDICATION +
LESION.TYPE, data = test)
n= 147, number of events= 147
coef exp(coef) se(coef) z Pr(>|z|)
RUTH.CLASS4 -0.5854 0.5569 1.1852 -0.494 0.6214
RUTH.CLASS5 -0.1476 0.8627 1.0182 -0.145 0.8847
RUTH.CLASS6 -0.4509 0.6370 1.0998 -0.410 0.6818
LESION.INDICATIONEMBOLIC -0.4611 0.6306 1.5425 -0.299 0.7650
LESION.INDICATIONISCHEMIA 1.3794 3.9725 1.1541 1.195 0.2320
LESION.INDICATIONISCHEMIA/CLAUDICATION 0.2546 1.2899 1.0189 0.250 0.8027
LESION.INDICATIONREST PAIN 0.5302 1.6993 1.1853 0.447 0.6547
LESION.INDICATIONTISSUE LOSS 0.7793 2.1800 1.0254 0.760 0.4473
LESION.TYPEOCCLUSION -0.5886 0.5551 0.4360 -1.350 0.1770
LESION.TYPESTEN -0.7895 0.4541 0.4378 -1.803 0.0714 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
RUTH.CLASS4 0.5569 1.7956 0.05456 5.684
RUTH.CLASS5 0.8627 1.1591 0.11726 6.348
RUTH.CLASS6 0.6370 1.5698 0.07379 5.499
LESION.INDICATIONEMBOLIC 0.6306 1.5858 0.03067 12.964
LESION.INDICATIONISCHEMIA 3.9725 0.2517 0.41374 38.141
LESION.INDICATIONISCHEMIA/CLAUDICATION 1.2899 0.7752 0.17510 9.503
LESION.INDICATIONREST PAIN 1.6993 0.5885 0.16645 17.347
LESION.INDICATIONTISSUE LOSS 2.1800 0.4587 0.29216 16.266
LESION.TYPEOCCLUSION 0.5551 1.8015 0.23619 1.305
LESION.TYPESTEN 0.4541 2.2023 0.19250 1.071
Concordance= 0.619 (se = 0.028 )
Rsquare= 0.137 (max possible= 1 )
Likelihood ratio test= 21.6 on 10 df, p=0.01726
Wald test = 22.23 on 10 df, p=0.01398
Score (logrank) test = 23.46 on 10 df, p=0.009161
> levels(test$LESION.INDICATION)
[1] "CLAUDICATION" "EMBOLIC" "ISCHEMIA" "ISCHEMIA/CLAUDICATION"
[5] "REST PAIN" "TISSUE LOSS"
> levels(test$LESION.TYPE)
[1] "" "OCCLUSION" "STEN"
truncated output from model.matrix below:
> model.matrix(cox.fit)
RUTH.CLASS4 RUTH.CLASS5 RUTH.CLASS6 LESION.INDICATIONEMBOLIC LESION.INDICATIONISCHEMIA
1 0 0 0 0 0
2 0 1 0 0 0
We can see that the first level of each of these is being excluded from the model. Any input would be greatly appreciated. I noticed that for the LESION.TYPE variable the blank level "" is not being included, but that is not by design - it should be an NA or something similar.
I'm confused and could use some help with this. Thanks.
Factors in any model return coefficients relative to a base level (a contrast). Your contrasts default to treatment contrasts against a base level. There is no point in calculating a coefficient for the dropped level, because the model returns the prediction for that level when all the other factor dummies are 0 (factor levels are complete and mutually exclusive for every observation). You can alter the default by changing the contrasts in your options.
For your coefficients to be relative to the average across all levels:
options(contrasts=c(unordered="contr.sum", ordered="contr.poly"))
For your coefficients to be relative to a specific base level (what you have above, and the default):
options(contrasts=c(unordered="contr.treatment", ordered="contr.poly"))
As you can see, there are two types of factors in R: unordered (categorical, e.g. red, green, blue) and ordered (e.g. strongly disagree, disagree, no opinion, agree, strongly agree).
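To see the difference concretely, here is a minimal sketch on toy data (not the poster's data) comparing the two contrast settings, set per model rather than globally, plus one hedged way to recode the blank "" level from the question as NA:
f <- factor(rep(c("a", "b", "c"), each = 4))
y <- rnorm(12) + rep(1:3, each = 4)
coef(lm(y ~ f))                                     # treatment: fb, fc are offsets from base "a"
coef(lm(y ~ f, contrasts = list(f = "contr.sum")))  # sum: deviations from the grand mean
# and the blank level in LESION.TYPE (an assumption about the intent):
test$LESION.TYPE[test$LESION.TYPE == ""] <- NA
test$LESION.TYPE <- droplevels(test$LESION.TYPE)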