intcox() output without likelihood ratio test - r

I'm doing survival analysis with interval-censored data, using the intcox() function from the intcox package in R, which is based on the coxph() function.
The function returns the output without likelihood ratio test values:
> intcox(surv~sexo,data=dados)
Call:
intcox(formula = surv ~ sexo, data = dados)
              coef exp(coef) se(coef)  z  p
sexojuvenil  2.596      13.4       NA NA NA
sexomacho   -0.105       0.9       NA NA NA
Likelihood ratio test=NA  on 2 df, p=NA  n= 156
I don't know why this is happening... Here is the application of the coxph() function to the same data:
> coxph(Surv(dias_seg,status)~sexo,data=dados)
Call:
coxph(formula = Surv(dias_seg, status) ~ sexo, data = dados)
              coef exp(coef) se(coef)      z       p
sexojuvenil  2.320    10.172    0.630  3.684 0.00023
sexomacho   -0.169     0.844    0.252 -0.671 0.50000
Likelihood ratio test=9.28  on 2 df, p=0.00967  n= 156, number of events= 77
> str(dados$sexo)
Factor w/ 3 levels "femea","juvenil",..: 3 3 3 3 3 3 3 3 3 3 ...
Can you help me to solve this problem?
Thanks in advance.

I was told by Volkmar Henschel (the author of the intcox package) that "the fit with intcox gives an object of class 'coxph' without the standard errors of the regression coefficients".
More details are in this document:
ftp://ftp.auckland.ac.nz/pub/software/CRAN/doc/vignettes/intcox/intcox.pdf
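Since intcox() never fills in se(coef), one workaround (not part of the package's printed output; this is a sketch under that assumption) is to bootstrap the standard errors by resampling rows of the data and refitting:
library(intcox)
# Assumes 'surv' is a column of 'dados' (otherwise rebuild it from the
# resampled rows). intcox() may fail to converge on some resamples,
# so failed fits are skipped via try().
set.seed(42)
B <- 200
boot_coefs <- matrix(NA_real_, nrow = B, ncol = 2,
                     dimnames = list(NULL, c("sexojuvenil", "sexomacho")))
for (b in seq_len(B)) {
  idx <- sample(nrow(dados), replace = TRUE)
  fit <- try(intcox(surv ~ sexo, data = dados[idx, ]), silent = TRUE)
  if (!inherits(fit, "try-error")) boot_coefs[b, ] <- coef(fit)
}
apply(boot_coefs, 2, sd, na.rm = TRUE)  # bootstrap SE for each coefficient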

Related

Check for statistically significant differences between groups after running a logistic regression w/ interaction & random effect?

I ran an ordinal logistic regression (using the function clmm from the R package ordinal) with a two-factor interaction and a random effect.
The response is a factor w/ 5 levels (Likert scale: 1 2 3 4 5); the independent variables are a factor w/ 2 levels (time) and a factor w/ 3 levels (group).
The code looks like this:
library(ordinal)
# dataset
ID time group response
person1 1 a 3
person2 1 a 5
person3 1 c 5
person4 1 b 2
person5 1 c 2
person6 1 a 4
person1 2 a 2
person2 2 a 2
person3 2 c 1
person4 2 b 4
person5 2 c 3
person6 2 a 4
... ... ... ...
# model
model <- clmm(response ~ time * group + (1|ID), data = dataset)
# model results
formula: response ~ time * group + (1 | ID)
data: dataset
 link  threshold nobs logLik  AIC    niter     max.grad cond.H
 logit flexible  168  -226.76 473.52 508(4150) 9.42e-05 1.8e+02
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 5.18 2.276
Number of groups: ID 84
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
time2           0.2837     0.5289   0.536  0.59170
group_b         1.8746     0.6946   2.699  0.00695 **
group_c         4.0023     0.9383   4.265    2e-05 ***
time2:group_b  -0.5100     0.7294  -0.699  0.48447
time2:group_c  -0.8830     0.9749  -0.906  0.36508
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
    Estimate Std. Error z value
1|2  -2.6223     0.6440  -4.072
2|3   0.2474     0.5427   0.456
3|4   2.5384     0.5824   4.359
4|5   4.6786     0.7143   6.550
As you can see, the model results only show whether there are differences compared to the reference level (time1, group_a). However, what I am also interested in is whether the difference between time1:group_b and time2:group_b is statistically significant, and the same for group_c.
Since I have to account for the random effect, I cannot use a simple chi-square test to check for statistically significant differences between groups. I therefore tried the function contrast from the R package emmeans, which uses the output of the function emmeans; see the code below:
library(emmeans)
em <- emmeans(model, ~ time | group) #calculates the estimated marginal means
contrast(em, "consec", simple = "each")
# contrast results
$`simple contrasts for time`
group = a:
contrast estimate SE df z.ratio p.value
2 - 1 0.284 0.529 Inf 0.536 0.5917
group = b:
contrast estimate SE df z.ratio p.value
2 - 1 -0.226 0.482 Inf -0.470 0.6386
group = c:
contrast estimate SE df z.ratio p.value
2 - 1 -0.599 0.816 Inf -0.734 0.4629
Note: contrasts are still on the as.factor scale
$`simple contrasts for group`
time = 1:
contrast estimate SE df z.ratio p.value
b - a 1.87 0.695 Inf 2.699 0.0137
c - b 2.13 0.871 Inf 2.443 0.0284
time = 2:
contrast estimate SE df z.ratio p.value
b - a 1.36 0.687 Inf 1.986 0.0897
c - b 1.75 0.838 Inf 2.095 0.0695
Note: contrasts are still on the as.factor scale
P value adjustment: mvt method for 2 tests
My questions are:
a) Is this a correct and valid method to check whether the differences are significant?
b) If not, what is the correct way to do this?
Of course any other suggestion is extremely welcome! Thanks a lot.
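For completeness, two other routes stay within the mixed-model framework (a sketch, assuming the model and dataset objects above). emmeans can test the interaction contrasts directly, i.e. whether the time1 → time2 change differs between groups, and ordinal's anova() method gives a likelihood-ratio test of the interaction as a whole:
# Interaction contrasts: does the (2 - 1) change over time differ by group?
em2 <- emmeans(model, ~ time * group)
contrast(em2, interaction = c("consec", "consec"))
# Likelihood-ratio test of the whole time:group interaction
model_main <- clmm(response ~ time + group + (1|ID), data = dataset)
anova(model_main, model)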

How to extract scaled residuals from glmer summary

I am trying to write a .csv file containing the important information from the summary of a glmer analysis (from the package lme4).
I have been able to isolate the coefficients, AIC, and random effects, but I have not been able to isolate the scaled residuals (Min, 1Q, Median, 3Q, Max).
I have tried using $residuals, but I get a very long output, not the information shown in the summary.
> library(lme4)
> setwd("C:/Users/Arthur Scully/Dropbox/! ! ! ! PHD/Chapter 2 Lynx Bobcat BC/ResourceSelection")
> #simple vectors
>
> x <- c("a","b","b","b","b","d","b","c","c","a")
>
> y <- c(1,1,0,1,0,1,1,1,1,0)
>
>
> # Simple data frame
>
> aes.samp <- data.frame(x,y)
> aes.samp
x y
1 a 1
2 b 1
3 b 0
4 b 1
5 b 0
6 d 1
7 b 1
8 c 1
9 c 1
10 a 0
>
> # Simple glmer
>
> aes.glmer <- glmer(y~(1|x),aes.samp,family ="binomial")
boundary (singular) fit: see ?isSingular
>
> summary(aes.glmer)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: y ~ (1 | x)
Data: aes.samp
AIC BIC logLik deviance df.resid
16.2 16.8 -6.1 12.2 8
I can isolate the information above using the call summary(aes.glmer)$AIC
Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.5275 -0.9820  0.6546  0.6546  0.6546
I do not know the call to isolate the above information
Random effects:
Groups Name Variance Std.Dev.
x (Intercept) 0 0
Number of obs: 10, groups: x, 4
I can isolate this information using the ranef function
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.6901 1.228 0.22
And I can isolate the information above using summary(aes.glmer)$coefficient
convergence code: 0
boundary (singular) fit: see ?isSingular
>
> #Pull important
> ##write call to select important output
> aes.glmer.coef <- summary(aes.glmer)$coefficient
> aes.glmer.AIC <- summary(aes.glmer)$AIC
> aes.glmer.ran <-ranef(aes.glmer)
>
> ##
> data.frame(c(aes.glmer.coef, aes.glmer.AIC, aes.glmer.ran))
X0.847297859077025 X0.690065555425105 X1.22785125618255 X0.219502810378876 AIC BIC logLik deviance df.resid X.Intercept.
a 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
b 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
c 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
d 0.8472979 0.6900656 1.227851 0.2195028 16.21729 16.82246 -6.108643 12.21729 8 0
If anyone knows what call I can use to isolate the "scaled residuals" I would be very grateful.
I haven't got your data, so we'll use example data from the lme4 vignette.
library(lme4)
library(lattice)
library(broom)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
This is for the residuals. tidy from the broom package puts it into a tibble, which you can then export to a csv.
x <- tidy(quantile(residuals(gm1, "pearson", scaled = TRUE)))
x
# A tibble: 5 x 2
names x
<chr> <dbl>
1 0% -2.38
2 25% -0.789
3 50% -0.203
4 75% 0.514
5 100% 2.88
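If you'd rather do this step without broom (recent broom releases have deprecated tidy() on a bare numeric vector anyway), the "Scaled residuals" line of the summary is, to my understanding, just the five-number quantile of the scaled Pearson residuals, so base R suffices (a sketch; the output file name is made up):
# Same five numbers the summary prints, written out as one csv row
sr <- quantile(residuals(gm1, type = "pearson", scaled = TRUE))
sr
write.csv(as.data.frame(t(sr)), "scaled_residuals.csv", row.names = FALSE)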
Also here are some of the other bits that you might find useful, using glance from broom.
y <- glance(gm1)
y
# A tibble: 1 x 6
sigma logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 1 -92.0 194. 204. 73.5 51
And
z <- tidy(gm1)
z
# A tibble: 5 x 6
term estimate std.error statistic p.value group
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 (Intercept) -1.40 0.231 -6.05 1.47e-9 fixed
2 period2 -0.992 0.303 -3.27 1.07e-3 fixed
3 period3 -1.13 0.323 -3.49 4.74e-4 fixed
4 period4 -1.58 0.422 -3.74 1.82e-4 fixed
5 sd_(Intercept).herd 0.642 NA NA NA herd
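One caveat: in current releases the mixed-model tidiers have moved from broom to broom.mixed, so with a newer setup the glance()/tidy() calls above become (a sketch, same gm1 as before):
library(broom.mixed)
glance(gm1)
tidy(gm1, effects = "fixed")  # fixed effects only; drop 'effects' for everything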

R logistic regression and marginal effects - how to exclude NA values in categorical independent variable

I am a beginner with R. I am using glm to conduct a logistic regression and then the 'margins' package to calculate marginal effects, but I don't seem to be able to exclude the missing values in my categorical independent variable.
I have tried to ask R to exclude NAs from the regression. The categorical variable is weight status at age 9 (wgt9); it has three levels (1, 2, 3) and some NAs.
What am I doing wrong? Why do I get a wgt9NA result in my outputs and how can I correct it?
Thanks in advance for any help/advice.
Conduct logistic regression
summary(logit.phbehav <- glm(obese13 ~ gender + as.factor(wgt9) + aded08b,
data = gui, weights = bdwg01, family = binomial(link = "logit")))
Regression output
  term              estimate std.error statistic   p.value
  <chr>                <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)        -3.99      0.293     -13.6  2.86e- 42
2 gender              0.387     0.121       3.19 1.42e-  3
3 as.factor(wgt9)2    2.49      0.177      14.1  3.28e- 45
4 as.factor(wgt9)3    4.65      0.182      25.6  4.81e-144
5 as.factor(wgt9)NA   2.60      0.234      11.1  9.94e- 29
6 aded08b            -0.0755    0.0224     -3.37 7.47e-  4
Calculate the marginal effects
effects_logit_phtotal <- margins(logit.phbehav)
print(effects_logit_phtotal)
summary(effects_logit_phtotal)
Marginal effects output
> summary(effects_logit_phtotal)
  factor     AME     SE       z      p   lower   upper
 aded08a -0.0012 0.0002 -4.8785 0.0000 -0.0017 -0.0007
  gender  0.0115 0.0048  2.3899 0.0169  0.0021  0.0210
   wgt92  0.0941 0.0086 10.9618 0.0000  0.0773  0.1109
   wgt93  0.4708 0.0255 18.4569 0.0000  0.4208  0.5207
  wgt9NA  0.1027 0.0179  5.7531 0.0000  0.0677  0.1377
First of all, welcome to Stack Overflow. Please check the answer here to see how to make a great R question. Not providing a sample of your data sometimes makes it impossible to answer the question. However, taking a guess, I think that you have not set your NA values as real NAs but as strings. This behavior can be seen in the dummy data below.
First let's create the dummy data:
v1 <- c(2,3,3,3,2,2,2,2,NA,NA,NA)
v2 <- c(2,3,3,3,2,2,2,2,"NA","NA","NA")
v3 <- c(11,5,6,7,10,8,7,6,2,5,3)
obese <- c(0,1,1,0,0,1,1,1,0,0,0)
df <- data.frame(obese, v1, v2, v3)  # v3 must be included; the models below use it
Using the variable named v1 does not include NA as a category (the rows with real NAs are simply dropped):
glm(formula = obese ~ as.factor(v1) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
         1           2           3           4           5           6           7           8
-2.110e-08   2.110e-08   1.168e-05  -1.105e-05  -2.110e-08   3.094e-06   2.110e-08   2.110e-08
Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      401.48  898581.15       0        1
as.factor(v1)3   -96.51  326132.30       0        1
v3               -46.93  106842.02       0        1
While keeping the string "NA" as a factor level gives an output similar to the one in the question:
glm(formula = obese ~ as.factor(v2) + v3, family = binomial(link = "logit"),
data = df)
Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.402e-05  -2.110e-08  -2.110e-08   2.110e-08   1.472e-05
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)       394.21  744490.08   0.001        1
as.factor(v2)3    -95.33  340427.26   0.000        1
as.factor(v2)NA  -327.07  613934.84  -0.001        1
v3                -45.99   84477.60  -0.001        1
Try the following to replace NAs that are strings:
gui$wgt9[ gui$wgt9 == "NA" ] <- NA
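A sketch of the complete fix under that assumption (recode, then refit): factor() with explicit levels turns the "NA" strings into true NAs, which glm() then drops through its default na.action, so no wgt9NA term appears.
# Any value outside c("1","2","3") -- including the string "NA" -- becomes NA
gui$wgt9 <- factor(gui$wgt9, levels = c("1", "2", "3"))
logit.phbehav <- glm(obese13 ~ gender + wgt9 + aded08b,
                     data = gui, weights = bdwg01,
                     family = binomial(link = "logit"))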
Don't forget to accept any answer that solved your problem.

Keep getting NaNs when carrying out TukeyHSD in R

I have this small data frame that I want to carry out a TukeyHSD test on.
'data.frame': 4 obs. of 4 variables:
$ Species : Factor w/ 4 levels "Anthoxanthum",..: 1 1 1 1
$ Harvest : Factor w/ 4 levels "b","c","d","e": 1 2 3 4
$ Total : num 0.2449 0.1248 0.0722 0.1025
I perform an analysis of variance with aov:
anthox1 <- aov(Total ~ Harvest, data=anthox)
anthox.tukey <- TukeyHSD(anthox1, "Harvest", conf.level = 0.95)
but when I run the TukeyHSD I get this message:
Warning message:
In qtukey(conf.level, length(means), x$df.residual) : NaNs produced
Can anyone help me to fix the problem and also explain why this is happening. I feel like everything is correctly written (code and data) but for some reason it does not want to work.
Since you have exactly one observation per group, you get a perfect fit:
Total <- c(0.2449, 0.1248, 0.0722, 0.1025)
Harvest <- c("b","c","d","e")
anthox1 <- aov(Total ~ Harvest)
summary.lm(anthox1)
#Call:
# aov(formula = Total ~ Harvest)
#
#Residuals:
# ALL 4 residuals are 0: no residual degrees of freedom!
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.2449 NA NA NA
#Harvestc -0.1201 NA NA NA
#Harvestd -0.1727 NA NA NA
#Harveste -0.1424 NA NA NA
#
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
This means you don't have enough residual degrees of freedom for a Tukey test (or for any inferential statistics).
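To see the contrast, here is a version of the same data with invented replicates (three observations per harvest level, purely for illustration); with residual degrees of freedom available, TukeyHSD() runs without NaNs:
set.seed(1)
anthox2 <- data.frame(
  Harvest = factor(rep(c("b", "c", "d", "e"), each = 3)),
  Total   = rep(c(0.2449, 0.1248, 0.0722, 0.1025), each = 3) + rnorm(12, sd = 0.02)
)
anthox2.aov <- aov(Total ~ Harvest, data = anthox2)
TukeyHSD(anthox2.aov, "Harvest", conf.level = 0.95)  # 8 residual df, no NaNs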

How do I extract lmer fixed effects by observation?

I have an lmer object, constructed from some repeated-measures nutrient intake data (two 24-hour intake periods per RespondentID):
Male.lme2 <- lmer(BoxCoxXY ~ -1 + AgeFactor + IntakeDay + (1|RespondentID),
data = Male.Data,
weights = SampleWeight)
and I can successfully retrieve the random effects by RespondentID using ranef(Male.lme1). I would also like to collect the result of the fixed effects by RespondentID. coef(Male.lme1) does not provide exactly what I need, as I show below.
> summary(Male.lme1)
Linear mixed model fit by REML
Formula: BoxCoxXY ~ AgeFactor + IntakeDay + (1 | RespondentID)
Data: Male.Data
AIC BIC logLik deviance REMLdev
9994 10039 -4990 9952 9980
Random effects:
Groups Name Variance Std.Dev.
RespondentID (Intercept) 0.19408 0.44055
Residual 0.37491 0.61230
Number of obs: 4498, groups: RespondentID, 2249
Fixed effects:
                    Estimate Std. Error t value
(Intercept)         13.98016    0.03405   410.6
AgeFactor4to8        0.50572    0.04084    12.4
AgeFactor9to13       0.94329    0.04159    22.7
AgeFactor14to18      1.30654    0.04312    30.3
IntakeDayDay2Intake -0.13871    0.01809    -7.7
Correlation of Fixed Effects:
(Intr) AgFc48 AgF913 AF1418
AgeFactr4t8 -0.775
AgeFctr9t13 -0.761 0.634
AgFctr14t18 -0.734 0.612 0.601
IntkDyDy2In -0.266 0.000 0.000 0.000
I have appended the fitted results to my data; head(Male.Data) shows:
NutrientID RespondentID Gender Age SampleWeight IntakeDay IntakeAmt AgeFactor BoxCoxXY lmefits
2 267 100020 1 12 0.4952835 Day1Intake 12145.852 9to13 15.61196 15.22633
7 267 100419 1 14 0.3632839 Day1Intake 9591.953 14to18 15.01444 15.31373
8 267 100459 1 11 0.4952835 Day1Intake 7838.713 9to13 14.51458 15.00062
12 267 101138 1 15 1.3258785 Day1Intake 11113.266 14to18 15.38541 15.75337
14 267 101214 1 6 2.1198688 Day1Intake 7150.133 4to8 14.29022 14.32658
18 267 101389 1 5 2.1198688 Day1Intake 5091.528 4to8 13.47928 14.58117
The first couple of lines from coef(Male.lme1) are:
$RespondentID
(Intercept) AgeFactor4to8 AgeFactor9to13 AgeFactor14to18 IntakeDayDay2Intake
100020 14.28304 0.5057221 0.9432941 1.306542 -0.1387098
100419 14.00719 0.5057221 0.9432941 1.306542 -0.1387098
100459 14.05732 0.5057221 0.9432941 1.306542 -0.1387098
101138 14.44682 0.5057221 0.9432941 1.306542 -0.1387098
101214 13.82086 0.5057221 0.9432941 1.306542 -0.1387098
101389 14.07545 0.5057221 0.9432941 1.306542 -0.1387098
To demonstrate how the coef results relate to the fitted estimates in Male.Data (which were grabbed using Male.Data$lmefits <- fitted(Male.lme1)): for the first RespondentID, who has the AgeFactor level 9to13, the fitted value is 15.22633, which equals, from the coefficients, (Intercept) + AgeFactor9to13 = 14.28304 + 0.9432941.
Is there a clever command that will do what I want automatically, which is to extract the fixed-effect estimate for each subject, or am I faced with a series of if statements trying to apply the correct AgeFactor level to each subject to get the correct fixed-effect estimate, after deducting the random-effect contribution from the Intercept?
Update: apologies, I was trying to cut down on the output I was providing and forgot about str(). The output is:
>str(Male.Data)
'data.frame': 4498 obs. of 11 variables:
$ NutrientID : int 267 267 267 267 267 267 267 267 267 267 ...
$ RespondentID: Factor w/ 2249 levels "100020","100419",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Gender : int 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 12 14 11 15 6 5 10 2 2 9 ...
$ BodyWeight : num 51.6 46.3 46.1 63.2 28.4 18 38.2 14.4 14.6 32.1 ...
$ SampleWeight: num 0.495 0.363 0.495 1.326 2.12 ...
$ IntakeDay : Factor w/ 2 levels "Day1Intake","Day2Intake": 1 1 1 1 1 1 1 1 1 1 ...
$ IntakeAmt : num 12146 9592 7839 11113 7150 ...
$ AgeFactor : Factor w/ 4 levels "1to3","4to8",..: 3 4 3 4 2 2 3 1 1 3 ...
$ BoxCoxXY : num 15.6 15 14.5 15.4 14.3 ...
$ lmefits : num 15.2 15.3 15 15.8 14.3 ...
The BodyWeight and Gender aren't being used (this is the males data, so all the Gender values are the same) and the NutrientID is similarly fixed for the data.
I have been doing horrible ifelse statements since I posted, so will try out your suggestion immediately. :)
Update2: this works perfectly with my current data and should be future-proof for new data, thanks to DWin for the extra help in the comment for this. :)
AgeLevels <- length(unique(Male.Data$AgeFactor))
Temp <- as.data.frame(fixef(Male.lme1)['(Intercept)'] +
c(0,fixef(Male.lme1)[2:AgeLevels])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18", "19to30","31to50","51to70","71Plus") )] +
c(0,fixef(Male.lme1)[(AgeLevels+1)])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )])
names(Temp) <- c("FxdEffct")
Below is how I've always found it easiest to extract the individuals' fixed effects and random effects components in the lme4-package. It actually extracts the corresponding fit to each observation. Assuming we have a mixed-effects model of form:
y = Xb + Zu + e
where Xb are the fixed effects and Zu are the random effects, we can extract the components (using lme4's sleepstudy as an example):
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
# Xb
fix <- getME(fm1,'X') %*% fixef(fm1)
# Zu
ran <- t(as.matrix(getME(fm1,'Zt'))) %*% unlist(ranef(fm1))
# Xb + Zu
fixran <- fix + ran
I know that this works as a generalized approach to extracting components from linear mixed-effects models. For non-linear models, the model matrix X contains repeats and you may have to tailor the above code a bit. Here's some validation output as well as a visualization using lattice:
> head(cbind(fix, ran, fixran, fitted(fm1)))
[,1] [,2] [,3] [,4]
[1,] 251.4051 2.257187 253.6623 253.6623
[2,] 261.8724 11.456439 273.3288 273.3288
[3,] 272.3397 20.655691 292.9954 292.9954
[4,] 282.8070 29.854944 312.6619 312.6619
[5,] 293.2742 39.054196 332.3284 332.3284
[6,] 303.7415 48.253449 351.9950 351.9950
# Xb + Zu
> all(round((fixran),6) == round(fitted(fm1),6))
[1] TRUE
# e = y - (Xb + Zu)
> all(round(resid(fm1),6) == round(sleepstudy[,"Reaction"]-(fixran),6))
[1] TRUE
nobs <- 10 # 10 observations per subject
legend = list(text=list(c("y", "Xb + Zu", "Xb")), lines = list(col=c("blue", "red", "black"), pch=c(1,1,1), lwd=c(1,1,1), type=c("b","b","b")))
require(lattice)
xyplot(
Reaction ~ Days | Subject, data = sleepstudy,
panel = function(x, y, ...){
panel.points(x, y, type='b', col='blue')
panel.points(x, fix[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='black')
panel.points(x, fixran[(1+nobs*(panel.number()-1)):(nobs*(panel.number()))], type='b', col='red')
},
key = legend
)
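As an aside, current lme4 can return the same per-observation pieces through predict(), which may be simpler than assembling the matrices by hand (a sketch; same fm1, fix, and fixran as above, with made-up names for the results):
fix_pred    <- predict(fm1, re.form = NA)  # Xb: fixed effects only
fixran_pred <- predict(fm1)                # Xb + Zu, equal to fitted(fm1)
all.equal(unname(fix_pred), as.vector(fix))        # TRUE
all.equal(unname(fixran_pred), as.vector(fixran))  # TRUE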
It is going to be something like this (although you really should have given us the results of str(Male.Data), because model output does not tell us the factor levels for the baseline values :)
#First look at the coefficients
fixef(Male.lme2)
#Then do the calculations
fixef(Male.lme2)["(Intercept)"] +
c(0,fixef(Male.lme2)[2:4])[
match(Male.Data$AgeFactor, c("1to3", "4to8", "9to13","14to18") )] +
c(0,fixef(Male.lme2)[5])[
match(Male.Data$IntakeDay, c("Day1Intake","Day2Intake") )]
You are basically running the original data through a match function to pick the correct coefficient(s) to add to the intercept ... which will be 0 if the data is the factor's base level (whose spelling I am guessing at.)
EDIT: I just noticed that you put a "-1" in the formula, so perhaps all of your AgeFactor terms are listed in the output and you can take out the 0 in the coefficient vector and the invented AgeFactor level in the match table vector.
