stargazer: odds ratio for hazards model with wrong significance stars - r

I have been using the following function to output logistic regression tables with stargazer as odds ratios, with the right significance stars and standard errors:
stargazer2 <- function(model, odd.ratio = F, ...) {
  if(!("list" %in% class(model))) model <- list(model)
  if (odd.ratio) {
    coefOR2 <- lapply(model, function(x) exp(coef(x)))
    seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coef[, 2])
    p2 <- lapply(model, function(x) summary(x)$coefficients[, 4])
    stargazer(model, coef = coefOR2, se = seOR2, p = p2, ...)
  } else {
    stargazer(model, ...)
  }
}
However, this no longer works when I'm using a hazards model analysis, and the significance stars move around in the output. Reproducible example below:
library(survival)
library(stargazer)
data("diabetic")
juvenile <- 1*(diabetic$age < 20)
fit <- coxph(Surv(time, status) ~ trt + juvenile, cluster = id,
             data = diabetic)
stargazer2(fit, odd.ratio = T, type = "text")
stargazer2(fit, odd.ratio = F, type = "text")

The citation for that code brings up a blocked webpage. I don't think this line is correct:
seOR2 <- lapply(model, function(x) exp(coef(x)) * summary(x)$coef[, 2])
The two calls also report different effect measures: the "odd.ratio = T" version reports the exponentiated coefficient estimates, while the "odd.ratio = F" version displays the unexponentiated ones. Quite frankly, the code you have copied is highly suspect, both in the line that calculates seOR2 and in its use of T and F, a sloppy and dangerous shortcut for the logical values TRUE and FALSE. The p-values from the ordinary stargazer call are correct, while those from the "new, improved" version are not. You might want to contact the author of that code and advise them to check their statistical logic.
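One thing that may be worth checking, offered as a sketch rather than a full diagnosis of the copied function: summary(x)$coef is laid out differently for a coxph fit than for a glm fit, so positional indices such as [, 2] and [, 4] do not pick out the standard error and the p-value. The snippet below uses only the survival package and the diabetic data from the question, and selects columns by name instead. The names d, ctab, se_col, hr, se_hr and p are made up for this illustration, and the delta-method standard error is the same approximation the original function uses.

```r
library(survival)

# Same model as in the question; juvenile is stored in the data frame here.
d <- diabetic
d$juvenile <- 1 * (d$age < 20)
fit <- coxph(Surv(time, status) ~ trt + juvenile, cluster = id, data = d)

# Inspect the layout before indexing: for a clustered coxph fit, column 2
# is exp(coef) and column 4 is the robust standard error.
ctab <- summary(fit)$coefficients
print(colnames(ctab))

# Pick columns by name rather than by position.
se_col <- if ("robust se" %in% colnames(ctab)) "robust se" else "se(coef)"
hr     <- ctab[, "exp(coef)"]    # hazard ratios
se_hr  <- hr * ctab[, se_col]    # delta-method SE of the hazard ratio
p      <- ctab[, "Pr(>|z|)"]     # Wald p-values, computed on the log scale

print(cbind(hr, se_hr, p))
```

These named vectors could then be wrapped in lists and passed to stargazer's coef, se and p arguments, as the original function does.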

Related

How to normalize a Lmer model?

lmer:
mixed.lmer6 <- lmer(Size ~ (Time+I(Time^2))*Country*STemperature +
(1|Country:Locality)+ (1|Locality:Individual)+(1|Batch)+
(1|Egg_masses), REML = FALSE, data = data_NoNA)
residuals:
plot_model(mixed.lmer6, type = "diag")
I tried manual log, power, and sqrt transformations in my formula, but saw no improvement. I also cannot find a suitable automatic transformation function in R, such as Box-Cox (which does not work for lmer models).
Any help or tips would be appreciated.
This might be better suited for CrossValidated ("what should I do?" is appropriate for CV; "how should I do it?" is best for Stack Overflow), but I'll take a crack.
The Q-Q plot is generally the last/least important diagnostic you should look at (the order should be approximately (1) check for significant bias/missed patterns in the mean [fitted vs. residual, residual vs. covariates]; (2) check for outliers/influential points [leverage, Cook's distance]; (3) check for heteroscedasticity [scale-location plot]; (4) check distributional assumptions [Q-Q plot]). The reason is that any of the "upstream" failures (e.g. missed patterns) will show up in the Q-Q plot as well; resolving them will often resolve the apparent non-Normality.
If you can fix the distributional assumptions by fixing something else about the model (adding covariates/adding interactions/adding polynomial or spline terms/removing outliers), then do that.
You could code your own brute-force Box-Cox, something like:
fitted_model <- lmer(..., data = mydata)
bcfun <- function(lambda, resp = "y") {
  y <- mydata[[resp]]
  mydata$newy <- if (lambda==0) log(y) else (y^lambda -1)/lambda
  ## https://stats.stackexchange.com/questions/261380/how-do-i-get-the-box-cox-log-likelihood-using-the-jacobian
  log_jac <- sum((lambda-1)*log(y))
  newfit <- update(fitted_model, newy ~ ., data = mydata)
  return(-2*(c(logLik(newfit))+ log_jac))
}
lambdavec <- seq(-2, 2, by = 0.2)
boxcox <- vapply(lambdavec, bcfun, FUN.VALUE = numeric(1))
plot(lambdavec, boxcox - min(boxcox))
(lightly tested! but feel free to let me know if it doesn't work)
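For reference, the quantity that bcfun returns can be written out as a formula. This only restates the code above, with \ell_\lambda denoting the maximized log-likelihood of the model refitted to the transformed response (notation introduced here, not in the original code):

```latex
y_i^{(\lambda)} =
\begin{cases}
  \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
  \log y_i, & \lambda = 0
\end{cases}
\qquad
\log J(\lambda) = (\lambda - 1) \sum_{i} \log y_i
\qquad
D(\lambda) = -2 \left[ \ell_\lambda + \log J(\lambda) \right]
```

The plot shows D(\lambda) minus its minimum over the grid, so the best-supported \lambda sits at zero on the vertical axis. The transformation requires a strictly positive response.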
If you do need to fit a mixed model with a heavy-tailed residual distribution (e.g. Student t), the options are fairly limited. The brms package can fit such models (but takes you down the Bayesian/MCMC rabbit hole), and the heavy package (currently archived on CRAN) will work, but doesn't appear to handle crossed random effects.

Time-dependent covariates- is there something wrong with this code? (R program)

I am checking a few of my Cox multivariate regression analyses' proportional hazard assumptions using time-dependent co-variates, using the survival package. The question is looking at survival in groups with different ADAMTS13 levels (a type of enzyme).
Could I check if something is wrong with my code itself? It keeps saying Error in tt(TMAdata$ADAMTS13level.f) : could not find function "tt" . Why?
Notably, ADAMTS13level.f is a factor variable.
cox_multivariate_survival_ADAMTS13 <- coxph(Surv(TMAdata$Daysalive, TMAdata$'Dead=1')
~TMAdata$ADAMTS13level.f
+TMAdata$`Age at diagnosis`
+TMAdata$CCIwithoutage
+TMAdata$Gender.f
+TMAdata$`Peak Creatinine`
+TMAdata$DICorcrit.f,
tt(TMAdata$ADAMTS13level.f),
tt = function(x, t, ...)
{mtrx <- model.matrix(~x)[,-1]
mtrx * log(t)})
Thanks. Pointers on anything from the fundamentals of the code to plain typos are welcome; I have tried different permutations to no avail so far.
Limey was on the right track!
The time-transformed version of ADAMTS13level.f needs to be added to the model formula as a term, instead of being passed as a separate argument to coxph(...).
The form of the coxph call for testing time-dependent categorical variables is described in "How to use the timeSplitter" by Max Gordon.
Other helpful documentation:
coxph - fit proportional hazards regression model
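cox_multivariate_survival_ADAMTS13 <-
  coxph(
    Surv(
      Daysalive,
      `Dead=1`   # backticks, not quotes: 'Dead=1' would be a character string
    ) ~
      ADAMTS13level.f
      + `Age at diagnosis`
      + CCIwithoutage
      + Gender.f
      + `Peak Creatinine`
      + DICorcrit.f
      + tt(ADAMTS13level.f),   # time-transformed term goes inside the formula
    tt = function(x, t, ...) {
      mtrx <- model.matrix(~x)[,-1]   # dummy columns for the factor, intercept dropped
      mtrx * log(t)                   # interact each dummy with log(time)
    },
    data = TMAdata
  )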
P.S. With the original data there was a further problem: Daysalive included a zero (0) value, which eventually resulted in an 'infinite predictor' error from coxph, probably because the tt function transforms time with log(t) and log(0) is -Inf. (https://rdrr.io/github/therneau/survival/src/R/coxph.R)
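The zero-time problem is easy to reproduce in isolation. The sketch below uses made-up follow-up times (days is a hypothetical vector standing in for Daysalive) to show where the infinite value comes from, and then fits a tt() term on the veteran data that ships with survival. Shifting time before taking the log, as in log(t + 1), is one possible workaround and is an assumption here, not part of the original analysis.

```r
library(survival)

# Hypothetical follow-up times, including a zero as in the original data.
days <- c(0, 3, 10, 45)
print(log(days))       # first element is -Inf, which coxph cannot use
print(log(days + 1))   # finite everywhere after shifting by one day

# Check a time variable before using log(t) inside tt().
has_nonpositive <- any(days <= 0)
print(has_nonpositive)

# A tt() term inside the formula, using the shifted log.
fit_tt <- coxph(Surv(time, status) ~ karno + tt(karno),
                data = veteran,
                tt = function(x, t, ...) x * log(t + 1))
print(coef(fit_tt))
```

Whether a shift, dropping the zero-time rows, or recoding them is appropriate depends on what a survival time of zero means in the data at hand.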

R : Clustered standard errors in fractional probit model

I need to estimate a fractional (response taking values between 0 and 1) model with R. I also want to cluster the standard errors. I have found several examples in SO and elsewhere and I built this function based on my findings:
require(sandwich)
require(lmtest)
clrobustse <- function(fit, cl){
  M <- length(unique(cl))
  N <- length(cl)
  K <- fit$rank
  dfc <- (M/(M - 1))*((N - 1)/(N - K))
  uj <- apply(estfun(fit), 2, function(x) tapply(x, cl, sum))
  vcovCL <- dfc*sandwich(fit, meat = crossprod(uj)/N)
  coeftest(fit, vcovCL)
}
I estimate my model like this:
fit <- glm(dep ~ exp1 + exp2 + exp3, data = df, fam = quasibinomial("probit"))
clrobustse(fit, df$cluster)
Everything works fine and I get the results. However, I suspect that something is not right as the non-clustered version:
coeftest(fit)
gives the exact same standard errors. I checked what Stata reports, and it displays different clustered standard errors. I suspect that I have misspecified the function clrobustse, but I just don't know how. Any ideas about what could be going wrong here?

How to extract p-values from lmekin objects in coxme package

I want to be able to view the p-values for the lmekin object produced by the coxme package.
eg.
model= lmekin(formula = height ~ score + sex + age + (1 | IID), data = phenotype_df, varlist = kinship_matrix)
I tried:
summary(model)
coef(summary(model))
summary(model$coefficient$fixed)
fixef(model) / sqrt(diag(vcov(model))) # calculates Z-scores but not p-values
But these did not work. So how do I view the p-values for this linear mixed model?
It took me ages of searching to figure this out, but I noticed a lot of other similar questions without proper answers, so I wanted to answer this.
You use:
library(coxme)
print(model)
Note it is important to load the coxme package beforehand or it will not work.
I've also noticed a lot of posts asking how to extract the p-value from lmekin objects, or from coxme objects in general. I wrote this function, which is based on the coxme:::print.coxme code (to view it, type coxme:::print.coxme directly into R). print calculates the p-values on the fly; this function extracts them and returns them in a data frame that can be saved to an object.
Note that mod is your model, e.g. mod <- lmekin(y~x+a+b)
Use print(mod) to double-check that the tables match.
extract_coxme_table <- function (mod){
  beta <- mod$coefficients$fixed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z<- round(beta/se, 2)
  p<- signif(1 - pchisq((beta/se)^2, 1), 2)
  table=data.frame(cbind(beta,se,z,p))
  return(table)
}
I arrived at this topic because I was looking for the same thing for a plain coxme object. IcedCoffee's function works with one small adjustment:
extract_coxme_table <- function (mod){
  beta <- mod$coefficients #$fixed is not needed
  nvar <- length(beta)
  nfrail <- nrow(mod$var) - nvar
  se <- sqrt(diag(mod$var)[nfrail + 1:nvar])
  z<- round(beta/se, 2)
  p<- signif(1 - pchisq((beta/se)^2, 1), 2)
  table=data.frame(cbind(beta,se,z,p))
  return(table)
}
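Both versions compute a two-sided Wald p-value: squaring the z statistic and comparing it to a chi-squared distribution with one degree of freedom gives the same answer as the usual two-tailed normal test. The sketch below checks that with made-up z values (z_demo is hypothetical) and also shows the lower.tail = FALSE form, which avoids the loss of precision in 1 - pchisq(...) when the p-value is very small.

```r
# Hypothetical z statistics, only for illustration.
z_demo <- c(-0.5, 1.96, 3.2, 9)

p_chisq  <- 1 - pchisq(z_demo^2, df = 1)                   # form used in the functions above
p_normal <- 2 * pnorm(-abs(z_demo))                        # two-sided normal test
p_upper  <- pchisq(z_demo^2, df = 1, lower.tail = FALSE)   # same test, computed in the upper tail

print(cbind(z_demo, p_chisq, p_normal, p_upper))

# For large |z|, 1 - pchisq(...) is limited by floating-point rounding near 1,
# while the upper-tail form keeps its relative precision.
```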

R dlm library: model definition

I am trying to build a state space model in R using the dlm library but keep getting the "incompatible dimensions of matrices" error.
The specifications of the model I am trying to produce are as per below:
Y_t = A_t + beta*F_t + e_t
F_t = phi1*F_{t-1} + phi2*F_{t-2} + v_t
A_t = A_{t-1} + w_t
where Y_t is the observable variable, A_t is the time-varying coefficient modelled as a random walk, and F_t is a latent factor which follows an AR(2) process.
I have been trying to set up the FF, V, GG, W, m0 and C0 matrices required by dlm to specify the model, but have yet to get the program to work.
Please see below my latest attempt which returns an "incompatible dimensions of matrices" error.
I have traced the relative matrix sizes on paper and they look fine to me. Could anyone please advise where I am going wrong?
# the below code works now
library(dlm)
model <- function(x) {
  # state vector: (F_t, F_{t-1}, A_t)
  FF <- matrix(c(x[1],0,1), nrow=1)
  V <- matrix(exp(x[2]))
  GG <- matrix(c(x[3],1,0,x[4],0,0,0,0,1), nrow=3)
  W <- diag(c(exp(x[5]),0,exp(x[6])))
  m0 <- rep(0,3)
  C0 <- 10*diag(3)
  dlm(FF=FF, V=V, GG=GG, W=W, m0=m0,C0=C0)
}
# x[1:6] are beta, log Var(e_t), phi1, phi2, log Var(v_t) and log Var(w_t);
# the variances are exponentiated inside model() so they stay positive
fit <- dlmMLE(y1, parm = c(rep(0,6)), build = model)
dlmy1 <- model(fit$par)
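As a dimension check that needs only base R, the sketch below builds FF and GG for the state vector (F_t, F_{t-1}, A_t) with made-up parameter values (beta, phi1 and phi2 here are hypothetical numbers, not estimates) and applies one noise-free transition and one observation step. It does not call dlm; it only confirms that the matrices encode the three model equations.

```r
# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
beta <- 2
phi1 <- 0.5
phi2 <- 0.3

FF <- matrix(c(beta, 0, 1), nrow = 1)
GG <- matrix(c(phi1, 1, 0,
               phi2, 0, 0,
               0,    0, 1), nrow = 3)   # matrix() fills column by column

# Current state: F_t = 2, F_{t-1} = 1, A_t = 5.
state <- c(F_t = 2, F_tm1 = 1, A_t = 5)

next_state <- GG %*% state   # (phi1*2 + phi2*1, 2, 5) = (1.3, 2, 5)
y_mean     <- FF %*% state   # beta*2 + 5 = 9

print(dim(FF))
print(dim(GG))
print(next_state)
print(y_mean)
```

Because matrix() fills by column, the vector passed to GG lists each column in turn, so phi1 and phi2 land in the first row and the 1 below phi1 carries F_t forward into the F_{t-1} slot.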
