Plotting a survival curve from a survreg prediction

I'm relatively new to survival analysis and have been using a standard telco churn example dataset; a sample, called 'telco', is below:
telco <- read.csv(text = "State,Account_Length,Area_Code,Intl_Plan,Day_Mins,Day_Calls,Day_Charge,Eve_Mins,Eve_Calls,Eve_Charge,Night_Mins,Night_Calls,Night_Charge,Intl_Mins,Intl_Calls,Intl_Charge,CustServ_Calls,Churn
IN,65,415,no,129.1,137,21.95,228.5,83,19.42,208.8,111,9.4,12.7,6,3.43,4,TRUE
RI,74,415,no,187.7,127,31.91,163.4,148,13.89,196,94,8.82,9.1,5,2.46,0,FALSE
IA,168,408,no,128.8,96,21.9,104.9,71,8.92,141.1,128,6.35,11.2,2,3.02,1,FALSE
MT,95,510,no,156.6,88,26.62,247.6,75,21.05,192.3,115,8.65,12.3,5,3.32,3,FALSE
IA,62,415,no,120.7,70,20.52,307.2,76,26.11,203,99,9.14,13.1,6,3.54,4,FALSE
NY,161,415,no,332.9,67,56.59,317.8,97,27.01,160.6,128,7.23,5.4,9,1.46,4,TRUE")
I've run:
library(survival)
dependentvars = Surv(telco$Account_Length, telco$Churn)
telcosurvreg = survreg(dependentvars ~ -Churn -Account_Length, dist="gaussian",data=telco)
telcopred = predict(telcosurvreg, newdata=telco, type="quantile", p=.5)
...to get the predicted median lifetime (p = .5) of each customer.
What I'm struggling with is how to visualise a survival curve for this. Is there a way (preferably in ggplot2) to do this from the data I have?
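As far as I can tell, subtracting every term on the right-hand side (`~ -Churn -Account_Length`) leaves an intercept-only model, so every customer gets the same predicted time; and with `dist = "gaussian"` the p = .5 quantile is simply the linear predictor. A quick sketch to check both points on the sample above (the name `lp` is illustrative):

```r
library(survival)

# Sample of the telco churn data from the question
telco <- read.csv(text = "State,Account_Length,Area_Code,Intl_Plan,Day_Mins,Day_Calls,Day_Charge,Eve_Mins,Eve_Calls,Eve_Charge,Night_Mins,Night_Calls,Night_Charge,Intl_Mins,Intl_Calls,Intl_Charge,CustServ_Calls,Churn
IN,65,415,no,129.1,137,21.95,228.5,83,19.42,208.8,111,9.4,12.7,6,3.43,4,TRUE
RI,74,415,no,187.7,127,31.91,163.4,148,13.89,196,94,8.82,9.1,5,2.46,0,FALSE
IA,168,408,no,128.8,96,21.9,104.9,71,8.92,141.1,128,6.35,11.2,2,3.02,1,FALSE
MT,95,510,no,156.6,88,26.62,247.6,75,21.05,192.3,115,8.65,12.3,5,3.32,3,FALSE
IA,62,415,no,120.7,70,20.52,307.2,76,26.11,203,99,9.14,13.1,6,3.54,4,FALSE
NY,161,415,no,332.9,67,56.59,317.8,97,27.01,160.6,128,7.23,5.4,9,1.46,4,TRUE")

# The model exactly as written in the question
dependentvars <- Surv(telco$Account_Length, telco$Churn)
telcosurvreg <- survreg(dependentvars ~ -Churn - Account_Length,
                        dist = "gaussian", data = telco)
telcopred <- predict(telcosurvreg, newdata = telco, type = "quantile", p = 0.5)

# Only an intercept is estimated, so all six predictions coincide
print(coef(telcosurvreg))
print(telcopred)

# For a Gaussian model the median equals the linear predictor
lp <- predict(telcosurvreg, newdata = telco, type = "lp")
all.equal(unname(telcopred), unname(lp))
```

This is why a covariate has to be added before the predicted curves can differ between customers.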

Here is a base R version that plots the predicted survival curves. I have changed the formula so that the curves differ for each row:
> # change setup so we have one covariate
> telcosurvreg = survreg(
+ Surv(Account_Length, Churn) ~ Eve_Charge, dist = "gaussian", data = telco)
> telcosurvreg # has more than an intercept
Call:
survreg(formula = Surv(Account_Length, Churn) ~ Eve_Charge, data = telco,
dist = "gaussian")
Coefficients:
(Intercept) Eve_Charge
227.274695 -3.586121
Scale= 56.9418
Loglik(model)= -12.1 Loglik(intercept only)= -12.4
Chisq= 0.54 on 1 degrees of freedom, p= 0.46
n= 6
>
> # find linear predictors
> vals <- predict(telcosurvreg, newdata = telco, type = "lp")
>
> # use the survreg.distributions object. See ?survreg.distributions
> x_grid <- 1:400
> sur_curves <- sapply(
+ vals, function(x)
+ survreg.distributions[[telcosurvreg$dist]]$density(
+ (x - x_grid) / telcosurvreg$scale)[, 1])
>
> # plot with base R
> matplot(x_grid, sur_curves, type = "l", lty = 1)
Here is the result [plot not preserved: one predicted survival curve per row of telco, plotted against x_grid]
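Since the question asks for ggplot2, here is a sketch of the same curves drawn with ggplot2. It assumes the Gaussian survival function S(t) = P(T > t) = pnorm((lp - t) / scale), reshapes the curves into a long data frame with `expand.grid()`, and uses illustrative names (`curves`, `customer`) that are not from the answer above:

```r
library(survival)
library(ggplot2)

telco <- read.csv(text = "State,Account_Length,Area_Code,Intl_Plan,Day_Mins,Day_Calls,Day_Charge,Eve_Mins,Eve_Calls,Eve_Charge,Night_Mins,Night_Calls,Night_Charge,Intl_Mins,Intl_Calls,Intl_Charge,CustServ_Calls,Churn
IN,65,415,no,129.1,137,21.95,228.5,83,19.42,208.8,111,9.4,12.7,6,3.43,4,TRUE
RI,74,415,no,187.7,127,31.91,163.4,148,13.89,196,94,8.82,9.1,5,2.46,0,FALSE
IA,168,408,no,128.8,96,21.9,104.9,71,8.92,141.1,128,6.35,11.2,2,3.02,1,FALSE
MT,95,510,no,156.6,88,26.62,247.6,75,21.05,192.3,115,8.65,12.3,5,3.32,3,FALSE
IA,62,415,no,120.7,70,20.52,307.2,76,26.11,203,99,9.14,13.1,6,3.54,4,FALSE
NY,161,415,no,332.9,67,56.59,317.8,97,27.01,160.6,128,7.23,5.4,9,1.46,4,TRUE")

# Same one-covariate Gaussian model as in the answer
telcosurvreg <- survreg(Surv(Account_Length, Churn) ~ Eve_Charge,
                        dist = "gaussian", data = telco)
vals <- predict(telcosurvreg, newdata = telco, type = "lp")

# Long format: one row per (time, customer) pair; time varies fastest
x_grid <- 1:400
curves <- expand.grid(time = x_grid, customer = seq_along(vals))
# Gaussian survival function evaluated for each customer's linear predictor
curves$surv <- pnorm((vals[curves$customer] - curves$time) / telcosurvreg$scale)
curves$customer <- factor(curves$customer)

p <- ggplot(curves, aes(x = time, y = surv, colour = customer)) +
  geom_line() +
  labs(x = "Account length", y = "Predicted survival probability",
       colour = "Row of telco")
print(p)
```

The values in `curves$surv` should be the same numbers that `matplot()` draws above, just stacked column by column.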

Related

Marginal effects for de-meaned polynomials in mixed models

In the mixed model (or REWB) framework it is common to model within changes by subtracting the cluster mean (demeaning) from a time-varying x-variable, see e.g. (Bell, Fairbrother & Jones, 2018). This estimator is essentially the same as a fixed effects (FE) estimator (shown below using the sleepstudy data).
The issue arises when trying to model polynomials using the same principle. The equality between the estimators breaks when we enter our demeaned variable as a polynomial. We can restore this equality by first squaring the variable and then demeaning (see re_poly_fixed).
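Why the order matters can be illustrated with plain OLS, as a simplification of the mixed-model setup: an FE model with a quadratic term is, by the Frisch-Waugh-Lovell theorem, the same as regressing demeaned y on demeaned x and demeaned x^2, whereas (x - mean_i(x))^2 also contains the between-cluster term -2 * x * mean_i(x). A sketch on simulated data (the data-generating process and the helper `demean()` are hypothetical):

```r
set.seed(42)
n_id <- 30
n_t <- 10
# Panel with cluster-specific means of x and cluster-level shocks in y
d <- data.frame(id = factor(rep(seq_len(n_id), each = n_t)),
                x = rnorm(n_id * n_t, mean = rep(rnorm(n_id, sd = 3), each = n_t)))
d$y <- 1 + 0.5 * d$x + 0.2 * d$x^2 + rep(rnorm(n_id, sd = 2), each = n_t) + rnorm(nrow(d))

demean <- function(v, g) v - ave(v, g)  # subtract the cluster mean

# Benchmark: fixed effects via cluster dummies
fe <- lm(y ~ x + I(x^2) + id, data = d)
# Square first, then demean (the re_poly_fixed idea)
sq_then_dm <- lm(demean(y, id) ~ demean(x, id) + demean(x^2, id), data = d)
# Demean first, then square (the poly(Days_within, 2) idea)
dm_then_sq <- lm(demean(y, id) ~ demean(x, id) + I(demean(x, id)^2), data = d)

rbind(fe = unname(coef(fe)[c("x", "I(x^2)")]),
      square_then_demean = unname(coef(sq_then_dm)[2:3]),
      demean_then_square = unname(coef(dm_then_sq)[2:3]))
```

The first two rows should agree, while the third generally does not.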
dt <- lme4::sleepstudy
dt$days_squared <- dt$Days * dt$Days
dt <- cbind(dt, datawizard::demean(dt, select = c("Days", "days_squared"), group = "Subject"))
re <- lme4::lmer(Reaction ~ Days_within + (1 | Subject), data = dt, REML = FALSE)
fe <- fixest::feols(Reaction ~ Days | Subject, data = dt)
re_poly <- lme4::lmer(Reaction ~ poly(Days_within, 2, raw = TRUE) + (1 | Subject),
data = dt, REML = FALSE)
fe_poly <- fixest::feols(Reaction ~ poly(Days, 2, raw = TRUE) | Subject, data = dt)
re_poly_fixed <- lme4::lmer(Reaction ~ Days_within + days_squared_within + (1 | Subject),
data = dt, REML = FALSE)
models <-
list("re" = re, "fe" = fe, "re_poly" = re_poly, "fe_poly" = fe_poly, "re_poly_fixed" = re_poly_fixed)
modelsummary::modelsummary(models)
The main issue with this strategy is postestimation: packages that calculate marginal effects (e.g. marginaleffects in R or margins in Stata) need the variable to be entered as a polynomial term so that the calculations consider both x and x^2, that is, using poly() or I() in R, or factor-variable notation (c.x##c.x) in Stata. The difference can be seen in the two calls below, where the FE call returns one effect for "Days" and the manual call returns two separate terms.
(me_fe <- summary(marginaleffects::marginaleffects(fe_poly)))
(me_re <- summary(marginaleffects::marginaleffects(re_poly_fixed)))
I may be missing something obvious here, but is it possible to retain the equality between the estimators in FE and the Mixed model setups with polynomials, while still being able to use common packages for marginal effects?
The problem is that when a transformed variable is hardcoded, the marginaleffects package does not know that it should manipulate both the transformed and the original at the same time to compute the slope. One solution is to de-mean inside the formula with I(). You should be aware that this may make the model fitting less efficient.
Here’s an example where I pre-compute the within-group means using data.table, but you could achieve the same result with dplyr::group_by():
library(lme4)
library(data.table)
library(modelsummary)
library(marginaleffects)
dt <- data.table(lme4::sleepstudy)
dt[, `:=`(Days_mean = mean(Days),
Days_within = Days - mean(Days)),
by = "Subject"]
re_poly <- lmer(
Reaction ~ poly(Days_within, 2, raw = TRUE) + (1 | Subject),
data = dt, REML = FALSE)
re_poly_2 <- lmer(
Reaction ~ poly(I(Days - Days_mean), 2, raw = TRUE) + (1 | Subject),
data = dt, REML = FALSE)
models <- list(re_poly, re_poly_2)
modelsummary(models, output = "markdown")
|                                           | Model 1 | Model 2 |
|-------------------------------------------|---------|---------|
| (Intercept)                               | 295.727 | 295.727 |
|                                           | (9.173) | (9.173) |
| poly(Days_within, 2, raw = TRUE)1         | 10.467  |         |
|                                           | (0.799) |         |
| poly(Days_within, 2, raw = TRUE)2         | 0.337   |         |
|                                           | (0.316) |         |
| poly(I(Days - Days_mean), 2, raw = TRUE)1 |         | 10.467  |
|                                           |         | (0.799) |
| poly(I(Days - Days_mean), 2, raw = TRUE)2 |         | 0.337   |
|                                           |         | (0.316) |
| SD (Intercept Subject)                    | 36.021  | 36.021  |
| SD (Observations)                         | 30.787  | 30.787  |
| Num.Obs.                                  | 180     | 180     |
| R2 Marg.                                  | 0.290   | 0.290   |
| R2 Cond.                                  | 0.700   | 0.700   |
| AIC                                       | 1795.8  | 1795.8  |
| BIC                                       | 1811.8  | 1811.8  |
| ICC                                       | 0.6     | 0.6     |
| RMSE                                      | 29.32   | 29.32   |
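The estimated average marginal effects are the same (10.47, standard error 0.7989), but for re_poly_2 the slope is now reported with respect to the original Days variable: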
marginaleffects(re_poly) |> summary()
#> Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 Days_within 10.47 0.7989 13.1 < 2.22e-16 8.902 12.03
#>
#> Model type: lmerMod
#> Prediction type: response
marginaleffects(re_poly_2) |> summary()
#> Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
#> 1 Days 10.47 0.7989 13.1 < 2.22e-16 8.902 12.03
#>
#> Model type: lmerMod
#> Prediction type: response
The following answer is not exactly what I asked for in the question. But at least it is a decent workaround for anyone having similar problems.
library(lme4)
library(data.table)
library(fixest)
library(marginaleffects)
dt <- data.table(lme4::sleepstudy)
dt[, `:=`(Days_mean = mean(Days),
Days_within = Days - mean(Days),
Days2 = Days^2,
Days2_within = Days^2 - mean(Days^2)),
by = "Subject"]
fe_poly <- fixest::feols(
Reaction ~ poly(Days, 2, raw = TRUE) | Subject, data = dt)
re_poly_fixed <- lme4::lmer(
Reaction ~ Days_within + Days2_within + (1 | Subject), data = dt, REML = FALSE)
modelsummary::modelsummary(list(fe_poly, re_poly_fixed), output = "markdown")
We start with the two models previously described. We can manually calculate the AME, or marginal effects at other values, and get confidence intervals using multcomp::glht(). The approach is similar to that of lincom in Stata. I have written a wrapper that returns the values in a data.table:
lincom <- function(model, linhyp) {
t <- summary(multcomp::glht(model, linfct = c(linhyp)))
ci <- confint(t)
dt <- data.table::data.table(
"estimate" = t[["test"]]$coefficients,
"se" = t[["test"]]$sigma,
"ll" = ci[["confint"]][2],
"ul" = ci[["confint"]][3],
"t" = t[["test"]]$tstat,
"p" = t[["test"]]$pvalues,
"id" = rownames(t[["linfct"]])[1])
return(dt)
}
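This can likely be improved or adapted to other similar needs. We can calculate the AME by taking the partial derivative of b_Days * Days + b_Days2 * Days^2 with respect to Days and averaging it over the sample. For the present case this gives b_Days + 2 * b_Days2 * mean(Days), where b_Days and b_Days2 are the coefficients on the linear and squared terms.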
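To see that this linear combination really is the average of the observation-level slopes, here is a small check on the built-in mtcars data (an assumption for illustration, not the sleepstudy models): the analytic AME is compared with a numerical derivative of the fitted values.

```r
fit <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)
b <- unname(coef(fit))  # b[1] intercept, b[2] linear term, b[3] squared term

# Analytic AME: slope b1 + 2 * b2 * wt, averaged over the data
ame_formula <- b[2] + 2 * b[3] * mean(mtcars$wt)

# Numerical AME: central finite difference of the fitted values, then averaged
h <- 1e-4
up <- predict(fit, newdata = transform(mtcars, wt = wt + h))
dn <- predict(fit, newdata = transform(mtcars, wt = wt - h))
ame_numeric <- mean((up - dn) / (2 * h))

c(formula = ame_formula, numeric = ame_numeric)
```

For a quadratic the central difference is exact up to floating-point error, so the two numbers should agree to many decimal places.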
marginaleffects(fe_poly) |> summary()
Term Effect Std. Error z value Pr(>|z|) 2.5 % 97.5 %
1 Days 10.47 1.554 6.734 1.6532e-11 7.421 13.51
Model type: fixest
Prediction type: response
By adding this formula to the lincom function, we get similar results:
names(fe_poly$coefficients) <- c("Days", "Days2")
mean(dt$Days) # Mean = 4.5
lincom(fe_poly, "Days + 2 * Days2 * 4.5 = 0")
estimate se ll ul t p id
1: 10.46729 1.554498 7.397306 13.53727 6.733549 2.817051e-10 Days + 2 * Days2 * 4.5
lincom(re_poly_fixed, "Days_within + 2 * Days2_within * 4.5 = 0")
estimate se ll ul t p id
1: 10.46729 0.798932 8.901408 12.03316 13.1016 0 Days_within + 2 * Days2_within * 4.5
It is possible to check other ranges of values and to add other variables from the model using the formula. This can be done using lapply or a loop and the output can then be combined using a simple rbind. This should make it relatively easy to present/plot results.
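A base R sketch of the loop-and-rbind idea, using the built-in mtcars data as a stand-in. Instead of the lincom() wrapper it computes the linear combination K'b and its standard error sqrt(K'VK) by hand (for a linear hypothesis this is the quantity glht() reports); the helper name `me_at()` is hypothetical:

```r
fit <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)
b <- coef(fit)
V <- vcov(fit)

# Marginal effect of wt at a chosen value: b1 + 2 * b2 * at, i.e. K'b with K = (0, 1, 2 * at)
me_at <- function(at) {
  K <- c(0, 1, 2 * at)
  est <- sum(K * b)
  se <- sqrt(drop(t(K) %*% V %*% K))
  tcrit <- qt(0.975, df = fit$df.residual)
  data.frame(wt = at, estimate = est, se = se,
             ll = est - tcrit * se, ul = est + tcrit * se)
}

# Evaluate at several values and stack the results for tabulating or plotting
res <- do.call(rbind, lapply(c(2, 3, 4, 5), me_at))
res
```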
EDIT
As Vincent pointed out below, there is also marginaleffects::deltamethod(). This looks like a better, more robust option that provides similar results (with the same syntax):
mfx1 <- marginaleffects::deltamethod(
fe_poly, "Days + 2 * Days2 * 4.5 = 0")
mfx2 <- marginaleffects::deltamethod(
re_poly_fixed, "Days_within + 2 * Days2_within * 4.5 = 0")
rbind(mfx1, mfx2)
term estimate std.error statistic p.value conf.low conf.high
1 Days + 2 * Days2 * 4.5 = 0 10.46729 1.554498 6.733549 1.655739e-11 7.420527 13.51405
2 Days_within + 2 * Days2_within * 4.5 = 0 10.46729 0.798932 13.101597 3.224003e-39 8.901408 12.03316

Calculating Brier Score and Integrated Brier Score using ranger R package

I want to calculate the Brier score and the integrated Brier score for my analysis using the "ranger" R package.
As an example, I use the veteran data from the "survival" package as follows:
install.packages("ranger")
library(ranger)
install.packages("survival")
library(survival)
#load veteran data
data(veteran)
data <- veteran
# training and test data
n <- nrow(data)
testind <- sample(1:n,n*0.7)
trainind <- (1:n)[-testind]
#train ranger
rg <- ranger(Surv(time, status) ~ ., data = data[trainind,])
# use rg to predict test data
pred <- predict(rg,data=data[testind,],num.trees=rg$num.trees)
# cumulative hazard function for each sample
pred$chf
#survival probability for each sample
pred$survival
How can I calculate Brier score and integrated Brier score?
The Integrated Brier Score (IBS) can be calculated using the pec() function of the pec package, but you need to define a predictSurvProb() method to extract survival probability predictions from ranger models (see ?pec:::predictSurvProb for the list of models already supported).
A possible solution is:
predictSurvProb.ranger <- function (object, newdata, times, ...) {
  # survival matrix: one row per subject, one column per unique training death time
  ptemp <- ranger:::predict.ranger(object, data = newdata, importance = "none")$survival
  # index of the last death time <= each requested time (0 if before the first one)
  pos <- prodlim::sindex(jump.times = object$unique.death.times,
                         eval.times = times)
  # prepend S(0) = 1 so that times before the first death map to survival 1
  p <- cbind(1, ptemp)[, pos + 1, drop = FALSE]
  if (NROW(p) != NROW(newdata) || NCOL(p) != length(times))
    stop(paste("\nPrediction matrix has wrong dimensions:\nRequested newdata x times: ",
               NROW(newdata), " x ", length(times), "\nProvided prediction matrix: ",
               NROW(p), " x ", NCOL(p), "\n\n", sep = ""))
  p
}
This function can be used as follows:
library(ranger)
library(survival)
data(veteran)
dts <- veteran
n <- nrow(dts)
set.seed(1)
testind <- sample(1:n,n*0.7)
trainind <- (1:n)[-testind]
rg <- ranger(Surv(time, status) ~ ., data = dts[trainind,])
# A formula to be inputted into the pec command
frm <- as.formula(paste("Surv(time, status)~",
paste(rg$forest$independent.variable.names, collapse="+")))
library(pec)
# Using pec for IBS estimation
PredError <- pec(object=rg,
formula = frm, cens.model="marginal",
data=dts[testind,], verbose=F, maxtime=200)
The IBS can be displayed with print() (the print.pec method), using the times argument to choose the time points at which the IBS is reported:
print(PredError, times=seq(10,200,50))
# ...
# Integrated Brier score (crps):
#
# IBS[0;time=10) IBS[0;time=60) IBS[0;time=110) IBS[0;time=160)
# Reference 0.043 0.183 0.212 0.209
# ranger 0.041 0.144 0.166 0.176
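For readers who want to see what is being computed, here is a hand-rolled sketch of the inverse-probability-of-censoring-weighted (IPCW) Brier score and its integral, using the same split as above. Assumptions: the censoring distribution is a marginal Kaplan-Meier fit on the test data (the analogue of cens.model = "marginal"), the IBS is integrated with the trapezoidal rule on a one-day grid and divided by the interval length, and the helpers `surv_at()` and `brier_at()` are hypothetical. The numbers should be in the same range as pec's but need not match them exactly.

```r
library(survival)
library(ranger)

data(veteran, package = "survival")
dts <- veteran
set.seed(1)
n <- nrow(dts)
testind <- sample(1:n, n * 0.7)
trainind <- (1:n)[-testind]
rg <- ranger(Surv(time, status) ~ ., data = dts[trainind, ])
test <- dts[testind, ]

# Predicted survival: rows = test subjects, columns = rg$unique.death.times
S_hat <- predict(rg, data = test)$survival
surv_at <- function(t) {
  pos <- findInterval(t, rg$unique.death.times)  # last death time <= t (0 if none)
  cbind(1, S_hat)[, pos + 1]                      # S(t) = 1 before the first death time
}

# Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t)
cens <- survfit(Surv(time, 1 - status) ~ 1, data = test)
G      <- stepfun(cens$time, c(1, cens$surv))                # G(t)
G_left <- stepfun(cens$time, c(1, cens$surv), right = TRUE)  # G(t-)

# IPCW Brier score at time t
brier_at <- function(t) {
  S <- surv_at(t)
  event_before <- test$time <= t & test$status == 1  # had the event by t
  still_at_risk <- test$time > t                      # known to survive past t
  w <- numeric(nrow(test))                            # censored before t: weight 0
  w[event_before] <- 1 / G_left(test$time[event_before])
  w[still_at_risk] <- 1 / G(t)
  mean(ifelse(event_before, S^2, (1 - S)^2) * w)
}

times <- 0:200
bs <- sapply(times, brier_at)
# Integrated Brier score on [0, 200]: trapezoidal rule divided by the interval length
ibs <- sum(diff(times) * (head(bs, -1) + tail(bs, -1)) / 2) / diff(range(times))
ibs
```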

Clustered standard errors with texreg?

I'm trying to reproduce this Stata example and move from stargazer to texreg. The data is available here.
To run the regression and get the standard errors, I run this code:
library(readstata13)
library(sandwich)
cluster_se <- function(model_result, data, cluster){
model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- as.integer(rownames(model_result$model))
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
sqrt(diag(vcovCL))
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <- lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll, data = elemapi2)
se.lm1 <- cluster_se(model_result = lm1, data = elemapi2, cluster = "dnum")
stargazer::stargazer(lm1, type = "text", style = "aer", se = list(se.lm1))
==========================================================
                                      api00
----------------------------------------------------------
acs_k3                                6.954
                                     (6.901)
acs_46                               5.966**
                                     (2.531)
full                                 4.668***
                                     (0.703)
enroll                               -0.106**
                                     (0.043)
Constant                             -5.200
                                    (121.786)
Observations                           395
R2                                    0.385
Adjusted R2                           0.379
Residual Std. Error             112.198 (df = 390)
F Statistic                  61.006*** (df = 4; 390)
----------------------------------------------------------
Notes:                ***Significant at the 1 percent level.
                       **Significant at the 5 percent level.
                        *Significant at the 10 percent level.
texreg produces this:
texreg::screenreg(lm1, override.se=list(se.lm1))
========================
             Model 1
------------------------
(Intercept)    -5.20
             (121.79)
acs_k3          6.95
               (6.90)
acs_46          5.97 ***
               (2.53)
full            4.67 ***
               (0.70)
enroll         -0.11 ***
               (0.04)
------------------------
R^2             0.38
Adj. R^2        0.38
Num. obs.     395
RMSE          112.20
========================
How can I fix the p-values?
Robust Standard Errors with texreg are easy: just pass the coeftest directly!
This has become much easier since the question was last answered: it appears you can now pass the coeftest() result, computed with the desired variance-covariance matrix, directly to texreg. The downside is that you lose the goodness-of-fit statistics (such as R^2 and the number of observations), but depending on your needs, this may not be a big problem.
How to include robust standard errors with texreg
> screenreg(list(reg1, coeftest(reg1,vcov = vcovHC(reg1, 'HC1'))),
custom.model.names = c('Standard Standard Errors', 'Robust Standard Errors'))
=============================================================
             Standard Standard Errors  Robust Standard Errors
-------------------------------------------------------------
(Intercept)  -192.89 ***               -192.89 *
              (55.59)                   (75.38)
x               2.84 **                   2.84 **
               (0.96)                    (1.04)
-------------------------------------------------------------
R^2             0.08
Adj. R^2        0.07
Num. obs.     100
RMSE          275.88
=============================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
To generate this example, I created a data frame with heteroscedastic errors; here is the full runnable sample code:
library(sandwich)
library(lmtest)  # provides coeftest()
library(texreg)
set.seed(1234)
df <- data.frame(x = 1:100)
df$y <- 1 + 0.5*df$x + 5*100:1*rnorm(100)
reg1 <- lm(y ~ x, data = df)
First, notice that your usage of as.integer is dangerous and likely to cause problems once you use data with non-numeric rownames. For instance, using the built-in dataset mtcars whose rownames consist of car names, your function will coerce all rownames to NA, and your function will not work.
To your actual question, you can provide custom p-values to texreg, which means that you need to compute the corresponding p-values. To achieve this, you could compute the variance-covariance matrix, compute the test-statistics, and then compute the p-value manually, or you just compute the variance-covariance matrix and supply it to e.g. coeftest. Then you can extract the standard errors and p-values from there. Since I am unwilling to download any data, I use the mtcars-data for the following:
library(sandwich)
library(lmtest)
library(texreg)
cluster_se <- function(model_result, data, cluster){
model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
model_rows <- rownames(model_result$model) # changed to be able to work with mtcars, not tested with other data
data <- data[model_rows, model_variables]
cl <- data[[cluster]]
M <- length(unique(cl))
N <- nrow(data)
K <- model_result$rank
dfc <- (M/(M-1))*((N-1)/(N-K))
uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum));
vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
vcovCL  # return the variance-covariance matrix (not the standard errors)
}
lm1 <- lm(formula = mpg ~ cyl + disp, data = mtcars)
vcov.lm1 <- cluster_se(model_result = lm1, data = mtcars, cluster = "carb")
standard.errors <- coeftest(lm1, vcov. = vcov.lm1)[,2]
p.values <- coeftest(lm1, vcov. = vcov.lm1)[,4]
texreg::screenreg(lm1, override.se = standard.errors, override.pvalues = p.values)
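With a reasonably recent sandwich, the hand-rolled matrix should also be obtainable from `sandwich::vcovCL()`: `type = "HC1"` applies the (N-1)/(N-K) factor and `cadjust = TRUE` applies M/(M-1), which together are `dfc` in `cluster_se()`. A sketch on the same mtcars model, checking the two against each other and extracting the inputs for texreg's override arguments:

```r
library(sandwich)
library(lmtest)

lm1 <- lm(mpg ~ cyl + disp, data = mtcars)

# Hand-rolled version, following cluster_se()
cl  <- mtcars$carb
M   <- length(unique(cl))
N   <- nobs(lm1)
K   <- lm1$rank
dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
uj  <- apply(estfun(lm1), 2, function(x) tapply(x, cl, sum))
vc_manual <- dfc * sandwich(lm1, meat = crossprod(uj) / N)

# Package version: HC1 supplies (N-1)/(N-K), cadjust = TRUE supplies M/(M-1)
vc_pkg <- vcovCL(lm1, cluster = ~carb, type = "HC1", cadjust = TRUE)

# t-based standard errors and p-values to pass to override.se / override.pvalues
ct <- coeftest(lm1, vcov. = vc_pkg)
standard.errors <- ct[, "Std. Error"]
p.values <- ct[, "Pr(>|t|)"]

all.equal(vc_manual, vc_pkg, check.attributes = FALSE)
```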
And just for completeness' sake, let's do it manually:
t.stats <- abs(coefficients(lm1) / sqrt(diag(vcov.lm1)))
t.stats
(Intercept) cyl disp
38.681699 5.365107 3.745143
These are your t-statistics using the cluster-robust standard errors. The residual degrees of freedom are stored in lm1$df.residual, and using the built-in functions for the t-distribution (see e.g. ?pt), we get:
manual.p <- 2*pt(-t.stats, df=lm1$df.residual)
manual.p
(Intercept) cyl disp
1.648628e-26 9.197470e-06 7.954759e-04
Here, pt is the distribution function, and we want the probability of observing a statistic at least as extreme as the one we observe. Since we are testing two-sided and the density is symmetric, we take the left tail using the negative value and then double it. This is identical to using 2*(1-pt(t.stats, df=lm1$df.residual)). Now, just to check that this yields the same result as before:
all.equal(p.values, manual.p)
[1] TRUE

Clustered standard errors different in plm vs lfe

When I run a cluster standard error panel specification with plm and lfe I get results that differ at the second significant figure. Does anyone know why they differ in their calculation of the SE's?
set.seed(572015)
library(lfe)
library(plm)
library(lmtest)
# clustering example
x <- c(sapply(sample(1:20), rep, times = 1000)) + rnorm(20*1000, sd = 1)
y <- 5 + 10*x + rnorm(20*1000, sd = 10) + c(sapply(rnorm(20, sd = 10), rep, times = 1000))
facX <- factor(sapply(1:20, rep, times = 1000))
mydata <- data.frame(y=y,x=x,facX=facX, state=rep(1:1000, 20))
model <- plm(y ~ x, data = mydata, index = c("facX", "state"), effect = "individual", model = "within")
plmTest <- coeftest(model,vcov=vcovHC(model,type = "HC1", cluster="group"))
lfeTest <- summary(felm(y ~ x | facX | 0 | facX))
data.frame(lfeClusterSE=lfeTest$coefficients[2],
plmClusterSE=plmTest[2])
lfeClusterSE plmClusterSE
1 0.06746538 0.06572588
The difference is in the degrees-of-freedom adjustment. This is the usual first guess when looking for differences in supposedly similar standard errors (see e.g., Different Robust Standard Errors of Logit Regression in Stata and R). Here, the problem can be illustrated when comparing the results from (1) plm+vcovHC, (2) felm, (3) lm+cluster.vcov (from package multiwayvcov).
First, I refit all models:
library(multiwayvcov)  # provides cluster.vcov(), used for m3 below
m1 <- plm(y ~ x, data = mydata, index = c("facX", "state"),
effect = "individual", model = "within")
m2 <- felm(y ~ x | facX | 0 | facX, data = mydata)
m3 <- lm(y ~ facX + x, data = mydata)
All lead to the same coefficient estimates. For m3 the fixed effects are explicitly reported while they are not for m1 and m2. Hence, for m3 only the last coefficient is extracted with tail(..., 1).
all.equal(coef(m1), coef(m2))
## [1] TRUE
all.equal(coef(m1), tail(coef(m3), 1))
## [1] TRUE
The non-robust standard errors also agree.
se <- function(object) tail(sqrt(diag(object)), 1)
se(vcov(m1))
## x
## 0.07002696
se(vcov(m2))
## x
## 0.07002696
se(vcov(m3))
## x
## 0.07002696
And when comparing the clustered standard errors we can now show that felm uses the degrees-of-freedom correction while plm does not:
se(vcovHC(m1))
## x
## 0.06572423
m2$cse
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX))
## x
## 0.06746538
se(cluster.vcov(m3, mydata$facX, df_correction = FALSE))
## x
## 0.06572423
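The size of the gap can be checked against the usual correction factor G/(G-1) * (N-1)/(N-K), with G = 20 clusters, N = 20000 observations and K = 21 coefficients (x plus the 20 fixed effects, as counted by lm(), which is what cluster.vcov() works with). A sketch using `sandwich::vcovCL()` on the dummy-variable model, where `type = "HC0", cadjust = FALSE` gives the unadjusted estimate and `type = "HC1", cadjust = TRUE` the adjusted one:

```r
library(sandwich)

# Same simulated data as in the question
set.seed(572015)
x <- c(sapply(sample(1:20), rep, times = 1000)) + rnorm(20*1000, sd = 1)
y <- 5 + 10*x + rnorm(20*1000, sd = 10) + c(sapply(rnorm(20, sd = 10), rep, times = 1000))
facX <- factor(sapply(1:20, rep, times = 1000))
mydata <- data.frame(y = y, x = x, facX = facX)

# Dummy-variable fixed-effects model (m3 in the answer)
m3 <- lm(y ~ facX + x, data = mydata)

G <- nlevels(mydata$facX)  # number of clusters
N <- nobs(m3)              # number of observations
K <- m3$rank               # estimated coefficients, including the fixed effects

se_x <- function(v) sqrt(v["x", "x"])
# No small-sample adjustment at all
se_raw <- se_x(vcovCL(m3, cluster = ~facX, type = "HC0", cadjust = FALSE))
# G/(G-1) cluster adjustment times the (N-1)/(N-K) HC1 adjustment
se_adj <- se_x(vcovCL(m3, cluster = ~facX, type = "HC1", cadjust = TRUE))

dfc <- G / (G - 1) * (N - 1) / (N - K)
c(se_raw = se_raw, se_adj = se_adj, ratio = se_adj / se_raw, sqrt_dfc = sqrt(dfc))
```

The ratio of the two standard errors is exactly sqrt(dfc); with only 20 clusters the G/(G-1) part dominates.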

Get confidence intervals for regression coefficients of "mlm" object returned by `lm()`

I'm running a multivariate regression with 2 outcome variables and 5 predictors. I would like to obtain the confidence intervals for all regression coefficients. Usually I use confint() on lm fits, but it doesn't seem to work for a multivariate regression model (an object of class mlm).
Here's a reproducible example.
library(car)
mod <- lm(cbind(income, prestige) ~ education + women, data=Prestige)
confint(mod) # doesn't return anything.
Any alternative way to do it? (I could just use the value of the standard error and multiply by the right critical t value, but I was wondering if there was an easier way to do it).
confint won't return anything useful, because there is no "mlm" method (the "lm" method is used instead, and it does not handle a matrix of coefficients):
methods(confint)
#[1] confint.default confint.glm* confint.lm confint.nls*
As you said, we can just add / subtract a multiple of the standard error to get the upper / lower bound of the confidence interval. You were probably going to do this via coef(summary(mod)), then use some *apply method to extract the standard errors. But my answer to Obtain standard errors of regression coefficients for an “mlm” object returned by lm() gives you a super efficient way to get standard errors without going through summary. Applying std_mlm to your example model gives:
se <- std_mlm(mod)
# income prestige
#(Intercept) 1162.299027 3.54212524
#education 103.731410 0.31612316
#women 8.921229 0.02718759
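The std_mlm() helper lives in the linked answer and is not reproduced here. As a hypothetical stand-in, assuming it returns a matrix of standard errors shaped like coef(model), the usual OLS formula sqrt(diag((X'X)^-1) * sigma^2) can be applied to each response, since all responses share the same design matrix:

```r
library(car)  # attaches carData, which provides the Prestige data

# Hypothetical stand-in for std_mlm(): standard errors with the same shape as coef(model)
std_mlm <- function(model) {
  X <- model.matrix(model)
  XtX_inv <- solve(crossprod(X))                             # (X'X)^-1, shared by all responses
  sigma2 <- colSums(residuals(model)^2) / model$df.residual  # residual variance per response
  se <- sqrt(outer(diag(XtX_inv), sigma2))
  dimnames(se) <- dimnames(coef(model))
  se
}

mod <- lm(cbind(income, prestige) ~ education + women, data = Prestige)
se <- std_mlm(mod)
se
```

Each column should match the standard errors from fitting that response on its own with lm().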
Now, we define another small function to compute lower and upper bound:
## add "mlm" method to generic function "confint"
confint.mlm <- function (object, parm, level = 0.95, ...) {
  beta <- coef(object)
  se <- std_mlm(object)
  alpha <- qt((1 - level) / 2, df = object$df.residual)  # negative lower-tail critical value
  list(lower = beta + alpha * se, upper = beta - alpha * se)
}
## call "confint"
confint(mod)
#$lower
# income prestige
#(Intercept) -3798.25140 -15.7825086
#education 739.05564 4.8005390
#women -81.75738 -0.1469923
#
#$upper
# income prestige
#(Intercept) 814.25546 -1.72581876
#education 1150.70689 6.05505285
#women -46.35407 -0.03910015
It is easy to interpret this. For example, for the response income, the 95% confidence intervals of the coefficients are
#(intercept) (-3798.25140, 814.25546)
# education (739.05564, 1150.70689)
# women (-81.75738, -46.35407)
This comes from the predict.lm example. You want the interval = 'confidence' option.
x <- rnorm(15)
y <- x + rnorm(15)
predict(lm(y ~ x))
new <- data.frame(x = seq(-3, 3, 0.5))
predict(lm(y ~ x), new, se.fit = TRUE)
pred.w.clim <- predict(lm(y ~ x), new, interval = "confidence")
matplot(new$x, pred.w.clim,
lty = c(1,2,2,3,3), type = "l", ylab = "predicted y")
This seems to have been discussed recently (July 2018) on the R-devel list, so hopefully by the next version of R it will be fixed. A workaround proposed on that list is to use:
confint.mlm <- function (object, level = 0.95, ...) {
  cf <- coef(object)
  ncfs <- as.numeric(cf)
  a <- (1 - level)/2
  a <- c(a, 1 - a)
  fac <- qt(a, object$df.residual)
  # column labels such as "2.5 %", built without the unexported stats:::format.perc helper
  pct <- paste(format(100 * a, trim = TRUE, scientific = FALSE, digits = 3), "%")
  ses <- sqrt(diag(vcov(object)))
  ci <- ncfs + ses %o% fac
  setNames(data.frame(ci), pct)
}
Test:
fit_mlm <- lm(cbind(mpg, disp) ~ wt, mtcars)
confint(fit_mlm)
Gives:
2.5 % 97.5 %
mpg:(Intercept) 33.450500 41.119753
mpg:wt -6.486308 -4.202635
disp:(Intercept) -204.091436 -58.205395
disp:wt 90.757897 134.198380
Personally, I like it as a clean tibble (using broom::tidy would be even better, but it currently has an issue):
library(tidyverse)
confint(fit_mlm) %>%
rownames_to_column() %>%
separate(rowname, c("response", "term"), sep=":")
Gives:
response term 2.5 % 97.5 %
1 mpg (Intercept) 33.450500 41.119753
2 mpg wt -6.486308 -4.202635
3 disp (Intercept) -204.091436 -58.205395
4 disp wt 90.757897 134.198380
