Meta-analysis: Forest plot of summary estimates using the metafor package in R

I am meta-analysing data from ~90 studies. This presents some challenges in displaying the data in an accessible format for publication. I would like to display only the overall effect size estimates of the different meta-analyses and exclude the study-specific estimates. I can do this in Stata with the metan package by adding the summaryonly option. Is it possible to suppress the study-level effect sizes in the forest plot output of the metafor package (or any other meta-analysis R package)?
I've been using the addpoly() function to add the effect size estimates for sub-samples, as described in the package documentation, e.g.:
res.a <- rma(n1i=Intervention_n, n2i=Control_n, m1i=intervention_d, m2i=control_d,
             sd1i=intervention_d_sd, sd2i=control_d_sd, measure="MD", intercept=TRUE,
             data=Dataset.a, vtype="LS", method="DL", level=95, digits=4,
             subset=(exclude==0 & child=="No"),
             slab=paste(Dataset.a$Label, Dataset.a$Year, sep=", "))
addpoly(res.a, row=7.5, cex=.75, font=3, mlab="Random effects model for subgroup")

If I understand you correctly, you are conducting several analyses with these ~90 studies (e.g., based on different subsets) and your goal is to show only the summary estimates from these analyses in a forest plot. The easiest approach is to collect the estimates and corresponding variances of the various analyses in vectors and then pass those to the forest() function. Here is a simple example:
### load metafor package
library(metafor)
### load BCG vaccine dataset
data(dat.bcg)
### calculate log relative risks and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
### fit random-effects models to some subsets
res.r <- rma(yi, vi, data=dat, subset=alloc=="random")
res.s <- rma(yi, vi, data=dat, subset=alloc=="systematic")
res.a <- rma(yi, vi, data=dat, subset=alloc=="alternate")
### collect model estimates and corresponding variances
estimates <- c(coef(res.r), coef(res.s), coef(res.a))
variances <- c(vcov(res.r), vcov(res.s), vcov(res.a))
### create vector with labels
labels <- c("Random Allocation", "Systematic Allocation", "Alternate Allocation")
### forest plot
forest(estimates, variances, slab=labels)
If you don't like that the point sizes differ (by default, they are drawn inversely proportional to the variances), you could use:
forest(estimates, variances, slab=labels, psize=1)
A couple of other improvements: back-transform the log relative risks for the annotations and set sensible tick positions on the x-axis:
forest(estimates, variances, slab=labels, psize=1, atransf=exp, xlab="Relative Risk (log scale)", at=log(c(.2, .5, 1, 2)))
ADDENDUM
In case you prefer polygon shapes for the estimates, you could do the following. First draw the plot as above, but use efac=0 to hide the vertical ends of the CI lines. Then just draw summary polygons over the points with addpoly():
forest(estimates, variances, slab=labels, psize=1, atransf=exp, xlab="Relative Risk (log scale)", at=log(c(.2, .5, 1, 2)), efac=0)
addpoly(estimates, variances, atransf=exp, rows=3:1, col="white", annotate=FALSE)
You can also use efac=1.5 in addpoly() to stretch the polygons vertically. Adjust the factor to your taste.
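For example, repeating the addpoly() call from above with the stretch factor added:
addpoly(estimates, variances, atransf=exp, rows=3:1, col="white",
        annotate=FALSE, efac=1.5)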


metafor provides 95% CIs that are different from the original values

I am using the metafor package to combine beta coefficients from linear regression models. I used the code below, supplying the reported beta and standard error (se) values to the rma() function. But when I look at the forest plot, the 95% confidence intervals differ from the ones reported in the studies. I also tried it with the mtcars dataset, running three models and combining the coefficients; the 95% CIs on the forest plot still differ from those of the original models, and the deviations are far too large to be rounding error. A reproducible example is below.
library(metafor)
library(dplyr)
lm1 <- lm(hp ~ mpg, data=mtcars[1:15,])
lm2 <- lm(hp ~ mpg, data=mtcars[1:32,])
lm3 <- lm(hp ~ mpg, data=mtcars[13:32,])
study <- c("study1", "study2", "study3")
beta_coef <- c(lm1$coefficients[2],
               lm2$coefficients[2],
               lm3$coefficients[2]) %>% as.numeric()
se <- c(1.856, 1.31, 1.458)
ci_lower <- c(confint(lm1)[2,1],
              confint(lm2)[2,1],
              confint(lm3)[2,1]) %>% as.numeric()
ci_upper <- c(confint(lm1)[2,2],
              confint(lm2)[2,2],
              confint(lm3)[2,2]) %>% as.numeric()
df <- cbind(study=study,
            beta_coef=beta_coef,
            se=se,
            ci_lower=ci_lower,
            ci_upper=ci_upper) %>% as.data.frame()
pooled <- rma(yi=beta_coef, vi=se, slab=study)
forest(pooled)
Compare the confidence intervals on the forest plot with the ones in the data frame df created above.
The vi argument is for specifying the sampling variances, but you are passing the standard errors to it. Standard errors go in the sei argument, so you should do:
pooled <- rma(yi=beta_coef, sei=se, slab=study)
But you will still find a discrepancy, since the CIs in the forest plot are constructed based on a normal distribution, while the CIs you obtained from the regression models are based on t-distributions. If you want exactly the same CIs in the forest plot, you can just pass the CI bounds to the function directly:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper)
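To see where the discrepancy comes from, here is a quick numeric check (assuming, as in the example, that the supplied se values match the model standard errors; lm1 has 15 observations and 2 coefficients, hence 13 residual df):
qnorm(0.975)       # 1.96: the normal critical value used for the forest() CIs
qt(0.975, df=13)   # 2.16: the t critical value behind confint(lm1)
## reconstruct lm1's CI by hand; this should match confint(lm1)[2,]
beta_coef[1] + c(-1, 1) * qt(0.975, df=13) * se[1]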
If you want to add a summary polygon from some meta-analysis to the forest plot, you can do this with addpoly(). So the complete code for this example would be:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper, ylim=c(-1.5,6))
addpoly(pooled, row=-1)
abline(h=0)

Plotting Cumulative Events from Adjusted Survival Curve in R

I am attempting to create an adjusted survival curve (from a Cox model) and would like to display this information as cumulative events.
I have attempted this:
library(survival)
data("ovarian")
library(survminer)
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
gplot <- ggadjustedcurves(model)  ## expected plot of the adjusted survival curve
Because the "fun=" still has not been implemented in ggadjustedcurves I took the advice of a user on this page and extracted the elements into plotdata and created a new column as shown below.
library(magrittr)  ## for the %<>% operator
library(dplyr)     ## for mutate()
plotdata <- gplot$data
plotdata %<>%
  mutate(new = 1 - surv)  ## 1 - survival probability
I am new to the R environment and to ggplot, so how can I plot the new adjusted survival curve using the newly created column while keeping the theme of the original plot (contained in gplot)?
Thanks!
Edit:
My current solution is as follows.
library(rms)
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
survfit(model, conf.type="plain", conf.int=1)
plot(survfit(model), conf.int=TRUE, col=c(1,2), fun="event")
This achieves the survival curve I wanted; however, I am not sure whether the confidence bars really represent ±1 standard error. I supplied 1 to the conf.int argument, believing that, with conf.type specified as "plain", this creates the standard errors in this way.
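As a side note on this point: in survfit(), conf.int is the confidence level of the interval (default 0.95), not an SE multiplier, so conf.int = 1 does not give ±1 SE bands. Under conf.type = "plain", bands of roughly ±1 standard error correspond to a level of about 0.683 (a sketch of this reasoning, not from the original thread):
## 2*pnorm(1) - 1 is about 0.683, the coverage of +/-1 SE under normality
sf <- survfit(model, conf.type="plain", conf.int=2*pnorm(1)-1)
plot(sf, conf.int=TRUE, col=c(1,2), fun="event")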
How can I further customize this plot, as the base graph looks rather bland? How do I get a display as close as possible to the survminer curves?
You can use the adjustedCurves package instead, which allows plotting confidence intervals and includes a built-in option to display cumulative incidence functions. First, install it using:
devtools::install_github("https://github.com/RobinDenz1/adjustedCurves")
Now you can use:
library(adjustedCurves)
library(survival)
library(riskRegression)
# needs to be a factor
ovarian$rx <- factor(ovarian$rx)
# needs to include x=TRUE
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian, x=TRUE)
adj <- adjustedsurv(data=ovarian,
                    event="fustat",
                    ev_time="futime",
                    variable="rx",
                    method="direct",
                    outcome_model=model,
                    conf_int=TRUE)
plot(adj, cif=TRUE, conf_int=TRUE)
This produces a plot of the adjusted cumulative incidence curves for the two rx strata, with confidence intervals.
I would probably not use this method here, though. Simulation studies have shown that the Cox-regression-based method performs badly in small samples. You might want to take a look at method="iptw" or method="aiptw" in the adjustedCurves package instead.
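A minimal sketch of the IPTW alternative (the propensity model rx ~ age and the method name "iptw_km" here are illustrative assumptions; check the package documentation for the exact options):
## model the treatment assignment, then weight the Kaplan-Meier estimates
ps_model <- glm(rx ~ age, data=ovarian, family="binomial")
adj_iptw <- adjustedsurv(data=ovarian,
                         variable="rx",
                         ev_time="futime",
                         event="fustat",
                         method="iptw_km",
                         treatment_model=ps_model,
                         conf_int=TRUE)
plot(adj_iptw, cif=TRUE, conf_int=TRUE)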

Calculating AUC from nnet model

For a bit of background, I am using the nnet package to build a simple neural network.
My dataset has a number of factor and continuous features. To handle the continuous variables, I apply scaling and centering, which subtracts each variable's mean and divides by its SD.
I'm trying to produce an ROC curve and AUC from the results of the neural network model.
The below is the code used to build my basic neural network model:
model1 <- nnet(Cohort ~ . - Cohort,
               data = train.sample,
               size = 1)
To get some predictions, I call the following function:
train.predictions <- predict(model1, train.sample)
This assigns to train.predictions a large matrix consisting of 0 and 1 values. What I want is to get the class probabilities for each prediction, so I can plot an ROC curve using the pROC package.
So, I tried adding the following parameter to my predict function:
train.predictions <- predict(model1, train.sample, type="prob")
But I get an error:
Error in match.arg(type) : 'arg' should be one of “raw”, “class”
How can I go about getting class probabilities from outputs?
Assuming your test/validation data set is in train.test, and train.labels contains the true class labels:
train.predictions <- predict(model1, train.test, type="raw")
## This might not be necessary:
detach(package:nnet, unload=TRUE)
library(ROCR)
## train.labels: a vector, matrix, list, or data frame containing the true
## class labels; must have the same dimensions as 'predictions'.
## compute a simple ROC curve (x-axis: fpr, y-axis: tpr)
pred <- prediction(train.predictions, train.labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, lwd=2, col="blue", main="ROC - Title")
abline(a=0, b=1)
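The question title also asks for the AUC itself; as an addition to the answer above, ROCR can compute it from the same prediction object:
## area under the ROC curve
auc <- performance(pred, measure="auc")@y.values[[1]]
auc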

Add raw data points to jp.int (sjPlot)

For my manuscript, I plotted an lme model with an interaction of two continuous variables:
Create the data:
mydata <- data.frame(SID = sample(1:150, 400, replace=TRUE),
                     age = sample(50:70, 400, replace=TRUE),
                     sex = sample(c("Male","Female"), 200, replace=TRUE),
                     time = seq(0.7, 6.2, length.out=400),
                     Vol = rnorm(400),
                     HCD = rnorm(400))
mydata$time <- as.numeric(mydata$time)
Run the model:
library(nlme)
model <- lme(HCD ~ age*time + sex*time + Vol*time, random = ~time|SID, data=mydata)
Make the plot:
library(sjPlot)
sjp.int(model, swap.pred=TRUE, show.ci=TRUE, mdrt.values="meansd")
The reviewer now wants me to add the raw data points to this plot. How can I do this? I tried adding geom_point() referring to mydata, but that did not work.
Any ideas?
Update:
I thought that maybe I could extract the random slopes of HCD, residualise HCD for the covariates, residualise Vol for the covariates, and then plot those two against each other to make things easier (then I could plot the points in a 2D plot).
So I tried to extract the slopes and use them to fit a linear regression, but the results are different (in the reproducible example they are less significant; in my data the interaction became non-significant, although it was significant in the lme). I am not sure what that means, or whether it just shows that I should not try to plot it this way.
Get the slopes:
model <- lme(HCD ~ time, random = ~time|SID, data=mydata)
slopes <- rbind(row.names(model$coefficients$random$SID),
                model$coefficients$random$SID[,2])
slopes2 <- data.frame(matrix(unlist(slopes), nrow=144, byrow=TRUE))
names(slopes2)[1] <- "SID"
names(slopes2)[2] <- "slopes"
(Save slopes2 and reopen it, because somehow R sees it as a factor.)
Then create a cross-sectional data frame and merge the slopes:
mydata$time2 <- round(mydata$time)
new <- reshape(mydata, idvar="SID", timevar="time2", direction="wide")
newdata <- dplyr::left_join(new, slopes2, by="SID")
The lm:
modelw <- lm(slopes ~ age.1 + sex.1 + Vol.1, data=newdata)
Vol now has a p-value of 0.8 (previously it was 0.14).
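One possible direction, sketched here as an assumption rather than a tested solution (the prnt.plot argument and the $plot.list element vary across sjPlot versions, so treat both as names to verify): sjp.int() builds ggplot objects, so if you capture its return value you can add the raw observations as an extra layer:
library(ggplot2)
## using the full interaction model fitted earlier
p <- sjp.int(model, swap.pred=TRUE, show.ci=TRUE, mdrt.values="meansd",
             prnt.plot=FALSE)  # prnt.plot (assumed name): suppress printing
## add the raw data points to the first interaction plot
p$plot.list[[1]] + geom_point(data=mydata, aes(x=time, y=HCD), alpha=0.3)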

What is the method for pooling when Paule-Mandel estimator is used in package metafor?

Consider the code below, which fits a random-effects model using the Paule-Mandel estimator for heterogeneity:
library(metafor)
res <- rma(measure="RD", ai=Ai, bi=Bi, ci=Ci, di=Di, data=data1, method="PM")
The metafor package manual describes the method of pooling when the Hunter-Schmidt or DerSimonian-Laird estimators are used, but not when the Paule-Mandel estimator is used. Any hints?
The Paule-Mandel (PM) estimator is a method for estimating the amount of heterogeneity (usually denoted tau^2 in the meta-analytic literature). Once this variance component has been estimated, nothing different happens compared with any of the other methods: we just compute the weighted average of the estimates, using 1/(sampling variance + tau^2) as the weights. To illustrate:
library(metafor)
### compute log risk ratios and corresponding sampling variances
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
### random-effects model using the Paule-Mandel estimator
res <- rma(yi, vi, data=dat, method="PM")
res
coef(res)
### the pooled estimate, recomputed by hand with inverse-variance weights
weighted.mean(dat$yi, 1/(dat$vi + res$tau2))
The last two lines give you the same value: -0.7149682.
Edit: The Mantel-Haenszel method also computes a weighted average. In the example above, escalc() computes the log risk ratios (and corresponding sampling variances), and we then compute the weighted mean of those log risk ratios. The MH method works a bit differently, in that it computes a weighted average of the risk ratio values directly. To illustrate:
res <- rma.mh(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
res
exp(coef(res))
weighted.mean(exp(dat$yi), weights(res))
The last two lines both give the same value: 0.6352672.
