Prevalence meta-analysis with metafor - percentages instead of proportions

I'm running a meta-analysis of prevalence rates using the metafor package.
The dataset is named 'metaAAS' and contains each study's author/year, the number of participants exposed to the condition (ni), and the number of those presenting the symptom (nphy).
I'm using the following code to call the dataset from a csv file:
metaAAS <- read.csv("metaAAS.csv")
dat <- get("metaAAS")
Then I'm using the function 'escalc' to calculate effect sizes and outcome measures. I am using the argument 'PR' because the studies measure a dichotomous variable (either participants present the symptom or not):
metaAAS <- escalc(measure="PR", xi=nphy, ni=ni, data=metaAAS, slab=paste(authors, year))
And this code to run the meta-analysis and show the results:
res <- rma(yi, vi, data=metaAAS)
res
confint(res)
My model results show this:
estimate      se     zval    pval   ci.lb   ci.ub
  0.3497  0.0352   9.9428  <.0001  0.2807  0.4186
How can I change the results to percentages (e.g. 34.97) instead of proportions (0.3497)?
It would be easy just to change this in the description of the results, but the whole Forest Plot shows the estimated prevalence as proportions, and I need them as percentages.
Any thoughts? Thanks!

I didn't manage to change the 'model results' output, but I changed the Forest Plot, which was the main goal.
First, create a function that multiplies by 100:
mytransf <- function(x) x * 100
Then add this function to the forest plot arguments:
forest(res, ylim=c(-1,28), transf=mytransf)
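(As an aside, the printed model results can be put on the percentage scale the same way, by passing the transformation to predict() - a sketch using the res object and mytransf function from above:)
predict(res, transf=mytransf, digits=2)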

Related

Using R to Compute Variable Importance Plot for Random Forest

I used R to compute a variable importance plot for a random forest model. When I looked at the mean decrease in Gini plot, I was confused by the numbers on the x axis. Why are they so large? I thought they were supposed to represent a percentage decrease. The model ran normally, and I tried the following code to create the plot:
Plot <- importance(rf_best_model, scale=TRUE)
varImp.rf <- dotchart(sort(Plot[,4]), xlim=c(0,10000), xlab="MeanDecreaseGini")
varImp.rf
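(For what it's worth: MeanDecreaseGini is the raw total decrease in node impurity summed over all splits and trees, not a percentage, which is why the numbers get large. A sketch of a manual normalization, assuming a classification forest from the randomForest package and that the column name matches:)
gini <- importance(rf_best_model)[, "MeanDecreaseGini"]  # column name assumed
dotchart(sort(100 * gini / sum(gini)), xlab="% of total Gini decrease")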

How to plot multi-level meta-analysis by study (in contrast to the subgroup)?

I am doing a multi-level meta-analysis. Many studies have several subgroups, so when I make a forest plot the studies are presented as subgroups. There are 60 of them; however, I would like to plot by study, which would give 25 studies and be more appropriate. Does anyone have an idea how to do this forest plot?
I did it this way:
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | Author/Study,
                     test = "t",
                     method = "REML")
forest(full.model)
It is not clear to me if you want to aggregate to the Author level or to the Study level. If there are multiple rows of data for particular studies, then the model isn't really complete and you would want to add another random intercept for the level of the estimates within studies. Essentially, the lowest random effect should have as many values for nlvls in the output as there are estimates (k).
Let's first tackle the case where we have a multilevel structure with two levels, studies and multiple estimates within studies (for some technical reasons, some might call this a three-level model, but let's not get into this). I will use a fully reproducible example for illustration purposes, using the dat.konstantopoulos2011 dataset, where we have districts and schools within districts. We fit a multilevel model of the type as you have with:
library(metafor)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
res
We can aggregate the estimates to the district level using the aggregate() function, specifying the marginal var-cov matrix of the estimates from the model to account for their non-independence (note that this makes use of aggregate.escalc(), which only works with escalc objects, so if your dataset is not one, you need to convert it first - see help(aggregate.escalc) for details):
agg <- aggregate(dat, cluster=dat$district, V=vcov(res, type="obs"))
agg
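(If your own dataset is not an escalc object, a minimal conversion sketch, assuming generic yi and vi columns, would be:)
dat <- escalc(measure="GEN", yi=yi, vi=vi, data=dat)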
You will find that if you then fit an equal-effects model to these estimates based on the aggregated data that the results are identical to what you obtained from the multilevel model (we use an equal-effects model since the heterogeneity accounted for by the multilevel model is already encapsulated in vcov(res, type="obs")):
rma(yi, vi, method="EE", data=agg)
So, we can now use these aggregated values in a forest plot:
with(agg, forest(yi, vi, slab=district))
My guess based on your description is that you actually have an additional level that you should include in the model and that you want to aggregate to the intermediate level. This is a tad more complicated, since aggregate() isn't meant for that. Just for illustration purposes, say we use year as another (higher) level and I will mess a bit with the data so that all three variance components are non-zero (again, just for illustration purposes):
dat$yi[dat$year == 1976] <- dat$yi[dat$year == 1976] + 0.8
res <- rma.mv(yi, vi, random = ~ 1 | year/district/school, data=dat)
res
Now instead of aggregate(), we can accomplish the same thing by using a multivariate model, including the intermediate level as a factor and using again vcov(res, type="obs") as the var-cov matrix:
agg <- rma.mv(yi, V=vcov(res, type="obs"), mods = ~ 0 + factor(district), data=dat)
agg
Now the model coefficients of this model are the aggregated values and the var-cov matrix of the model coefficients is the var-cov matrix of these aggregated values:
coef(agg)
vcov(agg)
They are not all independent (since we haven't aggregated to the highest level), so if we want to check that we can obtain the same results as from the multilevel model, we must account for this dependency:
rma.mv(coef(agg), V=vcov(agg), method="EE")
Again, exactly the same results. So now we use these coefficients and the diagonal from vcov(agg) as their sampling variances in the forest plot:
forest(coef(agg), diag(vcov(agg)), slab=names(coef(agg)))
The forest plot cannot indicate the dependency that still remains in these values, so if one were to meta-analyze these aggregated values using only diag(vcov(agg)) as their sampling variances, the results would not be identical to what you get from the full multilevel model. But there isn't really a way around that and the plot is just a visualization of the aggregated estimates and the CIs shown are correct.
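To illustrate that last point, here is a sketch of what such a meta-analysis would look like; its results will differ (slightly) from the full multilevel model because the off-diagonal dependency is dropped:
rma(coef(agg), diag(vcov(agg)), method="EE")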
You need to specify your own grouping in a new column of data and use this as the new random effect:
df$study_group <- c(1,1,1,2,2,3,4,5,5,5) # example
full.model <- rma.mv(yi = yi,
                     V = vi,
                     slab = Author,
                     data = df,
                     random = ~ 1 | study_group,
                     test = "t",
                     method = "REML")
forest(full.model)

Predicted Survival Curves using Corrected Group Prognosis Method

How can I plot predicted survival curves for a continuous covariate (let's say the 20th and 80th percentile of its value) using the corrected group prognosis method as implemented in R by Therneau?
For example,
library(survival)
library(survminer)
fit <- coxph(Surv(stop, event) ~ size + strata(rx), data = bladder)
ggadjustedcurves(fit, data=bladder, method = "conditional", strata=rx)
Now, this is useful because I am given two survival curves stratified by rx (either 0 or 1), with the conditional method applied to the bladder data set. However, let's say I would like to use the marginal method, not stratify, and instead plot my continuous covariate at its 20th and 80th percentile values while still re-balancing the subpopulation. I would appreciate any step in the right direction.
To re-state: I have a Cox model with continuous predictors. I would like to build a Cox model that does not stratify on rx but includes it in the model. Then I want to pass the created Cox object into the ggadjustedcurves() function, which uses "subpopulation re-balancing" when given a reference data set. And then, instead of showing two survival curves stratified on a categorical variable, I want to plot two representative survival curves at the 20th and 80th percentiles.
EDIT
My first attempt
fit2 <- coxph(Surv(stop, event) ~ size + rx, data = bladder) # remove strata
fit2
# CGP
pred <- data.frame("rx" = 1, "size" = 3.2)
ggadjustedcurves(fit2, data = pred, method = "conditional", reference = bladder)
Is this what I think it is? Conditional re-balancing has been applied to the reference data set and then the predicted curves are generated for an individual with rx=1 and size of 3.2.
It is difficult to understand what you are truly looking for, but I think I have a rough idea. I think you want to plot the survival curve that would have been observed if every person in your sample had received a specific value for the continuous covariate. If there is no confounding, you can simply use a Cox model that includes only the continuous covariate and use the predict() function for a range of points in time and plot the results. If you need to adjust for confounding, you can include the confounders in the Cox model and use g-computation to obtain the desired probabilities. I describe this in a recent preprint: https://arxiv.org/pdf/2208.04644.pdf
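For the simple no-confounding case just mentioned, a minimal sketch (fit1 and newd are illustrative names; survfit() is used here to get the full predicted curves):
library(survival)
fit1 <- coxph(Surv(stop, event) ~ size, data = bladder)  # continuous covariate only
newd <- data.frame(size = quantile(bladder$size, probs = c(0.2, 0.8)))
sf <- survfit(fit1, newdata = newd)  # predicted survival curves at both percentiles
plot(sf, col = c("blue", "red"), xlab = "Time", ylab = "Survival probability")
legend("bottomleft", legend = c("20th percentile", "80th percentile"),
       col = c("blue", "red"), lty = 1)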
The confounding-adjusted approach can be done in R using the contsurvplot package (also developed by me). First, install the package using:
devtools::install_github("RobinDenz1/contsurvplot")
Afterwards, fit your Cox model, but use x=TRUE in the coxph call:
library(survival)
library(contsurvplot)
library(riskRegression)
library(ggplot2)
fit2 <- coxph(Surv(stop, event) ~ size + rx, data=bladder, x=TRUE)
You can now call the plot_surv_lines function to obtain the causal survival curves for specific values of size, given the model. Using the horizon argument, you can tell the function which values you want to plot the survival curves for. I chose the 20th and 80th percentiles of size as you described:
plot_surv_lines(time="stop",
                status="event",
                variable="size",
                data=bladder,
                model=fit2,
                horizon=quantile(bladder$size, probs=c(0.2, 0.8)))
The package contains a lot more plotting routines to visualize the causal effect of a continuous variable on a time-to-event outcome that might be more suitable for what you actually want.

How to extract average ROC curve predictions using ROCR?

The ROCR library in R offer the ability to plot an average ROC curve (right from the ROCR reference manual):
library(ROCR)
data(ROCR.xval)
# plot ROC curves for several cross-validation runs (dotted
# in grey), overlaid by the vertical average curve and boxplots
# showing the vertical spread around the average.
pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf,col="grey82",lty=3)
plot(perf,lwd=3,avg="vertical",spread.estimate="boxplot",add=TRUE)
Lovely. Unfortunately, there's seemingly no way to obtain the average ROC curve itself as an object/dataframe/etc. for further statistical testing (say, with pROC). I did some research (albeit perhaps after the fact) and found this post:
Global variables in R
Looking through ROCR's code reveals the following lines for passing a result to a plot:
performance_plots.R, (starting at line 451)
## compute average curve
perf.avg <- perf.sampled
perf.avg@x.values <- list(rowMeans(data.frame(perf.avg@x.values)))
perf.avg@y.values <- list(rowMeans(data.frame(perf.avg@y.values)))
perf.avg@alpha.values <- list(alpha.values)
So, using the trace function I looked up here (General suggestions for debugging in R):
trace(.performance.plot.horizontal.avg, edit=TRUE)
I added the following line to the performance_plots.R after the lines listed above:
perf.rocr.avg <<- perf.avg # note the superassignment operator `<<-`
A horrible hack, yet it works: I can plot perf.rocr.avg without a problem. Unfortunately, I can't compare my averaged ROC curve using pROC, because that requires a pROC roc object, and a roc object needs the original prediction and reference data to create. As far as I can tell, ROCR is averaging the ROC curves themselves rather than the predictions, so it seems I can't get what I want out of ROCR.
Is there a way to reverse-engineer the predictions from the averaged ROC curve created by ROCR?
I met the same problem as you. As far as I can tell, the average ROC generated by the ROCR package carries only the averaged numeric values, while the other statistical attributes (e.g., confidence intervals) are missing. That means statistics computed on the average ROC may not be meaningful, which is why a roc object can't be generated from a (tpr, fpr) list in the pROC package. However, I found a paper that addresses this problem, i.e., the comparison between average ROCs: "The average area under correlated receiver operating characteristic curves: a nonparametric approach based on generalized two-sample Wilcoxon statistics". I hope that's helpful.
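(As an aside: if the goal is only to get the averaged curve into a data frame without the trace() hack, a sketch using the pred/perf objects from the question, mirroring ROCR's vertical averaging by interpolating each run onto a common FPR grid:)
fpr.grid <- seq(0, 1, length.out = 101)  # common grid of false positive rates
tpr.mat <- sapply(seq_along(perf@x.values), function(i)
  approx(perf@x.values[[i]], perf@y.values[[i]], xout = fpr.grid, ties = "ordered")$y)
avg.roc <- data.frame(fpr = fpr.grid, tpr = rowMeans(tpr.mat))  # one averaged curve
head(avg.roc)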

Adding column of predicted Hazard Ratio to dataframe after Cox Regression in R

I need to add columns of predicted hazard ratios to the dataframe after running a Cox PH regression in R. The dataframe is panel data where numgvkey is the firm identifier and age is the time identifier. You can download a small section of the data from this link:
https://drive.google.com/file/d/0B8usDJAPeV85VFRWd01pb0h1MDA/view?usp=sharing
I have done the following:
library(survival)
library(readstata13)
sme <- read.dta13("sme.dta")
reg <- coxph(Surv(age, EVENT2) ~ L1FETA + frailty(numgvkey), ties="efron", data=sme)
summary(reg)
hr <- predict(reg, type="risk")
How can I add a 5th column of hazard ratios (hr) to my 'sme' dataframe? Also, is there any way to predict the EVENT2 probability rather than 'hr'?
The predict.coxph function can generate several different "type"s of output. One of them, "expected", may be what you mean by "probability". It's not really a probability, since the numbers can exceed 1.0 when the relative risk, "baseline hazard", and time under observation are high.
The "risk" option for "type" returns the hazard ratio.
There is a survfit.coxph which allows one to calculate predicted survival. The object it returns has both surv and a cumhaz list components.
You might want to try this:
sme$cumhaz <- survfit(reg, newdata=sme)$cumhaz
