Meta-analysis of a proportion in R

I tried to do a meta-analysis of a single proportion. Here is the R code:
# Packages
library(metafor)
# Data
dat <- dat.debruin2009 #from metafor package
# Metafor package ----
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)
## Fit the random-effects model
res <- rma(yi, vi, data = dat)
res
predict(res, transf = transf.ilogit)
Here is the raw (logit-scale) result from the res object:
Random-Effects Model (k = 13; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.4014 (SE = 0.1955)
tau (square root of estimated tau^2 value): 0.6336
I^2 (total heterogeneity / total variability): 90.89%
H^2 (total variability / sampling variability): 10.98
Test for Heterogeneity:
Q(df = 12) = 95.9587, p-val < .0001
Model Results:
estimate      se     zval    pval    ci.lb   ci.ub
 -0.1121  0.1926  -0.5821  0.5605  -0.4896  0.2654
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
And this is the result from predict():
   pred   ci.lb   ci.ub   pi.lb   pi.ub
 0.4720  0.3800  0.5660  0.1962  0.7660
So, my question is: I get a non-significant result in the raw output (p = 0.5605), but the CI from predict() does not cross zero (CI = 0.3800 to 0.5660), which would indicate a significant result. Do I misunderstand something, or am I missing a step in the R code? Or is there an explanation for why the results seem to contradict each other?
===================================================
Edit:
I tried using the meta package and I get a similarly contradictory result as in metafor.
meta_pkg <- meta::metaprop(xi, ni, data = dat)
meta_pkg$.glmm.random
Here is the result (similar to the predict() result above):
> meta_pkg
Number of studies combined: k = 13
Number of observations: o = 1516
Number of events: e = 669
                     proportion           95%-CI
Common effect model      0.4413 [0.4165; 0.4664]
Random effects model     0.4721 [0.3822; 0.5638]
Quantifying heterogeneity:
tau^2 = 0.3787; tau = 0.6154; I^2 = 87.5% [80.4%; 92.0%]; H = 2.83 [2.26; 3.54]
Test of heterogeneity:
      Q d.f.  p-value             Test
  95.96   12 < 0.0001        Wald-type
 108.77   12 < 0.0001 Likelihood-Ratio
Details on meta-analytical method:
- Random intercept logistic regression model
- Maximum-likelihood estimator for tau^2
- Logit transformation
And here is the raw result, similar to the metafor result above:
> meta_pkg$.glmm.random
Random-Effects Model (k = 13; tau^2 estimator: ML)
tau^2 (estimated amount of total heterogeneity): 0.3787
tau (square root of estimated tau^2 value): 0.6154
I^2 (total heterogeneity / total variability): 90.3989%
H^2 (total variability / sampling variability): 10.4155
Tests for Heterogeneity:
Wld(df = 12) = 95.9587, p-val < .0001
LRT(df = 12) = 108.7653, p-val < .0001
Model Results:
estimate      se     zval    pval    ci.lb   ci.ub
 -0.1118  0.1880  -0.5946  0.5521  -0.4804  0.2567
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The p-value is testing whether the average logit transformed proportion is significantly different from 0. That is not the same as testing whether the proportion is significantly different from 0. In fact, transf.ilogit(0) gives 0.5, so that is the corresponding value of a proportion that is being tested. And you will notice that 0.5 falls inside of the confidence interval after the back-transformation. So everything is fully consistent.
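You can verify this from the numbers above with metafor's transf.ilogit(); a minimal check:
transf.ilogit(0)                    # 0.5, the proportion the z-test corresponds to
transf.ilogit(c(-0.4896, 0.2654))   # 0.3800 0.5660, the back-transformed CI
Since 0.5 lies inside (0.3800, 0.5660), the back-transformed interval and the p-value tell the same story.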

Related

How to work out the time at which the asymptote occurs in a logistic model in R?

I have fitted logistic growth models in R using the nls function.
pop.ss <- nls(MRDRSLT ~ SSlogis(TIME, phi1, phi2, phi3), data = testing, na.action = na.omit)
Formula: MRDRSLT ~ SSlogis(TIME, phi1, phi2, phi3)
Parameters:
      Estimate Std. Error t value Pr(>|t|)
phi1   0.23179    0.03317   6.988  0.00602 **
phi2 431.16641   36.68846  11.752  0.00132 **
phi3  79.58386   29.09809   2.735  0.07164 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01824 on 3 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 7.55e-06
phi1 is the growth rate where the model starts to plateau. Is there a way of finding the time at which this occurred easily?
Or do I need to store the coefficients and rearrange to make t the subject?
Let's try to clarify our thinking here by creating our own logistic curve using our own parameters.
We will look at a time interval of 1:100, and make time = 50 the midpoint of our curve. On the y axis we will have our curve go from 0 on the left and plateau at 20 on the right. Notice that because of the shape of the logistic curve, the plateau is only ever "reached" at time = infinity.
We will also add some random noise so we don't get singularities in our nls fit.
phi1 <- 20
phi2 <- 50
phi3 <- 0.1
TIME <- 1:100
testing <- data.frame(
TIME,
MRDRSLT = phi1/(1 + exp(phi3 * (phi2 - TIME))) + rnorm(length(TIME))
)
plot(testing$TIME, testing$MRDRSLT)
This clearly has a logistic shape. Now let's see if we can reverse engineer those parameters:
pop.ss <- nls(MRDRSLT ~ SSlogis(TIME, phi1, phi2, phi3),
data = testing, na.action = na.omit)
summary(pop.ss)
#>
#> Formula: MRDRSLT ~ SSlogis(TIME, phi1, phi2, phi3)
#>
#> Parameters:
#>      Estimate Std. Error t value Pr(>|t|)
#> phi1  20.3840     0.2751   74.09   <2e-16 ***
#> phi2  50.0940     0.5863   85.45   <2e-16 ***
#> phi3  10.4841     0.4722   22.20   <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.057 on 97 degrees of freedom
#>
#> Number of iterations to convergence: 0
#> Achieved convergence tolerance: 4.528e-06
Now let us plot what those parameters actually represent on the data:
plot(testing$TIME, testing$MRDRSLT)
abline(h = summary(pop.ss)$coef["phi1", 1], col = "red")
abline(v = summary(pop.ss)$coef["phi2", 1], col = "red")
We can see that phi1 is the value of y at the upper plateau, and that phi2 is the midpoint where the inflection point occurs.
Now, you are looking for the point in time at which the curve "starts to plateau". This is a slippery concept, but technically speaking the point at which the curve starts to plateau is the point at which the second derivative of the curve becomes negative. That occurs at the inflection point, which is equal to phi2. The curve may not "look like" it is plateauing at that point, but any other point that you choose on the curve to call the "start" of the plateau is entirely arbitrary.
Created on 2020-09-21 by the reprex package (v0.3.0)
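If you nevertheless want a single time to report, one option (an arbitrary convention, not something from the original answer) is the time at which the fitted curve reaches a fraction q of the asymptote. Setting Asym/(1 + exp((xmid - t)/scal)) = q * Asym and solving gives t = xmid + scal * log(q/(1 - q)). A sketch using the fit above, with q = 0.95 as an illustrative threshold:
q  <- 0.95                  # arbitrary "close to the plateau" fraction
co <- coef(pop.ss)          # phi1 = Asym, phi2 = xmid, phi3 = scal
co["phi2"] + co["phi3"] * log(q / (1 - q))  # about 81 with the estimates above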

How to correctly interpret rma.uni output?

I am trying to use the rma.uni function from the metafor package to estimate the impact of fishing gears on my abundance data. Following the method published in Sciberras et al. 2018 (DOI: 10.1111/faf.12283), I think I used the function correctly; however, I am not sure how to interpret the output.
In the function, c is the log response ratio and var_c is the associated variance. log2(t+1) represents time in days (log2-transformed).
In my data, gear is a factor with three levels: CD, QSD and KSD.
As I am not familiar with models in general, and especially with this type of model, I read online documentation, including this:
https://faculty.nps.edu/sebuttre/home/R/contrasts.html
Thus, I understood that only two levels of my factor gear need to be displayed in the output.
Below is the output I have when I run the rma.uni function. My questions are:
If gearCD is considered the 'reference' in the model, does it mean that the effect of gearKSD is 0.1488 higher (I don't know how to word it) than that of gearCD and that, conversely, gearQSD is 0.1274 lower, i.e. more damaging?
How should I interpret the fact that the p-values for gearKSD and gearQSD are not significant? Does it mean that their intercepts are not significantly different from that of gearCD? If so, is the intercept of gearCD the same thing as intrcpt?
Do you know how I could obtain one intercept value for each level of my factor gear (see the sketch after the output below)? I am aiming at distinguishing the initial impact of these three gears, so it would be of interest to have one intercept per gear.
Similarly, if I had interaction terms with log2(t+1) (for example gearKSD:log2(t+1)), would the interpretation be similar to how we interpret the intercept?
I am sorry, I know these are a lot of questions. Also, isn't it weird to have an R² of 100% and all the other values at 0 (tau, tau², and I²)?
Thank you all very much for your help !
rma.uni(c, var_c, mods = ~ gear + log2(t + 1), data = data_AB, method = "REML")
Mixed-Effects Model (k = 15; tau^2 estimator: REML)
tau^2 (estimated amount of residual heterogeneity): 0 (SE = 0.0097)
tau (square root of estimated tau^2 value): 0
I^2 (residual heterogeneity / unaccounted variability): 0.00%
H^2 (unaccounted variability / sampling variability): 1.00
R^2 (amount of heterogeneity accounted for): 100.00%
Test for Residual Heterogeneity:
QE(df = 11) = 7.5027, p-val = 0.7570
Test of Moderators (coefficients 2:4):
QM(df = 3) = 31.7446, p-val < .0001
Model Results:
             estimate      se     zval    pval    ci.lb    ci.ub
intrcpt       -1.1145  0.1407  -7.9200  <.0001  -1.3904  -0.8387  ***
gearKSD        0.1488  0.1025   1.4517  0.1466  -0.0521   0.3497
gearQSD       -0.1274  0.0916  -1.3919  0.1640  -0.3069   0.0520
log2(t + 1)    0.1007  0.0195   5.1626  <.0001   0.0625   0.1389  ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
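(On the question of one intercept per gear level: dropping the intercept from the moderator formula makes R estimate one coefficient per level of gear instead of contrasts against the reference level; each gear coefficient is then that gear's estimated effect at log2(t + 1) = 0, i.e. its initial impact. A minimal sketch, assuming the same data_AB data frame:)
res_cell <- rma.uni(c, var_c, mods = ~ 0 + gear + log2(t + 1),
                    data = data_AB, method = "REML")
res_cell  # one estimate each for gearCD, gearKSD and gearQSD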

Interpreting meta-regression outputs from metafor

I have been using the metafor package for some meta-analyses and would like to adjust for a single continuous covariate (mean age) using meta-regression. However, I require some clarification regarding the outputs and what they mean. Below I have shared the output for the base case analysis as well as the meta-regression (same studies in both, with the only difference being the addition of covariates for the meta-regression).
Base case output
Random-Effects Model (k = 36; tau^2 estimator: DL)
logLik deviance AIC BIC AICc
-18.8613 60.5927 41.7226 44.8896 42.0862
tau^2 (estimated amount of total heterogeneity): 0.0633 (SE = 0.0327)
tau (square root of estimated tau^2 value): 0.2515
I^2 (total heterogeneity / total variability): 51.46%
H^2 (total variability / sampling variability): 2.06
Test for Heterogeneity:
Q(df = 35) = 72.1031, p-val = 0.0002
Model Results:
estimate      se    zval    pval   ci.lb   ci.ub
  0.1266  0.0633  2.0014  0.0453  0.0026  0.2506  *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Meta-regression (output)
Mixed-Effects Model (k = 36; tau^2 estimator: DL)
logLik deviance AIC BIC AICc
-18.7696 60.4092 43.5391 48.2897 44.2891
tau^2 (estimated amount of residual heterogeneity): 0.0677 (SE = 0.0346)
tau (square root of estimated tau^2 value): 0.2601
I^2 (residual heterogeneity / unaccounted variability): 52.84%
H^2 (unaccounted variability / sampling variability): 2.12
R^2 (amount of heterogeneity accounted for): 0.00%
Test for Residual Heterogeneity:
QE(df = 34) = 72.1024, p-val = 0.0001
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 0.2456, p-val = 0.6202
Model Results:
         estimate      se     zval    pval    ci.lb   ci.ub
intrcpt   -0.3741  1.0140  -0.3690  0.7122  -2.3616  1.6133
mods       0.0085  0.0172   0.4955  0.6202  -0.0252  0.0423
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My questions are:
Why are we observing an R-squared of 0% in the meta-regression (is it simply because the covariate is not significant or do you suspect something is not correct)?
How can I interpret the outputs of the meta-regression? With back-transformation of the logHRs I suspect something like below, but I would like to make sure that I am interpreting the 'intrcpt' and 'mods' values correctly.
I have assumed mods represents the pooled HR taking into account the adjustment for age.
I have assumed intrcpt represents the covariate effect (beta), i.e. the amount that the logHR changes for a one-unit increase in age. Also, I have back-transformed this output, which I am not sure is appropriate, or whether I should present it as is.
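(A note on the labels: in metafor output, the row named after the moderator, 'mods' here, is the slope for that moderator, i.e. the change in logHR per one-unit increase in mean age, while 'intrcpt' is the model intercept, i.e. the predicted logHR at a mean age of 0. To obtain an adjusted pooled estimate at a meaningful age, one option is predict() with newmods; a minimal sketch, where the model object name res and the age of 60 are illustrative assumptions:)
# hypothetical: res holds the mixed-effects model fitted above
predict(res, newmods = 60, transf = exp)  # adjusted pooled HR at mean age 60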

R: meta-analysis CI of proportion not bounded correctly

library(metafor)
rma(yi = c(0.1, 0.3, 0.14, 0.3), vi = c(0.12, 0.2, 0.3, 0.1))
I am fitting a random-effects meta-analytic model of single proportions on 4 studies. Since the effect sizes are all proportions, they are bounded between 0 and 1, and so should the confidence intervals be. However, the actual output shows
Random-Effects Model (k = 4; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.1220)
tau (square root of estimated tau^2 value): 0
I^2 (total heterogeneity / total variability): 0.00%
H^2 (total variability / sampling variability): 1.00
Test for Heterogeneity:
Q(df = 3) = 0.2372, p-val = 0.9714
Model Results:
estimate      se    zval    pval    ci.lb   ci.ub
  0.2175  0.1936  1.1232  0.2614  -0.1620  0.5970
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
i.e. the CI is (-0.162, 0.597). How can I fix this?
You can treat the lower bound as 0. Or you could use logit transformed proportions (log odds) for the meta-analysis. After the back-transformation, the resulting estimate and CI bounds must be between 0 and 1. Or you could directly switch to a logistic mixed-effects model for your analysis (see help(rma.glmm)). The latter is also based on log odds and hence will give you values between 0 and 1 after the back-transformation.
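For the logit route, a minimal sketch; note that escalc() needs raw event counts (xi) and sample sizes (ni) rather than the proportions alone, so the counts below are made up purely for illustration:
library(metafor)
# hypothetical counts and sample sizes
dat <- data.frame(xi = c(10, 30, 14, 30), ni = c(100, 100, 100, 100))
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)  # logit transformed proportions
res <- rma(yi, vi, data = dat)
predict(res, transf = transf.ilogit)  # estimate and CI bounds now fall in (0, 1)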

Testing two mean effect sizes in meta-analysis for significant differences

I ran two meta-analyses and want to show that the calculated mean effect sizes (in Fisher's z) differ between the two meta-analyses.
As I am quite new to R and not such a pro in statistics, could you provide the appropriate test and how to conduct it in R?
Here are my current results of the two meta analysis:
> results1GN
Random-Effects Model (k = 4; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0921 (SE = 0.0752)
tau (square root of estimated tau^2 value): 0.3034
I^2 (total heterogeneity / total variability): 99.98%
H^2 (total variability / sampling variability): 5569.05
Test for Heterogeneity:
Q(df = 3) = 22183.0526, p-val < .0001
Model Results:
estimate      se    zval    pval   ci.lb   ci.ub
  0.3663  0.1517  2.4139  0.0158  0.0689  0.6637  *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> results1NN
Random-Effects Model (k = 72; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0521 (SE = 0.0096)
tau (square root of estimated tau^2 value): 0.2282
I^2 (total heterogeneity / total variability): 95.98%
H^2 (total variability / sampling variability): 24.85
Test for Heterogeneity:
Q(df = 71) = 1418.1237, p-val < .0001
Model Results:
estimate      se    zval    pval   ci.lb   ci.ub
  0.2594  0.0282  9.2016  <.0001  0.2042  0.3147  ***
I will elaborate on the previous response, which was a bit rude.
As a first step, you may consider the confidence intervals of your results.
You may consult the Wikipedia page on the topic for a quick summary:
http://en.wikipedia.org/wiki/Confidence_interval
Anyway, in your example you have two effect sizes with overlapping confidence intervals:
results1GN = 0.36 (95% CI: 0.069 to 0.66)
results1NN = 0.26 (95% CI: 0.20 to 0.31).
So, the two results do not appear to be statistically different (overlapping CIs are only a rough check; see the more formal test sketched at the end of this answer).
The package 'metafor' also includes a function (anova) to compare "nested models", and I quote:
"For two (nested) models of class "rma.uni", the function provides a full versus reduced model comparison in terms of model fit statistics and a likelihood ratio test".
Note that "When specifying two models for comparison, the function provides a likelihood ratio test comparing the two models. The models must be based on the same set of data and should be nested for the likelihood ratio test to make sense".
You may find more details and examples on the use of the function here:
http://www.inside-r.org/packages/cran/metafor/docs/anova.rma.uni
Also, consult the manual of the package and the site of the package maintainer.
Hope this is helpful.
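A more direct alternative to eyeballing overlapping confidence intervals is a Wald-type z-test for the difference between the two (independent) pooled estimates. A sketch using the values printed above:
b1 <- 0.3663; se1 <- 0.1517  # results1GN estimate and standard error
b2 <- 0.2594; se2 <- 0.0282  # results1NN estimate and standard error
z  <- (b1 - b2) / sqrt(se1^2 + se2^2)
2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value, about 0.49 here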
