I found the total change in deviance between 2 models to be 6.33 with 6 degrees of freedom.
How can I do a chi-squared test to assess the goodness of fit?
I tried dchisq(6,6.33), which gives me 0.11. Does that mean I can reject at 90%?
But when I look up the chi-squared distribution with 6 degrees of freedom, the 90% critical value is 10.64, which looks like a sure rejection. (I can also reject at 75%, where the critical value is 7.84, which contradicts the above, which suggests rejecting at 89%.)
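A minimal sketch of the usual likelihood-ratio test, assuming the two models are nested and 6.33 is the deviance difference on 6 degrees of freedom. The p-value is the upper-tail probability from pchisq(), not a density from dchisq(). Note also that dchisq(6, 6.33) evaluates the density at x = 6 with 6.33 degrees of freedom, so the arguments are swapped as well.

```r
# Change in deviance between the two nested models and its degrees of freedom
dev_change <- 6.33
df_change  <- 6

# p-value = upper-tail probability P(X >= 6.33) for X ~ chi-squared(6);
# dchisq() would return the density instead, which is not a p-value
p_value <- pchisq(dev_change, df = df_change, lower.tail = FALSE)

# Critical values of the chi-squared(6) distribution at the 75%, 90% and 95% levels
crit <- qchisq(c(0.75, 0.90, 0.95), df = df_change)

round(p_value, 3)  # about 0.387
round(crit, 2)     # about 7.84, 10.64, 12.59
```

Since 6.33 is below even the 75% critical value, the p-value of roughly 0.39 means the change in deviance is not significant at any conventional level.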
I'm working on a meta-analysis of epidemiological studies. The studies are very heterogeneous in terms of population, intervention and analysis, so I'm using a random effects model for meta-analysis using "metafor" in R.
I subsetted the studies into subgroups with comparable outcomes. 5/6 look fine.
However, there is one subgroup that looks all wrong because tau is 0 and I^2 is 0. Looking at the data, I don't see why total heterogeneity would be 0.
res <- rma(yi=beta, sei=se, slab=(1:7), measure="OR",data=SIPVdata, digits=3, method= "ML")
Random-Effects Model (k = 3; tau^2 estimator: ML)
logLik deviance AIC BIC AICc
-0.217 2.635 4.433 2.630 16.433
tau^2 (estimated amount of total heterogeneity): 0.000 (SE = 0.044)
tau (square root of estimated tau^2 value): 0.001
I^2 (total heterogeneity / total variability): 0.00%
H^2 (total variability / sampling variability): 1.00
Test for Heterogeneity:
Q(df = 2) = 2.635, p-val = 0.268
Model Results:
estimate se zval pval ci.lb ci.ub
-0.350 0.145 -2.417 0.016 -0.634 -0.066 *
Plotting the model output looks like this:
So you can see that 2 observations (5 & 3), which have small confidence intervals and similar estimates, carry the most weight in the model. The other estimates have wide CIs, which all overlap. I might expect the estimated heterogeneity to be low in this case, but not 0, and certainly not tau.
Does anyone have an idea what is going on in this meta-analysis?
Thank you very much!
The ML estimator of tau^2 is known to have negative bias. That of course does not mean that it is too low in this particular case, but I would suggest switching to an estimator that is known to be approximately unbiased. The one I would recommend is REML. This is in fact the default estimator (i.e., the one used if you do not specify the method argument).
Also, note that with 7 studies, the estimate of tau^2 (and hence I^2) is not going to be very precise. Run confint(res) and you will see that the confidence interval for I^2 is going to be very wide. In other words, all values within the CI are compatible with these data, so really there could indeed be no heterogeneity or a lot of it.
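As a rough cross-check using only the numbers printed in the output above: the p-value of the Q test can be reproduced with pchisq(), and the moment-based I^2 of Higgins and Thompson, max(0, (Q - df)/Q), comes out around 24% rather than 0. This is a different definition from the model-based I^2 that metafor prints (which is derived from the tau^2 estimate), so treat the gap between the two as another illustration of how imprecise heterogeneity estimates are with so few studies, not as a correction.

```r
# Heterogeneity test printed by rma(): Q(df = 2) = 2.635
Q  <- 2.635
df <- 2

# p-value of the Q test (should reproduce the printed p-val = 0.268)
p_Q <- pchisq(Q, df = df, lower.tail = FALSE)

# Moment-based I^2 (Higgins & Thompson), truncated at zero
I2_moment <- max(0, (Q - df) / Q)

round(c(p_Q = p_Q, I2_percent = 100 * I2_moment), 3)
```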
I'm currently struggling with how to report, following APA-6 recommendations, the output of rstanarm::stan_lmer().
First, I'll fit a mixed model within the frequentist approach, then I will try to do the same using the Bayesian framework.
Here's the reproducible code to get the data:
library(tidyverse)
library(neuropsychology)
library(rstanarm)
library(lmerTest)
df <- neuropsychology::personality %>%
select(Study_Level, Sex, Negative_Affect) %>%
mutate(Study_Level=as.factor(Study_Level),
Negative_Affect=scale(Negative_Affect)) # I understood that scaling variables is important
Now, let's fit a linear mixed model in the "traditional" way to test the impact of Sex (male/female) on Negative Affect (negative mood) with the study level (years of education) as a random factor.
fit <- lmer(Negative_Affect ~ Sex + (1|Study_Level), df)
summary(fit)
The output is the following:
Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
freedom [lmerMod]
Formula: Negative_Affect ~ Sex + (1 | Study_Level)
Data: df
REML criterion at convergence: 3709
Scaled residuals:
Min 1Q Median 3Q Max
-2.58199 -0.72973 0.02254 0.68668 2.92841
Random effects:
Groups Name Variance Std.Dev.
Study_Level (Intercept) 0.04096 0.2024
Residual 0.94555 0.9724
Number of obs: 1327, groups: Study_Level, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.01564 0.08908 4.70000 0.176 0.868
SexM -0.46667 0.06607 1321.20000 -7.064 2.62e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
SexM -0.149
To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001)."
Is that correct?
Then, let's try to fit the model within a Bayesian framework using rstanarm:
fitB <- stan_lmer(Negative_Affect ~ Sex + (1|Study_Level),
data=df,
prior=normal(location=0, scale=1),
prior_intercept=normal(location=0, scale=1),
prior_PD=FALSE)
print(fitB, digits=2)
This returns:
stan_lmer
family: gaussian [identity]
formula: Negative_Affect ~ Sex + (1 | Study_Level)
------
Estimates:
Median MAD_SD
(Intercept) 0.02 0.10
SexM -0.47 0.07
sigma 0.97 0.02
Error terms:
Groups Name Std.Dev.
Study_Level (Intercept) 0.278
Residual 0.973
Num. levels: Study_Level 8
Sample avg. posterior predictive
distribution of y (X = xbar):
Median MAD_SD
mean_PPD 0.00 0.04
------
For info on the priors used see help('prior_summary.stanreg').
I think that Median is the median of the posterior distribution of the coefficient and MAD_SD is the equivalent of its standard deviation. These values are close to the beta and standard error of the frequentist model, which is reassuring. However, I do not know how to formalize the output and put it into words.
Moreover, if I call summary on the model (summary(fitB, probs=c(.025, .975), digits=2)), I get other features of the posterior distribution:
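To make the two columns concrete, here is a small sketch on simulated draws. The draws are a hypothetical stand-in for the posterior of SexM (not output from fitB), and it assumes, as far as I understand the rstanarm print method, that MAD_SD is the median absolute deviation rescaled to be comparable to a standard deviation, which is what R's mad() does by default.

```r
set.seed(1)
# Hypothetical stand-in for 4000 posterior draws of the SexM coefficient
draws <- rnorm(4000, mean = -0.47, sd = 0.07)

post_median <- median(draws)
# mad() is the median absolute deviation, multiplied by 1.4826 by default so
# that it estimates the standard deviation when the draws are roughly normal
post_mad_sd <- mad(draws)
post_sd     <- sd(draws)

round(c(median = post_median, MAD_SD = post_mad_sd, sd = post_sd), 2)
```

For a roughly normal posterior the median and mean nearly coincide, and MAD_SD and sd nearly coincide, which is why the print() and summary() outputs look so similar here.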
...
Estimates:
mean sd 2.5% 97.5%
(Intercept) 0.02 0.11 -0.19 0.23
SexM -0.47 0.07 -0.59 -0.34
...
Is something like the following good?
"We fitted a linear mixed model within the Bayesian framework with negative affect as outcome variable, sex as predictor and study level entered as a random effect. Priors for the coefficient and the intercept were set to normal (mean = 0, sd = 1). Within this model, the features of the posterior distribution of the coefficient associated with the male level suggest a decrease of negative affect (mean = -0.47, sd = 0.07, 95% CI [-0.59, -0.34])."
Thanks for your help.
The following is personal opinion that may or may not be acceptable to a psychology journal.
To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001)."
Is that correct?
That is considered correct from a frequentist perspective.
The key concepts from a Bayesian perspective are (conditional on the model, of course):
There is a 0.5 probability that the true effect is less than the posterior median and a 0.5 probability that the true effect is greater than the posterior median. Frequentists tend to see a posterior median as being like a numerical optimum.
The posterior_interval function yields credible intervals around the median with a default probability of 0.9 (although a lower number produces more accurate estimates of the bounds). So, you can legitimately say that there is a probability of 0.9 that the true effect is between those bounds. Frequentists tend to see confidence intervals as being like credible intervals.
The as.data.frame function will give you access to the raw draws, so mean(as.data.frame(fitB)$SexM > 0) yields the probability that the expected difference in the outcome between men and women in the same study is positive. Frequentists tend to see these probabilities as being like p-values.
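The three quantities above can be sketched directly from a vector of draws. Here the draws are simulated as a hypothetical stand-in; with the real fit they would be as.data.frame(fitB)$SexM. The 90% interval is computed as a central quantile interval, which is how I understand posterior_interval() to work by default.

```r
set.seed(2)
# Hypothetical stand-in for posterior draws of SexM; with a real fit these
# would be as.data.frame(fitB)$SexM
draws <- rnorm(4000, mean = -0.47, sd = 0.07)

# 1. Posterior median: half of the posterior mass lies on either side
post_median <- median(draws)

# 2. Central 90% credible interval from the 5% and 95% quantiles
cred_90 <- quantile(draws, probs = c(0.05, 0.95))

# 3. Posterior probability that the male-female difference is positive
p_positive <- mean(draws > 0)

post_median
cred_90
p_positive
```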
For a Bayesian approach, I would say
We fit a linear model using Markov Chain Monte Carlo with negative affect as the outcome variable, sex as predictor and the intercept was allowed to vary by study level.
And then talk about the estimates using the three concepts above.
I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1, not take or take a product, respectively).
My predictor variable is Thoughts; it is continuous, can be positive or negative, and is rounded to two decimal places.
I want to know how the probability of taking the product changes as Thoughts changes.
The logistic regression model is:
results <- glm(Decision ~ Thoughts, family = binomial, data = data)
According to this model, Thoughts has a significant impact on probability of Decision (b = .72, p = .02). To determine the odds ratio of Decision as a function of Thoughts:
exp(coef(results))
Odds ratio = 2.07.
Questions:
How do I interpret the odds ratio?
Does an odds ratio of 2.07 imply that a .01 increase (or decrease) in Thoughts affect the odds of taking (or not taking) the product by 0.07 OR
Does it imply that as Thoughts increases (decreases) by .01, the odds of taking (not taking) the product increase (decrease) by approximately 2 units?
How do I convert odds ratio of Thoughts to an estimated probability of Decision?
Or can I only estimate the probability of Decision at a certain Thoughts score (i.e. calculate the estimated probability of taking the product when Thoughts == 1)?
The coefficients returned by a logistic regression in R are on the logit (log-odds) scale. To convert a coefficient to an odds ratio, you can exponentiate it, as you've done above. To convert a logit to a probability, you can use exp(logit)/(1+exp(logit)). However, there are some things to note about this procedure.
First, I'll use some reproducible data to illustrate:
library('MASS')
data("menarche")
m<-glm(cbind(Menarche, Total-Menarche) ~ Age, family=binomial, data=menarche)
summary(m)
This returns:
Call:
glm(formula = cbind(Menarche, Total - Menarche) ~ Age, family = binomial,
data = menarche)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0363 -0.9953 -0.4900 0.7780 1.3675
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.22639 0.77068 -27.54 <2e-16 ***
Age 1.63197 0.05895 27.68 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3693.884 on 24 degrees of freedom
Residual deviance: 26.703 on 23 degrees of freedom
AIC: 114.76
Number of Fisher Scoring iterations: 4
The coefficients displayed are for logits, just as in your example. If we plot these data and this model, we see the sigmoidal function that is characteristic of a logistic model fit to binomial data
#predict gives the predicted value in terms of logits
plot.dat <- data.frame(prob = menarche$Menarche/menarche$Total,
age = menarche$Age,
fit = predict(m, menarche))
#convert those logit values to probabilities
plot.dat$fit_prob <- exp(plot.dat$fit)/(1+exp(plot.dat$fit))
library(ggplot2)
ggplot(plot.dat, aes(x=age, y=prob)) +
geom_point() +
geom_line(aes(x=age, y=fit_prob))
Note that the change in probabilities is not constant - the curve rises slowly at first, then more quickly in the middle, then levels out at the end. The difference in probabilities between 10 and 12 is far less than the difference in probabilities between 12 and 14. This means that it's impossible to summarise the relationship of age and probabilities with one number without transforming probabilities.
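The same point can be checked numerically from the coefficients printed by summary(m) above, converting the logit at a few ages to probabilities with the formula just described (plogis() gives the same result).

```r
# Coefficients printed in summary(m) above (logit scale)
b0 <- -21.22639
b1 <- 1.63197

ages  <- c(10, 12, 14)
logit <- b0 + b1 * ages
prob  <- exp(logit) / (1 + exp(logit))   # same as plogis(logit)

round(prob, 3)        # about 0.007, 0.162, 0.835
round(diff(prob), 3)  # about 0.155 (10 to 12) and 0.673 (12 to 14)
```

The same two-year step in age moves the probability by about 0.155 in one part of the curve and by about 0.673 in another, even though the odds ratio per year is identical everywhere.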
To answer your specific questions:
How do you interpret odds ratios?
The exponentiated intercept is the odds of a "success" (in your data, the odds of taking the product) when x = 0 (i.e. a Thoughts score of zero). The exponentiated coefficient, the odds ratio, is the factor by which these odds are multiplied when x increases by one whole unit (e.g. from Thoughts = 0 to Thoughts = 1). Using the menarche data:
exp(coef(m))
(Intercept) Age
6.046358e-10 5.113931e+00
We could interpret this as the odds of menarche occurring at age = 0 being about .0000000006, or, basically, impossible. Exponentiating the age coefficient tells us the multiplicative change in the odds of menarche for each additional year of age. In this case, it's just over a quintupling. An odds ratio of 1 indicates no change, whereas an odds ratio of 2 indicates a doubling, etc.
Your odds ratio of 2.07 implies that a 1 unit increase in 'Thoughts' increases the odds of taking the product by a factor of 2.07.
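For the 0.01 step asked about in the question, the odds ratio scales with the size of the step: a change of d units multiplies the odds by exp(b * d), i.e. by OR^d. A small sketch using the rounded coefficient b = .72 from the question (because of the rounding, exp(.72) is about 2.05 rather than the reported 2.07):

```r
b <- 0.72                    # Thoughts coefficient from the question (log-odds, rounded)

or_1    <- exp(b)            # odds ratio for a 1-unit increase in Thoughts
or_step <- exp(b * 0.01)     # odds ratio for a 0.01-unit increase in Thoughts
pct_1   <- (or_1 - 1) * 100  # percent change in the odds for a 1-unit increase

round(c(or_1 = or_1, or_step = or_step, pct_1 = pct_1), 3)
```

So a 0.01 increase multiplies the odds by about 1.007 (an increase of roughly 0.7%), not by 0.07 or by 2; a 0.01 decrease divides them by the same factor.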
How do you convert odds ratios of thoughts to an estimated probability of decision?
You need to do this for selected values of thoughts, because, as you can see in the plot above, the change is not constant across the range of x values. If you want the probability at some value of thoughts, compute it as follows:
exp(intercept + coef*THOUGHT_Value)/(1 + exp(intercept + coef*THOUGHT_Value))
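A sketch of that calculation at selected predictor values. Since the asker's data are not available, it uses the built-in mtcars data as an assumed stand-in (probability of a manual transmission, am = 1, as a function of weight, wt), and checks the manual formula against predict() with type = "response".

```r
# Illustration on the built-in mtcars data (not the asker's data)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

new_wt <- data.frame(wt = c(2.5, 3.5))   # selected predictor values

# Manual conversion: linear predictor (log odds) -> probability
eta      <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * new_wt$wt
p_manual <- exp(eta) / (1 + exp(eta))

# Same thing via predict(): type = "response" returns probabilities
p_predict <- predict(fit, newdata = new_wt, type = "response")

cbind(new_wt, p_manual, p_predict)
```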
Odds and probability are two different measures, both addressing the same aim of measuring the likelihood of an event occurring. They should not be compared to each other, only among themselves!
While odds of two predictor values (while holding others constant) are compared using "odds ratio" (odds1 / odds2), the same procedure for probability is called "risk ratio" (probability1 / probability2).
In general, odds are preferred over probability when it comes to ratios, since probability is bounded between 0 and 1, while odds range from 0 to +inf (and log odds from -inf to +inf).
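A quick numeric illustration of why the two ratios should not be mixed up, using two hypothetical probabilities (0.20 and 0.10) chosen only for the arithmetic:

```r
# Hypothetical probabilities of the event at two predictor values
p1 <- 0.20
p2 <- 0.10

risk_ratio <- p1 / p2         # ratio of probabilities
odds1 <- p1 / (1 - p1)        # 0.25
odds2 <- p2 / (1 - p2)        # about 0.111
odds_ratio <- odds1 / odds2   # 0.25 / 0.111 = 2.25

c(risk_ratio = risk_ratio, odds_ratio = odds_ratio)
```

The event is twice as probable but its odds are 2.25 times higher; the two ratios only become close when both probabilities are small.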
To easily calculate odds ratios including their confidence intervals, see the oddsratio package:
library(oddsratio)
fit_glm <- glm(admit ~ gre + gpa + rank, data = data_glm, family = "binomial")
# Calculate OR for specific increment step of continuous variable
or_glm(data = data_glm, model = fit_glm,
incr = list(gre = 380, gpa = 5))
predictor oddsratio CI.low (2.5 %) CI.high (97.5 %) increment
1 gre 2.364 1.054 5.396 380
2 gpa 55.712 2.229 1511.282 5
3 rank2 0.509 0.272 0.945 Indicator variable
4 rank3 0.262 0.132 0.512 Indicator variable
5 rank4 0.212 0.091 0.471 Indicator variable
Here you can simply specify the increment of your continuous variables and see the resulting odds ratios. In this example, the odds of the response admit are about 55.7 times higher when the predictor gpa is increased by 5 (with a very wide confidence interval).
If you want to predict probabilities with your model, simply use type = "response" when predicting from your model. This will automatically convert log odds to probabilities. You can then calculate risk ratios from the predicted probabilities. See ?predict.glm for more details.
I found the epiDisplay package, which works fine! It might be useful for others, but note that confidence intervals or exact results can vary depending on the package used, so it is good to read the package details and choose the one that works well for your data.
Here is a sample code:
library(epiDisplay)
data(Wells, package="carData")
glm1 <- glm(switch~arsenic+distance+education+association,
family=binomial, data=Wells)
logistic.display(glm1)
The above formula for converting logits to probabilities, exp(logit)/(1+exp(logit)), may not have any meaning. This formula is normally used to convert odds to probabilities. However, in logistic regression an odds ratio is more like a ratio between two odds values (which happen to already be ratios). How would probability be defined using the above formula? Instead, it may be more correct to subtract 1 from the odds ratio to get a percentage, and then interpret it as the odds of the outcome increasing or decreasing by x percent per unit of the predictor.
When I run the following example, I get a nice report, but specifically at the 95% confidence level:
julia> OneSampleTTest([1,2,3], 2)
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 2
point estimate: 2.0
95% confidence interval: (-0.48413771175033027,4.48413771175033)
Test summary:
outcome with 95% confidence: fail to reject h_0
two-sided p-value: 1.0 (not significant)
Details:
number of observations: 3
t-statistic: 0.0
degrees of freedom: 2
empirical standard error: 0.5773502691896258
I'd like to get a similar report, but at the 99% confidence level.
The docs state that OneSampleTest implements the confint method, which does accept an alpha parameter, but it does not give me a full report like the one shown above.
I fitted a model with MCMCglmm (MCMCglmm package). Here is the summary of this model:
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 211.0108
G-structure: ~Region
post.mean l-95% CI u-95% CI eff.samp
Region 0.2164 5.163e-17 0.358 1000
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.5529 0.1808 1.045 449.3
Location effects: Abondance ~ Human_impact/Fish.sp
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 1.335628 0.780363 1.907249 642.4 0.004 **
Human_impact 0.005781 -0.294084 0.347743 876.6 0.914
Human_impact:Fish.spA. perideraion -0.782846 -1.158798 -0.399131 649.9 <0.001 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Where are the coefficients?
post.mean is the mean of the posterior distribution?
Can post.mean somehow be considered the equivalent of the estimates of a standard lm?
What does eff.samp mean?
How can I find the degrees of freedom?
This model is based on Bayesian statistics, is that correct?
You can use summary.MCMCglmm from the MCMCglmm package.
It is the summary method for class "MCMCglmm". The returned object is suitable for printing with the print.summary.MCMCglmm method, and its components are:
DIC: Deviance Information Criterion
fixed.formula: model formula for the fixed terms
random.formula: model formula for the random terms
residual.formula: model formula for the residual terms
solutions: posterior mean, 95% HPD interval, MCMC p-values and effective sample size of fixed (and random) effects
Gcovariances: posterior mean, 95% HPD interval and effective sample size of random effect (co)variance components
Rcovariances: posterior mean, 95% HPD interval and effective sample size of residual (co)variance components
cutpoints: posterior mean, 95% HPD interval and effective sample size of cut-points from an ordinal model
cstats: chain length, burn-in and thinning interval
Gterms: indexes random effect (co)variances by the component terms defined in the random formula
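To connect the "solutions" columns to the questions above, a small sketch on simulated draws. The draws are a hypothetical stand-in for one column of model$Sol (the stored fixed-effect samples), and the pMCMC formula is my reading of how summary.MCMCglmm computes it (twice the smaller tail proportion around zero, floored at 0.5/n), so treat it as an assumption rather than the package's documented definition.

```r
set.seed(3)
# Hypothetical stand-in for 1000 stored draws of one fixed effect
# (with a real fit these would be a column of model$Sol)
draws <- rnorm(1000, mean = -0.78, sd = 0.19)
n <- length(draws)

# post.mean: the mean of the posterior draws (the analogue of an lm estimate)
post_mean <- mean(draws)

# pMCMC (assumed definition): twice the smaller proportion of draws on either
# side of zero, never smaller than 2 * 0.5 / n
p_above <- mean(draws > 0)
pMCMC <- 2 * max(0.5 / n, min(p_above, 1 - p_above))

c(post_mean = post_mean, pMCMC = pMCMC)
```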
I am under the impression that MCMCglmm does not implement a "true" Bayesian GLMM. Similarly to the frequentist model, such a model would have g(E(y∣u))=Xβ+Zu, with a prior required on the dispersion parameter ϕ1 in addition to the fixed parameters β and the "G" variance of the random effect u.
But according to this MCMCglmm vignette, the model implemented in MCMCglmm is given by g(E(y∣u,e))=Xβ+Zu+e , and it does not involve the dispersion parameter ϕ1. It is not similar to the classical frequentist model.
Degrees of freedom
mcmcglmm is a wrapper for the MCMCglmm() function. The wrapper function allows for two default priors on the covariance matrices: InvW, an inverse-Wishart prior, which sets the degrees of freedom parameter equal to the dimension of each covariance matrix, and InvG, an inverse-Gamma prior, which sets the degrees of freedom parameter to the dimension of the covariance matrix minus 1, plus 0.002.