How do you interpret thresholds in CLMM (intercepts and odds ratio) - r

How do you express and interpret CLMM thresholds, preferably using odds ratios? Any help is highly appreciated! :)
Some background info: I have done a survey of human perceptions towards an animal. I used a CLMM to analyze the data with perception (1, 2, 3) as my response variable. My predictors are education, gender, experience of crop-raiding and observation of the animal, with village (where the survey was performed) as a random effect. I'm using the "ordinal" package: https://cran.r-project.org/web/packages/ordinal/ordinal.pdf
Now I'm in the process of interpreting the results of the CLMM. I would like to use sjPlot and analyze the results based on odds ratios, but I'm having trouble understanding the thresholds. E.g., 1|2 comes out with an estimate of -1.0375 and an odds ratio of 0.35. The sjPlot output also shows that this odds ratio has a significant p-value (figure of sjPlot below). What does this threshold odds ratio mean?
library(ordinal)
# OrdinalPercept should be an ordered factor; the data argument is lowercase `data`
Xclmm <- clmm(OrdinalPercept ~ Education + Gender + Crop_raid + Obs + (1|IDVillage), data = HR, Hess = TRUE, threshold = "flexible")
summary(Xclmm)
[Image: coefficients table from summary(Xclmm)]
Am I right in this interpretation of the estimates?
Threshold coefficient estimates:
1|2 -1.0375
2|3 0.3724
which I read as:
Perception 1: < -1.0375
Perception 2: > -1.0375 and < 0.3724 (-1.0376 to 0.3723)
Perception 3: > 0.3724
Female respondents (estimate = 0.783) are expected to be in the third group (perception 3, since the estimate is > 0.3724), and female respondents have 2.19 times the odds (odds ratio) of male respondents of being in that third group.
What does the odds ratio of 1|2 and 2|3 mean?
[Figure: sjPlot odds-ratio plot]
Thanks :)
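A minimal sketch of how the thresholds enter the model, assuming the cumulative logit parameterization used by the ordinal package, logit P(Y <= j) = theta_j - x'beta, and plugging in the estimates quoted above (-1.0375, 0.3724, and 0.783 for female). On this reading, exp(theta_j) is the odds of a response at or below category j (rather than above it) for a respondent at the reference level (or zero) of every predictor and with a village random effect of 0; exp(-1.0375) is about 0.35, which matches the sjPlot value. The p-value next to a threshold tests theta_j = 0 (odds of 1), which is rarely of substantive interest. The numbers below are illustrative, not output from the original model.

```r
# Estimates quoted in the question; parameterization assumed to be ordinal's
# cumulative logit: logit P(Y <= j) = theta_j - eta, with eta = x'beta
theta <- c("1|2" = -1.0375, "2|3" = 0.3724)
beta_female <- 0.783

# Odds of being at or below each cut for the baseline respondent (eta = 0)
exp(theta)

# Category probabilities for a given linear predictor eta
category_probs <- function(eta, theta) {
  cum <- plogis(theta - eta)          # P(Y <= 1), P(Y <= 2)
  c(p1 = cum[[1]],
    p2 = cum[[2]] - cum[[1]],
    p3 = 1 - cum[[2]])
}

male   <- category_probs(0, theta)            # reference gender, other predictors at baseline
female <- category_probs(beta_female, theta)  # same respondent, but female
round(rbind(male, female), 3)

# Odds of being ABOVE a cut (here perception 2 or 3 vs 1); the female/male
# ratio of these odds equals exp(beta_female) at every cut
odds_above <- function(p_le) (1 - p_le) / p_le
odds_above(female[["p1"]]) / odds_above(male[["p1"]])
```

With these numbers, female respondents get the highest probability for perception 3 (roughly 0.60 versus roughly 0.41 for men at baseline), and the female/male odds ratio of about 2.19 applies to being above either cut, not only to being in group 3.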

Related

How to obtain p-values and CI for the effect of interaction item using GLM in R

In a multivariable GLM model, I need to calculate the simple effects of the following interaction term:
sex (female vs. male)*unstablyhoused (yes vs. no)
The model has as.factor variables, with my reference levels being Male = 1 and unstably housed (no) = 0.
In my model, female = 2 and unstably housed (yes) = 1
I have written the following code, but I'm not sure how I can calculate the ORs, CIs, and p-values for each of the categories stated below.
# a * b expands to a + b + a:b, so the main effects need not be listed separately
model <- glm(depression ~ as.factor(unstablyhoused) * as.factor(SEXBRTH),
data=mergedata, family = binomial(link='logit'), na.action = na.omit)
summary(model)
confint(model)
exp(coef(model))
exp(cbind(OR = coef(model), confint(model)))
My question is how do I calculate the OR, 95% confidence interval (CI), and p-value for the following effects of the interaction:
1. sex (female vs. male) among those who are unstably housed
2. sex (female vs. male) among those who are NOT unstably housed
3. unstably housed (yes vs. no) among those who are female
4. unstably housed (yes vs. no) among those who are NOT female
Any help and direction on what code to use to calculate items 1-4 would be greatly appreciated. I need the OR, confidence interval, and p-value of each simple main effect.
[Image: model output]
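A sketch of one way to get items 1-4 with base R only. With treatment coding, each simple effect is a linear combination of the coefficients (for example, female vs. male among the unstably housed is the SEXBRTH2 coefficient plus the interaction coefficient), so its standard error comes from vcov(model). The data are a simulated, hypothetical stand-in for mergedata, coded as described in the question (0/1 for housing, 1 = male / 2 = female for SEXBRTH); the coefficient order assumed is (Intercept), unstablyhoused1, SEXBRTH2, unstablyhoused1:SEXBRTH2, so check names(coef(model)) on your own fit. The intervals are Wald intervals and will differ slightly from the profile-likelihood intervals that confint() gives.

```r
set.seed(2024)
n <- 400
# Hypothetical stand-in for `mergedata`
mergedata <- data.frame(
  unstablyhoused = factor(rbinom(n, 1, 0.4), levels = c(0, 1)),
  SEXBRTH        = factor(sample(c(1, 2), n, replace = TRUE), levels = c(1, 2))
)
housed <- mergedata$unstablyhoused == "1"
female <- mergedata$SEXBRTH == "2"
mergedata$depression <- rbinom(n, 1, plogis(-1 + 0.5 * housed + 0.3 * female +
                                               0.6 * housed * female))

model <- glm(depression ~ unstablyhoused * SEXBRTH,
             data = mergedata, family = binomial)

# OR, Wald 95% CI and p-value for the linear combination L %*% beta
simple_effect <- function(L, model) {
  stopifnot(length(L) == length(coef(model)))
  est <- sum(L * coef(model))
  se  <- sqrt(drop(t(L) %*% vcov(model) %*% L))
  z   <- est / se
  c(OR    = exp(est),
    lower = exp(est - qnorm(0.975) * se),
    upper = exp(est + qnorm(0.975) * se),
    p     = 2 * pnorm(-abs(z)))
}

# Weights on (Intercept), unstablyhoused1, SEXBRTH2, unstablyhoused1:SEXBRTH2
contrast_weights <- list(
  "Female vs male | unstably housed"     = c(0, 0, 1, 1),
  "Female vs male | not unstably housed" = c(0, 0, 1, 0),
  "Housed yes vs no | female"            = c(0, 1, 0, 1),
  "Housed yes vs no | male"              = c(0, 1, 0, 0)
)
results <- t(sapply(contrast_weights, simple_effect, model = model))
round(results, 3)
```

Refitting with a different reference category (e.g. relevel(unstablyhoused, ref = "1")) turns the corresponding simple effect into an ordinary coefficient, which is a useful cross-check.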

how to calculate heritability from half-sib design

I'm trying to measure heritability of a trait, flowering time (FT), for a set of data collected from a half-sib design. The data includes FT for each mother plant and 2 half siblings from that mother plant for ~150 different maternal lines (ML). Paternity is unknown.
I've tried:
1. Estimating heritability with a regression between the maternal FT and the mean sibling FT, and doubling the slope. This worked fine, and produced an estimate of 0.14.
2. Running an ANOVA and using the between-ML variation to estimate additive genetic variance. I got the idea from slide 25 of this PowerPoint and from this thread on within- and between-group variance calculation.
fit = lm(FT ~ ML, her)
anova(fit)
her is the dataset, which, in this case, only includes the half-sib FT values (I excluded the mother FT values for this attempt at heritability).
From the ANOVA output, I used the "ML" term mean square as the between-ML variation, which I took to be equal to 1/4 of the additive genetic variance because the coefficient of relatedness between half-sibs is 0.25. This value turned out to be 0.098. Multiplying it by 4 would then give the additive genetic variance.
I have used the "residuals" mean square as all variability save for that accounted for by the "ML" term. So, all of variance minus 1/4 of additive genetic variance. This turned out to be 1.342.
I then attempted to calculate heritability as Va/Vp = (4*0.098)/(1.342 + 0.098) = 0.392/1.44 ≈ 0.27.
This is quite different from my slope estimate, and I'm not sure if my reasoning is correct.
I've tried things with the sommer and heritability packages of R but haven't had success using either for a half-sib design and haven't found an example of a half-sib design with either package.
Any suggestions?
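A sketch of the textbook sib-analysis calculation for a balanced one-way ANOVA with k half-sibs per maternal line, offered as a point of comparison. From the expected mean squares, E[MS_ML] = s2_W + k * s2_ML, so the between-line variance component is (MS_ML - MS_residual)/k rather than the ML mean square itself. That component estimates the covariance of half-sibs, V_A/4, and the phenotypic variance is the sum of the between- and within-line components, giving h2 = 4 * s2_ML / (s2_ML + s2_W). The data are simulated and hypothetical (k = 2, 150 lines, as in the question), and the sketch assumes no maternal or common-environment effects, which would inflate the between-line component in a maternal half-sib design.

```r
set.seed(42)
n_lines <- 150   # maternal lines, roughly as in the question
k <- 2           # half-sibs measured per maternal line

# Hypothetical simulated data: line-effect variance 0.1, residual variance 1
line_id <- rep(seq_len(n_lines), each = k)
line_effect <- rnorm(n_lines, sd = sqrt(0.1))
her <- data.frame(
  ML = factor(line_id),
  FT = 10 + line_effect[line_id] + rnorm(n_lines * k, sd = 1)
)

fit <- lm(FT ~ ML, data = her)
tab <- anova(fit)
ms_between <- tab["ML", "Mean Sq"]
ms_within  <- tab["Residuals", "Mean Sq"]

# Variance components from the expected mean squares
var_ml     <- (ms_between - ms_within) / k   # = Cov(half-sibs) = V_A / 4
var_within <- ms_within

V_A <- 4 * var_ml
V_P <- var_ml + var_within
h2  <- V_A / V_P
c(var_ml = var_ml, var_within = var_within, h2 = h2)
```

If var_ml comes out negative, the between-line component is not distinguishable from zero. A mixed model such as lme4::lmer(FT ~ 1 + (1 | ML)) estimates the same component by REML and keeps it non-negative.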

Get predictions from coxph

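library(survival)
# Create the simplest test data set
test1 <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1))
# Fit a Cox model with x and sex as covariates
m=coxph(Surv(time, status) ~ x + sex, test1)
# predict.coxph() has no 'by' argument, so this does not split the predictions by sex
y=predict(m,type="survival",by="sex")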
Basically, I make fake data called test1, fit a simple coxph model, and save it as 'm'. Then I aim to get the predicted probabilities and confidence bands for the survival probability separately for each sex. My hoped-for dataset 'y' would include: age, survival probability, lower confidence band, upper confidence band, and sex, which equals 0 or 1.
This can be accomplished in two ways. The first is a slight modification to your code, using the predict() function to get predictions at specific times for specific combinations of covariates. The second is by using the survfit() function, which estimates the entire survival curve and is easy to plot. The confidence intervals don't exactly agree, as we'll see, but they should match fairly closely as long as the probabilities aren't too close to 1 or 0.
Below is code that makes the predictions the way your code attempts to. It uses the built-in cancer data. The important difference is to create a newdata data frame that has the covariate values you're interested in. Because of the non-linear nature of survival probabilities, it is generally a bad idea to try to make a prediction for the "average person". Because we want a survival probability, we must also specify the time at which to evaluate that probability. I've taken time = 365, age = 60, and both sex = 1 and sex = 2, so this code predicts the 1-year survival probability for a 60-year-old male and a 60-year-old female. Note that we must also include status in the newdata, even though it doesn't affect the result.
library(survival)
mod <- coxph(Surv(time,status) ~ age + sex, data = cancer)
pred_dat <- data.frame(time = c(365,365), status = c(2,2),
age = c(60,60), sex = c(1,2))
preds <- predict(mod, newdata = pred_dat,
type = "survival", se.fit = TRUE)
pred_dat$prob <- preds$fit
pred_dat$lcl <- preds$fit - 1.96*preds$se.fit
pred_dat$ucl <- preds$fit + 1.96*preds$se.fit
pred_dat
#> time status age sex prob lcl ucl
#> 1 365 2 60 1 0.3552262 0.2703211 0.4401313
#> 2 365 2 60 2 0.5382048 0.4389833 0.6374264
We see that for a 60-year-old male the 1-year survival probability is estimated as 35.5%, while for a 60-year-old female it is 53.8%.
Below we estimate the entire survival curve using survfit(). I've saved time by reusing the pred_dat from above, and because the plot gets messy I've only plotted the male curve, which is the first row. I've also added some flair, but you only need the first 2 lines.
fit <- survfit(mod, newdata = pred_dat[1,])
plot(fit, conf.int = TRUE)
title("Estimated survival probability for age 60 male")
abline(v = 365, col = "blue")
abline(h = pred_dat[1,]$prob, col = "red")
abline(h = pred_dat[1,]$lcl, col = "orange")
abline(h = pred_dat[1,]$ucl, col = "orange")
Created on 2022-06-09 by the reprex package (v2.0.1)
I've overlaid lines corresponding to the predicted probabilities from part 1. The red line is the estimated survival probability at day 365 and the orange lines are the 95% confidence interval. The predicted survival probability matches, but if you squint closely you'll see the confidence interval doesn't match exactly. That's generally not a problem, but if it is a problem you should trust the ones from survfit() instead of the ones calculated from predict().
You can also dig into the values of fit to extract fitted probabilities and confidence bands, but the programming is a little more complicated because the desired time doesn't usually match exactly.
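One way to avoid matching times by hand is to call summary() on the survfit object with a times argument, which reports the curve and its confidence limits at the requested times. A sketch reusing the model and covariate values from the answer above (age 60, sex 1 and 2, day 365); looping over the rows of newdata keeps each result a simple vector, and stacking the rows gives the kind of per-sex table the question asked for.

```r
library(survival)

mod <- coxph(Surv(time, status) ~ age + sex, data = cancer)

# Covariate profiles of interest; survfit() only needs the covariates
profiles <- data.frame(age = c(60, 60), sex = c(1, 2))

curve_at <- function(i, times) {
  fit_i <- survfit(mod, newdata = profiles[i, , drop = FALSE])
  s <- summary(fit_i, times = times)   # curve value in effect at each requested time
  data.frame(age = profiles$age[i], sex = profiles$sex[i],
             time = s$time, surv = as.numeric(s$surv),
             lower = as.numeric(s$lower), upper = as.numeric(s$upper))
}

res <- do.call(rbind, lapply(seq_len(nrow(profiles)), curve_at, times = 365))
res
```

Passing a vector such as times = c(180, 365, 730) returns one row per time and profile.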
Section 5 of this document by Dimitris Rizopoulos discusses how to estimate survival probabilities from a Cox model. Dimitris Rizopoulos states:
the Cox model does not estimate the baseline hazard, and therefore we cannot directly obtain survival probabilities from it. To achieve that we need to combine it with a non-parametric estimator of the baseline hazard function. The most popular method to do that is to use the Breslow estimator. For a fitted Cox model from package survival these probabilities are calculated by function survfit(). As an illustration, we would like to derive survival probabilities from the following Cox model for the AIDS dataset:
He then goes on to provide R code that shows how to estimate Survival Probabilities at specific follow-up times.
I found this useful, it may help you too.

R: Calculate and interpret odds ratio in logistic regression

I am having trouble interpreting the results of a logistic regression. My outcome variable is Decision and is binary (0 or 1, not take or take a product, respectively).
My predictor variable is Thoughts, which is continuous, can be positive or negative, and is rounded to two decimal places.
I want to know how the probability of taking the product changes as Thoughts changes.
The logistic regression model is:
results <- glm(Decision ~ Thoughts, family = binomial, data = data)
According to this model, Thoughts has a significant impact on probability of Decision (b = .72, p = .02). To determine the odds ratio of Decision as a function of Thoughts:
exp(coef(results))
Odds ratio = 2.07.
Questions:
How do I interpret the odds ratio?
Does an odds ratio of 2.07 imply that a .01 increase (or decrease) in Thoughts affects the odds of taking (or not taking) the product by 0.07, OR
Does it imply that as Thoughts increases (decreases) by .01, the odds of taking (not taking) the product increase (decrease) by approximately 2 units?
How do I convert odds ratio of Thoughts to an estimated probability of Decision?
Or can I only estimate the probability of Decision at a certain Thoughts score (i.e. calculate the estimated probability of taking the product when Thoughts == 1)?
The coefficients returned by a logistic regression in R are on the logit (log-odds) scale: the intercept is a log odds, and each slope is a log odds ratio. To convert a slope to an odds ratio, you can exponentiate it, as you've done above. To convert a logit (such as a predicted log odds) to a probability, you can use exp(logit)/(1+exp(logit)). However, there are some things to note about this procedure.
First, I'll use some reproducible data to illustrate
library('MASS')
data("menarche")
m<-glm(cbind(Menarche, Total-Menarche) ~ Age, family=binomial, data=menarche)
summary(m)
This returns:
Call:
glm(formula = cbind(Menarche, Total - Menarche) ~ Age, family = binomial,
data = menarche)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0363 -0.9953 -0.4900 0.7780 1.3675
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.22639 0.77068 -27.54 <2e-16 ***
Age 1.63197 0.05895 27.68 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3693.884 on 24 degrees of freedom
Residual deviance: 26.703 on 23 degrees of freedom
AIC: 114.76
Number of Fisher Scoring iterations: 4
The coefficients displayed are for logits, just as in your example. If we plot these data and this model, we see the sigmoidal function that is characteristic of a logistic model fit to binomial data
#predict gives the predicted value in terms of logits
plot.dat <- data.frame(prob = menarche$Menarche/menarche$Total,
age = menarche$Age,
fit = predict(m, menarche))
#convert those logit values to probabilities
plot.dat$fit_prob <- exp(plot.dat$fit)/(1+exp(plot.dat$fit))
library(ggplot2)
ggplot(plot.dat, aes(x=age, y=prob)) +
geom_point() +
geom_line(aes(x=age, y=fit_prob))
Note that the change in probabilities is not constant - the curve rises slowly at first, then more quickly in the middle, then levels out at the end. The difference in probabilities between 10 and 12 is far less than the difference in probabilities between 12 and 14. This means that it's impossible to summarise the relationship of age and probabilities with one number without transforming probabilities.
To answer your specific questions:
How do you interpret odds ratios?
The exponentiated intercept is the odds of a "success" (in your data, the odds of taking the product) when x = 0 (i.e. Thoughts = 0); strictly speaking it is an odds, not an odds ratio. The odds ratio for your coefficient is the factor by which the odds are multiplied for each one-unit increase in x (e.g. from Thoughts = 0 to Thoughts = 1). Using the menarche data:
exp(coef(m))
(Intercept) Age
6.046358e-10 5.113931e+00
We could interpret this as: the odds of menarche occurring at age = 0 are about 0.0000000006, i.e. basically impossible (an extrapolation far outside the observed ages). Exponentiating the age coefficient tells us the expected multiplicative change in the odds of menarche for each one-year increase in age. In this case, it's just over a quintupling. An odds ratio of 1 indicates no change, whereas an odds ratio of 2 indicates a doubling, etc.
Your odds ratio of 2.07 implies that a 1 unit increase in 'Thoughts' increases the odds of taking the product by a factor of 2.07.
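Because the odds ratio is multiplicative per unit, the odds ratio for a change of delta units is exp(b * delta) = OR^delta, not a proportional share of OR. A short sketch using the OR of 2.07 from the question (b = log(2.07), treated as exact for illustration), including the 0.01 step asked about and the percentage change in odds, 100 * (OR - 1):

```r
or_per_unit <- 2.07          # odds ratio reported in the question
b <- log(or_per_unit)        # corresponding logit coefficient

# Odds ratio for a change of `delta` units in the predictor
or_for_change <- function(b, delta) exp(b * delta)

steps <- c(1, 0.5, 0.1, 0.01, -1)
or_steps <- or_for_change(b, steps)
data.frame(delta      = steps,
           odds_ratio = round(or_steps, 4),
           pct_change = round(100 * (or_steps - 1), 2))
```

So a 0.01 increase in Thoughts multiplies the odds by about 1.0073 (roughly a 0.73% increase), and a one-unit decrease divides the odds by 2.07.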
How do you convert odds ratios of thoughts to an estimated probability of decision?
You need to do this for selected values of thoughts, because, as you can see in the plot above, the change is not constant across the range of x values. If you want the probability of some value for thoughts, get the answer as follows:
exp(intercept + coef*THOUGHT_Value)/(1 + exp(intercept + coef*THOUGHT_Value))
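The same calculation in R, using the menarche model from above in place of the Thoughts model (whose intercept isn't given). plogis() is base R's logistic function and gives the same result as the written-out formula. Evaluating at ages 10, 12 and 14 also shows the uneven change in probability described earlier.

```r
library(MASS)
data(menarche)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age, family = binomial, data = menarche)

b0 <- coef(m)[["(Intercept)"]]
b1 <- coef(m)[["Age"]]
ages <- c(10, 12, 14)

logit <- b0 + b1 * ages
p_formula <- exp(logit) / (1 + exp(logit))   # the formula above
p_plogis  <- plogis(logit)                   # same values, numerically safer
p_formula
diff(p_formula)   # change from 10 to 12, then from 12 to 14
```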
Odds and probability are two different measures, both addressing the same aim of measuring the likeliness of an event to occur. They should not be compared to each other, only among themselves!
While odds of two predictor values (while holding others constant) are compared using "odds ratio" (odds1 / odds2), the same procedure for probability is called "risk ratio" (probability1 / probability2).
In general, odds are preferred over probabilities when it comes to ratios, since probability is bounded between 0 and 1 while odds range from 0 to +inf (and log odds from -inf to +inf).
To easily calculate odds ratios including their confidence intervals, see the oddsratio package:
library(oddsratio)
fit_glm <- glm(admit ~ gre + gpa + rank, data = data_glm, family = "binomial")
# Calculate OR for specific increment step of continuous variable
or_glm(data = data_glm, model = fit_glm,
incr = list(gre = 380, gpa = 5))
predictor oddsratio CI.low (2.5 %) CI.high (97.5 %) increment
1 gre 2.364 1.054 5.396 380
2 gpa 55.712 2.229 1511.282 5
3 rank2 0.509 0.272 0.945 Indicator variable
4 rank3 0.262 0.132 0.512 Indicator variable
5 rank4 0.212 0.091 0.471 Indicator variable
Here you can simply specify the increment of your continuous variables and see the resulting odds ratios. In this example, the odds of admit are about 55.7 times higher when the predictor gpa increases by 5 units (a ratio of odds, not of probabilities).
If you want to predict probabilities with your model, simply use type = "response" when calling predict() on it. This will automatically convert log odds to probabilities. You can then calculate risk ratios from the predicted probabilities. See ?predict.glm for more details.
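A sketch of that route, with the menarche model standing in for your own fit: predict(type = "response") returns probabilities, from which risk ratios follow directly. For the same one-year step the odds ratio stays constant (it is exp of the Age coefficient), while the risk ratio depends on where along the curve you are.

```r
library(MASS)
data(menarche)
m <- glm(cbind(Menarche, Total - Menarche) ~ Age, family = binomial, data = menarche)

new <- data.frame(Age = c(11, 12, 13, 14))
p <- predict(m, newdata = new, type = "response")   # probabilities, not logits
odds <- p / (1 - p)

risk_ratio <- p[-1] / p[-length(p)]              # 12 vs 11, 13 vs 12, 14 vs 13
odds_ratio <- odds[-1] / odds[-length(odds)]
data.frame(step       = c("12 vs 11", "13 vs 12", "14 vs 13"),
           risk_ratio = round(unname(risk_ratio), 3),
           odds_ratio = round(unname(odds_ratio), 3))
```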
I found that the epiDisplay package works fine! It might be useful for others, but note that your confidence intervals or exact results will vary according to the package used, so it is good to read the package details and choose the one that works well for your data.
Here is a sample code:
library(epiDisplay)
data(Wells, package="carData")
glm1 <- glm(switch~arsenic+distance+education+association,
family=binomial, data=Wells)
logistic.display(glm1)
Source website
The above formula for converting logits to probabilities, exp(logit)/(1+exp(logit)), may not have any meaning when applied to a coefficient. The formula converts a log odds (equivalently, via exp(), an odds) to a probability. However, in logistic regression an odds ratio is a ratio between two odds values (which are themselves ratios). How would a probability be defined by applying the above formula to it? Instead, it may be more correct to subtract 1 from the odds ratio to get a percentage and interpret it as the odds of the outcome increasing/decreasing by x percent for a one-unit change in the predictor (e.g. an odds ratio of 2.07 corresponds to a 107% increase in the odds).

Unstandardised slopes in MuMIn package

I used the MuMIn package to do model averaging based on an information criterion, following this question.
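library(lme4)
library(MuMIn)
library(arm)  # assumed source of the standardize() function used below
options(na.action = "na.fail")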
Create a global model with all variables and two way interaction:
global.model<-lmer(yld.res ~ rain + brk +
onset + wid + (1|state),data=data1,REML=FALSE)
Standardise the global model since the variables are on different scales
stdz.model <- standardize(global.model,standardize.y = TRUE)
Create all possible combinations of the model
model.set <- dredge(stdz.model)
Get the best models based on the deltaAICc < 2 criterion
top.models <- get.models(model.set, subset= delta<2)
Average the models to calculate the effect size (standardised slopes of the input variables)
s<-model.avg(top.models)
summary(s);confint(s)
The effect size of the variables are as follows:
Variable slope estimate
brk -0.28
rain 0.13
wid 0.10
onset 0.09
As you can see, I standardized my model in step 3 so I can compare these slope estimates, i.e. I can say the slope estimate of brk is larger (in the negative direction) than that of rain. However, since these slope estimates were standardized, is there any way I can get the unstandardized slopes?
Please let me know if my question is not clear.
Thanks
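Since standardizing is a linear rescaling, each standardized slope can in principle be converted back by multiplying by the response's scale factor and dividing by the predictor's: b_raw = b_std * s_y / s_x. This also carries over to model-averaged coefficients, because averaging is a weighted sum over models fitted to the same rescaled data. The catch is knowing s_x and s_y: if standardize() here is arm::standardize (an assumption), its documentation describes centering continuous predictors and dividing them by two standard deviations, so check how it treated yld.res before reusing the factors below. The sketch uses simulated, hypothetical data and lm() for brevity; the same rescaling argument applies to lmer fixed effects.

```r
set.seed(7)
n <- 200
# Hypothetical data standing in for data1
d <- data.frame(rain = rnorm(n, 800, 150), brk = rnorm(n, 20, 5))
d$yld.res <- 0.002 * d$rain - 0.05 * d$brk + rnorm(n, sd = 0.3)

# Assumed scaling: predictors by 2 SD, response by 1 SD (adjust to match yours)
x_scale <- c(rain = 2 * sd(d$rain), brk = 2 * sd(d$brk))
y_scale <- sd(d$yld.res)

z <- data.frame(
  rain    = (d$rain - mean(d$rain)) / x_scale[["rain"]],
  brk     = (d$brk - mean(d$brk)) / x_scale[["brk"]],
  yld.res = (d$yld.res - mean(d$yld.res)) / y_scale
)

fit_raw <- lm(yld.res ~ rain + brk, data = d)
fit_std <- lm(yld.res ~ rain + brk, data = z)

# Back-transform standardized slopes (works the same on model-averaged ones)
unstandardise <- function(b_std, x_scale, y_scale) {
  b_std * y_scale / x_scale[names(b_std)]
}
b_back <- unstandardise(coef(fit_std)[c("rain", "brk")], x_scale, y_scale)
cbind(raw = coef(fit_raw)[c("rain", "brk")], back_transformed = b_back)
```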
