Adding confidence intervals to a qq plot? - r

Is there a way to add confidence intervals to a qqplot?
I have a dataset of gene expression values, which I've visualized using PCA:
pca1 = prcomp(data, scale. = TRUE)
I'm now looking for outliers by checking the distribution of the data against the normal distribution through:
qqnorm(pca1$x,pch = 20, col = c(rep("red", 73), rep("blue", 33)))
qqline(pca1$x)
This is my data:
data = c(2.48, 104, 4.25, 219, 0.682, 0.302, 1.09, 0.586, 90.7, 344, 13.8, 1.17, 305, 2.8, 79.7, 3.18, 109, 0.932, 562, 0.958, 1.87, 0.59, 114, 391, 13.5, 1.41, 208, 2.37, 166, 3.42)
I would now like to plot 95% confidence intervals to check which data points lie outside. Any tips on how to do this?

The car package provides the function qqPlot(), which adds a pointwise confidence envelope to the normal QQ plot by default:
library(car)
qqPlot(pca1$x)
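If you prefer to stay in base R, a comparable pointwise envelope can be sketched by hand from the normal-theory standard error of the order statistics, which is essentially what qqPlot() draws. This is a minimal sketch; the helper name qq_envelope is hypothetical, not from any package:

```r
# Pointwise envelope for a normal QQ plot (base-R sketch, similar to car::qqPlot).
qq_envelope <- function(x, conf = 0.95) {
  x <- sort(x)
  n <- length(x)
  p <- ppoints(n)
  z <- qnorm(p)                                    # theoretical quantiles
  # robust reference line through the quartiles, as qqline() draws it
  slope <- diff(quantile(x, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75)))
  int   <- quantile(x, 0.25) - slope * qnorm(0.25)
  fit   <- int + slope * z
  # normal-theory standard error of the order statistics
  se <- (slope / dnorm(z)) * sqrt(p * (1 - p) / n)
  zc <- qnorm(1 - (1 - conf) / 2)
  plot(z, x, pch = 20,
       xlab = "Theoretical quantiles", ylab = "Sample quantiles")
  abline(int, slope)
  lines(z, fit + zc * se, lty = 2)                 # upper envelope
  lines(z, fit - zc * se, lty = 2)                 # lower envelope
  invisible(data.frame(z = z, x = x,
                       lower = fit - zc * se, upper = fit + zc * se))
}

set.seed(42)
qq_envelope(rnorm(100))
```

With the PCA scores above you would call something like qq_envelope(pca1$x[, 1]) on one component at a time; points outside the dashed band are the candidate outliers.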

Related

GlmmTMB model and emmeans

I am new to glmmTMB models, so I have run into a problem.
I built a model, and based on AICtab and DHARMa this was the best:
Insecticide_2<- glmmTMB(Insect_abundace~field_element+land_distance+sampling_time+year+treatment_day+(1|field_id),
data=Insect_002,
family= nbinom2)
After fitting the glmmTMB model I ran Anova() (from car) and then emmeans, but the emmeans contrasts report p-values rather than lower.CL or upper.CL, and the results are the same for both years. What may be the problem? Is the model overfitted? Am I doing the emmeans call wrong?
Anova also showed that land_distance, sampling_time, and treatment_day were significant; year was almost significant (p = 0.07).
comp_emmeans1 <- emmeans(Insecticide_2, pairwise ~ land_distance|year, type = "response")
> comp_emmeans1
$emmeans
year = 2018:
land_distance response SE df lower.CL upper.CL
30m 2.46 0.492 474 1.658 3.64
50m 1.84 0.369 474 1.241 2.73
80m 1.36 0.283 474 0.906 2.05
110m 1.25 0.259 474 0.836 1.88
year = 2019:
land_distance response SE df lower.CL upper.CL
30m 3.42 0.593 474 2.434 4.81
50m 2.56 0.461 474 1.799 3.65
80m 1.90 0.335 474 1.343 2.68
110m 1.75 0.317 474 1.222 2.49
Results are averaged over the levels of: field_element, sampling_time, treatment_day
Confidence level used: 0.95
Intervals are back-transformed from the log scale
$contrasts
year = 2018:
contrast ratio SE df null t.ratio p.value
30m / 50m 1.34 0.203 474 1 1.906 0.2268
30m / 80m 1.80 0.279 474 1 3.798 0.0009
30m / 110m 1.96 0.311 474 1 4.239 0.0002
50m / 80m 1.35 0.213 474 1 1.896 0.2311
50m / 110m 1.47 0.234 474 1 2.405 0.0776
80m / 110m 1.09 0.176 474 1 0.516 0.9552
year = 2019:
contrast ratio SE df null t.ratio p.value
30m / 50m 1.34 0.203 474 1 1.906 0.2268
30m / 80m 1.80 0.279 474 1 3.798 0.0009
30m / 110m 1.96 0.311 474 1 4.239 0.0002
50m / 80m 1.35 0.213 474 1 1.896 0.2311
50m / 110m 1.47 0.234 474 1 2.405 0.0776
80m / 110m 1.09 0.176 474 1 0.516 0.9552
Results are averaged over the levels of: field_element, sampling_time, treatment_day
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log scale
Should I use a different comparison method? I saw that some use poly ~; I tried that, and the picture of results is the same. Also, am I comparing the right things?
A last, and also important, question: how should I report the glmmTMB, Anova, and emmeans results?
I don't recall seeing this question before, but it's been 8 months, and maybe I just forgot.
Anyway, I am not sure exactly what the question is, but there are three things going on that might possibly have caused some confusion:
The emmeans() call has the specification pairwise ~ land_distance|year, which causes it to compute both means and pairwise comparisons thereof. I think users are almost always better served by separating those steps, because estimating means and estimating contrasts are two different things.
The default way in which means are summarized (estimates, SEs, and confidence intervals) is different from the default for comparisons or other contrasts (estimates, SEs, t ratios, and adjusted P values). That's because, as I said before, these are two different things, and usually people want CIs for means and P values for contrasts. See below.
There is a log link in this model, and that has special properties when it comes to contrasts, because the difference on a log scale is the log of the ratio. So we display a ratio when we have type = "response". (With most other link functions, there is no way to back-transform the differences of transformed values.)
What I suggest, per (1), is to get the means (and not comparisons) first:
EMM <- emmeans(Insecticide_2, ~ land_distance|year, type = "response")
EMM # see the estimates
You can get pairwise comparisons next:
CON <- pairs(EMM) # or contrast(EMM, "pairwise")
CON # see the ratios as shown in the OP
confint(CON) # see confidence intervals instead of tests
confint(CON, type = "link") # See the pairwise differences on the log scale
If you actually want differences on the response scale rather than ratios, that's possible too:
pairs(regrid(EMM)) # tests
confint(pairs(regrid(EMM))) # CIs
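Since the OP's data are not available to readers, the same workflow can be sketched with a built-in data set. Here warpbreaks stands in for the OP's data, and a Poisson glm with a log link stands in for glmmTMB; the emmeans calls are identical:

```r
library(emmeans)

# Log-link model, so pairwise comparisons come out as ratios,
# just as in the glmmTMB output above
mod <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

EMM <- emmeans(mod, ~ tension | wool, type = "response")
EMM                 # back-transformed means with CIs
CON <- pairs(EMM)   # pairwise ratios with tests
confint(CON)        # CIs for the ratios instead of tests
pairs(regrid(EMM))  # differences on the response scale
```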

How to plot glm model results with a lot of parameters

I really need help with this. I want to make a prediction model for my quasipoisson GLM. I ran into problems because I set up the GLM wrongly with my dataset.
I wanted to make a prediction model based on my quasipoisson GLM with all my parameters together, but I ended up predicting for each parameter separately, and the result is different from the quasipoisson GLM output.
Here is my dataset. It comes from a CSV file; I don't know how to upload the CSV in this post, pardon me for that.
Richness = as.matrix(dat1[,14])
Richness
8
3
3
4
3
5
4
3
7
8
Parameter = as.matrix(dat1[,15:22])
Parameter
JE Temp Hmdt Sond HE WE L MH
1 31.3 93 63.3 3.89 4.32 80 7.82
2 26.9 92 63.5 9.48 8.85 60 8.32
1 27.3 93 67.4 1.23 2.37 60 10.10
3 31.6 99 108.0 1.90 3.32 80 4.60
1 29.3 99 86.8 2.42 7.83 460 12.20
2 29.4 85 86.1 4.71 15.04 200 10.10
1 29.4 87 93.5 3.65 14.70 200 12.20
1 29.5 97 87.5 1.42 3.17 80 4.07
1 25.9 95 62.3 5.23 16.89 140 10.03
1 29.5 95 63.5 1.85 6.50 120 6.97
Rich = glm(Richness ~ Parameter, family=quasipoisson, data = dat1)
summary(Rich)
Call:
glm(formula = Richness ~ Parameter, family = quasipoisson, data = dat1)
Deviance Residuals:
1 2 3 4 5
-0.017139 0.016769 -0.008652 0.002194 -0.003153
6 7 8 9 10
-0.016828 0.022914 -0.013823 -0.012597 0.030219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.4197959 0.5061733 -14.659 0.0434 *
ParameterJE 0.1833651 0.0224198 8.179 0.0775 .
ParameterTemp 0.2441301 0.0073380 33.269 0.0191 *
ParameterHmdt 0.0393258 0.0032176 12.222 0.0520 .
ParameterSond -0.0319313 0.0009662 -33.050 0.0193 *
ParameterHE -0.0982213 0.0060587 -16.212 0.0392 *
ParameterWE 0.1001758 0.0027575 36.329 0.0175 *
ParameterL -0.0014170 0.0001554 -9.117 0.0695 .
ParameterMH 0.0137196 0.0073704 1.861 0.3138
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 0.002739787)
Null deviance: 7.8395271 on 9 degrees of freedom
Residual deviance: 0.0027358 on 1 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3
This is the plot that I tried to make with ggplot:
ggplot(dat1, aes(Temp, Richness)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = quasipoisson),
              fill = "grey", color = "black", linetype = 2)
and this is the result.
I made one for each parameter, but I know this result is wrong because it fits a quasipoisson model to each parameter separately; what I want is the prediction based on the full quasipoisson model, as in the summary above.
I tried to use the code from plot the results glm with multiple explanatories with 95% CIs, but I am really confused about how to set up my data like the example there. The result in that example, though, is nearly what I want.
Can anyone help me with this? How can I put the GLM predictions for all parameters in one frame with ggplot?
Hope anyone can help me to fix this. Thank you so much!
Have you tried the plot_model() function from the sjPlot package?
I'm writing from my phone, but the code is something like this:
library(sjPlot)
plot_model(glm_model)
More info:
http://www.strengejacke.de/sjPlot/reference/plot_model.html
code:
data("mtcars")
glm_model<-glm(am~.,data = mtcars)
glm_model
library(sjPlot)
plot_model(glm_model, vline.color = "red")
plot_model(glm_model, show.values = TRUE, value.offset = .3)

Why am I getting different 95% confidence intervals for some studies in a meta-analysis with the "meta" package in R?

I am trying to do a meta-analysis using hazard ratios with lower and upper 95% confidence intervals, but for some studies, e.g. CARDIa, the computed 95% CI ([2.1560; 9.9858]) differs from the original values (1.33-6.16), and I do not know how to get the exact numbers.
Any advice will be greatly appreciated.
Used code:
library(meta);library(metafor)
data<-read.table(text="studlab HR LCI UCI
Blazek 1.78 0.84 3.76
PRECOMBAT 1.20 0.37 3.93
LE.MANS 1.14 0.3 4.25
NOBLE 2.99 1.66 5.39
MASS-II 2.90 1.39 6.01
CARDIa 4.64 1.33 6.16
BEST 2.75 1.16 6.54
", header=T, sep="")
metagen(log(HR), lower = log(LCI), upper = log(UCI),
studlab = studlab,data=data, sm = "HR")
Obtained results
HR 95%-CI %W(fixed) %W(random)
Blazek 1.7800 [0.8413; 3.7659] 16.4 16.5
PRECOMBAT 1.2000 [0.3682; 3.9109] 6.6 7.1
LE.MANS 1.1400 [0.3029; 4.2908] 5.2 5.7
NOBLE 2.9900 [1.6593; 5.3878] 26.6 25.0
MASS-II 2.9000 [1.3947; 6.0301] 17.2 17.2
CARDIa 4.6400 [2.1560; 9.9858] 15.7 15.8
BEST 2.7500 [1.1582; 6.5297] 12.3 12.7
Number of studies combined: k = 7
HR 95%-CI z p-value
Fixed effect model 2.5928 [1.9141; 3.5122] 6.15 < 0.0001
Random effects model 2.5695 [1.8611; 3.5477] 5.73 < 0.0001
Quantifying heterogeneity:
tau^2 = 0.0181 [0.0000; 0.9384]; tau = 0.1347 [0.0000; 0.9687];
I^2 = 9.4% [0.0%; 73.6%]; H = 1.05 [1.00; 1.94]
Test of heterogeneity:
Q d.f. p-value
6.63 6 0.3569
Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Jackson method for confidence interval of tau^2 and tau
The CI output matches the original CI to 2 decimal places in all studies except for CARDIa, which I think has been incorrectly entered (forgive me if I'm wrong but I can't see any other explanation).
You can see this by calculating the standard errors manually and then recalculating the confidence intervals, much like the metagen function does.
library(meta)
se <- meta:::TE.seTE.ci(log(data$LCI), log(data$UCI))$seTE; se
#[1] 0.3823469 0.6027896 0.6762603 0.3004463 0.3735071 0.3910526 0.4412115
data$lower <- round(exp(ci(TE=log(data$HR), seTE=se)$lower), 3)
data$upper <- round(exp(ci(TE=log(data$HR), seTE=se)$upper), 3)
data
studlab HR LCI UCI lower upper
1 Blazek 1.78 0.84 3.76 0.841 3.766 #
2 PRECOMBAT 1.20 0.37 3.93 0.368 3.911 #
3 LE.MANS 1.14 0.30 4.25 0.303 4.291 #
4 NOBLE 2.99 1.66 5.39 1.659 5.388 #
5 MASS-II 2.90 1.39 6.01 1.395 6.030 #
6 CARDIa 4.64 1.33 6.16 2.156 9.986 # <- this one is incorrect.
7 BEST 2.75 1.16 6.54 1.158 6.530 #
The correct 95% CI for CARDIa should be around (2.16 - 9.99). I would verify that you have typed the values correctly.
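One quick way to see the inconsistency without any packages: a Wald 95% CI for a hazard ratio is symmetric around log(HR) on the log scale, so log(HR) should sit at the midpoint of log(LCI) and log(UCI). A minimal base-R check, with values taken from the table in the question:

```r
# Check symmetry of the reported CIs on the log scale.
hr  <- c(Blazek = 1.78, PRECOMBAT = 1.20, CARDIa = 4.64)
lci <- c(0.84, 0.37, 1.33)
uci <- c(3.76, 3.93, 6.16)

midpoint <- (log(lci) + log(uci)) / 2     # where log(HR) should be
round(cbind(log_hr = log(hr), midpoint = midpoint,
            gap = log(hr) - midpoint), 3)
# the gap is near zero for every study except CARDIa
```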

lme4 deviation/treatment contrast coding with interactions in R - levels are missing

I have a mixed effects model (with lme4) with a 2-way interaction term, each factor having four levels, and I would like to investigate their effects in reference to the grand mean. I present an example here from the Cars93 data set and omit the error term, since it is not necessary for this example:
## shorten data frame for simplicity
df=Cars93[c(1:15),]
df=Cars93[is.element(Cars93$Make,c('Acura Integra', 'Audi 90','BMW 535i','Subaru Legacy')),]
df$Make=drop.levels(df$Make)
df$Model=drop.levels(df$Model)
## define contrasts (every factor has 4 levels)
contrasts(df$Make) = contr.treatment(4)
contrasts(df$Model) = contr.treatment(4)
## model
m1 <- lm(Price ~ Model*Make,data=df)
summary(m1)
As you can see, the first levels are omitted in the interaction term, and I would like to have all 4 levels in the output, referenced to the grand mean (often referred to as deviation coding). These are the sources I looked at: https://marissabarlaz.github.io/portfolio/contrastcoding/#coding-schemes and How to change contrasts to compare with mean of all levels rather than reference level (R, lmer)?. The last reference does not cover interactions, though.
The simple answer is that what you want is not possible directly. You have to use a slightly different approach.
In a model with interactions, you want to use contrasts in which the mean is zero, not a specific level. Otherwise, the lower-order effects (i.e., main effects) are not main effects but simple effects (evaluated when the other factor is at its reference level). This is explained in more detail in my chapter on mixed models:
http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
To get what you want, you have to fit the model in a reasonable manner and then pass it to emmeans to compare against the intercept (i.e., the unweighted grand mean). This works also for interactions as shown below (as your code did not work, I use warpbreaks).
afex::set_sum_contrasts() ## uses contr.sum globally
library("emmeans")
## model
m1 <- lm(breaks ~ wool * tension,data=warpbreaks)
car::Anova(m1, type = 3)
coef(m1)[1]
# (Intercept)
# 28.14815
## both CIs include grand mean:
emmeans(m1, "wool")
# wool emmean SE df lower.CL upper.CL
# A 31.0 2.11 48 26.8 35.3
# B 25.3 2.11 48 21.0 29.5
#
# Results are averaged over the levels of: tension
# Confidence level used: 0.95
## same using test
emmeans(m1, "wool", null = coef(m1)[1], infer = TRUE)
# wool emmean SE df lower.CL upper.CL null t.ratio p.value
# A 31.0 2.11 48 26.8 35.3 28.1 1.372 0.1764
# B 25.3 2.11 48 21.0 29.5 28.1 -1.372 0.1764
#
# Results are averaged over the levels of: tension
# Confidence level used: 0.95
emmeans(m1, "tension", null = coef(m1)[1], infer = TRUE)
# tension emmean SE df lower.CL upper.CL null t.ratio p.value
# L 36.4 2.58 48 31.2 41.6 28.1 3.196 0.0025
# M 26.4 2.58 48 21.2 31.6 28.1 -0.682 0.4984
# H 21.7 2.58 48 16.5 26.9 28.1 -2.514 0.0154
#
# Results are averaged over the levels of: wool
# Confidence level used: 0.95
emmeans(m1, c("tension", "wool"), null = coef(m1)[1], infer = TRUE)
# tension wool emmean SE df lower.CL upper.CL null t.ratio p.value
# L A 44.6 3.65 48 37.2 51.9 28.1 4.499 <.0001
# M A 24.0 3.65 48 16.7 31.3 28.1 -1.137 0.2610
# H A 24.6 3.65 48 17.2 31.9 28.1 -0.985 0.3295
# L B 28.2 3.65 48 20.9 35.6 28.1 0.020 0.9839
# M B 28.8 3.65 48 21.4 36.1 28.1 0.173 0.8636
# H B 18.8 3.65 48 11.4 26.1 28.1 -2.570 0.0133
#
# Confidence level used: 0.95
Note that for coef() you probably want to use fixef() for lme4 models.

How to get absolute difference estimate and confidence intervals from log(x+1) variable with emmeans

I have a mixed effects model with a log(x+1)-transformed response variable. The output from emmeans with type = "response" provides the mean and confidence intervals for both groups that I am comparing. However, what I want is the mean and CI of the difference between the groups (i.e., the estimate). emmeans only provides the ratio (with type = "response") or the log ratio (with type = "link"), and I am unsure how to change this into absolute values. If I run the model without the log(x+1) transformation, then emmeans provides the estimated difference and the CI around this difference, not ratios. How can I also do this when my response variable is log(x+1) transformed?
bmnameF.lme2 = lme(log(bm+1)~TorC*name, random=~TorC|site,
data=matched.cases3F, method='REML')
emmeans(bmnameF.lme2, pairwise ~ TorC,
        type = 'response') %>% confint(OmeanFHR[[2]]) %>% as.data.frame
emmeans.TorC emmeans.emmean emmeans.SE emmeans.df emmeans.lower.CL emmeans.upper.CL contrasts.contrast contrasts.estimate contrasts.SE contrasts.df contrasts.lower.CL contrasts.upper.CL
Managed 376.5484 98.66305 25 219.5120 645.9267 Managed - Open 3.390123 1.068689 217 1.821298 6.310297
Open 111.0722 43.15374 25 49.8994 247.2381 Managed - Open 3.390123 1.068689 217 1.821298 6.310297
Let me show a different example so the results are reproducible to all viewers:
mod = lm(log(breaks+1) ~ wool*tension, data = warpbreaks)
As you see, with a log transformation, comparisons/contrasts are expressed as ratios by default. But this can be changed by specifying transform instead of type in the emmeans() call:
> emmeans(mod, pairwise ~ tension|wool, transform = "response")
$emmeans
wool = A:
tension response SE df lower.CL upper.CL
L 42.3 5.06 48 32.1 52.4
M 23.6 2.83 48 17.9 29.3
H 23.7 2.83 48 18.0 29.4
wool = B:
tension response SE df lower.CL upper.CL
L 27.7 3.32 48 21.0 34.4
M 28.4 3.40 48 21.6 35.3
H 19.3 2.31 48 14.6 23.9
Confidence level used: 0.95
$contrasts
wool = A:
contrast estimate SE df t.ratio p.value
L - M 18.6253 5.80 48 3.213 0.0065
L - H 18.5775 5.80 48 3.204 0.0067
M - H -0.0479 4.01 48 -0.012 0.9999
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.7180 4.75 48 -0.151 0.9875
L - H 8.4247 4.04 48 2.086 0.1035
M - H 9.1426 4.11 48 2.224 0.0772
P value adjustment: tukey method for comparing a family of 3 estimates
Or, you can do this later via the regrid() function:
emm1 = emmeans(mod, ~ tension | wool)
emm2 = regrid(emm1)
emm2 # estimates
pairs(emm2) # comparisons
regrid() creates a new emmGrid object where everything is already back-transformed, thus side-stepping the behavior that happens with contrasts of log-transformed results. (In the previous illustration, the transform argument just calls regrid after it constructs the reference grid.)
But there is another subtle thing going on: The transformation is auto-detected as log; the +1 part is ignored. Thus, the back-transformed estimates are all too large by 1. To get this right, you need to use the make.tran() function to create this generalization of the log transformation:
> emm3 = update(emmeans(mod, ~ tension | wool), tran = make.tran("genlog", 1))
> str(emm3)
'emmGrid' object with variables:
tension = L, M, H
wool = A, B
Transformation: “log(mu + 1)”
> regrid(emm3)
wool = A:
tension response SE df lower.CL upper.CL
L 41.3 5.06 48 31.1 51.4
M 22.6 2.83 48 16.9 28.3
H 22.7 2.83 48 17.0 28.4
wool = B:
tension response SE df lower.CL upper.CL
L 26.7 3.32 48 20.0 33.4
M 27.4 3.40 48 20.6 34.3
H 18.3 2.31 48 13.6 22.9
Confidence level used: 0.95
The comparisons will come out the same as shown earlier, because offsetting all the means by 1 doesn't affect the pairwise differences.
See vignette("transformations", "emmeans") or https://cran.r-project.org/web/packages/emmeans/vignettes/transformations.html for more details.