Simple slope estimates differ between emmeans package and interactions package? - r

I computed simple slopes for an interaction with the sim_slopes() function from the interactions package and using the emtrends() function from the emmeans package and results (both the estimates and standard errors) seem to slightly differ even though both computations are based on the same linear model (using the lm() function). Is there an explanation for this? I've pasted the code and output below. x is a continuous variable and z is a categorical variable with 3 levels.
model1 <- lm(DV ~ z * x, data = data)
> sim_slopes(model1, pred = x, modx = z, johnson_neyman = FALSE)
SIMPLE SLOPES ANALYSIS
Slope of x when z = 3:
Est. S.E. t val. p
------ ------ -------- ------
0.50 0.10 4.89 0.00
Slope of x when z = 2:
Est. S.E. t val. p
------ ------ -------- ------
0.74 0.09 7.83 0.00
Slope of x when z = 1:
Est. S.E. t val. p
------ ------ -------- ------
0.33 0.10 3.37 0.00
> emtrends(model1, ~ z, var="x")
NOTE: Results may be misleading due to involvement in interactions
z x.trend SE df lower.CL upper.CL
1 0.290 0.0669 1016 0.158 0.421
2 0.618 0.0611 1016 0.498 0.738
3 0.411 0.0612 1016 0.291 0.531

I am trying to reproduce your results.
I tried interactions between z and x by using iris data, where
DV <- iris$Sepal.Length
z <- iris$Species # z is a categorical variable with 3 levels : setosa, virginica, and versicolor
x <- iris$Petal.Length #x is a continuous variable.
The resulted slope estimates and corresponding standard errors in sim_slopes and emtrends are almost identical.
model1 <- lm(DV ~ z * x, data = iris)
sim_slopes(model1, pred = x, modx = z, johnson_neyman = FALSE)
#SIMPLE SLOPES ANALYSIS
#Slope of x when z = virginica:
# Est. S.E. t val. p
#------ ------ -------- ------
# 1.00 0.09 11.43 0.00
# Slope of x when z = versicolor:
# Est. S.E. t val. p
#------ ------ -------- ------
# 0.83 0.10 8.10 0.00
# Slope of x when z = setosa:
# Est. S.E. t val. p
# ------ ------ -------- ------
# 0.54 0.28 1.96 0.05
emtrends(model1, ~ z, var="x")
# z x.trend SE df lower.CL upper.CL
# setosa 0.542 0.2768 144 -0.00476 1.09
# versicolor 0.828 0.1023 144 0.62611 1.03
# virginica 0.996 0.0871 144 0.82360 1.17
# Confidence level used: 0.95
However, this warning message: NOTE: Results may be misleading due to involvement in interactions does not emerge in my trial.
This result indicates that both packages use the same method to compute the slope. Due to the warning message that emerged in your result, I would suggest you to check again your z and x data and to reconsider how sensible the interaction of these variables is.
My suggestion is based on the explanation of the emmeans author here:
https://mran.microsoft.com/snapshot/2018-03-30/web/packages/emmeans/vignettes/interactions.html

Related

Linear mixed model confidence intervals question

Hoping that you can clear some confusion in my head.
Linear mixed model is constructed with lmerTest:
MODEL <- lmer(Ca content ~ SYSTEM +(1 | YEAR/replicate) +
(1 | YEAR:SYSTEM), data = IOSDV1)
Fun starts happening when I'm trying to get the confidence intervals for the specific levels of the main effect.
Commands emmeans and lsmeans produce the same intervals (example; SYSTEM A3: 23.9-128.9, mean 76.4, SE:8.96).
However, the command as.data.frame(effect("SYSTEM", MODEL)) produces different, narrower confidence intervals (example; SYSTEM A3: 58.0-94.9, mean 76.4, SE:8.96).
What am I missing and what number should I report?
To summarize, for the content of Ca, i have 6 total measurements per treatment (three per year, each from different replication). I will leave the names in the code in my language, as used. Idea is to test if certain production practices affect the content of specific minerals in the grains. Random effects without residual variance were left in the model for this example.
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: CA ~ SISTEM + (1 | LETO/ponovitev) + (1 | LETO:SISTEM)
Data: IOSDV1
REML criterion at convergence: 202.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.60767 -0.74339 0.04665 0.73152 1.50519
Random effects:
Groups Name Variance Std.Dev.
LETO:SISTEM (Intercept) 0.0 0.0
ponovitev:LETO (Intercept) 0.0 0.0
LETO (Intercept) 120.9 11.0
Residual 118.7 10.9
Number of obs: 30, groups: LETO:SISTEM, 10; ponovitev:LETO, 8; LETO, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 76.417 8.959 1.548 8.530 0.0276 *
SISTEM[T.C0] -5.183 6.291 24.000 -0.824 0.4181
SISTEM[T.C110] -13.433 6.291 24.000 -2.135 0.0431 *
SISTEM[T.C165] -7.617 6.291 24.000 -1.211 0.2378
SISTEM[T.C55] -10.883 6.291 24.000 -1.730 0.0965 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SISTEM[T.C0 SISTEM[T.C11 SISTEM[T.C16
SISTEM[T.C0 -0.351
SISTEM[T.C11 -0.351 0.500
SISTEM[T.C16 -0.351 0.500 0.500
SISTEM[T.C5 -0.351 0.500 0.500 0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
> ls_means(MODEL, ddf="Kenward-Roger")
Least Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
SISTEMA3 76.4167 8.9586 1.5 8.5299 23.9091 128.9243 0.02853 *
SISTEMC0 71.2333 8.9586 1.5 7.9514 18.7257 123.7409 0.03171 *
SISTEMC110 62.9833 8.9586 1.5 7.0305 10.4757 115.4909 0.03813 *
SISTEMC165 68.8000 8.9586 1.5 7.6797 16.2924 121.3076 0.03341 *
SISTEMC55 65.5333 8.9586 1.5 7.3151 13.0257 118.0409 0.03594 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Confidence level: 95%
Degrees of freedom method: Kenward-Roger
> emmeans(MODEL, spec = c("SISTEM"))
SISTEM emmean SE df lower.CL upper.CL
A3 76.4 8.96 1.53 23.9 129
C0 71.2 8.96 1.53 18.7 124
C110 63.0 8.96 1.53 10.5 115
C165 68.8 8.96 1.53 16.3 121
C55 65.5 8.96 1.53 13.0 118
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
> as.data.frame(effect("SISTEM", MODEL))
SISTEM fit se lower upper
1 A3 76.41667 8.958643 57.96600 94.86734
2 C0 71.23333 8.958643 52.78266 89.68400
3 C110 62.98333 8.958643 44.53266 81.43400
4 C165 68.80000 8.958643 50.34933 87.25067
5 C55 65.53333 8.958643 47.08266 83.98400
Many thanks.
I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option — the only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.
load packages, simulate data, fit model
library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)
compare default emmeans with effects
emmeans(m, ~f)
## f emmean SE df lower.CL upper.CL
## a 0.848 0.212 21.9 0.409 1.29
## b 1.853 0.212 21.9 1.414 2.29
## c 1.863 0.212 21.9 1.424 2.30
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
as.data.frame(effect("f",m))
## f fit se lower upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008
effects doesn't explicitly tell us what/whether it's using a finite-sample correction: we could dig around in the documentation or the code to try to find out. Alternatively, we can tell emmeans not to use finite-sample correction:
emmeans(m, ~f, lmer.df="asymptotic")
## f emmean SE df asymp.LCL asymp.UCL
## a 0.848 0.212 Inf 0.433 1.26
## b 1.853 0.212 Inf 1.438 2.27
## c 1.863 0.212 Inf 1.448 2.28
## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95
Testing shows that these are equivalent to about a tolerance of 0.001 (probably close enough). In principle we should be able to specify KR=TRUE to get effects to use Kenward-Roger correction, but I haven't been able to get that to work yet.
However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies a very small effect degrees of freedom (e.g. about 1.55). That seems questionable to me unless your data set is extremely small ...
From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.
In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.
CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)
and see if that helps. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)

Estimating effect size with emmenas for post hoc

Is there a way to have effect size (such as Cohen's d or the most appropriate) directly using emmeans()?
I cannot find anything for obtaining effect size by using emmeans()
post <- emmeans(fit, pairwise~ favorite.pirate | sex)
emmip(fit, ~ favorite.pirate | sex)
There is not a built-in provision for effect-size calculations, but you can cobble one together by defining a custom contrast function that divides each pairwise comparison by a value of sigma:
mypw.emmc = function(..., sigma = 1) {
result = emmeans:::pairwise.emmc (...)
for (i in seq_along(result[1, ]))
result[[i]] = result[[i]] / sigma
result
}
Here's a test run:
> mypw.emmc(1:3, sigma = 4)
1 - 2 1 - 3 2 - 3
1 0.25 0.25 0.00
2 -0.25 0.00 0.25
3 0.00 -0.25 -0.25
With your model, the error SD is 9.246 (look at summary(fit); so, ...
> emmeans(fit, mypw ~ sex, sigma = 9.246, name = "effect.size")
NOTE: Results may be misleading due to involvement in interactions
$emmeans
sex emmean SE df lower.CL upper.CL
female 63.8 0.434 3.03 62.4 65.2
male 74.5 0.809 15.82 72.8 76.2
other 68.8 1.439 187.08 65.9 71.6
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
effect.size estimate SE df t.ratio p.value
female - male -1.158 0.0996 399 -11.624 <.0001
female - other -0.537 0.1627 888 -3.299 0.0029
male - other 0.621 0.1717 981 3.617 0.0009
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
Some words of caution though:
The SEs of the effect sizes are misleading because they don't account for the variation in sigma.
This is not a very good example because
a. The factors interact (Edward Low is different in his profile).
Also, see the warning message.
b. The model is singular (as warned when the model was fitted), yielding an estimated variance of zero for college)
library(yarrr)
View(pirates)
library(lme4)
library(lmerTest)
fit <- lmer(weight~ favorite.pirate * sex +(1|college), data = pirates)
anova(fit, ddf = "Kenward-Roger")
post <- emmeans(fit, pairwise~ sex)
post

How to build a summary table of glm's parameters and AICcWt

I have a dataset that I am using to build generalised linear models. The response variable is binary (absence/presence) and the explanatory variables are categorical.
CODE
library(tidyverse)
library(AICcmodavg)
# Data
set.seed(123)
t <- tibble(ID = 1:100,
A = as.factor(sample(c(0, 1), 100, T)),
B = as.factor(sample(c("black", "white"), 100, T)),
C = as.factor(sample(c("pos", "neg", "either"), 100, T)))
# Candidate set of models - Binomial family because response variable
# is binary (0 for absent & 1 for present)
# Global model is A ~ B_black + C_either
m1 <- glm(A ~ 1, binomial, t)
m2 <- glm(A ~ B, binomial, t)
m3 <- glm(A ~ C, binomial, t)
m4 <- glm(A ~ B + C, binomial, t)
# List with all models
ms <- list(null = m1, m_B = m2, m_C = m3, m_BC = m4)
# Summary table
aic_tbl <- aictab(ms)
PROBLEM
I want to build a table like the one below that summarises the coefficients, standard errors, and Akaike weights of the models within my candidate set.
QUESTION
Can anyone suggest how to best build this table using my list of models and AIC table?
Just to point it out: broom gets you half-way to where you want to get by turning the model output into a dataframe, which you can then reshape.
library(broom)
bind_rows(lapply(ms, tidy), .id="key")
key term estimate std.error statistic p.value
1 null (Intercept) -0.12014431182649532 0.200 -0.59963969517107030 0.549
2 m_B (Intercept) 0.00000000000000123 0.283 0.00000000000000433 1.000
3 m_B Bwhite -0.24116205496397874 0.401 -0.60071814968372905 0.548
4 m_C (Intercept) -0.47957308026188367 0.353 -1.35892869678271544 0.174
5 m_C Cneg 0.80499548069651150 0.507 1.58784953814722285 0.112
6 m_C Cpos 0.30772282333522433 0.490 0.62856402205887851 0.530
7 m_BC (Intercept) -0.36339654526926718 0.399 -0.90984856337213305 0.363
8 m_BC Bwhite -0.25083209866475475 0.408 -0.61515191157571303 0.538
9 m_BC Cneg 0.81144822536950656 0.508 1.59682131202527056 0.110
10 m_BC Cpos 0.32706970242195277 0.492 0.66527127770403538 0.506
And if you must insist of the layout of your table, I came up with the following (arguably clumsy) way of rearranging everything:
out <- bind_rows(lapply(ms, tidy), .id="mod")
t1 <- out %>% select(mod, term, estimate) %>% spread(term, estimate) %>% base::t
t2 <- out %>% select(mod, term, std.error) %>% spread(term, std.error) %>% base::t
rownames(t2) <- paste0(rownames(t2), "_std_e")
tmp <- rbind(t1, t2[-1,])
new_t <- as.data.frame(tmp[-1,])
colnames(new_t) <- tmp[1,]
new_t
Alternatively, you may want to familiarise yourself with packages that are meant to display model output for publication, e.g. texreg or stargazer come to mind:
library(texreg)
screenreg(ms)
==================================================
null m_B m_C m_BC
--------------------------------------------------
(Intercept) -0.12 0.00 -0.48 -0.36
(0.20) (0.28) (0.35) (0.40)
Bwhite -0.24 -0.25
(0.40) (0.41)
Cneg 0.80 0.81
(0.51) (0.51)
Cpos 0.31 0.33
(0.49) (0.49)
--------------------------------------------------
AIC 140.27 141.91 141.66 143.28
BIC 142.87 147.12 149.48 153.70
Log Likelihood -69.13 -68.95 -67.83 -67.64
Deviance 138.27 137.91 135.66 135.28
Num. obs. 100 100 100 100
==================================================
*** p < 0.001, ** p < 0.01, * p < 0.05

How do I use the glm() function?

I'm trying to fit a general linear model (GLM) on my data using R. I have a Y continuous variable and two categorical factors, A and B. Each factor is coded as 0 or 1, for presence or absence.
Even if just looking at the data I see a clear interaction between A and B, the GLM says that p-value>>>0.05. Am I doing something wrong?
First of all I create the data frame including my data for the GLM, which consists on a Y dependent variable and two factors, A and B. These are two level factors (0 and 1). There are 3 replicates per combination.
A<-c(0,0,0,1,1,1,0,0,0,1,1,1)
B<-c(0,0,0,0,0,0,1,1,1,1,1,1)
Y<-c(0.90,0.87,0.93,0.85,0.98,0.96,0.56,0.58,0.59,0.02,0.03,0.04)
my_data<-data.frame(A,B,Y)
Let’s see how it looks like:
my_data
## A B Y
## 1 0 0 0.90
## 2 0 0 0.87
## 3 0 0 0.93
## 4 1 0 0.85
## 5 1 0 0.98
## 6 1 0 0.96
## 7 0 1 0.56
## 8 0 1 0.58
## 9 0 1 0.59
## 10 1 1 0.02
## 11 1 1 0.03
## 12 1 1 0.04
As we can see just looking on the data, there is a clear interaction between factor A and factor B, as the value of Y dramatically decreases when A and B are present (that is A=1 and B=1). However, using the glm function I get no significant interaction between A and B, as p-value>>>0.05
attach(my_data)
## The following objects are masked _by_ .GlobalEnv:
##
## A, B, Y
my_glm<-glm(Y~A+B+A*B,data=my_data,family=binomial)
## Warning: non-integer #successes in a binomial glm!
summary(my_glm)
##
## Call:
## glm(formula = Y ~ A + B + A * B, family = binomial, data = my_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.275191 -0.040838 0.003374 0.068165 0.229196
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.1972 1.9245 1.142 0.254
## A 0.3895 2.9705 0.131 0.896
## B -1.8881 2.2515 -0.839 0.402
## A:B -4.1747 4.6523 -0.897 0.370
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7.86365 on 11 degrees of freedom
## Residual deviance: 0.17364 on 8 degrees of freedom
## AIC: 12.553
##
## Number of Fisher Scoring iterations: 6
While you state Y is continuous, the data shows that Y is rather a fraction. Hence, probably the reason you tried to apply GLM in the first place.
To model fractions (i.e. continuous values bounded by 0 and 1) can be done with logistic regression if certain assumptions are fullfilled. See the following cross-validated post for details: https://stats.stackexchange.com/questions/26762/how-to-do-logistic-regression-in-r-when-outcome-is-fractional. However, from the data description it is not clear that those assumptions are fullfilled.
An alternative to model fractions are beta regression or fractional repsonse models.
See below how to apply those methods to your data. The results of both methods are consistent in terms of signs and significance.
# Beta regression
install.packages("betareg")
library("betareg")
result.betareg <-betareg(Y~A+B+A*B,data=my_data)
summary(result.betareg)
# Call:
# betareg(formula = Y ~ A + B + A * B, data = my_data)
#
# Standardized weighted residuals 2:
# Min 1Q Median 3Q Max
# -2.7073 -0.4227 0.0682 0.5574 2.1586
#
# Coefficients (mean model with logit link):
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 2.1666 0.2192 9.885 < 2e-16 ***
# A 0.6471 0.3541 1.828 0.0676 .
# B -1.8617 0.2583 -7.206 5.76e-13 ***
# A:B -4.2632 0.5156 -8.268 < 2e-16 ***
#
# Phi coefficients (precision model with identity link):
# Estimate Std. Error z value Pr(>|z|)
# (phi) 71.57 29.50 2.426 0.0153 *
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 24.56 on 5 Df
# Pseudo R-squared: 0.9626
# Number of iterations: 62 (BFGS) + 2 (Fisher scoring)
# ----------------------------------------------------------
# Fractional response model
install.packages("frm")
library("frm")
frm(Y,cbind(A, B, AB=A*B),linkfrac="logit")
*** Fractional logit regression model ***
# Estimate Std. Error t value Pr(>|t|)
# INTERCEPT 2.197225 0.157135 13.983 0.000 ***
# A 0.389465 0.530684 0.734 0.463
# B -1.888120 0.159879 -11.810 0.000 ***
# AB -4.174668 0.555642 -7.513 0.000 ***
#
# Note: robust standard errors
#
# Number of observations: 12
# R-squared: 0.992
The family=binomial implies Logit (Logistic) Regression, which is itself produces a binary result.
From Quick-R
Logistic Regression
Logistic regression is useful when you are predicting a binary outcome
from a set of continuous predictor variables. It is frequently
preferred over discriminant function analysis because of its less
restrictive assumptions.
The data shows an interaction. Try to fit a different model, logistic is not appropriate.
with(my_data, interaction.plot(A, B, Y, fixed = TRUE, col = 2:3, type = "l"))
An analysis of variance shows clear significance for all factors and interaction.
fit <- aov(Y~(A*B),data=my_data)
summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
A 1 0.2002 0.2002 130.6 3.11e-06 ***
B 1 1.1224 1.1224 732.0 3.75e-09 ***
A:B 1 0.2494 0.2494 162.7 1.35e-06 ***
Residuals 8 0.0123 0.0015
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

constrained multiple linear regression in R

Suppose I have to estimate coefficients a,b in regression:
y=a*x+b*z+c
I know in advance that y is always in range y>=0 and y<=x, but regression model produces sometimes y outside of this range.
Sample data:
mydata<-data.frame(y=c(0,1,3,4,9,11),x=c(1,3,4,7,10,11),z=c(1,1,1,9,6,7))
round(predict(lm(y~x+z,data=mydata)),2)
1 2 3 4 5 6
-0.87 1.79 3.12 4.30 9.34 10.32
First predicted value is <0.
I tried model without intercept: all predictions are >0, but third prediction of y is >x (4.03>3)
round(predict(lm(y~x+z-1,data=mydata)),2)
1 2 3 4 5 6
0.76 2.94 4.03 4.67 8.92 9.68
I also considered to model proportion y/x instead of y:
mydata$y2x<-mydata$y/mydata$x
round(predict(lm(y2x~x+z,data=mydata)),2)
1 2 3 4 5 6
0.15 0.39 0.50 0.49 0.97 1.04
round(predict(lm(y2x~x+z-1,data=mydata)),2)
1 2 3 4 5 6
0.08 0.33 0.46 0.47 0.99 1.07
But now sixth prediction is >1, but proportion should be in range [0,1].
I also tried to apply method where glm is used with offset option: Regression for a Rate variable in R
and
http://en.wikipedia.org/wiki/Poisson_regression#.22Exposure.22_and_offset
but this was not successfull.
Please note, in my data dependent variable: proportion y/x is both zero-inflated and one-inflated.
Any idea, what is suitable approach to build a model in R ('glm','lm')?
You're on the right track: if 0 ≤ y ≤ x then 0 ≤ (y/x) ≤ 1. This suggests fitting y/x to a logistic model in glm(...). Details are below, but considering that you've only got 6 points, this is a pretty good fit.
The major concern is that the model is not valid unless the error in (y/x) is Normal with constant variance (or, equivalently, the error in y increases with x). If this is true then we should get a (more or less) linear Q-Q plot, which we do.
One nuance: the interface to the glm logistic model wants two columns for y: "number of successes (S)" and "number of failures (F)". It then calculates the probability as S/(S+F). So we have to provide two columns which mimic this: y and x-y. Then glm(...) will calculate y/(y+(x-y)) = y/x.
Finally, the fit summary suggests that x is important and z may or may not be. You might want to try a model that excludes z and see if that improves AIC.
fit = glm(cbind(y,x-y)~x+z, data=mydata, family=binomial(logit))
summary(fit)
# Call:
# glm(formula = cbind(y, x - y) ~ x + z, family = binomial(logit),
# data = mydata)
# Deviance Residuals:
# 1 2 3 4 5 6
# -0.59942 -0.35394 0.62705 0.08405 -0.75590 0.81160
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -2.0264 1.2177 -1.664 0.0961 .
# x 0.6786 0.2695 2.518 0.0118 *
# z -0.2778 0.1933 -1.437 0.1507
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# (Dispersion parameter for binomial family taken to be 1)
# Null deviance: 13.7587 on 5 degrees of freedom
# Residual deviance: 2.1149 on 3 degrees of freedom
# AIC: 15.809
par(mfrow=c(2,2))
plot(fit) # residuals, Q-Q, Scale-Location, and Leverage Plots
mydata$pred <- predict(fit, type="response")
par(mfrow=c(1,1))
plot(mydata$y/mydata$x,mydata$pred,xlim=c(0,1),ylim=c(0,1), xlab="Actual", ylab="Predicted")
abline(0,1, lty=2, col="blue")

Resources