How to find intercept coefficients after setting restrictions with linearHypothese - r

So I built my logistic regression model using glm(). When I display the summary I get this with values for each variable.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
After I have set a restriction using linearHypotheses but I only get results for chisquared, residuals df, and df. Am I able to see what the coefficients are for the model with the restriction?
library(car)
library(tidyverse)
library(dplyr)
logitmodel <- glm(x~ratio+x+y, family=binomial(link="logit"))
nullhypothese <- "x=0"
restrictedmodel <- linearHypothesis(logitmodel, nullhypothesis)

Related

Equivalence of a mixed model fitted by lme and lmer

I have fitted a mixed effects model considering both functions widely used in R, namely: the lme function from the nlme package and the lmer function from the lme4 package.
To readjust the model from lme to lme4, following the same reparametrization, I used the following information from this topic, being that is only possible to do this in lme4 in a hackable way.: Heterocesdastic model of mixed effects via lmer function
I apologize for hosting the data in a link, however, I couldn't find an internal R database that has variables that might match my problem.
Data: https://drive.google.com/file/d/1jKFhs4MGaVxh-OPErvLDfMNmQBouywoM/view?usp=sharing
The fitted models were:
library(nlme)
library(lme4)
ModLME = lme(Var1~I(Var2)+I(Var2^2),
random = ~1|Var3,
weights = varIdent(form=~1|Var4),
Dataone, method="REML")
ModLMER = lmer(Var1~I(Var2)+I(Var2^2)+(1|Var3)+(0+dummy(Var4,"1")|Var5),
Dataone, REML = TRUE,
control=lmerControl(check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
Which are equivalent, see:
all.equal(REMLcrit(ModLMER), c(-2*logLik(ModLME)))
[1] TRUE
all.equal(fixef(ModLME), fixef(ModLMER), tolerance=1e-7)
[1] TRUE
> summary(ModLME)
Linear mixed-effects model fit by REML
Data: Dataone
AIC BIC logLik
-209.1431 -193.6948 110.5715
Random effects:
Formula: ~1 | Var3
(Intercept) Residual
StdDev: 0.05789852 0.03636468
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Var4
Parameter estimates:
0 1
1.000000 5.641709
Fixed effects: Var1 ~ I(Var2) + I(Var2^2)
Value Std.Error DF t-value p-value
(Intercept) 0.9538547 0.01699642 97 56.12093 0
I(Var2) -0.5009804 0.09336479 97 -5.36584 0
I(Var2^2) -0.4280151 0.10038257 97 -4.26384 0
summary(ModLMER)
Linear mixed model fit by REML. t-tests use Satterthwaites method [lmerModLmerTest]
Formula: Var1 ~ I(Var2) + I(Var2^2) + (1 | Var3) + (0 + dummy(Var4, "1") |
Var5)
Data: Dataone
Control: lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.nRE = "ignore")
REML criterion at convergence: -221.1
Scaled residuals:
Min 1Q Median 3Q Max
-4.1151 -0.5891 0.0374 0.5229 2.1880
Random effects:
Groups Name Variance Std.Dev.
Var3 (Intercept) 6.466e-12 2.543e-06
Var5 dummy(Var4, "1") 4.077e-02 2.019e-01
Residual 4.675e-03 6.837e-02
Number of obs: 100, groups: Var3, 100; Var5, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.95385 0.01700 95.02863 56.121 < 2e-16 ***
I(Var2) -0.50098 0.09336 92.94048 -5.366 5.88e-07 ***
I(Var2^2) -0.42802 0.10038 91.64017 -4.264 4.88e-05 ***
However, when observing the residuals of these models, note that they are not similar. See that in the model adjusted by lmer, mysteriously appears a residue with the shape of a few points close to a straight line. So, how could you solve such a problem so that they are identical? I believe the problem is in the lme4 model.
aa=plot(ModLME, main="LME")
bb=plot(ModLMER, main="LMER")
gridExtra::grid.arrange(aa,bb,ncol=2)
I can tell you what's going on and what should in principle fix it, but at the moment the fix doesn't work ...
The residuals being plotted take all of the random effects into account, which in the case of the lmer fit includes the individual-level random effects (the (0+dummy(Var4,"1")|Var5) term), which leads to weird residuals for the Var4==1 group. To illustrate this:
plot(ModLMER, col = Dataone$Var4+1)
i.e., you can see that the weird residuals are exactly the ones in red == those for which Var4==1.
In theory we should be able to get the same residuals via:
res <- Dataone$Var1 - predict(ModLMER, re.form = ~(1|Var3))
i.e., ignore the group-specific observation-level random effect term. However, it looks like there is a bug at the moment ("contrasts can be applied only to factors with 2 or more levels").
An extremely hacky solution is to construct the random-effect predictions without the observation-level term yourself:
## fixed-effect predictions
p0 <- predict(ModLMER, re.form = NA)
## construct RE prediction, Var3 term only:
Z <- getME(ModLMER, "Z")
b <- drop(getME(ModLMER, "b"))
## zero out observation-level components
b[101:200] <- 0
## add RE predictions to fixed predictions
p1 <- drop(p0 + Z %*% b)
## plot fitted vs residual
plot(p1, Dataone$Var1 - p1)
For what it's worth, this also works:
library(glmmTMB)
ModGLMMTMB <- glmmTMB(Var1~I(Var2)+I(Var2^2)+(1|Var3),
dispformula = ~factor(Var4),
REML = TRUE,
data = Dataone)

retrieve formula used by predict function in exponential equation in R

I can't figure out how to reconstruct the results nor the formula from the predict function of a linear model. I get the same results also when using this data in ggplot geom_smooth(method='lm',formula,y ~ exp(x)).
Here's some sample data
x=c(1,10,100,1000,10000,100000,1000000,3000000)
y=c(1,1,10,15,20,30,40,60)
I would like to use an exponential function so (ignore for the moment that I log the x value, because exp() fails for very large values):
model = lm( y ~ exp(log10(x)))
mypred = predict(model)
plot(log(x),mypred)
I have tried
lm_coef <- coef(model)
plot(log10(x),lm_coef[1]*exp(-lm_coef[2]*x))
However this is giving me a decreasing exponential instead of the increasing. My goal is to extract the equation of the exponential function so I can reuse the coefficients in another context.. What equation is predict() using and is there a way to see it?
I did something along the lines of:
Df<-data.frame(x=c(1,10,100,1000,10000,100000,1000000,3000000),
y=c(1,1,10,15,20,30,40,60))
model<-lm(data = Df, formula = y~log(x))
predict(model)
plot(log(Df$x),predict(model))
summary(model)
The relevant output you get is:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.0700 4.7262 -1.284 0.246386
log(x) 3.5651 0.5035 7.081 0.000398 ***
---
Your equation therefore is 3.5651*log(x)-6.0700

bootstrap standard errors of a linear regression in R

I have a lm object and I would like to bootstrap only its standard errors. In practice I want to use only part of the sample (with replacement) at each replication and get a distribution of standard erros. Then, if possible, I would like to display the summary of the original linear regression but with the bootstrapped standard errors and the corresponding p-values (in other words same beta coefficients but different standard errors).
Edited: In summary I want to "modify" my lm object by having the same beta coefficients of the original lm object that I ran on the original data, but having the bootstrapped standard errors (and associated t-stats and p-values) obtained by computing this lm regression several times on different subsamples (with replacement).
So my lm object looks like
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.812793 0.095282 40.016 < 2e-16 ***
x -0.904729 0.284243 -3.183 0.00147 **
z 0.599258 0.009593 62.466 < 2e-16 ***
x*z 0.091511 0.029704 3.081 0.00208 **
but the associated standard errors are wrong, and I would like to estimate them by replicating this linear regression 1000 times (replications) on different subsample (with replacement).
Is there a way to do this? can anyone help me?
Thank you for your time.
Marco
What you ask can be done following the line of the code below.
Since you have not posted an example dataset nor the model to fit, I will use the built in dataset mtcars an a simple formula with two continuous predictors.
library(boot)
boot_function <- function(data, indices, formula){
d <- data[indices, ]
obj <- lm(formula, d)
coefs <- summary(obj)$coefficients
coefs[, "Std. Error"]
}
set.seed(8527)
fmla <- as.formula("mpg ~ hp * cyl")
seboot <- boot(mtcars, boot_function, R = 1000, formula = fmla)
colMeans(seboot$t)
##[1] 6.511530646 0.068694001 1.000101450 0.008804784
I believe that it is possible to use the code above for most needs with numeric response and predictors.

Get variables from summary?

I want to grab the Standard Error column when I do summary on a linear regression model. The output is below:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.436954 0.616937 -13.676 < 2e-16 ***
x1 -0.138902 0.024247 -5.729 1.01e-08 ***
x2 0.005978 0.009142 0.654 0.51316 `
...
I just want the Std. Error column values stored into a vector. How would I go about doing so? I tried model$coefficients[,2] but that keeps giving me extra values. If anyone could help that would be great.
Say fit is the linear model, then summary(fit)$coefficients[,2] has the standard errors. Type ?summary.lm.
fit <- lm(y~x, myData)
summary(fit)$coefficients[,1] # the coefficients
summary(fit)$coefficients[,2] # the std. error in the coefficients
summary(fit)$coefficients[,3] # the t-values
summary(fit)$coefficients[,4] # the p-values

Using loops to do Chi-Square Test in R

I am new to R. I found the following code for doing univariate logistic regression for a set of variables. What i would like to do is run chi square test for a list of variables against the dependent variable, similar to the logistic regression code below. I found couple of them which involve creating all possible combinations of the variables, but I can't get it to work. Ideally, I want the one of the variables (X) to be the same.
Chi Square Analysis using for loop in R
lapply(c("age","sex","race","service","cancer",
"renal","inf","cpr","sys","heart","prevad",
"type","frac","po2","ph","pco2","bic","cre","loc"),
function(var) {
formula <- as.formula(paste("status ~", var))
res.logist <- glm(formula, data = icu, family = binomial)
summary(res.logist)
})
Are you sure that the strings in the vector you lapply over are in the column names of the icu dataset?
It works for me when I download the icu data:
system("wget http://course1.winona.edu/bdeppa/Biostatistics/Data%20Sets/ICU.TXT")
icu <- read.table('ICU.TXT', header=TRUE)
and change status to STA which is a column in icu. Here an example for some of your variables:
my.list <- lapply(c("Age","Sex","Race","Ser","Can"),
function(var) {
formula <- as.formula(paste("STA ~", var))
res.logist <- glm(formula, data = icu, family = binomial)
summary(res.logist)
})
This gives me a list with summary.glm objects. Example:
lapply(my.list, coefficients)
[[1]]
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.05851323 0.69608124 -4.393903 1.113337e-05
Age 0.02754261 0.01056416 2.607174 9.129303e-03
[[2]]
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.4271164 0.2273030 -6.2784758 3.419081e-10
Sex 0.1053605 0.3617088 0.2912855 7.708330e-01
[[3]]
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0500583 0.4983146 -2.1072198 0.03509853
Race -0.2913384 0.4108026 -0.7091933 0.47820450
[[4]]
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.9465961 0.2310559 -4.096827 0.0000418852
Ser -0.9469461 0.3681954 -2.571858 0.0101154495
[[5]]
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.386294e+00 0.1863390 -7.439638e+00 1.009615e-13
Can 7.523358e-16 0.5892555 1.276756e-15 1.000000e+00
If you want to do a chi-square test:
my.list <- lapply(c("Age","Sex","Race","Ser","Can"),function(var)chisq.test(icu$STA, icu[,var]))
or a chi-square test for all combinations of variables:
my.list.all <- apply(combn(colnames(icu), 2), 2, function(x)chisq.test(icu[,x[1]], icu[,x[2]]))
Does this work?

Resources