Is there a standard way to estimate a confidence interval for the variance parameter of a fixed-effects linear model? E.g. given:
reg=lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
how can I get a confidence interval for the variance parameter? confint only reports the fixed effects, and lmer from lme4 does not accept a model without at least one random effect, which is my case here.
Unfortunately, you have to implement it yourself, like so:
reg <- lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
alpha <- 0.05
n <- length(resid(reg))
sigma <- summary(reg)$sigma
sigma*n/qchisq(1-alpha/2, df = n-2) ; sigma*n/qchisq(alpha/2, df = n-2)
[1] 0.4600539
[1] 1.287194
It comes from the relation n·σ̂²/σ² ~ χ²(n−2): inverting the two chi-squared quantiles gives the bounds of the interval.
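Note that summary(reg)$sigma is the residual standard error, not the variance, and this model estimates five coefficients, so the residual degrees of freedom are 27 (as the summary below confirms), available as reg$df.residual. A minimal sketch of the more conventional construction, which squares sigma and uses those degrees of freedom:

# Sketch of the usual chi-squared interval for the error variance sigma^2,
# based on (n - p) * s^2 / sigma^2 ~ chisq(n - p)
s2 <- summary(reg)$sigma^2   # squared residual standard error
df <- reg$df.residual        # n - p; 32 - 5 = 27 here
c(lower = df * s2 / qchisq(1 - alpha/2, df = df),
  upper = df * s2 / qchisq(alpha/2, df = df))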
I assume you are looking for the summary() function.
Applied to your model:
data(mtcars)
reg<-lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
summary(reg)
# Call:
# lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
#
# Residuals:
# Min 1Q Median 3Q Max
# -1.6923 -0.3901 0.0579 0.3649 1.2608
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.740648 0.738594 1.003 0.32487
# disp 0.002703 0.002715 0.996 0.32832
# hp 0.005275 0.003253 1.621 0.11657
# wt 1.001303 0.302761 3.307 0.00267 **
# am 0.155815 0.375515 0.415 0.68147
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.6754 on 27 degrees of freedom
# Multiple R-squared: 0.8527, Adjusted R-squared: 0.8309
# F-statistic: 39.08 on 4 and 27 DF, p-value: 7.369e-11
To extract parts of it, store the summary in a variable and select the coefficients:
summa<-summary(reg)
summa$coefficients
From that matrix you can pick the standard error of the coefficient you want and build a confidence interval at the level of interest (estimate ± t-quantile × standard error).
R does it automatically using confint(object, parm, level).
In your case, confint(reg, level = 0.95)
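As a quick check, the same intervals can be computed by hand from the stored summary; a minimal sketch:

# Manual 95% CI for each coefficient: estimate +/- t-quantile * std. error;
# this should match confint(reg, level = 0.95)
est <- summa$coefficients
tq  <- qt(0.975, df = reg$df.residual)
cbind(lower = est[, "Estimate"] - tq * est[, "Std. Error"],
      upper = est[, "Estimate"] + tq * est[, "Std. Error"])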
Related
How do I change predictors in linear regression in loop in R?
Below is an example along with the error. Can someone please fix it?
# sample data
mpg <- mpg
str(mpg)
# array of predictors
predictors <- c("hwy", "cty")
# loop over predictors
for (predictor in predictors)
{
# fit linear regression
model <- lm(formula = predictor ~ displ + cyl,
data = mpg)
# summary of model
summary(model)
}
Error
Error in model.frame.default(formula = predictor ~ displ + cyl, data = mpg, :
variable lengths differ (found for 'displ')
We may use paste or reformulate. Also, as it is a for loop, create an object to store the output from summary:
sumry_model <- vector('list', length(predictors))
names(sumry_model) <- predictors
for (predictor in predictors) {
# fit linear regression
model <- lm(reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
# with paste
# model <- lm(formula = paste0(predictor, "~ displ + cyl"), data = mpg)
# summary of model
sumry_model[[predictor]] <- summary(model)
}
Output:
> sumry_model
$hwy
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-7.5098 -2.1953 -0.2049 1.9023 14.9223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.2162 1.0481 36.461 < 2e-16 ***
displ -1.9599 0.5194 -3.773 0.000205 ***
cyl -1.3537 0.4164 -3.251 0.001323 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.759 on 231 degrees of freedom
Multiple R-squared: 0.6049, Adjusted R-squared: 0.6014
F-statistic: 176.8 on 2 and 231 DF, p-value: < 2.2e-16
$cty
Call:
lm(formula = reformulate(c("displ", "cyl"), response = predictor),
data = mpg)
Residuals:
Min 1Q Median 3Q Max
-5.9276 -1.4750 -0.0891 1.0686 13.9261
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.2885 0.6876 41.139 < 2e-16 ***
displ -1.1979 0.3408 -3.515 0.000529 ***
cyl -1.2347 0.2732 -4.519 9.91e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.466 on 231 degrees of freedom
Multiple R-squared: 0.6671, Adjusted R-squared: 0.6642
F-statistic: 231.4 on 2 and 231 DF, p-value: < 2.2e-16
This may also be done with a multivariate response:
summary(lm(cbind(hwy, cty) ~ displ + cyl, data = mpg))
Or, if we want to use the predictors vector:
summary(lm(as.matrix(mpg[predictors]) ~ displ + cyl, data = mpg))
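If you take the multivariate route, coef() on the resulting "mlm" fit returns a matrix with one column per response, which is a compact way to compare the two models side by side (a small sketch, assuming the mpg data from ggplot2 is loaded as in the question):

# One fitted model, one coefficient column per response variable
mfit <- lm(cbind(hwy, cty) ~ displ + cyl, data = mpg)
coef(mfit)   # rows: (Intercept), displ, cyl; columns: hwy, cty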
As a reproducible example, let's use the following nonsense model:
> library(glmmTMB)
> summary(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars))
Family: gaussian ( identity )
Formula: am ~ disp + hp + (1 | carb)
Data: mtcars
AIC BIC logLik deviance df.resid
34.1 41.5 -12.1 24.1 27
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
carb (Intercept) 2.011e-11 4.485e-06
Residual 1.244e-01 3.528e-01
Number of obs: 32, groups: carb, 6
Dispersion estimate for gaussian family (sigma^2): 0.124
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.7559286 0.1502385 5.032 4.87e-07 ***
disp -0.0042892 0.0008355 -5.134 2.84e-07 ***
hp 0.0043626 0.0015103 2.889 0.00387 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Actually, my real model's family is nbinom2. I want to run a contrast test between disp and hp, so I try:
> glht(glmmTMB(am ~ disp + hp + (1|carb), data = mtcars), linfct = matrix(c(0,1,-1)))
Error in glht.matrix(glmmTMB(am ~ disp + hp + (1 | carb), data = mtcars), :
‘ncol(linfct)’ is not equal to ‘length(coef(model))’
How can I avoid this error?
Thank you!
The problem is actually fairly simple: linfct needs to be a matrix with the number of columns equal to the number of parameters. You specified matrix(c(0,1,-1)) without specifying numbers of rows or columns, so R made a column matrix by default. Adding nrow=1 seems to work.
library(glmmTMB)
library(multcomp)
m1<- glmmTMB(am ~ disp + hp + (1|carb), data = mtcars)
# Teach multcomp how to pull the fixed-effect coefficients and their
# covariance matrix out of a glmmTMB fit (conditional component by default)
modelparm.glmmTMB <- function(model,
                              coef. = function(x) fixef(x)[[component]],
                              vcov. = function(x) vcov(x)[[component]],
                              df = NULL, component = "cond", ...) {
  multcomp:::modelparm.default(model, coef. = coef., vcov. = vcov.,
                               df = df, ...)
}
glht(m1, linfct = matrix(c(0,1,-1),nrow=1))
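Wrapping the result in summary() then prints the estimate, z statistic, and p-value for the contrast; a minimal sketch:

# Contrast: coefficient of disp minus coefficient of hp equals zero
cntr <- glht(m1, linfct = matrix(c(0, 1, -1), nrow = 1))
summary(cntr)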
Background: my data set has 52 rows and 12 columns (assume column names are A - L) and the name of my data set is foo
I am told to run a regression where foo$L is the dependent variable, and all other variables are independent except for foo$K.
The way I was doing it is
fit <- lm(foo$L ~ foo$A + ... + foo$J)
then calling
summary(fit)
Is my way a good way to run the regression and find the intercept and coefficients?
Use the data argument to lm so you don't have to use the foo$ syntax for each predictor. Use dependent ~ . as the formula to have the dependent variable predicted by all other variables. Then you can use - K to exclude K:
data_mat = matrix(rnorm(52 * 12), nrow = 52)
df = as.data.frame(data_mat)
colnames(df) = LETTERS[1:12]
lm(L ~ . - K, data = df)
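To answer the second half of the question (finding the intercept and coefficients), you can extract them from the stored fit; a short sketch:

fit <- lm(L ~ . - K, data = df)
coef(fit)                   # named vector: intercept plus one slope per predictor
summary(fit)$coefficients   # estimates with std. errors, t and p values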
You can first remove the column K and then do fit <- lm(L ~ ., data = foo). This treats the L column as the dependent variable and all the other columns as independent variables, so you don't have to spell out each column name in the formula.
Here is an example using mtcars, fitting a multiple regression of mpg on all the other variables except carb.
mtcars2 <- mtcars[, !names(mtcars) %in% "carb"]
fit <- lm(mpg ~ ., data = mtcars2)
summary(fit)
# Call:
# lm(formula = mpg ~ ., data = mtcars2)
#
# Residuals:
# Min 1Q Median 3Q Max
# -3.3038 -1.6964 -0.1796 1.1802 4.7245
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 12.83084 18.18671 0.706 0.48790
# cyl -0.16881 0.99544 -0.170 0.86689
# disp 0.01623 0.01290 1.259 0.22137
# hp -0.02424 0.01811 -1.339 0.19428
# drat 0.70590 1.56553 0.451 0.65647
# wt -4.03214 1.33252 -3.026 0.00621 **
# qsec 0.86829 0.68874 1.261 0.22063
# vs 0.36470 2.05009 0.178 0.86043
# am 2.55093 2.00826 1.270 0.21728
# gear 0.50294 1.32287 0.380 0.70745
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 2.593 on 22 degrees of freedom
# Multiple R-squared: 0.8687, Adjusted R-squared: 0.8149
# F-statistic: 16.17 on 9 and 22 DF, p-value: 9.244e-08
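As a side note, if you later want to drop one more term without retyping the whole formula, update() can refit the stored model; a small sketch:

# Refit the same model with gear removed from the right-hand side
fit2 <- update(fit, . ~ . - gear)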
I often have to write long formulas with control variables that do not change.
For instance, hp is my variable of interest (x) that changes between models, and vs + am + gear + carb are my controls:
lm(disp ~ hp + vs + am + gear + carb, mtcars)
Then my x is drat, and then wt, but my controls stay the same.
lm(disp ~ drat + vs + am + gear + carb, mtcars)
lm(disp ~ wt + vs + am + gear + carb, mtcars)
I would find it quite useful sometimes to be able to reduce the equations to something like
y = 'disp'
x = 'hp'
controls = 'vs + am + gear + carb'
lm(y ~ x + controls, mtcars)
Any idea how I could achieve that?
The code below constructs a string formula (with a small edit to #ZheyuanLi's comment) to feed to lm and also uses the map function from purrr (a tidyverse package) to create a separate model for each variable in the x vector. Each element of the list models contains the model object and the name of the element is the value of x that was used in the model formula.
library(tidyverse)
y = 'disp'
x = c('hp','wt')
controls=c("vs","am","gear","carb")
models = map(setNames(x,x),
~ lm(paste(y, paste(c(.x, controls), collapse="+"), sep="~"),
data=mtcars))
map(models, summary)
$hp
Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
sep = "~"), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-85.524 -19.153 1.109 14.957 115.804
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.9238 73.2477 3.576 0.0014 **
hp 1.2021 0.2453 4.900 4.38e-05 ***
vs -63.7135 26.5957 -2.396 0.0241 *
am -56.0468 30.7338 -1.824 0.0797 .
gear -31.6231 23.4816 -1.347 0.1897
carb -14.3237 10.1169 -1.416 0.1687
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 47.97 on 26 degrees of freedom
Multiple R-squared: 0.8743, Adjusted R-squared: 0.8502
F-statistic: 36.18 on 5 and 26 DF, p-value: 6.547e-11
$wt
Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
sep = "~"), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-74.153 -36.993 -2.097 30.616 102.331
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.875 108.220 0.267 0.79172
wt 88.577 18.810 4.709 7.25e-05 ***
vs -92.669 25.186 -3.679 0.00107 **
am -3.734 34.662 -0.108 0.91503
gear -4.688 25.271 -0.186 0.85427
carb -8.455 9.662 -0.875 0.38955
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 48.88 on 26 degrees of freedom
Multiple R-squared: 0.8695, Adjusted R-squared: 0.8445
F-statistic: 34.66 on 5 and 26 DF, p-value: 1.056e-10
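A base-R variant of the same idea, assuming the y, x, and controls vectors defined above: reformulate() assembles the formula from its parts without string pasting.

# One model per variable of interest, formula built from its parts
models2 <- lapply(setNames(x, x), function(v)
  lm(reformulate(c(v, controls), response = y), data = mtcars))
lapply(models2, summary)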
I want to use partial least squares regression to find the most representative variables for predicting my data.
Here is my code:
library(pls)
potion<-read.table("potion-insomnie.txt",header=T)
potionTrain <- potion[1:182,]
potionTest <- potion[183:192,]
potion1 <- plsr(Sommeil ~ Aubepine + Bave + Poudre + Pavot, data = potionTrain, validation = "LOO")
Calling summary(lm(potion1)) gives me this:
Call:
lm(formula = potion1)
Residuals:
Min 1Q Median 3Q Max
-14.9475 -5.3961 0.0056 5.2321 20.5847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.63931 1.67955 22.410 < 2e-16 ***
Aubepine -0.28226 0.05195 -5.434 1.81e-07 ***
Bave -1.79894 0.26849 -6.700 2.68e-10 ***
Poudre 0.35420 0.72849 0.486 0.627
Pavot -0.47678 0.52027 -0.916 0.361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.845 on 177 degrees of freedom
Multiple R-squared: 0.293, Adjusted R-squared: 0.277
F-statistic: 18.34 on 4 and 177 DF, p-value: 1.271e-12
I deduced that only the variables Aubepine and Bave are representative, so I refit the model with just these two variables:
potion1 <- plsr(Sommeil ~ Aubepine + Bave, data = potionTrain, validation = "LOO")
And I plot:
plot(potion1, ncomp = 2, asp = 1, line = TRUE)
Here is the plot of predicted vs measured values:
The problem is that I can see the regression line on the plot, but I cannot recover its equation or R². Is that possible?
Also, is the first part the same as a multiple linear regression (ANOVA)?
pacman::p_load(pls)
data(mtcars)
potion <- mtcars
potionTrain <- potion[1:28,]
potionTest <- potion[29:32,]
potion1 <- plsr(mpg ~ cyl + disp + hp + drat, data = potionTrain, validation = "LOO")
coef(potion1) # coefficients
scores(potion1) # scores
## R^2:
R2(potion1, estimate = "train")
## cross-validated R^2:
R2(potion1)
## Both:
R2(potion1, estimate = "all")
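To recover the equation and R² of the predicted-vs-measured line the question asks about, one option (a sketch, assuming two components as in the plot; swap the two variables if your axes differ) is to regress the fitted values on the observed ones:

# Regress predicted on measured values to get an intercept, slope and R^2
pred <- drop(predict(potion1, ncomp = 2))   # predictions using 2 components
line_fit <- lm(pred ~ potionTrain$mpg)
coef(line_fit)                # intercept and slope of the fitted line
summary(line_fit)$r.squared   # R^2 of predicted vs measured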