Extracting coefficients from a regression 1 model with 1 predictor - r

I currently have the following regression model:
> print(summary(step1))
Call:
lm(formula = model1, data = newdat1)
Residuals:
Min 1Q Median 3Q Max
-2.53654 -0.02423 -0.02423 -0.02423 1.71962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3962 0.0532 7.446 2.76e-12 ***
i2 0.6281 0.0339 18.528 < 2e-16 ***
I would like just the following returned as a data frame:
Estimate Std. Error t value Pr(>|t|)
i2 0.6281 0.0339 18.528 < 2e-16
I currently have the following code:
> results1<-as.data.frame(summary(step1)$coefficients[-1,drop=FALSE])
Which yields:
> results1
summary(step1)$coefficients[-1, drop = FALSE]
1 6.280769e-01
2 5.320108e-02
3 3.389873e-02
4 7.446350e+00
5 1.852804e+01
6 2.764836e-12
7 2.339089e-45
This is not what I want; however, the same code does work when there is more than one predictor.

It would be nice if you gave a reproducible example. I think you're looking for
cc <- coef(summary(step1))[2,,drop=FALSE]
as.data.frame(cc)
Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly it's unlikely that this basic a part of R will change any time soon, especially as many users probably have used constructions like $coefficients).
Indexing the row by name, i.e.
coef(summary(step1))["i2",,drop=FALSE]
would probably be even better.
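As a minimal reproducible sketch of the named-row approach, assuming the built-in mtcars data and a hypothetical one-predictor model (mpg ~ wt) standing in for the asker's step1:

```r
# Hypothetical stand-in for step1: one predictor, built-in data
fit <- lm(mpg ~ wt, data = mtcars)

# coef(summary(fit)) is a 2 x 4 matrix; index the row by name and keep
# the matrix shape with drop = FALSE so the row and column names survive
row_wt <- coef(summary(fit))["wt", , drop = FALSE]

results1 <- as.data.frame(row_wt)
results1   # one row named "wt", four columns
```

Indexing by name means the result does not depend on where the predictor sits in the coefficient table.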

summary(step1)$coefficients is a matrix. With a single index, as in [-1, drop=FALSE], R treats the matrix as a plain vector and drops only its first element, which is why you get 7 numbers instead of the row you want.
> set.seed(123)
> x <- rnorm(100)
> y <- -1 + 0.2*x + rnorm(100)
> step1 <- lm(y ~ x)
> class(summary(step1)$coefficients)
[1] "matrix"
> class(summary(step1)$coefficients[-1, drop=FALSE])
[1] "numeric"
The solution is to change the subsetting with [ so that you specify you want to keep all columns (see ?`[`):
> summary(step1)$coefficients[-1, , drop=FALSE]
Estimate Std. Error t value Pr(>|t|)
x 0.1475284 0.1068786 1.380336 0.1706238
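The difference between the two forms can be seen on any small matrix. This is a sketch on a made-up 2 x 4 matrix (the names m, v and r are just for illustration), not on the asker's model:

```r
m <- matrix(1:8, nrow = 2,
            dimnames = list(c("(Intercept)", "x"), c("a", "b", "c", "d")))

# One index: the matrix is treated as a plain vector of 8 values,
# so -1 drops only the first value and 7 remain
v <- m[-1, drop = FALSE]
length(v)   # 7
dim(v)      # NULL

# Two indices (row, column): -1 drops the first row, the empty slot
# keeps every column, and drop = FALSE keeps the matrix shape
r <- m[-1, , drop = FALSE]
dim(r)      # 1 4
```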

Related

R - Pass column names as Variable with names contain I()

I'm fitting a polynomial regression and want to test a linear combination of the coefficients, but I run into a problem when I try to do so.
LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
summary(LnModel_1)
It output the values below:
Call:
lm(formula = PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2))
Residuals:
Min 1Q Median 3Q Max
-3.4492 -0.5405 0.1096 0.4226 1.7346
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.66274 0.06444 72.354 < 2e-16 ***
UI_1 0.25665 0.07009 3.662 0.000278 ***
UI_2 0.25569 0.09221 2.773 0.005775 **
I(UI_1^2) -0.15168 0.04490 -3.378 0.000789 ***
I(UI_2^2) -0.08418 0.05162 -1.631 0.103643
UI_1:UI_2 -0.02849 0.05453 -0.522 0.601621
Then I use names(coef()) to extract the coefficient names
names(coef(LnModel_1))
output:
[1] "(Intercept)" "UI_1" "UI_2" "I(UI_1^2)" "I(UI_2^2)" "UI_1:UI_2"
For some reason, when I use glht(), it gives me an error on UI_2^2:
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*I(UI_2^2) =0"))
Output:
Error: multcomp:::expression2coef::walkCode::eval: within ‘UI_2^2’, the term
‘UI_2’ must not denote an effect. Apart from that, the term must evaluate to
a real valued constant
I don't know why it gives me this error message. How do I pass the I(UI_2^2) coefficient to glht()?
Thank you very much
The issue seems to be that I(UI_2^2) inside the string is parsed as an R expression, in the same way as in your formula, LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2)).
Therefore, you should tell R to treat it as a single coefficient name by wrapping it in backticks inside your string:
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*\`I(UI_2^2)\` =0"))
Check my example (since I cannot reproduce your problem):
library(multcomp)
library(data.table) # provides copy() and setnames()
cars <- copy(mtcars)
setnames(cars, "disp", "UI_2")
model <- lm(mpg~I(UI_2^2),cars)
names(coef(model))
slope <- glht(model, linfct = c("2*2.5*\`I(UI_2^2)\` =0") )
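If the symbolic string keeps causing trouble, the same linear combination can be written as a one-row coefficient matrix, which avoids parsing names like I(UI_2^2) altogether. The sketch below uses base R only and a made-up data set (mtcars columns renamed to the asker's variable names, which is an assumption for illustration). It computes the estimate and a t-test by hand; glht() is documented to accept a matrix for linfct as well, so glht(fit, linfct = K) should test the same combination, but that call is not run here.

```r
# Hypothetical stand-in data: mtcars columns renamed to the asker's names
cars <- data.frame(PROF = mtcars$mpg, UI_1 = mtcars$wt, UI_2 = mtcars$drat)
fit <- lm(PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2), data = cars)

# One-row contrast matrix whose columns are named after the coefficients
K <- matrix(0, nrow = 1, ncol = length(coef(fit)),
            dimnames = list("slope at 2.5", names(coef(fit))))
K[1, "UI_2"]      <- 1
K[1, "UI_1:UI_2"] <- 2.5
K[1, "I(UI_2^2)"] <- 2 * 2.5

# Estimate, standard error and two-sided t-test of K %*% beta = 0
est  <- as.numeric(K %*% coef(fit))
se   <- sqrt(as.numeric(K %*% vcov(fit) %*% t(K)))
tval <- est / se
pval <- 2 * pt(abs(tval), df = df.residual(fit), lower.tail = FALSE)
c(estimate = est, std.error = se, t = tval, p = pval)
```

Filling K by coefficient name rather than by position keeps the contrast correct even though lm() reorders the terms (the interaction comes last).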

extract summary from matrix lm object

I'd like to get the coefficients from the summary section of an lm object, except I inputted a matrix and I am getting null for the summary part. Here is my code:
n=12
y=rnorm(n,23,1)
x1=rnorm(n,23,1)
x2=rnorm(n,15.5,1)
lm1=lm(y~x1+x2)
n2=10
b0=4;b1=2;b2=3
sim1<-function(){
randmat=matrix(rnorm(n*n2,0,8),n,n2)
x1mat=matrix(x1,n,n2)
x2mat=matrix(x2,n,n2)
return(b0+b1*x1mat+b2*x2mat+randmat)
}
sim1=sim1()
lm1=lm(sim1~x1+x2)
c2=summary(lm1)$coefficients
> c2
NULL
what I want is this (but repeated):
lm2=lm(sim1[,1]~x1+x2)
summary(lm2)$coefficients
Does anyone know how to extract these? Thanks
-Rik
Another way is to work with the summary of the multi-response fit created by this line of your code:
lm1=lm(sim1~x1+x2) #this runs 10 models
All the coefficients will be stored in the list summary(lm1) as Response Y1 ... to Response Y10 (i.e. 10 models as many as ncol(sim1)).
In order to get the coefficients from each model back you could do:
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) summary(lm1)[[x]]$coefficients)
Or, as @Rik mentions in the comments, it is faster not to repeat summary(lm1) inside the lapply loop if you have a big matrix:
the_sum <- summary(lm1)
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) the_sum[[x]]$coefficients)
And the output would be:
> all_coef
[[1]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 135.242552 80.136427 1.687654 0.1257496
x1 -4.777486 2.953534 -1.617549 0.1402142
x2 4.464435 3.891641 1.147186 0.2808857
[[2]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.1772823 111.603046 1.06786765 0.3133851
x1 -0.1376013 4.113277 -0.03345297 0.9740435
x2 -1.2946027 5.419744 -0.23886785 0.8165585
[[3]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) -51.329923 63.495202 -0.8084063 0.4397018
x1 3.721227 2.340199 1.5901325 0.1462682
x2 3.793981 3.083498 1.2304147 0.2497304
[[4]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 124.8606014 57.669842 2.16509352 0.05857967
x1 -1.2517705 2.125498 -0.58893044 0.57039201
x2 -0.1159803 2.800603 -0.04141263 0.96787111
#...and so on until 10
To get the individual coefficients for a model just do:
all_coef[[<the_number_you_want>]]
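The same idea also works without building the 'Response Y...' names by hand, by iterating over the summary list directly. A minimal sketch, assuming simulated data shaped like the question's (the names Y, sums and se_mat are made up here):

```r
set.seed(1)
n  <- 12
x1 <- rnorm(n, 23, 1)
x2 <- rnorm(n, 15.5, 1)
# 10 simulated response columns, similar to the question's sim1
Y  <- 4 + 2 * x1 + 3 * x2 + matrix(rnorm(n * 10, 0, 8), n, 10)

fit  <- lm(Y ~ x1 + x2)
sums <- summary(fit)   # a list with one summary.lm per response

# Coefficient table for every response, keeping the list names
all_coef <- lapply(sums, function(s) s$coefficients)
all_coef[[1]]

# Or pull a single column across all models into one matrix
se_mat <- sapply(sums, function(s) s$coefficients[, "Std. Error"])
dim(se_mat)   # 3 coefficients by 10 responses
```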

extracting linear model coefficients into a vector within a loop

I am trying to create a sample of 200 linear model coefficients using a loop in R. As an end result, I want a vector containing the coefficients.
for (i in 1:200) {
smpl_5 <- population[sample(1:1000, 5), ]
model_5 <- summary(lm(y~x, data=smpl_5))
}
I can extract the coefficients easily enough, but I am having trouble collecting them into a vector within the loop. Any suggestions?
You can use replicate for this if you like. In your case, because the number of coefficients is identical for all models, it'll return an array as shown in the example below:
d <- data.frame(x=runif(1000))
d$y <- d$x * 0.123 + rnorm(1000, 0, 0.01)
coefs <- replicate(3, {
xy <- d[sample(nrow(d), 100), ]
coef(summary(lm(y~x, data=xy)))
})
coefs
# , , 1
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.001361961 0.002091297 0.6512516 5.164083e-01
# x 0.121142447 0.003624717 33.4212114 2.235307e-55
#
# , , 2
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.003213314 0.001967050 1.63357 1.055579e-01
# x 0.118026828 0.003332906 35.41259 1.182027e-57
#
# , , 3
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.003366678 0.001990226 1.691606 9.389883e-02
# x 0.119408470 0.003370190 35.430783 1.128070e-57
Access particular elements with normal array indexing, e.g.:
coefs[, , 1] # return the coefs for the first model
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.001361961 0.002091297 0.6512516 5.164083e-01
# x 0.121142447 0.003624717 33.4212114 2.235307e-55
So, for your problem, you could use:
replicate(200, {
smpl_5 <- population[sample(1:1000, 5), ]
coef(summary(lm(y~x, data=smpl_5)))
})
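Since the question asks for a vector, here is a sketch of two ways to get one. The population data frame is made up for illustration (the asker's data is not shown), and the slope is assumed to be the coefficient of interest:

```r
set.seed(42)
# Hypothetical population of 1000 rows, standing in for the asker's data
population <- data.frame(x = runif(1000))
population$y <- 0.123 * population$x + rnorm(1000, 0, 0.01)

# Each replicate returns a single number, so the result is a numeric vector
slopes <- replicate(200, {
  smpl_5 <- population[sample(nrow(population), 5), ]
  coef(lm(y ~ x, data = smpl_5))[["x"]]
})
length(slopes)   # 200

# Starting from the 3-D array instead: index by coefficient, column, replicate
coefs <- replicate(200, {
  smpl_5 <- population[sample(nrow(population), 5), ]
  coef(summary(lm(y ~ x, data = smpl_5)))
})
slopes2 <- coefs["x", "Estimate", ]
```

The two vectors differ in values because each call draws new samples; they only illustrate the two extraction routes.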

Obtain standard errors of regression coefficients for an "mlm" object returned by `lm()`

I'd like to run 10 regressions against the same regressor, then pull all the standard errors without using a loop.
depVars <- as.matrix(data[,1:10]) # multiple dependent variables
regressor <- as.matrix(data[,11]) # independent variable
allModels <- lm(depVars ~ regressor) # multiple, single variable regressions
summary(allModels)[1] # Can "view" the standard error for 1st regression, but can't extract...
allModels is stored as an "mlm" object, which is really tough to work with. It'd be great if I could store a list of lm objects or a matrix with statistics of interest.
Again, the objective is to NOT use a loop. Here is a loop equivalent:
regressor <- as.matrix(data[,11]) # independent variable
for(i in 1:10) {
tempObject <- lm(data[,i] ~ regressor) # single regressions
table1Data[i,1] <- summary(tempObject)$coefficients[2,2] # assign std error
rm(tempObject)
}
If you put your data in long format it's very easy to get a bunch of regression results using lmList from the nlme or lme4 packages. The output is a list of regression results and the summary can give you a matrix of coefficients, just like you wanted.
library(lme4)
m <- lmList( y ~ x | group, data = dat)
summary(m)$coefficients
Those coefficients are in a simple 3 dimensional array so the standard errors are at [,2,2].
Given an "mlm" model object model, you can use the below function written by me to get standard errors of coefficients. This is very efficient: no loop, and no access to summary.mlm().
std_mlm <- function (model) {
Rinv <- with(model$qr, backsolve(qr, diag(rank)))
## unscaled standard error
std_unscaled <- sqrt(rowSums(Rinv ^ 2)[order(model$qr$pivot)])
## residual standard error
sigma <- sqrt(colSums(model$residuals ^ 2) / model$df.residual)
## return final standard error
## each column corresponds to a model
"dimnames<-"(outer(std_unscaled, sigma), list = dimnames(model$coefficients))
}
A simple, reproducible example
set.seed(0)
Y <- matrix(rnorm(50 * 5), 50) ## assume there are 5 responses
X <- rnorm(50) ## covariate
fit <- lm(Y ~ X)
We all know that it is simple to extract estimated coefficients via:
fit$coefficients ## or `coef(fit)`
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) -0.21013925 0.1162145 0.04470235 0.08785647 0.02146662
#X 0.04110489 -0.1954611 -0.07979964 -0.02325163 -0.17854525
Now let's apply our std_mlm:
std_mlm(fit)
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) 0.1297150 0.1400600 0.1558927 0.1456127 0.1186233
#X 0.1259283 0.1359712 0.1513418 0.1413618 0.1151603
We can, of course, call summary.mlm to check that our result is correct:
coef(summary(fit))
#Response Y1 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.21013925 0.1297150 -1.6200072 0.1117830
#X 0.04110489 0.1259283 0.3264151 0.7455293
#
#Response Y2 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.1162145 0.1400600 0.8297485 0.4107887
#X -0.1954611 0.1359712 -1.4375183 0.1570583
#
#Response Y3 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.04470235 0.1558927 0.2867508 0.7755373
#X -0.07979964 0.1513418 -0.5272811 0.6004272
#
#Response Y4 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.08785647 0.1456127 0.6033574 0.5491116
#X -0.02325163 0.1413618 -0.1644831 0.8700415
#
#Response Y5 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02146662 0.1186233 0.1809646 0.8571573
#X -0.17854525 0.1151603 -1.5504057 0.1276132
Yes, all correct!
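For readers who want to see where these numbers come from, the standard errors can also be rebuilt from the textbook formula: the square root of the diagonal of (M'M)^-1, scaled by each response's residual standard error. This is a slower sketch for checking only, not a replacement for std_mlm; M, se_direct and se_summary are names made up here.

```r
set.seed(0)
Y <- matrix(rnorm(50 * 5), 50)   # 5 responses, as in the example
X <- rnorm(50)                   # covariate
fit <- lm(Y ~ X)

# Unscaled part: sqrt of the diagonal of (M'M)^-1, with M the design matrix
M <- model.matrix(fit)
std_unscaled <- sqrt(diag(solve(crossprod(M))))

# One residual standard error per response column
sigma <- sqrt(colSums(residuals(fit)^2) / df.residual(fit))

# Rows are coefficients, columns are responses
se_direct <- outer(std_unscaled, sigma)

# Reference values taken from the per-response summaries
se_summary <- sapply(summary(fit), function(s) s$coefficients[, "Std. Error"])
all.equal(unname(se_direct), unname(se_summary))
```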
Here is an option:
Put your data in long format, using the regressor as the id key.
Then run the regression of value on the regressor within each group of variable.
For example, using the mtcars data set:
library(reshape2)
dat.m <- melt(mtcars,id.vars='mpg') ## mpg is my regressor
library(plyr)
ddply(dat.m, .(variable), function(x) coef(lm(value ~ mpg, data = x)))
## returns one row per variable, with columns variable, (Intercept) and mpg

Extract columns from list of coeftest objects

Is there a function that can extract two or more columns from a coeftest object? This is easy one coeftest object at a time, but can I do the same to a list (other than a for() loop)?
> # meaningless data
> temp <- data.frame(a = rnorm(100, mean = 5), b = rnorm(100, mean = 1),
+ c = 1:100)
> formulas <- list(a ~ b, a ~ c)
> models <- lapply(formulas, lm, data = temp)
> library(lmtest)
> cts <- lapply(models, coeftest)
> # easy to extract columns one object at a time
> cts[[1]][, 1:2]
Estimate Std. Error
(Intercept) 5.0314196 0.1333705
b -0.1039264 0.0987044
> # but more difficult algorithmically
> # either one column
> lapply(cts, "[[", 1)
[[1]]
[1] 5.03142
[[2]]
[1] 5.312007
> # or two
> lapply(cts, "[[", 1:2)
Error in FUN(X[[1L]], ...) : attempt to select more than one element
Maybe the more fundamental question is if there is a way to turn the meat of the coeftest object into a data frame, which would allow me to extract columns singly, then use mapply(). Thanks!
Edit: I would like to end up with matrices (or data frames) containing the first and second columns:
[[1]]
Estimate Std. Error
(Intercept) 5.0314196 0.1333705
b -0.1039264 0.0987044
[[2]]
Estimate Std. Error
(Intercept) 5.312007153 0.199485363
c -0.007378529 0.003429477
[[ is the wrong subset function in this case. Note that when you lapply() over a list, what you are operating on are the components of the list, the bits you would get with list[[i]] where i is the ith component.
As such, you only need the [, 1:2] bit of cts[[1]][, 1:2] in the lapply() call. It is a little bit trickier because of the arguments for [, but easily doable with lapply():
> lapply(cts, `[`, , 1:2)
[[1]]
Estimate Std. Error
(Intercept) 4.926679544 0.1549482
b -0.001967657 0.1062437
[[2]]
Estimate Std. Error
(Intercept) 4.849041327 0.204342067
c 0.001494454 0.003512972
Note the <space>, before 1:2; this is the equivalent of [ , 1:2].
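A self-contained sketch of the same idiom, using plain coefficient matrices from coef(summary()) in place of coeftest objects so that no extra package is needed (that substitution is an assumption made for illustration; the indexing is the same for any matrix-like object):

```r
set.seed(1)
temp <- data.frame(a = rnorm(100, mean = 5), b = rnorm(100, mean = 1), c = 1:100)
models <- lapply(list(a ~ b, a ~ c), lm, data = temp)

# Plain coefficient matrices stand in for the coeftest objects here
tabs <- lapply(models, function(m) coef(summary(m)))

# Empty-argument form: the blank slot is the row index
two_cols <- lapply(tabs, `[`, , 1:2)

# Same result with the indices spelled out in an anonymous function
two_cols2 <- lapply(tabs, function(m) m[, 1:2, drop = FALSE])

# Converting to data frames allows column access by name
dfs <- lapply(two_cols, as.data.frame)
dfs[[1]]$Estimate
```

The anonymous-function form is longer but may be easier to read than the empty argument.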
I'm not sure if this is what you want, but how about:
> do.call("rbind", cts)[, 1:2]
Estimate Std. Error
(Intercept) 4.8200993881 0.142381642
b -0.0421189130 0.092620363
(Intercept) 4.7459340076 0.206372906
c 0.0005770324 0.003547885
