extract summary from matrix lm object - r

I'd like to get the coefficients from the summary of an lm object, except that I passed a matrix as the response and I'm getting NULL when I try to pull the coefficients. Here is my code:
n=12
y=rnorm(n,23,1)
x1=rnorm(n,23,1)
x2=rnorm(n,15.5,1)
lm1=lm(y~x1+x2)
n2=10
b0=4;b1=2;b2=3
sim1<-function(){
  randmat=matrix(rnorm(n*n2,0,8),n,n2)
  x1mat=matrix(x1,n,n2)
  x2mat=matrix(x2,n,n2)
  return(b0+b1*x1mat+b2*x2mat+randmat)
}
sim1=sim1()
lm1=lm(sim1~x1+x2)
c2=summary(lm1)$coefficients
> c2
NULL
what I want is this (but repeated):
lm2=lm(sim1[,1]~x1+x2)
summary(lm2)$coefficients
Does anyone know how to extract these? Thanks
-Rik
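A minimal sketch of that repetition, fitting each column of sim1 separately (the name coef_list is only illustrative):
coef_list <- lapply(seq_len(ncol(sim1)), function(j)
  summary(lm(sim1[, j] ~ x1 + x2))$coefficients)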

Another way is to do the following after this line of your code:
lm1=lm(sim1~x1+x2) #this runs 10 models
All the coefficients will be stored in the list summary(lm1) under the names Response Y1 ... Response Y10 (i.e. 10 models, as many as ncol(sim1)).
In order to get the coefficients from each model back you could do:
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) summary(lm1)[[x]]$coefficients)
Or, as @Rik mentions in the comments, it will be faster not to repeat summary(lm1) inside the lapply loop if you have a big matrix:
the_sum <- summary(lm1)
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) the_sum[[x]]$coefficients)
And the output would be:
> all_coef
[[1]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 135.242552 80.136427 1.687654 0.1257496
x1 -4.777486 2.953534 -1.617549 0.1402142
x2 4.464435 3.891641 1.147186 0.2808857
[[2]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.1772823 111.603046 1.06786765 0.3133851
x1 -0.1376013 4.113277 -0.03345297 0.9740435
x2 -1.2946027 5.419744 -0.23886785 0.8165585
[[3]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) -51.329923 63.495202 -0.8084063 0.4397018
x1 3.721227 2.340199 1.5901325 0.1462682
x2 3.793981 3.083498 1.2304147 0.2497304
[[4]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 124.8606014 57.669842 2.16509352 0.05857967
x1 -1.2517705 2.125498 -0.58893044 0.57039201
x2 -0.1159803 2.800603 -0.04141263 0.96787111
#...and so on until 10
To get the individual coefficients for a model just do:
all_coef[[<the_number_you_want>]]
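If you also want the list labelled by response, a small follow-up sketch:
names(all_coef) <- paste0("Y", seq_len(ncol(sim1)))  # label each element by its response
all_coef[["Y3"]]                                     # same as all_coef[[3]]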

Related

Updating model factor levels within function not working properly

I'm running some GLM models in R on some data related to feeding trials I am doing. I'm regressing my variables of interest on two predictors: one factor with three levels and one continuous variable. I want to compare the intercepts for each level of the factor to one another to determine if they're different. To do this, I wrote a function (called interceptCompare in the reproducible code below) which relevels the factor, updates the model, and then saves the results of each model. It's my quick way of doing all the pair-wise comparisons of the intercepts.
The problem is that when I run the function, it doesn't appear to properly update the model. Each item of the returned list is the same, when they should change so that each item has a different level of the factor as the "(Intercept)" against which the other levels are compared. I suspect it has something to do with the environment of the function, but I'm not sure. I haven't been able to find a similar example on Stack Overflow or Google.
Here's what should be a reproducible example:
food <- as.factor(rep(c("a", "b", "c"), each = 20))
variable <- rbinom(60, 1, 0.7)
movement <- rgamma(60, 10, 2)
binomial.model <- glm(variable ~ food,
                      family = "binomial")
gamma.model <- glm(movement ~ food,
                   family = Gamma)
interceptCompare <- function(model, factor) {
  results <- list() # empty list to store results
  for (i in unique(factor)) {
    factor <- relevel(factor, ref = i)
    model <- update(model)
    results[[i]] <- summary(model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
interceptCompare(binomial.model, food)
interceptCompare(gamma.model, food)
You need to add one line to capture the factor's name, then change the data stored inside the model and use it within update():
interceptCompare <- function(model, factor) {
  results <- list() # empty list to store results
  s <- deparse(substitute(factor)) # ADD THIS LINE
  for (i in unique(factor)) {
    factor <- relevel(factor, ref = i)
    model[["model"]][[s]] <- factor # CHANGE THE DATA IN THE MODEL
    model <- update(model, data = model[["model"]]) # UPDATE THE MODEL
    results[[i]] <- summary(model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
interceptCompare(binomial.model, food)
$a
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3863 0.5590 2.4799 0.0131
foodb -0.7673 0.7296 -1.0516 0.2930
foodc -0.2877 0.7610 -0.3780 0.7054
$b
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6190 0.4688 1.3205 0.1867
fooda 0.7673 0.7296 1.0516 0.2930
foodc 0.4796 0.6975 0.6876 0.4917
$c
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0986 0.5164 2.1275 0.0334
foodb -0.4796 0.6975 -0.6876 0.4917
fooda 0.2877 0.7610 0.3780 0.7054
interceptCompare(gamma.model, food)
$a
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2246 0.0156 14.3919 0.0000
foodb -0.0170 0.0213 -0.8022 0.4257
foodc -0.0057 0.0218 -0.2608 0.7952
$b
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2076 0.0144 14.3919 0.0000
fooda 0.0170 0.0213 0.8022 0.4257
foodc 0.0114 0.0210 0.5421 0.5898
$c
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2189 0.0152 14.3919 0.0000
foodb -0.0114 0.0210 -0.5421 0.5898
fooda 0.0057 0.0218 0.2608 0.7952
You need to be much more careful when trying to swap out symbols from formulas, which is what you are doing here. You need to put it in terms that the R language can understand: you want to pass in the name "food", not the values stored in the "food" vector as you are doing now. Here's an update that seems to do what you were trying to do:
interceptCompare <- function(model, factor) {
  sym <- substitute(factor)
  results <- list() # empty list to store results
  for (i in unique(factor)) {
    change <- eval(bquote(~ . - .(sym) + relevel(.(sym), ref = .(i))))
    new_model <- update(model, change)
    results[[i]] <- summary(new_model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
Here we capture the name "food" with substitute(). Then we use bquote() to build a new formula that removes the original term and adds back a releveled version of the factor with a particular reference level. We save the result to a new object so we don't keep updating the same model. For binomial.model, this returns:
$`a`
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.4879 1.7364 0.0825
relevel(food, ref = "a")b 0.0000 0.6901 0.0000 1.0000
relevel(food, ref = "a")c -0.8473 0.6619 -1.2801 0.2005
$b
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.4879 1.7364 0.0825
relevel(food, ref = "b")a 0.0000 0.6901 0.0000 1.0000
relevel(food, ref = "b")c -0.8473 0.6619 -1.2801 0.2005
$c
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.0000 0.4472 0.0000 1.0000
relevel(food, ref = "c")a 0.8473 0.6619 1.2801 0.2005
relevel(food, ref = "c")b 0.8473 0.6619 1.2801 0.2005
You can see how it changed the ref= at each iteration.
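To see what bquote() builds in isolation, here is a small standalone illustration (the "b" reference level is just an example):
sym <- quote(food)
eval(bquote(~ . - .(sym) + relevel(.(sym), ref = "b")))
# ~. - food + relevel(food, ref = "b")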

Replace underlying standard errors in lm() OLS model in R

I would like to run an OLS model using lm() in R and replace the standard errors in the model. In the following example, I would like to replace each standard error with "2":
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
mod <- lm(y ~x)
ses <- c(2,2)
coef(summary(mod))[,2] <- ses
sqrt(diag(vcov(mod))) <- ses
Any thoughts on how to do this? Thanks.
Those assignments are not going to succeed. coef, sqrt and vcov are not going to pass those values "upstream". You could do this:
> false.summ <- coef(summary(mod))
> false.sqrt.vcov <- sqrt(diag(vcov(mod)))
> false.summ
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 0.09755118 -1.0538371 0.2945488
x -0.05247161 0.10687862 -0.4909459 0.6245623
> false.summ[ , 2] <- ses
> false.sqrt.vcov
(Intercept) x
0.09755118 0.10687862
> false.sqrt.vcov <- ses
You could also modify a summary object, at least its coefficient matrix, but there is no "vcov" element in the summary, even though vcov() does return a value for it.
> summ <- summary(mod)
> summ$coefficients[ , 2] <- ses
> coef(summ)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 2 -1.0538371 0.2945488
x -0.05247161 2 -0.4909459 0.6245623
> summ$vcov
NULL
> vcov(summ)
(Intercept) x
(Intercept) 0.009516233 -0.00103271
x -0.001032710 0.01142304
If you wanted to change the output of vcov() when applied to a summary object, you would need to distort the unscaled covariance matrix. This is the code that vcov uses for that object class:
> getAnywhere(vcov.summary.lm)
A single object matching ‘vcov.summary.lm’ was found
It was found in the following places
registered S3 method for vcov from namespace stats
namespace:stats
with value
function (object, ...)
object$sigma^2 * object$cov.unscaled
<bytecode: 0x7fb63c784068>
<environment: namespace:stats>
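Continuing with the summ object from above, here is a minimal sketch of that distortion (it keeps only the diagonal, so the covariances and dimnames are discarded):
summ$cov.unscaled <- diag(ses^2) / summ$sigma^2  # so that sigma^2 * cov.unscaled has ses^2 on the diagonal
sqrt(diag(vcov(summ)))                           # now reports 2 2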

Extracting coefficients from a regression model with 1 predictor

I currently have the following regression model:
> print(summary(step1))
Call:
lm(formula = model1, data = newdat1)
Residuals:
Min 1Q Median 3Q Max
-2.53654 -0.02423 -0.02423 -0.02423 1.71962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3962 0.0532 7.446 2.76e-12 ***
i2 0.6281 0.0339 18.528 < 2e-16 ***
I would like just the following returned as a data frame:
Estimate Std. Error t value Pr(>|t|)
i2 0.6281 0.0339 18.528 < 2e-16
I currently have the following code:
> results1<-as.data.frame(summary(step1)$coefficients[-1,drop=FALSE])
Which yields:
> results1
summary(step1)$coefficients[-1, drop = FALSE]
1 6.280769e-01
2 5.320108e-02
3 3.389873e-02
4 7.446350e+00
5 1.852804e+01
6 2.764836e-12
7 2.339089e-45
This is not what I want; however, it does work when there's more than 1 predictor.
It would be nice if you gave a reproducible example. I think you're looking for:
cc <- coef(summary(step1))[2,,drop=FALSE]
as.data.frame(cc)
Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly it's unlikely that this basic a part of R will change any time soon, especially as many users probably have used constructions like $coefficients).
Indexing the row by name, i.e.
coef(summary(step1))["i2",,drop=FALSE]
would probably be even better.
summary(step1)$coefficients is a matrix. When you take out the first element with [-1, drop=FALSE] it is converted to a vector, which is why you get 7 numbers instead of the row you want.
> set.seed(123)
> x <- rnorm(100)
> y <- -1 + 0.2*x + rnorm(100)
> step1 <- lm(y ~ x)
> class(summary(step1)$coefficients)
[1] "matrix"
> class(summary(step1)$coefficients[-1, drop=FALSE])
[1] "numeric"
The solution is to change the subsetting with [ so that you specify that you want to keep all columns (see ?`[`):
> summary(step1)$coefficients[-1, , drop=FALSE]
Estimate Std. Error t value Pr(>|t|)
x 0.1475284 0.1068786 1.380336 0.1706238
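If you then want the one-row data frame the question originally asked for, a small sketch combining this with the name-based indexing suggested above:
cc <- coef(summary(step1))["x", , drop = FALSE]  # keep the matrix shape
as.data.frame(cc)                                # one-row data frame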

extracting linear model coefficients into a vector within a loop

I am trying to create a sample of 200 linear model coefficients using a loop in R. As an end result, I want a vector containing the coefficients.
for (i in 1:200) {
  smpl_5 <- population[sample(1:1000, 5), ]
  model_5 <- summary(lm(y~x, data=smpl_5))
}
I can extract the coefficients easily enough, but I am having trouble outputting them into a vector within the loop. Any suggestions?
You can use replicate for this if you like. In your case, because the number of coefficients is identical for all models, it'll return an array as shown in the example below:
d <- data.frame(x=runif(1000))
d$y <- d$x * 0.123 + rnorm(1000, 0, 0.01)
coefs <- replicate(3, {
  xy <- d[sample(nrow(d), 100), ]
  coef(summary(lm(y~x, data=xy)))
})
coefs
# , , 1
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.001361961 0.002091297 0.6512516 5.164083e-01
# x 0.121142447 0.003624717 33.4212114 2.235307e-55
#
# , , 2
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.003213314 0.001967050 1.63357 1.055579e-01
# x 0.118026828 0.003332906 35.41259 1.182027e-57
#
# , , 3
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.003366678 0.001990226 1.691606 9.389883e-02
# x 0.119408470 0.003370190 35.430783 1.128070e-57
Access particular elements with normal array indexing, e.g.:
coefs[, , 1] # return the coefs for the first model
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.001361961 0.002091297 0.6512516 5.164083e-01
# x 0.121142447 0.003624717 33.4212114 2.235307e-55
So, for your problem, you could use:
replicate(200, {
  smpl_5 <- population[sample(1:1000, 5), ]
  coef(summary(lm(y~x, data=smpl_5)))
})
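If you specifically want a plain vector filled inside a loop (say, just the slope estimates), a minimal sketch along the lines of your original code:
slopes <- numeric(200)  # preallocate the result vector
for (i in 1:200) {
  smpl_5 <- population[sample(1:1000, 5), ]
  slopes[i] <- coef(lm(y~x, data=smpl_5))[["x"]]  # keep only the slope from this fit
}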

Obtain standard errors of regression coefficients for an "mlm" object returned by `lm()`

I'd like to run 10 regressions against the same regressor, then pull all the standard errors without using a loop.
depVars <- as.matrix(data[,1:10]) # multiple dependent variables
regressor <- as.matrix(data[,11]) # independent variable
allModels <- lm(depVars ~ regressor) # multiple, single variable regressions
summary(allModels)[1] # Can "view" the standard error for 1st regression, but can't extract...
allModels is stored as an "mlm" object, which is really tough to work with. It'd be great if I could store a list of lm objects or a matrix with statistics of interest.
Again, the objective is to NOT use a loop. Here is a loop equivalent:
regressor <- as.matrix(data[,11]) # independent variable
for(i in 1:10) {
  tempObject <- lm(data[,i] ~ regressor) # single regressions
  table1Data[i,1] <- summary(tempObject)$coefficients[2,2] # assign std error
  rm(tempObject)
}
If you put your data in long format it's very easy to get a bunch of regression results using lmList from the nlme or lme4 packages. The output is a list of regression results and the summary can give you a matrix of coefficients, just like you wanted.
library(lme4)
m <- lmList( y ~ x | group, data = dat)
summary(m)$coefficients
Those coefficients are in a simple 3-dimensional array, so the standard errors are at [, 2, 2].
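For the data in the question, the long format could be built along these lines (a sketch; the column names y, x and group are assumptions, and it relies on depVars having column names):
dat <- data.frame(y = as.vector(depVars),
                  x = rep(regressor, times = ncol(depVars)),
                  group = rep(colnames(depVars), each = nrow(depVars)))
m <- lmList(y ~ x | group, data = dat)
summary(m)$coefficients[, 2, 2]  # slope standard errors, one per dependent variable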
Given an "mlm" model object model, you can use the function below (written by me) to get the standard errors of the coefficients. This is very efficient: no loop, and no call to summary.mlm().
std_mlm <- function (model) {
  Rinv <- with(model$qr, backsolve(qr, diag(rank)))
  ## unscaled standard error
  std_unscaled <- sqrt(rowSums(Rinv ^ 2)[order(model$qr$pivot)])
  ## residual standard error
  sigma <- sqrt(colSums(model$residuals ^ 2) / model$df.residual)
  ## return final standard error
  ## each column corresponds to a model
  "dimnames<-"(outer(std_unscaled, sigma), dimnames(model$coefficients))
}
A simple, reproducible example
set.seed(0)
Y <- matrix(rnorm(50 * 5), 50) ## assume there are 5 responses
X <- rnorm(50) ## covariate
fit <- lm(Y ~ X)
We all know that it is simple to extract estimated coefficients via:
fit$coefficients ## or `coef(fit)`
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) -0.21013925 0.1162145 0.04470235 0.08785647 0.02146662
#X 0.04110489 -0.1954611 -0.07979964 -0.02325163 -0.17854525
Now let's apply our std_mlm:
std_mlm(fit)
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) 0.1297150 0.1400600 0.1558927 0.1456127 0.1186233
#X 0.1259283 0.1359712 0.1513418 0.1413618 0.1151603
We can, of course, call summary.mlm just to check that our result is correct:
coef(summary(fit))
#Response Y1 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.21013925 0.1297150 -1.6200072 0.1117830
#X 0.04110489 0.1259283 0.3264151 0.7455293
#
#Response Y2 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.1162145 0.1400600 0.8297485 0.4107887
#X -0.1954611 0.1359712 -1.4375183 0.1570583
#
#Response Y3 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.04470235 0.1558927 0.2867508 0.7755373
#X -0.07979964 0.1513418 -0.5272811 0.6004272
#
#Response Y4 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.08785647 0.1456127 0.6033574 0.5491116
#X -0.02325163 0.1413618 -0.1644831 0.8700415
#
#Response Y5 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02146662 0.1186233 0.1809646 0.8571573
#X -0.17854525 0.1151603 -1.5504057 0.1276132
Yes, all correct!
Here is an option:
Put your data in the long format using regressor as an id key.
Do your regression against value by group of variable.
For example, using the mtcars data set:
library(reshape2)
dat.m <- melt(mtcars,id.vars='mpg') ## mpg is my regressor
library(plyr)
ddply(dat.m,.(variable),function(x)coef(lm(variable~value,data=x)))
variable (Intercept) value
1 cyl 1 8.336774e-18
2 disp 1 6.529223e-19
3 hp 1 1.106781e-18
4 drat 1 -1.505237e-16
5 wt 1 8.846955e-17
6 qsec 1 6.167713e-17
7 vs 1 2.442366e-16
8 am 1 -3.381738e-16
9 gear 1 -8.141220e-17
10 carb 1 -6.455094e-17
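Since the original goal was standard errors, a small variation of the same idea (a sketch) keeps the slope estimate and its standard error for each variable by regressing each melted column on mpg:
ddply(dat.m, .(variable), function(x)
  coef(summary(lm(value ~ mpg, data = x)))["mpg", c("Estimate", "Std. Error")])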
