Extract columns from a list of coeftest objects in R

Is there a function that can extract two or more columns from a coeftest object? This is easy to do one coeftest object at a time, but can I do the same to a whole list (other than with a for() loop)?
> # meaningless data
> temp <- data.frame(a = rnorm(100, mean = 5), b = rnorm(100, mean = 1),
+ c = 1:100)
> formulas <- list(a ~ b, a ~ c)
> models <- lapply(formulas, lm, data = temp)
> library(lmtest)
> cts <- lapply(models, coeftest)
> # easy to extract columns one object at a time
> cts[[1]][, 1:2]
Estimate Std. Error
(Intercept) 5.0314196 0.1333705
b -0.1039264 0.0987044
> # but more difficult algorithmically
> # either one column
> lapply(cts, "[[", 1)
[[1]]
[1] 5.03142
[[2]]
[1] 5.312007
> # or two
> lapply(cts, "[[", 1:2)
Error in FUN(X[[1L]], ...) : attempt to select more than one element
Maybe the more fundamental question is whether there is a way to turn the meat of the coeftest object into a data frame, which would allow me to extract columns singly and then use mapply(). Thanks!
Edit: I would like to end up with a list of matrices (or data frames) containing the first and second columns.
[[1]]
Estimate Std. Error
(Intercept) 5.0314196 0.1333705
b -0.1039264 0.0987044
[[2]]
Estimate Std. Error
(Intercept) 5.312007153 0.199485363
c -0.007378529 0.003429477

[[ is the wrong subset function in this case. Note that when you lapply() over a list, what you are operating on are the components of the list, i.e. the bits you would get with list[[i]], where i indexes the components.
As such, you only need the [, 1:2] bit of cts[[1]][, 1:2] in the lapply() call. It is a little bit trickier because of the arguments for [, but easily doable with lapply():
> lapply(cts, `[`, , 1:2)
[[1]]
Estimate Std. Error
(Intercept) 4.926679544 0.1549482
b -0.001967657 0.1062437
[[2]]
Estimate Std. Error
(Intercept) 4.849041327 0.204342067
c 0.001494454 0.003512972
Note the empty argument (the space followed by a comma) before 1:2; this is the equivalent of [ , 1:2].
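If the empty-argument form reads oddly, an anonymous function is an equivalent way to spell the same call and makes the column selection explicit:
lapply(cts, function(ct) ct[, 1:2, drop = FALSE])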

I'm not sure if this is what you want, but how about:
> do.call("rbind", cts)[, 1:2]
Estimate Std. Error
(Intercept) 4.8200993881 0.142381642
b -0.0421189130 0.092620363
(Intercept) 4.7459340076 0.206372906
c 0.0005770324 0.003547885
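As for the more fundamental question of turning the meat of a coeftest object into data frames, one possible sketch is to drop the "coeftest" class so that the ordinary matrix methods apply, then coerce each element (note that as.data.frame() will make the column names syntactic, e.g. "Std. Error" becomes "Std..Error", unless you also pass optional = TRUE):
lapply(cts, function(ct) as.data.frame(unclass(ct))[, 1:2])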

Related

Replace underlying standard errors in lm() OLS model in R

I would like to run an OLS model using lm() in R and replace the standard errors in the model. In the following example, I would like to replace each standard error with 2:
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
mod <- lm(y ~x)
ses <- c(2,2)
coef(summary(mod))[,2] <- ses
sqrt(diag(vcov(mod))) <- ses
Any thoughts on how to do this? Thanks.
Those assignments are not going to succeed. coef, sqrt and vcov are not going to pass those values "upstream". You could do this:
> false.summ <- coef(summary(mod))
> false.sqrt.vcov <- sqrt(diag(vcov(mod)))
> false.summ
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 0.09755118 -1.0538371 0.2945488
x -0.05247161 0.10687862 -0.4909459 0.6245623
> false.summ[ , 2] <- ses
> false.sqrt.vcov
(Intercept) x
0.09755118 0.10687862
> false.sqrt.vcov <- ses
You could also modify a summary object, at least its coefficient matrix, but there is no "vcov" element in the summary despite the fact that vcov() does return a value for it.
> summ <- summary(mod)
> summ$coefficients[ , 2] <- ses
> coef(summ)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 2 -1.0538371 0.2945488
x -0.05247161 2 -0.4909459 0.6245623
> summ$vcov
NULL
> vcov(summ)
(Intercept) x
(Intercept) 0.009516233 -0.00103271
x -0.001032710 0.01142304
If you wanted to change the output of vcov() when applied to a summary object, you would need to distort the unscaled covariance matrix. This is the code that vcov() uses for that object class:
> getAnywhere(vcov.summary.lm)
A single object matching ‘vcov.summary.lm’ was found
It was found in the following places
registered S3 method for vcov from namespace stats
namespace:stats
with value
function (object, ...)
object$sigma^2 * object$cov.unscaled
<bytecode: 0x7fb63c784068>
<environment: namespace:stats>
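Putting those pieces together, a minimal sketch of such a distortion, reusing mod and ses from the question, would be to rescale cov.unscaled so that vcov() on the summary object reports the replacement standard errors:
summ <- summary(mod)
ses <- c(2, 2)
## vcov.summary.lm returns sigma^2 * cov.unscaled, so rescale cov.unscaled
## so that the diagonal of vcov(summ) becomes ses^2 (correlations are kept)
fac <- ses / sqrt(diag(vcov(summ)))
summ$cov.unscaled <- summ$cov.unscaled * outer(fac, fac)
sqrt(diag(vcov(summ)))  # now 2 2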

Extracting coefficients from a regression model with 1 predictor

I currently have the following regression model:
> print(summary(step1))
Call:
lm(formula = model1, data = newdat1)
Residuals:
Min 1Q Median 3Q Max
-2.53654 -0.02423 -0.02423 -0.02423 1.71962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3962 0.0532 7.446 2.76e-12 ***
i2 0.6281 0.0339 18.528 < 2e-16 ***
I would like just the following returned as a data frame:
Estimate Std. Error t value Pr(>|t|)
i2 0.6281 0.0339 18.528 < 2e-16
I currently have the following code:
> results1<-as.data.frame(summary(step1)$coefficients[-1,drop=FALSE])
Which yields:
> results1
summary(step1)$coefficients[-1, drop = FALSE]
1 6.280769e-01
2 5.320108e-02
3 3.389873e-02
4 7.446350e+00
5 1.852804e+01
6 2.764836e-12
7 2.339089e-45
This is not what I want; it does, however, work when there is more than one predictor.
It would be nice if you gave a reproducible example. I think you're looking for
cc <- coef(summary(step1))[2,,drop=FALSE]
as.data.frame(cc)
Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly it's unlikely that this basic a part of R will change any time soon, especially as many users probably have used constructions like $coefficients).
Indexing the row by name, i.e.
coef(summary(step1))["i2",,drop=FALSE]
would probably be even better.
summary(step1)$coefficients is a matrix. When you take out the first element with [-1, drop=FALSE] it is converted to a vector, which is why you get 7 numbers instead of the row you want.
> set.seed(123)
> x <- rnorm(100)
> y <- -1 + 0.2*x + rnorm(100)
> step1 <- lm(y ~ x)
> class(summary(step1)$coefficients)
[1] "matrix"
> class(summary(step1)$coefficients[-1, drop=FALSE])
[1] "numeric"
The solution is to change the subsetting with [ so that you specify that you want to keep all columns (see ?`[`):
> summary(step1)$coefficients[-1, , drop=FALSE]
Estimate Std. Error t value Pr(>|t|)
x 0.1475284 0.1068786 1.380336 0.1706238
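To get all the way to the data frame asked for in the question, the two fixes combine into something like this (optional = TRUE just stops as.data.frame() from mangling the "Std. Error" column name):
results1 <- as.data.frame(summary(step1)$coefficients[-1, , drop = FALSE],
                          optional = TRUE)
results1
#    Estimate Std. Error  t value Pr(>|t|)
# x 0.1475284  0.1068786 1.380336 0.1706238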

Change printed variable names for summary()

I am using summary() to create, yes, a summary of my regression. What is printed now are my variable names, underscores included.
Is there any way to change the printed variable names so that I see e.g. "Age of dog" instead of dog_age?
I cannot simply rename the variables, since variable names cannot contain spaces.
Something like this?
> x <- summary(lm(mpg ~ cyl+wt, mtcars))
> rownames(x$coef) <- c("YOUR", "NAMES", "HERE")
> x$coef
# Estimate Std. Error t value Pr(>|t|)
# YOUR 39.6863 1.7150 23.141 < 2e-16
# NAMES -1.5078 0.4147 -3.636 0.001064
# HERE -3.1910 0.7569 -4.216 0.000222
Or you could just change the names in the data before running regression
> names(mtcars)[1:3] <- rownames(x$coef)
> lm(YOUR ~ NAMES+HERE, mtcars)
# Call:
# lm(formula = YOUR ~ NAMES + HERE, data = mtcars)
# Coefficients:
# (Intercept) NAMES HERE
# 34.66099 -1.58728 -0.02058
You can use backticks (`) to introduce spaces into variable names:
dat = data.frame(`Age of dog`=1:10,`T`=1:10,check.names=FALSE)
summary(lm(T~`Age of dog`,data=dat))
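Another lightweight option, if you only care about how the coefficient table prints, is a small helper along these lines (relabel_coefs is just a made-up name for this sketch) that relabels the rows from a lookup vector without touching the data or the model:
relabel_coefs <- function(fit, labels) {
  cc <- coef(summary(fit))
  rn <- rownames(cc)
  ## keep the original name wherever no replacement label is supplied
  rownames(cc) <- ifelse(rn %in% names(labels), labels[rn], rn)
  cc
}
relabel_coefs(lm(mpg ~ cyl + wt, mtcars), c(cyl = "Cylinders", wt = "Weight"))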

Obtain standard errors of regression coefficients for an "mlm" object returned by `lm()`

I'd like to run 10 regressions against the same regressor, then pull all the standard errors without using a loop.
depVars <- as.matrix(data[,1:10]) # multiple dependent variables
regressor <- as.matrix(data[, 11]) # independent variable
allModels <- lm(depVars ~ regressor) # multiple, single variable regressions
summary(allModels)[1] # Can "view" the standard error for 1st regression, but can't extract...
allModels is stored as an "mlm" object, which is really tough to work with. It'd be great if I could store a list of lm objects or a matrix with statistics of interest.
Again, the objective is to NOT use a loop. Here is a loop equivalent:
regressor <- as.matrix(data[, 11])   # independent variable
table1Data <- matrix(NA, 10, 1)      # container for the standard errors
for(i in 1:10) {
  tempObject <- lm(data[, i] ~ regressor)                     # single regressions
  table1Data[i, 1] <- summary(tempObject)$coefficients[2, 2]  # assign std error
  rm(tempObject)
}
If you put your data in long format it's very easy to get a bunch of regression results using lmList from the nlme or lme4 packages. The output is a list of regression results and the summary can give you a matrix of coefficients, just like you wanted.
library(lme4)
m <- lmList( y ~ x | group, data = dat)
summary(m)$coefficients
Those coefficients are in a simple 3-dimensional array, so the standard errors are at [, 2, 2].
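A sketch of how that might look with the layout from the question, assuming data holds the ten dependent variables in columns 1:10 and the regressor in column 11 as in the code above:
library(reshape2)
library(lme4)
## stack the ten responses into long format, keeping the regressor as the id
dat <- melt(data.frame(data[, 1:10], x = data[, 11]),
            id.vars = "x", variable.name = "group")
m <- lmList(value ~ x | group, data = dat)
## standard errors of the slope, one per dependent variable
summary(m)$coefficients[, 2, 2]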
Given an "mlm" model object model, you can use the function below to get the standard errors of the coefficients. It is very efficient: no loop, and no call to summary.mlm().
std_mlm <- function (model) {
  Rinv <- with(model$qr, backsolve(qr, diag(rank)))
  ## unscaled standard errors
  std_unscaled <- sqrt(rowSums(Rinv ^ 2)[order(model$qr$pivot)])
  ## residual standard error, one per response
  sigma <- sqrt(colSums(model$residuals ^ 2) / model$df.residual)
  ## final standard errors; each column corresponds to a model
  se <- outer(std_unscaled, sigma)
  dimnames(se) <- dimnames(model$coefficients)
  se
}
A simple, reproducible example
set.seed(0)
Y <- matrix(rnorm(50 * 5), 50) ## assume there are 5 responses
X <- rnorm(50) ## covariate
fit <- lm(Y ~ X)
We all know that it is simple to extract estimated coefficients via:
fit$coefficients ## or `coef(fit)`
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) -0.21013925 0.1162145 0.04470235 0.08785647 0.02146662
#X 0.04110489 -0.1954611 -0.07979964 -0.02325163 -0.17854525
Now let's apply our std_mlm:
std_mlm(fit)
# [,1] [,2] [,3] [,4] [,5]
#(Intercept) 0.1297150 0.1400600 0.1558927 0.1456127 0.1186233
#X 0.1259283 0.1359712 0.1513418 0.1413618 0.1151603
We can, of course, call summary.mlm() just to check that our result is correct:
coef(summary(fit))
#Response Y1 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.21013925 0.1297150 -1.6200072 0.1117830
#X 0.04110489 0.1259283 0.3264151 0.7455293
#
#Response Y2 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.1162145 0.1400600 0.8297485 0.4107887
#X -0.1954611 0.1359712 -1.4375183 0.1570583
#
#Response Y3 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.04470235 0.1558927 0.2867508 0.7755373
#X -0.07979964 0.1513418 -0.5272811 0.6004272
#
#Response Y4 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.08785647 0.1456127 0.6033574 0.5491116
#X -0.02325163 0.1413618 -0.1644831 0.8700415
#
#Response Y5 :
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.02146662 0.1186233 0.1809646 0.8571573
#X -0.17854525 0.1151603 -1.5504057 0.1276132
Yes, all correct!
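And if only the slope standard errors are wanted, they are just the second row of that matrix:
std_mlm(fit)["X", ]
# [1] 0.1259283 0.1359712 0.1513418 0.1413618 0.1151603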
Here is an option: put your data in long format, using the regressor as an id key, then run your regression against value by group of variable.
For example, using the mtcars data set:
library(reshape2)
dat.m <- melt(mtcars,id.vars='mpg') ## mpg is my regressor
library(plyr)
ddply(dat.m,.(variable),function(x)coef(lm(variable~value,data=x)))
variable (Intercept) value
1 cyl 1 8.336774e-18
2 disp 1 6.529223e-19
3 hp 1 1.106781e-18
4 drat 1 -1.505237e-16
5 wt 1 8.846955e-17
6 qsec 1 6.167713e-17
7 vs 1 2.442366e-16
8 am 1 -3.381738e-16
9 gear 1 -8.141220e-17
10 carb 1 -6.455094e-17

Handling outputs with different lengths using ldply

Just a quick question on how to handle outputs of different lengths using ldply from the plyr package. Here is a simple version of the code I am using and the error I am getting:
# function to collect the coefficients from the regression models:
> SecreatWeapon <- dlply(merged1,~country.x, function(df) {
+ lm(log(child_mortality) ~ log(IHME_usd_gdppc)+ hiv_prev,data=df)
+ })
>
# functions to extract the output of interest
> extract.coefs <- function(mod) c(extract.coefs = summary(mod)$coefficients[,1])
> extract.se.coefs <- function(mod) c(extract.se.coefs = summary(mod)$coefficients[,2])
>
# function to combine the extracted output
> res <- ldply(SecreatWeapon, extract.coefs)
Error in list_to_dataframe(res, attr(.data, "split_labels")) :
Results do not have equal lengths
Here the error is due to the fact that some models will contain NA values so that:
> SecreatWeapon[[1]]
Call:
lm(formula = log(child_mortality) ~ log(IHME_usd_gdppc) + hiv_prev,
data = df)
Coefficients:
(Intercept) log(IHME_usd_gdppc) hiv_prev
-4.6811 0.5195 NA
and therefore the following output won't have the same length; for example:
> summary(SecreatWeapon[[1]])$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.6811000 0.6954918 -6.730633 6.494799e-08
log(IHME_usd_gdppc) 0.5194643 0.1224292 4.242977 1.417349e-04
but for the other one I get
> summary(SecreatWeapon[[10]])$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.612698 1.7505236 10.632646 1.176347e-12
log(IHME_usd_gdppc) -2.256465 0.1773498 -12.723244 6.919009e-15
hiv_prev -272.558951 160.3704493 -1.699558 9.784053e-02
Any easy fixes? Thank you very much,
Antonio Pedro.
The summary.lm() coefficient matrix accessed with $coefficients gives different output than coef() does with an lm argument, for any lm object that has an NA "coefficient" (the aliased term is dropped from the summary matrix). Would you be satisfied with using something like this:
coef.se <- function(mod) {
  extract.coefs <- function(mod) coef(mod)  # lengths all the same
  extract.se.coefs <- function(mod) summary(mod)$coefficients[, 2]
  merge(extract.coefs(mod), extract.se.coefs(mod), by = 'row.names', all = TRUE)
}
With Roland's example it gives:
> coef.se(fit)
Row.names x y
1 (Intercept) -0.3606557 0.1602034
2 x1 2.2131148 0.1419714
3 x2 NA NA
You could rename the x column as coef and the y column as se.coef.
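For example, using the fit defined in Roland's example below:
out <- coef.se(fit)
names(out) <- c("term", "coef", "se.coef")
out
#          term       coef   se.coef
# 1 (Intercept) -0.3606557 0.1602034
# 2          x1  2.2131148 0.1419714
# 3          x2         NA        NA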
y <- c(1,2,3)
x1 <- c(0.6,1.1,1.5)
x2 <- c(1,1,1)
fit <- lm(y~x1+x2)
summary(fit)$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.3606557 0.1602034 -2.251236 0.26612016
#x1 2.2131148 0.1419714 15.588457 0.04078329
#function for full matrix, adjusted from getAnywhere(print.summary.lm)
full_coeffs <- function (fit) {
  fit_sum <- summary(fit)
  cn <- names(fit_sum$aliased)
  coefs <- matrix(NA, length(fit_sum$aliased), 4,
                  dimnames = list(cn, colnames(fit_sum$coefficients)))
  coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
  coefs
}
full_coeffs(fit)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.3606557 0.1602034 -2.251236 0.26612016
#x1 2.2131148 0.1419714 15.588457 0.04078329
#x2 NA NA NA NA
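Hooking full_coeffs() back into the original ldply() call, a sketch (extract.both is a made-up name) that gives every model an equal-length, NA-padded result would be:
library(plyr)
extract.both <- function(mod) {
  cc <- full_coeffs(mod)
  ## one named vector per model: estimates first, then standard errors
  setNames(c(cc[, 1], cc[, 2]),
           c(paste0("coef.", rownames(cc)), paste0("se.", rownames(cc))))
}
res <- ldply(SecreatWeapon, extract.both)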
