Replace underlying standard errors in lm() OLS model in R - r

I would like to run an ols model using lm() in R and replace the standard errors in the model. In the following example, I would like to replace each standard error with "2":
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
mod <- lm(y ~x)
ses <- c(2,2)
coef(summary(mod))[,2] <- ses
sqrt(diag(vcov(mod))) <- ses
Any thoughts on how to do this? Thanks.

Those assignments are not going to succeed. coef, sqrt and vcov are not going to pass those values "upstream". You could do this:
> false.summ <- coef(summary(mod))
> false.sqrt.vcov <- sqrt(diag(vcov(mod)))
> false.summ
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 0.09755118 -1.0538371 0.2945488
x -0.05247161 0.10687862 -0.4909459 0.6245623
> false.summ[ , 2] <- ses
> false.sqrt.vcov
(Intercept) x
0.09755118 0.10687862
> false.sqrt.vcov <- ses
You could also modify a summary-object at least the coef-matrix, but there is no "vcov" element in summary despite the fact that vcov does return a value.
> summ <- summary(mod)
> summ$coefficients[ , 2] <- ses
> coef(summ)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10280305 2 -1.0538371 0.2945488
x -0.05247161 2 -0.4909459 0.6245623
> summ$vcov
NULL
> vcov(summ)
(Intercept) x
(Intercept) 0.009516233 -0.00103271
x -0.001032710 0.01142304:
If you wanted to change the output of vcov when applied to a summary object you would need to distort the unscaled cov-matrix. This is the code that vcov uses for that object-class:
> getAnywhere(vcov.summary.lm)
A single object matching ‘vcov.summary.lm’ was found
It was found in the following places
registered S3 method for vcov from namespace stats
namespace:stats
with value
function (object, ...)
object$sigma^2 * object$cov.unscaled
<bytecode: 0x7fb63c784068>
<environment: namespace:stats>

Related

R: Different result from glm and mle2 package in R

So I want to find the estimate parameter using GLM and compare it with mle2 package.
Here's my code for GLM
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
glm2 <- glm(disease~ses+sector, family=binomial(link=logit), data=d)
summary(glm2)
And my code for mle2()
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
x3<-as.numeric(as.character(d$ses))
library(bbmle)
nlldbin=function(A,B,C,D){
eta<-A+B*(x3==2)+C*(x3==3)+D*(x2==2)
p<-1/(1+exp(-eta))
joint.pdf= (p^y)*((1-p)^(1-y))
-sum(joint.pdf, log=TRUE ,na.rm=TRUE)
}
st <- list(A=0.0001,B=0.0001,C=0.0001,D=0.0001)
est_mle2<-mle2(start=st,nlldbin,hessian=TRUE)
summary(est_mle2)
But the result is quiet different. Please help me to fix this, thank you!
> summary(est_mle2)
Maximum likelihood estimation
Call:
mle2(minuslogl = nlldbin, start = st, hessian.opts = TRUE)
Coefficients:
Estimate Std. Error z value Pr(z)
A -20.4999 5775.1484 -0.0035 0.9972
B -5.2499 120578.9515 0.0000 1.0000
C -7.9999 722637.2670 0.0000 1.0000
D -2.2499 39746.6639 -0.0001 1.0000
> summary(glm2)
Call:
glm(formula = disease ~ ses + sector, family = binomial(link = logit),
data = d)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.52001 0.33514 -4.535 5.75e-06 ***
ses2 -0.08525 0.41744 -0.204 0.838177
ses3 0.16086 0.39261 0.410 0.682019
sector2 1.28098 0.34140 3.752 0.000175 ***
I'm not sure your definition of eta is correct. I would use the model matrix.
X <- model.matrix(~ ses + sector, data = d)
nlldbin <- function(A,B,C,D){
eta <- X %*% c(A, B, C, D)
p <- 1/(1+exp(-eta))
logpdf <- y*log(p) + (1-y)*log(1-p)
-sum(logpdf)
}
This line
-sum(joint.pdf, log=TRUE ,na.rm=TRUE)
is wrong. sum doesn't have a special log argument; what you're doing is adding the value TRUE (which gets converted to 1) to the pdf.
What you want is
-sum(log(joint.pdf), na.rm=TRUE)
but this is also not very good for numerical reasons, as the pdf is likely to underflow. A better way of writing it would be
logpdf <- y*log(p) + (1-y)*log(1-p)
-sum(logpdf, na.rm=TRUE)

R - Pass column names as Variable with names contain I()

I'm performing the polynomial regression and testing the linear combination of the coefficient. But I'm running to some problems that when I tried to test the linear combination of the coefficient.
LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
summary(LnModel_1)
It output the values below:
Call:
lm(formula = PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2))
Residuals:
Min 1Q Median 3Q Max
-3.4492 -0.5405 0.1096 0.4226 1.7346
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.66274 0.06444 72.354 < 2e-16 ***
UI_1 0.25665 0.07009 3.662 0.000278 ***
UI_2 0.25569 0.09221 2.773 0.005775 **
I(UI_1^2) -0.15168 0.04490 -3.378 0.000789 ***
I(UI_2^2) -0.08418 0.05162 -1.631 0.103643
UI_1:UI_2 -0.02849 0.05453 -0.522 0.601621
Then I use names(coef()) to extract the coefficient names
names(coef(LnModel_1))
output:
[1] "(Intercept)" "UI_1" "UI_2" "I(UI_1^2)"
"I(UI_2^2)""UI_1:UI_2"
For some reasons, when I use glht(), it give me an error on UI_2^2
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*I(UI_2^2) =0
") )
Output:
Error: multcomp:::expression2coef::walkCode::eval: within ‘UI_2^2’, the term
‘UI_2’ must not denote an effect. Apart from that, the term must evaluate to
a real valued constant
Don't know why it would give me this error message. How to input the I(UI_2^2) coefficient to the glht()
Thank you very much
The issue seems to be that I(UI^2) can be interpreted as an expression in R in the same fashion you did here LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
Therefore, you should indicate R that you want to evaluate a string inside your string:
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*\`I(UI_2^2)\` =0
") )
Check my example (since I cannot reproduce your problem):
library(multcomp)
cars <- copy(mtcars)
setnames(cars, "disp", "UI_2")
model <- lm(mpg~I(UI_2^2),cars)
names(coef(model))
slope <- glht(model, linfct = c("2*2.5*\`I(UI_2^2)\` =0") )

extract summary from matrix lm object

I'd like to get the coefficients from the summary section of an lm object, except I inputted a matrix and I am getting null for the summary part. Here is my code:
n=12
y=rnorm(n,23,1)
x1=rnorm(n,23,1)
x2=rnorm(n,15.5,1)
lm1=lm(y~x1+x2)
n2=10
b0=4;b1=2;b2=3
sim1<-function(){
randmat=matrix(rnorm(n*n2,0,8),n,n2)
x1mat=matrix(x1,n,n2)
x2mat=matrix(x2,n,n2)
return(b0+b1*x1mat+b2*x2mat+randmat)
}
sim1=sim1()
lm1=lm(sim1~x1+x2)
c2=summary(lm1)$coefficients
> c2
NULL
what I want is this (but repeated):
lm2=lm(sim1[,1]~x1+x2)
summary(lm2)$coefficients
Does anyone know how to extract these? Thanks
-Rik
Another way is to do the following after the end of the following line of your code.
lm1=lm(sim1~x1+x2) #this runs 10 models
All the coefficients will be stored in the list summary(lm1) as Response Y1 ... to Response Y10 (i.e. 10 models as many as ncol(sim1)).
In order to get the coefficients from each model back you could do:
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) summary(lm1)[[x]]$coefficients)
Or as #Rik mentions in the comment it will be faster if summary(lm1) is not repeated in the lapply loop in case you have a big matrix.
the_sum <- summary(lm1)
all_coef <- lapply( paste0('Response Y', 1:ncol(sim1)),
function(x) the_sum[[x]]$coefficients)
And the output would be:
> all_coef
[[1]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 135.242552 80.136427 1.687654 0.1257496
x1 -4.777486 2.953534 -1.617549 0.1402142
x2 4.464435 3.891641 1.147186 0.2808857
[[2]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 119.1772823 111.603046 1.06786765 0.3133851
x1 -0.1376013 4.113277 -0.03345297 0.9740435
x2 -1.2946027 5.419744 -0.23886785 0.8165585
[[3]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) -51.329923 63.495202 -0.8084063 0.4397018
x1 3.721227 2.340199 1.5901325 0.1462682
x2 3.793981 3.083498 1.2304147 0.2497304
[[4]]
Estimate Std. Error t value Pr(>|t|)
(Intercept) 124.8606014 57.669842 2.16509352 0.05857967
x1 -1.2517705 2.125498 -0.58893044 0.57039201
x2 -0.1159803 2.800603 -0.04141263 0.96787111
#...and so on until 10
To get the individual coefficients for a model just do:
all_coef[[<the_number_you_want>]]

Custom Bootstrapped Standard Error: numeric 'envir' arg not of length one

I am writing a custom script to bootstrap standard errors in a GLM in R and receive the following error:
Error in eval(predvars, data, env) : numeric 'envir' arg not of length one
Can someone explain what I am doing wrong? My code:
#Number of simulations
sims<-numbersimsdesired
#Set up place to store data
saved.se<-matrix(NA,sims,numberofcolumnsdesired)
y<-matrix(NA,realdata.rownumber)
x1<-matrix(NA,realdata.rownumber)
x2<-matrix(NA,realdata.rownumber)
#Resample entire dataset with replacement
for (sim in 1:sims) {
fake.data<-sample(1:nrow(data5),nrow(data5),replace=TRUE)
#Define variables for GLM using fake data
y<-realdata$y[fake.data]
x1<-realdata$x1[fake.data]
x2<-realdata$x2[fake.data]
#Run GLM on fake data, extract SEs, save SE into matrix
glm.output<-glm(y ~ x1 + x2, family = "poisson", data = fake.data)
saved.se[sim,]<-summary(glm.output)$coefficients[0,2]
}
An example: if we suppose sims = 1000 and we want 10 columns (suppose instead of x1 and x2, we have x1...x10) the goal is a dataset with 1,000 rows and 10 columns containing each explanatory variable's SEs.
There isn't a reason to reinvent the wheel. Here is an example of bootstrapping the standard error of the intercept with the boot package:
set.seed(42)
counts <- c(18,17,15,20,10,20,25,13,12)
x1 <- 1:9
x2 <- sample(9)
DF <- data.frame(counts, x1, x2)
glm1 <- glm(counts ~ x1 + x2, family = poisson(), data=DF)
summary(glm1)$coef
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) 2.08416378 0.42561333 4.896848 9.738611e-07
#x1 0.04838210 0.04370521 1.107010 2.682897e-01
#x2 0.09418791 0.04446747 2.118131 3.416400e-02
library(boot)
intercept.se <- function(d, i) {
glm1.b <- glm(counts ~ x1 + x2, family = poisson(), data=d[i,])
summary(glm1.b)$coef[1,2]
}
set.seed(42)
boot.intercept.se <- boot(DF, intercept.se, R=999)
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = DF, statistic = intercept.se, R = 999)
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* 0.4256133 0.103114 0.2994377
Edit:
If you prefer doing it without a package:
n <- 999
set.seed(42)
ind <- matrix(sample(nrow(DF), nrow(DF)*n, replace=TRUE), nrow=n)
boot.values <- apply(ind, 1, function(...) {
i <- c(...)
intercept.se(DF, i)
})
sd(boot.values)
#[1] 0.2994377

handling outputs with different lengths using ldply

Just a quick question on how to handle outputs of different lengths using ldply from the plyr package. Here is a simple version of the code I am using and the error I am getting:
# function to collect the coefficients from the regression models:
> SecreatWeapon <- dlply(merged1,~country.x, function(df) {
+ lm(log(child_mortality) ~ log(IHME_usd_gdppc)+ hiv_prev,data=df)
+ })
>
# functions to extract the output of interest
> extract.coefs <- function(mod) c(extract.coefs = summary(mod)$coefficients[,1])
> extract.se.coefs <- function(mod) c(extract.se.coefs = summary(mod)$coefficients[,2])
>
# function to combine the extracted output
> res <- ldply(SecreatWeapon, extract.coefs)
Error in list_to_dataframe(res, attr(.data, "split_labels")) :
Results do not have equal lengths
Here the error is due to the fact that some models will contain NA values so that:
> SecreatWeapon[[1]]
Call:
lm(formula = log(child_mortality) ~ log(IHME_usd_gdppc) + hiv_prev,
data = df)
Coefficients:
(Intercept) log(IHME_usd_gdppc) hiv_prev
-4.6811 0.5195 NA
and therefore the following output won't have the same length; for example:
> summary(SecreatWeapon[[1]])$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.6811000 0.6954918 -6.730633 6.494799e-08
log(IHME_usd_gdppc) 0.5194643 0.1224292 4.242977 1.417349e-04
but for the other one I get
> summary(SecreatWeapon[[10]])$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.612698 1.7505236 10.632646 1.176347e-12
log(IHME_usd_gdppc) -2.256465 0.1773498 -12.723244 6.919009e-15
hiv_prev -272.558951 160.3704493 -1.699558 9.784053e-02
Any easy fixes? Thank you very much,
Antonio Pedro.
The summary.lm( . ) function accessed with $coefficients gives different output than the coef would with an lm argument for any lm-object with an NA "coefficient". Would you be satisfied with using something like this:
coef.se <- function(mod) {
extract.coefs <- function(mod) coef(mod) # lengths all the same
extract.se.coefs <- function(mod) { summary(mod)$coefficients[,2]}
return( merge( extract.coefs(mod), extract.se.coefs(mod), by='row.names', all=TRUE) )
}
With Roland's example it gives:
> coef.se(fit)
Row.names x y
1 (Intercept) -0.3606557 0.1602034
2 x1 2.2131148 0.1419714
3 x2 NA NA
You could rename the x as coef and the y as se.coef
y <- c(1,2,3)
x1 <- c(0.6,1.1,1.5)
x2 <- c(1,1,1)
fit <- lm(y~x1+x2)
summary(fit)$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.3606557 0.1602034 -2.251236 0.26612016
#x1 2.2131148 0.1419714 15.588457 0.04078329
#function for full matrix, adjusted from getAnywhere(print.summary.lm)
full_coeffs <- function (fit) {
fit_sum <- summary(fit)
cn <- names(fit_sum$aliased)
coefs <- matrix(NA, length(fit_sum$aliased), 4,
dimnames = list(cn, colnames(fit_sum$coefficients)))
coefs[!fit_sum$aliased, ] <- fit_sum$coefficients
coefs
}
full_coeffs(fit)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -0.3606557 0.1602034 -2.251236 0.26612016
#x1 2.2131148 0.1419714 15.588457 0.04078329
#x2 NA NA NA NA

Resources