R - Pass column names as Variable with names contain I() - r

I'm performing the polynomial regression and testing the linear combination of the coefficient. But I'm running to some problems that when I tried to test the linear combination of the coefficient.
LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
summary(LnModel_1)
It output the values below:
Call:
lm(formula = PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2))
Residuals:
Min 1Q Median 3Q Max
-3.4492 -0.5405 0.1096 0.4226 1.7346
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.66274 0.06444 72.354 < 2e-16 ***
UI_1 0.25665 0.07009 3.662 0.000278 ***
UI_2 0.25569 0.09221 2.773 0.005775 **
I(UI_1^2) -0.15168 0.04490 -3.378 0.000789 ***
I(UI_2^2) -0.08418 0.05162 -1.631 0.103643
UI_1:UI_2 -0.02849 0.05453 -0.522 0.601621
Then I use names(coef()) to extract the coefficient names
names(coef(LnModel_1))
output:
[1] "(Intercept)" "UI_1" "UI_2" "I(UI_1^2)"
"I(UI_2^2)""UI_1:UI_2"
For some reasons, when I use glht(), it give me an error on UI_2^2
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*I(UI_2^2) =0
") )
Output:
Error: multcomp:::expression2coef::walkCode::eval: within ‘UI_2^2’, the term
‘UI_2’ must not denote an effect. Apart from that, the term must evaluate to
a real valued constant
Don't know why it would give me this error message. How to input the I(UI_2^2) coefficient to the glht()
Thank you very much

The issue seems to be that I(UI^2) can be interpreted as an expression in R in the same fashion you did here LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
Therefore, you should indicate R that you want to evaluate a string inside your string:
slope <- glht(LnModel_1, linfct = c("UI_2+ UI_1:UI_2*2.5+ 2*2.5*\`I(UI_2^2)\` =0
") )
Check my example (since I cannot reproduce your problem):
library(multcomp)
cars <- copy(mtcars)
setnames(cars, "disp", "UI_2")
model <- lm(mpg~I(UI_2^2),cars)
names(coef(model))
slope <- glht(model, linfct = c("2*2.5*\`I(UI_2^2)\` =0") )

Related

R: Different result from glm and mle2 package in R

So I want to find the estimate parameter using GLM and compare it with mle2 package.
Here's my code for GLM
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=factor(d$disease)
d$ses=factor(d$ses)
d$sector=factor(d$sector)
str(d)
glm2 <- glm(disease~ses+sector, family=binomial(link=logit), data=d)
summary(glm2)
And my code for mle2()
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
x3<-as.numeric(as.character(d$ses))
library(bbmle)
nlldbin=function(A,B,C,D){
eta<-A+B*(x3==2)+C*(x3==3)+D*(x2==2)
p<-1/(1+exp(-eta))
joint.pdf= (p^y)*((1-p)^(1-y))
-sum(joint.pdf, log=TRUE ,na.rm=TRUE)
}
st <- list(A=0.0001,B=0.0001,C=0.0001,D=0.0001)
est_mle2<-mle2(start=st,nlldbin,hessian=TRUE)
summary(est_mle2)
But the result is quiet different. Please help me to fix this, thank you!
> summary(est_mle2)
Maximum likelihood estimation
Call:
mle2(minuslogl = nlldbin, start = st, hessian.opts = TRUE)
Coefficients:
Estimate Std. Error z value Pr(z)
A -20.4999 5775.1484 -0.0035 0.9972
B -5.2499 120578.9515 0.0000 1.0000
C -7.9999 722637.2670 0.0000 1.0000
D -2.2499 39746.6639 -0.0001 1.0000
> summary(glm2)
Call:
glm(formula = disease ~ ses + sector, family = binomial(link = logit),
data = d)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.52001 0.33514 -4.535 5.75e-06 ***
ses2 -0.08525 0.41744 -0.204 0.838177
ses3 0.16086 0.39261 0.410 0.682019
sector2 1.28098 0.34140 3.752 0.000175 ***
I'm not sure your definition of eta is correct. I would use the model matrix.
X <- model.matrix(~ ses + sector, data = d)
nlldbin <- function(A,B,C,D){
eta <- X %*% c(A, B, C, D)
p <- 1/(1+exp(-eta))
logpdf <- y*log(p) + (1-y)*log(1-p)
-sum(logpdf)
}
This line
-sum(joint.pdf, log=TRUE ,na.rm=TRUE)
is wrong. sum doesn't have a special log argument; what you're doing is adding the value TRUE (which gets converted to 1) to the pdf.
What you want is
-sum(log(joint.pdf), na.rm=TRUE)
but this is also not very good for numerical reasons, as the pdf is likely to underflow. A better way of writing it would be
logpdf <- y*log(p) + (1-y)*log(1-p)
-sum(logpdf, na.rm=TRUE)

Change Y intercept in Poisson GLM R

Background: I have the following data that I run a glm function on:
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
#make into df
df = data.frame(location, count)
#poisson
summary(glm(count ~ location, family=poisson))
Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6376 0.1622 22.424 < 2e-16 ***
locationBos 0.4055 0.2094 1.936 0.0529 .
locationDH 1.4744 0.1798 8.199 2.43e-16 ***
Problem: I would like to change the (Intercept) so I can get all my values relative to Bos
I looked Change reference group using glm with binomial family and How to force R to use a specified factor level as reference in a regression?. I tried there method and it did not work, and I am not sure why.
Tried:
df1 <- within(df, location <- relevel(location, ref = 1))
#poisson
summary(glm(count ~ location, family=poisson, data = df1))
Desired Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) ...
locationBeth ...
locationDH ...
Question: How do I solve this problem?
I think your problem is that you are modifying the data frame, but in your model you are not using the data frame. Use the data argument in the model to use the data in the data frame.
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
# make into df
df = data.frame(location, count)
Note that location by itself is a character vector. data.frame() coerces it to a factor by default in the data frame. After this conversion, we can use relevel to specify the reference level.
df$location = relevel(df$location, ref = "Bos") # set Bos as reference
summary(glm(count ~ location, family=poisson, data = df))
# Call:
# glm(formula = count ~ location, family = poisson, data = df)
# ...
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 4.0431 0.1325 30.524 < 2e-16 ***
# locationBeth -0.4055 0.2094 -1.936 0.0529 .
# locationDH 1.0689 0.1535 6.963 3.33e-12 ***
# ...

How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?

I am interested in calculating estimates and standard errors for linear combinations of coefficients after a linear regression in R. For example, suppose I have the regression and test:
data(mtcars)
library(multcomp)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
This will estimate the value of the sum of the coefficients on cyl and hp, and provide the standard error based on the covariance matrix produced by lm.
But, suppose I want to cluster my standard errors, on a third variable:
data(mtcars)
library(multcomp)
library(lmtest)
library(multiwayvcov)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
ct1 <- coeftest(lm1,vcov. = vcv)
ct1 contains the SEs for my clustering by am. However, if I try to use the ct1 object in glht, you get an error saying
Error in modelparm.default(model, ...) :
no ‘coef’ method for ‘model’ found!
Any advice on how to do the linear hypothesis with the clustered variance covariance matrix?
Thanks!
glht(ct1, linfct = 'cyl + hp = 0') won't work, because ct1 is not a glht object and can not be coerced to such via as.glht. I don't know whether there is a package or an existing function to do this, but this is not a difficult job to work out ourselves. The following small function does it:
LinearCombTest <- function (lmObject, vars, .vcov = NULL) {
## if `.vcov` missing, use the one returned by `lm`
if (is.null(.vcov)) .vcov <- vcov(lmObject)
## estimated coefficients
beta <- coef(lmObject)
## sum of `vars`
sumvars <- sum(beta[vars])
## get standard errors for sum of `vars`
se <- sum(.vcov[vars, vars]) ^ 0.5
## perform t-test on `sumvars`
tscore <- sumvars / se
pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
## return a matrix
matrix(c(sumvars, se, tscore, pvalue), nrow = 1L,
dimnames = list(paste0(paste0(vars, collapse = " + "), " = 0"),
c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
}
Let's have a test:
data(mtcars)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
library(multiwayvcov)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
If we leave .vcov unspecified in LinearCombTest, it is as same as multcomp::glht:
LinearCombTest(lm1, c("cyl","hp"))
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.5634632 -4.053175 0.0003462092
library(multcomp)
summary(glht(lm1, linfct = 'cyl + hp = 0'))
#Linear Hypotheses:
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp == 0 -2.2838 0.5635 -4.053 0.000346 ***
If we provide a covariance, it does what you want:
LinearCombTest(lm1, c("cyl","hp"), vcv)
# Estimate Std. Error t value Pr(>|t|)
#cyl + hp = 0 -2.283815 0.7594086 -3.00736 0.005399071
Remark
LinearCombTest is upgraded at Get p-value for group mean difference without refitting linear model with a new reference level, where we can test any combination with combination coefficients alpha:
alpha[1] * vars[1] + alpha[2] * vars[2] + ... + alpha[k] * vars[k]
rather than just the sum
vars[1] + vars[2] + ... + vars[k]

Extracting coefficients from a regression 1 model with 1 predictor

I currently have the following regression model:
> print(summary(step1))
Call:
lm(formula = model1, data = newdat1)
Residuals:
Min 1Q Median 3Q Max
-2.53654 -0.02423 -0.02423 -0.02423 1.71962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3962 0.0532 7.446 2.76e-12 ***
i2 0.6281 0.0339 18.528 < 2e-16 ***
I would like just the following returned as a data frame:
Estimate Std. Error t value Pr(>|t|)
i2 0.6281 0.0339 18.528 < 2e-16
I currently have the following code:
> results1<-as.data.frame(summary(step1)$coefficients[-1,drop=FALSE])
Which yields:
> results1
summary(step1)$coefficients[-1, drop = FALSE]
1 6.280769e-01
2 5.320108e-02
3 3.389873e-02
4 7.446350e+00
5 1.852804e+01
6 2.764836e-12
7 2.339089e-45
Thus is not what I want; however, it does work when there's more than 1 predictor.
It would be nice if you gave a reproducible example. I think you're looking for
cc <- coef(summary(step1))[2,,drop=FALSE]
as.data.frame(cc)
Using accessors such as coef(summary(.)) rather than summary(.)$coefficients is both prettier and more robust (there is no guarantee that the internal structure of summary() will stay the same -- although admittedly it's unlikely that this basic a part of R will change any time soon, especially as many users probably have used constructions like $coefficients).
Indexing the row by name, i.e.
coef(summary(step1))["i2",,drop=FALSE]
would probably be even better.
summary(step1)$coefficients is a matrix. When you take out the first element with [-1, drop=FALSE] it is converted to a vector, which is why you get 7 numbers instead of the row you want.
> set.seed(123)
> x <- rnorm(100)
> y <- -1 + 0.2*x + rnorm(100)
> step1 <- lm(y ~ x)
> class(summary(step1)$coefficients)
[1] "matrix"
> class(summary(step1)$coefficients[-1, drop=FALSE])
[1] "numeric"
The solution is to change the subsetting with [ so that you specify you wan to keep all columns (see ?`[`):
> summary(step1)$coefficients[-1, , drop=FALSE]
Estimate Std. Error t value Pr(>|t|)
x 0.1475284 0.1068786 1.380336 0.1706238

Extract regression coefficient values

I have a regression model for some time series data investigating drug utilisation. The purpose is to fit a spline to a time series and work out 95% CI etc. The model goes as follows:
id <- ts(1:length(drug$Date))
a1 <- ts(drug$Rate)
a2 <- lag(a1-1)
tg <- ts.union(a1,id,a2)
mg <-lm (a1~a2+bs(id,df=df1),data=tg)
The summary output of mg is:
Call:
lm(formula = a1 ~ a2 + bs(id, df = df1), data = tg)
Residuals:
Min 1Q Median 3Q Max
-0.31617 -0.11711 -0.02897 0.12330 0.40442
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77443 0.09011 8.594 1.10e-11 ***
a2 0.13270 0.13593 0.976 0.33329
bs(id, df = df1)1 -0.16349 0.23431 -0.698 0.48832
bs(id, df = df1)2 0.63013 0.19362 3.254 0.00196 **
bs(id, df = df1)3 0.33859 0.14399 2.351 0.02238 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I am using the Pr(>|t|) value of a2 to test if the data under investigation are autocorrelated.
Is it possible to extract this value of Pr(>|t|) (in this model 0.33329) and store it in a scalar to perform a logical test?
Alternatively, can it be worked out using another method?
A summary.lm object stores these values in a matrix called 'coefficients'. So the value you are after can be accessed with:
a2Pval <- summary(mg)$coefficients[2, 4]
Or, more generally/readably, coef(summary(mg))["a2","Pr(>|t|)"]. See here for why this method is preferred.
The package broom comes in handy here (it uses the "tidy" format).
tidy(mg) will give a nicely formated data.frame with coefficients, t statistics etc. Works also for other models (e.g. plm, ...).
Example from broom's github repo:
lmfit <- lm(mpg ~ wt, mtcars)
require(broom)
tidy(lmfit)
term estimate std.error statistic p.value
1 (Intercept) 37.285 1.8776 19.858 8.242e-19
2 wt -5.344 0.5591 -9.559 1.294e-10
is.data.frame(tidy(lmfit))
[1] TRUE
Just pass your regression model into the following function:
plot_coeffs <- function(mlr_model) {
coeffs <- coefficients(mlr_model)
mp <- barplot(coeffs, col="#3F97D0", xaxt='n', main="Regression Coefficients")
lablist <- names(coeffs)
text(mp, par("usr")[3], labels = lablist, srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex=0.6)
}
Use as follows:
model <- lm(Petal.Width ~ ., data = iris)
plot_coeffs(model)
To answer your question, you can explore the contents of the model's output by saving the model as a variable and clicking on it in the environment window. You can then click around to see what it contains and what is stored where.
Another way is to type yourmodelname$ and select the components of the model one by one to see what each contains. When you get to yourmodelname$coefficients, you will see all of beta-, p, and t- values you desire.

Resources