Change Y intercept in Poisson GLM in R

Background: I have the following data that I run a glm function on:
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
#make into df
df = data.frame(location, count)
#poisson
summary(glm(count ~ location, family=poisson))
Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.6376 0.1622 22.424 < 2e-16 ***
locationBos 0.4055 0.2094 1.936 0.0529 .
locationDH 1.4744 0.1798 8.199 2.43e-16 ***
Problem: I would like to change the (Intercept) so I can get all my values relative to Bos.
I looked at Change reference group using glm with binomial family and How to force R to use a specified factor level as reference in a regression?. I tried their method and it did not work, and I am not sure why.
Tried:
df1 <- within(df, location <- relevel(location, ref = 1))
#poisson
summary(glm(count ~ location, family=poisson, data = df1))
Desired Output:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) ...
locationBeth ...
locationDH ...
Question: How do I solve this problem?

I think your problem is that you are modifying the data frame, but your model is not using it. Use the data argument in the model so that it uses the data from the data frame.
location = c("DH", "Bos", "Beth")
count = c(166, 57, 38)
# make into df
df = data.frame(location, count)
Note that location by itself is a character vector. In R versions before 4.0.0, data.frame() converted it to a factor by default; in R 4.0.0 and later it stays a character vector, so convert it explicitly, because relevel() only works on factors. After that we can set the reference level.
df$location = factor(df$location)               # ensure it is a factor (needed in R >= 4.0.0)
df$location = relevel(df$location, ref = "Bos") # set Bos as reference
summary(glm(count ~ location, family=poisson, data = df))
# Call:
# glm(formula = count ~ location, family = poisson, data = df)
# ...
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 4.0431 0.1325 30.524 < 2e-16 ***
# locationBeth -0.4055 0.2094 -1.936 0.0529 .
# locationDH 1.0689 0.1535 6.963 3.33e-12 ***
# ...
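If you prefer not to modify the data frame at all, the releveling can also be done inside the formula; a minimal sketch, assuming df$location is already a factor:
# same model, with Bos set as the reference level inside the formula
summary(glm(count ~ relevel(location, ref = "Bos"), family = poisson, data = df))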

Related

How do I filter on the name of the glm coefficient?

The following code
df <- data.frame(place = c("South", "South", "North", "East"),
                 temperature = c(30, 30, 20, 12),
                 outlookfine = c(TRUE, TRUE, FALSE, FALSE))
glm.fit <- glm(outlookfine ~ ., data = df, family = binomial)
coef.glm <- coef(summary(glm.fit))
coef.glm
outputs
Estimate Std. Error z value Pr(>|z|)
(Intercept) -23.56607 79462.00 -0.0002965703 0.9997634
placeNorth 0.00000 112376.25 0.0000000000 1.0000000
placeSouth 47.13214 97320.68 0.0004842972 0.9996136
I want to re-display the list without the intercept and without places containing the phrase "South"
I thought of trying to name the index column and then subset on it but have had no success.
[Update]
I added more data to understand why George Sava's answer also stripped out "North"
df <- data.frame(place = c("South", "South", "North", "East", "West"),
                 temperature = c(30, 30, 20, 12, 15),
                 outlookfine = c(TRUE, TRUE, FALSE, FALSE, TRUE))
glm.fit <- glm(outlookfine ~ ., data = df, family = binomial)
coef.glm <- coef(summary(glm.fit))
coef.glm[!grepl(pattern = ("South|Intercept"), rownames(coef.glm)),]
outputs
Estimate Std. Error z value Pr(>|z|)
placeNorth 3.970197e-14 185277.1 2.142843e-19 1.0000000
placeWest 4.913214e+01 185277.2 2.651818e-04 0.9997884
To keep only rows that match (or do not match) a certain pattern, you can use:
coef.glm[!grepl("South|Intercept", rownames(coef.glm)),]
Note that when only one row is selected, the result drops down to a vector; see the sketch just below for keeping it as a matrix.
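To keep the one-row matrix structure instead, drop = FALSE can be added to the subsetting call; a minimal sketch of that option:
coef.glm[!grepl("South|Intercept", rownames(coef.glm)), , drop = FALSE]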
If you want to retain row names as a column then you could do something like:
library(tibble)
library(dplyr)
as.data.frame(coef.glm) %>%
  rownames_to_column("x") %>%
  filter(!grepl("Intercept|South", x))
Output
x Estimate Std. Error t value Pr(>|t|)
1 placeNorth -1.281975e-16 3.140185e-16 -0.4082483 0.7532483
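Another option along the same lines is broom's tidy(), which returns the coefficient table as a data frame with a term column, so ordinary subsetting works; a sketch, assuming broom is installed:
library(broom)
# keep every term except the intercept and the "South" dummies
subset(tidy(glm.fit), !grepl("Intercept|South", term))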

R - Fixed-effects regression "plm" vs "lm + as.factor()": interpretation of R and R-Squared

I understand from this question here that coefficients are the same whether we use an lm regression with as.factor() or a plm regression with fixed effects.
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
                 region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
# (Intercept) -0.0522691 0.1422052 -0.368 0.7132
# a 1.9982165 0.0101501 196.866 <2e-16 ***
# b -1.4787359 0.0101666 -145.450 <2e-16 ***
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# a 1.998217 0.010150 196.87 < 2.2e-16 ***
# b -1.478736 0.010167 -145.45 < 2.2e-16 ***
library(lfe)
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# a 1.99822 0.01015 196.9 <2e-16 ***
# b -1.47874 0.01017 -145.4 <2e-16 ***
However, the reported R-squared values differ significantly. Which one is correct, and how does the interpretation change between the two models? In my case, the R-squared is much larger for the plm specification and is even negative for the lm + factor() one.
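For reference, the two figures being compared can be pulled from the fitted objects directly; a small sketch (assuming the models above), keeping in mind that plm's "within" R-squared is computed on the demeaned data, while lm's R-squared also counts the variance explained by the region and year dummies:
summary(model.a)$r.squared   # lm + factor(): overall R-squared, dummies included
summary(model.b)$r.squared   # plm "within": R-squared of the demeaned regression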

R - Pass column names as variables when the names contain I()

I'm performing a polynomial regression and testing a linear combination of the coefficients, but I'm running into problems when I try to test the linear combination.
LnModel_1 <- lm(formula = PROF ~ UI_1+UI_2+I(UI_1^2)+UI_1:UI_2+I(UI_2^2))
summary(LnModel_1)
It output the values below:
Call:
lm(formula = PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2))
Residuals:
Min 1Q Median 3Q Max
-3.4492 -0.5405 0.1096 0.4226 1.7346
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.66274 0.06444 72.354 < 2e-16 ***
UI_1 0.25665 0.07009 3.662 0.000278 ***
UI_2 0.25569 0.09221 2.773 0.005775 **
I(UI_1^2) -0.15168 0.04490 -3.378 0.000789 ***
I(UI_2^2) -0.08418 0.05162 -1.631 0.103643
UI_1:UI_2 -0.02849 0.05453 -0.522 0.601621
Then I use names(coef()) to extract the coefficient names
names(coef(LnModel_1))
output:
[1] "(Intercept)" "UI_1" "UI_2" "I(UI_1^2)"
"I(UI_2^2)""UI_1:UI_2"
For some reason, when I use glht(), it gives me an error on UI_2^2:
slope <- glht(LnModel_1, linfct = c("UI_2 + UI_1:UI_2*2.5 + 2*2.5*I(UI_2^2) = 0"))
Output:
Error: multcomp:::expression2coef::walkCode::eval: within ‘UI_2^2’, the term
‘UI_2’ must not denote an effect. Apart from that, the term must evaluate to
a real valued constant
I don't know why it gives this error message. How do I pass the I(UI_2^2) coefficient to glht()?
Thank you very much.
The issue seems to be that I(UI_2^2) is interpreted as an expression in R, in the same fashion as when you used it in the model formula LnModel_1 <- lm(formula = PROF ~ UI_1 + UI_2 + I(UI_1^2) + UI_1:UI_2 + I(UI_2^2)).
Therefore, you should tell R to treat it as a literal coefficient name by backtick-quoting it inside the string:
slope <- glht(LnModel_1, linfct = c("UI_2 + UI_1:UI_2*2.5 + 2*2.5*`I(UI_2^2)` = 0"))
Check my example (since I cannot reproduce your problem):
library(multcomp)
library(data.table)  # for copy() and setnames()
cars <- copy(mtcars)
setnames(cars, "disp", "UI_2")
model <- lm(mpg ~ I(UI_2^2), data = cars)
names(coef(model))
slope <- glht(model, linfct = c("2*2.5*`I(UI_2^2)` = 0"))
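If the backtick quoting feels fragile, an alternative worth noting is to pass glht() a numeric contrast matrix instead of a character string; a sketch for the small example model above, whose only coefficients are (Intercept) and I(UI_2^2):
# contrast 0*(Intercept) + 5*I(UI_2^2), i.e. the hypothesis 2*2.5*I(UI_2^2) = 0
K <- matrix(c(0, 5), nrow = 1, dimnames = list("2*2.5*I(UI_2^2)", names(coef(model))))
summary(glht(model, linfct = K))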

Updating model factor levels within function not working properly

I'm running some GLM models in R on some data related to feeding trials I am doing. I'm regressing my variables of interest on two predictors: one factor with three levels and one continuous variable. I want to compare the intercepts for each level of the factor to one another to determine if they're different. To do this, I wrote a function (called interceptCompare in the reproducible code below) which relevels the factor, updates the model, and then saves the results of each model. It's my quick way of doing all the pair-wise comparisons of the intercepts.
The problem is that when I run the function, it doesn't appear to properly update the model. Each item of the list returned is the same, when they should be changing so that each item has a different level of the factor as the "(Intercept)" that the other levels are being compared against. I suspect it has something to do with the environment of the function, but I'm not sure. I haven't been able to find a similar example on stackoverflow or google.
Here's what should be a reproducible example:
food <- as.factor(rep(c("a", "b", "c"), each = 20))
variable <- rbinom(60, 1, 0.7)
movement <- rgamma(60, 10, 2)
binomial.model <- glm(variable ~ food, family = "binomial")
gamma.model <- glm(movement ~ food, family = Gamma)
interceptCompare <- function(model, factor) {
  results <- list()  # empty list to store results
  for (i in unique(factor)) {
    factor <- relevel(factor, ref = i)
    model <- update(model)
    results[[i]] <- summary(model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
interceptCompare(binomial.model, food)
interceptCompare(gamma.model, food)
You will need to add one line in order to change the data stored in the model, and then use that data within update():
interceptCompare <- function(model, factor) {
  results <- list()  # empty list to store results
  s <- deparse(substitute(factor))  # ADD THIS LINE
  for (i in unique(factor)) {
    factor <- relevel(factor, ref = i)
    model[["model"]][[s]] <- factor                    # CHANGE THE DATA IN THE MODEL
    model <- update(model, data = model[["model"]])    # UPDATE THE MODEL
    results[[i]] <- summary(model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
interceptCompare(binomial.model, food)
$a
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.3863 0.5590 2.4799 0.0131
foodb -0.7673 0.7296 -1.0516 0.2930
foodc -0.2877 0.7610 -0.3780 0.7054
$b
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.6190 0.4688 1.3205 0.1867
fooda 0.7673 0.7296 1.0516 0.2930
foodc 0.4796 0.6975 0.6876 0.4917
$c
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0986 0.5164 2.1275 0.0334
foodb -0.4796 0.6975 -0.6876 0.4917
fooda 0.2877 0.7610 0.3780 0.7054
interceptCompare(gamma.model, food)
$a
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2246 0.0156 14.3919 0.0000
foodb -0.0170 0.0213 -0.8022 0.4257
foodc -0.0057 0.0218 -0.2608 0.7952
$b
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2076 0.0144 14.3919 0.0000
fooda 0.0170 0.0213 0.8022 0.4257
foodc 0.0114 0.0210 0.5421 0.5898
$c
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2189 0.0152 14.3919 0.0000
foodb -0.0114 0.0210 -0.5421 0.5898
fooda 0.0057 0.0218 0.2608 0.7952
You need to be much more careful when trying to swap out symbols in formulas, which is what you are doing here; you need to put it in terms that the R language can understand. You want to pass in the name "food", not the values stored in the food vector, as you are doing now. Here's an update that seems to do what you were trying to do:
interceptCompare <- function(model, factor) {
  sym <- substitute(factor)
  results <- list()  # empty list to store results
  for (i in unique(factor)) {
    change <- eval(bquote(~ . - .(sym) + relevel(.(sym), ref = .(i))))
    new_model <- update(model, change)
    results[[i]] <- summary(new_model)$coefficients[1:3, ]
  }
  results <- lapply(results, function(x) round(x, 4))
  return(results)
}
Here we capture the name "food" with substitute(). Then we use bquote() to build a formula update that removes the original variable and adds a releveled version with the chosen reference level. We then save the result to a new object so we don't keep updating the same model. For the binomial.model, this returns
$`a`
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.4879 1.7364 0.0825
relevel(food, ref = "a")b 0.0000 0.6901 0.0000 1.0000
relevel(food, ref = "a")c -0.8473 0.6619 -1.2801 0.2005
$b
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8473 0.4879 1.7364 0.0825
relevel(food, ref = "b")a 0.0000 0.6901 0.0000 1.0000
relevel(food, ref = "b")c -0.8473 0.6619 -1.2801 0.2005
$c
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.0000 0.4472 0.0000 1.0000
relevel(food, ref = "c")a 0.8473 0.6619 1.2801 0.2005
relevel(food, ref = "c")b 0.8473 0.6619 1.2801 0.2005
You can see how the ref = value changed at each iteration.
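If the end goal is simply all pairwise comparisons of the factor levels, an alternative is multcomp's mcp() interface, which does every comparison in one call without refitting; a sketch using the models above:
library(multcomp)
# all pairwise differences between the food levels, with a multiplicity adjustment
summary(glht(binomial.model, linfct = mcp(food = "Tukey")))
summary(glht(gamma.model, linfct = mcp(food = "Tukey")))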

Extract regression coefficient values

I have a regression model for some time series data investigating drug utilisation. The purpose is to fit a spline to a time series and work out 95% CI etc. The model goes as follows:
library(splines)  # for bs()
id <- ts(1:length(drug$Date))
a1 <- ts(drug$Rate)
a2 <- lag(a1 - 1)
tg <- ts.union(a1, id, a2)
mg <- lm(a1 ~ a2 + bs(id, df = df1), data = tg)
The summary output of mg is:
Call:
lm(formula = a1 ~ a2 + bs(id, df = df1), data = tg)
Residuals:
Min 1Q Median 3Q Max
-0.31617 -0.11711 -0.02897 0.12330 0.40442
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.77443 0.09011 8.594 1.10e-11 ***
a2 0.13270 0.13593 0.976 0.33329
bs(id, df = df1)1 -0.16349 0.23431 -0.698 0.48832
bs(id, df = df1)2 0.63013 0.19362 3.254 0.00196 **
bs(id, df = df1)3 0.33859 0.14399 2.351 0.02238 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I am using the Pr(>|t|) value of a2 to test if the data under investigation are autocorrelated.
Is it possible to extract this value of Pr(>|t|) (in this model 0.33329) and store it in a scalar to perform a logical test?
Alternatively, can it be worked out using another method?
A summary.lm object stores these values in a matrix called 'coefficients'. So the value you are after can be accessed with:
a2Pval <- summary(mg)$coefficients[2, 4]
Or, more generally and readably, coef(summary(mg))["a2", "Pr(>|t|)"]. See here for why this method is preferred.
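To run the logical test mentioned in the question, the stored value can then be compared against a threshold; a minimal sketch:
a2Pval <- coef(summary(mg))["a2", "Pr(>|t|)"]
autocorrelated <- a2Pval < 0.05  # TRUE if the a2 term is significant at the 5% level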
The package broom comes in handy here (it uses the "tidy" format).
tidy(mg) will give a nicely formatted data.frame with coefficients, t statistics, etc. It also works for other models (e.g. plm, ...).
Example from broom's github repo:
lmfit <- lm(mpg ~ wt, mtcars)
require(broom)
tidy(lmfit)
term estimate std.error statistic p.value
1 (Intercept) 37.285 1.8776 19.858 8.242e-19
2 wt -5.344 0.5591 -9.559 1.294e-10
is.data.frame(tidy(lmfit))
[1] TRUE
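The same p-value can be pulled out of the tidy output with ordinary data-frame subsetting; a sketch:
td <- tidy(mg)
td$p.value[td$term == "a2"]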
Just pass your regression model into the following function:
plot_coeffs <- function(mlr_model) {
  coeffs <- coefficients(mlr_model)
  mp <- barplot(coeffs, col = "#3F97D0", xaxt = "n", main = "Regression Coefficients")
  lablist <- names(coeffs)
  text(mp, par("usr")[3], labels = lablist, srt = 45, adj = c(1.1, 1.1), xpd = TRUE, cex = 0.6)
}
Use as follows:
model <- lm(Petal.Width ~ ., data = iris)
plot_coeffs(model)
To answer your question, you can explore the contents of the model's output by saving the model as a variable and clicking on it in the Environment window. You can then click around to see what it contains and what is stored where.
Another way is to type yourmodelname$ and select the components one by one to see what each contains. Note that yourmodelname$coefficients holds only the estimates; the standard errors, t values, and p-values live in summary(yourmodelname)$coefficients.
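For example, a small sketch with the model above:
coef(mg)                   # estimates only
summary(mg)$coefficients   # matrix of estimates, standard errors, t values and p-values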
