Extract p-value from multiple linear regression F-test [duplicate] - r

How do you pull out the p-value (for the significance of the coefficient of the single explanatory variable being non-zero) and R-squared value from a simple linear regression model? For example...
x = cumsum(c(0, runif(100, -1, +1)))
y = cumsum(c(0, runif(100, -1, +1)))
fit = lm(y ~ x)
summary(fit)
I know that summary(fit) displays the p-value and R-squared value, but I want to be able to stick these into other variables.

r-squared: You can return the r-squared value directly from the summary object summary(fit)$r.squared. See names(summary(fit)) for a list of all the items you can extract directly.
Model p-value: If you want to obtain the p-value of the overall regression model,
this blog post outlines a function to return the p-value:
lmp <- function (modelobject) {
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    f <- summary(modelobject)$fstatistic
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL
    return(p)
}
> lmp(fit)
[1] 1.622665e-05
In the case of a simple regression with one predictor, the model p-value and the p-value for the coefficient will be the same.
Coefficient p-values: If you have more than one predictor, then the above will return the model p-value, and the p-value for coefficients can be extracted using:
summary(fit)$coefficients[,4]
Alternatively, you can grab per-term p-values from the anova(fit) object in a similar fashion to the summary object above.
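A minimal sketch of that route, using the fit object from the question (the "Pr(>F)" column holds the per-term F-test p-values; with a single predictor the first entry equals the overall model p-value):
anova(fit)$"Pr(>F)"[1]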

Notice that summary(fit) generates an object with all the information you need. The beta, se, t and p vectors are stored in it. Get the p-values by selecting the 4th column of the coefficients matrix (stored in the summary object):
summary(fit)$coefficients[,4]
summary(fit)$r.squared
Try str(summary(fit)) to see all the info that this object contains.
Edit: I had misread Chase's answer which basically tells you how to get to what I give here.

You can see the structure of the object returned by summary() by calling str(summary(fit)). Each piece can be accessed using $. The p-value for the F statistic is more easily had from the object returned by anova.
Concisely, you can do this:
rSquared <- summary(fit)$r.squared
pVal <- anova(fit)$'Pr(>F)'[1]
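If you extract these values often, the two lines above can be wrapped in a small helper (a hypothetical convenience function, not part of base R):
extract_fit_stats <- function(fit) {
    # R-squared and overall F-test p-value as a named vector
    c(r.squared = summary(fit)$r.squared,
      p.value   = anova(fit)$"Pr(>F)"[1])
}
extract_fit_stats(fit)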

I came across this question while exploring suggested solutions for a similar problem; for future reference it may be worthwhile to add to the list of answers a solution using the broom package.
Sample code
x = cumsum(c(0, runif(100, -1, +1)))
y = cumsum(c(0, runif(100, -1, +1)))
fit = lm(y ~ x)
require(broom)
glance(fit)
Results
> glance(fit)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
1 0.5442762 0.5396729 1.502943 118.2368 1.3719e-18 2 -183.4527 372.9055 380.7508 223.6251 99
Side notes
I find the glance function useful as it neatly summarises the key values. The results are stored as a data.frame, which makes further manipulation easy:
> class(glance(fit))
[1] "data.frame"

While both of the answers above are good, the procedure for extracting parts of objects is more general.
In many cases, functions return lists, and the individual components can be accessed using str() which will print the components along with their names. You can then access them using the $ operator, i.e. myobject$componentname.
In the case of lm objects, there are a number of predefined methods one can use such as coef(), resid(), summary() etc, but you won't always be so lucky.
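For instance, both routes on the fit object from the question (a quick sketch):
coef(fit)                          # predefined extractor method
summary(fit)$r.squared             # direct component access via $
str(summary(fit), max.level = 1)   # list the components available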

Extension of @Vincent's answer:
For lm() generated models:
summary(fit)$coefficients[,4] ##P-values
summary(fit)$r.squared ##R squared values
For gls() generated models:
summary(fit)$tTable[,4] ##P-values
## R-squared values are not generated because gls uses maximum likelihood, not sums of squares
To isolate an individual p-value itself, you'd add a row number to the code:
For example, to access the p-value of the intercept in both model summaries:
summary(fit)$coefficients[1,4]
summary(fit)$tTable[1,4]
Note, you can replace the column number with the column name in each of the above instances:
summary(fit)$coefficients[1,"Pr(>|t|)"] ##lm
summary(fit)$tTable[1,"p-value"] ##gls
If you're still unsure of how to access a value from the summary table, use str() to inspect its structure:
str(summary(fit))
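To make the gls() extraction above reproducible, here is a minimal sketch, assuming the nlme package and its built-in Ovary data set:
library(nlme)
fit_gls <- gls(follicles ~ Time, data = Ovary)
summary(fit_gls)$tTable[, 4]            # all p-values
summary(fit_gls)$tTable[1, "p-value"]   # intercept p-value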

This is the easiest way to pull the p-values:
coef(summary(modelname))[, "Pr(>|t|)"]
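Since the result is a named matrix, a single term can also be picked out by name, e.g. coef(summary(fit))["x", "Pr(>|t|)"] for the slope in the question's model.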

I have used this lmp function quite a lot.
At one point I decided to add new features to enhance data analysis. I am not an expert in R or statistics, but people usually look at several pieces of information from a linear regression:
p-value
a and b (intercept and slope)
r²
and of course the shape of the point distribution
Here is a reproducible example with different variables:
Ex<-structure(list(X1 = c(-36.8598, -37.1726, -36.4343, -36.8644,
-37.0599, -34.8818, -31.9907, -37.8304, -34.3367, -31.2984, -33.5731
), X2 = c(64.26, 63.085, 66.36, 61.08, 61.57, 65.04, 72.69, 63.83,
67.555, 76.06, 68.61), Y1 = c(493.81544, 493.81544, 494.54173,
494.61364, 494.61381, 494.38717, 494.64122, 493.73265, 494.04246,
494.92989, 494.98384), Y2 = c(489.704166, 489.704166, 490.710962,
490.653212, 490.710612, 489.822928, 488.160904, 489.747776, 490.600579,
488.946738, 490.398958), Y3 = c(-19L, -19L, -19L, -23L, -30L,
-43L, -43L, -2L, -58L, -47L, -61L)), .Names = c("X1", "X2", "Y1",
"Y2", "Y3"), row.names = c(NA, 11L), class = "data.frame")
library(reshape2)
library(ggplot2)
Ex2 <- melt(Ex, id = c("X1","X2"))
colnames(Ex2)[3:4] <- c("Y","Yvalue")
Ex3 <- melt(Ex2, id = c("Y","Yvalue"))
colnames(Ex3)[3:4] <- c("X","Xvalue")
ggplot(Ex3, aes(Xvalue, Yvalue)) +
    geom_smooth(method = "lm", alpha = 0.2, size = 1, color = "grey") +
    geom_point(size = 2) +
    facet_grid(Y ~ X, scales = "free")
# Use the lmp function (defined in the earlier answer)
lmp <- function (modelobject) {
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    f <- summary(modelobject)$fstatistic
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL
    return(p)
}
# create a function to extract different pieces of information from lm
lmtable <- function (var1, var2, data, signi = NULL) {
    # var1 = y data: colnames of data as character, e.g. "Y1" or c("Y1","Y2")
    # var2 = x data: colnames of data as character, e.g. "X1" or c("X1","X2")
    # data = a data.frame with the variables in columns
    # if signi is TRUE, round the p-value to 2 digits and append
    # *** if < 0.001, ** if < 0.01, * if < 0.05
    if (!inherits(data, "data.frame")) stop("Not an object of class 'data.frame'")
    Tabtemp <- data.frame(matrix(NA, ncol = 6, nrow = length(var1) * length(var2)))
    colnames(Tabtemp) <- c("Var.y", "Var.x", "p-value", "a", "b", "r^2")
    for (i in 1:length(var2)) {
        rows <- ((length(var1) * i) - (length(var1) - 1)):(length(var1) * i)
        Tabtemp[rows, 1] <- var1
        Tabtemp[rows, 2] <- var2[i]
        for (n in 1:length(var1)) {
            # fit the model once per pair instead of refitting for each statistic
            mod <- lm(data[, var1[n]] ~ data[, var2[i]], data)
            Tabtemp[rows[1] + n - 1, 3] <- lmp(mod)
            Tabtemp[rows[1] + n - 1, 4] <- coef(mod)[1]
            Tabtemp[rows[1] + n - 1, 5] <- coef(mod)[2]
            Tabtemp[rows[1] + n - 1, 6] <- summary(mod)$r.squared
        }
    }
    if (isTRUE(signi)) {
        stars <- ifelse(Tabtemp[, 3] < 0.001, "***",
                 ifelse(Tabtemp[, 3] < 0.01, "**",
                 ifelse(Tabtemp[, 3] < 0.05, "*", "")))
        Tabtemp$"p-value" <- paste0(format(round(Tabtemp[, 3], 2), digits = 2), stars)
    }
    Tabtemp
}
# ------- EXAMPLES ------
lmtable("Y1","X1",Ex)
lmtable(c("Y1","Y2","Y3"),c("X1","X2"),Ex)
lmtable(c("Y1","Y2","Y3"),c("X1","X2"),Ex,signi=TRUE)
There is certainly a faster solution than this function but it works.
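One such faster route is the broom package from an earlier answer; a sketch that rebuilds a similar table for the example above (column names differ slightly from lmtable's output):
library(broom)
do.call(rbind, lapply(c("Y1","Y2","Y3"), function(yv)
    do.call(rbind, lapply(c("X1","X2"), function(xv) {
        g <- glance(lm(reformulate(xv, yv), data = Ex))
        data.frame(Var.y = yv, Var.x = xv,
                   p.value = g$p.value, r.squared = g$r.squared)
    }))
))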

The final p-value displayed at the end of summary() is computed by feeding the summary(fit)$fstatistic values into pf():
fstat <- summary(fit)$fstatistic
pf(fstat[1], fstat[2], fstat[3], lower.tail=FALSE)

Another option is to use the cor.test function, instead of lm:
> x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
> y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
> mycor = cor.test(x,y)
> mylm = lm(x~y)
# r and rsquared:
> cor.test(x,y)$estimate^2
cor
0.3262484
> summary(lm(x~y))$r.squared
[1] 0.3262484
# P.value
> lmp(lm(x~y)) # Using the lmp function defined in Chase's answer
[1] 0.1081731
> cor.test(x,y)$p.value
[1] 0.1081731
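As a side benefit, cor.test also returns a confidence interval for the correlation coefficient, which lm does not report directly:
> cor.test(x,y)$conf.int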

x = cumsum(c(0, runif(100, -1, +1)))
y = cumsum(c(0, runif(100, -1, +1)))
fit = lm(y ~ x)
> names(summary(fit))
[1] "call" "terms"
[3] "residuals" "coefficients"
[5] "aliased" "sigma"
[7] "df" "r.squared"
[9] "adj.r.squared" "fstatistic"
[11] "cov.unscaled"
summary(fit)$r.squared

Use:
summary(fit)$coefficients[num, 4]
where num is the row of the coefficients matrix that you want. Which row you need depends on how many features you have in your model and which one you want the p-value for. For example, if you have only one variable, there will be one p-value for the intercept at [1,4] and one for your actual variable at [2,4], so your num will be 2.
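A slightly more robust variant is to index by name instead of position, so the line keeps working if predictors are reordered (a sketch for a model with a predictor named x):
summary(fit)$coefficients["x", "Pr(>|t|)"]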

Related

How to add a q value (adjusted p-value) to a modelsummary table after pooling the results of a multinomial model over multiple imputed datasets

I am using modelsummary to display the results of several multinomial models, each pooled over 5 datasets using the mice::pool function. It works great, but I want to add the q-value / adjusted p-value for false discovery rate.
I understand I need to create a tidy_custom.mipo function to add this statistic but I can't get it to work.
Below is the code to get the 'pool_univariate' list of mipo objects, which I then pass to modelsummary. It works great, I just want to add the q-value statistic.
Any idea how to do that?
Thanks a lot!
# list of exposures
exposures <- c(
    Cs(exposure1, exposure2, exposure3)   # Cs() from Hmisc
)
## model function
models <- function(x) {
    lapply(imputed_data, function(y)
        multinom(as.formula(paste0("outcome ~ ", x)),   # multinom() from nnet
                 data = y, model = TRUE)
    )
}
## run models
models_univariate <- as.list(seq(1, length(exposures)))
models_univariate <- pblapply(exposures, models)   # pblapply() from pbapply
## pool
pool_univariate <- as.list(seq(1, length(exposures)))
# run pool
for (j in seq_along(exposures)) {
    pool_univariate[[j]] <- pool(models_univariate[[j]])
}
It is difficult to answer this question without a minimal working example. Here I give a simpler example than the original, for the linear regression context.
First, load the package and estimate a regression model:
library(modelsummary)
mod <- lm(mpg ~ hp + drat + vs + am, data = mtcars)
Second, since we want to summarize a model of class lm, we define a new method called tidy_custom.lm. This function takes a statistical model as input, and returns a data frame that conforms to the broom package specification, with one column called term and other columns containing matching statistics. In the current example, the data frame will include three new statistics (q.value, bonferroni and holm). These values are computed using R’s p.adjust function, which adjusts p values for multiple comparison:
tidy_custom.lm <- function(x, ...) {
    out <- broom::tidy(x)
    out$q.value <- p.adjust(out$p.value, n = 10, method = "fdr")
    out$bonferroni <- p.adjust(out$p.value, n = 10, method = "bonferroni")
    out$holm <- p.adjust(out$p.value, n = 10, method = "holm")
    return(out)
}
Now, we can call modelsummary with our lm model, and request the statistics:
modelsummary(mod, statistic = "q.value")
We can also compare different p values and label them nicely using glue strings:
modelsummary(mod,
    statistic = c(
        "p = {p.value}",
        "q = {q.value}",
        "p (Bonferroni) = {bonferroni}",
        "p (Holm) = {holm}"
    )
)
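For the original mipo use case the same pattern should carry over; a sketch, assuming (as the mice documentation describes) that summary() on a mipo object returns a data frame with term and p.value columns:
tidy_custom.mipo <- function(x, ...) {
    out <- as.data.frame(summary(x))   # columns include term and p.value
    out$q.value <- p.adjust(out$p.value, method = "fdr")
    return(out)
}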

Accessing dynamically created variables in R

I created some linear models and stored their parameters in variables, whose names I created in a "dynamic" way.
While the creation works, I do not know how to access, for example, the parameters afterwards in an equally "dynamic" way. This example, although completely senseless, should illustrate the point:
q = c(0.01, 0.02)
for (i in seq_along(q)) {
    assign(paste0("lm_", q[[i]]), lm(mtcars$mpg ~ mtcars$disp))
}
# access the coefficients for each linear model
for (i in seq_along(q)) {
    intercept = coef(paste0("lm_", q[[i]]))[[1]] # does not work...
    slope = coef(paste0("lm_", q[[i]]))[[2]] # does not work...
    cat("Write the coefficients to some file...")
}
So my question is how I could access the coefficients of the dynamically created linear models.
Expanding on my comment above, here's how I'd reformulate your (admittedly senseless) example. I've renamed your lm_xxxx variables because, using a list, the equivalent variable name would be lm_ or similar, which is rather too similar to the lm function for my liking.
q <- c(0.01, 0.02)
models <- lapply(
    seq_along(q),
    function(x) lm(mtcars$mpg ~ mtcars$disp)
)
names(models) <- q
intercepts <- lapply(models, function(x) coef(x)[[1]])
intercepts
slopes <- lapply(models, function(x) coef(x)[[2]])
slopes
Giving
$`0.01`
[1] 29.59985
$`0.02`
[1] 29.59985
for intercepts and
$`0.01`
[1] -0.04121512
$`0.02`
[1] -0.04121512
for slopes.
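The list approach also makes it easy to collect everything in one step, e.g. this sketch builds a coefficient table with one row per model:
coef_table <- t(sapply(models, coef))
coef_table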
Use get(). The following works as intended:
q = c(0.01, 0.02)
for (i in seq_along(q)) {
    assign(paste0("lm_", q[[i]]), lm(mtcars$mpg ~ mtcars$disp))
}
# access the coefficients for each linear model
for (i in seq_along(q)) {
    intercept = coef(get(paste0("lm_", q[[i]])))[[1]]
    slope = coef(get(paste0("lm_", q[[i]])))[[2]]
    cat("Write the coefficients to some file...")
}
#> Write the coefficients to some file...Write the coefficients to some file...
#check
intercept
#> [1] 29.59985
slope
#> [1] -0.04121512
Created on 2021-05-25 by the reprex package (v2.0.0)

Getting a subset of variables in R summary

When using the summary function in R, is there an option I can pass in there to present only a subset of the variables?
In my example, I ran a panel regression with several explanatory variables, plus many dummy variables whose coefficients I do not want to present. I suppose there is a simple way to do this, but I couldn't find it in the function documentation. Thanks
It is in the documentation, but you have to look at the associated print method for summary.plm. The argument is subset. Use it as in the following example:
library(plm)
data("Grunfeld", package = "plm")
mod <- plm(inv ~ value + capital, data = Grunfeld)
print(summary(mod), subset = c("capital"))
Assuming the regression you ran behaves similarly to the summary() of a basic lm() model:
# set up data
x <- 1:100 * runif(100, .01, .02)
y <- 1:100 * runif(100, .01, .03)
# run a very basic linear model
mylm <- lm(x ~ y)
summary(mylm)
# we can save summary of our linear model as a variable
mylm_summary <- summary(mylm)
# we can then isolate coefficients from this summary (summary is just a list)
mylm_summary$coefficients
#output:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2007199 0.04352267 4.611846 1.206905e-05
y 0.5715838 0.03742379 15.273273 1.149594e-27
# note that the class of this "coefficients" object is a matrix
class(mylm_summary$coefficients)
# output
[1] "matrix"
# we can convert that matrix into a data frame so it is easier to work with and subset
mylm_df_coefficients <- data.frame(mylm_summary$coefficients)
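From there, presenting a subset is ordinary data-frame indexing; for example, to drop the intercept row (a sketch):
mylm_df_coefficients[rownames(mylm_df_coefficients) != "(Intercept)", ]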

R: obtain coefficients&CI from bootstrapping mixed-effect model results

The working data looks like:
set.seed(1234)
df <- data.frame(y = rnorm(1:30),
                 fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = T)),
                 fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = T)),
                 x = rnorm(1:30))
The lme model is fitted as:
library(lme4)
mixed <- lmer(y ~ x + (1|fac1) + (1|fac2), data = df)
I used bootMer to run the parametric bootstrap and I can successfully obtain the coefficients (intercepts) and SEs for the fixed and random effects:
mixed_boot_sum <- function(data) {
    s <- sigma(data)
    c(beta = getME(data, "fixef"), theta = getME(data, "theta"), sigma = s)
}
mixed_boot <- bootMer(mixed, FUN = mixed_boot_sum, nsim = 100,
                      type = "parametric", use.u = FALSE)
My first question is: how do I obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping result mixed_boot?
I have no problem extracting the coefficients(slope) from mixed model by using augment function from broom package, see below:
library(broom)
mixed.coef <- augment(mixed, df)
However, it seems that broom can't deal with objects of class boot, so I can't use the above functions directly on mixed_boot.
I also tried to modify mixed_boot_sum by adding mmList (I thought this would be what I am looking for), but R complains:
Error in bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", :
bootMer currently only handles functions that return numeric vectors
Furthermore, is it possible to obtain CIs for both the fixed and random effects by specifying FUN appropriately?
I am now very confused about the correct specification of FUN to achieve what I need. Any help with my question would be greatly appreciated!
My first question is: how do I obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping result mixed_boot?
I'm not sure what you mean by "coefficients (slopes) for each individual level". broom::augment(mixed, df) gives the predictions (residuals, etc.) for every observation. If you want the predicted coefficients at each level I would try
mixed_boot_coefs <- function(fit){
    unlist(coef(fit))
}
which for the original model gives
mixed_boot_coefs(mixed)
## fac1.(Intercept)1 fac1.(Intercept)2 fac1.(Intercept)3 fac1.(Intercept)4
## -0.4973925 -0.1210432 -0.3260958 0.2645979
## fac1.(Intercept)5 fac1.x1 fac1.x2 fac1.x3
## -0.6288728 0.2187408 0.2187408 0.2187408
## fac1.x4 fac1.x5 fac2.(Intercept)1 fac2.(Intercept)2
## 0.2187408 0.2187408 -0.2617613 -0.2617613
## ...
If you want the resulting object to be more clearly named you can use:
flatten <- function(cc) setNames(unlist(cc),
                                 outer(rownames(cc), colnames(cc),
                                       function(x, y) paste0(y, x)))
mixed_boot_coefs <- function(fit){
    unlist(lapply(coef(fit), flatten))
}
When run through bootMer/confint/boot::boot.ci these functions will give confidence intervals for each of these values (note that all of the slopes facW.xZ are identical across groups because the model assumes random variation in the intercept only). In other words, whatever information you know how to extract from a fitted model (conditional modes/BLUPs [ranef], predicted intercepts and slopes for each level of the grouping variable [coef], parameter estimates [fixef, getME], random-effects variances [VarCorr], predictions under specific conditions [predict] ...) can be used in bootMer's FUN argument, as long as you can flatten its structure into a simple numeric vector.
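For the confidence-interval part of the question, a sketch using percentile bootstrap intervals (bootMer returns a boot-class object, so boot::boot.ci applies):
library(boot)
mixed_boot2 <- bootMer(mixed, FUN = mixed_boot_coefs, nsim = 100,
                       type = "parametric", use.u = FALSE)
boot.ci(mixed_boot2, index = 1, type = "perc")   # CI for the first returned value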

change null hypothesis in lmtest in R

I have a linear model generated using lm. I use the coeftest function in the lmtest package to test a hypothesis with my desired vcov from the sandwich package. The default null hypothesis is beta = 0. What if I want to test beta = 1, for example? I know I can simply take the estimated coefficient, subtract 1, and divide by the provided standard error to get the t-stat for my hypothesis. However, there must be existing functionality for this in R. What is the right way to do it?
MWE:
require(lmtest)
require(sandwich)
set.seed(123)
x = 1:10
y = x + rnorm(10)
mdl = lm(y ~ x)
z = coeftest(mdl, df=Inf, vcov=NeweyWest)
b = z[2,1]
se = z[2,2]
mytstat = (b-1)/se
print(mytstat)
The formally correct way to do this:
require(multcomp)
zed = glht(model=mdl, linfct=matrix(c(0,1), nrow=1, ncol=2), rhs=1, alternative="two.sided", vcov.=NeweyWest)
summary(zed)
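An alternative sketch uses linearHypothesis from the car package, which also accepts a vcov argument:
library(car)
linearHypothesis(mdl, "x = 1", vcov. = NeweyWest)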
Use an offset of -1*x: with the offset in the formula, the reported coefficient for x estimates beta - 1, so its default t-test is a test against beta = 1.
mdl<-lm(y~x)
mdl2 <- lm(y ~ x-offset(x) )
> mdl
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.5255 0.9180
> mdl2
Call:
lm(formula = y ~ x - offset(x))
Coefficients:
(Intercept) x
0.52547 -0.08197
You can look at summary(mdl2) to see the p-value; it tests the x coefficient against 1, using the same standard error as in mdl.
As far as I know, there is no default function to test model coefficients against an arbitrary value (1 in your case). There is the offset trick presented in the other answer, but it's not that straightforward (and always be careful with such model modifications). So your expression (b-1)/se is actually a good way to do it.
I have two notes on your code:
You can use summary(mdl) to get the t-test for 0.
You are using lmtest with a covariance structure (which will change the t-test values), but your original lm model doesn't have one. Perhaps this could be a problem? Perhaps you should use gls (from the nlme package) and specify the correlation structure from the start.
