Impose Constraint on Intercept in Linear Regression Using R [duplicate]

This question already has answers here:
Force certain parameters to have positive coefficients in lm()
(3 answers)
Closed 1 year ago.
I have a linear regression of the form
Y = a + b1 * X1 + b2 * X2 + b3 * X4
I would like to constrain the intercept parameter a to be a >= 0
(i.e., a should be a non-negative value).
What are possible ways to do this in R? Specifically, I would be interested in solutions using the caret package.
Thank you for your answers.

A linear model.
m0 <- lm(wt ~ qsec + hp + disp, data = mtcars)
m0
#
# Call:
# lm(formula = wt ~ qsec + hp + disp, data = mtcars)
#
# Coefficients:
# (Intercept) qsec hp disp
# -2.450047 0.201713 0.003466 0.006755
Force the intercept to be zero.
m1 <- lm(wt ~ qsec + hp + disp - 1, data = mtcars)
m1
#
# Call:
# lm(formula = wt ~ qsec + hp + disp - 1, data = mtcars)
#
# Coefficients:
# qsec hp disp
# 0.0842281 0.0002622 0.0072967
You can use nls with the "port" algorithm to apply limits to the parameters (in this case, a lower bound of zero on the intercept).
m1n <- nls(wt ~ a + b1 * qsec + b2 * hp + b3 * disp,
data = mtcars,
start = list(a = 1, b1 = 1, b2 = 1, b3 = 1),
lower = c(0, -Inf, -Inf, -Inf), algorithm = "port")
m1n
# Nonlinear regression model
# model: wt ~ a + b1 * qsec + b2 * hp + b3 * disp
# data: mtcars
# a b1 b2 b3
# 0.0000000 0.0842281 0.0002622 0.0072967
# residual sum-of-squares: 4.926
#
# Algorithm "port", convergence message: relative convergence (4)
See here for other example solutions.
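If you prefer to stay in base R, the same box constraint can be sketched with optim() and its L-BFGS-B method, which supports lower/upper bounds per parameter (this is an illustrative alternative, not from the original answer):

```r
# Box-constrained least squares on mtcars: bound only the intercept at zero.
X <- cbind(1, as.matrix(mtcars[, c("qsec", "hp", "disp")]))
y <- mtcars$wt
rss <- function(beta) sum((y - X %*% beta)^2)  # residual sum of squares
fit <- optim(par = rep(1, 4), fn = rss, method = "L-BFGS-B",
             lower = c(0, -Inf, -Inf, -Inf))
fit$par  # first element is the intercept, held at >= 0
```

As with the nls fit above, the constrained intercept ends up on the boundary (zero) for these data, since the unconstrained estimate is negative.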

Related

Interpreting and plotting car::vif() with categorical variable

I am trying to use vif() from the car package to calculate VIF values after a regression based on this guide.
Without any categorical variables you get output that looks like this:
model <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
vif_values <- vif(model)
vif_values
barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue")
abline(v = 5, lwd = 3, lty = 2)
disp hp wt drat
8.209402 2.894373 5.096601 2.279547
However, the output changes if you add a categorical variable:
mtcars$cat <- sample(c("a", "b", "c"), size = nrow(mtcars), replace = TRUE)
model <- lm(mpg ~ disp + hp + wt + drat + cat, data = mtcars)
vif_values <- vif(model)
vif_values
GVIF Df GVIF^(1/(2*Df))
disp 8.462128 1 2.908974
hp 3.235798 1 1.798832
wt 5.462287 1 2.337154
drat 2.555776 1 1.598679
cat 1.321969 2 1.072273
Two questions:
1. How do I interpret this different output? Is the GVIF equivalent to the numbers output in the first version?
2. How do I make a nice bar chart with this, the way the guide shows?
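Not part of the original post, but a common approach for the plotting question is to chart the Df-adjusted column GVIF^(1/(2*Df)), which is comparable across terms regardless of Df; since that quantity is on the scale of sqrt(VIF), an ordinary VIF threshold of 5 maps to sqrt(5):

```r
library(car)  # assumes the car package is installed

set.seed(1)
mtcars$cat <- sample(c("a", "b", "c"), size = nrow(mtcars), replace = TRUE)
model <- lm(mpg ~ disp + hp + wt + drat + cat, data = mtcars)
vif_values <- vif(model)   # matrix with columns GVIF, Df, GVIF^(1/(2*Df))
adj <- vif_values[, 3]     # Df-adjusted values, one per term
names(adj) <- rownames(vif_values)
barplot(adj, main = "GVIF^(1/(2*Df))", horiz = TRUE, col = "steelblue")
abline(v = sqrt(5), lwd = 3, lty = 2)  # VIF-of-5 threshold on this scale
```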

Path diagram in r

I am trying to plot a path diagram of a Structural Equation Model (SEM) in R. I was able to plot it using semPlot::semPaths(); the output is similar to [image]. The SEM was modeled using the lavaan package.
I want a plot similar to [image], with estimates and p values. Can anyone help me out?
My suggestion would be lavaanPlot (see more of it in the author's personal website):
library(lavaan)
library(lavaanPlot)
# path model
model <- 'mpg ~ cyl + disp + hp
qsec ~ disp + hp + wt'
fit1 <- sem(model, data = mtcars)
labels1 <- list(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight") #define labels
lavaanPlot(model = fit1, labels = labels1, coefs = TRUE, stand = TRUE, sig = 0.05) #standardized regression paths, showing only paths with p<= .05
Check this example; it might be helpful:
https://rstudio-pubs-static.s3.amazonaws.com/78926_5aa94ae32fae49f3a384ce885744ef4a.html

R logistic regression extracting coefficients in a loop: error with setting up loop

I'm trying to build a logistic regression model with 3 predictors, and I have a list of IDs for each predictor like below. (using mtcars dataset as an example)
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
I want to build multiple regression models with each of these IDs used. am is a fixed variable that I want to predict, so each of my model would look like:
mod1 <- glm(am ~ mpg + mpg + drat, data=mtcars, ...)
mod2 <- glm(am ~ mpg + mpg + wt, data=mtcars, ...)
mod3 <- glm(am ~ mpg + mpg + gear, data=mtcars, ...)
...
mod5 <- glm(am ~ mpg + hp + drat, data=mtcars, ...)
...
mod9 <- glm(am ~ mpg + wt + drat, data=mtcars, ...)
...
mod36 <- glm(am ~ disp + wt + carb, data=mtcars, ...)
So in this case it would be 3*3*4 = 36 models total. I'm trying to use apply like below.
coefs_mat <- expand.grid(var1, var2, var3)
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(am ~ row[1] + row[2] + row[3]), data = mtcars,
family = "binomial",control=list(maxit=20))
})
Edit: coefs_mat looks like this:
> coefs_mat
var1 var2 var3
1 mpg mpg drat
2 cyl mpg drat
3 disp mpg drat
4 mpg hp drat
...
36 disp wt carb
This gives the following error: "object of type 'closure' is not subsettable".
I searched for other Stackoverflow posts that had similar problems, and tried this instead:
mods = apply(coefs_mat, 1, function(row) {
glm(as.formula(paste("am~", row[1] + row[2] + row[3])), data = mtcars,
family = "binomial",control=list(maxit=20))
})
But this gave another error: "Error in row[1] + row[2] : non-numeric argument to binary operator". What's causing these errors in my code?
I solved this by using sprintf.
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
coefs_mat <- expand.grid(var1, var2, var3)
vars_comb <- apply(coefs_mat, 1, function(x){paste(sort(x), collapse = '+')})
formula_vec <- sprintf("am ~ %s", vars_comb)
glm_res <- lapply(formula_vec, function(x) {
fit1 <- glm(x, data = mtcars, family = binomial("logit"))
return(fit1)
})
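A variant (not from the original answer) that avoids string pasting altogether is reformulate(), which builds a formula directly from a character vector of right-hand-side terms; coefficients can then be collected per model:

```r
var1 <- c("mpg", "cyl", "disp")
var2 <- c("mpg", "hp", "wt")
var3 <- c("drat", "wt", "gear", "carb")
coefs_mat <- expand.grid(var1, var2, var3, stringsAsFactors = FALSE)

# One fitted model per row of coefs_mat (36 in total); duplicated terms
# such as "mpg + mpg" are deduplicated by the formula machinery.
glm_res <- apply(coefs_mat, 1, function(row) {
  glm(reformulate(row, response = "am"), data = mtcars, family = binomial)
})
coef_list <- lapply(glm_res, coef)  # one coefficient vector per model
```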

How do I retrieve the equation of a 3D fit using lm()?

Suppose I have the following code to fit a hyperbolic paraboloid:
# attach(mtcars)
hp_fit <- lm(mpg ~ poly(wt, disp, degree = 2), data = mtcars)
Where wt is the x variable, disp is the y variable, and mpg is the z variable. (summary(hp_fit))$coefficients outputs the following:
>(summary(hp_fit))$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.866173 3.389734 6.7457122 3.700396e-07
poly(wt, disp, degree = 2)1.0 -13.620499 8.033068 -1.6955539 1.019151e-01
poly(wt, disp, degree = 2)2.0 15.331818 17.210260 0.8908534 3.811778e-01
poly(wt, disp, degree = 2)0.1 -9.865903 5.870741 -1.6805208 1.048332e-01
poly(wt, disp, degree = 2)1.1 -100.022013 121.159039 -0.8255431 4.165742e-01
poly(wt, disp, degree = 2)0.2 14.719928 9.874970 1.4906301 1.480918e-01
I do not understand how to interpret the varying suffixes (1.0, 2.0, 0.1, ...) attached to poly() in the coefficient names. What is the significance of these numbers, and how would I construct an equation for the hyperbolic paraboloid fit from this summary?
When you compare
with(mtcars, poly(wt, disp, degree=2))
with(mtcars, poly(wt, degree=2))
with(mtcars, poly(disp, degree=2))
the 1.0 2.0 refer to the first and second degree of wt, and the 0.1 0.2 refer to the first and second degree of disp. The 1.1 is an interaction term. You may check this by comparing:
summary(lm(mpg ~ poly(wt, disp, degree=2, raw=T), data=mtcars))$coe
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.692786e+01 7.008139762 6.6961935 4.188891e-07
# poly(wt, disp, degree=2, raw=T)1.0 -1.062827e+01 8.311169003 -1.2787937 2.122666e-01
# poly(wt, disp, degree=2, raw=T)2.0 2.079131e+00 2.333864211 0.8908534 3.811778e-01
# poly(wt, disp, degree=2, raw=T)0.1 -3.172401e-02 0.060528241 -0.5241191 6.046355e-01
# poly(wt, disp, degree=2, raw=T)1.1 -2.660633e-02 0.032228884 -0.8255431 4.165742e-01
# poly(wt, disp, degree=2, raw=T)0.2 2.019044e-04 0.000135449 1.4906301 1.480918e-01
summary(lm(mpg ~ wt*disp + I(wt^2) + I(disp^2) , data=mtcars))$coe[c(1:2, 4:3, 6:5), ]
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.692786e+01 7.008139762 6.6961935 4.188891e-07
# wt -1.062827e+01 8.311169003 -1.2787937 2.122666e-01
# I(wt^2) 2.079131e+00 2.333864211 0.8908534 3.811778e-01
# disp -3.172401e-02 0.060528241 -0.5241191 6.046355e-01
# wt:disp -2.660633e-02 0.032228884 -0.8255431 4.165742e-01
# I(disp^2) 2.019044e-04 0.000135449 1.4906301 1.480918e-01
This yields the same values. Note that I used raw=TRUE for comparison purposes.
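To actually write out the fitted surface, a sketch (using the raw-coefficient fit above) is to read the equation straight off the named coefficients and verify it against predict():

```r
fit <- lm(mpg ~ wt * disp + I(wt^2) + I(disp^2), data = mtcars)
b <- coef(fit)
# mpg = b0 + b1*wt + b2*wt^2 + b3*disp + b4*disp^2 + b5*wt*disp
surface <- function(wt, disp) {
  unname(b["(Intercept)"] + b["wt"] * wt + b["I(wt^2)"] * wt^2 +
         b["disp"] * disp + b["I(disp^2)"] * disp^2 +
         b["wt:disp"] * wt * disp)
}
# Agrees with predict() on the training data
all.equal(surface(mtcars$wt, mtcars$disp), unname(predict(fit)))
```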

looping over regression, treating a constant as a variable. ERROR: variable lengths differ

I want to loop over the inclusion / exclusion of certain variable but I ran into an error. Here's the problem with some sample data.
data(mtcars)
for(i in 0:1) {
fitlm = lm(mpg ~ cyl + i * drat, data = mtcars)
}
Error in model.frame.default(formula = mpg ~ cyl + i * drat, data = mtcars, : variable lengths differ (found for 'i')
But then this will run without a problem:
fitlm = lm(mpg ~ cyl + 0 * drat, data = mtcars)
fitlm = lm(mpg ~ cyl + 1 * drat, data = mtcars)
Why do the regressions work if there's a number multiplier of the variable, but fail if it's i?
Try using as.formula as follows:
# create an empty list to store the results
fitlm <- list()
# loop, fit the model and assign the result to a new list in fitlm
for(i in 0:1) {
fitlm[[i+1]] <- lm(as.formula(paste("mpg ~ cyl +", i, "* drat")), data = mtcars)
}
You can also use purrr::map instead of loops as follows:
fitlm <- purrr::map(c(0,1), ~lm(as.formula(paste("mpg ~ cyl +", .x, "* drat")), data = mtcars))
And the result will be:
> fitlm
[[1]]
Call:
lm(formula = as.formula(paste("mpg ~ cyl +", i, "* drat")),
data = mtcars)
Coefficients:
cyl
2.79
[[2]]
Call:
lm(formula = as.formula(paste("mpg ~ cyl +", i, "* drat")),
data = mtcars)
Coefficients:
(Intercept) cyl
37.885 -2.876
It's a bit of a hack, but you could try something of the form
fitlm = list()
for(i in 0:1) {
idrat = i*mtcars$drat
fitlm[[i+1]] = lm(mpg ~ cyl + idrat, data = mtcars)
}
which gives the result
fitlm
## [[1]]
##
## Call:
## lm(formula = mpg ~ cyl + idrat, data = mtcars)
##
## Coefficients:
## (Intercept) cyl idrat
## 37.885 -2.876 NA
##
##
## [[2]]
##
## Call:
## lm(formula = mpg ~ cyl + idrat, data = mtcars)
##
## Coefficients:
## (Intercept) cyl idrat
## 28.725 -2.484 1.872
This gets around the formula parser treating * as the crossing (interaction) operator rather than as numeric multiplication, which is also why the literal 0 and 1 versions ran without error.
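Another sketch (a further variant, not from the original answers): build the right-hand side conditionally and pass it to reformulate(), which sidesteps formula * semantics entirely:

```r
fitlm <- list()
for (i in 0:1) {
  # include drat only when i == 1
  rhs <- if (i == 1) c("cyl", "drat") else "cyl"
  fitlm[[i + 1]] <- lm(reformulate(rhs, response = "mpg"), data = mtcars)
}
names(coef(fitlm[[1]]))  # "(Intercept)" "cyl"
names(coef(fitlm[[2]]))  # "(Intercept)" "cyl" "drat"
```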
