How to add one variable at a time into the regression model? - r

I have a question about how to add one variable at a time to a regression model so that I can evaluate the adjusted R-squared at each step.
For example,
lm(y~x1)
next time, I want to do
lm(y~x1+x2)
and then,
lm(y~x1+x2+x3)
I tried paste, but it does not work; for example, lm(y ~ paste("x1", "x2", sep = "+")).
Any idea?

Assume you have fitted a linear regression model with three variables: x1, x2 and x3
lm.fit1 = lm(y ~ x1 + x2 + x3)
Introducing an additional variable (x4) can be achieved by using the update function:
lm.fit2 = update(lm.fit1, .~. + x4)
You could even introduce an interaction term if required:
lm.fit3 = update(lm.fit2, .~. + x2:x3)
Further details on adding variables to regression models can be obtained here
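If you want to build the formulas programmatically (which is what the paste() attempt was aiming at), here is a minimal sketch using reformulate(); the data frame name dat is an assumption:
vars <- c("x1", "x2", "x3")
fits <- vector("list", length(vars))
for (k in seq_along(vars)) {
  f <- reformulate(vars[1:k], response = "y")  # y ~ x1, then y ~ x1 + x2, ...
  fits[[k]] <- lm(f, data = dat)
}
# adjusted R-squared after each added variable
sapply(fits, function(m) summary(m)$adj.r.squared)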

Related

How can I apply interaction between two variable for multiple non-linear regression with GAM?

I have data with Y and X1, X2, which have different units, like X1 = xxxx volts and X2 = xx hours.
To make regression model with this data, I used code below.
MODEL <- gam(Y ~ s(X1) + s(X2), data = DATA, method = "REML")
It seems to work well, but I want to apply an interaction between X1 and X2 in my code.
Could I do the multiple non-linear regression using code like:
MODEL <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")
or should I use different equation to do this work?
The preferred way would be:
MODEL <- gam(Y ~ te(X1, X2), data = DATA, method = "REML")
as that entails selecting fewer smoothness parameters, so it is a slightly simpler version of the model you showed.
However, if you want to test whether the interaction is significant, or otherwise want to separate the main effects from the interaction, then yes, the model you showed is the way to go:
MODEL <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")
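As a rough sketch (not part of the original answer, using the same placeholder names), the approximate significance of the ti() term can be read from the model summary, or the two specifications can be compared directly:
library("mgcv")
MODEL_main <- gam(Y ~ s(X1) + s(X2), data = DATA, method = "REML")
MODEL_int  <- gam(Y ~ s(X1) + s(X2) + ti(X1, X2), data = DATA, method = "REML")
summary(MODEL_int)          # approximate p-value for the ti(X1, X2) smooth
AIC(MODEL_main, MODEL_int)  # lower AIC favours that specification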

Using margins command in R with quadratic term and interacted dummy variables

My objective is to create marginal effects and a plot similar to what's done in this post under "marginal effects": https://www.drbanderson.com/myresources/interpretinglogisticregressionpartii/
Since I cannot provide the actual model or actual data (data is sensitive), I will provide a generic example.
I have the following model created using the glm function:
model = glm(y ~ as.factor(x1) + x2 + I(x2^2) + x3 + as.factor(x4):as.factor(x5), data = dataFrame,family="binomial")
x2 is a continuous variable for which I want to calculate marginal effects at the average of the other continuous variable, x3, and at pre-defined values of x1, x4, and x5. For further simplification, assume x1 is a categorical variable taking values morning, afternoon, or night (thus producing two coefficients in the logit model), x4 is categorical with values left or right, and x5 is categorical with values up or down (thus x4:x5 produces coefficients for left and up, left and down, and right and up, with right and down the excluded interaction).
Similar to what is done in the post, I run the following code:
x2.inc <- seq(min(dataFrame$x2), max(dataFrame$x2), by = .1)
to get a sequence of x2 values at which to evaluate the marginal effect. Finally, I attempt to run the margins command:
x2.margins.df <- as.data.frame(summary(margins(model, at = list(x2 = x2.inc, x3 = mean(dataFrame$x3), x1 = 'morning', x4 = 'left', x5 = 'right'))))
However, running this produced the following error:
Error in attributes(.Data) <- c(attributes(.Data), attrib) :
'names' attribute [1] must be the same length as the vector [0]
Any thoughts on how I can successfully run the margins command given a) the quadratic nature of x2 in my model, and b) the interaction of terms in the model?
As a side note: I know I can calculate these things manually if I wanted to. However, for the sake of having less code and ease of reproducibility, I'd like to make this method work. Thank you for the assistance!
The README of margins (https://cran.r-project.org/web/packages/margins/readme/README.html) says that it supports logit models, so why implement something manually?
library("car")
library("plm")
data("LaborSupply", package = "plm")
model <- glm(disab ~ kids*age + kids*I(age^2), data = LaborSupply, family="binomial")
summary(margins(model))
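If you also need effects at specific covariate values, margins() accepts an at = list(...) argument; a hedged sketch on the same example (the chosen values are arbitrary):
# marginal effect of age evaluated at a few ages, holding kids at 1
summary(margins(model, variables = "age",
                at = list(age = c(30, 40, 50), kids = 1)))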

Replacing intercept with dummy variables in ARIMAX models in R

I am attempting to fit an ARIMAX model to daily consumption data in R. When I perform an OLS regression with lm(), I am able to include a dummy variable for each unit and remove the constant term (intercept) to avoid a less than full rank matrix.
lm1 <- lm(y ~ -1 + x1 + x2 + x3, data = dat)
I have not found a way to do this with arima() which forces me to use the constant term and exclude one of the dummy variables.
with(dat, arima(y, xreg = cbind(x1, x2)))
Is there a specific reason why arima() doesn't allow this, and is there a way to bypass it?
See the documentation for the argument include.mean in ?arima; it seems you want the following: arima(y, xreg = cbind(x1, x2), include.mean = FALSE).
Also be aware of the definition of the model fitted by arima(), as pointed out by @RichardHardy.
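Putting it together, a minimal sketch (the ARMA order and the dummy columns x1, x2, x3 are placeholders) that keeps all dummies and drops the constant:
with(dat, arima(y, order = c(1, 0, 0),      # placeholder ARMA order
                xreg = cbind(x1, x2, x3),   # all dummy variables
                include.mean = FALSE))      # no constant term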

R: How to run regression (glm function) with some coefficients specified?

Suppose I have one dependent variable, and 4 independent variables. I suspect only 3 of the independent variables are significant, so I use the glm(y~ x1 + x2 + x3...) function. Then I get some coefficients for these variables. Now I want to run glm(y ~ x1 + x2 + x3 + x4), but I want to specify that the x1, x2, x3 coefficients remain the same. How could I accomplish this?
Thanks!
I don't think you can fit a model where some of the independent variables have fixed parameters. What you can do is create a new variable y2 that equals the predicted value from your first model with x1 + x2 + x3, and then fit a second model y ~ y2 + x4 that includes it as an independent variable along with x4.
So basically, something like this:
m1 <- glm(y ~ x1 + x2 + x3...)
# use the predictions from the first model as a single predictor in the second
data$y2 <- predict(m1, newdata = data)
m2 <- glm(y ~ y2 + x4...)
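Here is a self-contained sketch of that two-step idea on simulated data (the data-generating process and the binomial family are made up purely for illustration):
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.3 * dat$x2 + 0.2 * dat$x3 + 0.4 * dat$x4))
m1 <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
dat$y2 <- predict(m1, newdata = dat)   # linear predictor from the first fit
m2 <- glm(y ~ y2 + x4, family = binomial, data = dat)
summary(m2)                            # y2 carries the combined x1-x3 effect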

test proportional odds assumption with 2 random variables R ordinal logistic

I'm using the package ordinal in R to run an ordinal logistic regression on a dependent variable based on a 1-5 Likert scale, and I'm trying to figure out how to test the proportional odds assumption.
My current model is y ~ x1 + x2 + x3 + x4 + x2*x3 + (1|ID) + (1|form) where x1 and x2 are dichotomous and x3 and x4 are continuous variables. (92 subjects, 4 forms).
As far as I know,
-"nominal" is not implemented in the more recent version of clmm.
-clmm2 (the older version) does not accept more than one random variable
-nominal_test() only appears to work for clm2 (without random effects at all)
For a different dv (that only has one random term and no interaction), I had used:
m1 <- clmm2 (y ~ x1 + x2 + x3, random = ID, Hess = TRUE, data = d)
m1.nom <- clmm2 (y ~ x1 + x2, random = ID, Hess = TRUE, nominal = ~x3, data = d)
m2.nom <- clmm2 (y ~ x2+ x3, random = ID, Hess = TRUE, nominal = ~ x1, data = d)
m3.nom <- clmm2 (y ~ x1+ x3, random = ID, Hess = TRUE, nominal = ~ x2, data = d)
anova (m1.nom, m1)
anova (m2.nom, m1)
anova (m3.nom, m1) # (as well as considering the output in summary(m#.nom))
But I'm not sure how to modify this approach to handle the current model (two random terms and an interaction of the fixed effects), nor am I sure that this is actually a correct way to test the proportional odds assumption in the first place. (The example in the package tutorial only has two fixed effects.)
I'm open to other approaches (be they other packages, software, or graphical approaches) that would let me test this. Any suggestions?
Even in the case of the most basic ordinal logistic regression models, the diagnostic tests for the proportional odds assumption are known to frequently reject the null hypothesis that the coefficients are the same across the levels of the ordered factor. The statistician Frank Harrell suggests here a general graphical method for examining the proportional odds assumption, which is probably your best bet. In this approach you'd just graph the linear predictions from a logit model (with random effects) for each level of the outcome and one predictor variable at a time.
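One way to implement that kind of check while keeping both random intercepts is to dichotomize the outcome at each cutpoint and compare the fixed-effect estimates across the resulting binary mixed logits. This is only a sketch (it uses lme4::glmer rather than ordinal's clmm, with the variable names from the model above):
library("lme4")
cutpoints <- 2:5   # thresholds of the 1-5 Likert outcome
fits <- lapply(cutpoints, function(k) {
  glmer(I(as.numeric(y) >= k) ~ x1 + x2 + x3 + x4 + x2:x3 + (1 | ID) + (1 | form),
        data = d, family = binomial)
})
# under proportional odds the slopes should be roughly constant across cutpoints
t(sapply(fits, fixef))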
