R: Stepwise regression with multiple orthogonal polynomials

I am currently looking for an "optimal" fit for some data. I would like to use AIC-based stepwise regression to find the "best" polynomial regression for my outcome (y) with three variables (a, b, c) and a maximum degree of 3. I also have interactions.
If I use:
lm_poly <- lm(y ~ a + I(a^2) + I(a^3) + b + I(b^2) + I(b^3) + c + a:b, my_data)
stepAIC(lm_poly, direction = "both")
I will get collinearities due to the use of the raw I(a^2), I(a^3), ... terms. This shows in the beta regression coefficients of the final fit: some exceed 1 in absolute value.
Is there a possibility to do stepwise regression with orthogonal terms?
Using poly() would be nice, but I just don't understand how to do stepwise regression with poly().
lm_poly2 <- lm(y ~ poly(a,3) + poly(b,3) + c + a:b, my_data)
stepAIC(lm_poly2, direction = "both")
This will not include steps with a, a^2 (and b, respectively) and thus will not find the results I am looking for.
(I know that I might still have collinearities due to the interaction a:b.)
I hope someone can understand my point.
Thank you in advance.
Jans
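One possible workaround (a sketch, not from the original post): compute the orthogonal polynomial columns with poly() up front and store them as separate variables, so that stepAIC() can add or drop each degree on its own. The column names a1...a3, b1...b3 and the object lm_poly3 are made up for illustration; a1:b1 stands in for the original a:b interaction on the centred and scaled degree-1 columns.
library(MASS)

pa <- poly(my_data$a, 3)   # orthogonal columns for degrees 1-3 of a
pb <- poly(my_data$b, 3)   # same for b
my_data$a1 <- pa[, 1]; my_data$a2 <- pa[, 2]; my_data$a3 <- pa[, 3]
my_data$b1 <- pb[, 1]; my_data$b2 <- pb[, 2]; my_data$b3 <- pb[, 3]

lm_poly3 <- lm(y ~ a1 + a2 + a3 + b1 + b2 + b3 + c + a1:b1, data = my_data)
stepAIC(lm_poly3, direction = "both")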

Related

Do a linear regression on a non-linear function in R

I'm not quite sure I understand how a non-linear regression can be analysed in R.
I found that I can still do a linear regression with lm by taking the log of my functions.
My first function is this one, where β1 is the intercept, β2 is the slope and ε is the error term:
I think that the following command gives me what I want:
result <- lm(log(Y) ~ log(X1), data=dataset)
The problem is with the following function: Y = b0*exp(b1*X1 + epsilon)
I don't know what I should put inside lm in order to perform a linear regression on this function... Any idea?
The following math shows how to transform your equation into a linear regression:
Y = b0*exp(b1*X1 + epsilon)
log(Y) = log(b0) + b1*X1 + epsilon
log(Y) = c0 + b1*X1 + epsilon
So in R this is just
lm(log(Y) ~ X1, data = your_data)
You won't get a direct estimate for b0, though, just an estimate of log(b0). But you can back-transform the intercept and its confidence intervals by exponentiating them.
b0_est <- exp(coef(fitted_model)["(Intercept)"])
b0_ci <- exp(confint(fitted_model)["(Intercept)", ])

Compare beta coefficients of the same regression

Is there a way to compare (standardized) beta coefficients of one sample and regression without generating two models and conducting an ANOVA? Is there a simpler method, e.g. a single function?
For example, if I have this model and wanted to compare the beta coefficients of SE_gesamt and CE_gesamt (only two variables):
library(lm.beta)
fit1 <- lm(Umint_gesamt ~ Alter + Geschlecht_Dummy + SE_gesamt + CE_gesamt + EmoP_gesamt + Emp_gesamt + IN_gesamt + DN_gesamt + SozID_gesamt, data=dataset)
summary(fit1)
lm.beta(fit1)
All the best,
Karen
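One possible way to do this with a single test (a sketch, not from the original thread; it assumes every column of dataset is numeric so that scale() is meaningful): refit the model on standardized variables and test the equality of the two coefficients with car::linearHypothesis().
library(car)

dataset_std <- as.data.frame(scale(dataset))   # standardize all variables
fit1_std <- lm(Umint_gesamt ~ Alter + Geschlecht_Dummy + SE_gesamt + CE_gesamt +
                 EmoP_gesamt + Emp_gesamt + IN_gesamt + DN_gesamt + SozID_gesamt,
               data = dataset_std)
linearHypothesis(fit1_std, "SE_gesamt = CE_gesamt")   # Wald test of equal standardized betas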

manually backward elimination when intercept is not significant in R?

I am using the following code to do Backward Elimination:
regressor = lm (formula = y ~ a + b + c + d + e, data = dataset)
summary (regressor)
then remove the predictor with the highest p-value,
e.g. if c has the largest p-value, then
regressor = lm (formula = y ~ a + b + d + e, data = dataset)
summary (regressor)
and repeat until all remaining variables have a p-value below the significance level.
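As an aside (not part of the original question), the removal step does not require retyping the whole formula; update() can drop a term from the existing fit:
regressor <- update(regressor, . ~ . - c)   # drop c, the predictor with the largest p-value
summary(regressor)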
But I encounter a problem here: I found that the intercept has the largest p-value, and I cannot specify or remove it in "regressor".
Could someone help me out here, please?
It seems like what you're asking is how to run a regression without an intercept? You can do so by using Y ~ x1 + x2 + ... - 1 as your formula in lm().
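Applied to the example above, a minimal sketch (the - 1 can equivalently be written as + 0):
regressor <- lm(y ~ a + b + d + e - 1, data = dataset)   # model without an intercept
summary(regressor)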

Model Fit statistics for a Logistic Regression

I'm running a logistic regression model in R. I've used both the Zelig and car packages. However, I'm wondering if there is a simple way to get the model fit statistics for the model (pseudo R-squared, chi-squared, log-likelihood, etc.).
Assume glm1 is your model and your sample size is n = 100.
Here are some goodness-of-fit-measures:
R2 <- 1 - ((glm1$deviance/-2)/(glm1$null.deviance/-2))   # ratio of log-likelihoods
cat("McFadden R2 = ", R2, "\n")
R2 <- 1 - exp((glm1$deviance - glm1$null.deviance)/n)
cat("Cox-Snell R2 = ", R2, "\n")
R2 <- R2/(1 - exp(-glm1$null.deviance/n))
cat("Nagelkerke R2 = ", R2, "\n")
AIC <- glm1$deviance + 2 * 2   # deviance + 2k, with k = 2 parameters assumed; compare glm1$aic
cat("AIC = ", AIC, "\n")
This gives you an overview of how these goodness-of-fit measures are calculated.
Typically this is done using the summary() function.
It's hard to answer this question without knowing what the model object is. I'm not sure what Zelig produces.
I would look at names(model), names(summary(model)) or names(anova(model,test = "Chisq")) to see if the stats you want are there. I know that for log-likelihood, logLik(model) will give you what you want.
While I'm no expert, model fit statistics for logistic regression models are not as straightforward to interpret as those for linear regression. Assuming you have a binary response, one method I've found useful is to group your data by predicted probability interval (0-10%, 10%-20%, ..., 90%-100%) and compare the actual event rates to the predicted ones; see the sketch below. This is very helpful because your model will often over-predict at the low end or under-predict at the high end, and checking this may lead to a better model as well.
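A minimal sketch of that binning check (not from the original answer; it assumes glm1 is a fitted binomial glm as above):
pred <- predict(glm1, type = "response")                      # predicted probabilities
bins <- cut(pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
data.frame(mean_predicted = tapply(pred, bins, mean),         # average prediction per bin
           observed_rate  = tapply(glm1$y, bins, mean))       # observed event rate per bin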
Have a look at the pscl package. Be careful, however, with missing data:
library(MASS)
library(pscl)
admit_2 <- admit
admit_2$gre.quant[sample(1:106, 45)] <- NA
m0 <- MASS::polr(score ~ gre.quant + gre.verbal + ap + pt + female,
                 Hess = TRUE, data = admit_2, method = "probit")
m1 <- MASS::polr(score ~ gre.quant + gre.verbal + ap + pt + female,
                 Hess = TRUE, data = na.omit(admit_2), method = "probit")
pR2(m0)
llh llhNull G2 McFadden r2ML r2CU
-57.4666891 -151.0299826 187.1265870 0.6195015 0.9534696 0.9602592
pR2(m1)
llh llhNull G2 McFadden r2ML r2CU
-57.4666891 -83.3891852 51.8449922 0.3108616 0.5725500 0.6123230
Note that both fits report the same llh but very different llhNull values: with the NAs left in (m0), pR2() refits the intercept-only model on all available rows, while the full model was fitted only on the complete cases, which inflates the pseudo-R² values. Also, have a look here:
https://stats.stackexchange.com/questions/8511/how-to-calculate-pseudo-r2-from-rs-logistic-regression

Constrained least squares

I am fitting a simple regression in R on gas usage per capita. The regression formula looks like:
gas_b <- lm(log(gasq_pop) ~ log(gasp) + log(pcincome) + log(pn) +
log(pd) + log(ps) + log(years),
data=gas)
summary(gas_b)
I want to include a linear constraint that the coefficients of log(pn), log(pd) and log(ps) sum to one. Is there a simple way of implementing this (possibly in the lm function) in R without having to use the constrOptim() function?
Modify your regression as follows; this substitutes the constraint (the log(ps) coefficient equals 1 minus the log(pn) and log(pd) coefficients) into the model and moves log(ps) to the left-hand side:
gas_b <- lm(log(gasq_pop) - log(ps) ~ log(gasp) + log(pcincome) +
I(log(pn)-log(ps)) + I(log(pd)-log(ps)) + log(years), data=gas)
summary(gas_b)
If b=coef(gas_b), then the relevant coefficients are
log(pn): b[4]
log(pd): b[5]
log(ps): 1 - b[4] - b[5]
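For completeness, a small sketch (using the fit above) that recovers the implied log(ps) coefficient:
b <- coef(gas_b)
b_ps <- 1 - b[4] - b[5]   # implied coefficient on log(ps) under the constraint
b_ps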
