Manual backward elimination when the intercept is not significant in R

I am using the following code to do backward elimination:
regressor = lm(formula = y ~ a + b + c + d + e, data = dataset)
summary(regressor)
Then I remove the predictor with the highest p-value.
E.g., if c has the largest p-value, then:
regressor = lm(formula = y ~ a + b + d + e, data = dataset)
summary(regressor)
I repeat this until all remaining variables have p-values below the significance level.
But I have run into a problem: the intercept has the largest p-value, and I cannot specify or remove it in "regressor".
Could someone help me out here, please?

It seems like what you're asking is how to run a regression without an intercept. You can do that by using y ~ x1 + x2 + ... - 1 as your formula in lm().
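For example, a minimal sketch using the variable names from the question (dataset and the predictor names are placeholders from the post):
regressor <- lm(y ~ a + b + d + e - 1, data = dataset)  # "- 1" (or "0 +") drops the intercept
summary(regressor)  # the coefficient table no longer contains an "(Intercept)" row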

Related

How do I set y for glmnet with several predictors

I am supposed to find the intercept term using a ridge regression model.
"Use Ridge Regression with alpha = 0 and lambda = 0, divorce as the response and all the other variables as predictors."
I know I'm supposed to convert my data to matrix form and then transform it to fit the glmnet function. I've converted my response to matrix form, but I'm not sure how to convert all my predictors into matrix form, too.
set.seed(100)
require(faraway)
require(leaps)
require(glmnet)
mydata = divusa
mymodel = lm(divorce ~ year + unemployed + femlab + marriage + birth +
military, data=mydata)
summary(mymodel)
.
.
.
y = model.matrix(divorce~.,mydata)
Can anyone help with the code for my x variable? I'm very new to R and finding it very hard to understand it.
Your y = model.matrix(divorce~.,mydata) actually created your predictor matrix (usually called X). Try
X = model.matrix(divorce~.,mydata)
y = mydata$divorce
glmnet(X,y)
glmnet(X,y,alpha=0,lambda=0)
I think if you set lambda=0 you're actually doing ordinary regression (i.e., you're setting the penalty to zero, so ridge -> OLS).
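Putting it together, a rough sketch (assuming the divusa data from faraway as in the question; the intercept column that model.matrix() adds is dropped here because glmnet fits its own intercept):
library(faraway)
library(glmnet)
mydata <- divusa
X <- model.matrix(divorce ~ ., mydata)[, -1]  # drop the "(Intercept)" column
y <- mydata$divorce
ridge_fit <- glmnet(X, y, alpha = 0, lambda = 0)
coef(ridge_fit)  # the first entry is the intercept term the assignment asks for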

How to find the value of a covariate corresponding to a probability of 0.5 in logistic regression

So, I have a binomial glm with two predictors, the second being a factor with two levels (50, 250).
model <- glm(GetResp.RESP ~ speed + PadLen, family = binomial(link = "logit"), data = myData)
The plot for it looks like this:
My question: how can I find the covariate value (ball speed) that corresponds to a probability of 0.5 for each level of the second predictor?
For example, I've tried using the function dose.p(), from the package 'MASS':
dose.p(model, p = 0.5)
and I get
p = 0.5: 36.9868
which, judging by the plot, would be the value for the first level (50). Now, how can I find it for the second level (250) as well?
Thank you.
dput(myData):
https://pastebin.com/7QYXwwa4
Since this is a logistic regression, you're fitting the function:
log(p/(1-p)) = b0 + b1*speed + b2*PadLen
where p is the probability of GetResp.RESP being equal to 1, b0, b1, and b2 are the regression coefficients, and PadLen is a dummy variable equal to zero when myData$PadLen is 50 and equal to 1 when myData$PadLen is 250.
So you can solve for the speed at p = 0.5:
log(1) = b0 + b1*speed + b2*PadLen
b1*speed = log(1) - b0 - b2*PadLen
speed = (log(1) - b0 - b2*PadLen)/b1
Since log(1) = 0, this reduces to:
speed = (-b0 - b2*c(0,1))/b1
Or, putting in the actual coefficient values:
speed = (-coef(model)[1] - coef(model)[3]*c(0,1))/coef(model)[2]
To solve for speed at other probabilities, just keep the log-odds factor in the equation and enter whatever value of p you desire:
speed = (log(p/(1-p)) - coef(model)[1] - coef(model)[3]*c(0,1))/coef(model)[2]
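As an illustrative sketch (speed_at_p is a made-up helper name; it assumes model was fit as in the question, with the intercept, speed, and the PadLen dummy as the first, second, and third coefficients):
speed_at_p <- function(model, p = 0.5) {
  b <- coef(model)
  (log(p / (1 - p)) - b[1] - b[3] * c(PadLen50 = 0, PadLen250 = 1)) / b[2]
}
speed_at_p(model, 0.5)  # returns one speed per PadLen level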

R: Stepwise regression with multiple orthogonal polynomials

I am currently looking for an "optimal" fit for some data. I would like to use AIC-based stepwise regression to find the "best" polynomial regression for my outcome (y) with three variables (a, b, c) and polynomial terms up to degree 3. I also have interactions.
If I use:
lm_poly <- lm(y ~ a + I(a^2) + I(a^3) + b + I(b^2) + I(b^3) + c + a:b, my_data)
stepAIC(lm_poly, direction = "both")
I will get collinearity due to the use of the I(i^j) terms. This shows up in the beta regression coefficients of the final fit: some have absolute values greater than 1.
Is there a possibility to do stepwise regression with orthogonal terms?
Using poly() would be nice, but I just don't understand how to do stepwise regression with poly().
lm_poly2 <- lm(y ~ poly(a,3) + poly(b,3) + c + a:b, my_data)
stepAIC(lm_poly2, direction = "both")
This will not include steps with a, a^2 (and b respectively), and thus will not find the results I am looking for.
(I know that I might still have collinearity due to the interaction a:b.)
I hope someone can understand my point.
Thank you in advance.
Jans

ARIMA model with nonlinear exogenous variable in R

I'm doing a non-linear regression in R and want to add one moving-average term to my model to eliminate the autocorrelation in the residuals.
Basically, here is the model:
y[n] = a + log((x1[n])^g + (x2[n])^g) + c*e[n-1] + e[n]
where e is the moving-average error term.
I plan to use ARIMA(0, 0, 1) to model the residuals. However, I do not know which function in R lets me add a non-linear exogenous part to an ARIMA model.
More information: I know how to use the nls command to estimate a and g, but I do not know how to deal with e[n].
I know that xreg in arima can handle an ARIMA model with linear exogenous variables. Is there a similar function for an ARIMA model with nonlinear exogenous variables?
Thank you for the help in advance!
nlme has this capability, as it fits non-linear mixed models. You can think of it as an extension of nls (a fixed-effects-only non-linear regression) that allows random effects and correlated errors.
nlme can handle ARMA correlation with something like correlation = corARMA(0.2, form = ~ 1, p = 0, q = 1, fixed = FALSE). This means the residuals are an MA(1) process, with an initial coefficient guess of 0.2 that is updated during model fitting. The form = ~ 1 means the MA(1) structure applies to the series as a whole, with no further grouping structure.
I am not an expert in nlme, but I know nlme is what you need. I produce the following example, but since I am not an expert, I can't get nlme to work at the moment. I post it here to give a start / flavour.
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
## MA(1) correlated error, with innovation standard deviation 0.1
e <- arima.sim(model = list(ma = 0.5), n = 100, sd = 0.1)
## a true model, with `a = 0.2, g = 0.5`
y0 <- 0.2 + log(x1 ^ 0.5 + x2 ^ 0.5)
## observations
y <- y0 + e
## no need to install; it comes with R; just `library()` it
library(nlme)
fit <- nlme(y ~ a + log(x1 ^ g + x2 ^ g), fixed = a + g ~ 1,
start = list(a = 0.5, g = 1),
correlation = corARMA(0.2, form = ~ 1, p = 0, q = 1, fixed = FALSE))
Similar to nls, we have an overall model formula y ~ a + log(x1 ^ g + x2 ^ g), and starting values are required for the iterative process. I have chosen start = list(a = 0.5, g = 1). The correlation bit was explained at the beginning.
The fixed and random arguments in nlme specify what should be treated as fixed effects and random effects in the overall formula. Since we have no random effects, we leave random unspecified. We want a and g as fixed effects, so I tried something like fixed = a + g ~ 1. Unfortunately it does not quite work, for a reason I don't know. I read ?nlme and thought this formula meant we want a common a and g for all observations, but nlme reports an error saying this is not a valid group formula.
I am still investigating this; as I said, the above gives us a start. We are already fairly close to the final answer.
Thanks to user20650 for pointing out my awkward error. I should use the gnls function rather than nlme. By the design of the nlme package, the functions lme and nlme have to take a random argument to work. Luckily, there are several other routines in the nlme package for extending linear models and non-linear models.
gls and gnls extend lm and nls by allowing non-diagonal error covariance structures.
So, I should really use gnls instead:
## no `fixed` argument as `gnls` is a fixed-effect only
fit <- gnls(y ~ a + log(x1 ^ g + x2 ^ g), start = list(a = 0.5, g = 1),
correlation = corARMA(0.2, form = ~ 1, p = 0, q = 1, fixed = FALSE))
#Generalized nonlinear least squares fit
# Model: y ~ a + log(x1^g + x2^g)
# Data: NULL
# Log-likelihood: 92.44078
#
#Coefficients:
# a g
#0.1915396 0.5007640
#
#Correlation Structure: ARMA(0,1)
# Formula: ~1
# Parameter estimate(s):
# Theta1
#0.4184961
#Degrees of freedom: 100 total; 98 residual
#Residual standard error: 0.1050295
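As a follow-up (not part of the original output), the usual nlme extractors work on the fitted gnls object:
coef(fit)       # point estimates of a and g
intervals(fit)  # approximate confidence intervals, including the MA(1) coefficient
summary(fit)    # coefficient table and correlation-structure estimates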

`rms::ols()`: how to fit a model without intercept

I'd like to use the ols() (ordinary least squares) function from the rms package to do a multivariate linear regression, but I do not want it to calculate an intercept. With lm() the syntax would be:
model <- lm(formula = z ~ 0 + x + y, data = myData)
where the 0 stops it from calculating an intercept, and only two coefficients are returned, one for x and the other for y. How do I do this when using ols()?
Trying
model <- ols(formula = z ~ 0 + x + y, data = myData)
did not work; it still returns an intercept and a coefficient each for x and y.
Here is a link to a csv file
It has five columns. For this example, we only use the first three columns:
model <- ols(formula = CorrEn ~ intEn_anti_ncp + intEn_par_ncp, data = ccd)
Thanks!
rms::ols uses rms:::Design instead of model.frame.default. Design is called with the default of intercept = 1, so there is no (obvious) way to specify that there is no intercept. I assume there is a good reason for this, but you can try changing ols using trace.
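If you do want to experiment along those lines, one option (a sketch; edit at your own risk) is to open the function body for local editing:
library(rms)
trace(ols, edit = TRUE)  # opens ols() in an editor so the Design()/intercept handling can be inspected or tweaked
# untrace(ols) restores the original function afterwards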
