R: "variable lengths differ" when building a linear model from residuals

I am working on a problem where I want to build a linear model from the residuals of two other linear models. I am using the UN3 data set to illustrate the problem, since it is easier to show here than with my actual data set.
Here is my R code:
head(UN3)
m1.lgFert.purban <- lm(log(Fertility) ~ Purban, data=UN3)
m2.lgPPgdp.purban <- lm(log(PPgdp) ~ Purban, data=UN3)
m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
Here is the error I am getting:
> m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
Error in model.frame.default(formula = residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban), :
variable lengths differ (found for 'residuals(m2.lgPPgdp.purban)')
I don't really understand why this error occurs. If it were a log-related issue, I should have gotten the error when building the first two models.

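Your default na.action is most likely na.omit (check with options("na.action")). This means that rows with NA values are silently dropped when each model is fitted. If the two responses are missing for different rows, the two models use different numbers of observations, so their residual vectors have different lengths. You probably want to use na.action="na.exclude", which pads the residuals with NAs so that they line up with the rows of the original data.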
library(alr3)
options("na.action")
#$na.action
#[1] "na.omit"
m1.lgFert.purban <- lm(log(Fertility) ~ Purban, data=UN3,na.action="na.exclude")
m2.lgPPgdp.purban <- lm(log(PPgdp) ~ Purban, data=UN3,na.action="na.exclude")
m3 <- lm(residuals(m1.lgFert.purban) ~ residuals(m2.lgPPgdp.purban))
m3
#Coefficients:
# (Intercept) residuals(m2.lgPPgdp.purban)
# -0.01245 -0.18127
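The difference between the two settings can be checked directly on the residual lengths. The following is a minimal sketch in base R; it uses the built-in airquality data as a stand-in for UN3 (an assumption made only so the example runs without extra packages), where Ozone and Solar.R are missing for different rows.

```r
# Stand-in data (assumption): airquality ships with base R and has NAs
# in Ozone and Solar.R, in different rows.
fit_a <- lm(Ozone ~ Wind, data = airquality, na.action = na.omit)
fit_b <- lm(Solar.R ~ Wind, data = airquality, na.action = na.omit)

# With na.omit, each model keeps only its own complete rows,
# so the residual vectors have different lengths.
c(length(residuals(fit_a)), length(residuals(fit_b)))

# With na.exclude, residuals() pads the dropped rows with NA,
# so both vectors have one entry per row of the data.
fit_a_ex <- lm(Ozone ~ Wind, data = airquality, na.action = na.exclude)
fit_b_ex <- lm(Solar.R ~ Wind, data = airquality, na.action = na.exclude)
res_a <- residuals(fit_a_ex)
res_b <- residuals(fit_b_ex)
c(length(res_a), length(res_b), nrow(airquality))

# The residual-on-residual model now fits; rows where either residual
# is NA are dropped when this third model is fitted.
fit_res <- lm(res_a ~ res_b, na.action = na.omit)
nobs(fit_res)
```

Note that the third model uses only the rows where both residuals are available, which may be fewer rows than either of the first two models used.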

Related

MuMIn dredge gam error using default na.omit

I have a global model I'm trying to dredge, but I keep getting this error:
Error in dredge(myglobalmod, evaluate = TRUE, trace = 2) :
'global.model' uses 'na.action' = "na.omit"
I tried running the global model both with na.action="na.omit" inside the gam() call and without it (since it's the default).
myglobalmod <- gam(response~ s(x1) + s(x2) + s(x3) + offset(x4), data=mydata, family="tw", na.action="na.omit")
options(na.action=na.omit)
mydredge <- dredge(myglobalmod, evaluate=TRUE, trace=2)
When I didn't include na.action="na.omit" within the gam, I got a similar error.
I then tried a subset of the data with all the NA rows removed, but got the same error.
I've gotten dredge to work before, so I'm not sure why it rejects na.omit now; I'm using the same code.
MuMIn insists that you use na.action = na.fail, in order to ensure that the same data set is used for every model (if NA values were left in the data set, different subsets could be used for different models depending on which variables were used). You can use na.omit(mydata) or mydata[complete.cases(mydata), ] to get rid of NA values before you start (assuming that the NA values in your data set occur only in variables you will be using for the full model).
> library(MuMIn)
> m1 <- lm(mpg ~ ., data = mtcars)
> d0 <- dredge(m1)
Error in dredge(m1) :
'global.model''s 'na.action' argument is not set and options('na.action') is "na.omit"
> m1 <- lm(mpg ~ ., data = mtcars, na.action = na.fail)
> d1 <- dredge(m1)
Fixed term is "(Intercept)"
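The combination of removing incomplete rows first and then fitting with na.fail can be tried without MuMIn. This is a minimal base R sketch; the airquality data and the variable names are stand-ins chosen for illustration, not the asker's data.

```r
# Stand-in data (assumption): airquality has NAs in Ozone and Solar.R.
# na.fail refuses to fit when any model variable contains NA.
err <- tryCatch(
  lm(Ozone ~ ., data = airquality, na.action = na.fail),
  error = function(e) e
)
inherits(err, "error")

# Remove incomplete rows once, up front, so every candidate model
# would be fitted to exactly the same rows.
clean <- na.omit(airquality)
fit_clean <- lm(Ozone ~ ., data = clean, na.action = na.fail)
nobs(fit_clean) == nrow(clean)
```

A model fitted this way carries na.action = na.fail in its call, which is the setting the answer above says dredge requires.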

Finding specified number of predictors using stepwise regression

I am trying to select a limited number of predictors (at most 6) from 104 variables, so I am using stepwise regression (I have 10532 values for each variable). I tried MATLAB:
mdl = stepwiselm(Pr, obs,'PEnter', 0.06)
However, it gave me about 70 variables.
I also tried to solve the problem using the R package leaps:
b <- leaps::regsubsets(obs ~ ., data=Pr, nbest=1, nvmax=6)
I get the error below:
"Error in leaps.exhaustive(a, really.big) :
Exhaustive search will be S L O W, must specify really.big=T"
I know there should be an easy way to solve this problem, but I cannot seem to figure out the proper formatting.
Thank you in advance.
Use
leaps::regsubsets(obs ~ ., data=Pr, nbest=1, nvmax=6, really.big=TRUE)
or you can try stepwise selection by AIC (note that stepAIC does not cap the number of predictors):
library(MASS)
# Fit the full model
full.model <- lm(obs ~ ., data=Pr)
# Stepwise regression model
step.model <- stepAIC(full.model, direction = "both",
                      trace = FALSE)
summary(step.model)
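If the hard limit on the number of predictors matters more than the selection criterion, one option in base R is forward selection with step(), whose steps argument bounds how many terms can be added. This is a sketch of a swapped-in technique, not the approach from the answer above; mtcars and the cap of 3 are placeholders for the asker's Pr data and limit of 6.

```r
# Placeholder data (assumption): mtcars instead of the asker's Pr/obs.
full <- lm(mpg ~ ., data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Forward selection starting from the intercept-only model.
# Each step adds at most one term, so steps = 3 caps the model
# at three predictors; selection may stop earlier if AIC stops improving.
capped <- step(null, scope = formula(full), direction = "forward",
               steps = 3, trace = 0)

n_pred <- length(coef(capped)) - 1  # predictors, excluding the intercept
n_pred
formula(capped)
```

Forward selection is greedy, so the result is not guaranteed to be the best subset of that size.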

Compare two regression models in R

age25=subset(juul,juul[,"age"]>25.00)## create a subset of age greater than 25
modelgf=lm(age25[,"igf1"]~age25[,"age"])
age20=subset(juul,juul[,"age"]<20.00)
modelgf2=lm(age20[,"igf1"]~age20[,"age"])
I tried to compare the modelgf and modelgf2 models using anova(modelgf, modelgf2). However, I get a warning message:
In anova.lmlist(object, ...) :
models with response ‘"age20[, \"igf1\"]"’ removed because response differs from model 1
Are there any other ways to compare these two models?
Here you go:
# Dummy for age > 25
juul[,"ageCat25"] <- juul[,"age"] > 25.00
# Dummy that is TRUE for age < 20 as well as for age > 25
juul[,"ageCat20"] <- ifelse(!juul[,"ageCat25"] & juul[,"age"]<20.00, TRUE, juul[,"ageCat25"])
# Both models use the same response (igf1) and the same data frame
m1 <- lm(igf1 ~ ageCat25, juul)
m2 <- lm(igf1 ~ ageCat20, juul)
anova(m1,m2)
Interpretation left to the OP.
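The warning in the question arises because anova() compares models only when they share the same response and the same rows. A different way to ask whether two groups need separate regression lines is to fit both groups in one model and test the interaction. The sketch below is a swapped-in technique, not the answerer's method, and it uses the built-in ToothGrowth data (two supp groups) as a stand-in for the two age groups in juul.

```r
# Stand-in data (assumption): ToothGrowth, with supp playing the role
# of the grouping variable and dose the role of the predictor.

# One common line for both groups.
pooled <- lm(len ~ dose, data = ToothGrowth)

# Separate intercept and slope for each group.
separate <- lm(len ~ dose * supp, data = ToothGrowth)

# Both models share the response and the rows, so anova() can compare
# them; the F test asks whether group-specific lines fit better.
cmp <- anova(pooled, separate)
cmp
```

For the original problem, this would mean building one data frame containing both age groups with a group indicator column, then fitting igf1 on age with and without the interaction.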

Creating a regression model with filter on one of the variables

I used iris data and I tried to build a regression model with a filter on one of the variables.
data(iris)
Here is my model - I wanted to see the regression results when iris$Sepal.Width>=3.0:
gg1<-lm( iris$Sepal.Length~ iris$Sepal.Width[which(iris$Sepal.Width>=3.0)])
However, I got this output from R:
Error in model.frame.default(formula = iris$Sepal.Length ~
iris$Sepal.Width[which(iris$Sepal.Width >= : variable lengths differ
(found for 'iris$Sepal.Width[which(iris$Sepal.Width >= 3)]')
Any ideas how I can set up the regression correctly?
That's because the other part of your formula, iris$Sepal.Length, isn't filtered by Sepal.Width: the response still has all 150 values while the predictor has only the filtered ones, which is why the error says the variable lengths differ.
You need to filter both:
filtered <- iris[which(iris$Sepal.Width>=3.0),]
gg1 <- lm(filtered$Sepal.Length ~ filtered$Sepal.Width)
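An equivalent way to filter both sides at once is the subset argument of lm(), combined with data= so the formula can use bare column names. A minimal sketch (the object names gg_subset and gg_filtered are illustrative):

```r
# Filter inside lm(): the condition is evaluated within the data frame
# and applied to every variable in the formula.
gg_subset <- lm(Sepal.Length ~ Sepal.Width, data = iris,
                subset = Sepal.Width >= 3.0)

# Same model fitted to a pre-filtered data frame, for comparison.
filtered <- iris[iris$Sepal.Width >= 3.0, ]
gg_filtered <- lm(Sepal.Length ~ Sepal.Width, data = filtered)

# The two approaches give the same coefficients.
all.equal(coef(gg_subset), coef(gg_filtered))
```

Using data= rather than iris$... in the formula also keeps predict() usable with new data later.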

Updating data in lm() calls

Is there an equivalent to update for the data part of an lm call object?
For example, say I have the following model:
dd = data.frame(y=rnorm(100),x1=rnorm(100))
Model_all <- lm(formula = y ~ x1, data = dd)
Is there a way of operating on the lm object to have the equivalent effect of:
Model_1t50 <- lm(formula = y ~ x1, data = dd[1:50,])
I am trying to construct some pseudo out-of-sample forecast tests, and it would be very convenient to have a single lm object and simply roll the data.
I'm fairly certain that update actually does what you want!
example(lm)
dat1 <- data.frame(group,weight)
lm1 <- lm(weight ~ group, data=dat1)
dat2 <- data.frame(group,weight=2*weight)
lm2 <- update(lm1,data=dat2)
coef(lm1)
##(Intercept) groupTrt
## 5.032 -0.371
coef(lm2)
## (Intercept) groupTrt
## 10.064 -0.742
If you're hoping for an efficiency gain from this, you'll be disappointed: R just substitutes the new arguments and re-evaluates the call (see the code of update.default). But it does make the code a lot cleaner.
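Applied to the model from the question, the same idea looks roughly like this. It is a sketch: the seed, the window starts, and the window width of 50 are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the example is reproducible
dd <- data.frame(y = rnorm(100), x1 = rnorm(100))
Model_all <- lm(formula = y ~ x1, data = dd)

# Refit the same formula on the first 50 rows.
Model_1t50 <- update(Model_all, data = dd[1:50, ])

# Roll a 50-row window over the data; each column of the result holds
# the coefficients for one window.
starts <- c(1, 26, 51)
rolling <- sapply(starts, function(s) {
  coef(update(Model_all, data = dd[s:(s + 49), ]))
})
rolling
```

Each update() call refits from scratch, so this is a convenience rather than a speedup.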
biglm objects can be updated to include more data, but not less. So you could do this in the opposite order, starting with less data and adding more. See http://cran.r-project.org/web/packages/biglm/biglm.pdf
However, I suspect you're interested in parameters estimated for subpopulations (i.e., rows 1:50 correspond to level "a" of a factor variable factrvar). In this case, you should use an interaction in your formula (~factrvar*x1) rather than subsetting to data[1:50,]. An interaction of this type gives a different effect estimate for each level of factrvar. This is more efficient than estimating each parameter separately, and it constrains any additional parameters (e.g., x2 in ~factrvar*x1 + x2) to be the same across levels of factrvar; if you fit the same model separately to different subsets, x2 would receive a separate parameter estimate each time.
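The link between an interaction model and separate subset fits can be checked numerically. In this sketch, mtcars and its am column stand in for the data and for factrvar (an assumption for illustration); with no extra shared terms such as x2, the per-group intercepts and slopes from the interaction model match the subset fits.

```r
# Stand-in data (assumption): am plays the role of factrvar, wt of x1.
mt <- transform(mtcars, am = factor(am))

# One model with group-specific intercept and slope.
joint <- lm(mpg ~ am * wt, data = mt)

# Separate fits on each subset.
fit0 <- lm(mpg ~ wt, data = mt, subset = am == "0")
fit1 <- lm(mpg ~ wt, data = mt, subset = am == "1")

# Recover each group's line from the interaction model.
b <- coef(joint)
group0 <- c(b[["(Intercept)"]], b[["wt"]])
group1 <- c(b[["(Intercept)"]] + b[["am1"]], b[["wt"]] + b[["am1:wt"]])

all.equal(group0, unname(coef(fit0)))
all.equal(group1, unname(coef(fit1)))
```

The point estimates agree, but the standard errors generally differ because the joint model pools the residual variance across groups.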
