Object not found when exporting model summary table in R

I'm trying to export model summary data to Excel in a loop. I need to export the coefficients of 2 variables (Gain and Loss). I have successfully written the coefficients of the intercept and the first variable, but R tells me the object for the third variable is not found.
My code runs the model for each participant number (PID); PIDs is the list of PIDs:
for (i in 1:length(PIDs)) {
  subject <- df[df$PID == PIDs[i], ]
  myModel <- glm(gamble ~ Gain + Loss, data = subject, family = binomial)
  summ <- summary(myModel)
  # save results
  ID[i] <- subject$PID
  intercept_coef[i] <- summ$coefficients[1, 1]
  gain_coef[i] <- summ$coefficients[2, 1]
  loss_coef[i] <- summ$coefficients[3, 1]
}
The coefficients summary table looks like the one below. I notice that the table is off, as the headers do not line up with the columns. Maybe that's the issue?
Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.4214135 3353.1375049 0.004002643 0.9968064
Gain 0.2929938 0.1635471 1.791494960 0.0732139
Loss 8.3144005 1619.8731372 0.005132748 0.9959047
Errors:
number of items to replace is not a multiple of replacement length
Error in loss_coef[i] <- summ$coefficients[3, 1] :
  object 'loss_coef' not found
What is the issue here? I can get Intercept and Gain just fine.
Thanks!

By any chance, did you forget to initialize loss_coef (check the spelling)?

The variables must be initialized before we can store values in them inside the loop. To initialize:
ID <- 0
intercept_coef <- 0
gain_coef <- 0
loss_coef <- 0
#loop
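A fuller sketch of the same fix, preallocating one slot per participant rather than starting from a length-1 vector (the model call and data frame are taken from the question; faster and more idiomatic, since R does not have to grow the vectors on each iteration):

```r
n <- length(PIDs)
ID             <- numeric(n)  # one slot per participant
intercept_coef <- numeric(n)
gain_coef      <- numeric(n)
loss_coef      <- numeric(n)

for (i in 1:n) {
  subject <- df[df$PID == PIDs[i], ]
  myModel <- glm(gamble ~ Gain + Loss, data = subject, family = binomial)
  summ    <- summary(myModel)
  ID[i]   <- PIDs[i]  # a single value, which also avoids the
                      # "number of items to replace" warning that
                      # ID[i] <- subject$PID (a whole column) triggers
  intercept_coef[i] <- summ$coefficients[1, 1]
  gain_coef[i]      <- summ$coefficients[2, 1]
  loss_coef[i]      <- summ$coefficients[3, 1]
}
```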


Store the output from a test in new variable, vector, etc

I am using adf.test on my data to check for stationarity. I wish to store the pvalue of the test in a new variable so that I can use it for further processes.
Basically I want to do something like this:
x <- adf.test(Timeseries_1)$pvalue
But this is not working! Any help?
Continuing from the above, I am trying to extract the value of MAPE from the accuracy check but am getting the following errors.
> etsfit <- ets(TS_1)
> accuracy(etsfit)
> if(accuracy(etsfit)$MAPE<10){
+ fcast <- forecast(etsfit)
+ plot(fcast)}else{print("Transformation needed")}
Error in accuracy(etsfit)$MAPE : $ operator is invalid for atomic vectors
> if(accuracy(etsfit)["MAPE"]<10){
+ fcast <- forecast(etsfit)
+ plot(fcast)}else{print("Transformation needed")}
Error in if (accuracy(etsfit)["MAPE"] < 10) { : missing value where TRUE/FALSE needed
Figured out the answer; thought it would be useful.
names(adf.test(Timeseries_1)) # to see the component names (note: p.value, not pvalue)
if(adf.test(Timeseries_1)$p.value < 0.05){print("Time series is stationary")}
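For the second error: accuracy() returns a matrix, not a list, so the $ operator fails, and accuracy(etsfit)["MAPE"] is NA because a single-bracket name on a matrix looks up a vector element rather than a column. Selecting the column by name should work; a sketch, assuming the forecast package and the ets fit from the question:

```r
library(forecast)

etsfit <- ets(TS_1)
acc  <- accuracy(etsfit)   # a matrix with a "Training set" row
mape <- acc[, "MAPE"]      # select the MAPE column by name

if (mape < 10) {
  fcast <- forecast(etsfit)
  plot(fcast)
} else {
  print("Transformation needed")
}
```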

Error when using msmFit in R

I'm trying to replicate this paper (Point Forecast Markov Switching Model for U.S. Dollar/Euro Exchange Rate, by Hamidreza Mostafei) in R. The table that I'm trying to get is on page 483. Here is a link to a pdf.
I wrote the following code and got an error at the last line:
mydata <- read.csv("C:\\Users\\User\\Downloads\\EURUSD_2.csv", header=T)
mod <- lm(EURUSD~EURUSD.1, mydata)
mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
Basically the data that's being used is EURUSD, which is the level change in monthly frequency. EURUSD.1 is the one lag variable. Both EURUSD and EURUSD.1 are in my csv file. (I'm not sure how to attach the csv file here. If someone could point that out that would be great).
I changed the EURUSD.1 values to something random and the msmFit function seemed to work. But whenever I tried using the original values, i.e. the lagged values, the error came up.
Something degenerate is happening when one variable is simply a lag of the other. Consider:
A sample data frame where Y is lagged X:
> d = data.frame(X=runif(100))
> d$Y=c(.5, d$X[-100])
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
That gives your error. Let's add a tiny bit of noise to Y and see what happens:
> d$Y=d$Y+rnorm(100,0,.000001)
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
> mod.mswm
Markov Switching Model
Call: msmFit(object = mod, k = 2, sw = c(T, T, T, T), p = 1, control = list(parallel = F))
AIC BIC logLik
4.3109 47.45234 3.84455
Coefficients:
(Intercept)(S) Y(S) X_1(S) Std(S)
Model 1 0.8739622 -22948.89 22948.83 0.08194545
Model 2 0.4220748 77625.21 -77625.17 0.21780764
Transition probabilities:
Regime 1 Regime 2
Regime 1 0.3707261 0.3886715
Regime 2 0.6292739 0.6113285
It works! Now either:
Having perfectly lagged variables causes some "divide by zero" error because it's a purely degenerate case (like having perfectly collinear variables in a linear model). A little experimenting shows that in this case the resulting output is very sensitive to how much noise you add, so I'm thinking it's on a knife-edge here. I suspect having perfectly lagged variables here leads to some singularity or degeneracy.
or
There's some bug in the function.
I have no idea what msmFit does, so that's for you to sort out.

Excluding an intercept in regsubsets (leaps package)?

I am running some model averaging procedures using the output from the regsubsets command from the leaps package. Once I exclude an intercept, I get an error message that I cannot make sense of:
Reordering variables and trying again:
Error in if (any(index[force.out] == -1)) stop("Can't force the same variable in and out") :
  missing value where TRUE/FALSE needed
This problem seems to occur only once my predictor matrix has more columns than the dependent variable has observations (which is one of the reasons for using leaps in the first place). See the example code below:
# Load the package --------------------------------------------------------
require(stats)
require(leaps)
# Some artificial data ----------------------------------------------------
y <- rnorm(20)
x1 <- rnorm(20*20)
dim(x1) <- c(20,20)
x2 <- rnorm(20*21)
dim(x2) <- c(20,21)
# Allow intercept ---------------------------------------------------------
summary(regsubsets(x1,y))$which
summary(regsubsets(x2,y))$which
# Without intercept -------------------------------------------------------
summary(regsubsets(x1,y,intercept=FALSE))$which
summary(regsubsets(x2,y,intercept=FALSE))$which
This usually happens when you have a linear dependency among the input variables; you should see a warning when you run it with intercept = TRUE.
Once you remove the linearly dependent column from the predictor matrix, you will be able to run regsubsets with intercept = FALSE. You will have to remove the linearly dependent column manually; it's usually a derived column, calculated from existing metrics.
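One way to spot such a dependency before calling regsubsets is to compare the rank of the predictor matrix with its column count; a sketch using base R's QR decomposition (x2 is the 20x21 matrix from the example, which cannot have full column rank with only 20 rows):

```r
set.seed(1)
x2 <- matrix(rnorm(20 * 21), nrow = 20, ncol = 21)

# rank < ncol means some columns are linear combinations of the others
qr(x2)$rank               # at most 20 here, i.e. less than ncol(x2) = 21
qr(x2)$rank < ncol(x2)    # TRUE: the matrix is rank-deficient
```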

Making linear models in a for loop using R programming

I have a dataset that I'll call dataset1 with a variable to predict (e.g. Price). I'm hoping to get a nice single predictor of Price based on the n other variables in the dataset. But if n is large, I can't manually make and examine all these models, so I was hoping to use something like this:
for (i in names(dataset1)) {
  model = lm(Price ~ i, dataset1)
  # Do stuff here with model, such as analyze R^2 values.
}
(I thought this would work since replacing the inside of the for loop with print(i) results in the correct names.) The error is as follows:
Error in model.frame.default(formula = Price ~ i, data = dataset1, drop.unused.levels = TRUE) :
variable lengths differ (found for 'i')
Does anyone have advice for dealing with the problem regarding how R reads in the i variable? I know how to approach this problem using other software, but I would like to get a sense of how R works.
I would go for some sort of *apply here personally:
dat <- data.frame(price=1:10,y=10:1,z=1:10)
sapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q)))[2])
y z
-1 1
or to get a list with full model results:
lapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q))))
$y
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11 1.137008e-15 9.674515e+15 1.459433e-125
q -1 1.832454e-16 -5.457163e+15 1.423911e-123
$z
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.123467e-15 2.457583e-16 4.571429e+00 1.822371e-03
q 1.000000e+00 3.960754e-17 2.524772e+16 6.783304e-129
to get the r-squared value as you mentioned:
sapply(dat[2:3], function(q) summary(lm(dat$price ~ q))$r.squared)
At the moment you're not cycling through the names. Try
for(i in 2:ncol(dataset1)) #assuming Price is column 1
Then refer to
Price ~ dataset1[, i]
in your loop.
But I'm not sure about your approach from a stats perspective.
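A third option is to build a proper formula from each column name, so that lm() sees the real variable rather than the character loop index; a sketch using base R's reformulate() (the toy data frame here is hypothetical):

```r
dataset1 <- data.frame(Price = 1:10, y = 10:1, z = rnorm(10))

for (i in setdiff(names(dataset1), "Price")) {
  f <- reformulate(i, response = "Price")  # e.g. Price ~ y
  model <- lm(f, data = dataset1)
  cat(i, "R^2:", summary(model)$r.squared, "\n")
}
```

This keeps the fitted model's formula readable (Price ~ y rather than Price ~ dataset1[, i]), which matters if you later print, update, or plot the models.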

use stepAIC on a list of models

I want to do stepwise regression using AIC on a list of linear models. The idea is to create a list of linear models and then apply stepAIC to each list element. It fails.
I tried to track the problem down and I think I found it. However, I don't understand the cause. Try the code to see the difference between the three cases:
require(MASS)
n<-30
x1<-rnorm(n, mean=0, sd=1) #create rv x1
x2<-rnorm(n, mean=1, sd=1)
x3<-rnorm(n, mean=2, sd=1)
epsilon<-rnorm(n,mean=0,sd=1) # random error variable
dat<-as.data.frame(cbind(x1,x2,x3,epsilon)) # combine to a data frame
dat$id<-c(rep(1,10),rep(2,10),rep(3,10))
# y is a combination of all three x variables and the random error variable
dat$y<-x1+x2+x3+epsilon
# apply lm() only resulting in a list of models
dat.lin.model.lst<-lapply(split(dat,dat$id),function(d) lm(y~x1+x2+x3,data=d))
stepAIC(dat.lin.model.lst[[1]]) # FAIL!!!
# apply function stepAIC(lm())- works
dat.lin.model.stepAIC.lst<-lapply(split(dat,dat$id),function(d) stepAIC(lm(y~x1+x2+x3,data=d)))
# create model for particular group with id==1
k<-which(dat$id==1) # manually select records with id==1
lin.model.id1<-lm(dat$y[k]~dat$x1[k]+dat$x2[k]+dat$x3[k])
stepAIC(lin.model.id1) # check stepAIC - works!
I am pretty sure that stepAIC() needs the original data from the data frame dat. That is what I was thinking before. (Hope I am right on that.)
But there is no parameter in stepAIC() where I can pass the original data frame. Obviously, for plain models not wrapped in a list, it's enough to pass the model (last three lines in the code). So I am wondering:
Q1: How does stepAIC know where to find the original data dat (not only the model data, which is passed as a parameter)?
Q2: How can I possibly know that there is another parameter to stepAIC() which is not explicitly stated in the help pages? (Maybe my English is just too bad to find it.)
Q3: How can I pass that parameter to stepAIC()?
It must be something in the environment of the apply function and how the data is passed on. Somewhere between lm() and stepAIC(), the pointer/link to the raw data must get lost. I don't have a good understanding of what an environment in R does. For me it was a way of isolating local from global variables, but maybe it's more complicated. Can anyone explain that to me with regard to the problem above? Honestly, I don't get much out of the R documentation. Any better understanding would help me.
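Regarding Q1: stepAIC() works by calling update(), which re-evaluates the call stored inside the fitted model; R then looks the data up by name in the environment attached to the model's formula. A minimal sketch of what is actually stored (toy data, purely illustrative):

```r
d <- data.frame(y = rnorm(10), x1 = rnorm(10))
m <- lm(y ~ x1, data = d)

m$call                   # lm(formula = y ~ x1, data = d) -- only the *name* 'd'
environment(formula(m))  # the environment where 'd' is looked up on re-evaluation
```

If that environment (e.g. the body of an lapply function) no longer exists when stepAIC re-evaluates the call, the lookup fails.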
OLD:
I have data in a data frame df that can be split into several subgroups. For that purpose I created a group ID called df$id. lm() returns the coefficients as expected for the first subgroup. I want to do a stepwise regression using AIC as the criterion for each subgroup separately. I use lmList {lme4}, which results in a model for each subgroup (id). But if I use stepAIC {MASS} on the list elements, it throws an error; see below.
So the question is: what mistake is in my procedure/syntax? I get results for single models but not for the ones created with lmList. Does lmList() store different information on the model than lm() does?
But in the help it states:
class "lmList": A list of objects of class lm with a common model.
>lme4.list.lm<-lmList(formula=Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px |df$id,data = df)
>lme4.list.lm[[1]]
Call: lm(formula = formula, data = data)
Coefficients:
(Intercept) Gap.um Standoff.um Voidflaeche.px
62.306133 -0.009878 0.026317 -0.015048
>stepAIC(lme4.list.lm[[1]], direction="backward")
#stepAIC on first element on the list of linear models
Start: AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
Df Sum of Sq RSS AIC
- Standoff.um 1 2.81 7187.3 293.14
- Gap.um 1 29.55 7214.0 293.37
<none> 7184.4 295.12
- Voidflaeche.px 1 604.38 7788.8 297.97
Error in terms.formula(formula, data = data) :
'data' argument is of the wrong type
Obviously something does not work with the list, but I have no idea what it might be. So I tried to do the same with the base package, which creates the same model (at least the same coefficients). Results are below:
>lin.model<-lm(Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px,df[which(df$id==1),])
# id is in order, so should be the same subgroup as for the first list element in lmList
Coefficients:
(Intercept) Gap.um Standoff.um Voidflaeche.px
62.306133 -0.009878 0.026317 -0.015048
Well, this is what I get returned using stepAIC on my linear model.
As far as I know, the Akaike information criterion can be used to estimate which model better balances fit and generalization given some data.
>stepAIC(lin.model,direction="backward")
Start: AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
Df Sum of Sq RSS AIC
- Standoff.um 1 2.81 7187.3 293.14
- Gap.um 1 29.55 7214.0 293.37
<none> 7184.4 295.12
- Voidflaeche.px 1 604.38 7788.8 297.97
Step: AIC=293.14
Scherkraft.N ~ Gap.um + Voidflaeche.px
Df Sum of Sq RSS AIC
- Gap.um 1 28.51 7215.8 291.38
<none> 7187.3 293.14
- Voidflaeche.px 1 717.63 7904.9 296.85
Step: AIC=291.38
Scherkraft.N ~ Voidflaeche.px
Df Sum of Sq RSS AIC
<none> 7215.8 291.38
- Voidflaeche.px 1 795.46 8011.2 295.65
Call: lm(formula = Scherkraft.N ~ Voidflaeche.px, data = df[which(df$id == 1), ])
Coefficients:
(Intercept) Voidflaeche.px
71.7183 -0.0151
I read from the output that I should use the model Scherkraft.N ~ Voidflaeche.px, because this has the minimal AIC. It would be nice if someone could briefly describe the output.
My understanding of stepwise regression (assuming backward elimination) is: all regressors are included in the initial model, then the least important one is eliminated, with AIC as the criterion, and so forth. Somehow I have problems interpreting the tables correctly. It would be nice if someone could confirm my interpretation. The "-" (minus) stands for the eliminated regressor. At the top is the "start" model, and in the table below the RSS and AIC are calculated for the possible eliminations. So the first row in the first table says that the model Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px minus Standoff.um would result in an AIC of 293.14. Choose the one without Standoff.um: Scherkraft.N ~ Gap.um + Voidflaeche.px.
EDIT:
I replaced lmList {lme4} with dlply() to create the list of models.
Still, stepAIC is not coping with the list; it throws another error. Actually, I believe it is a problem with the data stepAIC needs to run through. I was wondering how it calculates the AIC value for each step just from the model data. I would take the original data to construct the models, leaving one regressor out each time, then calculate the AIC from that and compare. So how does stepAIC work if it has no access to the original data? (I can't see a parameter where I pass the original data to stepAIC.) Still, I have no clue why it works with a plain model but not with the model wrapped in a list.
>model.list.all <- dlply(df, .id, function(x)
{return(lm(Scherkraft.N~Gap.um+Standoff.um+Voidflaeche.px,data=x)) })
>stepAIC(model.list.all[[1]])
Start: AIC=295.12
Scherkraft.N ~ Gap.um + Standoff.um + Voidflaeche.px
Df Sum of Sq RSS AIC
- Standoff.um 1 2.81 7187.3 293.14
- Gap.um 1 29.55 7214.0 293.37
<none> 7184.4 295.12
- Voidflaeche.px 1 604.38 7788.8 297.97
Error in is.data.frame(data) : object 'x' not found
I'm not sure what may have changed in the versioning to make the debugging so difficult, but one solution would be to use do.call, which evaluates the expressions in the call before executing it. This means that instead of storing just d in the call, so that update and stepAIC need to go find d in order to do their work, it stores a full representation of the data frame itself.
That is, do
do.call("lm", list(y~x1+x2+x3, data=d))
instead of
lm(y~x1+x2+x3, data=d)
You can see what it's trying to do by looking at the call element of the model, perhaps like this:
dat.lin.model.lst <- lapply(split(dat, dat$id), function(d)
do.call("lm", list(y~x1+x2+x3, data=d)) )
dat.lin.model.lst[[1]]$call
It's also possible to make your list of data frames in the global environment and then construct the call so that update and stepAIC look for each data frame in turn, because their environment chains always lead back to the global environment; like this:
dats <- split(dat, dat$id)
dat.lin.model.list <- lapply(seq_along(dats), function(i)
  do.call("lm", list(y~x1+x2+x3, data=call("[[", quote(dats), i))) )
To see what's changed, run dat.lin.model.lst[[1]]$call again.
As stepAIC seems to go outside the loop environment (that is, to the global environment) to look for the data it needs, I tricked it using the assign function:
results <- do.call(rbind, lapply(response, function (i) {
  assign("i", i, envir = .GlobalEnv)
  mdl <- gls(as.formula(paste0(i, "~", paste(expvar, collapse = "+"))),
             data = parevt,
             correlation = corARMA(p = 1, q = 1, form = ~as.integer(Year)),
             weights = varIdent(~1/Linf_var), method = "ML")
  mdl <- stepAIC(mdl, direction = "backward")
}))