Error when using msmFit in R

I'm trying to replicate a result from this paper (Point Forecast Markov Switching Model for U.S. Dollar/Euro Exchange Rate, by Hamidreza Mostafei) in R. The table I'm trying to reproduce is on page 483. Here is a link to a pdf.
I wrote the following code and got an error on the last line:
mydata <- read.csv("C:\\Users\\User\\Downloads\\EURUSD_2.csv", header=T)
mod <- lm(EURUSD~EURUSD.1, mydata)
mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
Basically, the data being used is EURUSD, the level change at monthly frequency. EURUSD.1 is the one-period lag of that variable. Both EURUSD and EURUSD.1 are in my csv file. (I'm not sure how to attach the csv file here; if someone could point that out, that would be great.)
When I changed the EURUSD.1 values to something random, the msmFit function seemed to work, but whenever I used the original values, i.e. the lagged values, the error came back.

Something degenerate is happening when one variable is simply a lagged copy of the other. Consider a sample data frame where Y is lagged X:
> d = data.frame(X=runif(100))
> d$Y=c(.5, d$X[-100])
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
That reproduces your error. Now let's add a tiny bit of noise to Y and see what happens:
> d$Y=d$Y+rnorm(100,0,.000001)
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
> mod.mswm
Markov Switching Model
Call: msmFit(object = mod, k = 2, sw = c(T, T, T, T), p = 1, control = list(parallel = F))
AIC BIC logLik
4.3109 47.45234 3.84455
Coefficients:
(Intercept)(S) Y(S) X_1(S) Std(S)
Model 1 0.8739622 -22948.89 22948.83 0.08194545
Model 2 0.4220748 77625.21 -77625.17 0.21780764
Transition probabilities:
Regime 1 Regime 2
Regime 1 0.3707261 0.3886715
Regime 2 0.6292739 0.6113285
It works! Now either:
Having perfectly lagged variables causes some "divide by zero" error because it's a purely degenerate case, like having perfectly collinear variables in a linear model (see the small lm sketch at the end of this answer). A little experimenting shows that the resulting output is very sensitive to how much noise you add, so I suspect the fit sits on a knife-edge: perfectly lagged variables lead to some singularity or degeneracy.
or
There's some bug in the function.
I have no idea what msmFit does, so that's for you to sort out.
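For comparison, here's a minimal sketch (mine, not from the original answer) of what perfect collinearity looks like in a plain linear model: lm tolerates the degeneracy by aliasing one coefficient to NA instead of failing.
> d2 <- data.frame(x = 1:10)
> d2$w <- 2 * d2$x                          # w is an exact linear function of x
> coef(lm(rnorm(10) ~ x + w, data = d2))    # the coefficient on w comes back NA (aliased)
If msmFit's EM iterations don't handle that kind of rank deficiency, a NaN in the convergence test would be exactly the symptom you'd see, but that's speculation.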

Related

Error 'variable lengths differ' while doing Hoslem.test() in R

I am trying to run hoslem.test() and plot a ROC curve, but I keep hitting the same error:
Error in model.frame.default(formula = cbind(y0 = 1 - y, y1 = y) ~ cutyhat) : variable lengths differ (found for 'cutyhat')
First, I checked whether the length of montevil$icPM10 and nrow(montevil) were the same. They are (8790). After that, I tried omitting NA values from the logistic regression model; nothing changed. On other sites, people suggested adding na.omit() to the model.
This is my code:
montevil<-read.csv("Montevil.csv")
#Hoslem.test
icPM10<-as.factor(montevil$icPM10)
b <- glm(icPM10 ~ RS_re + vv + PRB + month, data = montevil, family = binomial(link = "logit"))
#install.packages("ResourceSelection")
library(ResourceSelection)
hoslem.test(icPM10, fitted(b), g=10)
#ROC curve
#This 'prob' variable was calculated in order to get the ROC curve
log.df <- data.frame(vv=0.6, RS_re="normal_alta", PRB=1013, month="08")
prob<-predict( b, newdata = log.df,type="response" )
#install.packages("pROC")
library(pROC)
r=roc(icPM10,prob, data=montevil)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
#plot (r)
Here is a link to the csv:
https://drive.google.com/file/d/1ap0Y-QMizgjKf1IB7mm_woJzEGEhlKPl/view?usp=sharing
EDIT
I figured out (I am not sure if this is OK) that I had to pass this to hoslem.test():
hoslem.test(b$y, fitted(b), g=10)
But I am still struggling with the ROC Curve.
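Note that prob above is a single prediction for the one-row log.df, so it can never line up with the 8790-value icPM10. If the ROC step is failing for the same length mismatch that broke hoslem.test(), the same trick may apply; a minimal sketch, assuming the model b from above:
library(pROC)
r <- roc(b$y, fitted(b))   # b$y and fitted(b) have equal length after NA rows are dropped
plot(r)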

Using panel regression on the Hedonic data with the plm package in R

I am trying to run a panel regression on an unbalanced panel in R using the plm package, with the 'Hedonic' data.
I was trying to replicate something similar that is done in the following paper: http://ftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/vignettes/plm/plmEN.pdf (page 14, 3.2.5 Unbalanced Panel).
My code looks something like this:
form = mv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + blacks + lstat
ba = plm(form, data = Hedonic)
However, I am getting the following error on execution:
Error in names(y) <- namesy :
'names' attribute [506] must be the same length as the vector [0]
traceback() yields the following result:
4: pmodel.response.pFormula(formula, data, model = model, effect = effect,
theta = theta)
3: pmodel.response(formula, data, model = model, effect = effect,
theta = theta)
2: plm.fit(formula, data, model, effect, random.method, random.dfcor,
inst.method)
1: plm(form, data = Hedonic)
I am new to panel regression and would be really grateful if someone can help me with this issue.
Thanks.
That paper is ten years old, and I'm not sure plm works like that. The latest docs are here https://cran.r-project.org/web/packages/plm/vignettes/plm.pdf
Your problem arises because, in the docs:
the current version of plm is capable of working with a regular
data.frame without any further transformation, provided that the
individual and time indexes are in the first two columns,
The Hedonic data set does not have individual and time indexes in the first two columns. I'm not sure where the individual and time indexes are in the data, but if I specify townid for the index I at least get something that runs:
> p <- plm(mv~crim,data=Hedonic)
Error in names(y) <- namesy :
'names' attribute [506] must be the same length as the vector [0]
> p <- plm(mv~crim,data=Hedonic, index="townid")
> p
Model Formula: mv ~ crim
Coefficients:
crim
-0.0097455
This is because when you don't specify id and time indexes, plm tries to use the first two columns, and in Hedonic those give unique numbers for the id, so the whole model falls apart.
If you look at the examples in help(plm) you might notice that the first two columns in all the data sets define the id and the time.
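A hedged sketch of making the index explicit up front instead of relying on column order, assuming townid is the right individual identifier (pdata.frame invents a within-group time index when only the individual index is given):
library(plm)
data("Hedonic", package = "plm")
Hed <- pdata.frame(Hedonic, index = "townid")   # declare the panel structure once
p <- plm(mv ~ crim, data = Hed)
summary(p)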

Error: step factor reduced below 0.001 without reducing pwrss when using nlmer

I think this is more of a stats question than an R question, but I am getting the error Error: step factor reduced below 0.001 without reducing pwrss when trying to fit an nlmer model. My data is: https://www.dropbox.com/s/cri5n7lewhc8j02/chweight.RData?dl=0
I'm trying to fit the model so that I can predict the weight of chicks based on time, for chicks on diet 1. I did the following:
cw1<-subset(ChickWeight, ChickWeight$Diet==1)
m1 <- nlmer(weight~ SSlogis(Time, Asym, xmid, scal) ~ Asym|Chick, cw1, start=c(Asym = 190, xmid = 730, scal = 350))
Could there be other ways to resolve this error? I think it has to do with the Asym values, but I don't fully understand what is going on, so any brief guidance would help.
I have been asked to improve my answer, so here is my attempt to do so.
This error is usually tripped because your start values aren't adequately close to the "true" values, so the optimizer fails to find any local improvements in fit by moving away from them. You need to provide better starting guesses. This can sometimes be accomplished by algebraically solving the equation at a few points, as described in many places such as this article. Other times, you can plot the data and make educated guesses as to what the parameters might be, if you know what each parameter "does" within the non-linear function (for example, parameter a might represent an asymptote, b a scale factor, c the mean rate of change, and so on). That's hard for me personally because I have no math background, but I'm usually able to come up with a reasonable guess.
To answer the question more directly, though, here is some reproducible code that should illustrate that the error in question comes from bad starting guesses.
#Create independent and dependent variables, X and Y, and a grouping variable Z.
xs = rep(1:10, times = 10)
ys = 3 + 2*exp(-0.5*xs)
zs = rep(1:10, each=10)
#Put random noise in X.
for (i in 1:100) {
xs[i] = rnorm(1, xs[i], 2)
}
df1 = data.frame(xs, ys, zs) #Assemble data into data frame.
require(lme4) #Turn on our package.
#Define our custom function--in this case, a three-parameter exponential model.
funct1 = deriv(~beta0 + beta1*exp(beta2*xs), namevec=c('beta0',
'beta1', 'beta2'), function.arg=c('xs','beta0', 'beta1','beta2'))
#This will return the exact same error because our starting guesses are way off.
test1 = nlmer(ys ~ funct1(xs, beta0, beta1, beta2) ~ (beta0|zs), data = df1,
start=c(beta0=-50,beta1=200,beta2=3))
#Our starting guesses are much better now, and so nlmer is able to converge this time.
test1 = nlmer(ys ~ funct1(xs, beta0, beta1, beta2) ~ (beta0|zs), data = df1,
start=c(beta0=3.2,beta1=1.8,beta2=-0.3))
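For the original ChickWeight model, one hedged way to get data-driven starting guesses is to let the self-starting SSlogis find them via plain nls first, then hand those on to nlmer (a sketch; no guarantee nlmer converges, but the guesses are no longer hand-picked):
cw1 <- subset(ChickWeight, Diet == 1)
st <- coef(nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = cw1))  # self-start estimates Asym, xmid, scal
m1 <- nlmer(weight ~ SSlogis(Time, Asym, xmid, scal) ~ Asym | Chick, cw1, start = st)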

R: How to make column of predictions for logistic regression model?

So I have a data set called x. The contents are simple enough that I'll just outline them here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: how do I create a fifth column on the right side of this data set x that holds the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert glm$fitted as a column, but I'd like to write code that predicts it based on whatever is in the race, sex, and gender columns, for more generalized use.
Right now I have the following code, which I hope will create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appear to be a few typos in your code. First, xnew calls on glm5, but your model, as far as I can see, is named glm (by the way, using glm as the name of your output is probably not a good idea, since it masks the glm function). Second, make sure the variable race.f is actually in the dataset you want to predict from; my guess is R can't find that variable, hence the error.
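A minimal sketch of the corrected flow, assuming the model is renamed fit (so it no longer masks glm) and xnew contains the race.f, sex.f, and gender.f columns used in the fit:
fit <- glm(Report ~ race.f + sex.f + gender.f, data = x, family = binomial(link = "logit"))
pr <- predict(fit, newdata = xnew, type = "link", se.fit = TRUE)
xnew$PredictedProb <- plogis(pr$fit)             # fitted probability on the response scale
xnew$LL <- plogis(pr$fit - 1.96 * pr$se.fit)     # lower 95% bound
xnew$UL <- plogis(pr$fit + 1.96 * pr$se.fit)     # upper 95% bound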

Making linear models in a for loop using R programming

I have a dataset that I'll call dataset1, with a variable I want to predict (e.g. Price). I'm hoping to find a good single predictor of Price among the n other variables in the dataset. But if n is large, I can't manually make and examine all these models, so I was hoping to use something like this:
for (i in names(dataset1)) {
model = lm(Price~i, dataset1)
# Do stuff here with model, such as analyze R^2 values.
}
(I thought this would work since replacing the body of the loop with print(i) prints the correct names.) The error is as follows:
Error in model.frame.default(formula = Price ~ i, data = dataset1, drop.unused.levels = TRUE) :
variable lengths differ (found for 'i')
Does anyone have advice for dealing with the problem regarding how R reads in the i variable? I know how to approach this problem using other software, but I would like to get a sense of how R works.
I would go for some sort of *apply here personally:
dat <- data.frame(price=1:10,y=10:1,z=1:10)
sapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q)))[2])
y z
-1 1
or to get a list with full model results:
lapply(dat[2:3], function(q) coef(summary(lm(dat$price ~ q))))
$y
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11 1.137008e-15 9.674515e+15 1.459433e-125
q -1 1.832454e-16 -5.457163e+15 1.423911e-123
$z
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.123467e-15 2.457583e-16 4.571429e+00 1.822371e-03
q 1.000000e+00 3.960754e-17 2.524772e+16 6.783304e-129
to get the r-squared value as you mentioned:
sapply(dat[2:3], function(q) summary(lm(dat$price ~ q))$r.squared)
At the moment lm sees the literal symbol i, not the column it names. Try
for(i in 2:ncol(dataset1)) #assuming Price is column 1
Then refer to
Price ~ dataset1[, i]
in your loop.
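For instance, a sketch assuming Price is in column 1:
for (i in 2:ncol(dataset1)) {
  model <- lm(Price ~ dataset1[, i], data = dataset1)
  # do stuff with model here, e.g. summary(model)$r.squared
}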
But I'm not sure about your approach from a stats perspective.
