Bootstrapping for nlme model - r

I am trying to use bootstrapping to derive errors around my parameter estimate for the fixed effects in the following model. It is simply estimating the number of times an animal might cross a road based on the road's distance from a stream.
When I run the nlme model it does converge and all is well. I have tried several different methods to do the bootstrapping but have been unsuccessful. I have tried both using the boot package and simply developing a bit of code that resamples my data and drops the parameter estimates into new vectors.
Below is my attempt at the latter, with the resulting error messages. Any help would be greatly appreciated.
library(nlme)

bv <- numeric(100)  # bootstrap estimates of a
cv <- numeric(100)  # bootstrap estimates of b
for (i in 1:100) {
  ss <- sample(1:130, replace = TRUE)  # resample observation indices
  y <- nwfcross[ss]
  x <- nwfdist[ss]
  # note: y and x are resampled, but nwfid is not, so rows no longer
  # line up with their individuals
  modelb <- nlme(y ~ a * exp(-b * x), fixed = a + b ~ 1,
                 random = a + b ~ 1 | nwfid, start = c(a = 300, b = 0.016))
  bv[i] <- coef(modelb)[1]
  cv[i] <- coef(modelb)[2]
}
Error in nlme.formula(y ~ a * exp(-b * x), fixed = a + b ~ 1, random = a + :
Maximum number of iterations reached without convergence
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: Singular precision matrix in level -1, block 1
2: Singular precision matrix in level -1, block 1
3: Singular precision matrix in level -1, block 1….
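For reference, here is a sketch of a nonparametric cluster bootstrap (my own suggestion, not a posted answer; it assumes the vectors nwfcross, nwfdist and nwfid from the question): resample whole individuals instead of single rows, so each refit keeps a valid grouping structure, and wrap the refit in try() so non-converging replicates are skipped rather than aborting the loop.
library(nlme)

dat <- data.frame(y = nwfcross, x = nwfdist, id = nwfid)
ids <- unique(dat$id)
bv <- cv <- rep(NA_real_, 100)
for (i in 1:100) {
  samp <- sample(ids, replace = TRUE)
  boot <- do.call(rbind, lapply(seq_along(samp), function(j) {
    d <- dat[dat$id == samp[j], ]
    d$id <- j  # relabel so repeated draws count as distinct clusters
    d
  }))
  boot$id <- factor(boot$id)
  fit <- try(nlme(y ~ a * exp(-b * x), data = boot,
                  fixed = a + b ~ 1, random = a + b ~ 1 | id,
                  start = c(a = 300, b = 0.016)),
             silent = TRUE)
  if (!inherits(fit, "try-error")) {
    # fixef() gives the fixed effects; coef() returns per-group estimates
    bv[i] <- fixef(fit)["a"]
    cv[i] <- fixef(fit)["b"]
  }
}
sd(bv, na.rm = TRUE)  # bootstrap SE for a
sd(cv, na.rm = TRUE)  # bootstrap SE for b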


"Error in if (!all(pars < 1e-06)) pars[pars < 1e-06] <- 0" error in a model with depmixS4

Problem
I want to run a latent class analysis with the depmixS4 package in R. The problem appears when trying to fit a model with only one class (or state, in depmixS4 terms). When I try to fit the model with a dataset of 6,000 cases I get the following error; with 5,000 cases there is no problem.
Error in if (!all(pars < 1e-06)) pars[pars < 1e-06] <- 0 :
missing value where TRUE/FALSE needed
Where is the problem? Could someone help me understand why this error occurs?
A reproducible example
CASE A (n = 6000)
The same issue also occurs with randomly generated variables. For a reproducible example, I first generate a dataset (n = 6000) with two random variables (a and b), each taking two possible values (0 and 1).
library(depmixS4)
#> Loading required package: nnet
#> Loading required package: MASS
#> Loading required package: Rsolnp
a <- sample(0:1, size = 6000, replace = TRUE)
b <- sample(0:1, size = 6000, replace = TRUE)
foo_large <- data.frame(a, b)
set.seed(123)
mod1 <- mix(response = list(a ~ 1, b ~ 1),
            data = foo_large,  # the dataset to use
            nstates = 1,       # the number of latent classes
            family = list(multinomial("identity"), multinomial("identity")))
fmod1 <- fit(mod1, verbose = TRUE)
#> Error in if (!all(pars < 1e-06)) pars[pars < 1e-06] <- 0: missing value where TRUE/FALSE needed
CASE B (n = 5000)
However, with a dataset (n = 5000) containing two random variables with the same characteristics as before, there is no error.
library(depmixS4)
#> Loading required package: nnet
#> Loading required package: MASS
#> Loading required package: Rsolnp
c <- sample(0:1, size = 5000, replace = TRUE)
d <- sample(0:1, size = 5000, replace = TRUE)
foo_short <- data.frame(c, d)
set.seed(123)
mod1 <- mix(response = list(c ~ 1, d ~ 1),
            data = foo_short,  # the dataset to use
            nstates = 1,       # the number of latent classes
            family = list(multinomial("identity"), multinomial("identity")))
fmod1 <- depmixS4::fit(mod1, verbose = TRUE)
#> iteration 0 logLik: -6928.943
#> converged at iteration 1 with logLik: -6928.943
I did a bit of digging, and the error seems to be due to the way depmixS4 generates random starting values to initialize the EM algorithm: it draws random class-membership probabilities from a Dirichlet distribution, and the code we use to draw from this distribution doesn't work well for a one-dimensional Dirichlet. We'll fix this in an upcoming release. For now, you can run the EM without random starting values by using:
fmod1 <- fit(mod1, emcontrol=em.control(random.start=FALSE), verbose=TRUE)
This works in both your examples.
Note that the issue is not due to the difference in the number of observations (n = 5000 vs n = 6000). That the code converged for n = 5000 was a lucky coincidence of using set.seed(123); after deleting this line you will most likely get the same error as for n = 6000. Conversely, you can coincidentally get the n = 6000 case working by using set.seed(1234).
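For intuition, here is one way such an NA can arise (an illustrative sketch only, not the actual depmixS4 internals): a Dirichlet draw is commonly generated by normalizing independent gamma draws, and with a very small shape parameter rgamma() can underflow to exactly zero, so the normalization yields 0/0 = NaN.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)  # if every draw underflows to 0, this is 0/0 = NaN
}
set.seed(1)
pars <- rdirichlet1(1e-8)  # almost certainly NaN for a shape this small
if (!all(pars < 1e-06)) pars[pars < 1e-06] <- 0
#> Error in if (!all(pars < 1e-06)) pars[pars < 1e-06] <- 0 :
#>   missing value where TRUE/FALSE needed
A NaN in pars makes pars < 1e-06 evaluate to NA, which produces exactly the "missing value where TRUE/FALSE needed" error.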

modelling interaction terms in random effects and coding of daytime in growth models lme4

I have a question regarding model setup in R; after a long and thorough search I did not find a thread answering either of my two questions.
I'll describe the setup first:
It's a repeated-measures dataset with two different interventions (food and training), each with two levels. All participants underwent each combination of conditions.
Hormone samples were collected over the course of the day.
Age and BMI are included as covariates.
A reproducible data set is this one:
library(lme4)

ID <- rep(rep(c("A","B","C"), each = 5), 4)
TIME <- rep(paste(sprintf("%02i", 9:13), "00", sep = ":"), 12)
training <- rep(rep(rep(c("T1","T2"), each = 5), each = 3), 2)
food <- c(rep(rep(c("F1","F2"), each = 5), each = 3),
          rep(rep(c("F2","F1"), each = 5), each = 3))
hormone <- rnorm(n = 60, mean = 5, sd = 2)
BMI <- as.numeric(rep(rep(c(22, 27, 25), each = 5), 4))
age <- as.numeric(rep(rep(c(26, 27, 23), each = 5), 4))
# cbind() coerces everything to character, hence the conversions below
DF <- as.data.frame(cbind(ID, TIME, training, food, hormone, BMI, age))
DF$BMI <- scale(as.numeric(DF$BMI))
DF$age <- scale(as.numeric(DF$age))
DF$hormone <- as.numeric(as.character(DF$hormone))
Now I have 2 main issues:
1) First, I would like to assess the interaction effect of the two interventions on baseline hormone levels.
For that, I set up the following model:
model.df <- DF[DF$TIME=="09:00",]
m1 <- lmer(hormone~training*food+age+BMI+(1+training*food|ID),model.df)
However, this model cannot be fitted:
Error: number of observations (=16) <= number of random effects (=16) for term (1 + training * food | ID); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
It does work when I exclude the interaction term from the random effects:
m1 <- lmer(hormone~training*food+age+BMI+(1+training+food|ID),model.df)
I am now wondering whether this is still a valid model for testing the interaction.
My null model would accordingly be:
m0 <- lmer(hormone~training+food+age+BMI+(1+training+food|ID),model.df)
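For what it's worth, a standard way to test a fixed-effect interaction between nested lmer models (a sketch of the usual likelihood-ratio approach, not something from the thread) is to fit both with ML (REML = FALSE) and compare them with anova():
m1 <- lmer(hormone ~ training*food + age + BMI + (1 + training + food | ID),
           model.df, REML = FALSE)
m0 <- lmer(hormone ~ training + food + age + BMI + (1 + training + food | ID),
           model.df, REML = FALSE)
anova(m0, m1)  # chi-squared test on the interaction terms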
Now to the second point:
2) We also want to monitor the hormonal changes over the time course.
However, I am not sure how to include the time of day in the model.
As pointed out in this thread: https://stats.stackexchange.com/questions/245866/is-hour-of-day-a-categorical-variable
it can be included as a circular variable. However, since my sampling does not cover the entire day, I am not sure how to implement this in my case.
Can anyone help?
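For reference, a common circular encoding (a sketch based on the linked thread's suggestion; the column names time_sin and time_cos are my own) maps clock time onto sine/cosine components so that times near midnight wrap around correctly. With a 09:00 to 13:00 window, a plain numeric time may work just as well:
tm <- as.character(DF$TIME)
hours <- as.numeric(substr(tm, 1, 2)) + as.numeric(substr(tm, 4, 5)) / 60
DF$time_sin <- sin(2 * pi * hours / 24)  # circular components: 23:00 and
DF$time_cos <- cos(2 * pi * hours / 24)  # 01:00 end up close together
These two columns can then replace TIME among the fixed effects.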
In addition, I am not sure how to set up the model including the time variable.
We are still interested in the interaction of both interventions, so I would set up something like the following model (this now treats time as numeric):
model.df <- DF
model.df$TIME <- as.numeric(factor(model.df$TIME))  # "09:00".."13:00" become 1..5; as.numeric() on the raw strings would give NA
m2 <- lmer(hormone~training*food*TIME+age+BMI+(1+training*food*TIME||ID),model.df)
However: 1) This model does not converge
Warning messages:
1: In commonArgs(par, fn, control, environment()) :
maxfun < 10 * length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, getStart(start, rho$lower, rho$pp),:
convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0035331 (tol = 0.002, component 1)
and 2) would you consider this an appropriate model to test for the interaction effect over time?
It would then be tested against the following null-model:
m02 <- lmer(hormone~training*TIME+food*TIME+age+BMI+(1+training*TIME+food*TIME||ID),model.df)
I hope these aren't too many questions for one thread, and I would be super happy about any pointers.
Thanks a lot.

passing sparse xreg to stlf in R causes optimisation error

I am trying to forecast a time series while regressing on temperature. The residuals show different behaviour at low and high temperatures, so I want to use a piecewise linear approach and learn different coefficients for temperatures above and below 35 degrees.
The data are in a data frame with columns data$x, data$Season, and data$Temp.
# Create data frame
library(forecast)

len <- 365*3 + 1 + 31
x <- rnorm(len, mean = 4000000, sd = 100000)
Season <- c(rep(3,62), rep(4,91), rep(1,90), rep(2,92), rep(3,92), rep(4,91),
            rep(1,90), rep(2,92), rep(3,92), rep(4,91), rep(1,91), rep(2,92),
            rep(3,61))
Temp <- rnorm(len, mean = 20, sd = 5)
data <- data.frame(x, Season, Temp)
# Create model matrix
season_dummy <- model.matrix(~ as.factor(data$Season) + 0)
Temp_max <- pmax(0, data$Temp - 35)                      # 0, or the excess over 35
Temp_restore <- restore_temp_up(Temp_max, data$Temp, 35) # restore excess to original value
Temp_season_matrix_max <- Temp_restore * season_dummy
# Create time series and forecast
data_ts <- ts(data$x[1:1000], freq = 365, start = c(2009, 182))
len_train <- length(data_ts)
xreg1 <- Temp_season_matrix_max[1:len_train, ]
newxreg1 <- Temp_season_matrix_max[(len_train+1):(len_train+30), ]
stlf(data_ts, method = "arima", h = 30, xreg = xreg1, newxreg = newxreg1,
     s.window = "periodic")
> Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
non-finite value supplied by optim
Error in auto.arima(x, xreg = xreg, seasonal = FALSE, ...) :
No suitable ARIMA model found
In addition: Warning message:
In auto.arima(x, xreg = xreg, seasonal = FALSE, ...) :
Unable to calculate AIC offset
Other threads suggest changing the solver method from CSS to ML, but I can't edit these parameters in stlf. The help file shows an optional parameter "forecastfunction", but there are no examples or real explanation of how to use it.
Note: when I set the threshold temperature to, say, 20 instead of 35, this works fine. I am sure it is because the xreg matrix containing temperatures above 35 degrees is sparse (most temperatures are below this value), but I am not sure how to get around this.
(I have included the code for restore_temp_up below; possibly inefficient, but included for completeness.)
restore_temp_up <- function(x, original, k) {
  if (!is.vector(x))
    stop('x must be a vector')
  for (i in 1:length(x)) {
    if (!is.na(x[i])) {
      if (x[i] > 0) {
        x[i] <- x[i] + k
      }
      if (original[i] == k) {
        x[i] <- original[i]  # if original was exactly k, the hinge is 0 and
                             # indistinguishable from original below k
      }
    }
  }
  return(x)
}
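(For reference, the loop can also be written as a vectorized one-liner; this rewrite assumes x was built as pmax(0, original - k), as above, so x + k equals original wherever x > 0.)
restore_temp_up_vec <- function(x, original, k) {
  ifelse(!is.na(x) & (x > 0 | original == k), original, x)
}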
Your design matrix is rank-deficient, so the regression is singular. To see this:
> eigen(t(xreg1) %*% xreg1)$val
[1] 1321.223 0.000 0.000 0.000
You cannot fit a regression model with a rank-deficient design matrix.
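A quick way to diagnose and work around this (my own sketch, not part of the original answer): check the rank of xreg1 and drop the all-zero columns before calling stlf. Note that a column with only a handful of non-zero entries may still be too sparse to estimate reliably.
qr(xreg1)$rank       # less than ncol(xreg1), so the matrix is rank-deficient
colSums(xreg1 != 0)  # the all-zero columns are the culprits
keep <- colSums(xreg1 != 0) > 0
xreg_ok <- xreg1[, keep, drop = FALSE]
newxreg_ok <- newxreg1[, keep, drop = FALSE]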

Model fitting with nls.lm in R, "Error: unused argument"

I'm trying to use the nls.lm function in the minpack.lm package to fit a non-linear model to some data from a psychophysics experiment.
I've had a search around and can't find much information about the package, so I have essentially copied the format of the example given on the nls.lm help page. Unfortunately my script is still failing to run, and R is throwing out this error:
Error in fn(par, ...) :
unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
It appears that the script thinks the data I want to fit the model to is irrelevant, which is definitely wrong.
I'm expecting it to fit the model and produce a value of 0.5403 for the free parameter (w).
Any help is greatly appreciated.
I'm making the transition from Matlab to R, so apologies if my code looks sloppy.
Here's the script.
install.packages("pracma")
require(pracma)
install.packages("minpack.lm")
require(minpack.lm)
# Residual function: uses parameter w (e.g. .23) to predict the accuracy
# error at a given ratio (e.g. 2:1)
residFun = function(w, n) .5 * erfc(abs(n[,1] - n[,2]) /
                                    ((sqrt(2)*w) * sqrt(n[,1]^2 + n[,2]^2)))
# Example for residFun: calculates an error rate of 2.59%
a = matrix(c(2, 1), 1, byrow = TRUE)
residFun(.23, a)
# Initial guess for the parameter to be fitted (w)
parStart = list(w = 0.2)
# Recorded accuracies; 1 - accuracy gives the errors to input into residFun,
# i.e. the y-values I want to fit the model to
Acc = 1 - matrix(c(0.8571, 0.7143, 0.6250, 0.6154, 0.5333, 0.3846), ncol = 6)
# Ratios (converted to proportions) used in testing,
# i.e. the points along the x-axis to fit the above data to
Ratios = matrix(c(0.3, 0.7, 0.4, 0.6, 0.42, 0.58, 0.45, 0.55, 0.47, 0.53,
                  0.49, 0.51), nrow = 6, byrow = TRUE)
# Non-linear model fitting: attempt to estimate w using the
# Levenberg-Marquardt nonlinear least-squares algorithm
output = nls.lm(par = parStart, fn = residFun, observed = Acc, n = Ratios)
# Error message shown after running:
# Error in fn(par, ...) :
#   unused argument (observed = c(0.1429, 0.2857, 0.375, 0.3846, 0.4667, 0.6154))
The error means you passed a function an argument that it did not expect. ?nls.lm has no argument observed, so it is passed on to the function supplied as fn, in your case residFun. But residFun doesn't expect this argument either, hence the error. You need to redefine the function like this:
# Residual function: uses parameter w (e.g. .23) to predict the accuracy
# error at a given ratio (e.g. 2:1)
residFun = function(par, observed, n) {
  w <- par$w
  r <- observed - .5 * erfc(abs(n[,1] - n[,2]) /
                            ((sqrt(2)*w) * sqrt(n[,1]^2 + n[,2]^2)))
  return(r)
}
It gives the following result:
> output = nls.lm(par=parStart,fn=residFun,observed=Acc,n=Ratios)
> output
Nonlinear regression via the Levenberg-Marquardt algorithm
parameter estimates: 0.540285874836135
residual sum-of-squares: 0.02166
reason terminated: Relative error in the sum of squares is at most `ftol'.
Why this happened:
It seems that you were inspired by this example in the documentation:
## residual function
residFun <- function(p, observed, xx) observed - getPred(p, xx)

## starting values for parameters
parStart <- list(a = 3, b = -.001, c = 1)

## perform fit
nls.out <- nls.lm(par = parStart, fn = residFun, observed = simDNoisy,
                  xx = x, control = nls.lm.control(nprint = 1))
Note that observed is an argument of residFun here.
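A tiny standalone demo of the mechanism (my own illustration; caller and f are made-up functions): extra named arguments that the outer function doesn't recognise get forwarded through ... to the function it calls.
caller <- function(fn, ...) fn(...)
f <- function(x) x^2
caller(f, x = 3)            # 9
caller(f, x = 3, obs = 1)   # Error in fn(...) : unused argument (obs = 1)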

Error when using msmFit in R

I'm trying to replicate this paper (Point Forecast Markov Switching Model for U.S. Dollar/Euro Exchange Rate, by Hamidreza Mostafei) in R. The table I'm trying to reproduce is on page 483. Here is a link to a pdf.
I wrote the following code and got an error at the last line:
library(MSwM)

mydata <- read.csv("C:\\Users\\User\\Downloads\\EURUSD_2.csv", header = TRUE)
mod <- lm(EURUSD ~ EURUSD.1, mydata)
mod.mswm <- msmFit(mod, k = 2, p = 1, sw = c(T,T,T,T), control = list(parallel = F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
Basically, the data being used is EURUSD, the level change at monthly frequency. EURUSD.1 is the one-lag variable. Both EURUSD and EURUSD.1 are in my csv file. (I'm not sure how to attach the csv file here; if someone could point that out, that would be great.)
I changed the EURUSD.1 values to something random and the msmFit function seemed to work, but whenever I used the original values, i.e. the lagged values, the error appeared.
Something degenerate is happening when one variable is simply a lag of the other. Consider:
A sample data frame where Y is lagged X:
> library(MSwM)
> d = data.frame(X=runif(100))
> d$Y=c(.5, d$X[-100])
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
Error in if ((max(abs(object["Fit"]["logLikel"] - oldll))/(0.1 + max(abs(object["Fit"]["logLikel"]))) < :
missing value where TRUE/FALSE needed
That gives your error. Let's add a tiny bit of noise to Y and see what happens:
> d$Y=d$Y+rnorm(100,0,.000001)
> mod <- lm(X~Y,d)
> mod.mswm = msmFit(mod, k=2, p=1, sw=c(T,T,T,T), control=list(parallel=F))
> mod.mswm
Markov Switching Model
Call: msmFit(object = mod, k = 2, sw = c(T, T, T, T), p = 1, control = list(parallel = F))
AIC BIC logLik
4.3109 47.45234 3.84455
Coefficients:
(Intercept)(S) Y(S) X_1(S) Std(S)
Model 1 0.8739622 -22948.89 22948.83 0.08194545
Model 2 0.4220748 77625.21 -77625.17 0.21780764
Transition probabilities:
Regime 1 Regime 2
Regime 1 0.3707261 0.3886715
Regime 2 0.6292739 0.6113285
It works! Now either:
Having perfectly lagged variables causes some "divide by zero" error because it's a purely degenerate case (like having perfectly collinear variables in a linear model). A little experimenting shows that in this case the resulting output is very sensitive to how much noise you add, so I'm thinking it's on a knife-edge here. I suspect having perfectly lagged variables leads to some singularity or degeneracy.
or
There's some bug in the function.
I have no idea what msmFit does, so that's for you to sort out.
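That said, one concrete check supports the collinearity reading (my own addition, worth verifying): with p = 1, msmFit adds the lagged response X_1 as a regressor, and since Y was constructed as lagged X, Y and X_1 are the same column apart from the first value. The huge offsetting Y(S) and X_1(S) coefficients in the output above point the same way.
> all.equal(d$Y[-1], d$X[-100])
[1] TRUE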
