Manual Maximum-Likelihood Estimation of an AR-Model in R - r

I am trying to estimate a simple AR(1) model in R of the form y[t] = alpha + beta * y[t-1] + u[t] with u[t] being normally distributed with mean zero and standard deviation sigma.
I have simulated an AR(1) model with alpha = 10 and beta = 0.1:
library(stats)
data<-arima.sim(n=1000,list(ar=0.1),mean=10)
First check: OLS yields the following results:
lm(data~c(NA,data[1:length(data)-1]))
Call:
lm(formula = data ~ c(NA, data[1:length(data) - 1]))
Coefficients:
(Intercept) c(NA, data[1:length(data) - 1])
10.02253 0.09669
But my goal is to estimate the coefficients with ML. My negative log-likelihood function is:
logl<-function(sigma,alpha,beta){
-sum(log((1/(sqrt(2*pi)*sigma)) * exp(-((data-alpha-beta*c(NA,data[1:length(data)-1]))^2)/(2*sigma^2))))
}
that is, the negative sum of the log normal densities of the individual observations, evaluated at the residuals u[t] = y[t] - alpha - beta*y[t-1]. The lag is created (just as in the OLS estimation above) by c(NA,data[1:length(data)-1]).
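Written out, and assuming the sum runs only over the observations that actually have a lag (t = 2, ..., T, so the first observation is conditioned on), the objective described above would be:

```latex
-\ell(\alpha,\beta,\sigma)
  = \frac{T-1}{2}\,\log\!\left(2\pi\sigma^{2}\right)
  + \frac{1}{2\sigma^{2}} \sum_{t=2}^{T} \left(y_t - \alpha - \beta\, y_{t-1}\right)^{2}
```

Here T is the length of the series; the symbol T and the choice to start the sum at t = 2 are assumptions of this sketch, not something stated in the code above.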
When I try to put it to work I get the following error:
library(stats4)
mle(logl,start=list(sigma=1,alpha=5,beta=0.05),method="L-BFGS-B")
Error in optim(start, f, method = method, hessian = TRUE, ...) :
L-BFGS-B needs finite values of 'fn'
My log-likelihood function must be correct: when I use it to estimate a linear model of the form y[t] = alpha + beta * x[t] + u[t], it works perfectly.
I just do not see how my initial values lead to a non-finite result. Trying other initial values does not solve the problem.
Any help is highly appreciated!

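This works for me. It is basically what you've done, but it leaves out the first element of the response, since we can't predict it with an AR model anyway. In your version the lag vector starts with NA, so the first term of the sum is NA, the whole sum is NA, and optim() sees a non-finite value of 'fn' whatever the starting values are. (lm() did not complain because it drops rows containing NA by default.)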
Simulate:
library(stats)
set.seed(101)
data <- arima.sim(n=1000,list(ar=0.1),mean=10)
Negative log-likelihood:
logl <- function(sigma,alpha,beta) {
-sum(dnorm(data[-1],alpha+beta*data[-length(data)],sigma,log=TRUE))
}
Fit:
library(stats4)
mle(logl,start=list(sigma=1,alpha=5,beta=0.05),method="L-BFGS-B")
## Call:
## mle(minuslogl = logl, start = list(sigma = 1, alpha = 5, beta = 0.05),
## method = "L-BFGS-B")
##
## Coefficients:
##       sigma       alpha        beta
##  0.96150573 10.02658632  0.09437847
Alternatively:
df <- data.frame(y=data[-1],ylag1=head(data,-1))
library(bbmle)
mle2(y~dnorm(alpha+beta*ylag1,sigma),
start=list(sigma=1,alpha=5,beta=0.05),
data=df,method="L-BFGS-B")
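To see the NA problem in isolation, here is a minimal sketch that evaluates both versions of the objective at the question's starting values. The names y, ylag, nll_na and nll_ok are made up for this illustration; the data are simulated the same way as above.

```r
set.seed(101)
y    <- as.numeric(arima.sim(n = 1000, list(ar = 0.1), mean = 10))
ylag <- c(NA, y[-length(y)])   # lag with a leading NA, as in the question

# Objective as in the question: the first term is NA, so the sum is NA
nll_na <- function(sigma, alpha, beta) {
  -sum(dnorm(y, alpha + beta * ylag, sigma, log = TRUE))
}

# Same objective with the first observation dropped from both vectors
nll_ok <- function(sigma, alpha, beta) {
  -sum(dnorm(y[-1], alpha + beta * ylag[-1], sigma, log = TRUE))
}

val_na <- nll_na(sigma = 1, alpha = 5, beta = 0.05)   # NA
val_ok <- nll_ok(sigma = 1, alpha = 5, beta = 0.05)   # a finite number
```

Because the NA does not depend on the parameters, no choice of starting values can make the first version finite.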


How to incorporate a random effect into a nonlinear mixed effect (nlme) model?

I'd like to build a nonlinear mixed effect model that describes the relationship between two variables, "x" and "y", which vary randomly by a third variable "r", using an exponential rise to a maximum as described by the equation y = theta*(1-exp(-beta*x)).
I've been able to create the nonlinear model for x and y using nls(), but I have not been successful in incorporating a random effect into nlme().
When I build the model using nlme() I end up with an error message: "Error in eval(predvars, data, env) : object 'theta' not found". This error is unexpected to me since the nls() model ran without issue using the same dataframe.
To build the dataset:
x = c(33,35,16,8,31,31,31,23,7,7,7,7,11,11,3,3,6,6,32,32,1,17,17,17,25,40,40,6,6,29,29,13,23,23,44,44,43,43,13,4,6,15,17,22,28,8,11,22,32,6,12,20,27,15,29,29,29,29,29,12,12,16,16,12,12,2,49,49,14,14,14,37,2.87,4.86,7.90,11.95,16.90,16.90,18.90,18.89,22.00,24.08,27.14,30.25,31.22,32.26,7,14,19,31,36,7,14,19,31,36,7,16,16,16,16,16,16,32,32,32,32,32,32,11,11,11,13,13,13,13,13,13,13,13,13,13,13,13,9,9)
y = c(39.61,32.66,27.06,21.74,22.18,38.19,35.02,23.13,9.70,14.20,13.40,15.30,18.80,19.00,3.80,4.90,15.00,14.20,24.90,16.56,1.76,29.29,28.49,18.64,27.10,9.47,14.14,10.27,8.44,26.15,25.43,22.00,19.00,13.00,73.19,67.76,32.34,36.86,8.00,1.57,8.33,16.20,14.69,18.95,20.52,4.92,8.28,15.27,18.37,6.60,10.98,12.56,19.04,5.49,21.00,12.90,17.30,11.40,12.20,15.63,15.22,33.80,17.78,19.33,3.86,8.57,30.40,13.39,11.93,4.55,6.18,12.70,2.71,7.23,5.61,22.74,15.71,16.95,18.31,20.78,17.64,20.00,19.52,24.86,30.06,24.92,4.17,11.02,10.08,14.94,25.98,0.00,3.67,3.67,6.69,11.90,5.06,13.21,10.33,0.00,0.00,6.47,8.38,28.57,25.26,28.67,27.92,33.69,29.61,6.11,7.13,6.93,4.81,15.34,4.90,14.94,8.88,10.24,8.80,10.46,10.48,9.19,9.67,9.40,24.98,50.79)
r = c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","D","E","E","E","F","G","G","H","H","H","H","I","I","I","J","J","J","J","K","L","L","L","L","L","L","L","L","L","L","L","L","L","L","M","N","N","N","N","N","O","P","P","P","P","P","Q","R","R","S","S","S","T","U","U","U","U","U","U","U","U","U","U","U","U","U","U","V","V","V","V","V","V","V","V","V","V","W","X","X","X","X","Y","Y","Z","Z","Z","Z","Z","Z","AA","AA","AA","AB","AB","AB","AB","AB","AB","AB","AB","AB","AB","AB","AB","AC","AC")
df = data.frame(x,y,r)
To build the nonlinear model without "r" as a random effect:
nls_test = nls(y~theta*(1-exp(-beta*x)),
data = df,
start = list(beta = 0.2, theta = 38),
trace = TRUE)
In my model, the only fixed effect is x and the only random effect is r. I've tried building an nlme() model that reflects this, based on the nlme package documentation (https://cran.r-project.org/web/packages/nlme/nlme.pdf), more specifically on the lines of code found on page 186 of that document.
The nlme() object that I've tried to create with my data is as follows:
nlme_test = nlme(y ~ theta*(1-exp(-beta*x)),
fixed = x~1,
random = r~1,
data = df,
start = c(theta = 38,
beta = 0.2))
This results in the following error:
Error in eval(predvars, data, env) : object 'theta' not found
From what I gather, this is related to 'theta' not being included in the data frame ("df") used to build the nlme object, but it is unclear to me why this occurs, as most examples that I have found for this error are related to the predict() function and a missing column or a mismatch between column names.
Also, since the nls() model (nls_test) worked fine with the same starting values (theta = 38, beta = 0.2) and without a 'theta' or 'beta' column in df, I'm confused as to why I'm receiving what looks like a missing-column error.
Does anyone have suggestions or references to help me incorporate the random effect into my nlme model?
Thanks!
Expanding on my (now deleted, because incomplete) comment, I assume this is what you want to do. Please confirm carefully by reading the help page about nlme (i.e. ?nlme::nlme).
nlme_test <- nlme(y ~ theta*(1-exp(-beta*x)),
fixed = theta + beta ~ 1,
random = theta + beta ~ 1,
groups = ~ r,
data = df,
start = c(theta = 38,
beta = 0.2))
The fixed and random arguments should not name the variables in your model formula but the regression parameters. This way the function knows which parts of the model are variables (to be found in data) and which parts are parameters. Also, you missed the groups argument in order to specify how the data is clustered.
Output:
summary(nlme_test)
## Nonlinear mixed-effects model fit by maximum likelihood
## Model: y ~ theta * (1 - exp(-beta * x))
## Data: df
## AIC BIC logLik
## 887.6224 904.6401 -437.8112
##
## Random effects:
## Formula: list(theta ~ 1, beta ~ 1)
## Level: r
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## theta 1.145839e+01 theta
## beta 1.061366e-05 0.01
## Residual 6.215030e+00
##
## Fixed effects: theta + beta ~ 1
## Value Std.Error DF t-value p-value
## theta 21.532188 2.8853414 96 7.462614 0e+00
## beta 0.104404 0.0251567 96 4.150144 1e-04
## Correlation:
## theta
## beta -0.548
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.89510795 -0.51882772 -0.09466037 0.34471808 3.66855121
##
## Number of Observations: 126
## Number of Groups: 29
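The same pattern (parameters in fixed and random, the clustering variable in groups) can be tried on a data set that ships with R. This is only a sketch: it uses the Loblolly pine data and the self-starting SSasymp() model with the starting values from the nlme help page, not the model from the question, and it lets only one parameter (Asym) vary between groups.

```r
library(nlme)

# Plain data frame with columns height, age and Seed (the grouping factor)
dat <- as.data.frame(Loblolly)

fit <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data   = dat,
            fixed  = Asym + R0 + lrc ~ 1,  # regression parameters, not data columns
            random = Asym ~ 1,             # only Asym gets a random effect
            groups = ~ Seed,               # how the rows are clustered
            start  = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fe <- fixef(fit)   # population-level estimates of Asym, R0, lrc
re <- ranef(fit)   # one row per Seed: deviation of Asym from the fixed effect
```

Restricting the random part to a single parameter is often worth trying when, as in the summary above, one of the estimated random-effect standard deviations is essentially zero.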

Using the MLE and NLS functions in R for a nonlinear model

I'm having some issues implementing an MLE and an NLS model for an estimation that I am trying to perform. The model is as follows:
Y=A*(K^b_1)*(L^b_2)+e
where e is just the error term and K and L are input variables. The goal is to estimate b_1 and b_2. My first question is how to put this into the nls function; when I try to do this I get the following error:
prod.nls<-nls(Y~A*(K^Beta_1)*(L^Beta_2), start =list(A = 2, Beta_1= 2, Beta_2=2))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
My other question: the model above can be rewritten in terms of logs,
log(Y) = log(A) + b_1*log(K) + b_2*log(L)
I omit the error term as it becomes irrelevant, and A is just a scalar so I leave it out as well. However, when I put that model into R using the mle function I get the following error:
prod.mle<-mle(log(Y)~log(K)+log(L))
Error in minuslogl() : could not find function "minuslogl"
In addition: Warning messages:
1: In formals(fun) : argument is not a function
2: In formals(fun) : argument is not a function
A small table of values from the dataset is provided below to be able to reproduce these errors.
Thank you for the help ahead of time.
Y            K          L
26971.71     32.46371   3013256.014
330252.5     28.42238   135261574.9
127345.3     5.199048   39168414.92
3626843      327.807    1118363069
37192.73     16.01538   9621912.503
1) Try removing A and just optimizing with the other 2 parameters. Then use the result as starting values to reoptimize. In the second application of nls we use the plinear algorithm which does not require starting values for parameters that enter linearly. When plinear is used the right hand side should be a matrix such that a linear parameter multiplies each column. In this case we only have one column.
fo <- Y ~ cbind(A = (K^Beta_1) * (L^Beta_2))
st <- list(Beta_1 = 2, Beta_2 = 2)
fm0 <- nls(fo, DF, start = st)
fm1 <- nls(fo, DF, start = coef(fm0), alg = "plinear"); fm1
giving:
Nonlinear regression model
model: Y ~ cbind(A = (K^Beta_1) * (L^Beta_2))
data: DF
Beta_1 Beta_2 .lin.A
0.25399 0.81422 0.03572
residual sum-of-squares: 2.468e+09
Number of iterations to convergence: 7
Achieved convergence tolerance: 6.56e-06
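For readers who have not used plinear before, here is a minimal sketch on simulated data with a single power term. The true values A = 3 and b = 0.5 and the names x, y, fit_pl, b_hat and A_hat are made up for this illustration.

```r
set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 3 * x^0.5 + rnorm(50, sd = 0.05)   # simulated: A = 3, b = 0.5

# Only the nonlinear parameter b gets a starting value;
# A is the linear coefficient multiplying the single cbind() column
fit_pl <- nls(y ~ cbind(A = x^b), start = list(b = 1), algorithm = "plinear")

b_hat <- coef(fit_pl)[["b"]]   # nonlinear parameter
A_hat <- coef(fit_pl)[[2]]     # linear coefficient, reported after the nonlinear ones
```

The estimates should land close to the simulated values, which makes this a convenient check that the formula has been set up the way plinear expects.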
2) If we take logs of both sides then the formula is linear in all parameters, assuming we use log(A) rather than A as one parameter, so we can use lm. Note that this is not exactly equivalent to the original problem (it minimizes squared errors on the log scale, which corresponds to a multiplicative rather than an additive error), although it may be close enough for you. Below we use it as an alternative way to get starting values.
fm2 <- lm(log(Y) ~ log(K) + log(L), DF)
co2 <- coef(fm2)
st2 <- list(Beta_1 = co2[[2]], Beta_2 = co2[[3]])
fm3 <- nls(fo, DF, start = st2, alg = "plinear"); fm3
giving:
Nonlinear regression model
model: Y ~ cbind(A = (K^Beta_1) * (L^Beta_2))
data: DF
Beta_1 Beta_2 .lin.A
0.2540 0.8143 0.0357
residual sum-of-squares: 2.468e+09
Number of iterations to convergence: 6
Achieved convergence tolerance: 3.744e-06
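On the mle() error in the question: stats4::mle() takes a function returning the negative log-likelihood as its first argument, not a formula. A minimal sketch for the log-linear model with normal errors follows; it uses simulated data (the five-row sample is very small for four parameters), and the names lK, lL, lY, nll and lsig are made up here. The error standard deviation is parametrized as exp(lsig) so that it stays positive.

```r
library(stats4)

set.seed(1)
n  <- 200
lK <- rnorm(n)                                   # stands in for log(K)
lL <- rnorm(n)                                   # stands in for log(L)
lY <- 1 + 0.3 * lK + 0.7 * lL + rnorm(n, sd = 0.2)

# Negative log-likelihood: a = log(A), b1 and b2 are the exponents
nll <- function(a, b1, b2, lsig) {
  -sum(dnorm(lY, mean = a + b1 * lK + b2 * lL, sd = exp(lsig), log = TRUE))
}

fit_mle <- mle(nll, start = list(a = 0, b1 = 0, b2 = 0, lsig = 0))
fit_lm  <- lm(lY ~ lK + lL)

# For this model the ML and least-squares coefficient estimates coincide
cmp <- cbind(mle = coef(fit_mle)[1:3], lm = coef(fit_lm))
```

With normal errors the two columns of cmp should agree up to optimizer tolerance, which is why lm is enough for the log-linear version.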
Note
The input DF in reproducible form is:
DF <- structure(list(Y = c(26971.71, 330252.5, 127345.3, 3626843, 37192.73
), K = c(32.46371, 28.42238, 5.199048, 327.807, 16.01538), L = c(3013256.014,
135261574.9, 39168414.92, 1118363069, 9621912.503)),
class = "data.frame", row.names = c(NA, -5L))

R nls singular gradient starting value

I am having a problem fitting a function via nls.
This is the data:
size<-c(0.0020,0.0063,0.0200,0.0630,0.1250,0.2000,0.6300,2.0000)
cum<-c(6.4,7.1,7.6,37.5,83.0,94.5,99.9,100.0)
I want to fit a Gompertz model to it. Therefore I tried:
start<-c(alpha =100, beta = 10, k = 0.03)
fit<-nls(cum~ alpha*exp(-beta*exp(-k*size)),start=start)
The error says: singular gradient.
Some posts suggest choosing better starting values.
Can you help me with this problem?
The starting values are too far away from the optimal ones. First take logs of both sides, in which case there is only one non-linear parameter, k. Only k needs a starting value if we use the plinear algorithm. Then refit the original formula, using k from that fit as the starting value for k.
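To see why only k remains nonlinear, write out the log of the Gompertz formula (c_1 and c_2 are labels introduced here for the two linear coefficients):

```latex
\log(\mathrm{cum})
  = \log\alpha - \beta\, e^{-k\,\mathrm{size}}
  = c_1 \cdot 1 + c_2 \cdot e^{-k\,\mathrm{size}},
  \qquad c_1 = \log\alpha, \quad c_2 = -\beta
```

The two columns of cbind(1, exp(-k*size)) correspond to c_1 and c_2, which plinear estimates by linear least squares for each trial value of k.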
fit.log <- nls(log(cum) ~ cbind(1, exp(-k*size)), alg = "plinear", start = c(k = 0.03))
start <- list(alpha = 100, beta = 10, k = coef(fit.log)[["k"]])
fit <- nls(cum ~ alpha*exp(-beta*exp(-k*size)), start = start)
fit
giving:
Nonlinear regression model
model: cum ~ alpha * exp(-beta * exp(-k * size))
data: parent.frame()
alpha beta k
100.116 3.734 22.340
residual sum-of-squares: 45.87
Number of iterations to convergence: 11
Achieved convergence tolerance: 3.351e-06
We can show the fit on a graph
plot(cum ~ size, pch = 20)
lines(fitted(fit) ~ size, col = "red")
[Figure: cum plotted against size, with the fitted Gompertz curve in red]

Confidence intervals for the predicted probabilities from glmer object, error with bootMer

I need to calculate 95% confidence intervals for predicted probabilities from a logistic mixed-effects model, created using the glmer function from the lme4 R package. The model includes stabilized probability weights to correct for selection bias in the analyzed data.
I've read that the bootMer function (lme4 package) performs a model-based semi-parametric bootstrap that makes it straightforward to get the CIs as quantiles of the bootstrap distribution (quantile approach).
Nevertheless, when I apply bootMer, the following error is generated:
"Error in sfun(object, nsim = 1, ftd = rep_len(musim, n * nsim), wts =
weights): cannot simulate from non-integer prior.weights"
I must use non-integer weights, so my question is: how can I solve this problem using the bootMer function? Or, if that is impossible, are there any alternatives?
#The model
M1s = glmer(plab ~ 1 + edad2_c + I(edad2_c^2) + periodo_c + cohorte + nocu_c + tipoocu2 + sector + educ + benef + genero + ecivil + area + generojh + edadjh2_c + nhogar_c + nhogar05_c + nhogar0614_c + nhogar66_c + (1 | periodo_c), weights = ipw,
data = seriecasen,family = binomial(link=logit),nAGQ = 10,glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
#Model-based semi-parametric bootstrap for mixed models - CIs for predicted probabilities
merBoot <- bootMer(M1s, predict, nsim = 1000, use.u = TRUE, type = c("parametric"), seed = 1959)
CI.lower = apply(merBoot$t, 2, function(x) as.numeric(quantile(x, probs=.025, na.rm=TRUE)))
CI.upper = apply(merBoot$t, 2, function(x) as.numeric(quantile(x, probs=.975, na.rm=TRUE)))
Error in sfun(object, nsim = 1, ftd = rep_len(musim, n * nsim), wts =
weights): cannot simulate from non-integer prior.weights
An alternative is the std_beta() function from the sjstats package. It's difficult to test on your model without your data, but I've performed this function on my own logistic regression and it seems to provide your standardized beta, along with the confidence interval(s). The following code should likely work:
sjstats::std_beta(M1s)
Here is the link to the function: std_beta

How to simulate an AR(1) model in R with rho equal to 1

I want to simulate an AR(1) model x_t = rho * x_(t-1) + e_t, where rho=1, n=1050, so I tried the following code in R.
y <- arima.sim(list(order = c(1,0,0), ar = 1), n = 1050)
But R returns the following message: Error: 'ar' part of model is not stationary.
How can I simulate this AR(1) model in this case?
An easy way to do this is
y <- cumsum(rnorm(1050, 0, 1))
(assuming your e_t terms are normal with mean 0 and variance 1)
Your AR coefficient of one is essentially just adding a differencing term, so your model is an ARIMA(0,1,0) model. (Since this is non-stationary, R does not like you trying to put it into the AR part.) The code to simulate from this model is:
y <- arima.sim(list(order = c(0,1,0)), n = 1050)
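A quick way to check that cumsum() really is the rho = 1 recursion is to build the series with an explicit loop and compare. This sketch assumes x_0 = 0 and standard normal e_t; the names e, x_loop, x_cumsum and x_arima are illustrative.

```r
set.seed(1)
n <- 1050
e <- rnorm(n)

# x_t = 1 * x_(t-1) + e_t, starting from x_0 = 0
x_loop <- numeric(n)
x_loop[1] <- e[1]
for (t in 2:n) x_loop[t] <- 1 * x_loop[t - 1] + e[t]

x_cumsum <- cumsum(e)
same <- isTRUE(all.equal(x_loop, x_cumsum))   # equal up to floating-point tolerance

# The ARIMA(0,1,0) route integrates n innovations from a starting value of 0,
# so the returned series has n + 1 values
x_arima <- arima.sim(list(order = c(0, 1, 0)), n = n)
len_arima <- length(x_arima)
```

If exactly n values are needed from the arima.sim() version, the leading zero can be dropped with x_arima[-1].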
