R: Convergence issues with jagam (mgcv R package)

I am trying to fit the following model:
mod <- jagam(y_freq ~
s(x, bs="cr", fx=FALSE, k=5) +
s(x, by=a, bs="cr", fx=FALSE, k=5) +
s(x, by=b, bs="cr", fx=FALSE, k=5) +
s(x, by=c, bs="cr", fx=FALSE, k=5),
family = binomial(), data = dt,
file = "file.jags",
weights = dt$total)
where 'a' and 'b' are numeric variables that take the values 0 and 1, and 'c' is the interaction between 'a' and 'b'.
To correct for overdispersion, I modify the JAGS file created by the jagam function as follows:
model {
eta <- X %*% b
for (i in 1:n) {
y[i] ~ dbin(p[i],w[i])
p[i] ~ dbeta(alpha[i], beta[i]) T(0.001,0.999)
alpha[i] <- phi[i] * mu[i]
beta[i] <- phi[i] * (1 - mu[i])
phi[i] ~ dexp(1/250)
mu[i] <- ilogit(eta[i])
}
#splines are defined below here
}
After updating the file, I use the functions jags.model and jags.samples (monitoring b, rho and mu, with 100k iterations and 3 chains), followed by sim2jam.
When I check convergence with the coda package, I get the following results for all three chains (only one shown here):
[Figure: trace plots of the rho parameters for one chain]
I get traces for 8 rho parameters, of which only two seem to converge. I cannot show the results for the other parameters (b and mu) because there are too many of them (40 for b) to fit on a plot.
I would like to know which of these 8 rho parameters correspond to the null-space parameters, what could be causing the convergence problem (too many parameters, too many splines, ...), and how to fix it.
Thank you,
Kate
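For reference, the convergence check can be made concrete with the Gelman-Rubin potential scale reduction factor (R-hat). It compares between-chain variance with within-chain variance, and values well above 1 indicate that the chains have not mixed. The base-R function below is a simplified sketch that uses no split chains and no degrees-of-freedom correction, so its values will differ slightly from what coda::gelman.diag reports. The name rhat_basic is hypothetical.

```r
# Simplified Gelman-Rubin R-hat for a single parameter.
# `draws` is an n x m matrix: n iterations (rows) from m chains (columns).
rhat_basic <- function(draws) {
  draws <- as.matrix(draws)
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))   # mean within-chain variance
  B <- n * var(chain_means)         # between-chain variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

# Usage: chains sampling the same distribution vs. chains stuck in different places
set.seed(1)
rhat_basic(cbind(rnorm(1000), rnorm(1000), rnorm(1000)))
rhat_basic(cbind(rnorm(1000), rnorm(1000, mean = 5), rnorm(1000)))
```

Applied to each rho trace, this shows which smoothing parameters are poorly mixed. It does not, however, say why they mix poorly.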

Related

R - Singular gradient matrix at initial parameter estimates

I'm trying to fit a harmonic equation to my data, but when I apply the nls function, R gives me the following error:
Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates.
All the posts I've seen about this error involve exponential functions, where linearization is used to fix it, but in this case I can't solve it that way. I tried other starting points, but it still doesn't work.
CODE:
y <- c(20.91676, 20.65219, 20.39272, 20.58692, 21.64712, 23.30965, 23.35657, 24.22724, 24.83439, 24.34865, 23.13173, 21.96117)
t <- c(1, 2, 3, 4 , 5 , 6, 7, 8, 9, 10, 11, 12)
# Fitting function
fit <- function(x, a, b, c) {a+b*sin(2*pi*x)+c*cos(2*pi*x)}
res <- nls(y ~ fit(t, a, b, c), data=data.frame(t,y), start = list(a=1,b=0, c=1))
Can you help me? Thanks!
There are several problems:
cos(2*pi*t) is a vector of all ones for the t given in the question, so the model is not identifiable given that there is already an intercept
the model is linear in the parameters, so one can use lm rather than nls, and no starting values are needed
the model does not work well even if we address those points, as the huge second coefficient shows: for integer t, sin(2*pi*t) is zero up to rounding error, so it carries no information either. The model itself needs to be improved.
lm(y ~ sin(2*pi*t))
giving:
Call:
lm(formula = y ~ sin(2 * pi * t))
Coefficients:
(Intercept) sin(2 * pi * t)
2.195e+01 -2.262e+14
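The first and third points can be checked numerically. For integer t, cos(2*pi*t) evaluates to 1, which duplicates the intercept column, and sin(2*pi*t) evaluates to floating-point noise around 0. That noise is why lm returns an enormous coefficient instead of dropping the term. A minimal check:

```r
t <- 1:12
cos_col <- cos(2 * pi * t)
sin_col <- sin(2 * pi * t)

# cos(2*pi*t) equals 1 at every integer t: a copy of the intercept
all(abs(cos_col - 1) < 1e-12)
# sin(2*pi*t) is only rounding noise around 0
max(abs(sin_col))
# Intercept plus cosine term gives a rank-1 design matrix, not rank 2
qr(cbind(1, cos_col))$rank
```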
Instead, try this model using the plinear algorithm, which does not require starting values for the parameters that enter linearly. It implements the model .lin1 + .lin2 * cos(a * t + b), where .lin1 and .lin2 are implicit parameters that enter linearly and don't need starting values.
fm <- nls(y ~ cbind(1, cos(a * t + b)), start = list(a = 1, b = 1), algorithm = "plinear")
plot(y ~ t)
lines(fitted(fm) ~ t, col = "red")
fm
giving:
Nonlinear regression model
model: y ~ cbind(1, cos(a * t + b))
data: parent.frame()
a b .lin1 .lin2
0.5226 4.8814 22.4454 -2.1530
residual sum-of-squares: 0.7947
Number of iterations to convergence: 9
Achieved convergence tolerance: 8.865e-06
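The fitted a is a frequency in radians per time step, so the implied period is 2*pi/a. Plugging in the estimate printed above gives a period very close to 12 time steps. If the data are monthly with an annual cycle (an assumption; the question does not say), the period could be fixed at 12. The model then becomes linear again, and lm is enough. This is a sketch under that assumption:

```r
y <- c(20.91676, 20.65219, 20.39272, 20.58692, 21.64712, 23.30965,
       23.35657, 24.22724, 24.83439, 24.34865, 23.13173, 21.96117)
t <- 1:12

# Period implied by the plinear estimate a = 0.5226 (radians per time step)
period <- 2 * pi / 0.5226
period

# Linear harmonic model with the period fixed at 12 (assumed, not estimated)
fm12 <- lm(y ~ sin(2 * pi * t / 12) + cos(2 * pi * t / 12))
coef(fm12)
```

The sine and cosine pair with a shared, fixed frequency spans the same curves as .lin2 * cos(a * t + b) with a = 2*pi/12. This fit is therefore a special case of the plinear model, just without a estimated.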

Outcome prediction using JAGS from R

[Code has been updated and no longer corresponds to the error messages]
I am trying to understand how JAGS predicts outcome values (for a mixed Markov model). I've trained the model on a dataset that includes the outcome m and covariates x1, x2 and x3.
Predicting the outcome without fixing parameter values works in R, but the output seems completely random:
preds <- run.jags("model.txt",
data=list(x1=x1, x2=x2, x3=x3, m=m,
statealpha=rep(1,times=M), M=M, T=T, N=N), monitor=c("m_pred"),
n.chains=1, inits = NA, sample=1)
Compiling rjags model...
Calling the simulation using the rjags method...
Note: the model did not require adaptation
Burning in the model for 4000 iterations...
|**************************************************| 100%
Running the model for 1 iterations...
Simulation complete
Finished running the simulation
However, as soon as I try to fix parameters (i.e. use model estimates to predict the outcome m), I get errors:
preds <- run.jags("model.txt",
data=list(x1=x1, x2=x2, x3=x3,
statealpha=rep(1,times=M), M=M, T=T, N=N, beta1=beta1), monitor=c("m"),
n.chains=1, inits = NA, sample=1)
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, n.chains = length(runjags.object$end.state), :
RUNTIME ERROR:
Compilation error on line 39.
beta1[2,1] is a logical node and cannot be observed
beta1 in this case is a 2x2 matrix of coefficient estimates.
How is JAGS predicting m in the first example (no fixed parameters)? Is it just completely randomly choosing m?
How can I include earlier acquired model estimates to simulate new outcome values?
The model is:
model{
for (i in 1:N)
{
for (t in 1:T)
{
m[t,i] ~ dcat(ps[i,t,])
}
for (state in 1:M)
{
ps[i,1,state] <- probs1[state]
for (t in 2:T)
{
ps[i,t,state] <- probs[m[(t-1),i], state, i,t]
}
for (prev in 1:M){
for (t in 1:T) {
probs[prev,state,i,t] <- odds[prev,state,i,t]/totalodds[prev,i,t]
odds[prev,state,i,t] <- exp(alpha[prev,state,i] +
beta1[prev,state]*x1[t,i]
+ beta2[prev,state]*x2[t,i]
+ beta3[prev,state]*x3[t,i])
}}
alpha[state,state,i] <- 0
for (t in 1:T) {
totalodds[state,i,t] <- odds[state,1,i,t] + odds[state,2,i,t]
}
}
alpha[1,2,i] <- raneffs[i,1]
alpha[2,1,i] <- raneffs[i,2]
raneffs[i,1:2] ~ dmnorm(alpha.means[1:2],alpha.prec[1:2, 1:2])
}
for (state in 1:M)
{
beta1[state,state] <- 0
beta2[state,state] <- 0
beta3[state,state] <- 0
}
beta1[1,2] <- rcoeff[1]
beta1[2,1] <- rcoeff[2]
beta2[1,2] <- rcoeff[3]
beta2[2,1] <- rcoeff[4]
beta3[1,2] <- rcoeff[5]
beta3[2,1] <- rcoeff[6]
alpha.Sigma[1:2,1:2] <- inverse(alpha.prec[1:2,1:2])
probs1[1:M] ~ ddirich(statealpha[1:M])
for (par in 1:6)
{
alpha.means[par] ~ dt(T.constant.mu,T.constant.tau,T.constant.k)
rcoeff[par] ~ dt(T.mu, T.tau, T.k)
}
T.constant.mu <- 0
T.mu <- 0
T.constant.tau <- 1/T.constant.scale.squared
T.tau <- 1/T.scale.squared
T.constant.scale.squared <- T.constant.scale*T.constant.scale
T.scale.squared <- T.scale*T.scale
T.scale <- 2.5
T.constant.scale <- 10
T.constant.k <- 1
T.k <- 1
alpha.prec[1:2,1:2] ~ dwish(Om[1:2,1:2],2)
Om[1,1] <- 1
Om[1,2] <- 0
Om[2,1] <- 0
Om[2,2] <- 1
## Prediction
for (i in 1:N)
{
m_pred[1,i] <- m[1,i]
for (t in 2:T)
{
m_pred[t,i] ~ dcat(ps_pred[i,t,])
}
for (state in 1:M)
{
ps_pred[i,1,state] <- probs1[state]
for (t in 2:T)
{
ps_pred[i,t,state] <- probs_pred[m_pred[(t-1),i], state, i,t]
}
for (prev in 1:M)
{
for (t in 1:T)
{
probs_pred[prev,state,i,t] <- odds_pred[prev,state,i,t]/totalodds_pred[prev,i,t]
odds_pred[prev,state,i,t] <- exp(alpha[prev,state,i] +
beta1[prev,state]*x1[t,i]
+ beta2[prev,state]*x2[t,i]
+ beta3[prev,state]*x3[t,i])
}}
for (t in 1:T) {
totalodds_pred[state,i,t] <- odds_pred[state,1,i,t] + odds_pred[state,2,i,t]
}
}
}
TL;DR: I think you're just missing a likelihood.
Your model is complex, so perhaps I'm missing something, but as far as I can tell there is no likelihood. You are supplying the predictors x1, x2, and x3 as data, but you aren't giving any observed m. So in what sense can JAGS be "fitting" the model?
To answer your questions:
Yes, it appears that m is drawn at random from a categorical distribution conditioned on the rest of the model. Since no m is supplied as data, none of the parameter distributions have any reason to update. Your result for m is therefore no different from what you'd get by drawing from all the priors and propagating those draws through the model in R or elsewhere.
Though it still wouldn't constitute fitting the model in any sense, you can supply fixed values for the coefficients, just not through beta1 itself. JAGS is complaining because beta1 is defined deterministically in the model (e.g. beta1[2,1] <- rcoeff[2]). That makes it a logical node, and JAGS does not allow logical nodes to be observed. The stochastic node behind it is rcoeff[par] ~ dt(T.mu, T.tau, T.k). You could therefore pass your earlier estimates as rcoeff in the data list, which fixes beta1, beta2 and beta3 together. Alternatively, remove the beta1 definitions from the model and pass beta1 as data. In the model's current form, supplying beta1 as data conflicts with its definition inside the model.
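If the goal is only to simulate new outcome sequences from earlier estimates, one option is to skip JAGS entirely and propagate the estimates through the transition model in R. The sketch below makes several assumptions. It covers a single subject. alpha is that subject's M x M intercept matrix, with the random effect already included and zeros on the diagonal. beta1 to beta3 are the M x M coefficient matrices, and x1 to x3 are the subject's covariate vectors over time. The function name and argument layout are hypothetical. The probability calculation mirrors the odds/totalodds/probs lines of the model.

```r
# Forward-simulate one subject's state sequence from fixed parameter values.
simulate_m <- function(probs1, alpha, beta1, beta2, beta3, x1, x2, x3,
                       m1 = NULL) {
  n_t <- length(x1)
  M <- length(probs1)
  m <- integer(n_t)
  # Initial state: the observed value if given, otherwise a draw from probs1
  m[1] <- if (is.null(m1)) sample.int(M, 1, prob = probs1) else m1
  for (t in seq_len(n_t)[-1]) {
    prev <- m[t - 1]
    # Log-odds of each destination state, given the previous state
    lo <- alpha[prev, ] + beta1[prev, ] * x1[t] +
      beta2[prev, ] * x2[t] + beta3[prev, ] * x3[t]
    p <- exp(lo) / sum(exp(lo))   # odds / totalodds
    m[t] <- sample.int(M, 1, prob = p)
  }
  m
}

# Usage with made-up estimates for two states and five time points
set.seed(42)
b <- matrix(c(0, 0.3, -0.2, 0), 2, 2)
simulate_m(c(0.5, 0.5), matrix(c(0, -1, -2, 0), 2, 2), b, b, b,
           rnorm(5), rnorm(5), rnorm(5))
```

Repeating the call over posterior draws, rather than point estimates, would also propagate parameter uncertainty into the simulated sequences.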
I'm stretching here, but my guess is that if you're using JAGS, you have fit (or would like to fit) the model in JAGS too. A common pattern is to include both an observed response and a desired predicted response in a JAGS model, e.g. something like this:
model {
b ~ dnorm(0, 1) # prior on b
for(i in 1:N) {
y[i] ~ dnorm(b * x[i], 1) # Likelihood of y | b (and fixed precision = 1 for the example)
}
for(i in 1:N_pred) {
pred_y[i] ~ dnorm(b * pred_x[i], 1) # Prediction
}
}
In this example model, x, y, and pred_x are supplied as data, the unknown parameter b is to be estimated, and we desire the posterior predictions pred_y at each value of pred_x. JAGS knows that the distribution in the first for loop is a likelihood, because y is supplied as data. Posterior samples of b will be constrained by this likelihood. The second for loop looks similar, but since pred_y is not supplied as data, it can do nothing to constrain b. Instead, JAGS knows to simply draw pred_y samples conditioned on b and the supplied pred_x. The values of pred_x are commonly defined to be the same as observed x, giving a predictive interval for each observed data point, or as a regular sequence of values along the x axis to generate a smooth predictive interval.
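The toy model above is conjugate: a normal prior on b and a normal likelihood with known precision 1 (JAGS parameterises dnorm by precision). The posterior of b and the posterior predictive distribution of pred_y therefore have closed forms. The R sketch below computes them, which can serve as a check on the JAGS samples for this toy case. The function name is hypothetical.

```r
# Closed-form posterior for b ~ N(0, prec 1), y[i] ~ N(b * x[i], prec 1)
toy_posterior <- function(x, y, pred_x) {
  post_prec <- 1 + sum(x^2)              # prior precision + data precision
  post_mean <- sum(x * y) / post_prec
  list(
    b_mean = post_mean,
    b_sd = sqrt(1 / post_prec),
    pred_mean = post_mean * pred_x,           # E[pred_y | data]
    pred_sd = sqrt(pred_x^2 / post_prec + 1)  # uncertainty in b plus noise
  )
}

res <- toy_posterior(x = c(1, 2), y = c(2, 4), pred_x = 3)
res$b_mean
res$pred_mean
```

The JAGS draws of b and pred_y for the same data should have means and standard deviations close to these values.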

Nonlinear regression with sampling weights (package survey)

I would like to estimate the coefficients of a nonlinear model with a binary dependent variable. The nonlinearity arises because two regressors, A and B, depend on a subset of the dataset and on the two parameters lambda1 and lambda2 respectively:
y = alpha + beta1 * A(lambda1) + beta2 * B(lambda2) + delta * X + epsilon
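where for each observation i we have
A_i(lambda1) = sum_{k=1}^{a_i - 1} w_{ik}(lambda1) * R_{i,k},   with   w_{ik}(lambda1) = (a_i - k)^lambda1 / sum_{j=1}^{a_i - 1} (a_i - j)^lambda1
and where a and the R columns are variables in the data.frame. The regressor B(lambda2) is defined in a similar way.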
Moreover, I need to include what in Stata are known as pweights, i.e. survey weights or sampling weights. For this reason, I'm working with the R package survey by Thomas Lumley.
First, I create a function for A (and B), i.e.:
A <- function(l1){
R <- as.matrix(data[,1:(80)])
a <- data[,169]
N = length(a)
var <- numeric(N)
for (i in 1:N) {
ai <- rep(a[i],a[i]-1) # vector of a(i)
k <- 1:(a[i]-1) # numbers from 1 to a(i)-1
num <- (ai-k)^l1
den <- sum((ai-k)^l1)
w <- num/den
w <- c(w,rep(0,dim(R)[2]-length(w)))
var[i] <- R[i,] %*% w
}
return(var)
}
B <- function(l2){
C <- as.matrix(data[,82:(161-1)])
a <- data[,169]
N = length(a)
var <- numeric(N)
for (i in 1:N) {
ai <- rep(a[i],a[i]-1) # vector of a(i)
k <- 1:(a[i]-1) # numbers from 1 to a(i)-1
num <- (ai-k)^l2
den <- sum((ai-k)^l2)
w <- num/den
w <- c(w,rep(0,dim(C)[2]-length(w)))
var[i] <- C[i,] %*% w
}
return(var)
}
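As a side note, A and B share the same weighting scheme, and both read the global data object. Below is a sketch of a single helper that instead takes the lag matrix, the a column and lambda as explicit arguments. The name lag_weighted is hypothetical. The sketch assumes every a_i is at least 1 and that a_i - 1 does not exceed the number of lag columns. Passing the data explicitly also makes it easier to rebuild the regressor inside a likelihood for each candidate lambda.

```r
# Weighted sum of the first a_i - 1 lag columns of R for each row i,
# with weights proportional to (a_i - k)^lambda, k = 1, ..., a_i - 1.
lag_weighted <- function(R, a, lambda) {
  R <- as.matrix(R)
  vapply(seq_len(nrow(R)), function(i) {
    k <- seq_len(a[i] - 1)          # empty when a_i == 1
    if (length(k) == 0) return(0)   # no lags: 0, as in the original loop
    w <- (a[i] - k)^lambda
    sum(R[i, k] * w / sum(w))
  }, numeric(1))
}

# Usage (column positions taken from the question):
# A_vals <- lag_weighted(data[, 1:80], data[, 169], lambda = 11)
```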
But the problem is that I don't know how to include the nonlinear regressors in the model (or in the survey design, using the function svydesign):
d_test <- svydesign(id=~1, data = data, weights = ~hw0010)
Because, when I try to estimate the model:
# loglikelihood function:
LLsvy <- function(y, model, lambda1, lambda2){
aux1 <- y * log(pnorm(model))
aux2 <- (1-y) * log(1-pnorm(model))
LL <- (aux1) + (aux2)
return(LL)
}
fit <- svymle(loglike=LLsvy,
formulas=list(~y, model = ~ A(lambda1)+B(lambda2)+X,lambda1=~1,lambda2=~1),
design=d_test,
start=list(c(0,0,0,0),c(lambda1=11),c(lambda2=8)),
na.action="na.exclude")
I get the error message:
Error in eval(expr, envir, enclos) : object 'lambda1' not found
I think that the problem is in including the nonlinear part, because everything works fine if I fix A and B for some lambda1 and lambda2 (so that the model becomes linear):
lambda1=11
lambda2=8
data$A <- A(lambda1)
data$B <- B(lambda2)
d_test <- svydesign(id=~1, data = data, weights = ~hw0010)
LLsvylin <- function(y, model){
aux1 <- y * log(pnorm(model))
aux2 <- (1-y) * log(1-pnorm(model))
LL <- (aux1) + (aux2)
return(LL)
}
fitlin <- svymle(loglike=LLsvylin,
formulas=list(~y, model = ~A+B+X),
design=d_test,
start=list(0,0,0,0),
na.action="na.exclude")
On the other hand, if I don't use the sampling weights, I can easily estimate my nonlinear model using the function mle from package stats4 or the function mle2 from package bbmle.
To sum up,
how can I combine sampling weights (svymle) while estimating a nonlinear model (which I can do using mle or mle2)?
=========================================================================
A problem with the nonlinear part of the model also arises when using the function svyglm (with fixed lambda1 and lambda2, in order to get good starting values for svymle):
lambda1=11
lambda2=8
model0 = y ~ A(lambda1) + B(lambda2) + X
probit1 = svyglm(formula = model0,
data = data,
family = binomial(link=probit),
design = d_test)
I get the error message:
Error in svyglm.survey.design(formula = model0, data = data, family = binomial(link = probit), :
all variables must be in design= argument
This isn't what svymle does -- it's for generalised linear models, which have linear predictors and a potentially complicated likelihood or loss function. You want non-linear weighted least squares, with a simple loss function but complicated predictors.
There isn't an implementation of design-weighted nonlinear least squares in the survey package, probably because no-one has previously asked for one. You could try emailing the package author.
The upcoming version 4 of the survey package will have a function svynls, so if you know how to fit your model without sampling weights using nls you will be able to fit it with sampling weights.
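Another route, which needs no survey functions at all, is to maximise the survey-weighted (pseudo) log-likelihood of the probit model directly with optim, rebuilding the nonlinear regressors for each candidate lambda. The sketch below uses simulated data. A made-up regressor z^lambda stands in for A(lambda1) and B(lambda2), and all names are hypothetical. The result is design-weighted point estimates only. The standard errors implied by optim's Hessian are not design-based; those would need a sandwich (linearisation) estimator, which is not shown here.

```r
# Negative weighted probit log-likelihood. The first n_lambda entries of
# theta are the nonlinear parameters; the rest are linear coefficients.
weighted_probit_nll <- function(theta, y, w, build_X, n_lambda) {
  lambda <- theta[seq_len(n_lambda)]
  beta <- theta[-seq_len(n_lambda)]
  eta <- drop(build_X(lambda) %*% beta)
  ll <- y * pnorm(eta, log.p = TRUE) +
    (1 - y) * pnorm(eta, lower.tail = FALSE, log.p = TRUE)
  -sum(w * ll)
}

# Simulated example (hypothetical data and sampling weights)
set.seed(1)
n <- 500
z <- runif(n, 1, 3)
x <- rnorm(n)
build_X <- function(lambda) cbind(1, z^lambda[1], x)
y <- rbinom(n, 1, pnorm(drop(build_X(1.5) %*% c(-1, 0.5, 1))))
w <- runif(n, 0.5, 2)

# theta = (lambda, intercept, coefficient of z^lambda, coefficient of x)
start <- c(1, 0, 0, 0)
fit <- optim(start, weighted_probit_nll, y = y, w = w,
             build_X = build_X, n_lambda = 1, method = "BFGS")
fit$par
```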

Fit 'nls': singular gradient matrix at initial parameter estimates

I'm new to 'nls' and I'm having trouble finding starting parameters. I've read several posts and tried various parameters and formula constructions, but I keep getting errors.
This is a small example of what I'm doing, and I'd very much appreciate any tips!
# Data to which I want to fit a non-linear function
x <- c(0, 4, 13, 30, 63, 92)
y <- c(0.00000000, 0.00508822, 0.01103990, 0.02115466, 0.04036655, 0.05865331)
z <- 0.98
# STEPS:
# 1 pool, z fixed. This works.
fit <- nls(y ~ z * ((1 - exp(-k1*x))),
start=list(k1=0))
# 2 pool model, z fixed
fit2 <- nls(y ~ z * (1 - exp(-k1*x)) + (1 - exp(-k2*x)),
start=list(k1=0, k2=0)) # Error: singular gradient matrix at initial parameter estimates
# My goal: 2 pool model, z free
fit3 <- nls(y ~ z * (1 - exp(-k1*x)) + (1 - exp(-k2*x)),
start=list(z=0.5, k1=0, k2=0))
It has been a while since you asked the question, but maybe you are still interested in some comments:
At least your fit2 works fine when one varies the starting parameters (see the code and plots below). I guess fit3 is then just too complicated a model for these data, which basically follow a linear trend. That implies that two parameters are usually sufficient to describe the data reasonably well (see the second plot).
So, as a general hint, when you get
singular gradient matrix at initial parameter estimates
you can
1) vary the starting values (your initial parameter estimates)
and/or
2) try to simplify your model by looking for redundant parameters, which usually cause trouble.
I also highly recommend always plotting the data first, together with your initial guesses (see also this question).
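For fit2 in the question, the second hint explains the error directly. At the starting values k1 = 0 and k2 = 0, the derivative of the model with respect to k1 is z*x*exp(-k1*x) = z*x. The derivative with respect to k2 is x*exp(-k2*x) = x. The two gradient columns are therefore proportional, and the gradient matrix is singular. Here is a small check in R. The helper grad_fit2 is hypothetical, and its derivatives are written out by hand.

```r
x <- c(0, 4, 13, 30, 63, 92)
z <- 0.98

# Partial derivatives of z*(1 - exp(-k1*x)) + (1 - exp(-k2*x)) w.r.t. k1, k2
grad_fit2 <- function(k1, k2) {
  cbind(dk1 = z * x * exp(-k1 * x),
        dk2 = x * exp(-k2 * x))
}

qr(grad_fit2(0, 0))$rank    # 1: proportional columns, singular gradient
qr(grad_fit2(0, 0.5))$rank  # 2: starting values used for fit2 in the code below
```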
Here is a plot showing the outcome for your fit, fit2 and a third function defined by me, which is given in the code below:
[Plot: data with 'fixed z, single exp' (black), 'fixed z, two exp' (red) and 'variable z, single exp' (blue)]
As you can see, there is almost no difference between your fit2 and the function with a variable z and a single exponential. Two parameters seem to be enough to describe the system reasonably well (even one parameter already does quite well, as the black line in the plot above shows). If you then want to fit a line through a certain data point, you can also check out this answer.
So what does it look like when one uses a linear function with two free parameters and a function with a variable z, one exponential term and a variable offset? That is shown in the following plot; again, there is not much of a difference:
[Plot: data with 'linear approach' (blue) and 'variable z, single exp, offset' (red)]
How do the residuals compare?
> fit
Nonlinear regression model
model: y ~ zfix * ((1 - exp(-k1 * x)))
data: parent.frame()
k1
0.0006775
residual sum-of-squares: 1.464e-05
> fit2
Nonlinear regression model
model: y ~ zfix * (1 - exp(-k1 * x)) + (1 - exp(-k2 * x))
data: parent.frame()
k1 k2
-0.0006767 0.0014014
residual sum-of-squares: 9.881e-06
> fit3
Nonlinear regression model
model: y ~ Z * (1 - exp(-k1 * x))
data: parent.frame()
Z k1
0.196195 0.003806
residual sum-of-squares: 9.59e-06
> fit4
Nonlinear regression model
model: y ~ a * x + b
data: parent.frame()
a b
0.0006176 0.0019234
residual sum-of-squares: 6.084e-06
> fit5
Nonlinear regression model
model: y ~ z * (1 - exp(-k1 * x)) + k2
data: parent.frame()
z k1 k2
0.395106 0.001685 0.001519
residual sum-of-squares: 5.143e-06
As one could guess, the fit with only one free parameter gives the worst result and the one with three free parameters gives the best; however, in my opinion there is not much of a difference.
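Because fit4 is linear in a and b, lm should give the same estimates without any starting values. A quick check:

```r
x <- c(0, 4, 13, 30, 63, 92)
y <- c(0.00000000, 0.00508822, 0.01103990, 0.02115466, 0.04036655, 0.05865331)

fit4_lm <- lm(y ~ x)
coef(fit4_lm)          # intercept matches b, slope matches a from fit4
sum(resid(fit4_lm)^2)  # residual sum of squares, as reported for fit4
```

AIC() accepts both lm and nls fits, so something like AIC(fit4_lm, fit5) is one way to weigh the extra parameters against the small drop in residual sum of squares. With only six data points, any such comparison is rough.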
Here is the code I used:
x <- c(0, 4, 13, 30, 63, 92)
y <- c(0.00000000, 0.00508822, 0.01103990, 0.02115466, 0.04036655, 0.05865331)
zfix <- 0.98
plot(x,y)
# STEPS:
# 1 pool, z fixed. This works.
fit <- nls(y ~ zfix * ((1 - exp(-k1*x))), start=list(k1=0))
xr = data.frame(x = seq(min(x),max(x),len=200))
lines(xr$x,predict(fit,newdata=xr))
# 2 pool model, z fixed
fit2 <- nls(y ~ zfix * (1 - exp(-k1*x)) + (1 - exp(-k2*x)), start=list(k1=0, k2=0.5))
lines(xr$x,predict(fit2,newdata=xr), col='red')
# 3 z variable
fit3 <- nls(y ~ Z * (1 - exp(-k1*x)), start=list(Z=zfix, k1=0.2))
lines(xr$x,predict(fit3,newdata=xr), col='blue')
legend('topleft',c('fixed z, single exp', 'fixed z, two exp', 'variable z, single exp'),
lty=c(1,1,1),
lwd=c(2.5,2.5,2.5),
col=c('black', 'red','blue'))
#dev.new()
plot(x,y)
# 4 fit linear function a*x + b
fit4 <- nls(y ~ a *x + b, start=list(a=1, b=0.))
lines(xr$x,predict(fit4,newdata=xr), col='blue')
fit5 <- nls(y ~ z * (1 - exp(-k1*x)) + k2, start=list(z=zfix, k1=0.1, k2=0.5))
lines(xr$x,predict(fit5,newdata=xr), col='red')
legend('topleft',c('linear approach', 'variable z, single exp, offset'),
lty=c(1,1),
lwd=c(2.5,2.5),
col=c('blue', 'red'))

R: Dynamic linear regression with dynlm package, how to predict()?

I am trying to build a dynamic regression model, and so far I have done it with the dynlm package. Basically, the model looks like this:
y_t = a*x1_t + b*x2_t + ... + c*y_(t-1).
y_t is to be predicted; x1_t and x2_t will be given, and so will y_(t-1).
Building the model with the dynlm package worked fine, but when it came to predicting y_t I got confused...
I found this, which seems to be a very similar problem, but it did not help me to handle my own problem.
Here is the problem I am facing (basically, what predict() does seems weird; see the comments):
library(dynlm)
# Create Data
set.seed(1)
y <- arima.sim(model = list(ar = c(.9)), n = 11) #Create AR(1) dependent variable
A <- rnorm(11) #Create independent variables
B <- rnorm(11)
y <- y + .5 * A + .2 * B #Add relationship to independent variables
data = cbind(y, A, B)
# subset used for the fitting of the model
reg <- data[1:10, ]
# Fit dynamic linear model
model <- dynlm(y ~ A + B + L(y, k = 1), data = reg) # dynlm
model
# Time series regression with "zooreg" data:
# Start = 2, End = 11
#
# Call:
# dynlm(formula = y ~ A + B + L(y, k = 1), data = reg)
# Coefficients:
# (Intercept) A B L(y, k = 1)
# 0.8930 -0.2175 0.2892 0.5176
# subset last two rows.
# the last row (r11) for which y_t shall be predicted, where from the same time A and B are input for the prediction
# and the second last row (r10), so y_(t-1) can be input for the model as well
pred <- as.data.frame(data[10:11, ])
# prediction using predict()
predict(model, newdata = pred)
# 1 2
# 1.833134 1.483809
# manual calculation of prediction of y in r11 (how I thought it should be...), taking y_(t-1) as input
predicted_value <- model$coefficients[1] + model$coefficients[2] * pred[2, 2] + model$coefficients[3] * pred[2, 3] + model$coefficients[4] * pred[1, 1]
predicted_value
# (Intercept)
# 1.743334
# and then what gives the value from predict() above taking y_t into the model (which is the value that should be predicted and not y_(t-1))
predicted_value <- model$coefficients[1] + model$coefficients[2] * pred[2, 2] + model$coefficients[3] * pred[2, 3] + model$coefficients[4] * pred[2, 1]
predicted_value
# (Intercept)
# 1.483809
Of course I could just write my own prediction function, but my real model will have many more variables (which can even vary, as I use the step function to optimize the model according to AIC), and that is why I want to use the predict() function.
Any ideas, how to solve this?
Unfortunately, the dynlm package does not provide a predict() method. At the moment the package completely separates the data pre-processing (which knows about functions like d(), L(), trend(), season() etc.) from the model fitting (which itself is not aware of these functions). A predict() method has been on my wishlist, but so far I have not got round to writing one, because the flexibility of the interface allows so many models where it is not quite straightforward what to do. In the meantime, I should probably add a method that throws a warning before the lm method is found by inheritance.
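Until such a method exists, one workaround (not part of dynlm) is to build the lagged regressor as an ordinary column and fit with lm. Its predict() method then uses exactly the values supplied in newdata. Below is a sketch using the question's simulated data. The column name y_lag1 is hypothetical. With more lags, or with step()-based selection, each lagged column would have to be created the same way.

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.9), n = 11)
A <- rnorm(11)
B <- rnorm(11)
y <- y + 0.5 * A + 0.2 * B

df <- data.frame(y = as.numeric(y), A = A, B = B)
df$y_lag1 <- c(NA, head(df$y, -1))   # y_(t-1) as an explicit column

# Fit on rows 1-10; row 1 is dropped because its lag is NA
fit <- lm(y ~ A + B + y_lag1, data = df[1:10, ])

# One-step-ahead prediction for row 11 uses A_11, B_11 and y_10
pred <- predict(fit, newdata = df[11, ])
manual <- sum(coef(fit) * c(1, df$A[11], df$B[11], df$y[10]))
all.equal(unname(pred), manual)
```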
