Failing to optimise negative binomial model using optim - r

I am trying to manually optimise a negative binomial regression model using the optim package in R trying to predict a count variable y using a matrix of factors X using the following code:
# generating some fake data
n <- 1000
X <- matrix(NA, ncol = 5, nrow = n)
X[,1] <- 1
X[,2] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,3] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,4] <- sample(size = n, x = c(0,1), replace = TRUE)
X[,5] <- sample(size = n, x = c(0,1), replace = TRUE)
beta0 <- 3
beta1 <- -2
beta2 <- -2
beta3 <- -4
beta4 <- -0.9
k <- 0.9
## draws from negative binomial distribution
mu <- exp(beta0 + beta1 * X[,2] + beta2 * X[,3] + beta3 * X[,4] + beta4 * X[,5])
theta <- mu + mu ^2 / k
# dependent variable
y <- rnegbin(n, mu = mu, theta = theta)
# function to be optimised
negbin_ll <- function(y, X, theta){
beta <- theta[1:ncol(X)]
alpha <- theta[ncol(X) + 1]
logll <- y * log(alpha) + y *( beta %*% t(X) ) - (y + (1 / alpha ) ) * log( 1 + alpha * exp(beta %*% t(X))) + lgamma(y + (1 / alpha)) - lgamma ( y + 1) - lgamma ( 1 / alpha)
logll <- sum( logll )
return(logll)
}
stval <- rep(0, ncol(X) + 1)
res <-
optim(
stval,
negbin_ll,
y = y,
X = X,
control = list(fnscale = -1),
hessian = TRUE,
method = "BFGS"
)
The code should produce point estimates from the optimisation process, but instead fails when executing the optim-function with the error in optim(stval, negbin_ll, y = y, X = X, control = list(fnscale = -1), : initial value in 'vmmin' is not finite.
I already tried to change log(gamma(...)) to lgamma(...) in the likelihood function and tried many other ways, but I fail to get estimates.
Changing the start values of optim also does not help.
Do you have any idea if there is any particularity to the likelihood function that leads to values being treated in any odd fashion?
Help would be much appreciated.

optim tries several points to get to the minimum, in your case it hits some non-positive values in the arguments inside the logs. One way is to discard the values that return any non-positive inside the problematic functions by returning a negative (in your case) large number, like -lenght(series)*10^6. Remade the log-likelihood function, like this it kinda works:
negbin_ll <- function(y, X, theta){
beta <- theta[1:ncol(X)]
alpha <- theta[ncol(X) + 1]
if(any(alpha<=0)) return(-length(y)*10^6)
if(any(1 + alpha * exp(beta %*% t(X))<=0)) return(-length(y)*10^6)
logll <- y * log(alpha) + y *( beta %*% t(X) ) - (y + (1 / alpha ) ) * log( 1 + alpha * exp(beta %*% t(X))) + lgamma(y + (1 / alpha)) - lgamma ( y + 1) - lgamma ( 1 / alpha)
logll <- sum( logll )
return(logll)
}

Related

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation and covariates?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = b_0 + b_1xx_t + Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t),
where xx_t are covariates. I have read this question and wrote the following code
library(KFAS)
set.seed(100)
xx <- rnorm(200)
beta0 <- 0.1
beta1 <- 0.1
eps <- rt(200, 4, 1)
y <- as.matrix(beta0 + beta1*xx + (arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(y ~ xx + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
I get the error
Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)), :
System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07
I have tried to change the initial vector to c(1, 0.5, 1, 1, 1) but it returns the same message. Does anyone know how can I do this?
Thanks!

Nonlinear regression in R error in step factor

I need aproximate datapoints by exponential function with some type of lower limit (variable y is price in time and I need fix minimal value, so asymptote of exponential function cant be at 0). For some "y" is my code function, but at others return error. How can I solve it? Thanks
R code:
y <- c(26973, 24907, 22999, 21236, 19609, 18107, 16720, 15439, 14256, 13163,
12155, 11224, 10364, 9570, 8836)
x <- c(1:15)
train <- data.frame(x, y)
colnames(train) <- c("x", "y")
# Select an approximate $\theta$, since theta must be lower than min(y), and greater than zero
theta.0 <- min(train$y) * 0.5 #min(data.df$y) * 0.5
# Estimate the rest parameters using a linear model
model.0 <- lm(log(price - theta.0) ~ age, data = train)
alpha.0 <- exp(coef(model.0)[1])
beta.0 <- coef(model.0)[2]
# Starting parameters
start <- list(alpha = alpha.0, beta = beta.0, theta = theta.0)
print(start)
model <- nls(y ~ alpha * exp(beta * x) + theta , data = train, start = start)
plot(train$x, train$y)
lines(train$x, predict(model, list(x = train$x)), col = 'skyblue', lwd = 3)
Output:
Error in nls(y ~ alpha * exp(beta * x) + theta, data = train, start = start) :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562

Performing residual bootstrap using kernel regression in R

Kernel regression is a non-parametric technique that wants to estimate the conditional expectation of a random variable. It uses local averaging of the response value, Y, in order to find some non-linear relationship between X and Y.
I am have used bootstrap for kernel density estimation and now want to use it for kernel regression as well. I have been told to use residual bootstrapping for kernel regression and have read a couple of papers on this. I am however unsure how to perform this. Programming has been done in R using the FKSUM package. I have made an attempt to use standard resampling on kernel regression:
library(FKSUM)
set.seed(1)
n <- 5000
sample.size <- 500
B.replications <- 200
x <- rbeta(n, 2, 2) * 10
y <- 3 * sin(2 * x) + 10 * (x > 5) * (x - 5)
y <- y + rnorm(n) + (rgamma(n, 2, 2) - 1) * (abs(x - 5) + 3)
#taking x.y to be the population
x.y <- data.frame(x, y)
xs <- seq(min(x), max(x), length = 1000)
ftrue <- 3 * sin(2 * xs) + 10 * (xs > 5) * (xs - 5)
#Sample from the population
seqx<-seq(1,5000,by=1)
sample.ind <- sample(seqx, size = sample.size, replace = FALSE)
sample.reg<-x.y[sample.ind,]
x_s <- sample.reg$x
y_s <- sample.reg$y
fhat_loc_lin.pop <- fk_regression(x, y)
fhat_loc_lin.sample <- fk_regression(x = x_s, y = y_s)
plot(x, y, col = rgb(.7, .7, .7, .3), pch = 16, xlab = 'x',
ylab = 'x', main = 'Local linear estimator with amise bandwidth')
lines(xs, ftrue, col = 2, lwd = 3)
lines(fhat_loc_lin, lty = 2, lwd = 2)
#Bootstrap
n.B.sample = sample.size # sample bootstrap size
boot.reg.mat.X <- matrix(0,ncol=B.replications, nrow=n.B.sample)
boot.reg.mat.Y <- matrix(0,ncol=B.replications, nrow=n.B.sample)
fhat_loc_lin.boot <- matrix(0,ncol = B.replications, nrow=100)
Temp.reg.y <- matrix(0,ncol = B.replications,nrow = 1000)
for(i in 1:B.replications){
sequence.x.boot <- seq(from=1,to=n.B.sample,by=1)
sample.ind.boot <- sample(sequence.x.boot, size = sample.size, replace = TRUE)
boot.reg.mat <- sample.reg[sample.ind.boot,]
boot.reg.mat.X <- boot.reg.mat$x
boot.reg.mat.Y <- boot.reg.mat$y
fhat_loc_lin.boot <- fk_regression(x = boot.reg.mat.X ,
y = boot.reg.mat.Y,
h = fhat_loc_lin.sample$h)
lines(y=fhat_loc_lin.boot$y,x= fhat_loc_lin.sample$x, col =c(i) )
Temp.reg.y[,i] <- fhat_loc_lin.boot$y
}
quan.reg.l <- vector()
quan.reg.u <- vector()
for(i in 1:length(xs)){
quan.reg.l[i] <- quantile(x = Temp.reg.y[i,],probs = 0.025)
quan.reg.u[i] <- quantile(x = Temp.reg.y[i,],probs = 0.975)
}
# Lower Bound
Temp.reg.2 <- quan.reg.l
lines(y=Temp.reg.2,x=fhat_loc_lin.boot$x ,col="red",lwd=4,lty=1)
# Upper Bound
Temp.reg.3 <- quan.reg.u
lines(y=Temp.reg.3,x=fhat_loc_lin.boot$x ,col="navy",lwd=4,lty=1)
Asking the question on here now since I haven't received any response on CV. Any help would be greatly appreciated!

Fiting 1 - exp(x) giving higher weight to the first values

I want to fit to a 1 - exp(x) function to a data set , but giving higher weight to the first values. However, the following code is not working in such way:
x <-sqrt((0.05)^2+(0.05)^2)*seq(from = 1, to = 20, by = 1)
y <- c(11,20,27,32,35,36,36.5,25,16,9,4,1,7.87e-16,2.07e-15,-9.36e-16,1.61e-15,-3.81e-16,3.92e-16,7.65e-16,-8.26e-16)
temp <- data.frame(cbind(x,y))
we <- 1/(log1p(seq_along(x)))
# fit non-linear model
mod <- nls(y ~ (1 - exp(a + b * x)), data = temp, start = list(a = 0, b = 0), weights = we)
#add fitted curve
lines(temp$x, predict(mod, list(x = temp$x)))
Here is the output:
Your specification of weights is correct. The bad fit you obtained is due to your faulty model assumption. You assumed:
y ~ 1 - exp(a + b * x)
Note that exp() gives strictly positive values, so y will be no larger than 1. However, y values in your data range up to 35.
My idea is not perfect, but it might give you a better starting point. Consider:
y ~ a * x * exp(b * x * x + c * x)
Using your data:
x <- c(0, sqrt((0.05)^2+(0.05)^2)*seq(from = 1, to = 20, by = 1))
y <- c(0, 11,20,27,32,35,36,36.5,25,16,9,4,1,7.87e-16,2.07e-15,-9.36e-16,1.61e-15,-3.81e-16,3.92e-16,7.65e-16,-8.26e-16)
fit <- nls(y ~ a * x * exp(b * x * x + c * x), start = list(a = 30, b= -1, c = -1))
plot(x, y)
lines(x, predict(fit, list(x)))

Reproduce Fisher linear discriminant figure

Many books illustrate the idea of Fisher linear discriminant analysis using the following figure (this particular is from Pattern Recognition and Machine Learning, p. 188)
I wonder how to reproduce this figure in R (or in any other language). Pasted below is my initial effort in R. I simulate two groups of data and draw linear discriminant using abline() function. Any suggestions are welcome.
set.seed(2014)
library(MASS)
library(DiscriMiner) # For scatter matrices
# Simulate bivariate normal distribution with 2 classes
mu1 <- c(2, -4)
mu2 <- c(2, 6)
rho <- 0.8
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)
# Scatter matrices
B <- betweenCov(variables = X, group = y)
W <- withinCov(variables = X, group = y)
# Eigenvectors
ev <- eigen(solve(W) %*% B)$vectors
slope <- - ev[1,1] / ev[2,1]
intercept <- ev[2,1]
par(pty = "s")
plot(X, col = y + 1, pch = 16)
abline(a = slope, b = intercept, lwd = 2, lty = 2)
MY (UNFINISHED) WORK
I pasted my current solution below. The main question is how to rotate (and move) the density plot according to decision boundary. Any suggestions are still welcome.
require(ggplot2)
library(grid)
library(MASS)
# Simulation parameters
mu1 <- c(5, -9)
mu2 <- c(4, 9)
rho <- 0.5
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
# Multivariate normal sampling
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
# Combine into data frame
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)
X <- data.frame(X, class = y)
# Apply lda()
m1 <- lda(class ~ X1 + X2, data = X)
m1.pred <- predict(m1)
# Compute intercept and slope for abline
gmean <- m1$prior %*% m1$means
const <- as.numeric(gmean %*% m1$scaling)
z <- as.matrix(X[, 1:2]) %*% m1$scaling - const
slope <- - m1$scaling[1] / m1$scaling[2]
intercept <- const / m1$scaling[2]
# Projected values
LD <- data.frame(predict(m1)$x, class = y)
# Scatterplot
p1 <- ggplot(X, aes(X1, X2, color=as.factor(class))) +
geom_point() +
theme_bw() +
theme(legend.position = "none") +
scale_x_continuous(limits=c(-5, 5)) +
scale_y_continuous(limits=c(-5, 5)) +
geom_abline(intecept = intercept, slope = slope)
# Density plot
p2 <- ggplot(LD, aes(x = LD1)) +
geom_density(aes(fill = as.factor(class), y = ..scaled..)) +
theme_bw() +
theme(legend.position = "none")
grid.newpage()
print(p1)
vp <- viewport(width = .7, height = 0.6, x = 0.5, y = 0.3, just = c("centre"))
pushViewport(vp)
print(p2, vp = vp)
Basically you need to project the data along the direction of the classifier, plot a histogram for each class, and then rotate the histogram so its x axis is parallel to the classifier. Some trial-and-error with scaling the histogram is needed in order to get a nice result. Here's an example of how to do it in Matlab, for the naive classifier (difference of class' means). For the Fisher classifier it is of course similar, you just use a different classifier w. I changed the parameters from your code so the plot is more similar to the one you gave.
rng('default')
n = 1000;
mu1 = [1,3]';
mu2 = [4,1]';
rho = 0.3;
s1 = .8;
s2 = .5;
Sigma = [s1^2,rho*s1*s1;rho*s1*s1, s2^2];
X1 = mvnrnd(mu1,Sigma,n);
X2 = mvnrnd(mu2,Sigma,n);
X = [X1; X2];
Y = [zeros(n,1);ones(n,1)];
scatter(X1(:,1), X1(:,2), [], 'b' );
hold on
scatter(X2(:,1), X2(:,2), [], 'r' );
axis equal
m1 = mean(X(1:n,:))';
m2 = mean(X(n+1:end,:))';
plot(m1(1),m1(2),'bx','markersize',18)
plot(m2(1),m2(2),'rx','markersize',18)
plot([m1(1),m2(1)], [m1(2),m2(2)],'g')
%% classifier taking only means into account
w = m2 - m1;
w = w / norm(w);
% project data onto w
X1_projected = X1 * w;
X2_projected = X2 * w;
% plot histogram and rotate it
angle = 180/pi * atan(w(2)/w(1));
[hy1, hx1] = hist(X1_projected);
[hy2, hx2] = hist(X2_projected);
hy1 = hy1 / sum(hy1); % normalize
hy2 = hy2 / sum(hy2); % normalize
scale = 4; % set manually
h1 = bar(hx1, scale*hy1,'b');
h2 = bar(hx2, scale*hy2,'r');
set([h1, h2],'ShowBaseLine','off')
% rotate around the origin
rotate(get(h1,'children'),[0,0,1], angle, [0,0,0])
rotate(get(h2,'children'),[0,0,1], angle, [0,0,0])

Resources