How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation and covariates?

I am using the 'KFAS' package in R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = b_0 + b_1 xx_t + Z_t x_t + \epsilon_t (measurement)
x_t = T_t x_{t-1} + R_t \eta_t (transition),
with \epsilon_t ~ N(0, H_t) and \eta_t ~ N(0, Q_t),
where xx_t are covariates. I have read this question and wrote the following code:
library(KFAS)
set.seed(100)
xx <- rnorm(200)
beta0 <- 0.1
beta1 <- 0.1
eps <- rt(200, 4, 1)
y <- as.matrix(beta0 + beta1*xx + (arima.sim(n = 200, list(ar = 0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
               ncol = 1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(y ~ xx + SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt), H = Ht)
updatefn <- function(pars, model) {
  model$H[1] <- pars[1]
  model$T[1] <- pars[2]
  model$Q[1] <- pars[3]
  model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
              lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
I get the error
Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)), :
System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07
I have tried changing the initial vector to c(1, 0.5, 1, 1, 1) but it returns the same message. Does anyone know how I can do this?
Thanks!
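A possible explanation, sketched here as an untested guess: the regression part y ~ xx adds time-invariant states for the intercept and xx, so T is larger than 1x1 and model$T[1] updates the intercept state while the custom AR(1) state keeps its NA value, which is what the error message complains about. Indexing the custom state explicitly (and assuming Q stays 1x1 because the regression states carry no disturbance term) would look roughly like this:
updatefn <- function(pars, model) {
  k <- dim(model$T)[1]          # total number of states (regression + custom)
  model$H[1, 1, 1] <- pars[1]   # measurement error variance
  model$T[k, k, 1] <- pars[2]   # AR(1) coefficient of the custom state
  model$Q[1, 1, 1] <- pars[3]   # state innovation variance
  model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
              lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))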

Related

Nonlinear regression in R error in step factor

I need to approximate data points with an exponential function that has some kind of lower limit (the variable y is a price over time and I need a fixed minimal value, so the asymptote of the exponential function can't be at 0). For some "y" my code works, but for others it returns an error. How can I solve it? Thanks
R code:
y <- c(26973, 24907, 22999, 21236, 19609, 18107, 16720, 15439, 14256, 13163,
12155, 11224, 10364, 9570, 8836)
x <- c(1:15)
train <- data.frame(x, y)
colnames(train) <- c("x", "y")
# Select an approximate theta; it must be lower than min(y) and greater than zero
theta.0 <- min(train$y) * 0.5
# Estimate the remaining parameters with a linear model on the log-transformed response
model.0 <- lm(log(y - theta.0) ~ x, data = train)
alpha.0 <- exp(coef(model.0)[1])
beta.0 <- coef(model.0)[2]
# Starting parameters
start <- list(alpha = alpha.0, beta = beta.0, theta = theta.0)
print(start)
model <- nls(y ~ alpha * exp(beta * x) + theta , data = train, start = start)
plot(train$x, train$y)
lines(train$x, predict(model, list(x = train$x)), col = 'skyblue', lwd = 3)
Output:
Error in nls(y ~ alpha * exp(beta * x) + theta, data = train, start = start) :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
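A hedged workaround sketch: the model y = theta + alpha * exp(beta * x) can, for decaying data like this, also be fit with the self-starting parameterisation SSasymp(), which picks its own starting values and often avoids the step-factor failure; Asym is the lower limit, R0 the value at x = 0, and exp(lrc) the decay rate.
# sketch: y = Asym + (R0 - Asym) * exp(-exp(lrc) * x), self-starting
fit.ss <- nls(y ~ SSasymp(x, Asym, R0, lrc), data = train)
coef(fit.ss)
plot(train$x, train$y)
lines(train$x, predict(fit.ss, list(x = train$x)), col = "skyblue", lwd = 3)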

Why does MLE for mixture model not equal to flexmix?

I want to write an MLE for a finite mixture model in R, but the coefficients my model estimates are not the same as the coefficients estimated by the flexmix package. I wonder if you can point out my mistakes.
My code is as follows:
#prepare data
slope1 <- -0.3; slope2 <- 0.3; slope3 <- 1.8; slope4 <- 0.5; intercept1 <- 1.5
age <- sample(seq(18,60,len=401), 200)
grade <- sample(seq(0,100,len=401), 200)
not_smsa <- sample(seq(-2,2,len=401), 200)
unemployment <- rnorm(200,mean=0,sd=1)
wage <- intercept1 + slope1*age +slope2*grade + slope3*not_smsa + rnorm(length(age),0,.15)
y <- wage
X <- cbind(1, age , grade , not_smsa)
mydata <- cbind.data.frame(X,y)
anso <- lm(wage ~ age + grade + not_smsa, data = mydata)
vi <- c(coef(anso),0.01,0.02,0.03,0.04,0.1)
#function
fmm <- function(beta) {
  mu1 <- c(X %*% beta[1:4])
  mu2 <- c(X %*% beta[5:8])
  p1 <- 1 / (1 + exp(-beta[9]))
  p2 <- 1 - p1
  llk <- p1 * dnorm(y, mu1) + p2 * dnorm(y, mu2)
  -sum(log(llk), na.rm = TRUE)
}
fit <- optim(vi, fmm, method = "BFGS", control = list(maxit = 50000), hessian = TRUE)
fit$par
library(flexmix)
flexfit <- flexmix(wage ~ age + grade + not_smsa, data = mydata, k = 2)
flexfit$par
c1 <- parameters(flexfit,component=1)
c2 <- parameters(flexfit, component=2)
Are there any mistakes in my code?
I have since found the mistakes in my code: the parameters of the objective function need constraints (positive standard deviations and a mixing proportion between 0 and 1).
fmm <- function(pars) {
  beta1  <- pars[1:4]
  sigma1 <- log(1 + exp(pars[5]))      # softplus keeps the standard deviation positive
  beta2  <- pars[6:9]
  sigma2 <- log(1 + exp(pars[10]))
  p1     <- 1 / (1 + exp(-pars[11]))   # logistic keeps the mixing proportion in (0, 1)
  mu1 <- c(X %*% beta1)
  mu2 <- c(X %*% beta2)
  p2  <- 1 - p1
  llk <- p1 * dnorm(y, mu1, sigma1) + p2 * dnorm(y, mu2, sigma2)
  -sum(log(llk), na.rm = TRUE)
}
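A hedged usage note on top of that fix: the reparameterised fmm takes 11 parameters (4 coefficients plus one softplus-scaled standard deviation per component, plus a logit-scaled mixing proportion), so the start vector passed to optim has to be extended accordingly; for example, with purely illustrative values:
vi2 <- c(coef(anso), log(exp(0.15) - 1),        # component 1: betas, inverse-softplus sigma
         coef(anso) * 1.1, log(exp(0.15) - 1),  # component 2: perturbed betas, sigma
         0)                                     # logit of p1, i.e. p1 = 0.5
fit2 <- optim(vi2, fmm, method = "BFGS", control = list(maxit = 50000), hessian = TRUE)
fit2$par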

Log-likelihood calculation given estimated parameters

In general: I want to calculate the (log) likelihood of data N given the estimated model parameters from data O.
More specifically, I want to know if my ll_given_modPars function below exists in one of the many R packages dealing with data modeling (lme4, glmm, etc.), as shown in this abstract example (not run):
library(lme4)
o_model <- lmer(observed ~ fixed.id + (1|random.id), data = O, REML = F)
n_logLik <- ll_given_modPars(model.estimates = o_model, data = N)
The fictional example above is on a linear mixed model for simplicity but I would like to eventually do this in a generalized linear mixed model which deals with the Poisson family or directly the negative binomial (for lme4: glmer(..., family="poisson") or glmer.nb ).
From what I could see, most packages deal with parameter estimation (great, I need that), but then they compare models fitted to the same data with different combinations of fixed and random effects, using anova or something to that extent, which is not what I want to do.
I want the log likelihood for the same parameters on different data.
The main attempts made:
After not finding a function that seems to do this, I thought of 'simply' tweaking the lme4 code to my purposes: it already calculates the log likelihood of parameters given the data, so I hoped I could reuse that framework without having it optimize over the parameters, i.e. isolate the likelihood calculation and just hand it the parameters and the data. Unfortunately the code is a bit above my current skills (https://github.com/lme4/lme4/blob/master/R/nbinom.R); I get a bit lost in how they use the objects over which they optimize.
I thought of doing the likelihood calculation myself, starting with a linear mixed model and then working my way up to more involved ones. But already with this example I'm having a hard time following the math, and even when using the formula as specified the obtained log-likelihood is still different (I don't know why; see the code in the appendix). I fear it will take me too long before I'm able to do it for the more involved models (such as Poisson or negative binomial).
At this point I'm not sure what avenue is best to pursue and would appreciate any input you might have.
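A minimal sketch of what such a ll_given_modPars could look like in the LMM case, reusing the Z %*% Psi %*% t(Z) + sigma^2 * I construction from the EDIT at the end of the appendix but with the design matrices of the new data; it assumes a single random intercept, and lFormula() is used only to build the model matrices for the new data without refitting:
library(lme4)
library(mvtnorm)
# Sketch: marginal Gaussian log-likelihood of `newdata` under the parameter
# estimates of `fit` (an lmer fit with one random intercept).
ll_given_modPars <- function(fit, newdata) {
  mu    <- predict(fit, newdata = newdata, re.form = NA)  # X_new %*% beta_hat
  lf    <- lFormula(formula(fit), data = newdata)         # model matrices only, no fit
  y.new <- model.response(lf$fr)
  Zt    <- lf$reTrms$Zt
  tau2  <- VarCorr(fit)[[1]][1]                           # random-intercept variance
  V     <- tau2 * crossprod(Zt) + diag(sigma(fit)^2, length(y.new))
  dmvnorm(y.new, mean = mu, sigma = as.matrix(V), log = TRUE)
}
# e.g. n_logLik <- ll_given_modPars(o_model, N)  # with the abstract O/N setup above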
Appendix: trying to calculate the log-likelihood (or find a closed-form approximation) based on "How does lmer (from the R package lme4) compute log likelihood?". lmer (from lme4) gives a log-likelihood of -17.8, while I get -45.56:
library(lme4)
set.seed(7)
n <- 2 # number of groups
m <- 4 # number of instances per group
fixed.effect <- c(0, -2, -1, 1)
tau <- 5 # standard deviation of random effects
sigma <- 2 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau)
sim.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
GROUP.EFFECT=rep(random.effect, each=m),
INSTANCE.ID=as.factor(rep(1:m, times=n)),
INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
sim.data$EXPECT.Y <- sim.data$GROUP.EFFECT + sim.data$INSTANCE.EFFECT
# now observe Y value, assuming normally distributed with fixed std. deviation
sim.data$OBS.Y <- rnorm(nrow(sim.data), mean=sim.data$EXPECT.Y, sigma)
model <- lmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = sim.data, REML=F)
summary(model)
toy.model.var <- VarCorr(model)
toy.model.sigma <- attr(toy.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
toy.model.tau.squared <- toy.model.var[[1]][1] # corresponds to variance of random effects
toy.model.betas <- model@beta
# left product: spread within groups
toy.data <- rbind(sim.data$OBS.Y[1:4], sim.data$OBS.Y[5:8])
toy.mean.adj <- rbind(toy.data[1,] - mean(unlist(toy.data[1,])), toy.data[2,] - mean(unlist(toy.data[2,])))
toy.mean.adj.prod1 <- prod(dnorm(unlist(toy.mean.adj[1,]), mean = 0, sd = toy.model.sigma))
toy.mean.adj.prod2 <- prod(dnorm(unlist(toy.mean.adj[2,]), mean = 0, sd = toy.model.sigma))
toy.mean.adj.final.prod <- toy.mean.adj.prod1 * toy.mean.adj.prod2
# right product: spread between groups
toy.mean.beta.adj <- rbind(mean(unlist(toy.data[1,])) - toy.model.betas, mean(unlist(toy.data[2,])) - toy.model.betas)
toy.mean.beta.adj[1,] <- toy.mean.beta.adj[1,] - c(0, toy.model.betas[1], toy.model.betas[1], toy.model.betas[1])
toy.mean.beta.adj[2,] <- toy.mean.beta.adj[2,] - c(0, toy.model.betas[1], toy.model.betas[1], toy.model.betas[1])
toy.mean.beta.adj.prod1 <- prod(dnorm(unlist(toy.mean.beta.adj[1,]), mean = 0, sd = sqrt(toy.model.sigma^2/4 + toy.model.tau.squared)) * sqrt(2/4*pi*toy.model.sigma^2))
toy.mean.beta.adj.prod2 <- prod(dnorm(unlist(toy.mean.beta.adj[2,]), mean = 0, sd = sqrt(toy.model.sigma^2/4 + toy.model.tau.squared)) * sqrt(2/4*pi*toy.model.sigma^2))
toy.mean.beta.adj.final.prod <- toy.mean.beta.adj.prod1 * toy.mean.beta.adj.prod2
toy.total.prod <- toy.mean.adj.final.prod * toy.mean.beta.adj.final.prod
log(toy.total.prod)
EDIT: A helpful link was provided in the comments (https://stats.stackexchange.com/questions/271903/understand-marginal-likelihood-of-mixed-effects-models). Converting my example from above I can replicate the log-likelihood
library(mvtnorm)
z = getME(model, "Z")
zt = getME(model, "Zt")
psi = bdiag(replicate(2, toy.model.tau.squared, simplify=FALSE))
betw = z%*%psi%*%zt
err = Diagonal(8, sigma(model)^2)
v = betw + err
dmvnorm(sim.data$OBS.Y, predict(model, re.form=NA), as.matrix(v), log=TRUE)
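As a quick sanity check (one added line, assuming nothing beyond the code above), this value should agree with the model's own log-likelihood, the -17.8 quoted earlier:
logLik(model)   # should match the dmvnorm result above (about -17.8)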
While I did not manage to come up with a closed-form solution for all of them, I did manage to reproduce the log-likelihoods using numerical integration. Below are small examples of how this works in the LMM setting (assuming normal residuals and normal random effects) as well as for the GLMM with Poisson and negative-binomial responses. Note that especially the latter tends to differ ever so slightly when you increase the sample size. My guess is that there is some rounding happening somewhere, but for my purposes the precision achieved here is good enough. I will for now accept my own answer, but if someone posts a closed form for the Poisson or the negative binomial I will happily accept that answer :)
library(lme4)
library(mvtnorm)
################################################################################
# LMM numerical integration
set.seed(7)
n <- 2 # number of groups
m <- 4 # number of instances per group
fixed.effect <- c(0, -2, -1, 1)
tau <- 5 # standard deviation of random effects
sigma <- 2 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau)
normal.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
GROUP.EFFECT=rep(random.effect, each=m),
INSTANCE.ID=as.factor(rep(1:m, times=n)),
INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
normal.data$EXPECT.Y <- normal.data$GROUP.EFFECT + normal.data$INSTANCE.EFFECT
# now observe Y value, assuming normally distributed with fixed std. deviation
normal.data$OBS.Y <- rnorm(nrow(normal.data), mean=normal.data$EXPECT.Y, sigma)
normal.model <- lmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = normal.data, REML=F)
summary(normal.model)
normal.model.var <- VarCorr(normal.model)
normal.model.sigma <- attr(normal.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
normal.model.tau.squared <- normal.model.var[[1]][1] # corresponds to variance of random effects
normal.model.betas <- normal.model@beta
normal.group.tau <- sqrt(normal.model.tau.squared)
normal.group.sigma <- sigma(normal.model)
normal.group.beta <- predict(normal.model, re.form=NA)[1:4]
integrate_group1 <- function(x){
p1 <- dnorm(normal.data$OBS.Y[1] - normal.group.beta[1] - x, mean = 0, sd = normal.group.sigma) * dnorm(x, mean = 0, sd = normal.group.tau)
p2 <- dnorm(normal.data$OBS.Y[2] - normal.group.beta[2] - x, mean = 0, sd = normal.group.sigma)
p3 <- dnorm(normal.data$OBS.Y[3] - normal.group.beta[3] - x, mean = 0, sd = normal.group.sigma)
p4 <- dnorm(normal.data$OBS.Y[4] - normal.group.beta[4] - x, mean = 0, sd = normal.group.sigma)
p_out <- p1 * p2 * p3 * p4
p_out
}
normal.group1.integration <- integrate(integrate_group1, lower = -10*normal.group.tau, upper = 10*normal.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
integrate_group2 <- function(x){
p1 <- dnorm(normal.data$OBS.Y[5] - normal.group.beta[1] - x, mean = 0, sd = normal.group.sigma) * dnorm(x, mean = 0, sd = normal.group.tau)
p2 <- dnorm(normal.data$OBS.Y[6] - normal.group.beta[2] - x, mean = 0, sd = normal.group.sigma)
p3 <- dnorm(normal.data$OBS.Y[7] - normal.group.beta[3] - x, mean = 0, sd = normal.group.sigma)
p4 <- dnorm(normal.data$OBS.Y[8] - normal.group.beta[4] - x, mean = 0, sd = normal.group.sigma)
p_out <- p1 * p2 * p3 * p4
p_out
}
normal.group2.integration <- integrate(integrate_group2, lower = -10*normal.group.tau, upper = 10*normal.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
log(normal.group1.integration) + log(normal.group2.integration)
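For the LMM the numerically integrated value should agree with lmer's own log-likelihood essentially exactly; a comparison line makes the reproduction explicit:
logLik(normal.model)   # reference value from lmer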
#################################
# Poisson numerical integration
set.seed(13) #13
n <- 2 # number of groups
m <- 4 # number of instances per group
# effect sizes are much smaller since they are exponentiated
fixed.effect <- c(0, -0.2, -0.1, 0.2)
tau <- 1.5 # standard deviation of random effects
# sigma <- 1.5 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau) # guide effect
poisson.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
GROUP.EFFECT=rep(random.effect, each=m),
INSTANCE.ID=as.factor(rep(1:m, times=n)),
INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
poisson.data$EXPECT.Y <- exp(poisson.data$GROUP.EFFECT + poisson.data$INSTANCE.EFFECT)
# now observe Y values, drawn from a Poisson distribution with that mean
poisson.data$OBS.Y <- rpois(nrow(poisson.data), poisson.data$EXPECT.Y)
poisson.model <- glmer(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = poisson.data, family="poisson")
summary(poisson.model)
poisson.model.var <- VarCorr(poisson.model)
poisson.model.sigma <- attr(poisson.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
poisson.model.tau.squared <- poisson.model.var[[1]][1] # corresponds to variance of random effects
poisson.model.betas <- poisson.model@beta
poisson.group.tau <- sqrt(poisson.model.tau.squared)
poisson.group.sigma <- sigma(poisson.model)
poisson.group.beta <- predict(poisson.model, re.form=NA)[1:4]
integrate_group1 <- function(x){
p1 <- dpois(poisson.data$OBS.Y[1], lambda = exp(poisson.group.beta[1] + x)) * dnorm(x, mean = 0, sd = poisson.group.tau)
p2 <- dpois(poisson.data$OBS.Y[2], lambda = exp(poisson.group.beta[2] + x))
p3 <- dpois(poisson.data$OBS.Y[3], lambda = exp(poisson.group.beta[3] + x))
p4 <- dpois(poisson.data$OBS.Y[4], lambda = exp(poisson.group.beta[4] + x))
p_out <- p1 * p2 * p3 * p4
p_out
}
poisson.group1.integration <- integrate(integrate_group1, lower = -10*poisson.group.tau, upper = 10*poisson.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
integrate_group2 <- function(x){
p1 <- dpois(poisson.data$OBS.Y[5], lambda = exp(poisson.group.beta[1] + x)) * dnorm(x, mean = 0, sd = poisson.group.tau)
p2 <- dpois(poisson.data$OBS.Y[6], lambda = exp(poisson.group.beta[2] + x))
p3 <- dpois(poisson.data$OBS.Y[7], lambda = exp(poisson.group.beta[3] + x))
p4 <- dpois(poisson.data$OBS.Y[8], lambda = exp(poisson.group.beta[4] + x))
p_out <- p1 * p2 * p3 * p4
p_out
}
poisson.group2.integration <- integrate(integrate_group2, lower = -10*poisson.group.tau, upper = 10*poisson.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
log(poisson.group1.integration) + log(poisson.group2.integration)
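The analogous check for the Poisson fit (glmer uses a Laplace approximation, so a small discrepancy is expected):
logLik(poisson.model)   # reference value from glmer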
#############
# Negative-Binomial numerical integration
set.seed(13) #13
n <- 100 # number of groups
m <- 4 # number of instances per group
# effect sizes are much smaller since they are exponentiated
fixed.effect <- c(0, -0.2, -0.1, 0.2)
tau <- 1.5 # standard deviation of random effects
theta <- 0.5
# sigma <- 1.5 # standard deviation of error
random.effect <- rnorm(n, mean=0, sd=tau) # guide effect
nb.data <- data.frame(GROUP.ID=as.factor(rep(1:n, each=m)),
GROUP.EFFECT=rep(random.effect, each=m),
INSTANCE.ID=as.factor(rep(1:m, times=n)),
INSTANCE.EFFECT=rep(fixed.effect, times=n))
# calculate expected Y value
nb.data$EXPECT.Y <- exp(nb.data$GROUP.EFFECT + nb.data$INSTANCE.EFFECT)
# now observe Y values, drawn from a negative binomial with that mean and size theta
nb.data$OBS.Y <- rnbinom(nrow(nb.data), mu = nb.data$EXPECT.Y, size = theta)
nb.model <- glmer.nb(OBS.Y ~ INSTANCE.ID + (1|GROUP.ID), data = nb.data)
summary(nb.model)
nb.model.var <- VarCorr(nb.model)
nb.model.sigma <- attr(nb.model.var, 'sc') # corresponds to the epsilon, residual standard deviation
nb.model.tau.squared <- nb.model.var[[1]][1] # corresponds to variance of random effects
nb.model.betas <- nb.model@beta
nb.group.tau <- sqrt(nb.model.tau.squared)
nb.group.beta <- predict(nb.model, re.form=NA)[1:4]
nb.group.dispersion <- getME(nb.model, "glmer.nb.theta")
integration_function_generator <- function(input.obs, input.beta, input.dispersion, input.tau){
function(x){
p1 <- dnbinom(input.obs[1], mu = exp(input.beta[1] + x), size = input.dispersion) * dnorm(x, mean = 0, sd = input.tau)
p2 <- dnbinom(input.obs[2], mu = exp(input.beta[2] + x), size = input.dispersion)
p3 <- dnbinom(input.obs[3], mu = exp(input.beta[3] + x), size = input.dispersion)
p4 <- dnbinom(input.obs[4], mu = exp(input.beta[4] + x), size = input.dispersion)
p_out <- p1 * p2 * p3 * p4
p_out
}
}
nb.all.group.integrations <- c()
for(i in 1:n){
temp.obs <- nb.data$OBS.Y[(1:4)+(i-1)*4]
temp_integrate_function <- integration_function_generator(temp.obs, nb.group.beta, nb.group.dispersion, nb.group.tau)
temp.integration <- integrate(temp_integrate_function, lower = -10*nb.group.tau, upper = 10*nb.group.tau, subdivisions = 10000L, rel.tol = 1e-10, abs.tol = 1e-50)$value[1]
nb.all.group.integrations <- c(nb.all.group.integrations, temp.integration)
}
sum(log(nb.all.group.integrations))
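And the corresponding check for the negative binomial, which as noted above tends to drift slightly as the sample size grows:
logLik(nb.model)   # reference value from glmer.nb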

Reproduce Fisher linear discriminant figure

Many books illustrate the idea of Fisher linear discriminant analysis using the following figure (this particular one is from Pattern Recognition and Machine Learning, p. 188).
I wonder how to reproduce this figure in R (or in any other language). Pasted below is my initial effort in R. I simulate two groups of data and draw the linear discriminant using the abline() function. Any suggestions are welcome.
set.seed(2014)
library(MASS)
library(DiscriMiner) # For scatter matrices
# Simulate bivariate normal distribution with 2 classes
mu1 <- c(2, -4)
mu2 <- c(2, 6)
rho <- 0.8
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)
# Scatter matrices
B <- betweenCov(variables = X, group = y)
W <- withinCov(variables = X, group = y)
# Eigenvectors
ev <- eigen(solve(W) %*% B)$vectors
slope <- - ev[1,1] / ev[2,1]
intercept <- ev[2,1]
par(pty = "s")
plot(X, col = y + 1, pch = 16)
abline(a = intercept, b = slope, lwd = 2, lty = 2)
MY (UNFINISHED) WORK
I pasted my current solution below. The main question is how to rotate (and move) the density plot according to the decision boundary. Any suggestions are still welcome.
require(ggplot2)
library(grid)
library(MASS)
# Simulation parameters
mu1 <- c(5, -9)
mu2 <- c(4, 9)
rho <- 0.5
s1 <- 1
s2 <- 3
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), byrow = TRUE, nrow = 2)
n <- 50
# Multivariate normal sampling
X1 <- mvrnorm(n, mu = mu1, Sigma = Sigma)
X2 <- mvrnorm(n, mu = mu2, Sigma = Sigma)
# Combine into data frame
y <- rep(c(0, 1), each = n)
X <- rbind(x1 = X1, x2 = X2)
X <- scale(X)
X <- data.frame(X, class = y)
# Apply lda()
m1 <- lda(class ~ X1 + X2, data = X)
m1.pred <- predict(m1)
# Compute intercept and slope for abline
gmean <- m1$prior %*% m1$means
const <- as.numeric(gmean %*% m1$scaling)
z <- as.matrix(X[, 1:2]) %*% m1$scaling - const
slope <- - m1$scaling[1] / m1$scaling[2]
intercept <- const / m1$scaling[2]
# Projected values
LD <- data.frame(predict(m1)$x, class = y)
# Scatterplot
p1 <- ggplot(X, aes(X1, X2, color=as.factor(class))) +
geom_point() +
theme_bw() +
theme(legend.position = "none") +
scale_x_continuous(limits=c(-5, 5)) +
scale_y_continuous(limits=c(-5, 5)) +
geom_abline(intercept = intercept, slope = slope)
# Density plot
p2 <- ggplot(LD, aes(x = LD1)) +
geom_density(aes(fill = as.factor(class), y = ..scaled..)) +
theme_bw() +
theme(legend.position = "none")
grid.newpage()
print(p1)
vp <- viewport(width = .7, height = 0.6, x = 0.5, y = 0.3, just = c("centre"))
pushViewport(vp)
print(p2, vp = vp)
Basically you need to project the data along the direction of the classifier, plot a histogram for each class, and then rotate the histogram so that its x axis is parallel to the classifier. Some trial and error with scaling the histogram is needed to get a nice result. Here's an example of how to do it in Matlab, for the naive classifier (difference of the class means). For the Fisher classifier it is of course similar; you just use a different classifier w. I changed the parameters from your code so the plot is more similar to the one you gave.
rng('default')
n = 1000;
mu1 = [1,3]';
mu2 = [4,1]';
rho = 0.3;
s1 = .8;
s2 = .5;
Sigma = [s1^2, rho*s1*s2; rho*s1*s2, s2^2];
X1 = mvnrnd(mu1,Sigma,n);
X2 = mvnrnd(mu2,Sigma,n);
X = [X1; X2];
Y = [zeros(n,1);ones(n,1)];
scatter(X1(:,1), X1(:,2), [], 'b' );
hold on
scatter(X2(:,1), X2(:,2), [], 'r' );
axis equal
m1 = mean(X(1:n,:))';
m2 = mean(X(n+1:end,:))';
plot(m1(1),m1(2),'bx','markersize',18)
plot(m2(1),m2(2),'rx','markersize',18)
plot([m1(1),m2(1)], [m1(2),m2(2)],'g')
%% classifier taking only means into account
w = m2 - m1;
w = w / norm(w);
% project data onto w
X1_projected = X1 * w;
X2_projected = X2 * w;
% plot histogram and rotate it
angle = 180/pi * atan(w(2)/w(1));
[hy1, hx1] = hist(X1_projected);
[hy2, hx2] = hist(X2_projected);
hy1 = hy1 / sum(hy1); % normalize
hy2 = hy2 / sum(hy2); % normalize
scale = 4; % set manually
h1 = bar(hx1, scale*hy1,'b');
h2 = bar(hx2, scale*hy2,'r');
set([h1, h2],'ShowBaseLine','off')
% rotate around the origin
rotate(get(h1,'children'),[0,0,1], angle, [0,0,0])
rotate(get(h2,'children'),[0,0,1], angle, [0,0,0])
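For completeness, a rough R sketch of the "different classifier w" for the Fisher case, reusing the within-class scatter matrix W computed in the question's first code block (X1 and X2 are the two class matrices; treat this as a sketch of the computation rather than a drop-in line, since the question works with the pooled, scaled data):
w <- solve(W) %*% (colMeans(X2) - colMeans(X1))  # Fisher discriminant direction
w <- w / sqrt(sum(w^2))                          # normalise
X1.proj <- X1 %*% w                              # class-wise projections to plot
X2.proj <- X2 %*% w
angle <- 180 / pi * atan2(w[2], w[1])            # rotation angle for the histograms, in degrees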

R Remove intercepts in logistic regression

I am using the rms library to perform regularized logistic regression, and wish to force the intercept to zero. I'm using the following to simulate and regress:
library(rms)
N = 100
pred <- vapply(1:12, function(i) rnorm(N, mean = 0, sd =1), numeric(N))
resp <- 20*pred[, 1] - 3*pred[, 7] - 2*pred[, 8] + matrix(rnorm(N, sd = 0.1)) + 20
pr <- 1 / (1 + exp(-resp))
y <- rbinom(N, 1, pr)
lrm(y ~ pred, penalty = 1)
The post at How to remove intercept in R suggests including '0 +' or '- 1' in the model formula. However, this does not appear to work for lrm.
You can use glmnet. It also includes a cross-validation function for choosing the tuning parameter.
library(glmnet)
N = 1000
pred <- vapply(1:12, function(i) rnorm(N, mean = 0, sd =1), numeric(N))
resp <- 20*pred[, 1] - 3*pred[, 7] - 2*pred[, 8] + matrix(rnorm(N, sd = 0.1)) + 20
pr <- 1 / (1 + exp(-resp))
y <- rbinom(N, 1, pr)
result <- cv.glmnet(pred, y, family="binomial", intercept=FALSE)
# best lambda based on cv
result$lambda.min
# coefficient
coef(result$glmnet.fit, s=result$lambda.min)
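One hedged addendum: cv.glmnet's default alpha = 1 corresponds to a lasso penalty; if something closer in spirit to lrm's quadratic penalty is wanted, ridge regression without an intercept is a one-argument change:
# sketch: ridge (alpha = 0) instead of the lasso default, still no intercept
result.ridge <- cv.glmnet(pred, y, family = "binomial", alpha = 0, intercept = FALSE)
coef(result.ridge, s = "lambda.min")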
