Specifying a fixed parameter with mle2 (bbmle)

Specifying a fixed parameter when using mle2 leads to an error. This occurs when I specify the likelihood with a formula.
library(bbmle)
library(ggplot2)  # for the diamonds data
mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1), fixed = list(b = 1), data = diamonds)
Error in mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1), :
some named arguments in 'fixed' are not arguments to the specified log-likelihood function

You can resolve the error by also including b = 1 in the starting values, e.g.:
mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1, b = 1), fixed = list(b = 1), data = diamonds)
Even though 'b' is listed in the starting values, because it is also specified in the list of fixed parameters, mle2 will hold it at that value rather than optimize over it.
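A quick sanity check (a sketch along the same lines): the returned coefficient vector still reports b at its fixed value, and only a is actually estimated.
library(bbmle)
library(ggplot2)  # for the diamonds data
fit <- mle2(carat ~ dnorm(mean = a * x + b, sd = 1),
            start = list(a = 1, b = 1),
            fixed = list(b = 1),
            data = diamonds)
coef(fit)  # b stays at its fixed value of 1; only a was optimized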


Can anyone please help fix the following glmmLasso R package errors?

I have been trying to run the following code and I am getting various errors. Does anyone know how to fix the current one? I am trying to run a generalized linear mixed model with a LASSO tuning parameter, but wanted to start with the basics and get the fixed effects to work first. Thank you!
library(glmmLasso)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
error: the condition has length > 1 and only the first element will be used
another error: data length is not a multiple of split variable (this does not happen with this simulated data, but it does with my real data)
Another note: I have tried the exact code from the help documentation for generalized linear mixed models with the soccer data and I get the same error about the length > 1.
I guess it is a problem with R 4.0.3.
I used glmmLasso without any errors before, and this error appeared when I updated base R from 3.6 to 4.0.3. I wrote an email to the author.
My guess is that with only one predictor and data that has no relationship, your lambda is too high and produces a degenerate solution. If you check the source code, you can see the error is thrown at these two lines:
finish<-(sqrt(sum((Eta.ma[l,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l,])^2))<eps)
finish2<-(sqrt(sum((Eta.ma[l-1,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l-1,])^2))<eps)
if(finish || finish2)
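A minimal sketch of that failure mode (my illustration, not package code): if the iterates diverge to NaN/Inf, the relative-change criterion evaluates to NA, and if() on NA throws exactly the error below.
eta_old <- c(1, 2)
eta_new <- c(NaN, Inf)  # a diverged update
finish <- sqrt(sum((eta_old - eta_new)^2)) / sqrt(sum(eta_old^2)) < 1e-4
finish
# [1] NA
if (finish) TRUE
# Error in if (finish) TRUE : missing value where TRUE/FALSE needed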
To make this reproducible:
set.seed(2)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
Error in if (finish || finish2) break :
missing value where TRUE/FALSE needed
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 1, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 1, family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
-0.8089034 0.8678967
If we try another seed, you can see there's no issue, although the end solution is the coefficient set to zero:
set.seed(1)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 10,
family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
0.1603426 0.0000000
To sum up: for your data, you most likely need to move through a range of lambdas and examine the fit for each.
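For example, a minimal lambda sweep might look like this (a sketch; the grid values are arbitrary, and each fit is wrapped in tryCatch so non-converging lambdas don't abort the loop):
lambdas <- seq(0, 20, by = 2)
fits <- lapply(lambdas, function(l) {
  tryCatch(glmmLasso(fix = y ~ x, rnd = NULL,
                     family = binomial(link = logit),
                     lambda = l, data = data),
           error = function(e) NULL)  # skip lambdas that fail
})
ok <- !sapply(fits, is.null)
# coefficients per successful lambda (assuming the fit stores them in
# $coefficients, as the printed output above suggests)
cbind(lambda = lambdas[ok], t(sapply(fits[ok], function(f) f$coefficients)))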

Logistic regression for non-linear data

I have data with a continuous independent variable and a binary dependent variable, so I was trying to apply logistic regression for the analysis. However, in contrast to the classical case with an S-shaped transition, I have two transitions.
Here is an example of what I mean:
library(ggplot2)
library(visreg)
classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))
model.classic = glm(formula = y ~ x,
data = classic.data,
family = "binomial")
summary(model.classic)
visreg(model.classic,
partial = FALSE,
scale = "response",
alpha = 0)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
model.my = glm(formula = y ~ x,
data = my.data,
family = "binomial")
summary(model.my)
visreg(model.my,
partial = FALSE,
scale = "response",
alpha = 0)
The blue lines on both plots are the glm fits, while the red line is what I want to have.
Is there any way to apply logistic regression to such data? Or should I apply some other type of regression analysis?
In your second model, y is not a monotone function of x. When you write y ~ x you assume that when x increases, y will increase or decrease depending on a positive or negative coefficient. That is not the case here: it increases and then decreases, making the average effect of x zero (hence the straight line). You therefore need a non-linear function. You could do that with a GAM from the mgcv package, where the effect of x is modelled as a smooth function:
library(mgcv)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
m = gam(y ~ s(x), data = my.data, family = binomial)
plot(m)
That would lead to the following fit on the original scale:
my.data$prediction = predict(m, type = "response")
plot(my.data$x, my.data$y)
lines(my.data$x, my.data$prediction, col = "red")
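If you prefer a parametric alternative (my addition, not part of the original answer), a logistic regression that is quadratic in x can also capture a rise-and-fall pattern; note that on toy data as cleanly separated as this, glm will warn that fitted probabilities of 0 or 1 occurred:
m2 <- glm(y ~ x + I(x^2), data = my.data, family = binomial)
my.data$prediction2 <- predict(m2, type = "response")
lines(my.data$x, my.data$prediction2, col = "blue", lty = 2)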

Using effects package inside a function

I have a large data frame with lots of variables that I'm going to analyze in the same way. Specifically, I want to plot effect confidence intervals from a mixed-effects model, so I want to write a function that makes a custom plot for one dependent variable. Direct application of the effect() function works fine, but the same code inside a function causes an error.
I tried two variants of the function; both cause errors.
Here is my reproducible example:
library(nlme)
library(effects)
df <- data.frame(y = rnorm(90), x = gl(3, 30), b = factor(rep(1:30, 3)))
fit <- lme(fixed = y ~ x, random = ~ 1 | b, data = df, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
test1 <- function(y, x, b)
{
fit <- lme(fixed = y ~ x, random = ~ 1 | b, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
test1(df$y, df$x, df$b)
# Error in eval(predvars, data, env) : object 'y' not found
test2 <- function(y, x, b)
{
frame <- data.frame(y, x, b)
fit <- lme(fixed = y ~ x, random = ~ 1 | b, frame, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
test2(df$y, df$x, df$b)
# Error in as.data.frame.default(data, optional = TRUE) :
# cannot coerce class ‘"function"’ to a data.frame
Simpler:
plot_effects <- function(df) {
fit <- lme(fixed = y ~ x, random = ~ 1 | b, data = df, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower),
max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
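Called on the data frame itself (a usage line added for illustration; the name plot_effects is mine):
plot_effects(df)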
You need to pass the data to lme; the formula alone doesn't carry any data.
That said, your test2 should work. I can replicate your error, and it is really very weird: the code works in the global env but not in the closure. A likely culprit (my reading, not verified against the package source): effect() re-evaluates the data argument of the model call by name, and where it looks, frame resolves to the base graphics function frame() rather than your local data frame, hence "cannot coerce class '"function"' to a data.frame". Renaming the local variable should avoid the clash.

Weighting using predict function

I have used predict() to find a fit line for a linear model (lm) I have created. Because the lm was built on only 2 data points and needs to have a positive slope, I have forced it to go through the origin (0,0). I have also weighted the fit by the number of observations underlying each data point.
Question 1 (SOLVED; see the comment by @Gregor):
Why does the predicted line lie so much closer to my second data point (B) than to my first data point (A), when B has fewer underlying observations? Did I code something wrong when weighting the model?
Question 2:
I'm plotting a GLM (link = logit) now, but how can I still force this through (0,0)? I've tried adding formula = y ~ 0 + x in several places, none of which seems to work.
M <- data.frame("rate" = c(0.4643,0.2143), "conc" = c(300,6000), "nr_dead" = c(13,3), "nr_surv" = c(15,11), "region" = c("A","B"))
M$tot_obsv <- (M$nr_dead+M$nr_surv)
M_conc <- M$conc
M_rate <- M$rate
M_tot_obsv <- M$tot_obsv
# linear model of the data: force a (0,0) intercept, weighted by the number of observations at each data point
M_lm <- lm(data = M, rate~0+conc, weights = tot_obsv)
# plot the line using the predict() function
x_conc <-c(600, 6700)
y_rate <- predict(M_lm, list(conc = x_conc), weights = tot_obsv, type = 'response')
plot(x = M$conc, y = M$rate, pch = 16, ylim = c(0, 0.5), xlim = c(0,7000), xlab = "conc", ylab = "death rate")
lines(x_conc, y_rate, col = "red", lwd = 2)
# EDIT 1:
M_glm <- glm(cbind(nr_dead, nr_surv) ~ (0+conc), data = M, family = "binomial")
# plot using the predict() function
library(tidyverse)  # for tibble(), %>%, left_join(), and ggplot()
binomial_smooth <- function(formula = (y ~ 0+x), ...) {
geom_smooth(method = "glm", method.args = list(family = "binomial"), formula = (y ~ 0+x), ...)
}
tibble(x_conc = c(seq(300, 7000, 1), M$conc), y_rate = predict.glm(M_glm, list(conc = x_conc), type = "response")) %>% left_join(M, by = c('x_conc' = 'conc')) %>%
ggplot(aes(x = x_conc, y = y_rate)) + xlab("concentration") + ylab("death rate") +
geom_point(aes(y = rate, size = tot_obsv)) + binomial_smooth(formula = (y ~ 0+x)) + theme_bw()
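One caveat worth noting (my observation, not from the original post): with a logit link, dropping the intercept does not force the fitted curve through (0,0). It forces the linear predictor to 0 at conc = 0, which corresponds to a predicted rate of plogis(0) = 0.5, not 0:
predict(M_glm, newdata = data.frame(conc = 0), type = "response")
# returns 0.5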

Fit lognormal distribution with R & nls

Consider I have data like this:
df<-data.frame(x=c(1100,800,600,550,500,350),y=c(0.05,0.17,0.91,0.95,1,0.13))
How can I fit a curve through it based on a lognormal shape/distribution?
I can use an nls model but always get an error:
fit <-nls(y ~ a*dlnorm(x, mean, sd), data = df,
start = list(mean =0, sd = 10,a=1e4))
Thanks a lot!
I'm not sure why nls behaves like that, but you may directly use optim:
opt <- optim(c(1, 1, 1), function(p) sum((dlnorm(df$x, p[1], p[2]) * p[3] - df$y)^2))
opt$par
# [1] 6.3280753 0.2150322 299.3154123
plot(x = df$x, y = df$y, type = 'b', ylim = c(0, 1), xlim = c(0, 1100))
curve(opt$par[3] * dlnorm(x, opt$par[1], opt$par[2]), from = 0, to = 1100, add = TRUE, col = 'red')
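As for why nls fails: it is very sensitive to starting values, and from start = list(mean = 0, sd = 10, a = 1e4) the Gauss-Newton steps go nowhere. With plausible starts (a sketch; the peak sits near x = 500, so meanlog near log(500), and the optim solution above suggests the scale), nls should converge to essentially the same answer:
fit <- nls(y ~ a * dlnorm(x, mean, sd), data = df,
           start = list(mean = log(500), sd = 0.3, a = 300))
coef(fit)  # should land near the optim solution above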
