Specifying a fixed parameter when using mle2 leads to an error. This occurs when I specify the likelihood with a formula.
library(bbmle)   # mle2() lives in the bbmle package
library(ggplot2) # for the diamonds data set
mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1), fixed = list(b = 1), data = diamonds)
Error in mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1), :
some named arguments in 'fixed' are not arguments to the specified log-likelihood function
You can resolve the error by also including b = 1 in the starting values, e.g.:
mle2(carat ~ dnorm(mean = a * x + b, sd = 1), start = list(a = 1, b = 1), fixed = list(b = 1), data = diamonds)
Even though b is listed in the starting values, because it also appears in the list of fixed parameters, mle2 will not tinker with its value during the optimization.
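As a quick sanity check (a minimal sketch; coef() on the fitted object should report the fixed value untouched):
fit <- mle2(carat ~ dnorm(mean = a * x + b, sd = 1),
            start = list(a = 1, b = 1), fixed = list(b = 1),
            data = diamonds)
coef(fit)  # b should still be exactly 1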
I have been trying to run the following code and I am getting various errors. Does anyone know how to fix the current one? I am trying to run a generalized linear mixed model with a tuning parameter (specifically the LASSO), but I wanted to start with the basics and get the fixed effects working first. Thank you!
library(glmmLasso)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
error: the condition has length > 1 and only the first element will be used
another error: data length is not a multiple of split variable (this does not happen with this simulation data, but it does with my real data)
Another note: I have tried the exact code from the help documentation for generalized linear mixed models with the soccer data, and I get the same length > 1 error.
I guess it is a problem with R 4.0.3. I used glmmLasso without any errors, and this error only appeared when I updated base R from 3.6 to 4.0.3. I wrote an email to the author.
My guess is that with only 1 predictor and data that has no real relationship, your lambda is too high for a single variable and produces a degenerate solution. You can check the source code, which throws the error at these two lines:
finish<-(sqrt(sum((Eta.ma[l,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l,])^2))<eps)
finish2<-(sqrt(sum((Eta.ma[l-1,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l-1,])^2))<eps)
if(finish || finish2)
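If the coefficient path diverges, those norms evaluate to NaN, and an if() condition with a missing value throws exactly the error shown below; a minimal base-R illustration:
finish <- NaN  # what the convergence check yields when Eta.ma diverges
if (finish || FALSE) print("converged")
# Error in if (finish || FALSE) print("converged") :
#   missing value where TRUE/FALSE needed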
To make this reproducible:
set.seed(2)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
Error in if (finish || finish2) break :
missing value where TRUE/FALSE needed
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 1, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 1, family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
-0.8089034 0.8678967
If we try another seed, you can see there's no issue, although the final solution sets the coefficient to zero:
set.seed(1)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 10,
family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
0.1603426 0.0000000
To sum up: for your data, you most likely need to step through a range of lambda values and examine the fits.
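A minimal sketch of such a lambda search (the grid below is arbitrary; tryCatch() simply skips lambdas where the algorithm diverges):
lambdas <- seq(0, 10, by = 0.5)
fits <- lapply(lambdas, function(l) {
  tryCatch(glmmLasso(fix = y ~ x, rnd = NULL,
                     family = binomial(link = logit),
                     lambda = l, data = data),
           error = function(e) NULL)  # drop lambdas that fail
})
# coefficients of the fits that succeeded
lapply(Filter(Negate(is.null), fits), function(f) f$coefficients)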
I have data with a continuous independent variable and a binary dependent variable, so I was trying to apply logistic regression. However, in contrast to the classical case with a single S-shaped transition, my data has two transitions.
Here is an example of what I mean:
library(ggplot2)
library(visreg)
classic.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 14), 1, 0, rep(1, times = 14)))
model.classic = glm(formula = y ~ x,
data = classic.data,
family = "binomial")
summary(model.classic)
visreg(model.classic,
partial = FALSE,
scale = "response",
alpha = 0)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
model.my = glm(formula = y ~ x,
data = my.data,
family = "binomial")
summary(model.my)
visreg(model.my,
partial = FALSE,
scale = "response",
alpha = 0)
The blue lines on both plots are the outcome of glm, while the red line is what I want to get.
Is there any way to apply logistic regression to such data? Or should I apply some other type of regression analysis?
In your second model, y is not a monotonic function of x. When you write y ~ x you assume that when x increases, y will increase or decrease depending on a positive or negative coefficient. That is not the case here: y first increases and then decreases, making the average effect of x zero (hence the straight line). You therefore need a non-linear function of x. You could do that with a gam from the mgcv package, where the effect of x is modelled as a smooth function:
library(mgcv)
my.data = data.frame(x = seq(from = 0, by = 0.5, length = 30),
y = c(rep(0, times = 10), rep(1, times = 10), rep(0, times = 10)))
m = gam(y ~ s(x), data = my.data, family = binomial)
plot(m)
That would lead to the following fit on the original scale:
my.data$prediction = predict(m, type = "response")
plot(my.data$x, my.data$y)
lines(my.data$x, my.data$prediction, col = "red")
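If you prefer a parametric alternative to the smooth, a sketch (assuming a single rise-and-fall is an adequate description of your data) is to add a quadratic term to the linear predictor; note that glm may warn about fitted probabilities of 0 or 1 on perfectly separated toy data like this:
m2 <- glm(y ~ x + I(x^2), data = my.data, family = binomial)
my.data$prediction2 <- predict(m2, type = "response")
lines(my.data$x, my.data$prediction2, col = "blue")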
I have a large data frame with many variables which I'm going to analyze in the same way. Specifically, I want to plot effect confidence intervals from a mixed-effects model, so I want to write a function that makes a custom plot for one dependent variable. Applying the effect() function directly works fine, but the same code inside a function causes an error.
I tried two variants of the function. Both cause errors.
Here is my reproducible example:
library(nlme)
library(effects)
df <- data.frame(y = rnorm(90), x = gl(3, 30), b = factor(rep(1:30, 3)))
fit <- lme(fixed = y ~ x, random = ~ 1 | b, data = df, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
test1 <- function(y, x, b)
{
fit <- lme(fixed = y ~ x, random = ~ 1 | b, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
test1(df$y, df$x, df$b)
# Error in eval(predvars, data, env) : object 'y' not found
test2 <- function(y, x, b)
{
frame <- data.frame(y, x, b)
fit <- lme(fixed = y ~ x, random = ~ 1 | b, frame, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower), max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
test2(df$y, df$x, df$b)
# Error in as.data.frame.default(data, optional = TRUE) :
# cannot coerce class ‘"function"’ to a data.frame
Simpler:
test3 <- function(df) {
fit <- lme(fixed = y ~ x, random = ~ 1 | b, data = df, method = "REML")
ef <- effect("x", fit)
bp <- barplot(as.vector(ef$fit), col = c("tomato", "skyblue", "limegreen"),
ylim = c(min(ef$lower),
max(ef$upper) + (max(ef$upper) - min(ef$lower)) * 0.2 ))
arrows(x0 = bp, y0 = ef$lower, y1 = ef$upper, code = 3, angle = 90)
}
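Called with the whole data frame, this plots without error:
test3(df)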
You need to pass data to lme; the formula itself doesn't carry any data.
That said, your test2 should work. I can replicate your error, but it is really very weird: somehow the code works in the global environment but not inside the function. Very surprising.
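One plausible culprit (an educated guess, assuming effect() re-evaluates the data name stored in the model's call): inside test2 the data frame is called frame, which is also the name of a base graphics function, so a lookup outside the function's scope can find that function instead of your data, which would explain the 'cannot coerce class "function" to a data.frame' message. You can see the name collision at top level:
exists("frame")  # TRUE even before you create any object: graphics::frame
class(frame)     # "function" -- plausibly what effect() is picking up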
I have used predict() to find a fit line for a linear model (lm) I have created. Because the lm was built on only 2 data points and needs to have a positive slope, I have forced it to go through the origin (0,0). I have also weighted the fit by the number of observations underlying each data point.
Question 1: (SOLVED -see comment by #Gregor)
Why does the predicted line lie so much closer to my second data point (B) than my first data point (A), when B has fewer underlying observations? Did I code something wrong here when weighting the model?
Question 2:
I am plotting a GLM (link = logit) now, but how can I still force this through (0,0)? I've tried adding formula = y ~ 0 + x in several places, none of which seem to work.
M <- data.frame("rate" = c(0.4643,0.2143), "conc" = c(300,6000), "nr_dead" = c(13,3), "nr_surv" = c(15,11), "region" = c("A","B"))
M$tot_obsv <- (M$nr_dead+M$nr_surv)
M_conc <- M$conc
M_rate <- M$rate
M_tot_obsv <- M$tot_obsv
# linear model of the data, forced through the (0,0) intercept, weighted by the nr. of observations of each data point
M_lm <- lm(data = M, rate~0+conc, weights = tot_obsv)
# plot the line using the predict() function
x_conc <-c(600, 6700)
y_rate <- predict(M_lm, list(conc = x_conc), weights = tot_obsv, type = 'response')
plot(x = M$conc, y = M$rate, pch = 16, ylim = c(0, 0.5), xlim = c(0,7000), xlab = "conc", ylab = "death rate")
lines(x_conc, y_rate, col = "red", lwd = 2)
EDIT 1:
M_glm <- glm(cbind(nr_dead, nr_surv) ~ (0+conc), data = M, family = "binomial")
# plot using the predict() function
binomial_smooth <- function(formula = (y ~ 0 + x), ...) {
  # pass the formula argument through instead of hard-coding it
  geom_smooth(method = "glm", method.args = list(family = "binomial"), formula = formula, ...)
}
library(dplyr)   # tibble(), left_join(), %>%
library(ggplot2)
tibble(x_conc = c(seq(300, 7000, 1), M$conc),
       y_rate = predict.glm(M_glm, list(conc = x_conc), type = "response")) %>%
  left_join(M, by = c('x_conc' = 'conc')) %>%
  ggplot(aes(x = x_conc, y = y_rate)) +
  xlab("concentration") + ylab("death rate") +
  geom_point(aes(y = rate, size = tot_obsv)) +
  binomial_smooth(formula = (y ~ 0 + x)) +
  theme_bw()
Consider I have data like this:
df<-data.frame(x=c(1100,800,600,550,500,350),y=c(0.05,0.17,0.91,0.95,1,0.13))
How can I fit a curve through it based on a log-normal shape/distribution?
I tried an nls model but always get an error:
fit <-nls(y ~ a*dlnorm(x, mean, sd), data = df,
start = list(mean =0, sd = 10,a=1e4))
Thanks a lot!
I'm not sure why nls behaves like that, but you can use optim directly:
# minimize the sum of squared residuals over (meanlog, sdlog, scale)
opt <- optim(c(1, 1, 1), function(p) sum((dlnorm(df$x, p[1], p[2]) * p[3] - df$y)^2))
opt$par
# [1]   6.3280753   0.2150322 299.3154123
plot(x = df$x, y = df$y, type = 'b', ylim = c(0, 1), xlim = c(0, 1100))
curve(opt$par[3] * dlnorm(x, opt$par[1], opt$par[2]), from = 0, to = 1100, add = TRUE, col = 'red')
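For what it's worth (a guess, not a verified diagnosis): the original start values put essentially all of the log-normal density near x = 1, far from your data, which is a classic way to make nls fail. Re-seeding nls near optim's solution may let it converge; wrapped in try() since this is untested:
fit <- try(nls(y ~ a * dlnorm(x, meanlog, sdlog), data = df,
               start = list(meanlog = opt$par[1], sdlog = opt$par[2], a = opt$par[3])))
if (!inherits(fit, "try-error")) coef(fit)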