Related
I have a set of observations from a Cauchy (theta,1) and I have a plot for the log-likelihood against different x values
obs=c(1.77, -0.23, 2.76, 3.80, 3.47, 56.75, -1.34, 4.24, -2.44, 3.29, 3.71, -2.40, 4.53, -0.07, -1.05, -13.87, -2.53, -1.75, 0.27, 43.21)
ll_c=function(theta, obs){ #define Loglikelihood function for Cauchy(θ,1) distribution
logl= sum(dcauchy(obs, location = theta, scale = 1, log = T))
return(logl)
}
x = seq(from=-10,to=10,by=0.1) #create test values
ll = NULL
for (i in x){
ll = c(ll, ll_c(i, obs)) #perform ll_c for all test values and store
}
plot(x, ll)
I also need to make a plot of the first derivative of the log-likelihood function against the same x values and I can not figure out how to do so.
fdll_c=function(theta,obs){
Dlogl=D(sum(dcauchy(obs,location=theta,scale=1,log=T)),'theta')
return(Dlogl)
}
fdll = NULL
for (j in x){
fdll = c(fdll, fdll_c(j,obs))
}
plot(x,fdll)
I have tried different variations on this code, but every time it has come back with an error or with a derivative of 0 at all points.
Maybe the following answers the question.
It uses an explicit log-likelihood partial derivative function and then applies it to a vector around 0.
obs <- c(1.77, -0.23, 2.76, 3.80, 3.47, 56.75, -1.34, 4.24, -2.44, 3.29, 3.71, -2.40, 4.53, -0.07, -1.05, -13.87, -2.53, -1.75, 0.27, 43.21)
dll_theta <- function(x, theta, scale){
cc <- (x - theta)/scale
-2*sum(1/cc)/scale
}
x <- seq(from = -10, to = 10, by = 0.001)
y <- sapply(x, function(.x) dll_theta(obs, theta = .x, scale = 1))
i <- which(abs(y) > 1e15)
plot(x[-i], y[-i], pch = ".")
I would like to plot a threshold model with smooth 95% confidence interval lines between line segments. You would think this would be on the simple side but I have not been able to find an answer!
My threshold/breakpoints are known, it would be great if there were a way to visualize this data. I have tried the segmented package which produces the following plot:
The plot shows a threshold model with a breakpoint at 5.4. However, the confidence intervals are not smooth between regression lines.
If anyone knows of any way to produce smooth (i.e. without the jump between line segments) CI lines between segmented regression lines (ideally in ggplot) that would be amazing. Thank you so much.
I have included sample data and the code I have tried below:
x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)
y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30, 9.96, 9.65, 9.35, 9.07, 8.81,
8.56, 8.32, 8.09, 7.87, 7.65, 7.45, 7.25, 7.05, 6.86, 6.68, 6.50, 6.32, 6.15, 5.97, 5.80, 5.63, 5.47, 5.30,
5.13, 4.96, 4.80, 4.63, 4.45, 4.28, 4.09, 3.90, 3.71, 3.50, 3.27, 3.01, 2.70, 2.28, 22.37, 16.99, 11.05, 8.81,
8.56, 8.32, 7.25, 7.05, 6.50, 6.15, 5.63)
lin.mod <- lm(y ~ x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=2)
plot(x, y)
plot(segmented.mod, add=TRUE, conf.level = 0.95)
which produces the following plot (and associated jumps in 95% confidence intervals):
segmented plot
Background: The non-smoothness in existing change point packages are due to the fact that frequentist packages operate with a fixed change point value. But as with all inferred parameters, this is wrong because there is indeed uncertainty concerning the location of the change.
Solution: AFAIK, only Bayesian methods can quantify that and the mcp package fills this space.
library(mcp)
model = list(
y ~ 1 + x, # Segment 1: Intercept and slope
~ 0 + x # Segment 2: Joined slope (no intercept change)
)
fit = mcp(model, data = data.frame(x, y))
Default plot (plot.mcpfit() returns a ggplot object):
plot(fit) + ggtitle("Default plot")
Each line represents a possible model that generated the data. The posterior for the change point is shown as a blue density. You can add a credible interval on top using plot(fit, q_fit = TRUE) or plot it alone:
plot(fit, lines = 0, q_fit = c(0.025, 0.975), cp_dens = FALSE) + ggtitle("Credible interval only")
If your change point is indeed known and if you want to model different residual scales for each segment (i.e., quasi-emulate segmented), you can do:
model2 = list(
y ~ 1 + x,
~ 0 + x + sigma(1) # Add intercept change in residual scale
)
fit = mcp(model2, df, prior = list(cp_1 = 1.9)) # Note: prior is a fixed value - not a distribution.
plot(fit, q_fit = TRUE, cp_dens = FALSE)
Notice that the CI does not "jump" around the change point as in segmented. I believe that this is the correct behavior. Disclosure: I am the author of mcp.
I'm trying to figure out how to calculate absolute EC50 values using the LL.3 and LL.4 (3 and 4 parameter) dose response models in the package drc, but I keep getting these errors of "Warning message:In log(exp(-tempVal/parmVec[5]) - 1) : NaNs produced" and the EC50 value is "NA".
Here is an example of the code I'm trying to run
###use rygrass dataset in drc
gr.LL.3 <- drm(ryegrass$rootl ~ ryegrass$conc, fct = LL.3()) # 3 parameter log-logistic model
gr.LL.4 <- drm(ryegrass$rootl ~ ryegrass$conc, fct = LL.4()) # 4 parameter log-logistic model
plot(gr.LL.3) #graph looks fine
plot(gr.LL.4) #graph looks fine
ED (gr.LL.3, respLev = c(50), type = "relative") # this works fine
ED (gr.LL.4, respLev = c(50), type = "relative") # this works fine
ED (gr.LL.3, respLev = c(50), type = "absolute") # this gives me "NA" for EC50 along with warning message
ED (gr.LL.4, respLev = c(50), type = "absolute") # this gives me "NA" for EC50 along with warning message
It's not due to 0 values for concentrations
### It's not due to 0 values for concentrations
# ryegrass dataset with 0 value concentrations and corresponding rootl removed
rootlength <- c(8.3555556, 6.9142857, 7.75, 6.8714286, 6.45, 5.9222222, 1.925, 2.8857143, 4.2333333, 1.1875, 0.8571429, 1.0571429, 0.6875, 0.525, 0.825, 0.25, 0.22, 0.44)
conc.wo.0 <- c(0.94, 0.94, 0.94, 1.88, 1.88, 1.88, 3.75, 3.75, 3.75, 7.5, 7.5, 7.5, 15, 15, 15, 30, 30, 30)
gro.LL.3 <- drm(rootlength ~ conc.wo.0, fct = LL.3())
plot(gro.LL.3) #graph looks fine
ED (gro.LL.3, respLev = c(50), type = "relative") # this works fine
ED (gro.LL.3, respLev = c(50), type = "absolute") # once again, this gives me "NA" for EC50 along with warning message
It's also not due to the response being in absolute vs relative terms
### It's also not due to the response being in absolute vs relative terms
# ryegrass dataset with response relative to average response with 0 concentration (sorry, I did the absolute to relative conversion in excel, I'm still learning r)
rel.rootl <- c(0.98, 1.03, 1.07, 0.94, 0.95, 1.03, 1.08, 0.89, 1.00, 0.89, 0.83, 0.76, 0.25, 0.37, 0.55, 0.15, 0.11, 0.14, 0.09, 0.07, 0.11, 0.03, 0.03, 0.06)
concentration <- c(0, 0, 0, 0, 0, 0, 0.94, 0.94, 0.94, 1.88, 1.88, 1.88, 3.75, 3.75, 3.75, 7.5, 7.5, 7.5, 15, 15, 15, 30, 30, 30)
rel.gro.LL.3 <- drm(rel.rootl ~ concentration, fct = LL.3())
plot(rel.gro.LL.3) #graph looks fine
ED (rel.gro.LL.3, respLev = c(50), type = "relative") # this works fine
ED (rel.gro.LL.3, respLev = c(50), type = "absolute") # once again, this gives me "NA" for EC50 along with warning message
I'm new to this, so any help is appreciated.
rel.rootl <- c(0.98, 1.03, 1.07, 0.94, 0.95, 1.03, 1.08, 0.89, 1.00, 0.89, 0.83, 0.76, 0.25, 0.37, 0.55, 0.15, 0.11, 0.14, 0.09, 0.07, 0.11, 0.03, 0.03, 0.06)
concentration <- c(0, 0, 0, 0, 0, 0, 0.94, 0.94, 0.94, 1.88, 1.88, 1.88, 3.75, 3.75, 3.75, 7.5, 7.5, 7.5, 15, 15, 15, 30, 30, 30)
rel.gro.LL.3 <- drm(rel.rootl ~ concentration, fct = LL.3())
plot(rel.gro.LL.3) #graph looks fine
ED (rel.gro.LL.3, respLev = c(50), type = "relative") # this works fine
ED (rel.gro.LL.3, respLev = c(50), type = "absolute") # once again, this gives me "NA" for EC50 along with warning message
The problem is because when you are trying to estimate the absolute EC50 the ED function solves for the point on the curve where you want (i.e. the respLev argument) so if your relative response level does not have 50% on the y-axis it will run into an error because your y-axis is proportions.
To fix this issue either multiply your normalized response by 100 to turn it into a percent relative response
rel.gro.LL.3.percent <- drm(rel.rootl*100 ~ concentration, fct = LL.3())
ED (rel.gro.LL.3.percent, respLev = c(50), type = "relative") # same result as above
Estimated effective doses
Estimate Std. Error
e:1:50 3.26520 0.19915
ED (rel.gro.LL.3.percent, respLev = c(50), type = "absolute") # very similar to relative EC50
Estimated effective doses
Estimate Std. Error
e:1:50 3.30154 0.20104
Alternatively, you could change the respLev to 0.5 in your original model.
ED (rel.gro.LL.3, respLev = c(50), type = "relative") # this still works fine
Estimated effective doses
Estimate Std. Error
e:1:50 3.26520 0.19915
ED (rel.gro.LL.3, respLev = c(0.5), type = "absolute") # Now this works and is the same as we got before with response multiplied by 100
Estimated effective doses
Estimate Std. Error
e:1:0.5 3.30154 0.20104
Background: Species richness scales to the negative -0.75 of body weight. However, when I fit my data, I get a value of 0.57. A friend told me that the summary(lm) results prints the 'best fit' slope of the data. Nevertheless, I'm wondering if I can create a regression plot wherein I force the slope to be -0.75 like the literature.
My code is:
y value
log.nspecies.dec = c(3.05, 2.95, 2.97, 2.98, 2.84, 2.85, 2.83, 2.71, 2.64, 2.62, 2.58, 2.37,
2.26, 2.17, 2.00, 1.88, 1.75, 1.62, 1.36, 1.30, 1.08, 1.20, 0.90, 0.30, 0.70,
0.30, 0.48, 0.00, 0.30, 0.00)
x value
logbio.dec = c(2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.7, 3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1,
5.3, 5.5, 5.7, 5.9, 6.1, 6.3, 6.5, 6.7, 6.9, 7.1, 7.3, 7.5, 7.7, 7.9)
plot a barplot and superimpose a regression line
name the y variables with the x
names(log.nspecies.dec) = logbio.dec
order the y variables
log.nspecies.dec = log.nspecies.dec[order (as.numeric(names(log.nspecies.dec)))]
do the barplot
xpos = barplot(log.nspecies.dec, las = 2, space = 0)
lm.fit = lm(log.nspecies.dec ~ as.numeric(names(log.nspecies.dec)))
summary(lm.fit)
y.init = lm.fit$coefficients[2] * as.numeric(names(log.nspecies.dec))1 +
lm.fit$coefficients1
y.end = lm.fit$coefficients[2] * as.numeric(names(log.nspecies.dec))[length(log.nspecies.dec)] +
lm.fit$coefficients1
segments(xpos1, y.init, xpos [length(xpos)], y.end, lwd = 2, col = 'red')
title(main = 'ln Number of species ~ lm Weight')
coef(lm.fit)
gives a result wherein the slope is 0.57. How do I force the slope to -0.75?
You can use offset to fix the y-intercept at a negative value. For example
## Example data
x = 1:10
y = -2 + 2* x
# Fit the model
(m = lm(y ~ 0 + x, offset = rep(-2, length(y))))
#Call:
#lm(formula = y ~ 0 + x, offset = rep(-2, length(y)))
#Coefficients:
#x
#2
The output correctly identifies the gradient as 2.
The reason your code doesn't work is that you are using abline(). Looking at ?abline, it states that to draw the line it will in turn call coef(MODEL). When you use offset, the coef function doesn't return the y-intercept.
R> coef(m)
x
2
Hence abline draws the wrong line.
If the intercept is changed, the code still works
x = 1:10
y = 2 + 2*x
lm(y ~ 0 + x, offset = rep(2, length(y)))
I am trying to fit a negative exponential to some data in R, but the fitted line looks too high compared to the data, whereas the fit I get using Excel's built-in power fit looks more believable. Can someone tell me why? I've tried using the nls() function and also optim() and get similar parameters from both of those methods, but the fits for both look high.
x <- c(5.96, 12.86, 8.40, 2.03, 12.84, 21.44, 21.45, 19.97, 8.92, 25.00, 19.90, 20.00, 20.70, 16.68, 14.90, 26.00, 22.00, 22.00, 10.00, 5.70, 5.40, 3.20, 7.60, 0.59, 0.14, 0.85, 9.20, 0.79, 1.40, 2.68, 1.91)
y <- c(5.35, 2.38, 1.77, 1.87, 1.47, 3.27, 2.01, 0.52, 2.72, 0.85, 1.60, 1.37, 1.48, 0.39, 2.39, 1.83, 0.71, 1.24, 3.14, 2.16, 2.22, 11.50, 8.32, 38.98, 16.78, 32.66, 3.89, 1.89, 8.71, 9.74, 23.14)
xy.frame <- data.frame(x,y)
nl.fit <- nls(formula=(y ~ a * x^b), data=xy.frame, start = c(a=10, b=-0.7))
a.est <- coef(nl.fit)[1]
b.est <- coef(nl.fit)[2]
plot(x=xy.frame$x,y=xy.frame$y)
# curve looks too high
curve(a.est * x^b.est , add=T)
# these parameters from Excel seem to fit better
curve(10.495 * x^-0.655, add=T)
# alternatively use optim()
theta.init <- c(1000,-0.5, 50)
exp.nll <- function(theta, data){
a <- theta[1]
b <- theta[2]
sigma <- theta[3]
obs.y <- data$y
x <- data$x
pred.y <- a*x^b
nll <- -sum(dnorm(x=obs.y, mean=pred.y , sd=sigma, log=T))
nll
}
fit.optim <- optim(par=theta.init,fn=exp.nll,method="BFGS",data=xy.frame )
plot(x=xy.frame$x,y=xy.frame$y)
# still looks too high
curve(a.est * x^b.est, add=T)
The reason you're seeing the unexpected behavior is that the curves that look "too high" actually have much lower sums of squared errors than the curves from excel:
# Fit from nls
sum((y - a.est*x^b.est)^2)
# [1] 1588.313
# Fit from excel
sum((y - 10.495*x^ -0.655)^2)
# [1] 1981.561
The reason nls favors the higher curve is that it is working to avoid huge errors at small x values at the cost of slightly larger errors with large x values. One way to address this might be to apply a log-log transformation:
mod <- lm(log(y)~log(x))
(a.est2 <- exp(coef(mod)["(Intercept)"]))
# (Intercept)
# 10.45614
(b.est2 <- coef(mod)["log(x)"])
# log(x)
# -0.6529741
These are quite close to the coefficients from excel, and yield a more visually appealing fit (despite the worse performance on the sum-of-squared-errors metric):