"NaNs produced" warning when calculating absolute EC50 values with drc package - r

I'm trying to calculate absolute EC50 values using the LL.3 and LL.4 (3- and 4-parameter) log-logistic dose-response models in the drc package, but I keep getting the warning "Warning message: In log(exp(-tempVal/parmVec[5]) - 1) : NaNs produced" and the EC50 estimate comes back as NA.
Here is an example of the code I'm trying to run:
library(drc)
### use ryegrass dataset in drc
gr.LL.3 <- drm(ryegrass$rootl ~ ryegrass$conc, fct = LL.3()) # 3 parameter log-logistic model
gr.LL.4 <- drm(ryegrass$rootl ~ ryegrass$conc, fct = LL.4()) # 4 parameter log-logistic model
plot(gr.LL.3) #graph looks fine
plot(gr.LL.4) #graph looks fine
ED (gr.LL.3, respLev = c(50), type = "relative") # this works fine
ED (gr.LL.4, respLev = c(50), type = "relative") # this works fine
ED (gr.LL.3, respLev = c(50), type = "absolute") # this gives me "NA" for EC50 along with warning message
ED (gr.LL.4, respLev = c(50), type = "absolute") # this gives me "NA" for EC50 along with warning message
It's not due to 0 values for concentrations
# ryegrass dataset with 0 value concentrations and corresponding rootl removed
rootlength <- c(8.3555556, 6.9142857, 7.75, 6.8714286, 6.45, 5.9222222, 1.925, 2.8857143, 4.2333333, 1.1875, 0.8571429, 1.0571429, 0.6875, 0.525, 0.825, 0.25, 0.22, 0.44)
conc.wo.0 <- c(0.94, 0.94, 0.94, 1.88, 1.88, 1.88, 3.75, 3.75, 3.75, 7.5, 7.5, 7.5, 15, 15, 15, 30, 30, 30)
gro.LL.3 <- drm(rootlength ~ conc.wo.0, fct = LL.3())
plot(gro.LL.3) #graph looks fine
ED (gro.LL.3, respLev = c(50), type = "relative") # this works fine
ED (gro.LL.3, respLev = c(50), type = "absolute") # once again, this gives me "NA" for EC50 along with warning message
It's also not due to the response being in absolute vs relative terms
# ryegrass dataset with the response expressed relative to the average response at 0 concentration (sorry, I did the absolute-to-relative conversion in Excel; I'm still learning R)
rel.rootl <- c(0.98, 1.03, 1.07, 0.94, 0.95, 1.03, 1.08, 0.89, 1.00, 0.89, 0.83, 0.76, 0.25, 0.37, 0.55, 0.15, 0.11, 0.14, 0.09, 0.07, 0.11, 0.03, 0.03, 0.06)
concentration <- c(0, 0, 0, 0, 0, 0, 0.94, 0.94, 0.94, 1.88, 1.88, 1.88, 3.75, 3.75, 3.75, 7.5, 7.5, 7.5, 15, 15, 15, 30, 30, 30)
rel.gro.LL.3 <- drm(rel.rootl ~ concentration, fct = LL.3())
plot(rel.gro.LL.3) #graph looks fine
ED (rel.gro.LL.3, respLev = c(50), type = "relative") # this works fine
ED (rel.gro.LL.3, respLev = c(50), type = "absolute") # once again, this gives me "NA" for EC50 along with warning message
I'm new to this, so any help is appreciated.

The problem is that when you ask for an absolute EC50, the ED function solves for the dose at which the fitted curve reaches the value given in respLev on the original response scale. Your relative-response model has a y-axis measured in proportions, so the curve never reaches an absolute value of 50; the solver therefore fails with the NaN warning and returns NA. (The same applies to the raw root-length models: the root lengths never come anywhere near 50.)
To fix this, either multiply your normalized response by 100 so that it is expressed as a percent relative response:
rel.gro.LL.3.percent <- drm(rel.rootl*100 ~ concentration, fct = LL.3())
ED (rel.gro.LL.3.percent, respLev = c(50), type = "relative") # same result as above
Estimated effective doses
Estimate Std. Error
e:1:50 3.26520 0.19915
ED (rel.gro.LL.3.percent, respLev = c(50), type = "absolute") # very similar to relative EC50
Estimated effective doses
Estimate Std. Error
e:1:50 3.30154 0.20104
Alternatively, you could change the respLev to 0.5 in your original model.
ED (rel.gro.LL.3, respLev = c(50), type = "relative") # this still works fine
Estimated effective doses
Estimate Std. Error
e:1:50 3.26520 0.19915
ED (rel.gro.LL.3, respLev = c(0.5), type = "absolute") # Now this works and is the same as we got before with response multiplied by 100
Estimated effective doses
Estimate Std. Error
e:1:0.5 3.30154 0.20104
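The same point applies to the raw root-length models at the top of the question: the absolute response level has to lie inside the range the fitted curve actually covers. A minimal sketch of that idea (using half the control root length as the absolute level is just an illustration, not something from the original post):
library(drc)
gr.LL.3 <- drm(rootl ~ conc, data = ryegrass, fct = LL.3())
coef(gr.LL.3) # the fitted upper limit d is around 8, so an absolute respLev of 50 can never be reached
half.control <- mean(ryegrass$rootl[ryegrass$conc == 0]) / 2 # an absolute level inside the data range
ED(gr.LL.3, respLev = half.control, type = "absolute")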

Related

How do you code covaried errors for an SEM?

I am re-analyzing data from a published paper using their correlation matrix and reconstructing their SEM. However, I do not know how to code the errors. Specifically, I am attempting to code the errors on the "Empathy" and "Teaching Self-Efficacy" variables in the SEM (pictured). How do you code the errors?
Below is my code:
library(MVN)
library(lavaan)
library(haven)
library(semPlot)
library(lavaanPlot)
Goroshit_Hen_cor_mat <- matrix(c(1, 0.11, -0.12, -0.1, -0.09, 0.03,
0.11, 1, .3, -0.01, -0.05, 0.06,
-0.12, 0.3, 1, 0.21, 0.23, 0.24,
-0.1, -0.01, 0.21, 1, 0.56, 0.53,
-0.09, -0.05, 0.23, 0.56, 1, 0.38,
0.03, 0.06, 0.24, 0.53, 0.38, 1),
nrow = 6, ncol = 6)
var_names = c("Gender", "Degree", "Years_Teaching", "Emotional_SE", "Empathy", "Teaching_SE")
Goroshit_Hen_cor_mat
var_sds <- c(0.47, 0.42, 10.19, 0.48, 0.43, 0.55)
Goroshit_Hen_cov <- cor2cov(Goroshit_Hen_cor_mat, var_sds, names = var_names)
Goroshit_Hen_cov
Goroshit_Hen_SEM <-"Empathy ~ Emotional_SE
Empathy ~ Gender
Empathy ~ Degree
Empathy ~ Years_Teaching
Teaching_SE ~ Emotional_SE
Teaching_SE ~ Gender
Teaching_SE ~ Degree
Teaching_SE ~ Years_Teaching
Emotional_SE ~~ Gender
Emotional_SE ~~ Degree
Emotional_SE ~~ Years_Teaching
Gender ~~ Degree
Gender ~~ Years_Teaching
Degree ~~ Years_Teaching
Empathy ~~ Teaching_SE
Empathy ~~ Empathy
Teaching_SE ~~ Teaching_SE
"
Goroshit_Hen_SEM_fit <- sem(Goroshit_Hen_SEM, sample.cov = Goroshit_Hen_cov, sample.nobs = 273)
semPaths(Goroshit_Hen_SEM_fit, "path", whatLabels = "est", edge.label.cex = 1, intercepts = FALSE, residuals = TRUE, curve = 1, curvature = 2, nCharNodes = 8, sizeMan = 6, sizeMan2 = 3, optimizeLatRes = T, rotation = 3, edge.color = "#000000")
summary(Goroshit_Hen_SEM_fit)
All double-headed arrows are specified with the ~~ operator, as explained in the ?model.syntax help page and the lavaan tutorial. So you can add this line to your existing syntax:
Empathy ~~ Teaching_SE
But if you merely want to free that parameter, it is not necessary when using the sem() or cfa() wrappers, which call lavaan() with the setting auto.cov.y=TRUE (see ?lavOptions for a description). You should see the parameter estimate in your summary() output.
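If you want to confirm that the residual covariance really is in the fitted model, one minimal check (reusing Goroshit_Hen_SEM_fit from the question) is to look for the corresponding row among the parameter estimates:
est <- parameterEstimates(Goroshit_Hen_SEM_fit)
# the residual covariance shows up as an Empathy ~~ Teaching_SE row, whether it was
# written explicitly or freed automatically via auto.cov.y = TRUE
subset(est, lhs == "Empathy" & op == "~~" & rhs == "Teaching_SE")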

How can I calculate F statistics for the Gamma regression in R manually?

I'm trying to understand how anova() calculates the F-value for a Gamma GLM.
I have some weird skewed data:
y <- c(0, 0.88, 0.94, 0, 0.95, 0.77, 3.22, 3.52, 1.22, 1.52, 1.23,
0.92, 1.11, 1.18, 1.47, 1.53, 0, 0, 1.09, 0.83, 0.8, 1.56, 6,
0.74, 1.18, 1.01, 0.82, 3.83, 1.75, 1.27, 1.54, 1.05, 1.08, 0.9,
0.77, 1.44, 4.55, 0, 1.44, 2.91, 0.71, 12.93, 0.77, 0, 1.14,
1.06, 3.96, 1.57, 1.63)
x <- c(6.9469287, 6.290469147, 6.1918829, 6.104770097, 5.939523496,
5.942857082, 6.163662277, 6.399218779, 5.783065061, 5.638420345,
5.552741687, 5.683432022, 5.857426116, 6.162680044, 5.957396843,
6.571818964, 5.446848271, 5.712962062, 5.653265224, 6.349141363,
5.46503105, 6.049651518, 7.380125424, 5.722479551, 5.950585693,
5.808206582, 6.096318404, 5.913429847, 5.997807119, 6.206943676,
6.550982371, 6.543636484, 6.822385253, 6.507588297, 5.940914702,
6.439753879, 6.899586949, 6.156580921, 7.116019293, 6.355315455,
6.538796291, 6.498027706, 6.196593891, 6.339028678, 6.23909998,
6.551869452, 6.688031206, 6.492259138, 5.997315277)
y <- y + 0.001
I added 0.001 to y to avoid zeros. For a simple regression I could reproduce the F test run by anova:
lm0 <- lm(y ~ 1)
lm1 <- lm(y ~ x)
#
y.p <- lm1$fitted.values # predicted/fitted values
SSE <- sum((y - y.p)^2)
SSR <- sum((y.p - mean(y))^2)
SST <- sum((y - mean(y))^2)
round(SST - (SSE + SSR), 4) #check
# [1] 0
#
SS1 <- sum(residuals(lm0, "deviance")^2) #=SST
SS2 <- sum(residuals(lm1, "deviance")^2) #=SSE
df1 <- lm0$df.residual
df2 <- lm1$df.residual
MSE <- SS2/df2
MSR <- ((SS1 - SS2)/(df1 - df2))
MSR/MSE # F-value
# [1] 5.927608
anova(lm0, lm1, test="F")$F[2]
# [1] 5.927608
However, I could not reproduce F for a Gamma-version of the regression:
lm0 <- glm(y ~ 1, family=Gamma(link="log"))
lm1 <- glm(y ~ x, family=Gamma(link="log"))
#
oo <- Gamma(link="log") # family info
y.p <- oo$mu.eta(eta) # fitted values on the original scale
# ... the same as for lm example above
MSR/MSE # F-value
# [1] 3.862559
anova(lm0, lm1, test="F")$F[2]
# [1] 7.356901
However, if I take the MSR (which is apparently what anova calls "Deviance" in its output) and divide it by the dispersion parameter from the model summary (which, to my surprise, can be reproduced from the working residuals), I get the correct F:
# correct F for the gamma regression:
disp <- summary(lm1)$dispersion
mdisp <- sum(residuals(lm1, "working")^2)/df2 # MSE-variant with working residuals
disp - mdisp # check
# [1] 0
Dev <- anova(lm1)$Deviance[2]
MSR-Dev # check
# [1] 0
MSR/mdisp # correct F as in anova
# [1] 7.356901
For me (without a deeper mathematical background), these last manipulations, which I found by trial and error, look like magic. Could somebody help me understand how the link function of the Gamma GLM is "interwoven" in the MSR/MSE calculation? I need this understanding to be able to compute F for a Gamma regression fitted with the R fastglm package, which is not compatible with anova.
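For what it's worth, here is a minimal sketch of the quantity that anova() appears to compute for a GLM F test, written only in terms of residuals and degrees of freedom (assuming the Gamma fits lm0 and lm1 from above). The dispersion is estimated from the Pearson residuals of the larger model, which for a log-link Gamma coincides with the working-residual version used above; since it needs nothing but residuals and degrees of freedom, it should be adaptable to other fitting routines:
dev0 <- sum(residuals(lm0, "deviance")^2) # null model deviance
dev1 <- sum(residuals(lm1, "deviance")^2) # fitted model deviance
disp <- sum(residuals(lm1, "pearson")^2) / lm1$df.residual # dispersion estimate from the larger model
((dev0 - dev1) / (lm0$df.residual - lm1$df.residual)) / disp # should reproduce the 7.356901 above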

R survival survreg not producing a good fit

I am new to using R, and I am trying to use survival analysis in order to find correlation in censored data.
The x data is the envelope mass of protostars. The y data is the intensity of an observed molecular line, and some values are upper limits. The data is:
x <- c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018)
y <- c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42)
censor <- c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
I am using the function survreg from the R survival package:
library(survival)
modeldata <- survreg(formula = Surv(y, censor) ~ x, dist = "exponential", control = list(maxiter = 90))
Which gives the following result:
summary(modeldata)
Call:
survreg(formula = Surv(y, censor) ~ x, dist = "exponential",
control = list(maxiter = 90))
Value Std. Error z p
(Intercept) -0.114 0.568 -0.20 0.841
x 0.153 0.110 1.39 0.163
Scale fixed at 1
Exponential distribution
Loglik(model)= -6.9 Loglik(intercept only)= -9
Chisq= 4.21 on 1 degrees of freedom, p= 0.04
Number of Newton-Raphson Iterations: 5
n= 11
However, when I plot the data and the model using the following method:
plot(x,y,pch=(censor+1))
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red")
I get this plot of x and y data; triangles are censored data
I am not sure where I am going wrong. I have tried different distributions, but all produce similar results. The same is true when I use other data, for example:
x <- c(1.14, 1.14, 1.19, 0.78, 0.43, 0.24, 0.19, 0.16, 0.17, 0.66, 0.40)
I am also not sure if I am interpreting the results correctly.
I have tried other examples using the same method (e.g. https://stats.idre.ucla.edu/r/examples/asa/r-applied-survival-analysis-ch-1/), and it works well, as far as I can tell.
So my questions are:
Am I using the correct function for fitting the data? If not, which would be more suitable?
If it is the correct function, why is the model not fitting the data even closely? Does it have to do with the plotting?
Thank you for your help.
The "shape" of the relationship looks concave downward, so I would have guessed a ~ log(x) fit might be be more appropriate:
dfrm <- data.frame( x = c(17.299, 4.309, 7.368, 29.382, 1.407, 3.404, 0.450, 0.815, 1.027, 0.549, 0.018),
y = c(2.37, 0.91, 1.70, 1.97, 0.60, 1.45, 0.25, 0.16, 0.36, 0.88, 0.42),
censor= c(0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1))
modeldata<-survreg(formula=Surv(y,censor)~log(x), data=dfrm, dist="loggaussian", control = list(maxiter=90))
Your code seemed appropriate:
png(); plot(y~x,pch=(censor+1),data=dfrm)
xnew<-seq(0,30)
model<-predict(modeldata,list(x=xnew))
lines(xnew,model,col="red"); dev.off()
modeldata
Call:
survreg(formula = Surv(y, censor) ~ log(x), data = dfrm, dist = "loggaussian",
control = list(maxiter = 90))
Coefficients:
(Intercept) log(x)
0.02092589 0.32536509
Scale= 0.7861798
Loglik(model)= -6.6 Loglik(intercept only)= -8.8
Chisq= 4.31 on 1 degrees of freedom, p= 0.038
n= 11
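One more thing worth checking on the plotting side: the prediction grid above starts at 0, but the model uses log(x), so the first grid point is problematic. A minimal plotting sketch that keeps the grid strictly positive (reusing dfrm and modeldata from above):
xnew <- seq(0.1, 30, by = 0.1) # start above 0 because the formula contains log(x)
pred <- predict(modeldata, newdata = data.frame(x = xnew), type = "response")
plot(y ~ x, data = dfrm, pch = censor + 1)
lines(xnew, pred, col = "red")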

How to fit a linear regression in R with a fixed negative intercept?

Background: species richness scales with body weight to the -0.75 power. However, when I fit my data, I get a value of 0.57. A friend told me that the summary(lm) output prints the 'best fit' slope of the data. Nevertheless, I'm wondering if I can create a regression plot in which I force the slope to be -0.75, as in the literature.
My code is:
# y value
log.nspecies.dec = c(3.05, 2.95, 2.97, 2.98, 2.84, 2.85, 2.83, 2.71, 2.64, 2.62, 2.58, 2.37,
2.26, 2.17, 2.00, 1.88, 1.75, 1.62, 1.36, 1.30, 1.08, 1.20, 0.90, 0.30, 0.70,
0.30, 0.48, 0.00, 0.30, 0.00)
# x value (3.5 assumed in the 0.2-step sequence so that x and y have equal length)
logbio.dec = c(2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1,
5.3, 5.5, 5.7, 5.9, 6.1, 6.3, 6.5, 6.7, 6.9, 7.1, 7.3, 7.5, 7.7, 7.9)
# plot a barplot and superimpose a regression line
# name the y variables with the x values
names(log.nspecies.dec) = logbio.dec
# order the y variables
log.nspecies.dec = log.nspecies.dec[order (as.numeric(names(log.nspecies.dec)))]
# do the barplot
xpos = barplot(log.nspecies.dec, las = 2, space = 0)
lm.fit = lm(log.nspecies.dec ~ as.numeric(names(log.nspecies.dec)))
summary(lm.fit)
y.init = lm.fit$coefficients[2] * as.numeric(names(log.nspecies.dec))[1] +
lm.fit$coefficients[1]
y.end = lm.fit$coefficients[2] * as.numeric(names(log.nspecies.dec))[length(log.nspecies.dec)] +
lm.fit$coefficients[1]
segments(xpos[1], y.init, xpos[length(xpos)], y.end, lwd = 2, col = 'red')
title(main = 'ln Number of species ~ ln Weight')
coef(lm.fit)
gives a result wherein the slope is 0.57. How do I force the slope to -0.75?
You can use offset to fix the y-intercept at a negative value. For example
## Example data
x = 1:10
y = -2 + 2* x
# Fit the model
(m = lm(y ~ 0 + x, offset = rep(-2, length(y))))
#Call:
#lm(formula = y ~ 0 + x, offset = rep(-2, length(y)))
#Coefficients:
#x
#2
The output correctly identifies the gradient as 2.
The reason your code doesn't work is that you are using abline(). Looking at ?abline, it states that to draw the line it will in turn call coef(MODEL). When you use offset, the coef function doesn't return the y-intercept.
R> coef(m)
x
2
Hence abline draws the wrong line.
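If you do want the plotted line, a minimal workaround (using m, x and y from above) is to pass the fixed intercept to abline() yourself instead of relying on coef():
plot(x, y)
abline(a = -2, b = coef(m)["x"], col = "red") # intercept fixed at -2, slope taken from the model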
If the fixed intercept is changed, the approach still works:
x = 1:10
y = 2 + 2*x
lm(y ~ 0 + x, offset = rep(2, length(y)))
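Since the question body actually asks for a fixed slope of -0.75 rather than a fixed intercept, the same offset trick can be turned around: put the fixed slope term in offset and estimate only the intercept. A minimal sketch using the question's vectors (lm.fixed is just an illustrative name):
x.num <- as.numeric(names(log.nspecies.dec))
lm.fixed <- lm(log.nspecies.dec ~ 1, offset = -0.75 * x.num)
coef(lm.fixed) # intercept of the line whose slope is held at -0.75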

Efficient way to calculate average MAPE and MSE in R

I have real data and predicted data, and I want to calculate the overall MAPE and MSE. The data are time series, with each column representing data for a different week. I predict a value for each of the 52 weeks for each of the items, as shown below. What would be the best way to calculate the overall error in R?
real = matrix(
c("item1", "item2", "item3", "item4", .5, .7, 0.40, 0.6, 0.3, 0.29, 0.7, 0.09, 0.42, 0.032, 0.3, 0.37),
nrow=4,
ncol=4)
colnames(real) <- c("item", "week1", "week2", "week3")
predicted = matrix(
c("item1", "item2", "item3", "item4", .55, .67, 0.40, 0.69, 0.13, 0.9, 0.47, 0.19, 0.22, 0.033, 0.4, 0.37),
nrow=4,
ncol=4)
colnames(predicted) <- c("item", "week1", "week2", "week3")
How do you get the predicted values in the first place? The model used to produce them is probably fitted by minimising some function of the prediction errors (usually MSE). So by the time you have predicted values, the residuals and some MSE/MAPE-like metrics have most likely already been computed somewhere along the line, and you can probably retrieve them directly.
If the predicted values were simply handed to you and you had nothing to do with fitting the model, then you can calculate MSE and MAPE as below.
You have only one record per week for every item. So for every item, you can only calculate one prediction error per week. Depending on your application, you can choose to calculate the MSE and MAPE per item or per week.
This is what your data looks like:
real <- matrix(
c(.5, .7, 0.40, 0.6, 0.3, 0.29, 0.7, 0.09, 0.42, 0.032, 0.3, 0.37),
nrow = 4, ncol = 3)
colnames(real) <- c("week1", "week2", "week3")
predicted <- matrix(
c(.55, .67, 0.40, 0.69, 0.13, 0.9, 0.47, 0.19, 0.22, 0.033, 0.4, 0.37),
nrow = 4, ncol = 3)
colnames(predicted) <- c("week1", "week2", "week3")
Calculate the (percentage/squared) errors for every entry:
pred_error <- real - predicted
pct_error <- pred_error/real
squared_error <- pred_error^2
Calculate MSE, MAPE:
# For per-item prediction errors (rows)
apply(squared_error, MARGIN = 1, mean) # MSE
apply(abs(pct_error), MARGIN = 1, mean) # MAPE (multiply by 100 for a percentage)
# For per-week prediction errors (columns)
apply(squared_error, MARGIN = 2, mean) # MSE
apply(abs(pct_error), MARGIN = 2, mean) # MAPE (multiply by 100 for a percentage)
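For the single overall figures the question asks about, averaging over the whole matrix is enough; a minimal sketch reusing the objects above:
mean(squared_error) # overall MSE
mean(abs(pct_error)) * 100 # overall MAPE, expressed as a percentage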
