Bioassay dose-response fitting with heteroscedastic data in R

I am using the drc package in R to fit dose-response curves (four-parameter logistic, LL.4) for biological assays. The data I collect are typically heteroscedastic. I am looking for ways to account for this when calling drm, and have found three possibilities that seem promising:
Use the type="Poisson" argument to drm. However, over- and underdispersion are likely for many assays, so this isn't a general solution.
Follow drm with a call to drc's boxcox. This seems more general and could work.
Use the "varPower" transform that used to be implemented in drc.multdrc and in drc.drm before it was commented out (search for "varPower" in the drm source). I could un-comment those sections to restore the varPower functionality.
My questions are: what is the most accepted way to handle this? And does anyone know why the varPower variance handling was removed from the drc package?
Example code:
# Naive method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
# Poisson method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params, type = "Poisson")
# Box-Cox method
a <- drm(y ~ x, data = subs, fct = LL.4(), control = ctl, start = params)
a2 <- boxcox(a)

I found the answer to this question in this paper by the authors of the drc package. In the paper they comment:
Weights may be used for addressing variance heterogeneity in the
response. However, the transform-both-sides approach should be
preferred over using often very imprecisely determined weights
The "transform-both-sides" approach refers to using the drc.boxcox function (code in the original question).
Further advice came from a personal communication with one of the drc package authors, who advised that at present the medrc R package is better suited for dose-response analysis in R.
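To make the transform-both-sides workflow concrete, here is a minimal sketch on simulated data. The data-generating step is made up purely for illustration (error standard deviation proportional to the mean), and boxcox here is drc's method for drc model objects, as in the code above:

```r
library(drc)

# Simulated heteroscedastic dose-response data (made up for illustration):
# the error standard deviation grows with the mean response
set.seed(1)
dose <- rep(c(0.1, 1, 10, 100, 1000), each = 6)
mu   <- 10 + (100 - 10) / (1 + (dose / 15)^1.2)
resp <- mu + rnorm(length(mu), sd = 0.1 * mu)
d    <- data.frame(dose, resp)

# Naive fit assuming constant variance
m0 <- drm(resp ~ dose, data = d, fct = LL.4())

# Transform-both-sides fit: a Box-Cox transformation is applied to both
# the response and the model function
m1 <- boxcox(m0)
summary(m1)
```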

Related

mlr3 optimized average of ensemble

I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a super learner.
My measure of interest is classif.auc
The mlr3 help file tells me (?mlr_learners_avg)
Predictions are averaged using weights (in order of appearance in the
data) which are optimized using nonlinear optimization from the
package "nloptr" for a measure provided in measure (defaults to
classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model. Using non-linear
optimization is implemented in the SuperLearner R package. For a more
detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
When I look at the source code, I think LearnerClassifAvg$new() defaults to "classif.ce"; is that true?
I think I could set it to classif.auc with param_set$values <- list(measure="classif.auc",optimizer="nloptr",log_level="warn")
The help file refers to the SuperLearner package and LeDell (2015). If I understand it correctly, the "AUC-Maximizing Ensembles through Metalearning" solution proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. The paper does not really propose a single best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015): it optimizes the weights for any metric using non-linear optimization, while LeDell (2015) focuses on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine differs between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, while SuperLearner "... uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer (similar to here) that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
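For reference, the configuration from the question can be set as below. This is only a minimal sketch of configuring the meta-learner (the parameter values follow the question; in practice the learner would sit at the end of a stacking pipeline):

```r
library(mlr3)
library(mlr3pipelines)

# Minimal sketch: configure the averaging meta-learner to optimize AUC
# instead of the default classification error
lrn_avg <- LearnerClassifAvg$new()
lrn_avg$param_set$values <- list(
  measure   = "classif.auc",  # optimize AUC rather than classif.ce
  optimizer = "nloptr",       # NLOPT_LN_COBYLA by default
  log_level = "warn"
)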
Thanks for spotting the error in the documentation. I agree that the description is a little unclear, and I will try to improve it!

How to create response surface using random forest model in R?

I have made a random forest model in R with six predictors and a response. The predictive model seems good enough, but we would also like to generate a response surface for it.
library(randomForest)
set.seed(1)
rfalloy <- randomForest(Mf ~ ., data = al_mf, mtry = 6, importance = TRUE)
rfalloy
# Predict on the full data frame, not just the response column
rfpred <- predict(rfalloy, newdata = al_mf)
rfpred
sse <- sum((rfpred - mean(al_mf$Mf))^2)
sse
ssr <- sum((rfpred - al_mf$Mf)^2)
ssr
Rsquare <- 1 - (ssr / (sse + ssr))
Rsquare
importance(rfalloy)
At a general level, since you haven't provided many specifics about what you are looking for in your response surface, here are a few hopefully helpful starting points:
Have you taken a look at rsm? This documentation provides some good use cases for the package.
These in-class notes from a University of New Mexico stats lecture are full of code examples related to response surfaces. Just check out the table of contents and you'll probably find what you're looking for.
This StackOverflow post also provides an example using the rgl package.
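As a more direct starting point, a common approach is to vary two predictors over a grid while holding the other four at their means, then plot the predictions. A sketch, where al_mf and Mf follow the question and x1/x2 are placeholders for two of your six predictor names:

```r
library(randomForest)

# Sketch: response surface over two predictors, others held at their means.
# 'x1' and 'x2' are placeholders for two of your predictor names.
rf_fit <- randomForest(Mf ~ ., data = al_mf)

x1_seq <- seq(min(al_mf$x1), max(al_mf$x1), length.out = 50)
x2_seq <- seq(min(al_mf$x2), max(al_mf$x2), length.out = 50)
grid   <- expand.grid(x1 = x1_seq, x2 = x2_seq)

# Fix the remaining predictors at their mean values
for (v in setdiff(names(al_mf), c("Mf", "x1", "x2"))) {
  grid[[v]] <- mean(al_mf[[v]])
}

# Reshape predictions into a matrix and draw a perspective plot
z <- matrix(predict(rf_fit, newdata = grid), nrow = length(x1_seq))
persp(x1_seq, x2_seq, z,
      xlab = "x1", ylab = "x2", zlab = "Predicted Mf",
      theta = 35, phi = 25)
```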

Using a 'gbm' model created in R package 'dismo' with functions in R package 'gbm'

This is a follow-up to a previous question I asked a while back that was recently answered.
I have built several gbm models with dismo::gbm.step, which relies on the gbm fitting functions found in R package gbm, as well as cross validation tools from R package splines.
As part of my analysis, I would like to use some of the graphical tools available in R (e.g. perspective plots) to visualize pairwise interactions in the data. Both the gbm and the dismo packages have functions for detecting and modelling interactions in the data.
The implementation in dismo is explained in Elith et al. (2008) and returns a statistic indicating departures of the model predictions from a linear combination of the predictors, while holding all other predictors at their means.
The implementation in gbm uses Friedman's H statistic (Friedman & Popescu, 2005); it returns a different metric and also does NOT set the other variables at their means.
The interactions modelled and plotted with dismo::gbm.interactions are great and have been very informative. However, I would also like to use gbm::interact.gbm, partly for publication strength and also to compare the results from the two methods.
If I try to run gbm::interact.gbm in a gbm.object created with dismo, an error is returned…
"Error in is.factor(data[, x$var.names[j]]) :
argument "data" is missing, with no default"
I understand dismo::gbm.step adds extra data the authors thought would be useful to the gbm model.
I also understand that the answer to my question lies somewhere in the source code.
My question is:
Is it possible to modify a gbm object created in dismo so it can be used with gbm::interact.gbm? If so, would this be accomplished by...
a. Modifying the gbm object created in dismo::gbm.step?
b. Modifying the source code for gbm::interact.gbm?
c. Doing something else?
I will be going through the source code trying to solve this myself, if I come up with a solution before anyone answers I will answer my own question.
The gbm::interact.gbm function requires data as an argument: interact.gbm <- function(x, data, i.var = 1, n.trees = x$n.trees).
The dismo gbm.object is essentially the same as the gbm gbm.object, but with extra information attached so I don't imagine changing the gbm.object would help.
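Given that, the simplest fix is usually to supply the training data yourself rather than modify either object. A sketch, where my_gbm and my_data are placeholders for a model fitted with dismo::gbm.step and the data frame it was fitted to:

```r
library(gbm)

# Sketch: call interact.gbm on a dismo-fitted model by passing the
# original training data explicitly; 'my_gbm' and 'my_data' are
# placeholders for your fitted model and training data frame.
h <- interact.gbm(my_gbm, data = my_data, i.var = c(1, 2),
                  n.trees = my_gbm$n.trees)
h  # Friedman's H statistic for that predictor pair
```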

How do you perform a goodness of link test for a generalized linear model in R?

I'm working on fitting a generalized linear model in R (using glm()) for some data with two predictors in full factorial. I'm confident that the gamma family is the right error distribution, but I am not sure which link function to use, so I'd like to test all possible link functions against one another. Of course, I could do this manually by fitting a separate model for each link function and comparing deviances, but I imagine there is an R function that will do this and compile the results. I have searched CRAN, SO, Cross Validated, and the web; the closest function I found was clm2, but I do not believe I want a cumulative link model, based on my understanding of what clms are.
My current model looks like this:
CO2_med_glm_alf_gamma <- glm(flux_median_mod_CO2 ~ PercentH2OGrav +
                               I(PercentH2OGrav^2) + Min_Dist +
                               I(Min_Dist^2) + PercentH2OGrav * Min_Dist,
                             data = NC_alf_DF,
                             family = Gamma(link = "inverse"))
How do I code this model into an R function that will do such a 'goodness-of-link' test?
(As far as the statistical validity of such a test goes, this discussion, as well as a discussion with a stats post-doc, leads me to believe it is valid to compare AIC or deviances between generalized linear models that are identical except for their link functions.)
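In the absence of a single canned function, one simple approach consistent with the AIC comparison described above is to refit the same model under each candidate link and tabulate the AICs. A sketch using the model from the question, trying only links valid for the Gamma family:

```r
# Sketch: compare Gamma GLMs that differ only in the link function by AIC.
# Formula and data follow the question.
links <- c("inverse", "log", "identity")
fits <- lapply(links, function(lnk)
  glm(flux_median_mod_CO2 ~ PercentH2OGrav + I(PercentH2OGrav^2) +
        Min_Dist + I(Min_Dist^2) + PercentH2OGrav * Min_Dist,
      data = NC_alf_DF, family = Gamma(link = lnk))
)
data.frame(link = links, AIC = sapply(fits, AIC))
```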
This is not "all possible links"; it tests against a specified class of links. But there is a goodness-of-link test by Pregibon that is implemented in the LDdiag package. It's not on CRAN, but you can install it from the archives via
devtools::install_version("LDdiag", "0.1")
The example given (not that exciting) is
quine$Days <- ifelse(quine$Days==0, 1, quine$Days)
ex <- glm(Days ~ ., family = Gamma(link="log"), data = quine)
pregibon(ex)
The pregibon family of link functions is implemented in the glmx package. As pointed out by Achim Zeileis in comments, the package provides various parametric link functions and supports general estimation and inference based on such parametric links (or, more generally, parametric families). To see a worked example of how this can be employed for a variety of goodness-of-link assessments, see example("WECO", package = "glmx"). This replicates the analyses from two papers by Koenker and Yoon (see below).
This example might be useful too.
Koenker R (2006). “Parametric Links for Binary Response.” R News, 6(4), 32--34; link to page with supplementary materials.
Koenker R, Yoon J (2009). “Parametric Links for Binary Choice Models: A Fisherian-Bayesian Colloquy.” Journal of Econometrics, 152, 120--130; PDF.
I have learned that the dredge function (MuMIn package) can be used to perform goodness-of-link tests on glms, lms, etc. More generally, it is a model selection function, but it allows a good deal of customization. In this case, you can use the varying argument to compare models fit with different link functions. See the worked Beetle example in the package documentation for details.

Setting Contrasts for ANOVA in R

I've been attempting to perform an ANOVA in R recently on the attached data frame.
My question revolves around the setting of contrasts.
My design is a 3x5 within-subjects design.
There are 3 visual conditions under 'Circle1' and 5 audio conditions under 'Beep1'.
Does anyone have any idea how I should set the contrasts? This is something I'm unfamiliar with as I'm making the transition from point and click stats in SPSS to coded in R.
Thanks for your time
Reiterating my answer from another Stack Overflow question that was flagged as similar: since you didn't provide any code, you might start by having a look at the contrast package in R. As its documentation notes:
"The purpose of the contrast package is to provide a standardized interface for testing linear combinations of parameters from common regression models. The syntax mimics the contrast.Design function from the Design library. The contrast class has been extended in this package to linear models produced using the functions lm, glm, gls, lme and geese."
There is also a nice little tutorial here by Dr. William King, who talks about factorial between-subjects ANOVA and includes an abundance of R code. This is wider in scope than your question but would be a great place to start (just to get context).
Finally, here is another resource that you can refer to which talks about setting up orthogonal contrasts in R.
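To make this concrete, here is a minimal sketch of setting sum-to-zero (deviation) contrasts for the two factors before fitting a within-subjects ANOVA. Circle1 and Beep1 follow your description; df, Response, and Subject are placeholder names for your data frame, dependent variable, and subject identifier:

```r
# Sketch: sum-to-zero contrasts for the two within-subject factors.
# 'df', 'Response', and 'Subject' are placeholder names.
df$Circle1 <- factor(df$Circle1)
df$Beep1   <- factor(df$Beep1)

contrasts(df$Circle1) <- contr.sum(3)  # 3 visual conditions
contrasts(df$Beep1)   <- contr.sum(5)  # 5 audio conditions

# Within-subjects ANOVA with an error term for the repeated measures
fit <- aov(Response ~ Circle1 * Beep1 + Error(Subject / (Circle1 * Beep1)),
           data = df)
summary(fit)
```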
