Negative binomial in GEE - r

The R packages that implement GEE, such as gee and geepack, do not seem to include the negative binomial family. I have two questions:
Are there any other R packages for GEE that I am not aware of?
If not, is there a simple way to create such a family, i.e. by providing the link function (log mu) and the variance function (mu + mu^2/theta), with theta specified in advance (otherwise the NB is not a GLM), and then let the gee or geepack code do the rest in the same way as glm?

You should be able to use the negative.binomial family defined in the MASS package to do this (set up a NB family with a specified theta value). It looks like geepack::geese (at least) will accept family specifications in this form. To estimate theta you might try embedding the GEE fit with a fixed theta into a loop, or make a geefit_NB(theta) function and optimize over theta.
If negative.binomial did not already exist in MASS, you could define your own family (this is admittedly a bit advanced -- I would start by downloading the source code of the MASS package and looking at the file R/neg.bin.R).
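A minimal sketch of the first suggestion, assuming MASS is installed: negative.binomial(theta) returns an ordinary family object with a log link and variance mu + mu^2/theta. The simulated data, the value theta = 2, and the plain glm() call are stand-ins chosen for illustration; in practice the same family object would be handed to the GEE fitting function, which is not verified here.

```r
library(MASS)  # provides negative.binomial()

# NB family with theta held fixed (theta = 2 is an arbitrary illustrative value)
theta <- 2
fam <- negative.binomial(theta = theta)

fam$link         # "log"
fam$variance(3)  # 3 + 3^2 / theta = 7.5

# Simulated counts, only to show the family object being used in a fit
set.seed(1)
x <- rnorm(200)
y <- rnbinom(200, mu = exp(0.5 + 0.3 * x), size = theta)

# glm() stands in for the GEE call here; the family argument is the same object
fit <- glm(y ~ x, family = fam)
coef(fit)
```

To estimate theta, a call like this could be wrapped in a function of theta and passed to an optimizer, as the answer suggests; which criterion to optimize in the GEE setting is left open here.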

Related

binomial()$linkinv(fixef()) and binomial_pred_ci(): what exactly are these functions for when applied to generalized mixed models?

I was working on a dataset, trying to reproduce a previously run statistical analysis, and I came across the following call:
binomial()$linkinv(fixef(m))
after running the following model
summary((m = glmer(T1.ACC ~ COND + (COND | ID), d9only, family = binomial)))
My first question is: what exactly is this function for? Elsewhere in the script, its complement and a slightly modified version of the same code also appear:
1) 1- binomial()$linkinv(fixef())
2) d9only$fit = binomial()$linkinv(model.matrix(m) %*% fixef(m)) # the meaning of the %*% operator is also quite mysterious to me
Another function that appears is:
binomial_pred_ci()
I have searched through the whole script and found neither a custom definition of this function nor the package it was loaded from. Does anyone know where it might come from? Maybe the package 'runjags'? If so, any hints on how to download it?
Thanks for your answers
I agree with most of @Oliver's answer. I will add a few comments (since I had an answer partly composed already).
I would be very wary of the script you are following: some parts look wrong (I could obviously be mistaken since these bits are taken completely out of context ...)
binomial()$linkinv refers to the inverse link function for the model used. By default (which applies in this case, since no optional link= argument has been specified), this is the inverse-logit or logistic function. A nearly equivalent function is available via plogis(), but using $linkinv could be better in some cases since it would generalize to binomial analyses done with other link functions (e.g. probit or cloglog).
As @Oliver mentions, applying the inverse link function to the coefficients is at least weird; I would even say wrong. Researchers often exponentiate coefficients estimated on the logit/log-odds scale to obtain odds ratios, but applying the inverse link (usually the logistic function) is rarely correct.
binomial()$linkinv(model.matrix(m) %*% fixef(m)) is indeed computing the predicted values on the link scale and converting them back to the data (= probability) scale. You can get the same results more reliably (handling missing values, etc.) by using predict(m, type = "response", re.form = ~0) (this extends @Oliver's answer to a case that also applies the inverse-link function for you).
I don't know what binomial_pred_ci is either, but I would suggest you look at predictInterval() from the merTools package ...
PS: none of these answers has much to do with runjags, which uses an entirely different model structure. Presumably the glmer models are being fitted for comparison ...
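To make the points about the inverse link concrete, here is a small base-R sketch. The linear-predictor values and the coefficient 0.8 are made up for illustration and do not come from the model in the question.

```r
# The default binomial link is the logit, so its inverse is the logistic function
eta <- c(-2, 0, 1.5)
inv_link <- binomial()$linkinv
all.equal(inv_link(eta), plogis(eta))   # TRUE

# With another link, $linkinv follows the link, while plogis() would not
all.equal(binomial(link = "probit")$linkinv(eta), pnorm(eta))   # TRUE

# A hypothetical log-odds slope (not taken from the question's model)
b <- 0.8
exp(b)        # odds ratio: multiplicative change in the odds per unit change
inv_link(b)   # only a probability if b were a complete linear predictor,
              # which a slope on its own is not
```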
help(binomial) describes the link function and inverse link function and their uses. binomial()$linkinv is the binomial inverse-link function (sigmoid function) prob(y|eta) = 1 / (1 + exp(-eta)), where eta is the linear predictor. Using this with the coefficients (or fixed effects) is a bit odd, but it is not unusual as a way to get an idea of how large the effect of each coefficient is. I would not encourage it, however.
%*% is the matrix multiplication operator, while model.matrix(m) (for lme4) extracts the fixed-effect model matrix. So model.matrix(m) %*% fixef(m) is the linear predictor using only fixed effects. It would be the same as predict(m, re.form = ~ 0). This is often used when you want predictions from the fixed effects only, either because you want to correct for between-group variation or because you are predicting for new data.
binomial_pred_ci: no idea. I'm guessing it's a function for computing confidence intervals for predictions.
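The equivalence between the matrix product and predict() can be checked with a short sketch. It uses glm() on simulated data so that it runs with base R alone; that substitution is an assumption made for illustration. For a glmer fit the analogous pieces would be fixef(m) and predict(m, re.form = ~0).

```r
set.seed(42)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.2 * d$x))

fit <- glm(y ~ x, family = binomial, data = d)

X   <- model.matrix(fit)          # n x p design matrix (intercept, x)
eta <- X %*% coef(fit)            # matrix product: linear predictor, n x 1
p   <- binomial()$linkinv(eta)    # back-transform to the probability scale

# Compare with predict() on the link and response scales
same_link <- all.equal(as.vector(eta), unname(predict(fit, type = "link")))
same_resp <- all.equal(as.vector(p), unname(predict(fit, type = "response")))
c(same_link, same_resp)
```

Each row of X is multiplied by the coefficient vector and summed, which is all that %*% does here.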

Is it possible to let the precision/variance parameter in a beta regression via GAM vary with the predictor as well?

I want to fit a spatiotemporal model where my dependent variable lies strictly between 0 and 1.
A beta regression seems suitable for this case.
I tried the betareg package, which works like a charm, but to my knowledge I cannot include the complex interaction terms that are needed, e.g. in spatiotemporal datasets, to account for autocorrelation.
I know that GAMs, e.g. in the package mgcv, support beta regression via the betar() family. To my knowledge, though, the precision/variance parameter is held constant and only the mean (mu) changes as a function of the predictors.
My model looks like this (it is conceptual, so no example data are needed):
mgcv::gam(Y~ te(latitude,longitude,day)+s(X1)+s(X2)+s(X3),family=betar())
The problem is that only mu is modelled, but not phi (the precision).
In betareg I can let phi vary with my predictors:
betareg::betareg(Y ~ X1+X2+X3+latitude+longitude | X1+X2+X3+latitude+longitude)
but this doesn't let me model the spatiotemporal term as needed, because simple additive effects are not suitable for that; I need something like the te() functionality from mgcv or some other kind of interaction term.
Is there any workaround, or a way to model phi while still accounting for my spatiotemporal term, via mgcv, betareg or any other R package?
Thanks a lot!

Which loss function is used in R package gbm for multinomial distribution?

I am using the R package gbm to fit probabilistic classifiers on a dataset with > 2 classes. I am passing distribution = "multinomial" as an argument, but I am having difficulty finding out what that choice actually implements.
The help function for gbm states that
Currently available options are "gaussian" (squared error), "laplace" (absolute loss), "tdist" (t-distribution loss), "bernoulli" (logistic regression for 0-1 outcomes), "huberized" (huberized hinge loss for 0-1 outcomes), classes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), "coxph" (right censored observations), "quantile", or "pairwise" (ranking measure using the LambdaMart algorithm).
and does not list multinomial, whereas the paragraph preceding the one I copied states that
... If not specified, gbm will try to guess: ... if the response is a factor,multinomial is assumed; ...
I would like to know which loss function is implemented if I specify distribution = "multinomial". The documentation in the vignette which can be accessed via
utils::browseVignettes("gbm")
does not contain the word "multinomial" or any descriptions of what that argument implies.
I have tried to look at the package source code, but can't find the information there either. It seems that the relevant things happen in the C++ functions in the file /src/multinomial.cpp; however, my knowledge of C++ is too limited to understand what is going on there.

R: robust package -- lmRob how to find the psi function used in the calculations

I am using lmRob.
require(robust)
stack.rob.int <- lmRob(Loss ~.*., data = stack.dat)
That works fine, but I was wondering how I could obtain the psi function that lmRob uses in the actual fitting. Thanks in advance for any help!
If I were to use the lmrob function in robustbase, is it possible to change the psi function by subtracting a constant from it? I am trying to implement the bootstrap as per Lahiri (Annals of Statistics, 1992), where it is mentioned that, to keep the bootstrap valid, one should replace psi() with the original psi() minus the mean of the residuals while fitting the robust linear model on the bootstrap samples.
So, there is no way to access the psi function directly for robust::lmRob().
Simply put, lmRob() calls lmRob.fit() (or lmRob.wfit() if you supply weights), which in turn calls lmRob.fit.compute(). That function sets initial values for a Fortran routine depending on whether lmRob.control() is set to "bisquare" or "optimal".
Given the above, if you need access to the psi functions, you may wish to use robustbase, as it gives easy access to many psi functions (cf. the biweight).
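For reference, Tukey's bisquare (biweight) psi function is simple enough to write by hand. The sketch below is a base-R illustration, not the code that lmRob or robustbase runs internally; the function names psi_bisquare and w_bisquare are hypothetical, and the tuning constant 4.685 is the conventional choice for 95% efficiency at the normal model.

```r
# Tukey bisquare psi: x * (1 - (x/k)^2)^2 for |x| <= k, and 0 beyond k
psi_bisquare <- function(x, k = 4.685) {
  ifelse(abs(x) <= k, x * (1 - (x / k)^2)^2, 0)
}

# Corresponding robustness weights psi(x) / x, equal to 1 at x = 0
w_bisquare <- function(x, k = 4.685) {
  ifelse(abs(x) <= k, (1 - (x / k)^2)^2, 0)
}

r <- c(-6, -1, 0, 1, 6)   # example scaled residuals
psi_bisquare(r)           # residuals beyond k are mapped to 0 (redescending)
w_bisquare(r)             # so they receive zero weight in the fit
```

A hand-written psi like this can be evaluated at the residuals returned by a fitted model, which is the quantity the question asks about.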
Edit 1
Regarding:
psi function evaluated at the residuals in lmRob
No. What is available after running lmRob is described in lmRob.object; the documentation is accessible via ?lmRob.object. Regarding residuals, the following components are available in the lmRob object:
residuals: the residual vector corresponding to the estimates returned in coefficients.
T.residuals: the residual vector corresponding to the estimates returned in T.coefficients.
M.weights: the robust estimate weights corresponding to the final MM-estimates in coefficients, if applies.
T.M.weights: the robust estimate weights corresponding to the initial S-estimates in T.coefficients, if applies.
Regarding
what does "optimal" do in lmRob?
Optimal refers to the following psi function:
sign(x) * (-(phi'(|x|) + c) / phi(|x|))
For other traditional psi functions, you may wish to look at robustbase's vignette or a textbook on robust statistics.

Fitting mixture of beta distributions with flexmix in R

Given the data (one-dimensional, non-repeated observations between 0 and 1), I want to fit a mixture of one uniform component (which is just a beta distribution with fixed parameters, Beta(1, 1)) and k beta components, for varying k.
After trying many approaches (including my own EM implementation), I turned to the flexmix package on CRAN. However, it seems there is no built-in family for beta distributions (as there is for Poisson, binomial and gamma). There is, however, an option to write an M-step for an arbitrary mixture.
I tried to start from the example M-step driver in the vignette, mymclust, but I got horribly entangled in the objects and dependencies and do not even know where to start.
How do I write this driver for my task?
