Fitting a mixture of beta distributions with flexmix in R

Given data (one-dimensional, non-repeated observations between 0 and 1), I want to fit a mixture of one uniform component (which is just a fixed beta) and k beta components, for varying k.
After trying many approaches (including my own EM implementation), I turned to the flexmix package on CRAN. It seems there is no built-in family for beta distributions (as there is for Poisson, binomial and gamma), but there is an option to write an M-step driver for an arbitrary mixture.
Working from the example M-step driver in the vignette, mymclust, I got horribly entangled in the objects and dependencies and do not even know where to start.
How do I write this driver for my task?
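As far as I understand the vignette, the fit function of a driver receives the data and the posterior weights w, so for beta components the core of the M-step is a weighted maximum-likelihood fit. The following is a minimal base-R sketch of that idea wrapped in a plain EM loop. It is not a flexmix driver: the function names, the initialisation and the stopping rule are all my own assumptions, and it assumes every observation lies strictly inside (0, 1).

```r
# Weighted ML fit of one beta component (the core of the M-step).
# y: observations in (0, 1); w: posterior weights for this component.
fit_weighted_beta <- function(y, w, start = c(1, 1)) {
  # Optimise on the log scale so both shape parameters stay positive.
  nll <- function(par) {
    -sum(w * dbeta(y, exp(par[1]), exp(par[2]), log = TRUE))
  }
  exp(optim(log(start), nll)$par)
}

# EM for one uniform component plus k beta components (hypothetical helper).
em_unif_beta <- function(y, k, max_iter = 200, tol = 1e-8) {
  n <- length(y)
  # Arbitrary start: moment-match a beta to each of k blocks of sorted data.
  grp <- ceiling(rank(y, ties.method = "first") * k / n)
  shapes <- t(sapply(seq_len(k), function(j) {
    m <- mean(y[grp == j])
    v <- var(y[grp == j])
    s <- max(m * (1 - m) / v - 1, 1e-2)
    c(m * s, (1 - m) * s)
  }))
  prior <- c(0.1, rep(0.9 / k, k))  # first entry is the uniform component
  loglik_old <- -Inf
  for (it in seq_len(max_iter)) {
    # E-step: the uniform density on (0, 1) is 1 everywhere.
    dens <- cbind(1, sapply(seq_len(k), function(j) {
      dbeta(y, shapes[j, 1], shapes[j, 2])
    }))
    joint <- sweep(dens, 2, prior, `*`)
    loglik <- sum(log(rowSums(joint)))
    post <- joint / rowSums(joint)
    # M-step: update mixing weights, then refit each beta with its weights.
    prior <- colMeans(post)
    for (j in seq_len(k)) {
      shapes[j, ] <- fit_weighted_beta(y, post[, j + 1], shapes[j, ])
    }
    if (abs(loglik - loglik_old) < tol * abs(loglik)) break
    loglik_old <- loglik
  }
  # loglik is the value computed at the last E-step.
  list(prior = prior, shapes = shapes, loglik = loglik)
}

# Usage on simulated (made-up) data: 20% uniform noise, 80% Beta(5, 2).
set.seed(1)
y <- c(runif(100), rbeta(400, 5, 2))
fit <- em_unif_beta(y, k = 1)
```

Fitting for several values of k and comparing an information criterion would be the natural next step; that part is left out here.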

Related

glm with gamma family with NA / 0 Values in r

I would like to use a generalized linear mixed effect model. My data follows a gamma distribution but contains NA and 0 values. However, the gamma family does not allow me to compute these models if I have 0 values. Does anyone know of a way to go around this problem?
I heard that the glmmTMB package allows the use of gamma distributions with negative values, but I work on a Mac and cannot install this package.
When I try, I get an error code stating "clang: error: unsupported option '-fopenmp'".
It would be great if any of you had an idea.
The Gamma distribution has no support on the non-positive real numbers. Accordingly, you are basically asking it to model data which it could never produce and therefore the software throws an error.
Similarly, the missing data cannot be modelled because the model you specify does not itself model or marginalize out the missing data. You will need to either replace the missing values with some number (impute missing values) deterministically/probabilistically or drop the observations with missing values.
In short, you will need to employ an alternative model. You could use a zero-inflated gamma model or a gamma hurdle model. See here for an example. There is no "correct" alternative: each is only a model, and you will need to weigh their relative strengths and weaknesses (assumptions, etc.).
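To make the hurdle idea concrete, here is a minimal two-part sketch in base R: a logistic model for whether the response is positive, and a gamma GLM fitted to the positive values only. The data are simulated and the variable names are made up, missing values are simply dropped (complete-case analysis), and there are no random effects, so this illustrates the structure rather than the mixed model asked about.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
# Hypothetical data: some exact zeros, gamma-distributed positives.
is_pos <- rbinom(n, 1, plogis(0.8 + 0.5 * x))
y <- ifelse(is_pos == 1,
            rgamma(n, shape = 2, rate = 2 / exp(0.3 + 0.4 * x)),
            0)
y[sample(n, 20)] <- NA          # inject some missing values
dat <- data.frame(y = y, x = x)
dat <- dat[!is.na(dat$y), ]     # drop rows with a missing response

# Part 1: probability that the response is positive.
part_zero <- glm(I(y > 0) ~ x, family = binomial, data = dat)
# Part 2: gamma GLM on the strictly positive observations only.
part_pos <- glm(y ~ x, family = Gamma(link = "log"),
                data = subset(dat, y > 0))

# Overall expected value: P(y > 0) * E[y | y > 0].
expected_y <- predict(part_zero, dat, type = "response") *
  predict(part_pos, dat, type = "response")
```

Because the two parts are fitted separately, either one can be swapped for a mixed-model equivalent without touching the other.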

Is it possible to let the precision/variance parameter in a beta regression via GAM vary with the predictor as well?

I want to fit a spatiotemporal model where my dependent variable lies in the open interval (0, 1).
A beta regression seems suitable for this case.
I tried the betareg package, which works like a charm, but to my knowledge it cannot include the complex interaction terms that occur in e.g. spatiotemporal datasets to account for autocorrelation.
I know that GAM packages such as mgcv support beta regression via the betar() family. To my knowledge, though, the precision/variance parameter is held constant and only the mean (mu) changes as a function of the predictors.
My model looks like this (it is conceptual, so no example data are needed):
mgcv::gam(Y~ te(latitude,longitude,day)+s(X1)+s(X2)+s(X3),family=betar())
The problem is that only mu is modelled, not phi (the precision).
In betareg I can let phi vary with my predictors:
betareg::betareg(Y ~ X1+X2+X3+latitude+longitude | X1+X2+X3+latitude+longitude)
but this doesn't let me model the spatiotemporal term as needed, because simple additive effects are not suitable for that. I need something like the te() functionality in mgcv, or some other kind of interaction term.
Is there any work around or a way to model phi but account for my spatiotemporal term either via mgcv or betareg or any other R package?
Thanks a lot!

Which loss function is used in R package gbm for multinomial distribution?

I am using the R package gbm to fit probabilistic classifiers to a dataset with more than 2 classes. I pass distribution = "multinomial" as an argument, but I have difficulty finding out what that choice actually implements.
The help function for gbm states that
Currently available options are "gaussian" (squared error), "laplace" (absolute loss), "tdist" (t-distribution loss), "bernoulli" (logistic regression for 0-1 outcomes), "huberized" (huberized hinge loss for 0-1 outcomes), classes), "adaboost" (the AdaBoost exponential loss for 0-1 outcomes), "poisson" (count outcomes), "coxph" (right censored observations), "quantile", or "pairwise" (ranking measure using the LambdaMart algorithm).
and does not list multinomial, whereas the paragraph preceding the one I copied states that
... If not specified, gbm will try to guess: ... if the response is a factor,multinomial is assumed; ...
I would like to know which loss function is implemented if I specify distribution = "multinomial". The documentation in the vignette which can be accessed via
utils::browseVignettes("gbm")
does not contain the word "multinomial" or any descriptions of what that argument implies.
I have tried to look at the package source code, but can't find the information there either. It seems that the relevant things happen in the C++ functions in the file /src/multinomial.cpp; however, my knowledge of C++ is too limited to understand what is going on there.

Parameters C and epsilon as vectors in kernlab's ksvm in R

I am trying to use the ksvm function of the kernlab package in R for epsilon-SVM regression. I want to pass the parameters C (regularization constant) and epsilon (insensitivity) as vectors (length of vector = number of training observations), but I am not able to figure out how to do this. Please suggest some way.
Why do you assume that you can do it? According to the documentation of ksvm, you can only weight classes, not particular samples. Such a modification is available in, for example, the scikit-learn Python library (as sample weights).
To artificially implement per-sample C weights you could oversample your data. It will be very inefficient (especially if you have large differences in C values), but it can be applied to almost any SVM library.
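A minimal sketch of the oversampling trick, assuming the per-sample costs can be expressed as small integer multiples of a base C (the data and the multipliers below are made up). Repeating row i m times with cost C acts like giving it cost m * C. Two caveats: ksvm scales the data by default, so the duplicated rows also shift the scaling, and a per-sample epsilon cannot be emulated this way.

```r
set.seed(42)
n <- 50
x <- matrix(rnorm(n * 2), ncol = 2)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n, sd = 0.1)
# Hypothetical per-sample cost multipliers relative to a base C.
c_rel <- sample(c(1, 2, 4), n, replace = TRUE)

# Repeat row i exactly c_rel[i] times.
idx <- rep(seq_len(n), times = c_rel)
x_os <- x[idx, , drop = FALSE]
y_os <- y[idx]

# Fit only if kernlab is installed; the base C applies to every copy.
if (requireNamespace("kernlab", quietly = TRUE)) {
  fit <- kernlab::ksvm(x_os, y_os, type = "eps-svr",
                       kernel = "rbfdot", C = 1, epsilon = 0.1)
}
```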

Negative binomial in GEE

For R packages implementing GEE such as gee, geepack, it seems that the negative binomial family is not included. I have two questions:
Are there any other R packages for GEE that I am not aware of?
If not, is there a simple way to create a family, i.e. by providing the link function (log mu) and the variance function (mu + mu^2/theta), assuming theta is specified (otherwise the NB is not a GLM), and then let the gee or geepack code do the work in a similar fashion to glm?
You should be able to use the negative.binomial family defined in the MASS package to do this (set up a NB family with a specified theta value). It looks like geepack::geese (at least) will accept family specifications in this form. To estimate theta you might try embedding the GEE fit with a fixed theta into a loop, or make a geefit_NB(theta) function and optimize over theta.
If negative.binomial did not already exist in MASS, you could define your own family (this is admittedly a bit advanced -- I would start by downloading the source code of the MASS package and looking at the file R/neg.bin.R).
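As a quick check of what the MASS family provides, the sketch below builds a negative binomial family with a fixed theta, confirms that its link and variance function match the ones in the question, and uses it in an ordinary glm call on simulated (made-up) data. It does not verify that any particular GEE function accepts the object; that would need to be tried separately.

```r
library(MASS)

theta <- 2
fam <- negative.binomial(theta = theta)

fam$link           # the default link for this family is "log"
mu <- c(0.5, 3, 10)
fam$variance(mu)   # should equal mu + mu^2 / theta

# The family object can be passed to glm like any other family.
set.seed(7)
x <- rnorm(200)
y <- rnbinom(200, size = theta, mu = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = fam)
coef(fit)
```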
