Is there a way to do a regression in CatBoost for a Gamma-distributed response variable?

I'm working on an insurance model and I'd like to fit severity and frequency models using the CatBoost gradient boosting algorithm. The problem is that, according to the literature, a severity model assumes a Gamma-distributed response variable, while according to the CatBoost documentation a Gamma objective is not supported. Is there a way to utilize one of the existing objectives (e.g. Poisson or Tweedie) to achieve that?

Yes, you can set the variance_power parameter (the p symbol in the literature) of the Tweedie loss to 2 to get a Gamma distribution.
Example:
from catboost import CatBoostRegressor
model = CatBoostRegressor(loss_function='Tweedie:variance_power=2', n_estimators=500, silent=True)

Related

GARCH model augmented with exponential part

I am currently working on univariate GARCH models with different specifications and got stuck on including the exponential term in the variance equation:
Mean model (setting ω4 = 0): [equation not reproduced in this excerpt]
Variance model: [equation not reproduced in this excerpt]
I am using the rugarch package in R and have (unsuccessfully) tried the 'eGARCH' model type and the external regressor option for the recession dummy INBER to get the estimates. Is this generally the correct way of including the exponential part, or am I completely off?
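For reference, this is roughly how such a specification could be written with rugarch; the eGARCH(1,1) order, the 'returns' series, and the INBER dummy below are placeholders for illustration, not the poster's exact model:

library(rugarch)

# Sketch: eGARCH(1,1) with a recession dummy (INBER) entering the variance
# equation as an external regressor; all names and orders are assumed
spec <- ugarchspec(
  variance.model = list(model = "eGARCH",
                        garchOrder = c(1, 1),
                        external.regressors = as.matrix(INBER)),
  mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
  distribution.model = "norm"
)

fit <- ugarchfit(spec = spec, data = returns)   # 'returns' is a placeholder series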

Is it possible to let the precision/variance parameter in a beta regression via GAM vary with the predictor as well?

I want to fit a spatiotemporal model where my dependent variable lies in the open interval (0, 1).
A beta regression seems suitable for this case.
I tried the betareg package, which works like a charm, but to my knowledge I cannot include the complex interaction terms that occur in, e.g., spatiotemporal datasets to account for autocorrelation.
I know that GAMs, e.g. in the mgcv package, support beta regression via the betar() family. To my knowledge, though, the precision/variance parameter is held constant and only the mean (mu) changes as a function of the predictors.
My model looks like this (it is conceptual, so no example data is needed):
mgcv::gam(Y~ te(latitude,longitude,day)+s(X1)+s(X2)+s(X3),family=betar())
The problem is that only mu is modelled, but not phi (the precision).
In betareg I can let phi vary with my predictors:
betareg::betareg(Y ~ X1+X2+X3+latitude+longitude | X1+X2+X3+latitude+longitude)
but this doesn't let me model the spatiotemporal term as needed, because simple additive effects are not suitable for that; I need something like the te() functionality from mgcv or some other kind of interaction term.
Is there any workaround or a way to model phi while accounting for my spatiotemporal term, either via mgcv or betareg or any other R package?
Thanks a lot!

Can we get probabilities from a random forest the same way that we get them in logistic regression?

I have a dataset with a binary 0-1 variable (click & purchase; click & no purchase) against a vector of attributes. I used logistic regression to get the probabilities of purchase. How can I use a random forest to get the same probabilities? Is it by using random forest regression, or is it random forest classification with type='prob' in R, which gives the probabilities of the categorical variable?
It won't give you the same result, since the structures of the two methods are different. Logistic regression is defined by an explicit linear specification, whereas a random forest is a collective vote from many independent, randomized trees. If the specification and input features are properly tuned for both, they can produce comparable results. The major differences between the two:
A random forest gives a more robust fit against noise, outliers, overfitting, multicollinearity, etc., which are common pitfalls in regression-type solutions. Basically, if you don't know or don't want to know much about what is going on with the input data, RF is a good start.
Logistic regression is a good choice if you have expert knowledge of the data and know how to properly specify the equation, or if you want to engineer how the fit/prediction works; the explicit form of the GLM specification allows you to do that.
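As a minimal sketch of the mechanics (the randomForest package, the toy data, and the variable names are assumptions for illustration):

library(randomForest)

# Toy data: binary purchase outcome against a couple of made-up attributes
set.seed(1)
df <- data.frame(purchase = factor(rbinom(500, 1, 0.3)),
                 x1 = rnorm(500), x2 = rnorm(500))

# Logistic regression: predicted purchase probabilities
glm_fit <- glm(purchase ~ x1 + x2, data = df, family = binomial)
p_logit <- predict(glm_fit, newdata = df, type = "response")

# Random forest classification: class probabilities via type = "prob"
rf_fit <- randomForest(purchase ~ x1 + x2, data = df)
p_rf   <- predict(rf_fit, newdata = df, type = "prob")[, "1"]

# Both give per-observation probabilities of purchase, but they will not match exactly.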

Lambdas from glmnet (R) used in online SGD

I'm using cv.glmnet from the glmnet package (in R). In the output I get a vector of lambdas (the regularization parameter). I would like to use it in an online SGD algorithm. Is there a way of doing so, and how?
Any suggestion would be helpful.
I am wondering how I can compare results (in terms of the model's coefficients and the regularization parameter) between a generalized linear model with L1 regularization and a binomial distribution (logit link) that was fitted once, offline, using the cv.glmnet function from the R package (which I think uses a Newton-Raphson estimation algorithm), and an online model of the same type where the estimates are re-calculated after every new observation using a stochastic gradient descent algorithm (classic, type I).
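To make the comparison concrete, here is a rough sketch under stated assumptions: the toy data, the arbitrary learning rate, the plain subgradient update, and reusing cv.glmnet's lambda.min as the SGD penalty strength are all illustrative choices, not an established recipe:

library(glmnet)

# Toy binary data (made up for illustration)
set.seed(1)
x <- matrix(rnorm(1000 * 5), ncol = 5)
y <- rbinom(1000, 1, plogis(x %*% c(1, -1, 0.5, 0, 0)))

# Offline: L1-regularized logistic regression with cross-validated lambda
cv  <- cv.glmnet(x, y, family = "binomial", alpha = 1)
lam <- cv$lambda.min
beta_offline <- as.vector(coef(cv, s = "lambda.min"))[-1]   # drop intercept

# Online: one pass of SGD on the logistic loss with an L1 subgradient term,
# reusing the cross-validated lambda as the penalty strength (an assumption);
# the intercept is ignored for simplicity
beta <- rep(0, ncol(x))
eta  <- 0.05                                 # arbitrary learning rate
for (i in seq_len(nrow(x))) {
  p    <- plogis(sum(x[i, ] * beta))
  grad <- (p - y[i]) * x[i, ] + lam * sign(beta)
  beta <- beta - eta * grad
}

# 'beta' (online) and 'beta_offline' can now be compared coefficient by coefficient;
# a plain subgradient step will not produce exact zeros the way glmnet does.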

gbm::interact.gbm vs. dismo::gbm.interactions

Background
The reference manual for the gbm package states that the interact.gbm function computes Friedman's H-statistic to assess the strength of variable interactions; the H-statistic is on the [0, 1] scale.
The reference manual for the dismo package does not reference any literature for how the gbm.interactions function detects and models interactions. Instead it gives a list of general procedures used to detect and model interactions. The dismo vignette "Boosted Regression Trees for ecological modeling" states that the dismo package extends functions in the gbm package.
Question
How does dismo::gbm.interactions actually detect and model interactions?
Why
I am asking this question because gbm.interactions in the dismo package yields results >1, which the gbm package reference manual says is not possible.
I checked the tar.gz for each of the packages to see if the source code was similar. It is different enough that I cannot determine if these two packages are using the same method to detect and model interactions.
To summarize, the difference between the two approaches boils down to how the "partial dependence function" of the two predictors is estimated.
The dismo package is based on code originally given in Elith et al., 2008, and you can find the original source in the supplementary material. The paper describes the procedure very briefly. Basically, model predictions are obtained over a grid of the two predictors, setting all other predictors at their means. The predictions are then regressed onto the grid, and the mean squared residual of this regression is multiplied by 1000. This statistic measures the departure of the model predictions from an additive combination of the two predictors, which indicates a possible interaction.
From the dismo package, we can also obtain the relevant source code for gbm.interactions. The interaction test boils down to the following commands (copied directly from source):
interaction.test.model <- lm(prediction ~ as.factor(pred.frame[,1]) + as.factor(pred.frame[,2]))
interaction.flag <- round(mean(resid(interaction.test.model)^2) * 1000,2)
pred.frame contains a grid of the two predictors in question, and prediction is the prediction from the original fitted gbm model where all but the two predictors under consideration are set at their means.
This is different from Friedman's H statistic (Friedman & Popescu, 2005), which is estimated via formula (44) for any pair of predictors. It is essentially the departure from additivity for any two predictors, averaging over the values of the other variables, NOT setting the other variables at their means. It is expressed as a proportion of the total variance of the partial dependence function of the two variables (or model-implied predictions), so it will always be between 0 and 1.
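For reference, my rendering of the pairwise H-statistic from Friedman & Popescu (2005), where $\hat F_{jk}$, $\hat F_j$ and $\hat F_k$ denote the (mean-centered) partial dependence functions:

$$H_{jk}^{2} = \frac{\sum_{i=1}^{n}\left[\hat F_{jk}(x_{ij}, x_{ik}) - \hat F_{j}(x_{ij}) - \hat F_{k}(x_{ik})\right]^{2}}{\sum_{i=1}^{n}\hat F_{jk}^{2}(x_{ij}, x_{ik})}$$

Because the numerator is a portion of the variance in the denominator, this quantity stays in [0, 1], whereas the dismo statistic (the MSE of the grid regression times 1000) has no such bound, which is consistent with the values above 1 you are seeing.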
