Is there an R function to fit a GLMM for count data with range 0-15? (Possibly with right censoring?)

We are due to collect some survey data with a score ranging 0-15, potentially skewed, and with a multilevel structure (repeated measures and clustering). I'm anticipating that fitting a linear mixed model in R with lmer will be problematic given the outcome distribution.
I am considering whether some sort of right-censored generalized linear mixed model (Poisson) may be a solution, but I'm struggling to find something that fits this model.
I think the closest I can find is VGAM::vglm with family = cens.poisson, but as far as I can tell it cannot include a multilevel structure.
Does anyone know of any R functions that would permit this model? If so, is there an equivalent power calculation function, or would this have to be written as a simulation?
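To make the model I have in mind concrete, here is a rough sketch of how I imagine it might look in brms, which (as far as I understand) allows a cens() term alongside random effects; the variable names are hypothetical:

library(brms)

# Hypothetical data frame 'dat': score 'y' (0-15), censoring indicator 'cens_ind'
# ("right" where the true count exceeds 15, "none" otherwise), covariate 'treat',
# subject 'id' (repeated measures) and cluster 'site'
fit <- brm(
  y | cens(cens_ind) ~ treat + (1 | id) + (1 | site),
  family = poisson(),
  data = dat
)
summary(fit)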

Related

Multilevel mixed-effects tobit regression in R

I have a dataset with left-censored data and I want to apply a multilevel mixed-effects tobit regression, but I can only find information on how to do it in Stata. Is it possible to do it in R?
I found the packages 'VGAM' and 'censReg', but I don't understand how to add fixed and random effects.
Also, my data are log-normally distributed; is there a way to incorporate this into the model?
Thanks!
According to Section 3.5 of a vignette, the censReg package can handle a mixed model if the data are prepared properly via the plm package.
This Cross Validated page shows an example.
I don't have experience with this; it might only work with formal panel data rather than more general random-effects structures.
If your data are truly log-normal, you could take logs first and set the lower censoring limit on the log scale. Note that an apparent log-normal distribution of outcomes might just represent a corresponding distribution of predictor values with an underlying normal error distribution around the predictions. Don't jump blindly into a log-normal assumption.
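A rough, untested sketch of what the censReg/plm preparation might look like, with hypothetical column names and a left censoring limit of 0:

library(plm)      # pdata.frame() declares the panel (id/time) structure
library(censReg)  # censReg() fits (panel) tobit models

# Hypothetical data frame 'df' with columns id, time, y (left-censored at 0), x
pd <- pdata.frame(df, index = c("id", "time"))

# Random-intercept (panel) tobit; nGHQ sets the Gauss-Hermite quadrature points.
# If you log-transform y first, put the censoring limit on the log scale as well.
fit <- censReg(y ~ x, left = 0, data = pd, nGHQ = 8)
summary(fit)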

How to do crossvalidation on a two-part regression model?

I am currently working on a regression analysis to model EQ-5D-5L health data. These data are inflated at the upper bound (i.e. 1), and one of the approaches I use is a two-part model: I combine a logistic model for the binary part (1 or not 1) with a model for the continuous part.
The issue comes when trying to cross-validate (K-fold) the two-part model: I cannot find a way to include both "parts" of the model in the caret package in R, and I have not been able to find anybody who has solved this problem.
When I generate predictions from the two-part model, it is essentially the predictions from the two separate models that are multiplied together. So the models are developed separately, as they model different things from the same variable (a binary and a continuous outcome), but they are joined together when used to predict values.
Could it be possible to somehow cross-validate each part of the model separately, and get some kind of useful answer out of it?
Hope you guys can help.
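For concreteness, the kind of manual alternative to caret I had in mind would refit both parts on each training fold and combine their predictions; a rough sketch with hypothetical names (data frame dat, outcome y in (0, 1], predictors x1 and x2):

set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(dat)))
fold_mse <- numeric(k)

for (i in 1:k) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]

  # Part 1: probability that the outcome sits at the upper bound (y == 1)
  part1 <- glm(I(y == 1) ~ x1 + x2, family = binomial, data = train)

  # Part 2: model for the continuous outcomes below the bound
  part2 <- lm(y ~ x1 + x2, data = subset(train, y < 1))

  # Combined prediction: p * 1 + (1 - p) * E[y | y < 1]
  p  <- predict(part1, newdata = test, type = "response")
  mu <- predict(part2, newdata = test)
  pred <- p + (1 - p) * mu

  fold_mse[i] <- mean((test$y - pred)^2)
}
mean(fold_mse)  # cross-validated prediction error for the combined model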

Interpreting ARIMA for ITSA

I want to test for efficacy of an intervention. I have pre- and post-intervention data and I used auto.arima to find the best fit for the two data sets.
I'm stuck on the actual use of these models now. What do I do with the auto.arima fits? Can I graph them and test for statistically significant differences in the coefficients? If so, how do I graph them?
This is what I have right now (orders specified by auto.arima):
myPreFit  <- arima(myPre,  order = c(0, 1, 0))  # pre-intervention series
myPostFit <- arima(myPost, order = c(1, 0, 1))  # post-intervention series
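For example, would something along these lines be a sensible way to graph it, projecting the pre-intervention fit over the post period with the forecast package and overlaying the observed post data? (A sketch only; I am not sure this is the right ITSA approach.)

library(forecast)

# Project the pre-intervention model forward over the post-intervention period
fc <- forecast(myPreFit, h = length(myPost))

plot(fc)  # counterfactual forecast with prediction intervals
lines(seq(length(myPre) + 1, length.out = length(myPost)), myPost, col = "red")  # observed post data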

Can we get probabilities from a random forest the same way we get them from logistic regression?

I have a data structure with a binary 0-1 variable (click & purchase; click & no purchase) against a vector of attributes. I used logistic regression to get the probabilities of purchase. How can I use random forest to get the same probabilities? Is it by using random forest regression, or by using random forest classification with type='prob' in R, which gives the probability of the categorical variable?
It won't give you the same result, since the structures of the two methods are different. Logistic regression is defined by an explicit linear specification, whereas RF is a collective vote from multiple independent/random trees. If the specification and input features are properly tuned for both, they can produce comparable results. Here is the major difference between the two:
RF gives a more robust fit against noise, outliers, overfitting, multicollinearity, etc., which are common pitfalls in regression-type solutions. Basically, if you don't know or don't want to know much about what is going on in the input data, RF is a good start.
Logistic regression is good if you have expert knowledge of the data and know how to specify the equation properly, or if you want to engineer how the fit/prediction works; the explicit form of the GLM specification allows you to do that.
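As for the mechanics in R, a classification forest can return class probabilities via predict(..., type = "prob"); a minimal sketch, assuming a hypothetical data frame dat with a 0/1 outcome purchase and attribute columns:

library(randomForest)

dat$purchase <- factor(dat$purchase)   # factor outcome => classification forest

rf_fit  <- randomForest(purchase ~ ., data = dat, ntree = 500)
glm_fit <- glm(purchase ~ ., data = dat, family = binomial)

# Forest probabilities = proportion of trees voting for each class (in-sample here)
p_rf  <- predict(rf_fit, newdata = dat, type = "prob")[, "1"]
# Fitted probabilities from logistic regression
p_glm <- predict(glm_fit, type = "response")

plot(p_glm, p_rf)  # usually comparable, but not identical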

Mixed Logit fitted probabilities in RSGHB

My question has to do with using the RSGHB package for predicting choice probabilities per alternative by applying mixed logit models (variation across respondents) with correlated coefficients.
I understand that the choice probabilities are simulated at the individual level and that, to get preference shares, an average of the individual shares would do. All the sources I have found treat each prediction as a separate simulation, which makes the whole process cumbersome if many predictions are needed.
Since one can save the respondent-specific coefficient draws, wouldn't it be faster to simply apply the logit transform to each vector of coefficient draws? Once this is done, new or existing alternatives could be evaluated faster than rerunning a whole simulation for each required alternative. For the time being, using a fitted() approach will not help me understand how prediction actually works.
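To be concrete about what I mean, a minimal sketch with hypothetical objects (betas is a draws-by-coefficients matrix of saved individual draws, X is an alternatives-by-attributes matrix for one choice scenario):

# Apply the logit transform directly to saved coefficient draws
predict_shares <- function(betas, X) {
  V <- betas %*% t(X)            # utilities: draws x alternatives
  P <- exp(V) / rowSums(exp(V))  # logit choice probabilities for each draw
  colMeans(P)                    # average over draws = preference shares
}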
