How does one implement a conditional Poisson regression in R?

I am keen to implement a conditional (bivariate?) Poisson regression in R to assess the change in rates of a variable (stratified by treatment condition) before and after an intervention. Is anyone familiar with a package that runs this type of analysis?

Check out the gnm package in R. Its gnm() function lets you specify your model formula, family = poisson(), an offset, the dataset, and the stratum ID via the eliminate argument. Please read its documentation.
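As a minimal sketch, such a call could look like the following; the data frame pre_post_data and the variables events, treatment, period, persontime and strata_id are hypothetical stand-ins for your own data:

# Conditional Poisson regression sketch with gnm (hypothetical variable names)
library(gnm)
fit <- gnm(events ~ treatment * period,        # change in rates pre/post, by treatment arm
           family = poisson(link = "log"),
           offset = log(persontime),           # exposure time as an offset
           eliminate = factor(strata_id),      # stratum intercepts handled via eliminate
           data = pre_post_data)
summary(fit)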

Related

Negative binomial model with multiply imputed, weighted dataset in R

I am running an analysis of hospital length of stay based on a number of parameters in R, with one primary exposure. Many of the covariates are missing, most commonly lab values, because these aren't checked for all patients. I've worked out what seems to be a good multiple imputation schema using MICE. Because of imbalance between exposed and unexposed groups, I'm also weighting using propensity scores.
I've managed to run a successful weighted Poisson model with MICE and WeightThem. However, when I checked the models for overdispersion, it does appear that the variance is greater than the mean, implying I should be using a quasipoisson or negative binomial model. However, I can't find documentation on negative binomial models with WeightThem or WeightIt in R.
Does anyone have any experience? To run a negative binomial model, I can just use the following code:
results <- with(models, MASS::glm.nb(LOS ~ exposure + covariate1 + covariate2))
in which "models" is the multiply-imputed WeightIt object.
However, according to the WeightIt documentation, when using any glm model you need to run it as an svyglm to get proper standard errors:
results <- with(models, svyglm(LOS ~ exposure + covariate1 + covariate2,
                               family = poisson()))
There is a function in the sjstats package called svyglm.nb, but this requires creating a design matrix or the model won't run. I have no idea how/whether this is necessary - is the first version (just glm.nb) sufficient? Am I entirely thinking about this wrong?
Thanks so much, advice is much appreciated.

Ordinal logistic regression (or Beta regression) with a LASSO regularization in R?

I was wondering if someone would know of an R package that would allow me to fit an ordinal logistic regression with LASSO regularization or, alternatively, a Beta regression, again with the LASSO? And if you also know of a nice tutorial to help me code that in R (with appropriate cross-validation), that would be even better!
Some context: My response variable is a satisfaction score between 0 and 10 (actually, values lie between 2 and 10) so I can model it with a Beta regression or I can convert its values into ranked categories. My interest is to identify important variables explaining this score but as I have too many potential explanatory variables (p = 12) compared to my sample size (n = 105), I need to use a penalized regression method for model selection, hence my interest in the LASSO.
The ordinalNet package does this. There's a paper with a worked example here:
https://www.jstatsoft.org/article/download/v099i06/1440
Also the glmnetcr package: https://cran.r-project.org/web/packages/glmnetcr/vignettes/glmnetcr.pdf
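As a rough sketch of what a LASSO-penalized cumulative-logit fit could look like with ordinalNet (the predictor matrix and the way the score is cut into ordered categories below are placeholders for your own data):

# LASSO-penalized ordinal (cumulative logit) model with ordinalNet
library(ordinalNet)
X <- as.matrix(predictors)                                  # matrix of the 12 candidate covariates
y <- cut(score, breaks = c(2, 4, 6, 8, 10),
         include.lowest = TRUE, ordered_result = TRUE)      # score binned into ordered categories
fit <- ordinalNet(X, y,
                  family = "cumulative", link = "logit",
                  alpha = 1)                                # alpha = 1 gives the LASSO penalty
coef(fit)                                                   # inspect the selected coefficients

Cross-validated tuning of the penalty is available through ordinalNetTune() and ordinalNetCV(), which are covered in the linked paper.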

How to build a model for temperature-outcome using dlm?

I have a dataset containing information about weather, air pollution and health outcomes. I want to regress cardiac deaths (CVD) on temperature (T) and lagged temperature (T1). I have previously used a glm in R with the following script:
# for mean daily temperature and temperature lags separately
modelT <- glm(cvd ~ T, data = datapoisson, family = poisson(link = "log"), na.action = na.omit)
I get the effect estimates and standard error values, which I then convert to risk ratios.
Now I want to use a dynamic linear model or distributed lag model to check the predictor-outcome and lagged predictor-outcome associations. However, I can't find the script for running the model in R.
I installed the dlm package in R, but still can't figure out how to build a model with it.
I would appreciate if someone can help with it.
Could you try least squares multiple regression to predict the outcome? I used that method when I tried to 'predict' which factors influenced power in a floating offshore wind turbine; it is good for relating multiple parameters. The answers at the link below fit a plane to a set of points, but it is a similar idea:
https://math.stackexchange.com/questions/99299/best-fitting-plane-given-a-set-of-points
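If you go the ordinary least squares route, a minimal sketch could look like this (assuming the daily data frame datapoisson with columns cvd and T from the question; the lag construction shown is just one simple way to do it):

# Multiple regression of CVD deaths on temperature and its 1-day lag
datapoisson$T1 <- c(NA, head(datapoisson$T, -1))   # previous day's temperature
fit_ls <- lm(cvd ~ T + T1, data = datapoisson, na.action = na.omit)
summary(fit_ls)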

Use glm to predict on fresh data

I'm relatively new to glm - so please bear with me.
I have created a glm (logistic regression) to predict whether an individual CONTINUES studies ("0") or does NOTCONTINUE ("1"). I am interested in predicting the latter. The glm uses seven factors in the dataset, the confusion matrices are very good for what I need, and I have already combined seven years of data. Straightforward.
However, I now need to apply the model to the current year's data, which of course does not have the NOTCONTINUE column in it. Let's say the glm model is "CombinedYears" and the new data is "Data2020".
How can I use the glm model to get predictions of who will ("0") or will not ("1") continue their studies? Do I need to insert a NOTCONTINUE column into the latest file? I have tried this structure:
Predict2020 <- predict(CombinedYears, data.frame(Data2020), type = 'response')
but the output only holds values <0.5.
Any help very gratefully appreciated. Thank you in advance
You mentioned that you already created a prediction model to predict whether a particular student will continue studies or not. You used the glm() function and your model name is CombinedYears.
Now, what you have to know is that your problem is a binary classification and you used logistic regression for it. The output of your model, whether you apply it to new data or to the same data used to fit it, is a set of probabilities: values between zero and one. In the development phase of your model, you need to determine the cutoff threshold for these probabilities, which you can then reuse when you predict on new data. For example, you may choose 0.5 as a cutoff, so that every probability above it is classified NOTCONTINUE and every probability below it CONTINUE.

However, the best threshold can also be determined from your data by maximizing both specificity and sensitivity; this is usually done from the receiver operating characteristic (ROC) curve, whose area under the curve (AUC) summarizes overall performance. There are many packages that can do this for you, such as the pROC and AUC packages in R. The same packages can determine the best cutoff as well.
What you have to do is the following:
1. Determine the cutoff threshold from the ROC curve:
library(pROC)
roc_object = roc(your_fit_data$NOTCONTINUE ~ fitted(CombinedYears))
coords(roc_object, "best", ret = "threshold", transpose = FALSE)
2. Use your model to predict on your new year's data (as you did):
Predict2020 = predict(CombinedYears, data.frame(Data2020), type = 'response')
3. Now, the content of Predict2020 is just a probability for each student. Use the cutoff you obtained in step (1) to classify your students accordingly.
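For example, a minimal sketch of that last step (the threshold shown is only a placeholder for the value returned by coords() above):

# Turn predicted probabilities into class labels using the chosen cutoff
best_cutoff <- 0.5                                         # placeholder: use the threshold from coords()
Class2020 <- ifelse(Predict2020 >= best_cutoff, "1", "0")  # "1" = NOTCONTINUE
table(Class2020)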

GLM with autoregressive term to correct for serial correlation

I have a stationary time series to which I want to fit a linear model with an autoregressive term to correct for serial correlation, i.e. using the formula A_t = c1*B_t + c2*C_t + u_t, where u_t = r*u_{t-1} + e_t (u_t is an AR(1) error term that accounts for serial correlation in the residuals).
Does anyone know what to use in R to model this?
Thanks
Karl
The GLMMarp package will fit these models. If you just want a linear model with Gaussian errors, you can do it with the arima() function where the covariates are specified via the xreg argument.
There are several ways to do this in R. Here are two examples using the "Seatbelts" time series dataset in the datasets package that comes with R.
The arima() function comes in the stats package that is included with R. The function takes an argument of the form order=c(p, d, q) where you can specify the orders of the autoregressive, integrated, and moving average components. In your question, you suggest that you want an AR(1) model to correct for first-order autocorrelation in the errors, and that's it. We can do that with the following command:
arima(Seatbelts[, "drivers"], order = c(1, 0, 0),
      xreg = Seatbelts[, c("kms", "PetrolPrice", "law")])
The value for order specifies that we want an AR(1) model. The xreg component should be the set of other Xs we want to add as part of a regression. The output looks a little bit like the output of summary.lm() turned on its side.
Another alternative, which may be more familiar given the way you've fit regression models before, is to use gls() in the nlme package. The following code turns the Seatbelts time series object into a data frame and then adds a new column (t) that is just a counter along the sorted time series:
Seatbelts.df <- data.frame(Seatbelts)
Seatbelts.df$t <- 1:(dim(Seatbelts.df)[1])
The two lines above are only getting the data into shape; since the arima() function is designed for time series, it can read time series objects directly. To fit the model with nlme you would then run:
library(nlme)
m <- gls(drivers ~ kms + PetrolPrice + law,
         data = Seatbelts.df,
         correlation = corARMA(p = 1, q = 0, form = ~ t))
summary(m)
The line that begins with correlation is how you pass the ARMA correlation structure to gls(). The results won't be exactly the same because arima() estimates models by maximum likelihood while gls() uses restricted maximum likelihood by default. If you add method="ML" to the call to gls(), you will get estimates identical to those from the arima() fit above.
What is your link function?
The way you describe it, it sounds like a basic linear regression with autocorrelated errors. In that case, one option is to use lm() to get consistent estimates of your coefficients and then use Newey-West HAC standard errors.
I'm not sure of the best answer for GLMs more generally.
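A minimal sketch of that lm() plus Newey-West route, using the hypothetical variables from the question (A, B and C, assumed to live in a data frame df):

# OLS coefficients with Newey-West (HAC) standard errors
library(sandwich)
library(lmtest)
fit <- lm(A ~ B + C, data = df)            # df, A, B, C are stand-ins for your data
coeftest(fit, vcov. = NeweyWest(fit))      # HAC-corrected standard errors and tests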
