Constrained multinomial logistic regression in R using mlogit

I would like to add constraints to a multinomial logistic regression model fitted with mlogit in R; for example, to restrict certain coefficients to negative values during estimation. Apparently the model doesn't have such capabilities. Is there any way to add constraints or bounds to mlogit, or another package for multinomial logistic regression that supports them?
Here is the code:
library(mlogit)

Proc_MDC3 <- function(x) {
  # conditional logit without an intercept, fitted separately for each segment
  fm  <- mFormula(decision ~ term_f1yc + term_f2yc + eff_rate2 + term_special2 + peak - 1)
  fit <- mlogit(fm, x)
  fit$coefficients
}
Coeff4 <- data.frame(do.call("rbind", by(MDC3, MDC3$SEG, Proc_MDC3)))
I only want negative values for the eff_rate2 coefficient; however, this code returns both negative and positive values across segments.
I appreciate your help in advance.
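One possible workaround, since mlogit has no built-in box constraints, is to code the conditional-logit log-likelihood by hand and maximize it with optim()'s L-BFGS-B bounds. This is only a sketch under assumptions, not mlogit functionality: it assumes the data are in long format with one row per alternative per choice situation, a chid column identifying the situation, and a logical decision column marking the chosen alternative.
fit_constrained <- function(x) {
  vars <- c("term_f1yc", "term_f2yc", "eff_rate2", "term_special2", "peak")
  X <- as.matrix(x[, vars])
  negll <- function(beta) {
    u <- as.vector(X %*% beta)             # alternative utilities
    denom <- tapply(exp(u), x$chid, sum)   # per-situation softmax denominator
    -sum(u[x$decision] - log(denom)[as.character(x$chid[x$decision])])
  }
  upper <- rep(Inf, length(vars))
  upper[vars == "eff_rate2"] <- 0          # constrain the eff_rate2 coefficient to <= 0
  optim(rep(0, length(vars)), negll, method = "L-BFGS-B",
        lower = rep(-Inf, length(vars)), upper = upper)$par
}
An alternative with the same effect is to reparameterize the coefficient as -exp(theta) and use an unconstrained optimizer.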

Related

How to obtain analysis of variance table for a nonlinear regression model in R

Previously I used SAS to fit a nonlinear regression model. SAS produces an analysis of variance table for the model, displaying the degrees of freedom, sums of squares, and mean squares along with the model F test.
Please refer to Table 69.4 in this PDF: https://support.sas.com/documentation/onlinedoc/stat/132/nlin.pdf
How can I re-create something similar in R? Thanks in advance.
I'm not sure what type of nonlinear regression you're interested in, but the general approach would be to fit the model and call summary() on it. A typical linear model would be:
linearmodel <- lm(outcomevar ~ predictorvar, data = dataset)
linearmodel          # prints the coefficients
summary(linearmodel) # gives the model fit statistics
For a polynomial fit you add the higher-order term. A quadratic fit is y = b0 + b1*x + b2*x^2, which in R is:
nonlinmodel <- lm(outcomevar ~ predictorvar + I(predictorvar^2), data = dataset)
nonlinmodel
summary(nonlinmodel)
Other methods are described here: https://data-flair.training/blogs/r-nonlinear-regression/
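For an actual nonlinear model fitted with nls(), summary() gives no ANOVA table, but one can be assembled by hand. Here is a minimal sketch with a hypothetical exponential model and data frame d; the layout follows the corrected-total decomposition, and note that SAS adjusts the model degrees of freedom depending on whether the model contains an intercept-like term.
fit <- nls(y ~ a * exp(b * x), data = d, start = list(a = 1, b = 0.1))
rss <- sum(residuals(fit)^2)            # error (residual) sum of squares
tss <- sum((d$y - mean(d$y))^2)         # corrected total sum of squares
df_m <- length(coef(fit)) - 1           # model degrees of freedom
df_e <- df.residual(fit)                # error degrees of freedom
F_model <- ((tss - rss) / df_m) / (rss / df_e)  # model F test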

Simulation "zelig style" for GLMER multilevel in r

I ran a logistic mixed-effects regression in R. The model looks like this:
glmer(Y ~ X1 + X2 + X1:X2 + (1 | country), data = hdp, family = binomial)
Now I would like to plot predicted probabilities of Y from the fixed effects. I tried Zelig, as that is what I learned as the easiest way to run simulations and get predicted probabilities, but the new version does not include multilevel models and the former ZeligMultilevel is very unstable. Is there an easy alternative? How can I run simulations whose results can be plotted?
Thanks in advance!
You can use the merTools package.
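A minimal sketch with merTools::predictInterval(), which simulates from the estimated coefficients; the model matches the glmer call above, and the prediction grid nd is a placeholder:
library(lme4)
library(merTools)
m  <- glmer(Y ~ X1 + X2 + X1:X2 + (1 | country), data = hdp, family = binomial)
nd <- hdp[sample(nrow(hdp), 100), ]   # placeholder grid of covariate values
pi <- predictInterval(m, newdata = nd, n.sims = 1000,
                      type = "probability", which = "fixed")
head(pi)  # fit/upr/lwr columns on the probability scale, ready for plotting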

Incorporating random intercepts in R package rms for mixed effects logistic regression

Frank Harrell's R package rms is an amazing tool for multiple logistic regression. However, I wish to know how, or whether, it is possible to incorporate random effects into a model run through rms. I know that rms can work through nlme, but only via the generalized least squares function (Gls), not the lme function, which allows random effects. Mixed-effects models can be problematic to analyze and interpret, but are occasionally necessary to account for nested effects.
I'm not sure if it's helpful in this case, but I have copied some code from the rms help files that runs a simple logistic regression model, and added a line demonstrating a mixed-effects logistic regression run through glmmPQL from the MASS package.
n <- 1000     # define sample size
require(rms)
set.seed(17)  # so we can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female', 'male'), n, TRUE))
label(age) <- 'Age'  # label() is in Hmisc
label(cholesterol) <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl'  # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'
ch <- cut2(cholesterol, g = 40, levels.mean = TRUE)  # use mean values in intervals
table(ch)
f <- lrm(ch ~ age)
require(MASS)
# note: with family = binomial, glm-style fitters treat a factor response as
# first level vs. all other levels, so ch is implicitly dichotomized here
f1 <- glmmPQL(ch ~ age, random = ~ 1 | sex, family = binomial)
summary(f1)
I would be interested in any insight as to whether random effects can be incorporated in rms, both for logistic regression (lrm) and for linear regression run through nlme.
Thanks to all
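As far as I know, lrm() has no random-effects term. What rms itself offers for clustered data is the cluster sandwich covariance via robcov(), which corrects the standard errors rather than adding a random intercept; a minimal sketch continuing the code above (lme4::glmer remains the usual route for a true random intercept):
f <- lrm(ch ~ age, x = TRUE, y = TRUE)  # keep design matrix and response for robcov
f.clustered <- robcov(f, cluster = sex) # cluster-robust (sandwich) covariance
f.clustered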

Comparison of R and scikit-learn for a classification task with logistic regression

I am doing a Logistic Regression described in the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013).
More specifically, I am fitting the binary classification model to the 'Wage' dataset from the R package 'ISLR', described in §7.8.1.
Predictor 'age' (transformed to a degree-4 polynomial) is fitted against the binary outcome wage > 250. Then age is plotted against the predicted probabilities of the 'True' value.
The model in R is fit as follows:
fit <- glm(I(wage > 250) ~ poly(age, 4), data = Wage, family = binomial)
agelims <- range(age)
age.grid <- seq(from = agelims[1], to = agelims[2])
preds <- predict(fit, newdata = list(age = age.grid), se = TRUE)
pfit <- exp(preds$fit) / (1 + exp(preds$fit))
Complete code (author's site): http://www-bcf.usc.edu/~gareth/ISL/Chapter%207%20Lab.txt
The corresponding plot from the book: http://www-bcf.usc.edu/~gareth/ISL/Chapter7/7.1.pdf (right)
I tried to fit a model to the same data in scikit-learn:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

poly = PolynomialFeatures(4)
X = poly.fit_transform(df.age.values.reshape(-1, 1))
y = (df.wage > 250).astype(int).values
clf = LogisticRegression()
clf.fit(X, y)
X_test = poly.fit_transform(np.arange(df.age.min(), df.age.max()).reshape(-1, 1))
prob = clf.predict_proba(X_test)
I then plotted the probabilities of the 'True' value against the age range, but the resulting plot looks quite different. (I am not talking about the CI bands or the rug plot, just the probability curve.) Am I missing something here?
After some more reading I understand that scikit-learn applies L2 regularization by default, whereas glm in R is unregularized. Statsmodels' GLM implementation (Python) is unregularized and gives results identical to R's; increasing scikit-learn's C parameter (i.e., weakening the penalty) makes its estimates approach the unregularized fit.
http://statsmodels.sourceforge.net/stable/generated/statsmodels.genmod.generalized_linear_model.GLM.html#statsmodels.genmod.generalized_linear_model.GLM
The R package LiblineaR is similar to scikit-learn's logistic regression (when using 'liblinear' solver).
https://cran.r-project.org/web/packages/LiblineaR/
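A minimal sketch of the same contrast within R, assuming X and y stand in for the polynomial design matrix and binary outcome: LiblineaR's type = 0 (L2-regularized logistic regression) behaves like scikit-learn's default, and a very large cost (weak penalty) approaches the unpenalized glm() fit.
library(LiblineaR)
fit_l2   <- LiblineaR(data = X, target = y, type = 0, cost = 1)    # like sklearn's C = 1
fit_weak <- LiblineaR(data = X, target = y, type = 0, cost = 1e6)  # nearly unregularized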

glm.nb with random effect as matrix

I am analyzing gene expression data in R. I would like to test for differences in expression when accounting for the phylogenetic effect.
I can run a GLM with a negative binomial distribution and the normalization factor as an offset:
library(MASS)
glm.nb(expression ~ Group + offset(log(normFactor)), data = data)
However, I don't know how to include phylogenetic effect in this model. I can obtain a variance-covariance or correlation matrix from my phylogeny:
library(ape)
tree <- read.tree("tree.nwk")
varCovMatrix <- vcv(tree, model = "Brownian", corr = FALSE)
I found that lmekin allows one to specify the variance-covariance structure of the random effects:
library(coxme)
lmekin(expression ~ Group + (1 | animal) + offset(log(normFactor)), data = data, varlist = varCovMatrix)
But I cannot specify a negative binomial distribution, and it isn't clear whether it honors the offset.
The same problem applies to MCMCglmm.
Please help me put all of the following into one GLMM (see the sketch after this list):
the variance-covariance matrix
normalization factor as an offset
negative binomial distribution
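A minimal sketch with brms, which as far as I know can combine all three requirements: the phylogenetic correlation matrix enters through gr(animal, cov = A), the offset stays in the formula, and the family is negative binomial. Column names follow the question, and animal must match the tree's tip labels.
library(brms)
A <- vcv(tree, corr = TRUE)  # phylogenetic correlation matrix from ape
m <- brm(expression ~ Group + offset(log(normFactor)) + (1 | gr(animal, cov = A)),
         data = data, data2 = list(A = A), family = negbinomial())
summary(m)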
