Calculation of the log-likelihood function of multinomial logistic regression in R

Suppose I have the following data set
df <- data.frame(x1 = rnorm(100),                 # predictor 1
                 x2 = rpois(100, 2.5),            # predictor 2
                 x3 = rgeom(100, prob = 0.48),    # predictor 3
                 y = as.factor(sample(1:3, 100, replace = TRUE))  # categorical response
)
If I run the multinomial logistic regression treating 1 as the reference category, the estimated parameters are:
Call:
multinom(formula = y ~ ., data = df)
Coefficients:
(Intercept) x1 x2 x3
2 -0.71018723 -0.4193710 0.15820110 0.05849252
3 -0.05987773 -0.2978596 -0.08335957 0.10149408
I would like to calculate the log-likelihood value of the multinomial logistic regression using these estimated parameters.
Any help is appreciated.

This should work. The log-likelihood is just the sum of the log of the probabilities that each observation takes on its observed value. In the code below probs is an N x m matrix of probabilities for each of the N observations on each of the m categories. We can then get y from the model frame and turn it into a numeric variable which will indicate the category number. We then use cbind(1:length(y), y) to index the probability matrix. This makes an N x 2 matrix that gives for each row number (in the first column) the column number of the probs matrix that you should keep. So, probs[cbind(1:length(y), y)] creates a vector of probabilities that each observation takes on its observed y value. We can log them and then sum them to get the log-likelihood.
df <- data.frame(x1 = rnorm(100),                 # predictor 1
                 x2 = rpois(100, 2.5),            # predictor 2
                 x3 = rgeom(100, prob = 0.48),    # predictor 3
                 y = as.factor(sample(1:3, 100, replace = TRUE))  # categorical response
)
mod <- nnet::multinom(formula = y ~ ., data = df)
probs <- predict(mod, type="probs")
y <- as.numeric(model.response(model.frame(mod)))
indiv_ll <- log(probs[cbind(1:length(y), y)])
sum(indiv_ll)
# [1] -106.8012
logLik(mod)
# 'log Lik.' -106.8012 (df=8)
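If you want to compute the value from the estimated parameters themselves rather than via predict(), the same number falls out of the softmax: stack a zero row on top of coef(mod) for the reference category, form the linear predictors, convert them to probabilities, and sum the log-probabilities of the observed categories. A minimal sketch, using the mod and df objects above:
X <- model.matrix(y ~ x1 + x2 + x3, data = df)  # N x 4 design matrix
B <- rbind(0, coef(mod))                        # add a zero row for reference category 1
eta <- X %*% t(B)                               # N x 3 linear predictors
P <- exp(eta) / rowSums(exp(eta))               # softmax -> N x 3 probabilities
yidx <- as.integer(df$y)                        # observed category index per row
sum(log(P[cbind(seq_len(nrow(df)), yidx)]))     # should match logLik(mod)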

Related

When does brms assume parameters are distributed multivariate normal?

Suppose I fit a varying-intercepts model in brms per below.
library(brms)
# Example data in "long" format:
# 50 subjects, each contributing 2 trial rows: one with treated = 1 and
# one with treated = 0.
trials <- 2
subjects <- 50
N <- trials * subjects
df <- data.frame(
  outcome = rnorm(N),
  treated = rep(0:1, times = N / 2),
  subject = rep(1:subjects, each = trials)
)
# Fit a varying-intercepts model
mod <- brm(outcome ~ treated + (1 | subject), df,
           iter = 500, warmup = 490, chains = 1)
If we look at the brms documentation, it notes:
(...) group-level parameters u are assumed to come from a multivariate
normal distribution with mean zero and unknown covariance matrix Σ:
However, there is no such covariance matrix for the intercepts:
# empty
mod$cov_ranef
What exactly is assumed to be multivariate normal by brms? That is, is it only when several coefficients are hierarchically modelled as varying across groups that there is a covariance matrix? And what if a predictor is included that only varies by group but is not modelled hierarchically?
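For what it's worth, the estimated group-level standard deviations (and correlations, when more than one coefficient varies by the same grouping factor) can be inspected directly; a sketch, assuming the mod object above:
VarCorr(mod)   # estimated group-level SDs (and correlations, if any)
summary(mod)   # the "Group-Level Effects" section reports sd(Intercept) for subject
# With only varying intercepts, Sigma collapses to a single variance; a full
# covariance matrix is estimated when several coefficients vary by the same
# grouping factor, e.g. (hypothetical refit):
# mod2 <- brm(outcome ~ treated + (1 + treated | subject), df)
# VarCorr(mod2)   # sd(Intercept), sd(treated), and cor(Intercept, treated)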

Find the parameter estimates for each random term in a binomial GLMM (lme4)?

Does anyone know how to extract the parameter estimates of a random term when using the (1 | …) syntax in a glmer model (including the SE, t ratio and p value)? I'm only able to access the variance and standard deviation with the summary function.
Some background: I used cohort and period random terms (both factorized), where period = each survey year, and cohort = 8 birth cohorts. My empty model looks like this:
glmer(pid ~ age + age2 + (1 | cohort) + (1 | period), family = binomial)
There's a bit of a conceptual problem with what you are doing. The random effects do not have the same standing in statistical theory as the fixed effects. You are not really supposed to be making inferences on their estimates, since you don't have a random sample from their overall population, and you would need to make some untested assumptions about their distribution. That said, there are apparently times when you might want to do it, taking care that you are not making unsupportable claims. See: https://stats.stackexchange.com/questions/392314/interpretation-of-fixed-effect-coefficients-from-glms-and-glmms
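If what you are after is simply the estimated value of each random intercept, lme4 does expose the conditional modes, together with conditional standard deviations (though no t ratios or p values, for the reasons above); a minimal sketch, assuming a fitted glmer object named m (a hypothetical name):
library(lme4)
re <- ranef(m, condVar = TRUE)  # conditional modes ("BLUPs") of the random intercepts
as.data.frame(re)               # columns grpvar, term, grp, condval, condsd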
Dimitris Rizopoulos then responded to a request for the possibility of getting "an average" of the random effects conditional on the fixed effects (rather, the flipped version of the usual mixed-model inference). He offered a function in his GLMMadaptive package:
https://drizopoulos.github.io/GLMMadaptive/articles/Methods_MixMod.html#marginalized-coefficients
This is his example:
install.packages("GLMMadaptive"); library(GLMMadaptive)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
#We continue by fitting the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
... and then the call to his marginal_coefs function:
marginal_coefs(fm, std_errors=TRUE)
Estimate Std.Err z-value p-value
(Intercept) -1.6025 0.2906 -5.5154 < 1e-04
sexfemale -1.0975 0.3676 -2.9859 0.0028277
time 0.1766 0.0337 5.2346 < 1e-04
sexfemale:time 0.0508 0.0366 1.3864 0.1656167

Obtain predicted probabilities from rstanarm in ordinal regression

How can I generate the posterior probability distribution for each outcome for each predictor in an ordinal regression?
e.g.
what I am looking for is this:
library(rstanarm)
fit_f <- MASS::polr(tobgp ~ agegp, data = esoph)
predict(fit_f,newdata=data.frame(agegp=factor(levels(esoph$agegp))),type = "probs")
Now with rstanarm I do:
fit <- stan_polr(tobgp ~ agegp, data = esoph, method = "logit",
prior = R2(0.2, "mean"), init_r = 0.1, seed = 12345)
But how do I obtain the distribution for the individual outcomes/predictors?
I do get a distribution of probabilities using posterior_epred, but I don't understand which outcome/predictor each value corresponds to:
posterior_epred(fit, newdata=data.frame(agegp=factor(levels(esoph$agegp))))
The easiest way to do this in rstanarm is to use the posterior_predict function to obtain posterior predictions and then calculate the proportion of predictions that fall in each outcome category by observation. In code,
PPD <- posterior_predict(fit) # uses esoph
probs <- t(apply(PPD, MARGIN = 2, FUN = table) / nrow(PPD))
The matrix called probs has rows equal to the number of observations (in esoph) and columns equal to the number of categories in tobgp and each of its rows sums to 1.
head(probs)
0-9g/day 10-19 20-29 30+
1 0.26400 0.26250 0.22875 0.24475
2 0.25650 0.26750 0.23050 0.24550
3 0.25175 0.27975 0.22450 0.24400
4 0.25575 0.26000 0.24025 0.24400
5 0.26350 0.26625 0.23575 0.23450
6 0.28275 0.26025 0.21500 0.24200
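If you want these summaries for new data rather than for the estimation data (as in the predict() example in the question), the same recipe applies to posterior_predict() with a newdata argument; a sketch, assuming the fit object above and that every category is drawn at least once for each observation:
nd <- data.frame(agegp = factor(levels(esoph$agegp)))  # same newdata as in the question
PPD_new <- posterior_predict(fit, newdata = nd)
probs_new <- t(apply(PPD_new, MARGIN = 2, FUN = table) / nrow(PPD_new))
rownames(probs_new) <- levels(esoph$agegp)
probs_new  # one row per age group, one column per tobacco category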

incorrect logistic regression output

I'm doing logistic regression on the Boston data with a column high.medv (yes/no) which indicates whether the median house price given by the column medv is more than 25 or not.
Below is my code for logistic regression.
# apply the desired condition to medv and store the result in a new variable called "high.medv"
high.medv <- ifelse(Boston$medv > 25, "Y", "N")
ourBoston <- data.frame (Boston, high.medv)
ourBoston$high.medv <- as.factor(ourBoston$high.medv)
attach(Boston)
# 70% of data <- Train
train2<- subset(ourBoston,sample==TRUE)
# 30% will be Test
test2<- subset(ourBoston, sample==FALSE)
glm.fit <- glm (high.medv ~ lstat,data = train2, family = binomial)
summary(glm.fit)
The output is as follows:
Deviance Residuals:
[1] 0
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -22.57 48196.14 0 1
lstat NA NA NA NA
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000e+00 on 0 degrees of freedom
Residual deviance: 3.1675e-10 on 0 degrees of freedom
AIC: 2
Number of Fisher Scoring iterations: 21
Also, I need the following:
Now I'm required to use the misclassification rate as the measure of error for two cases:
using lstat as the predictor, and
using all predictors except high.medv and medv,
but I am stuck at the regression itself.
With every classification algorithm, the art lies in choosing the threshold at which you decide whether a result is positive or negative.
When you predict the outcomes in the test data set, you estimate probabilities of the response variable being either 1 or 0. You therefore need to choose the cut-off, the threshold, at which a prediction becomes 1 or 0.
A high threshold is more conservative about labeling a case as positive, which makes it less likely to produce false positives and more likely to produce false negatives. The opposite happens for low thresholds.
The usual procedure is to plot the rates that interest you against each other, e.g., true positives and false positives, and then choose the trade-off that works best for you.
set.seed(666)
# simulation of logistic data
x1 = rnorm(1000) # some continuous variables
z = 1 + 2*x1 # linear combination with a bias
pr = 1/(1 + exp(-z)) # pass through an inv-logit function
y = rbinom(1000, 1, pr)
df = data.frame(y = y, x1 = x1)
df$train = 0
df$train[sample(1:(2*nrow(df)/3))] = 1
df$new_y = NA
# modelling the response variable
mod = glm(y ~ x1, data = df[df$train == 1,], family = "binomial")
df$new_y[df$train == 0] = predict(mod, newdata = df[df$train == 0,], type = 'response') # predicted probabilities
dat = df[df$train==0,] # test data
To use the misclassification error to evaluate your model, you first need to set a threshold. For that, you can use the roc function from the pROC package, which calculates the rates and provides the corresponding thresholds:
library(pROC)
rates = roc(dat$y, dat$new_y)
plot(rates)           # visualize the sensitivity/specificity trade-off
rates$specificities   # the ratio of true negatives over all negatives, at each threshold
rates$thresholds      # the corresponding thresholds
dat$jj = as.numeric(dat$new_y > 0.7)  # using 0.7 as the threshold above which we predict y = 1
table(dat$y, dat$jj)  # the misclassifications given the 0.7 threshold
      0   1
  0  86  20
  1  64 164
The accuracy of your model can be computed as the ratio of the number of observations you got right against the size of your sample.
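For example, with the 0.7 cut-off used above, a quick sketch of the misclassification rate and accuracy from the objects already defined:
tab <- table(dat$y, dat$jj)
1 - sum(diag(tab)) / sum(tab)  # misclassification rate
mean(dat$jj == dat$y)          # accuracy (share of predictions you got right)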

Fitted values for multinom in R: Coefficients for Reference Category?

I'm using the function multinom from the nnet package to run a multinomial logistic regression.
In multinomial logistic regression, as I understand it, the coefficients are the changes in the log of the ratio of the probability of a response over the probability of the reference response (i.e., ln(P(i)/P(r))=B1+B2*X... where i is one response category, r is the reference category, and X is some predictor).
However, fitted(multinom(...)) produces estimates for each category, even the reference category r.
EDIT Example:
set.seed(1)
library(nnet)
DF <- data.frame(X = as.numeric(rnorm(30)),
Y = factor(sample(letters[1:5],30, replace=TRUE)))
DF$Y<-relevel(DF$Y, ref="a") #ensure a is the reference category
model <- multinom(Y ~ X, data = DF)
coef(model)
# (Intercept) X
#b 0.1756835 0.55915795
#c -0.2513414 -0.31274745
#d 0.1389806 -0.12257963
#e -0.4034968 0.06814379
head(fitted(model))
# a b c d e
#1 0.2125982 0.2110692 0.18316042 0.2542913 0.1388810
#2 0.2101165 0.1041655 0.26694618 0.2926508 0.1261210
#3 0.2129182 0.2066711 0.18576567 0.2559369 0.1387081
#4 0.1733332 0.4431170 0.08798363 0.1685015 0.1270647
#5 0.2126573 0.2102819 0.18362323 0.2545859 0.1388516
#6 0.1935449 0.3475526 0.11970164 0.2032974 0.1359035
head(DF)
# X Y
#1 -0.3271010 a
To calculate the predicted probability ratio between response b and response a for row 1, we calculate exp(0.1756835+0.55915795*(-0.3271010))=0.9928084. And I see that this corresponds to the fitted P(b)/P(a) for row 1 (0.2110692/0.2125982=0.9928084).
Is the fitted probability for the reference category calculated algebraically (e.g., 0.2110692/exp(0.1756835+0.55915795*(-0.3271010)))?
Is there a way to obtain the equation for the predicted probability of the reference category?
I had the same question, and after looking around I think the solution is:
given 3 classes: a,b,c and the fitted(model) probabilities pa,pb,pc output by the algorithm, you can reconstruct those probabilities from these 3 equations:
log(pb/pa) = beta1*X
log(pc/pa) = beta2*X
pa+pb+pc=1
Where beta1,beta2 are the rows of the output of coef(model), and X is your input data.
Playing with those equations you get to:
pb = exp(beta1*X)/(1+exp(beta1*X)+exp(beta2*X))
pc = exp(beta2*X)/(1+exp(beta1*X)+exp(beta2*X))
pa = 1 - pb - pc
The key here is that in the help file for multinom() it says that "A log-linear model is fitted, with coefficients zero for the first class."
So that means the predicted values for the reference class can be calculated directly assuming that the coefficients for class "a" are both zero. For example, for the sample row given above, we could calculate the predicted probability for class "a" using the softmax transform:
exp(0+0)/(exp(0+0) + exp(0.1756835 + 0.55915795*(-0.3271010)) + exp(-0.2513414 + (-0.31274745)*(-0.3271010)) + exp(0.1389806 + (-0.12257963)*(-0.3271010)) + exp(-0.4034968 + 0.06814379*(-0.3271010)))
or perhaps more simply, using non-hard-coded numbers, we can calculate the entire set of probabilities for the first row of data as:
softMax <- function(x){
  expx <- exp(x)
  return(expx / sum(expx))
}
coefs <- rbind(c(0,0), coef(model))
linear.predictor <- as.vector(coefs%*%c(1,-0.3271010))
softMax(linear.predictor)
FWIW, the example in the original question does not reproduce exactly for me (the same seed gives me different random deviates), so I have rerun the example freshly, with my calculations below.
library(nnet)
set.seed(1)
DF <- data.frame(
X = as.numeric(rnorm(30)),
Y = factor(sample(letters[1:5],30, replace=TRUE)))
DF$Y<-relevel(DF$Y, ref="a") #ensure a is the reference category
model <- multinom(Y ~ X, data = DF)
coef(model)
## (Intercept) X
## b -0.33646439 1.200191e-05
## c -0.36390688 -1.773889e-01
## d -0.45197598 1.049034e+00
## e -0.01418543 3.076309e-01
DF[1,]
## X Y
## 1 -0.6264538 c
fitted.values(model)[1,]
## a b c d e
## 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617
coefs <- rbind(c(0,0), coef(model))
linear.predictor <- as.vector(coefs%*%c(1,DF[1,"X"]))
softMax(linear.predictor)
## [1] 0.27518921 0.19656378 0.21372240 0.09076844 0.22375617
