I am fitting a logit model in R using the mlogit package. All the parameters of the dummy variables (var1, var2, var3) are normally distributed. How can I put a constraint on one variable (var1) to force its parameter to be zero-mean? The current code returns a non-zero mean for var1. Thanks!
model <- mlogit(outcome ~ var1 + var2 + var3 | 0,
                data = data,
                rpar = c(var1 = "n",
                         var2 = "n",
                         var3 = "n"),
                correlation = FALSE,
                R = 100,
                halton = NA)
I know it is relatively easy to put constraints on parameters in Stata. For my model, my Stata code is:
constraint 1 _b[var1]=0
mixlogit choice var1 var2 var3, group(qid) id(id) constraint(1)
I just want to know how to do the same thing in R. Thanks!
Based on the gmnl package, we can constrain the estimation through maxLik. For this question:
A <- matrix(c( 1, 0, 0, 0, 0, 0,
              -1, 0, 0, 0, 0, 0), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 0))
model <- gmnl(outcome ~ var1 + var2 + var3 | 0,
              data = data,
              rpar = c(var1 = "n",
                       var2 = "n",
                       var3 = "n"),
              constraints = list(eqA = A, eqB = B),
              model = "mixl",
              correlation = FALSE,
              R = 100,
              halton = NULL)
In math, the constraints mean
1*mean(var1) + 0*mean(var2) + 0*mean(var3) + 0*sd(var1) + 0*sd(var2) + 0*sd(var3) = 0
and
-1*mean(var1) + 0*mean(var2) + 0*mean(var3) + 0*sd(var1) + 0*sd(var2) + 0*sd(var3) = 0
so that the mean of var1 has to be 0 in the estimation.
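For intuition about the A and B matrices: maxLik encodes linear equality constraints as A %*% theta + B = 0, one row per constraint. A minimal standalone sketch (a toy likelihood of my own, not the question's model), pinning a mean parameter at zero:

library(maxLik)
set.seed(1)
x <- rnorm(50, mean = 1)
# theta = c(mu, log_sd); simple normal log-likelihood
loglik <- function(theta) sum(dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE))
A <- matrix(c(1, 0), nrow = 1)  # 1*mu + 0*log_sd ...
B <- matrix(0)                  # ... + 0 = 0, i.e. mu = 0
fit <- maxLik(loglik, start = c(mu = 0.5, log_sd = 0),
              constraints = list(eqA = A, eqB = B))
coef(fit)  # mu is held at 0; log_sd adjusts to compensate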
I fitted a regression model as follows:
y <- hiedata_short$income
x1 <- hiedata_short$education
x2 <- hiedata_short$age
x3 <- hiedata_short$ghi
lm.more <- lm(formula = y ~ x1 + x2 + x3,
              data = hiedata_short)
summary(lm.more)
and am then predicting values using that model with fixed age and ghi for values of education ranging from 0 to 25. Therefore, I create a new data frame.
education.pts <- seq(from = 0, # A (starting value)
                     to = 25,  # B (ending value)
                     by = 1)   # C (step size)
fitted.values.data <- data.frame(age = 30, # hold `age` constant
                                 education = education.pts,
                                 ghi = 70) # hold `ghi` constant
Then, I predict the values using the regression model and convert the resulting matrix into a data frame.
fitted.values <- predict(object = lm.more,
                         newdata = fitted.values.data,
                         interval = "confidence",
                         level = 0.95)
fitted.values <- data.frame(fitted.values) # Feed the matrix to `data.frame()`
Unfortunately the plotting then fails because fitted.values$fit contains far more values than the 26 I have in education.pts:
**Warning message: 'newdata' had 26 rows but variables found have 1932 rows**
I can see this also in the Environment pane in RStudio. But it is not clear to me why predict() returns as many rows as the data the model was fitted on instead of just filling in the given data frame fitted.values.data.
plot(x = education.pts,
     y = fitted.values$fit,
     type = "l",
     col = "black",
     xlab = "Years of Education",
     ylab = "Predicted Family Income (1000s of USD)",
     main = "Predicted Income Given Education\nRAND Health Insurance Experiment \n",
     sub = "Age fixed at 30; GHI fixed at 70",
     ylim = c(0, 20))
I am trying to create a data frame (using either tidyr::expand_grid or tibble::data_frame) in order to then generate posterior predictions using the tidybayes::epred_draws function (akin to posterior_predict). I have three continuous predictors that I would like to vary simultaneously at three set values: 1 standard deviation below the mean of each predictor, the mean of each predictor, and 1 standard deviation above the mean of each predictor. The issue I am running into is that I cannot figure out a way to generate values in between those set points while keeping the structure of the dataset intact.
I created a reproducible example below; as you can see, the final posterior prediction doesn't look great. Is there any way to generate additional incremental values in between the set standard deviations and the mean?
My go-to method would be either seq() or
modelr::seq_range(data$var_1, pretty = TRUE, n = 100), but I'm not sure how to incorporate that into the new dataset in a way that lets me see what happens when the predictors all shift simultaneously.
Let me know if I can explain anything else.
library(brms)
library(tidybayes)
library(dplyr)    # for tibble() and the %>% pipe
library(ggplot2)
library(ggthemes)
## create a dataset
set.seed(123) # make the simulated data reproducible
data <- tibble(
  outcome = rnorm(100, 2, 2),
  var_1 = rnorm(100, 5, 2),
  var_2 = rnorm(100, 8, 2),
  var_3 = rnorm(100, 10, 2)
)
## model the data
m1 <- brms::brm(outcome ~ var_1 + var_2 + var_3, data) # run model (takes a few sec.)
## prepare for predictions with set values
new_data <- tibble(
  var_1 = c(mean(data$var_1) - sd(data$var_1), mean(data$var_1), mean(data$var_1) + sd(data$var_1)),
  var_2 = c(mean(data$var_2) - sd(data$var_2), mean(data$var_2), mean(data$var_2) + sd(data$var_2)),
  var_3 = c(mean(data$var_3) - sd(data$var_3), mean(data$var_3), mean(data$var_3) + sd(data$var_3)))
pred_1 <- m1 %>%
tidybayes::epred_draws(new_data)
# generate grand mean posterior predictions (for more on this,
# see: https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide/)
plot_1 <- ggplot(pred_1, aes(x = var_1, y = .epred)) +
  stat_lineribbon() +
  scale_fill_brewer(palette = "Reds") +
  labs(x = "Shifts in Var 1, 2, and 3", y = "Outcome",
       fill = "Credible interval") +
  ggthemes::theme_pander() +
  theme(legend.position = "bottom") +
  scale_x_continuous(limits = c(new_data$var_1[1], new_data$var_1[3]),
                     breaks = c(new_data$var_1[1],
                                new_data$var_1[2],
                                new_data$var_1[3]),
                     labels = c("-1 SD", "Mean", "+1 SD"))
# visualize posterior predictions (example isn't so pretty, sorry)
plot_1
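One way to get the in-between values (a sketch of my own, not from the thread): index the joint shift with a single variable s running from -1 to 1, and map it onto each predictor's own scale so all three predictors move together:

s <- seq(-1, 1, length.out = 100) # or modelr::seq_range(c(-1, 1), n = 100)
new_data_fine <- tibble(
  var_1 = mean(data$var_1) + s * sd(data$var_1),
  var_2 = mean(data$var_2) + s * sd(data$var_2),
  var_3 = mean(data$var_3) + s * sd(data$var_3)
)
pred_fine <- m1 %>%
  tidybayes::epred_draws(new_data_fine)

The plotting code above then works on pred_fine unchanged, and stat_lineribbon() has 100 points to draw through instead of 3.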
I am new to R and trying to get predicted probabilities after a logistic regression and then do a pairwise comparison. I want to do the same as the margins command in Stata. I have therefore used the prediction() function from Thomas Leeper's prediction package. However, I can't figure out how to do a pairwise comparison of the predicted probabilities. I would prefer the pairwise comparison to be done based on the predicted probabilities from prediction() and not emmeans(), because they give slightly different results on my real data (not the toy data). So, I would really appreciate any help!
Below, I have inserted some toy data - I hope it meets the requirements for a minimal reproducible example. Otherwise let me know.
library(margins)
library(prediction) # prediction() comes from the prediction package

set.seed(42) # make the toy data reproducible
M_W <- sample(x = c("W", "M"), size = 100, replace = TRUE)
M_W <- factor(M_W)
value <- sample(x = 0:1, size = 100, replace = TRUE)
xx <- sample(x = 1:10, size = 100, replace = TRUE)
dataframe <- data.frame("value" = value, "M_W" = M_W, "xx" = xx)
model <- glm(value ~ M_W + xx, data = dataframe, family = "binomial")
summary(model)
pp <- prediction(model, at = list(M_W = c("M", "W")))
summary(pp)
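For what it's worth, one hand-rolled way to contrast the two average predicted probabilities (a sketch of my own; the bootstrap interval is my assumption, not something prediction() provides):

p_diff <- function(d) {
  m  <- glm(value ~ M_W + xx, data = d, family = "binomial")
  dM <- d; dM$M_W <- factor("M", levels = levels(d$M_W))
  dW <- d; dW$M_W <- factor("W", levels = levels(d$M_W))
  mean(predict(m, newdata = dW, type = "response")) -
    mean(predict(m, newdata = dM, type = "response"))
}
obs <- p_diff(dataframe) # same contrast as differencing the rows of summary(pp)
boots <- replicate(500, p_diff(dataframe[sample(nrow(dataframe), replace = TRUE), ]))
c(diff = obs, quantile(boots, c(0.025, 0.975)))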
I need to convert the probabilities to true or false (> 0.5), and then print a confusion matrix. I can't find an example of how to do this.
In my attempts, I'm also having difficulty referencing the transformed "Success" target, which is now "SuccessTRUE."
Success is Boolean, Comp is a factor (4 levels), the others are numeric.
require(neuralnet)
m <- model.matrix(~ Success + Comp + Var2 + Var3 + Var4, data = Train3)
m1 <- model.matrix(~ Success + Comp + Var2 + Var3 + Var4, data = Test3)
# model.matrix() renames the Boolean Success column to "SuccessTRUE"
nn <- neuralnet(SuccessTRUE ~ Comp2 + Comp3 + Comp4 + Var2 + Var3 + Var4,
                data = m, hidden = 4, act.fct = "logistic", linear.output = FALSE)
pred <- compute(nn, m1)
I figured this out.
library(caret) # confusionMatrix() comes from caret
predicted.classes <- ifelse(pred$net.result > 0.5, "TRUE", "FALSE")
t <- table(predicted.classes, Test3$Success)
confusionMatrix(t)
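One caveat for anyone reusing this: by default confusionMatrix() treats the first factor level as the positive class, so it is worth pinning it explicitly (the positive argument is caret's own):

confusionMatrix(t, positive = "TRUE") # report sensitivity/specificity for the TRUE class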
I'm trying to estimate marginal effects of a logit model in which I have several dichotomous explanatory variables.
Let's say the model is estimated by
logit <- svyglm(if_member ~ if_female + dummy_agegroup_2 + dummy_agegroup_3 +
                dummy_education_2 + dummy_education_3 + dummy_education_4,
                family = quasibinomial(link = "logit"), design = survey_design)
I know about the marginpred function in the survey package, but I am not very familiar with it. I have only dichotomous variables in the model, so I am wondering how to estimate marginal effects with this function; in particular, I am not sure about the predictat argument (a data frame giving values of the variables in the model to predict at).
Are you looking for marginal effects or marginal predictions?
As the name implies, the marginpred() function returns predictions. The predictat argument is a data frame containing both the control variables and the variables that are in the model. Let me emphasize that: the control variables should be left out of the model itself; they go into adjustfor, as the example below shows.
library("survey")
odds2prob <- function(x) x / (x + 1)
prob2odds <- function(x) x / (1 - x)
expit <- function(x) odds2prob(exp(x))
logit <- function(x) log(prob2odds(x))
set.seed(1)
survey_data <- data.frame(
  if_female = rbinom(n = 100, size = 1, prob = 0.5),
  agegroup = factor(sample(x = 1:3, size = 100, replace = TRUE)),
  education = NA_integer_,
  if_member = NA_integer_)
survey_data["agegroup"] <- relevel(survey_data$agegroup, ref = 3)
# Different probabilities between female and male persons
survey_data[survey_data$if_female == 0, "education"] <- sample(
  x = 1:4,
  size = sum(survey_data$if_female == 0),
  replace = TRUE,
  prob = c(0.1, 0.1, 0.5, 0.3))
survey_data[survey_data$if_female == 1, "education"] <- sample(
  x = 1:4,
  size = sum(survey_data$if_female == 1),
  replace = TRUE,
  prob = c(0.1, 0.1, 0.3, 0.5))
survey_data["if_member"] <- rbinom(n = 100, size = 1,
                                   prob = expit((survey_data$education - 3) / 2))
survey_data["education"] <- factor(survey_data$education)
survey_data["education"] <- relevel(survey_data$education, ref = 3)
survey_design <- svydesign(ids = ~ 1, data = survey_data)
logit <- svyglm(if_member ~ if_female + agegroup + education,
                family = quasibinomial(link = "logit"),
                design = survey_design)
exp(cbind(`odds ratio` = coef(logit), confint(logit)))
newdf <- data.frame(if_female = 0:1, education = c(3, 3), agegroup = c(3, 3))
# Fails: agegroup and education are controls but also sit in the model
mp <- marginpred(model = logit, adjustfor = ~ agegroup + education,
                 predictat = newdf, se = TRUE, type = "response")
# Works: the controls appear only in adjustfor
logit2 <- svyglm(if_member ~ if_female,
                 family = quasibinomial(link = "logit"),
                 design = survey_design)
mp <- marginpred(model = logit2, adjustfor = ~ agegroup + education,
                 predictat = newdf, se = TRUE, type = "response")
# Probability for male and for female persons controlling for agegroup and education
cbind(prob = mp, confint(mp))
That's how I estimate marginal effects with the survey package:
# Probability difference between female and male persons
# when agegroup and education are set to 3
svycontrast(logit, quote(
  (exp(`(Intercept)` + if_female) / (exp(`(Intercept)` + if_female) + 1)) -
    (exp(`(Intercept)`) / (exp(`(Intercept)`) + 1))))
# Can't use custom functions like expit :_(
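As a quick hand check of the point estimate (my addition; svycontrast() is still needed for the survey-correct standard error), the contrast above is just expit applied to the linear predictor at the reference levels:

b <- coef(logit)
expit(b["(Intercept)"] + b["if_female"]) - expit(b["(Intercept)"])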
There are probably smarter ways, but I hope it helps.
Please note that the difference between the probabilities predicted by marginpred() is different from the difference estimated by svycontrast(). The probabilities predicted by marginpred() don't seem to be affected by changing the value of the control variables (for example, education = c(4, 4) instead of education = c(3, 3)), but the estimates from svycontrast() are affected, as implied by the regression model.