Looping through function arguments (series of contrasts with multcomp::glht) - r

I wish to write a function that runs contrasts over a regression model and bootstraps those results to get confidence intervals, looping that function over a list of contrasts.
I have tried for loops nested within functions, lapply, map ... none seem to get me what I want (returns results for either only the first contrast in the list or the last).
For a single contrast from the list of contrasts, the code looks like this:
library(forcats) # fct_relevel()
library(boot)    # boot(), boot.ci()
df <- data.frame(
H0013301_new_data = c(0,2,3,6,0,4,2,4,8,1),
drink_stat94_KEYES_2 = c("Heavy","Abstainer","Occasional","Moderate","Abstainer","Occasional","Heavy","Moderate","Moderate","Abstainer"),
drink_stat02_KEYES_2 = c("Heavy","Abstainer","Occasional","Abstainer","Abstainer","Heavy","Heavy","Moderate","Moderate","Abstainer"),
drink_stat06_KEYES_2 = c("Occasional","Abstainer","Occasional","Abstainer","Occasional","Heavy","Heavy","Moderate","Moderate","Heavy"),
FIN_weight_survPS_trimmed=
c(.5,2.4,.6,4.8,1.2,.08,.34,.56,1.6,.27)
)
#reordering factors
df$drink_stat94_KEYES_2<-fct_relevel(df$drink_stat94_KEYES_2, "Abstainer", "Occasional", "Moderate", "Heavy")
contrasts(df$drink_stat94_KEYES_2)<-contr.treatment(4,base=1)
df$drink_stat02_KEYES_2<-fct_relevel(df$drink_stat02_KEYES_2, "Abstainer", "Occasional", "Moderate", "Heavy")
contrasts(df$drink_stat02_KEYES_2)<-contr.treatment(4,base=1)
df$drink_stat06_KEYES_2<-fct_relevel(df$drink_stat06_KEYES_2, "Abstainer", "Occasional", "Moderate", "Heavy")
contrasts(df$drink_stat06_KEYES_2)<-contr.treatment(4,base=1)
#defining contrast
c1 <- rbind("A,A,A"=c(1,0,0,0,0,0,0,0,0,0)
)
#defining function to feed to bootstrap
fc_2<-function(d,i){
TrialOutcomeModel_M<-lm(H0013301_new_data ~ drink_stat94_KEYES_2 + drink_stat02_KEYES_2 + drink_stat06_KEYES_2, weights=FIN_weight_survPS_trimmed, data = d[i,])
test <- multcomp::glht(TrialOutcomeModel_M, linfct=c1)
return(coef(test))
}
boot_out<-boot(data=df, fc_2, R=500)
boot.ci(boot_out, type="perc")
But let's assume that instead of just c1, I want to run my function (and bootstrap the results) over the following list of contrasts:
c1 <- rbind("A,A,A"=c(1,0,0,0,0,0,0,0,0,0)
)
c2 <- rbind("A,A,O"=c(1,0,0,0,0,0,0,1,0,0)
)
c3 <- rbind("A,A,M"=c(1,0,0,0,0,0,0,0,1,0)
)
c_vector<-list(c1,c2,c3)
Any suggested code for how I would go about this?
(P.S. I know that the linfct argument can take a matrix of contrasts, but I'm specifically looking for a loop/lapply solution).

(In the following I'll reference the objects you create in the example code)
The plan has two steps:
1. Prepare a function fun_boot() that takes a contrast object (like c1) and returns a boot object based on it and the df data.
2. Apply that function to the list c_vector of contrasts.
Consequently, the implementation has 2 elements:
# [!] Assume all required libraries loaded
# [!] Assume all necessary data exists
# Step 1
fun_boot <- function(contrast)
{
# Make statistic function
fun_statistic <- function(d, i)
{
TrialOutcomeModel_M <- lm(
formula = H0013301_new_data ~ drink_stat94_KEYES_2 + drink_stat02_KEYES_2 + drink_stat06_KEYES_2,
data = d[i,],
weights = FIN_weight_survPS_trimmed
)
test <- multcomp::glht(
TrialOutcomeModel_M,
linfct = contrast
)
return(coef(test))
}
# Make boot call (hehe)
return (boot(
data = df,
statistic = fun_statistic,
R = 500
))
}
# Step 2
boot_out_vector <- lapply(
X = c_vector,
FUN = fun_boot
)
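The same lapply() pattern also works for pulling the percentile intervals out of the resulting list. Here is a minimal self-contained sketch of the whole pattern. It is not the original model: it assumes toy data, and it computes each contrast directly as contrast %*% coef(fit) instead of going through multcomp::glht, so that it only needs the boot package. The names toy, contrast_list and make_boot are made up for illustration.

```r
library(boot)  # recommended package shipped with R

set.seed(42)
toy <- data.frame(x = rnorm(50))
toy$y <- 1 + 2 * toy$x + rnorm(50)

# Hypothetical contrasts over the coefficients (Intercept, x)
contrast_list <- list(
  intercept = rbind("b0" = c(1, 0)),
  sum       = rbind("b0+b1" = c(1, 1))
)

# Function factory: the inner statistic "remembers" the contrast it was built with
make_boot <- function(contrast, data, R = 200) {
  statistic <- function(d, i) {
    fit <- lm(y ~ x, data = d[i, ])
    drop(contrast %*% coef(fit))  # value of the linear combination
  }
  boot(data = data, statistic = statistic, R = R)
}

# One boot object per contrast, then one percentile CI per boot object
boot_list <- lapply(contrast_list, make_boot, data = toy)
ci_list   <- lapply(boot_list, boot.ci, type = "perc")
```

Because the list of contrasts is named, boot_list and ci_list carry the same names, so ci_list$sum is the interval for the second contrast. With the original objects, lapply(boot_out_vector, boot.ci, type = "perc") should do the same thing.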

Related

Apply logistic regression in a function in R

I want to run logistic regression for multiple parameters and store the different metrics i.e AUC.
I wrote the function below but I get an error when I call it: Error in eval(predvars, data, env) : object 'X0' not found even if the variable exists in both my training and testing dataset. Any idea?
new.function <- function(a) {
model = glm(extry~a,family=binomial("logit"),data = train_df)
pred.prob <- predict(model,test_df, type='response')
predictFull <- prediction(pred.prob, test_df$extry)
auc_ROCR <- performance(predictFull, measure = "auc")
my_list <- list("AUC" = auc_ROCR)
return(my_list)
}
# Call the function new.function, supplying X0 as the argument.
les <- new.function(X0)
The main reason your function didn't work is that you are trying to pass an object into a formula. In extry~a, glm looks for a variable named a; it finds the function argument a, which then evaluates X0 outside of train_df, where no such object exists. You can fix this by building the formula with paste() and as.formula(), but that is ultimately quite limiting.
I suggest instead that you consider using update(). This allows you more flexibility to change the combination of variables, or the training dataset, without breaking the function.
model = glm(extry ~ 1, family = binomial("logit"), data = train_df) # baseline fit; update() swaps in the predictor
new.model = update(model, .~X0)
new.function <- function(model){
pred.prob <- predict(model, test_df, type='response')
predictFull <- prediction(pred.prob, test_df$extry)
auc_ROCR <- performance(predictFull, measure = "auc")
my_list <- list("AUC" = auc_ROCR)
return(my_list)
}
les <- new.function(new.model)
The function can be further improved by passing test_df as a separate argument, so that you can evaluate the model on alternative test data.
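A rough sketch of that idea, taking the predictor name as a string and both data sets as arguments. To keep it runnable with base R only, it swaps ROCR for a rank-based (Mann-Whitney) AUC, so it is not the same code path as above. The names auc_rank and fit_and_auc are invented for this example, and the dummy data mirrors the reprex data used elsewhere in this thread.

```r
# Rank-based AUC (Mann-Whitney statistic); stands in for ROCR here
auc_rank <- function(prob, truth) {
  r <- rank(prob)
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Predictor is passed as a string; train and test data are explicit arguments
fit_and_auc <- function(predictor, train, test, outcome = "extry") {
  form  <- reformulate(predictor, response = outcome)  # e.g. extry ~ X0
  model <- glm(form, family = binomial("logit"), data = train)
  prob  <- predict(model, newdata = test, type = "response")
  auc_rank(prob, test[[outcome]])
}

# Dummy data
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1) / 200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1) / 200)

# One AUC per predictor, as a named numeric vector
aucs <- sapply(c("X0", "X1"), fit_and_auc, train = train_df, test = test_df)
```

Passing names as strings also makes it easy to loop over many predictors with sapply(), which is awkward when the function expects a bare symbol.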
To run the function in the way you intended, you would need to use non-standard evaluation to capture the symbol and insert it in a formula. This can be done using match.call and as.formula. Here's a fully reproducible example using dummy data:
new.function <- function(a) {
# Convert symbol to character
a <- as.character(match.call()$a)
# Build formula from character strings
form <- as.formula(paste("extry", a, sep = "~"))
model <- glm(form, family = binomial("logit"), data = train_df)
pred.prob <- predict(model, test_df, type = 'response')
predictFull <- ROCR::prediction(pred.prob, test_df$extry)
auc_ROCR <- ROCR::performance(predictFull, "auc")
list("AUC" = auc_ROCR)
}
Now we can call the function in the way you intended:
new.function(X0)
#> $AUC
#> A performance instance
#> 'Area under the ROC curve'
new.function(X1)
#> $AUC
#> A performance instance
#> 'Area under the ROC curve'
If you want to see the actual area under the curve you would need to do:
new.function(X0)$AUC@y.values[[1]]
#> [1] 0.6599759
So you may wish to modify your function so that the list contains auc_ROCR@y.values[[1]] rather than auc_ROCR.
Data used
set.seed(1)
train_df <- data.frame(X0 = sample(100), X1 = sample(100))
train_df$extry <- rbinom(100, 1, (train_df$X0 + train_df$X1)/200)
test_df <- data.frame(X0 = sample(100), X1 = sample(100))
test_df$extry <- rbinom(100, 1, (test_df$X0 + test_df$X1)/200)
Created on 2022-06-29 by the reprex package (v2.0.1)

Multiple imputation and mlogit for a multinomial regression

I am trying to run a multinomial regression with imputed data. I can do this with the nnet package, however I want to use mlogit. Using the mlogit package I keep getting the following error "Error in 1:nrow(data) : argument of length 0".
So making the data
library(mlogit)
library(nnet)
library(tidyverse)
library(mice)
df <- data.frame(vax = sample(1:6, 500, replace = T),
age = runif(500, 12, 18),
var1 = sample(1:2, 500, replace = T),
var2 = sample(1:5, 500, replace = T))
# Create missing data using the mice package:
df2 <- ampute(df, prop = 0.15)
df3 <- df2$amp
df3$vax <- as.factor(df3$vax)
df3$var1 <- as.factor(df3$var1)
df3$var2 <- as.factor(df3$var2)
# Impute missing data:
df4 <- mice(df3, m = 5, print = T, seed = 123)
It works using nnet's multinom:
multinomtest <- with(df4, multinom(vax ~ age + var1 + var2, data = df, model = T))
summary(pool(multinomtest))
But throws up an error when I try to reshape the data into mlogit format
test <- with(df4, dfidx(data = df4, choice = "vax", shape = "wide"))
Does anyone have any idea how I can get the imputed data into mlogit format, or even whether mlogit has compatibility with mice or any other imputation package?
Answer
You are using with.mids incorrectly, and thus both lines of code are wrong; the multinom line just doesn't give an error. If you want to apply multiple functions to the imputed datasets, you're better off using something like lapply:
analyses <- lapply(seq_len(df4$m), function(i) {
data.i <- complete(df4, i)
data.idx <- dfidx(data = data.i, choice = "vax", shape = "wide")
mlogit(vax ~ 1 | age + var1 + var2,
data = data.idx,
reflevel = "1",
nests = list(type1 = c("1", "2"), type2 = c("3","4"), type3 = c("5","6")))
})
test <- list(call = "", call1 = df4$call, nmis = df4$nmis, analyses = analyses)
oldClass(test) <- c("mira", "matrix")
summary(pool(test))
How with.mids works
When you apply with to a mids object (AKA the output of mice::mice), then you are actually calling with.mids.
If you use getAnywhere(with.mids) (or just type mice:::with.mids), you'll find that it does a couple of things:
It loops over all imputed datasets.
It uses complete to get one dataset.
It runs the expression with the dataset as the environment.
The third step is the problem. Functions that take formulas (like lm, glm and multinom) look up the formula's variables in the environment they are called from. If the variables are not in that environment (but rather in e.g. a data frame), you can point the function at a different source by setting the data argument.
The problems
This is where both your problems derive from:
In your multinom call, you set the data argument to df. Hence, you are actually running multinom on the original df, NOT the imputed dataset!
In your dfidx call, you are again filling in data directly. This is also wrong. However, leaving it empty also gives an error. This is because with.mids doesn't fill in the data argument, but only the environment. That isn't sufficient for you.
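The first problem can be reproduced with base R's with() on plain data frames, which evaluates the expression in the same way. This is only an illustration with made-up data (original and imputed are hypothetical names), not a mice example:

```r
# Two data frames with the same columns but different y values
original <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))  # slope  2
imputed  <- data.frame(x = 1:5, y = c(5, 4, 3, 2, 1))   # slope -1

# Variables are looked up in the environment supplied by with(): uses 'imputed'
fit_env <- with(imputed, lm(y ~ x))

# An explicit data argument wins: this silently fits on 'original' instead
fit_data <- with(imputed, lm(y ~ x, data = original))

coef(fit_env)[["x"]]   # slope from 'imputed'
coef(fit_data)[["x"]]  # slope from 'original'
```

The second call runs without any error or warning, which is why the multinom line looked fine even though it never touched the imputed data.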
Fixing multinom
The solution for your multinom line is simple: just don't specify data:
multinomtest <- with(df4, multinom(vax ~ age + var1 + var2, model = T))
summary(pool(multinomtest))
As you will see, this will yield very different results! But it is important to realise that this is what you are trying to obtain.
Fixing dfidx (and mlogit)
We cannot do this with with.mids, since it uses the imputed dataset as the environment, but you want to use the modified dataset (after dfidx) as your environment. So, we have to write our own code. You could just do this with any looping function, e.g. lapply:
analyses <- lapply(seq_len(df4$m), function(i) {
data.i <- complete(df4, i)
data.idx <- dfidx(data = data.i, choice = "vax", shape = "wide")
mlogit(vax ~ 1 | age + var1 + var2, data = data.idx, reflevel = "1", nests = list(type1 = c("1", "2"), type2 = c("3","4"), type3 = c("5","6")))
})
From there, all we have to do is make something that looks like a mira object, so that we can still use pool:
test <- list(call = "", call1 = df4$call, nmis = df4$nmis, analyses = analyses)
oldClass(test) <- c("mira", "matrix")
summary(pool(test))
Offering this as a way forward to circumvent the error with dfidx():
df5 <- df4$imp %>%
# work with a list, where each top-element is a different imputation run (imp_n)
map(~as.list(.x)) %>%
transpose %>%
# for each run, impute and return the full (imputed) data set
map(function(imp_n.x) {
df_out <- df4$data
df_out$vax[is.na(df_out$vax)] <- imp_n.x$vax
df_out$age[is.na(df_out$age)] <- imp_n.x$age
df_out$var1[is.na(df_out$var1)] <- imp_n.x$var1
df_out$var2[is.na(df_out$var2)] <- imp_n.x$var2
return(df_out)
}) %>%
# No errors with dfidx() now
map(function(imp_n.x) {
dfidx(data = imp_n.x, choice = "vax", shape = "wide")
})
However, I'm not too familiar with mlogit(), so can't help beyond this.
Update 8/2/21
As @slamballais mentioned in their answer, the issue is with the dataset you refer to when fitting the model. I assume that mldata (from your code in the comments section) is a data.frame? This is probably why you are seeing the same coefficients: you are not referring to the imputed data sets (which I've identified as imp_n.x in the functions). The function purrr::map() is very similar to lapply(), in that it applies a function to each element of a list. So to get the code working properly, you would want to change mldata to imp_n.x:
# To fit mlogit() for each imputed data set
df5 %>%
map(function(imp_n.x) {
# form as specified in the comments
mlogit(vax ~ 1 | age + var1 + var2,
data = imp_n.x,
reflevel = "1",
nests = list(type1 = c('1', '2'),
type2 = c('3','4'),
type3 = c('5','6')))
})

Passing variable names as strings into the contrasts() argument in lm

I am trying to create a function that allows me to pass outcome and predictor variable names as strings into the lm() regression function. I have actually asked this before here, but I learned a new technique here and would like to try and apply the same idea in this new format.
Here is the process
library(tidyverse)
# toy data
df <- tibble(f1 = factor(rep(letters[1:3],5)),
c1 = rnorm(15),
out1 = rnorm(15))
# pass the relevant inputs into new objects like in a function
d <- df
outcome <- "out1"
predictors <- c("f1", "c1")
# now create the model formula to be entered into the model
form <- as.formula(
paste(outcome,
paste(predictors, collapse = " + "),
sep = " ~ "))
# now pass the formula into the model
model <- eval(bquote( lm(.(form),
data = d) ))
model
# Call:
# lm(formula = out1 ~ f1 + c1, data = d)
#
# Coefficients:
# (Intercept) f1b f1c c1
# 0.16304 -0.01790 -0.32620 -0.07239
So this all works nicely, an adaptable way of passing variables into lm(). But what if we want to apply special contrast coding to the factorial variable? I tried
model <- eval(bquote( lm(.(form),
data = d,
contrasts = list(predictors[1] = contr.treatment(3)) %>% setNames(predictors[1])) ))
But got this error
Error: unexpected '=' in:
" data = d,
contrasts = list(predictors[1] ="
Any help much appreciated.
Reducing this to the command generating the error:
list(predictors[1] = contr.treatment(3))
Results in:
Error: unexpected '=' in "list(predictors[1] ="
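list() fails here because the name on the left-hand side of = in a function call must be a literal name or string; R's parser does not accept an expression such as predictors[1] in that position.
Your approach of using setNames() works, but it needs to be wrapped around the list construction step itself: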
setNames(list(contr.treatment(3)), predictors[1])
Output is a named list containing a contrast matrix:
$f1
2 3
1 0 0
2 1 0
3 0 1
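Putting it together with the model call, here is a minimal sketch under a few assumptions: it uses a plain data.frame and reformulate() instead of the tibble and the paste()/bquote() construction from the question, and the object name contr_list is made up for this example.

```r
set.seed(1)
d <- data.frame(f1   = factor(rep(letters[1:3], 5)),
                c1   = rnorm(15),
                out1 = rnorm(15))

outcome    <- "out1"
predictors <- c("f1", "c1")

# Build out1 ~ f1 + c1 from the strings
form <- reformulate(predictors, response = outcome)

# Named list of contrasts, with the name taken from a variable
contr_list <- setNames(list(contr.treatment(3)), predictors[1])

model <- lm(form, data = d, contrasts = contr_list)
names(model$contrasts)  # name of the factor the contrast was applied to
```

The only change relative to the failing call is that the name is attached after the list is built, so nothing has to be evaluated on the left-hand side of =.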

R order lapply output from a function with multiple outputs by variable (column) rather than by function

I have a function in R which includes multiple other functions, including a custom one. I then use lapply to run the combined function across multiple variables. However, when the output is produced it is in the order of
function1: variable a, variable b, variable c
function2: variable a, variable b, variable c
When what I would like is for it to be the other way around:
variable a: function 1, function 2...
variable b: function 1, function 2...
I have recreated an example below using the mtcars dataset, with number of cylinders as a predictor variable, and vs and am as outcome variables.
library(datasets)
library(tidyverse)
library(skimr)
library(car)
data(mtcars)
mtcars_binary <- mtcars %>%
dplyr::select(cyl, vs, am)
# logistic regression function
logistic.regression <- function(logmodel) {
dev <- logmodel$deviance
null.dev <- logmodel$null.deviance
modelN <- length(logmodel$fitted.values)
R.lemeshow <- 1 - dev / null.dev
R.coxsnell <- 1 - exp ( -(null.dev - dev) / modelN)
R.nagelkerke <- R.coxsnell / ( 1 - ( exp (-(null.dev / modelN))))
cat("Logistic Regression\n")
cat("Hosmer and Lemeshow R^2 ", round(R.lemeshow, 3), "\n")
cat("Cox and Snell R^2 ", round(R.coxsnell, 3), "\n")
cat("Nagelkerke R^2" , round(R.nagelkerke, 3), "\n")
}
# all logistic regression results
log_regression_tests1 <- function(df_vars, df_data) {
glm_summary <- glm(df_data[,df_vars] ~ df_data[,1], data = df_data, family = binomial, na.action = "na.omit")
glm_print <- print(glm_summary)
log_results <- logistic.regression(glm_summary)
blr_coefficients <- exp(glm_summary$coefficients)
blr_confint <- exp(confint(glm_summary))
list(glm_summary = glm_summary, glm_print = glm_print, log_results = log_results, blr_coefficients = blr_coefficients, blr_confint = blr_confint)
}
log_regression_results1 <- sapply(colnames(mtcars_binary[,2:3]), log_regression_tests1, mtcars_binary, simplify = FALSE)
log_regression_results1
When I do this, the output is being produced as:
glm_summary: vs, am
log_results: vs, am
etc. etc.
When what I would like for the output to be ordered is:
vs: all function outputs
am: all function outputs
In addition, when I run this line of code, log_regression_results1 <- sapply(colnames(mtcars_binary[,2:3]), log_regression_tests1, mtcars_binary, simplify = FALSE) I get only the results of the logistic regression function, but when I print the overall results log_regression_results1 I get the remaining output, could anyone explain why?
Finally, the glm_summary function is not producing all of the output which it should. When I run the functions independently on a single variable, like so
glm_vs <- glm(vs ~ cyl, data = mtcars_binary, family = binomial, na.action = "na.omit")
summary(glm_vs)
logistic.regression(glm_vs)
exp(glm_vs$coefficients)
exp(confint(glm_vs))
it also produces the standard error, z value, and p value for summary(glm_vs), which it does not do when embedded in the function, even though I have glm_print <- print(glm_summary) included. Is there a way to get the output of the full summary function within the log_regression_tests1 function?
When I run your code up to log_regression_results1, I get exactly what you ask for:
summary(log_regression_results1)
Length Class Mode
vs 5 -none- list
am 5 -none- list
Maybe you meant to ask the other way round?
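On the missing standard errors and p values: print() on a glm object only shows the call and coefficients, whereas the full table comes from summary(). One possible way to keep it, sketched here with a hypothetical helper fit_one and with the built-in mtcars data, is to store summary(fit) in the returned list:

```r
# Fit one logistic regression per outcome and keep the full summary object
fit_one <- function(outcome, data, predictor = "cyl") {
  fit <- glm(reformulate(predictor, response = outcome),
             data = data, family = binomial)
  list(
    summary     = summary(fit),   # includes std. errors, z values, p values
    odds_ratios = exp(coef(fit))
  )
}

# simplify = FALSE keeps one list element per outcome variable
results <- sapply(c("vs", "am"), fit_one, data = mtcars, simplify = FALSE)

names(results)                   # one entry per outcome
results$vs$summary$coefficients  # full coefficient table for vs
```

Building the formula from the column names also gives readable coefficient labels (cyl rather than df_data[, 1]).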

R: Clustered robust standard errors using miceadds lm.cluster - error with subset and weights

I am trying to use the lm.cluster function in the package miceadds to get robust clustered standard errors for a multiply imputed dataset.
I am able to get the standard version of it to run but I get the following error when I try to add a subset or weights:
Error in eval(substitute(subset), data, env) :
..1 used in an incorrect context, no ... to look in
Example that works without subset or weights:
require("mice")
require("miceadds")
data(data.ma01)
# imputation of the dataset: use six imputations
dat <- data.ma01[ , - c(1:2) ]
imp <- mice::mice( dat , maxit=3 , m=6 )
datlist <- miceadds::mids2datlist( imp )
# linear regression with cluster robust standard errors
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool )} )
# extract parameters and covariance matrix
betas <- lapply( mod , FUN = function(rr){ coef(rr) } )
vars <- lapply( mod , FUN = function(rr){ vcov(rr) } )
# conduct statistical inference
summary(pool_mi( qhat = betas, u = vars ))
Example that breaks with subset:
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool, subset=
(data.ma01$urban==1))} )
Error during wrapup: ..1 used in an incorrect context, no ... to look in
Example that breaks with weights:
mod <- lapply(datlist, FUN = function(data){miceadds::lm.cluster( data=data ,
formula=read ~ paredu+ female , cluster = data.ma01$idschool,
weights=data.ma01$studwgt)} )
Error during wrapup: ..1 used in an incorrect context, no ... to look in
From searching, I think I am encountering similar issues as others when passing these commands through an lm or glm wrapper (such as: Passing Argument to lm in R within Function or R : Pass argument to glm inside an R function or Passing the weights argument to a regression function inside an R function)
However, I am not sure how to address the issue with the imputed datasets & existing lm.cluster command.
Thanks
This works fine with the estimatr package which is on CRAN and the estimatr::lm_robust() function. Two notes: (1) you can change the type of standard errors using se_type = and (2) I keep idschool in the data because we like the clusters to be in the same data.frame as we fit the model on.
library(mice)
library(miceadds)
library(estimatr)
# imputation of the dataset: use six imputations
data(data.ma01)
dat <- data.ma01[, -c(1)] # note I keep idschool in data
imp <- mice::mice( dat , maxit = 3, m = 6)
datlist <- miceadds::mids2datlist(imp)
# linear regression with cluster robust standard errors
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool)
}
)
# subset
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool, subset = urban == 1)
}
)
# weights
mod <- lapply(
datlist,
function (dat) {
estimatr::lm_robust(read ~ paredu + female, dat, clusters = idschool, weights = studwgt)
}
)
# note that you can use the `se_type` argument of lm_robust()
# to change the vcov estimation
# extract parameters and covariance matrix
betas <- lapply(mod, coef)
vars <- lapply(mod, vcov)
# conduct statistical inference
summary(pool_mi( qhat = betas, u = vars ))
I'm no expert, but there is an issue with the passing of the weights to lm(). I know this is not an ideal situation, but I managed to get it to work by modifying the lm.cluster() function to hard code the weights pass and then just used my own.
lm.cluster <- function (data, formula, cluster, wgts=NULL, ...)
{
TAM::require_namespace_msg("multiwayvcov")
if(is.null(wgts)) {
mod <- stats::lm(data = data, formula = formula)
} else {
data$.weights <- wgts
mod <- stats::lm(data = data, formula = formula, weights=data$.weights)
}
if (length(cluster) > 1) {
v1 <- cluster
}
else {
v1 <- data[, cluster]
}
dfr <- data.frame(cluster = v1)
vcov2 <- multiwayvcov::cluster.vcov(model = mod, cluster = dfr)
res <- list(lm_res = mod, vcov = vcov2)
class(res) <- "lm.cluster"
return(res)
}
