Error in model.frame.default: variable lengths differ, R predict function - r

This is not a new question, I have seen several proposed solutions elsewhere and have tried them, none works, so I ask.
How can I fix this error? I am using R version 3.5.3 (2019-03-11)
Error in model.frame.default(data = ov_val, formula = Surv(time = ov_dev$futime, : variable lengths differ (found for 'rx')
Here is a reproducible example:
library(survival)
library(survminer)
library(dplyr)
# Create fake development dataset
ov_dev <- ovarian[1:13,]
# Create fake validation dataset
ov_val <- ovarian[13:26,]
# Run cox model
fit.coxph <- coxph(Surv(time = ov_dev$futime, event = ov_dev$fustat) ~ rx + resid.ds + age + ecog.ps, data = ov_dev)
summary(fit.coxph)
# Where error occurs
p <- log(predict(fit.coxph, newdata = ov_val, type = "expected"))

I think this has happened because you have used ov_dev$futime and ov_dev$fustat in your model specification rather than just using futime and fustat. That means that when you come to predict, the model is using the ov_dev data for the dependent variable but ov_val for the independent variables, which are of different length (13 versus 14). Just remove the data frame prefix and trust the data parameter:
library(survival)
library(survminer)
library(dplyr)
# Create fake development dataset
ov_dev <- ovarian[1:13,]
# Create fake validation dataset
ov_val <- ovarian[13:26,]
# Run cox model
fit.coxph <- coxph(Surv(futime, fustat) ~ rx + resid.ds + age + ecog.ps,
data = ov_dev)
p <- log(predict(fit.coxph, newdata = ov_val, type = "expected"))
p
#> [1] 0.4272783 -0.1486577 -1.8988833 -1.1887086 -0.8849632 -1.3374428
#> [7] -1.2294725 -1.5021708 -0.3264792 0.5633839 -3.0457613 -2.2476071
#> [13] -1.6754877 -3.0691996
Created on 2020-08-19 by the reprex package (v0.3.0)

Related

Getting an interaction plot from a pooled lme model with mids object

Preface - I really hope this makes sense!
I ran a linear-mixed effect model using an imputed dataset (FYI, the data is a mids object imputed using mice). The model has a three-way interaction with 3 continuous variables. I am now trying to plot the interaction using the interactions::interact_plot function. However, I'm receiving an error when I run the plot code, which I believe is due to the fact that the model came from a mids object and not a data frame. Does anyone know how to address this error or if there's a better way to get the plot that I'm trying to get?
Thanks very much in advance!
MIDmod1 <- with(data = df.mids, exp = lmer(GC ~ Age + Sex + Edu + Stress*Time*HLI + (1|ID)))
summary(pool(MIDmod1))
interact_plot(
model=MIDmod1,
pred = Time,
modx=Stress,
mod2=HLI,
data = df.mids,
interval=TRUE,
y.label='Global cognition composite score',
modx.labels=c('Low Baseline Stress (-1SD)','Moderate Baseline Stress (Mean)', 'High Baseline Stress (+1SD)'),
mod2.labels=c('Low HLI (-1SD)', 'Moderate HLI (Mean)', 'High HLI (+1SD)'),
legend.main='') + ylim(-2,2)
Error:
Error in rep(1, times = nrow(data)) : invalid 'times' argument
Note - I also get an error if I don't include the data argument (optional argument for this function).
Error in formula.default(object, env = baseenv()) : invalid formula
BTW - I am able to generate the plot when the model comes from a data frame - an example of what this should look like is included here: 1
Sorry, but it won’t be that easy. Multiple imputation object will definitely require special treatment, and none of the many R packages which can plot interactions are likely to work out of hte box.
Here’s a minimal example, adapted from the multiple imputation vignette of the marginaleffects package. (Disclaimer: I am the author.)
library(mice)
library(lme4)
library(ggplot2)
library(marginaleffects)
# insert missing data in an existing dataset and impute
iris_miss <- iris
iris_miss$Sepal.Width[sample(1:nrow(iris), 20)] <- NA
iris_mice <- mice(iris_miss, m = 20, printFlag = FALSE, .Random.seed = 1024)
iris_mice <- complete(iris_mice, "all")
# fit a model on 1 imputed datatset and use the `plot_predictions()` function
# with the `draw=FALSE` argument to extract the data that we want to plot
fit <- function(dat) {
mod <- lmer(Sepal.Width ~ Petal.Width * Petal.Length + (1 | Species), data = dat)
out <- plot_predictions(mod, condition = list("Petal.Width", "Petal.Length" = "threenum"), draw = FALSE)
# `mice` requires a unique row identifier called "term"
out$term <- out$rowid
class(out) <- c("custom", class(out))
return(out)
}
# `tidy.custom()` is needed by `mice` to combine datasets, but the output of fit() also has
# the right structure and column names, so it is useless
tidy.custom <- function(x, ...) return(x)
# Fit on each imputation
mod_mice <- lapply(iris_mice, fit)
# Pool
mod_pool <- pool(mod_mice)$pooled
# Merge back some of the covariates
datplot <- data.frame(mod_pool, mod_mice[[1]][, c("Petal.Width", "Petal.Length")])
# Plot
ggplot(datplot, aes(Petal.Width, estimate, color = Petal.Length)) +
geom_line() +
theme_minimal()

plotting an interaction term in moderated regression using MICE imputation

I'm using imputed data to test a series of regression models, including some moderation models.
Imputation
imp_data <- mice(data,m=20,maxit=20,meth='cart',seed=12345)
I then convert this to long format so I can recode / sum variables as needed, beore turning back to mids format
impdatlong_mids<-as.mids(impdat_long)
Example model:
model1 <- with(impdatlong_mids,
lm(Outcome ~ p1_sex + p2 + p3 + p4
+ p5+ p6+ p7+ p8+ p9+ p10
+ p11+ p1_sex*p12+ p1_sex*p13 + p14)
in non-imputed data, to create a graphic representation of the significant ineraction, I'd use (e.g.)
interact_plot (model=model1, pred = p1_sex, modx = p12)
This doesn't work with imputed data / mids objects.
Has anyone plotted an interaction using imputed data, and able to help or share examples?
Thanks
EDIT: Reproducible example
library(tidyverse)
library(interactions)
library(mice)
# library(reprex) does not work with this
set.seed(42)
options(warn=-1)
#---------------------------------------#
# Data preparations
# loading an editing data
d <- mtcars
d <- d %>% mutate_at(c('cyl','am'),factor)
# create missing data and impute it
mi_d <- d
nr_of_NAs <- 30
for (i in 1:nr_of_NAs) {
mi_d[sample(nrow(mi_d),1),sample(ncol(mi_d),1)] <- NA
}
mi_d <- mice(mi_d, m=2, maxit=2)
#---------------------------------------#
# regressions
#not imputed
lm_d <- lm(qsec ~ cyl*am + mpg*disp, data=d)
#imputed dataset
lm_mi <- with(mi_d,lm(qsec ~ cyl*am + mpg*disp))
lm_mi_pool <- pool(lm_mi)
#---------------------------------------#
# interaction plots
# not imputed
#continuous
interactions::interact_plot(lm_d, pred=mpg,modx=disp, interval=T,int.width=0.3)
#categorical
interactions::cat_plot(lm_d, pred = cyl, modx = am)
#---------------------------------------#
# interaction plots
# imputed
#continuous
interactions::interact_plot(lm_mi_pool, pred=mpg,modx=disp, interval=T,int.width=0.3)
# Error in model.frame.default(model) : object is not a matrix
#categorical
interactions::cat_plot(lm_mi_pool, pred = cyl, modx = am)
# Error in model.frame.default(model) : object is not a matrix
The problem seems to be that neither interact_plot, cat_plot or any other available package allows for (at least categorical) interaction plotting with objects of class mipo or pooled regression outputs.
I am using the walking data from the mice package as an example. One way to get the interaction plot (well version of one type of interaction plot) is to use the gtsummary package. Under the hood it will take the model1 use pool() from mice to average over the models and then use a combo of tbl_regression() and plot() to output a plot of the coefficients in the model. The tbl_regression() function is what is calling the pool() function.
library(mice)
library(dplyr)
library(gtsummary)
imp_data <- mice(mice::walking,m=20,maxit=20,meth='cart',seed=12345)
model1 <- with(imp_data,
lm(age ~ sex*YA))
model1 %>%
tbl_regression() %>%
plot()
The package emmeans allows you to extract interaction effects from a mira object. Here is a gentle introduction. After that, the interactions can be plotted with appropriate ggplot. This example is for the categorical variables but could be extended to the continous case - after the emmeans part things get relatively straighforward.
library(ggplot2)
library(ggstance)
library(emmeans)
library(khroma)
library(jtools)
lm_mi <- with(mi_d,lm(qsec ~ gear*carb))
#extracting interaction effects
emcatcat <- emmeans(lm_mi, ~gear*carb)
tidy <- as_tibble(emcatcat)
#plotting
pd <- position_dodge(0.5)
ggplot(tidy, aes(y=gear, x=emmean, colour=carb)) +
geom_linerangeh(aes(xmin=lower.CL, xmax=upper.CL), position=pd,size = 2) +
geom_point(position=pd,size = 4)+
ggtitle('Interactions') +
labs (x = "aggreageted interaction effect") +
scale_color_bright() +
theme_nice()
this can be extended to a three-way interaction plot with facet_grid as long as you have a third categorical interaction term.

R: Filtering data in plm

I have a pdata.frame for 14 years x 89 observations and 10 variables + 4 dummies.
Those dummies variables are only for filtering (when necessary) my data.
When using Stata, I just add an "if VAR==1" at the end of my code.
How to use this with plm package in R?
Examples
Stata code
quietly xtreg y x1 x2 if x3==1, fe
R code
plm( y ~ x1 + x2, data = PANEL, model = "within")
Must I create separate panels, already filtered data, or is it possible to do it while running plm?
You can either use the subset option in plm (subset=) or you subset the data before fitting it
Using the dataset from the package, subset on region ==6,
library(plm)
data("Produc", package = "plm")
fit1 = plm(gsp ~ hwy + pc, data = Produc, subset = region == 6)
fit2 = plm(gsp ~ hwy + pc, data = subset(Produc, region == 6))
identical(coefficients(fit1), coefficients(fit2))

Running Cox.ph model with GAMM mixed models in R

I am new in using GAM and splines. I am running a survival model in which I want to model the Time to event with the age of the subjects controlling by two variables. Here is the example using a conventional survival model with coxph:
library(survival)
fit_cox<-coxph(Surv(time, event)~ age+ var1 + var2, data=mydata)
I suspect that the relationship between var1 and var2 with the outcome is not linear and also I am thinking that I can include random effects in my model (moving to mixed effect models gamm).
I have tried this syntax:
library(mgcv)
fit_surv<-Surv(time, event)
fit_gam<-gam(fit_surv ~ age + s(var1) + s(var2), data = mydata, family = cox.ph())
And to include the random effects:
library(gamm4)
fit_gamm <- gamm4(fit_surv ~ age + s(var1) + s(var2), random = ~(1 | ID), data = mydata, family = cox.ph)
My problems are:
1. In fit_gam I do not know how to make a summary of this model and to see the coefficients table and plot the model. This error came to me:
summary(fit_gam)
"Error in Ops.Surv(w, object$y) : Invalid operation on a survival time"
In fit_gamm I could not run the model because some error in syntaxis is made or maybe I could not include a surv? The error is:
"Error in ncol(x) : object 'x' not found"
Thank you in advance!
As mentioned in the comments, simple gaussian frailties (gaussian random intercept) can be specified directly within the mgcv::gam call, e.g. by adding ... + s(ID, bs = "re") + ... to your formula (note that ID has to be a factor variable).
Alternatively, you can transform the data to the so called Piece-wise Exponential Data (PED) format and fit the model using any GA(M)M software, which are then called Piece-wise exponential Additive Mixed Models (PAMM). Here is an example:
library(coxme)
library(mgcv)
library(pammtools)
lung <- lung %>% mutate(inst = as.factor(inst)) %>% na.omit()
## cox model with gaussian frailty
cme <- coxme(Surv(time, status) ~ ph.ecog + (1|inst), data=lung)
## pamm with gaussian frailty
ped <- lung %>% as_ped(Surv(time, status)~., id="id")
pam <- gam(ped_status ~ s(tend) + ph.ecog + s(inst, bs = "re"),
data = ped, family = poisson(), offset = offset)
## visualize random effect:
gg_re(pam)
# compare coxme and pamm estimates:
re <- tidy_re(pam)
plot(cme$frail$inst, re$fit, las=1, xlab="Frailty (cox)", ylab="Frailty (PAM)")
abline(0, 1)
## with gamm4
library(gamm4)
#> Loading required package: Matrix
#> Loading required package: lme4
#>
#> Attaching package: 'lme4'
#> The following object is masked from 'package:nlme':
#>
#> lmList
#> This is gamm4 0.2-5
pam2 <- gamm4(ped_status ~ s(tend) + ph.ecog, random = ~(1|inst),
family = poisson(), offset = ped$offset, data = ped)
lattice::qqmath(ranef(pam2$mer)$inst[, 1])
Created on 2018-12-08 by the reprex package (v0.2.1)

sjt.lmer displaying incorrect p-values

I've just noticed that sjt.lmer tables are displaying incorrect p-values, e.g., p-values that do not reflect the model summary. This appears to be a new-ish issue, as this worked fine last month?
Using the provided data and code in the package vignette
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(lme4)
library(sjstats)
load sample data
data(efc)
prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
efc$care.level <- rec(efc$n4pstu, rec = "0=0;1=1;2=2;3:4=4",
val.labels = c("none", "I", "II", "III"))
data frame for fitted model
mydf <- data.frame(
neg_c_7 = efc$neg_c_7,
sex = to_factor(efc$c161sex),
c12hour = efc$c12hour,
barthel = efc$barthtot,
education = to_factor(efc$c172code),
grp = efc$grp,
carelevel = to_factor(efc$care.level)
)
fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1)
p_value(fit1, p.kr =TRUE)
model summary
p_value summary
sjt.lmer output does not show these p-values??
Note that the first summary comes from a model fitted with lmerTest, which computes p-values with df based on Satterthwaite approximation (see first line in output).
p_value(), however, with p.kr = TRUE, uses the Kenward-Roger approximation from package pbkrtest, which is a bit more conservative.
Your output from sjt.lmer() seems to be messed up somehow, and I can't reproduce it with your example. My output looks ok:

Resources