I am trying to write a function to run GEE using the geepack package. It works fine "on its own" but not inside a function; see the example below:
library(geepack)
library(pstools)
df <- data.frame(study_id = c(1:20),
leptin = runif(20),
insulin = runif(20),
age = runif(20, min = 20, max = 45),
sex = sample(c(0,1), size = 20, replace = TRUE))
#Works
geepack::geeglm(leptin ~ insulin + age + sex, id = study_id, data = df)
#Doesn't work
model_function_covariates_gee <- function(x,y) {
M1 <- paste0(x, "~", y, "+ age + sex")
M1_fit <- geepack::geeglm(M1, id = study_id, data = df)
s <- summary(M1_fit)
return(s)
}
model_function_covariates_gee("leptin", "insulin")
Error message:
Error in mcall$formula[3] <- switch(match(length(sformula), c(0, 2, 3)), :
incompatible types (from language to character) in subassignment type fix
Does anyone know why this is? I've fiddled around with it but can't get it to change. Thanks in advance.
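The error points at geeglm() trying to modify its formula argument as a language object, which fails when that argument is the plain character string returned by paste0(). Converting the string with as.formula() before fitting should avoid this. The sketch below assumes that is the cause; the extra `data` argument is an addition for illustration, and the mock data mirrors the question's.

```r
library(geepack)

set.seed(1)
df <- data.frame(study_id = 1:20,
                 leptin   = runif(20),
                 insulin  = runif(20),
                 age      = runif(20, min = 20, max = 45),
                 sex      = sample(c(0, 1), size = 20, replace = TRUE))

model_function_covariates_gee <- function(x, y, data = df) {
  # paste0() only builds a character string; convert it to a formula object
  # so geeglm() receives the same kind of argument as in the direct call
  M1 <- stats::as.formula(paste0(x, " ~ ", y, " + age + sex"))
  M1_fit <- geepack::geeglm(M1, id = study_id, data = data)
  summary(M1_fit)
}

s <- model_function_covariates_gee("leptin", "insulin")
```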
I am trying to save a ggsurvplot with risk.table using ggsave. However, the output of ggsave is always just the risk table. I also tried other suggested approaches; none of them works.
library(data.table)
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.table(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
ggsurvplot(survfit, data = dt,
risk.table = TRUE)
ggsave("test.png")
The main issue is that a ggsurvplot object is a list of plots. Hence, when using ggsave, only the last plot (element) of the list is saved.
There is already a GitHub issue on that topic with several workarounds. For example, using one of the more recent suggestions, the following works fine for me:
library(survival)
library(survminer)
OS <- c(c(1:100), seq(1, 75, length = 50), c(1:50))
dead <- rep(1, times = 200)
variable <- c(rep(0, times = 100), rep(1, times = 50), rep(2, times = 50))
dt <- data.frame(OS = OS,
dead = dead,
variable = variable)
survfit <- survfit(Surv(OS, dead) ~ variable, data = dt)
# add an S3 method for grid.draw(), which ggsave() calls, so that all panels are drawn
grid.draw.ggsurvplot <- function(x){
survminer:::print.ggsurvplot(x, newpage = FALSE)
}
p <- ggsurvplot(survfit, data = dt, risk.table = TRUE)
ggsave("test.png", p, height = 6, width = 6)
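Another option that avoids defining a grid.draw() method is to open a graphics device yourself and print the object, since print() on a ggsurvplot arranges the survival curves and the risk table together. A sketch using the same mock data; the file name test_print.png and the size/resolution values are arbitrary choices.

```r
library(survival)
library(survminer)

dt <- data.frame(OS = c(1:100, seq(1, 75, length = 50), 1:50),
                 dead = 1,
                 variable = rep(0:2, times = c(100, 50, 50)))
fit <- survfit(Surv(OS, dead) ~ variable, data = dt)  # avoid masking survfit()
p <- ggsurvplot(fit, data = dt, risk.table = TRUE)

# Open a PNG device, let print() lay out curves + risk table, then close it
png("test_print.png", width = 6, height = 6, units = "in", res = 300)
print(p)
invisible(dev.off())
```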
For a study I am working on, I need to create bootstrapped datasets and inverse probability weights (IPW) for each dataset, and then run a series of models on each of these datasets/weights. I am attempting to do this with a nested for-loop: the outer loop creates the weights and the inner loop runs a series of models, each with different outcome variables and/or predictors. I am running about 80 models for each bootstrapped dataset, hence the need for a more automated approach. Below is an example of what I am doing with some mock data:
library(MASS) # provides polr()
# Creation of mock data
data <- data.frame("Severity" = as.factor(c(rep("None", 25), rep("Mild", 25), rep("Moderate", 25), rep("Severe", 25))),
                   "Severity2" = as.factor(c(rep("None", 40), rep("Mild", 20), rep("Moderate", 20), rep("Severe", 20))),
                   "Weight" = rnorm(100, mean = 160, sd = 30),
                   "Age" = rnorm(100, mean = 40, sd = 7),
                   "Gender" = as.factor(rbinom(100, size = 1, prob = 0.5)),
                   "Tested" = as.factor(rbinom(100, size = 1, prob = 0.4)))
# Set the outcome to NA for untested participants (indexing keeps the factor labels;
# ifelse() would return the underlying integer codes, which ordered() then turns into NA)
data$Severity[data$Tested == 0] <- NA
data$Severity2[data$Tested == 0] <- NA
data$Severity <- ordered(data$Severity, levels = c("None", "Mild", "Moderate", "Severe"))
data$Severity2 <- ordered(data$Severity2, levels = c("None", "Mild", "Moderate", "Severe"))
# Creating bootstrapped datasets
nboot <- 2
set.seed(10)
boot.samples <- lapply(1:nboot, function(i) {
data[base::sample(1:nrow(data), replace = TRUE),]
})
# Create empty list to store results later
coefs <- list()
# Setting up the outcomes/predictors of each of the models I will run
mod1 <- list(outcome = "Severity", preds = c("Weight", "Age"))
mod2 <- list(outcome = "Severity2", preds = c("Weight", "Age", "Gender"))
models <- list(mod1, mod2)
# Running the for-loop
for(i in 1:length(boot.samples)) {
#Setting up weight creation
null <- glm(formula = Tested ~ 1, family = "binomial", data = boot.samples[[i]])
full <- glm(formula = Tested ~ Age, family = "binomial", data = boot.samples[[i]])
step <- step(null, k = 2, direction = "forward", scope=list(lower = null, upper = full), trace = 0)
pd.combined <- stats::predict(step, type = "response")
numer.combined <- glm(Tested ~ 1, family = "binomial",
data = boot.samples[[i]])
pn.combined <- stats::predict(numer.combined, type = "response")
# Creating stabilized weights
boot.samples[[i]]$ipw <- ifelse(boot.samples[[i]]$Tested==0, ((1-pn.combined)/(1-pd.combined)), (pn.combined)/(pd.combined))
# Now running each model and storing the coefficients
for(j in 1:length(models)) {
outcome <- models[[j]][[1]] # Set the outcome name
predictors <- models[[j]][[2]] # Set the predictor names
model_results <- polr(boot.samples[[i]][,outcome] ~ boot.samples[[i]][, predictors], weights = boot.samples[[i]]$ipw, method = c("logistic"), Hess = TRUE) #Run the model
coefs[[j]] <- model_results$coefficients # Store regression model coefficients in list
}
}
The portion for creating the IPW weights works just fine, but I keep getting an error for the modeling portion that reads:
"Error in model.frame.default(formula = boot.samples[[i]][, outcome] ~ :
invalid type (list) for variable 'boot.samples[[i]][, predictors]'"
Based on the question and answer "Error in model.frame.default ..... : invalid type (list) for variable", I know that the issue is with how I'm passing the outcomes and predictors to the model. I've tried many different ways to handle this, to no avail. I need to specify the outcome and predictors this way because in my actual models the outcomes and predictors change with each model! Any ideas on how to deal with this would be greatly appreciated!
I've tried something like setting outcome <- boot.samples[[i]][,outcome] outside of the model and then just calling outcome in the model, but that gives me the same error.
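The error arises because boot.samples[[i]][, predictors] is a data frame (a list), which model.frame() cannot use as a single variable. A common alternative is to build the formula from the stored names with reformulate() and let polr() look the columns up via its data argument. The sketch below is an assumption-laden rework of the mock example, not the asker's exact pipeline: it uses named model specifications and stores results as coefs[[i]][[j]] so that later bootstrap samples do not overwrite earlier ones.

```r
library(MASS)  # polr()

set.seed(10)
sev_levels <- c("None", "Mild", "Moderate", "Severe")
data <- data.frame(
  Severity  = factor(rep(sev_levels, each = 25), levels = sev_levels, ordered = TRUE),
  Severity2 = factor(rep(sev_levels, times = c(40, 20, 20, 20)), levels = sev_levels, ordered = TRUE),
  Weight = rnorm(100, mean = 160, sd = 30),
  Age    = rnorm(100, mean = 40, sd = 7),
  Gender = factor(rbinom(100, size = 1, prob = 0.5)),
  Tested = factor(rbinom(100, size = 1, prob = 0.4))
)
# The outcome is only observed for tested participants
data$Severity[data$Tested == 0]  <- NA
data$Severity2[data$Tested == 0] <- NA

nboot <- 2
boot.samples <- lapply(seq_len(nboot), function(i) data[sample(nrow(data), replace = TRUE), ])

models <- list(
  list(outcome = "Severity",  preds = c("Weight", "Age")),
  list(outcome = "Severity2", preds = c("Weight", "Age", "Gender"))
)

coefs <- vector("list", nboot)  # coefs[[i]][[j]]: bootstrap sample i, model j
for (i in seq_along(boot.samples)) {
  d <- boot.samples[[i]]

  # Stabilized inverse probability weights, following the question's approach
  null <- glm(Tested ~ 1, family = binomial, data = d)
  full <- glm(Tested ~ Age, family = binomial, data = d)
  sel  <- step(null, scope = list(lower = null, upper = full),
               direction = "forward", k = 2, trace = 0)
  pd <- predict(sel, type = "response")
  pn <- predict(null, type = "response")
  d$ipw <- ifelse(d$Tested == 0, (1 - pn) / (1 - pd), pn / pd)

  coefs[[i]] <- vector("list", length(models))
  for (j in seq_along(models)) {
    # e.g. Severity ~ Weight + Age, built from strings; polr() finds the
    # columns (and the ipw weights) in `d` instead of d[, cols] in the formula
    f <- reformulate(models[[j]]$preds, response = models[[j]]$outcome)
    fit <- polr(f, data = d, weights = ipw, method = "logistic", Hess = TRUE)
    coefs[[i]][[j]] <- coef(fit)
  }
}
```

Rows with a missing outcome are dropped by polr()'s default na.action, so only tested participants enter each model.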
I have fitted the GAM below and am trying to plot a rootogram() with the countreg package to check for overdispersion, but I get the error Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me (the lengths of two vectors/objects do not match), but I am none the wiser as to how to fix it. Any help or suggestions would be appreciated. Has anyone had this problem before, and if so, how did you fix it?
This may be arising due to a peculiarity in my data as I have never previously had a problem producing rootograms when using other datasets.
# I cannot fit a rootogram from the following GAM
knots2 <- list(nMonth = c(0.5, 12.5))
sup15 <- gam(Number ~ State + Virus + State*Virus + s(nMonth, bs = "cc", k = 12, by = Virus) + s(Time, k = 60, by = Virus),
             data = supply.pad,
             family = nb(),
             method = "REML",
             knots = knots2)
root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
## Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
##   number of items to replace is not a multiple of replacement length
# But I can fit a rootogram from the GAM below. Note that these are different datasets but essentially the same code.
knots1 <- list(month = c(0.5, 12.5))
gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
             data = rhdv.gp.pad,
             family = nb(),
             method = "REML",
             knots = knots1)
root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)
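This does not explain the error, which appears to come from mgcv's predict.gam() internals (pstart, nsdf). If the goal is only to inspect a hanging rootogram, one workaround is to compute it by hand from the fitted means of the negative binomial GAM, without any prediction step. A sketch, assuming an mgcv nb() fit; the helpers nb_rootogram_table() and plot_hanging_rootogram() are hypothetical, and the data are simulated.

```r
library(mgcv)

# Hypothetical helper (not part of countreg): observed vs expected count
# frequencies for an mgcv negative binomial GAM, from the fitted means.
nb_rootogram_table <- function(fit, max_count = max(fit$y)) {
  mu     <- fitted(fit)                # fitted means on the response scale
  theta  <- fit$family$getTheta(TRUE)  # estimated NB size parameter
  counts <- 0:max_count
  expected <- vapply(counts, function(k) sum(dnbinom(k, size = theta, mu = mu)),
                     numeric(1))
  observed <- tabulate(fit$y + 1, nbins = max_count + 1)  # frequencies of 0..max_count
  data.frame(count = counts, observed = observed, expected = expected)
}

# Hanging rootogram: bars of height sqrt(observed) hang from the sqrt(expected) curve
plot_hanging_rootogram <- function(tab) {
  top    <- sqrt(tab$expected)
  bottom <- top - sqrt(tab$observed)
  plot(tab$count, top, type = "n", ylim = range(0, top, bottom),
       xlab = "Count", ylab = "sqrt(Frequency)")
  rect(tab$count - 0.45, bottom, tab$count + 0.45, top, col = "grey80")
  lines(tab$count, top, type = "b", col = "red", pch = 19)
  abline(h = 0, lty = 2)
}

# Usage with simulated negative binomial data
set.seed(1)
sim <- data.frame(x = runif(200))
sim$y <- rnbinom(200, size = 2, mu = exp(1 + sim$x))
m <- gam(y ~ s(x, k = 5), family = nb(), data = sim, method = "REML")
tab <- nb_rootogram_table(m)
plot_hanging_rootogram(tab)
```

Bars ending well above or below the dashed zero line indicate counts the model under- or over-predicts.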
I am trying to create an effect plot for a Cox proportional hazards model:
fitC7 <- coxph(Surv(TimeDeath, event == 1) ~
strata(sex) * mutation + age
+ ns(BM1, 3),
data = data)
I created a new dataset as follows:
ND1a <- with(data, expand.grid(age = seq(30, 75, length.out = 40), mutation = factor(c("Yes", "No")), sex = factor(c("male", "female")), BM1 = 1.583926))
Then, I tried to use the predict function:
predict(fitC7, newdata = ND1a, type = "lp", se.fit = TRUE)
However, I keep getting the error:
Error in newx - xmeans[match(newstrat, row.names(xmeans)), ] : non-conformable arrays
and I do not know how to correct this.
It does work when I put in a model without sex as a stratifier, e.g.,
fitC9 <- coxph(Surv(TimeDeath, event ==1) ~
sex * mutation + age +
ns(BM1, 3), data = data)
I hope someone can help; I could not figure it out from previous Q&A threads.
library(lme4)
dummy <- as.data.frame(cbind(speed = rpois(100, 10), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
dummy2 <- as.data.frame(cbind(speed = c(rnbinom(50, 10, 0.6), rnbinom(50, 10, 0.1)), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
poisson <- glmer(speed~pop*season + (1|id),
data=dummy, family="poisson")
neg.bin <- glmer.nb(speed ~ pop*season + (1|id),
data=dummy2, control=glmerControl(optimizer="bobyqa"))
When I run a script creating a Poisson model before a negative binomial model using the lme4 package, I get the following error when running the neg.bin model:
Error in family$family : $ operator not defined for this S4 class
However, if I run the models in the opposite order, I don't get the error message.
library(lme4)
dummy <- as.data.frame(cbind(speed = rpois(100, 10), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
dummy2 <- as.data.frame(cbind(speed = c(rnbinom(50, 10, 0.6), rnbinom(50, 10, 0.1)), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
neg.bin <- glmer.nb(speed ~ pop*season + (1|id),
data=dummy2, control=glmerControl(optimizer="bobyqa"))
poisson <- glmer(speed~pop*season + (1|id),
data=dummy, family="poisson")
The neg.bin model in this example does have convergence warnings, but the same pattern occurs with my actual models, which converge fine. How is running the Poisson model first affecting the neg.bin model?
This happens because you have masked the R function poisson (from the stats package) by naming your model poisson. The following works fine (apart from the convergence warning for neg.bin):
library(lme4)
set.seed(0)
dummy <- as.data.frame(cbind(speed = rpois(100, 10), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
dummy2 <- as.data.frame(cbind(speed = c(rnbinom(50, 10, 0.6), rnbinom(50, 10, 0.1)), pop = rep(1:4, each = 25), season = rep(1:2, each = 50), id = seq(1, 100, by = 1)))
## use a different name for your model, say `poisson_fit`
poisson_fit <- glmer(speed~pop*season + (1|id),
data=dummy, family="poisson")
negbin_fit <- glmer.nb(speed ~ pop*season + (1|id),
data=dummy2, control=glmerControl(optimizer="bobyqa"))
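Here is the issue. Among the very first lines of glmer.nb there is this line:
mc$family <- quote(poisson)
The symbol poisson is later evaluated in the calling environment, so if you have masked poisson, it finds your fitted model (an S4 glmerMod object) instead of the function poisson from the stats package, hence the error "$ operator not defined for this S4 class".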
Ben has just fixed this issue by changing that line to:
mc$family <- quote(stats::poisson)
My original observation about family = "poisson" and match.fun is not the real issue here. It only explains why routines like glm and mgcv::gam accept the family as a string.
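The masking mechanism can be reproduced in base R without lme4. In this sketch the mask is a character string rather than a fitted glmerMod, so it only shows which object each lookup finds, not the exact S4 error: evaluating the bare symbol (as quote(poisson) is) finds the mask, the namespaced stats::poisson does not, and a direct call poisson() still works because R skips non-function objects when looking up a function to call.

```r
# Mask stats::poisson with a non-function object, as `poisson <- glmer(...)` does
poisson <- "not a family"

# Evaluating the bare symbol (like glmer.nb's quote(poisson)) finds the mask
masked <- eval(quote(poisson))

# The namespaced symbol always reaches the function in stats
fam <- eval(quote(stats::poisson))

# A direct call still works: function lookup skips non-function bindings
direct <- poisson()

rm(poisson)  # remove the mask
```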