Error in R: Non-conformable arrays, how to fix? - r

I am trying to create an effect plot for a Cox proportional hazards model:
fitC7 <- coxph(Surv(TimeDeath, event == 1) ~
strata(sex) * mutation + age
+ ns(BM1, 3),
data = data)
I created a new dataset as follows:
ND1a <- with(data, expand.grid(age = seq(30, 75, length.out = 40), mutation = factor(c("Yes", "No")), sex = factor(c("male", "female")), BM1 = 1.583926))
Then, I tried to use the predict function:
predict(fitC7, newdata = ND1a, type = "lp", se.fit = T)
However, I keep getting the error:
Error in newx - xmeans[match(newstrat, row.names(xmeans)), ] : non-conformable arrays
and I do not know how to correct this.
It does work when I put in a model without sex as a stratifier, e.g.,
fitC9 <- coxph(Surv(TimeDeath, event ==1) ~
sex * mutation + age +
ns(BM1, 3), data = data)
I hope someone can help me; I could not figure it out from previous question and answer threads.

Related

Getting error "invalid type (list) for variable" when running multiple models in a for loop: how to specify outcome/predictors?

For a study I am working on, I need to create bootstrapped datasets and inverse probability weights for each dataset, and then run a series of models for each of these datasets/weights. I am attempting to do this with a nested for-loop where the first part of the loop creates the weights and the nested loop runs a series of models, each with different outcome variables and/or predictors. I am running about 80 models for each bootstrapped dataset, hence the need for a more automated way to do this. Below is an example of what I am doing with some mock data:
# Creation of mock data
data <- data.frame("Severity" = as.factor(c(rep("None", 25), rep("Mild", 25), rep("Moderate", 25), rep("Severe", 25))), "Severity2" = as.factor(c(rep("None", 40), rep("Mild", 20), rep("Moderate", 20), rep("Severe", 20))), "Weight" = rnorm(100, mean = 160, sd = 30), "Age" = rnorm(100, mean = 40, sd = 7), "Gender" = as.factor(rbinom(100, size = 1, prob = 0.5)), "Tested" = as.factor(rbinom(100, size = 1, prob = 0.4)))
# Set untested rows to NA by index; ifelse() would replace the factor labels with integer codes
data$Severity[data$Tested == 0] <- NA
data$Severity2[data$Tested == 0] <- NA
data$Severity <- ordered(data$Severity, levels = c("None", "Mild", "Moderate", "Severe"))
data$Severity2 <- ordered(data$Severity2, levels = c("None", "Mild", "Moderate", "Severe"))
# Creating bootstrapped datasets
nboot <- 2
set.seed(10)
boot.samples <- lapply(1:nboot, function(i) {
data[base::sample(1:nrow(data), replace = TRUE),]
})
# Create empty list to store results later
coefs <- list()
# Setting up the outcomes/predictors of each of the models I will run
mod1 <- list(outcome = "Severity", preds = c("Weight", "Age"))
mod2 <- list(outcome = "Severity2", preds = c("Weight", "Age", "Gender"))
models <- list(mod1, mod2)
# Running the for-loop
for(i in 1:length(boot.samples)) {
#Setting up weight creation
null <- glm(formula = Tested ~ 1, family = "binomial", data = boot.samples[[i]])
full <- glm(formula = Tested ~ Age, family = "binomial", data = boot.samples[[i]])
step <- step(null, k = 2, direction = "forward", scope=list(lower = null, upper = full), trace = 0)
pd.combined <- stats::predict(step, type = "response")
numer.combined <- glm(Tested ~ 1, family = "binomial",
data = boot.samples[[i]])
pn.combined <- stats::predict(numer.combined, type = "response")
# Creating stabilized weights
boot.samples[[i]]$ipw <- ifelse(boot.samples[[i]]$Tested==0, ((1-pn.combined)/(1-pd.combined)), (pn.combined)/(pd.combined))
# Now running each model and storing the coefficients
for(j in 1:length(models)) {
outcome <- models[[j]][[1]] # Set the outcome name
predictors <- models[[j]][[2]] # Set the predictor names
model_results <- polr(boot.samples[[i]][,outcome] ~ boot.samples[[i]][, predictors], weights = boot.samples[[i]]$ipw, method = c("logistic"), Hess = TRUE) #Run the model
coefs[[j]] <- model_results$coefficients # Store regression model coefficients in list
}
}
The portion for creating the IPW weights works just fine, but I keep getting an error for the modeling portion that reads:
"Error in model.frame.default(formula = boot.samples[[i]][, outcome] ~ :
invalid type (list) for variable 'boot.samples[[i]][, predictors]'"
Based on the question asked and answered here: Error in model.frame.default ..... : invalid type (list) for variable, I know that the issue is with how I'm calling the outcomes and predictors in the model. I've tried lots of different ways to handle this, to no avail. I need to specify the outcome and predictors as I do because in my actual models the outcomes and predictors change with each model! Any ideas on how to deal with this would be greatly appreciated!
I've tried something like setting outcome <- boot.samples[[i]][,outcome] outside of the model and then just calling outcome in the model, but that gives me the same error.
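One common way around this kind of error is to build the formula from the variable names and hand the data frame to the model through data =, instead of indexing the data frame inside the formula. The following is a minimal sketch, not a drop-in replacement for the loop above: the data frame d, its placeholder ipw column, and the simulated values are assumptions made purely for illustration. reformulate() is in base R (stats) and polr() comes from MASS.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 200
sev_levels <- c("None", "Mild", "Moderate", "Severe")

# Simulated stand-in for one bootstrapped dataset (assumption, not the real data)
d <- data.frame(
  Severity = factor(sample(sev_levels, n, replace = TRUE),
                    levels = sev_levels, ordered = TRUE),
  Weight   = rnorm(n, mean = 160, sd = 30),
  Age      = rnorm(n, mean = 40, sd = 7),
  Gender   = factor(rbinom(n, size = 1, prob = 0.5)),
  ipw      = runif(n, min = 0.5, max = 2)  # placeholder weights, not real IPW
)
d$Severity2 <- d$Severity

# Each model is described by names only
models <- list(
  list(outcome = "Severity",  preds = c("Weight", "Age")),
  list(outcome = "Severity2", preds = c("Weight", "Age", "Gender"))
)

coefs <- lapply(models, function(m) {
  # Builds e.g. Severity ~ Weight + Age from the character vectors
  f <- reformulate(m$preds, response = m$outcome)
  fit <- polr(f, data = d, weights = ipw, method = "logistic", Hess = TRUE)
  coef(fit)
})

print(coefs)
```

Inside the original nested loop, the same idea would mean building f from models[[j]] and passing data = boot.samples[[i]], so that the weights column is looked up in that data frame.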

rootogram() error when checking for overdispersion in GAM

I have run the below GAM and am trying to plot a rootogram() using the countreg package to check for overdispersion, but get the error Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me (that the lengths of two vectors/objects do not match), but I am none the wiser as to how to fix it. Any help or suggestions would be appreciated. Has anyone had this problem previously and, if so, how did you fix it?
This may be arising due to a peculiarity in my data as I have never previously had a problem producing rootograms when using other datasets.
# I cannot fit a rootogram from the following GAM
> knots2 <- list(nMonth = c(0.5, 12.5))
> sup15 <- gam(Number ~ State + Virus + State*Virus + s(nMonth, bs = "cc", k = 12, by = Virus) + s(Time, k = 60, by = Virus),
data = supply.pad,
family = nb(),
method = "REML",
knots = knots2)
> root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
# But can fit a rootogram from the below GAM. Note that these are different datasets but pretty much the same code.
> knots1 <- list(month = c(0.5, 12.5))
> gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
data = rhdv.gp.pad,
family = nb(),
method = "REML",
knots = knots1)
> root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)

Generalized estimating equations working by themselves but not within functions (R)

I am trying to write a function to run GEE using the geepack package. It works fine "on its own" but not within a function; please see the example below:
library(geepack)
library(pstools)
df <- data.frame(study_id = c(1:20),
leptin = runif(20),
insulin = runif(20),
age = runif(20, min = 20, max = 45),
sex = sample(c(0,1), size = 20, replace = TRUE))
#Works
geepack::geeglm(leptin ~ insulin + age + sex, id = study_id, data = df)
#Doesn't work
model_function_covariates_gee <- function(x,y) {
M1 <- paste0(x, "~", y, "+ age + sex")
M1_fit <- geepack::geeglm(M1, id = study_id, data = df)
s <- summary(M1_fit)
return(s)
}
model_function_covariates_gee("leptin", "insulin")
Error message:
Error in mcall$formula[3] <- switch(match(length(sformula), c(0, 2, 3)), :
incompatible types (from language to character) in subassignment type fix
Does anyone know why this is? I've fiddled around with it but can't get it to change. Thanks in advance.
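The error message suggests that geeglm() received a character string where it expected a formula object, since paste0() returns plain text. A likely fix is to convert the string with as.formula() before fitting. The sketch below assumes that this is the only problem; build_gee_formula is a hypothetical helper name, and the fit is only attempted when geepack is installed.

```r
set.seed(42)
df <- data.frame(study_id = 1:20,
                 leptin   = runif(20),
                 insulin  = runif(20),
                 age      = runif(20, min = 20, max = 45),
                 sex      = sample(c(0, 1), size = 20, replace = TRUE))

# Hypothetical helper: turn variable names into a formula object, not a string
build_gee_formula <- function(x, y) {
  as.formula(paste(x, "~", y, "+ age + sex"))
}

f <- build_gee_formula("leptin", "insulin")
print(class(f))
print(f)

# Fit only if geepack is available on this machine
if (requireNamespace("geepack", quietly = TRUE)) {
  fit <- geepack::geeglm(f, id = study_id, data = df)
  print(summary(fit))
}
```

In the original function, the equivalent change would be wrapping the paste0() call in as.formula() before passing M1 to geeglm().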

How to add a covariate to a between-within design MANCOVA with R?

I want to run a between-within design MANCOVA with R, with two dependent variables (Planned and Unplanned), two between-subject variables (Genre [Male, Female] and Urb [Yes, No]), one within-subject variable (Period [Before, During]), and one covariate (BMI).
Here is what I've done (see here for similar calculation: https://stats.stackexchange.com/questions/183441/correct-way-to-perform-a-one-way-within-subjects-manova-in-r):
# Create dummy data
data <- data.frame(Quest_before_planned = sample(1:100, 10),
Quest_during_planned = sample(1:100, 10),
Quest_before_unplanned = sample(1:100, 10),
Quest_during_unplanned = sample(1:100, 10),
Genre = sample(rep(c("Male", "Female"), each = 5)),
Urb = sample(rep(c("Yes", "No"), each = 5)),
BMI = sample(1:100, 10))
# Define the within-subjects factor
# One row per response column, in the same order as the columns of data.model below
period <- as.factor(rep(c('before','during'), times = 2))
idata <- data.frame(period)
# Create the data structure for the linear model
data.model <- with(data, cbind(Quest_before_planned, Quest_during_planned,
Quest_before_unplanned, Quest_during_unplanned))
# Build the multivariate-linear model
mod.mlm <- lm(data.model ~ Genre * Urb, data = data)
# Run the MANOVA (Anova() comes from the car package)
library(car)
mav.blpaq <- Anova(mod.mlm, idata = idata, idesign = ~ period, type = 2)
print(mav.blpaq)
Thus, the between-within design MANOVA here works well. However, I failed to add a covariate (i.e., BMI) to this model. Do you know how I can achieve this?
N.B.: I also tried using the (great) mancova() function, which includes a covariate parameter; but with this function, I do not know how to specify that Period is a within-subject variable...
blpaq_macov <- mancova(data_tidy,
deps = c("Quest_planned", "Quest_unplanned"),
factors = c("Genre", "Period", "Urb"),
covs = "BMI",
multivar = "pillai")
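With the lm()/Anova() approach, one place a covariate can go is the right-hand side of the between-subjects formula. The sketch below only shows where BMI would enter; whether the resulting tests match the intended MANCOVA is a modelling question it does not settle. The simulated data frame dat, its larger balanced sample, and the object names resp and mod.cov are assumptions made for illustration, and the Anova() call runs only if car is installed.

```r
set.seed(1)
n <- 40

# Simulated, balanced stand-in data (assumption, not the real study data)
dat <- data.frame(
  Quest_before_planned   = rnorm(n, mean = 50, sd = 10),
  Quest_during_planned   = rnorm(n, mean = 50, sd = 10),
  Quest_before_unplanned = rnorm(n, mean = 50, sd = 10),
  Quest_during_unplanned = rnorm(n, mean = 50, sd = 10),
  Genre = factor(rep(c("Male", "Female"), each = n / 2)),
  Urb   = factor(rep(c("Yes", "No"), times = n / 2)),
  BMI   = rnorm(n, mean = 25, sd = 4)
)

# Response matrix: one column per repeated measure
resp <- with(dat, cbind(Quest_before_planned, Quest_during_planned,
                        Quest_before_unplanned, Quest_during_unplanned))

# Within-subject design: one row per response column, in column order
idata <- data.frame(period = factor(rep(c("before", "during"), times = 2)))

# The covariate is added as an extra between-subjects term
mod.cov <- lm(resp ~ Genre * Urb + BMI, data = dat)

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod.cov, idata = idata, idesign = ~ period, type = 2))
}
```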

Errors in segmented package: breakpoints confusion

Using the segmented package to create a piecewise linear regression I am seeing an error when I try to set my own breakpoints; it seems only when I try to set more than two.
(EDIT) Here is the code I am using:
# data
bullard <- structure(list(Rt = c(0, 4.0054, 25.1858, 27.9998, 35.7259, 39.0769,
45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.828, 111.1613,
114.2518, 121.8681, 146.0591, 148.8134, 164.6219, 176.522, 177.9578,
180.8773, 187.1846, 210.5131, 211.483, 230.2598, 262.3549, 266.2318,
303.3181, 329.4067, 335.0262, 337.8323, 343.1142, 352.2322, 367.8386,
380.09, 388.5412, 390.4162, 395.6409), Tem = c(15.248, 15.4523,
16.0761, 16.2013, 16.5914, 16.8777, 17.3545, 17.3877, 17.5307,
17.7079, 18.4177, 18.575, 19.8261, 19.9731, 20.4074, 21.2622,
21.4117, 22.1776, 23.4835, 23.6738, 23.9973, 24.4976, 25.7585,
26.0231, 28.5495, 30.8602, 31.3067, 37.3183, 39.2858, 39.4731,
39.6756, 39.9271, 40.6634, 42.3641, 43.9158, 44.1891, 44.3563,
44.5837)), .Names = c("Rt", "Tem"), class = "data.frame", row.names = c(NA,
-38L))
library(segmented)
# create a linear model
out.lm <- lm(Tem ~ Rt, data=bullard)
o<-segmented(out.lm, seg.Z=~Rt, psi=list(Rt=c(200,300)), control=seg.control(display=FALSE))
Using the psi option, I have tried the following:
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200)) -- OK
psi = list(x = c(200, 300)) -- OK
psi = list(x = c(100, 300)) -- OK
psi = list(x = c(120, 150, 300)) -- error 1 below
psi = list(x = c(120, 300)) -- OK
psi = list(x = c(120, 150)) -- OK
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200, 300)) -- error 2 below
(1) Error in segmented.lm(out.lm, seg.Z = ~Rt, psi = list(Rt = c(120, 150, :
only 1 datum in an interval: breakpoint(s) at the boundary or too close
(2) Error in diag(Cov[id, id]) : subscript out of bounds
I have already listed my data at this question, but as a guide the limits on the x data are about 0--400.
A second question that pertains to this one is: how do I actually fix the breakpoints using this segmented package?
The issue here seems to be poor error trapping in the segmented package. Having a look at the code for segmented.lm allows a bit of debugging. For example, in the case of psi = list(x = c(100, 200, 300)), an augmented linear model is fitted as shown below:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Call:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Coefficients:
(Intercept)           Rt        U1.Rt        U2.Rt        U3.Rt      psi1.Rt
   15.34303      0.04149      0.04591    742.74186   -742.74499      1.02252
    psi2.Rt      psi3.Rt
         NA           NA
As you can see, the fit has NA values which then result in a degenerate variance-covariance matrix (called Cov in the code). The function doesn't check for this and tries to pull out diagonal entries from Cov and fails with the error message shown. At least the first error, although perhaps not overly helpful, is caught by the function itself and suggests that the break-points are too close.
In the absence of better error trapping in the function, I think that all you can do is adopt a trial and error approach (and avoid break points which are too close). For example, psi = list(x = c(50, 200, 300)) seems to work ok.
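On the second question (holding the breakpoints fixed rather than estimating them), one option that does not rely on segmented at all is an ordinary lm() with hinge terms of the form pmax(x - b, 0) for each known breakpoint b. This is a sketch of that general technique under stated assumptions, not a description of how segmented works internally; the simulated data, the breakpoints 200 and 300, and the names sim, h1, h2 and fit.fixed are all made up for illustration.

```r
set.seed(1)

# Simulated piecewise-linear data with true breakpoints at 200 and 300
x <- seq(0, 400, length.out = 200)
y <- 15 + 0.04 * x + 0.05 * pmax(x - 200, 0) + 0.10 * pmax(x - 300, 0) +
  rnorm(length(x), sd = 0.2)
sim <- data.frame(x = x, y = y)

# Breakpoints are treated as known and fixed
breaks <- c(200, 300)
sim$h1 <- pmax(sim$x - breaks[1], 0)  # zero left of 200, linear to the right
sim$h2 <- pmax(sim$x - breaks[2], 0)  # zero left of 300, linear to the right

# The coefficient on x is the first slope; h1 and h2 are the slope changes
fit.fixed <- lm(y ~ x + h1 + h2, data = sim)
print(coef(fit.fixed))
```

Because the breakpoints are not estimated, this is a plain linear model, so the usual summary() and confint() output applies to the slopes and slope changes.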
If you use while and tryCatch, you can make the command repeat itself until it runs without an error (#jaySf). I'm guessing the run-to-run variation is down to the randomiser settings in the function, which can be seen in seg.control.
lm.model <- lm(xdat ~ ydat, data = x)
fitted.ok <- FALSE
attempts <- 0
# Retry until segmented() succeeds; the cap stops a persistent error from looping forever
while (!fitted.ok && attempts < 50) {
  attempts <- attempts + 1
  tryCatch({
    s <- segmented(lm.model, seg.Z = ~ydat, psi = NA)
    fitted.ok <- TRUE
  }, error = function(e) NULL)
}

Resources