Creating Survival Trees with MST package: Undefined Columns Error? - r

I am trying to create a survival tree with the MST package in R, following this paper.
I replicated the paper's example with randomly generated data and it works fine. I then adjusted my own data to fit the same model: it has the same columns and the same data types.
I keep getting this error:
Error in `[.data.frame`(mf_data[col.split.var], , 3) : undefined columns selected
with the following line of code:
fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
I have looked through all of the documentation and didn't find anything, and I can't understand why this error appears.
The code from the paper looks like this:
set.seed(186117)
data <- rmultime(N = 200, K = 4, beta = c(-1, 0.8, 0.8, 0, 0),cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential", rho = 0.65)$dat
test <- rmultime(N = 100, K = 4, beta = c(-1, 0.8, 0.8, 0, 0), cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential",rho = 0.65)$dat
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")

I tried running your code and I do get an error, although not the one you report. After looking at it, I'm fairly sure you need to use the [edit] feature of SO to correct your question.
> fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
Error: unexpected '|' in "fit <- MST(formula = Surv(time,status) ~ x1 + |"
The formula given is obviously wrong and there is an unnecessary closing parenthesis. I am able to get the error you report with:
> fit <- MST(formula = Surv(time,status) ~ x1 | id, data = data)
[1] "No test sample supplied, changed selection.method = 'bootstrap'"
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
.... but not with the original code:
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
I also see an error with x1 + x2 | id on the RHS of the formula, but not with three variables:
> fit <- MST(formula = Surv(time, status) ~ x1 +x2 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
> fit <- MST(formula = Surv(time, status) ~ x1 +x2+x3| id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
So I'm thinking this is a bug that the developers had not anticipated. Here's how to obtain the email address needed to report it:
> maintainer("MST")
[1] "Peter Calhoun <calhoun.peter#gmail.com>"
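To illustrate why the number of covariates could matter here, the sketch below reproduces the same class of error in base R: asking for column 3 of a data frame that has fewer than three columns. This is only a guess at the mechanism suggested by the error message; the data frames are made up for illustration and this is not the MST source code.

```r
# Hypothetical stand-ins for a table of splitting variables
two_vars   <- data.frame(x1 = rnorm(5), x2 = rnorm(5))
three_vars <- data.frame(x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5))

# Selecting a third column fails when only two columns exist ...
res_two <- tryCatch(two_vars[, 3], error = function(e) conditionMessage(e))

# ... but succeeds as soon as a third column is present
res_three <- tryCatch(three_vars[, 3], error = function(e) conditionMessage(e))

print(res_two)    # the error message, captured as a character string
print(res_three)  # a numeric vector of length 5
```

If something like this happens inside the package, it would explain why formulas with one or two covariates fail while three or four work.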

Related

nlme with correlation structure not fitting and crashes R

I have a mixed model with a non-linear term, so I would like to use the R package nlme instead of lme. However, switching to nlme, even without adding anything to the model, causes Rstudio and R to crash.
I have found that even generated data, which can easily be fitted using lme, causes this behaviour (on my computer).
Let's start by loading the libraries and setting up a data.frame with the grouping id and spatial coordinate x.
library(nlme)
nid <- 300
nx <- 10
data <- expand.grid(
x = seq(nx),
id = seq(nid)
)
Now, let's add correlated error and uncorrelated error as separate columns, as well as a random intercept value per id. The output of arima.sim requires a normalisation step.
data$ec <- c(
  replicate(
    nid,
    as.numeric(
      arima.sim(
        model = list(
          order = c(1, 0, 0),
          ar = 0.5
        ),
        n = nx
      )
    )
  )
)
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)
data$random <- rep(rnorm(nid), each = nx)
Now, we can create 3 dependent variables, for 3 models. The first is a mixed model with uncorrelated (regular) error. The second includes an exponential (AR1) correlation structure. The third combines both. I am adding an intercept of 1, an sd of the random effect of 2 and an sd of the total residual error of 3.
data$y1 <- 1 + 2 * data$random + 3 * data$eu
data$y2 <- 1 + 2 * data$random + 3 * data$ec
data$y3 <- 1 + 2 * data$random + sqrt(8) * data$ec + sqrt(1) * data$eu
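As a quick check on the third construction: the variances of two independent unit-variance error terms add, so 8 + 1 = 9 and the total residual sd is 3, matching the other two models. A minimal sketch, using independent standard normal draws as stand-ins for ec and eu (an assumption; the real ec is an AR(1) series rescaled to unit sd):

```r
set.seed(42)
n <- 1e5

# Stand-ins for the correlated and uncorrelated unit-variance errors
ec_standin <- rnorm(n)
eu_standin <- rnorm(n)

# Same weights as in y3: variance 8 from ec plus variance 1 from eu
resid3 <- sqrt(8) * ec_standin + sqrt(1) * eu_standin

sd_resid3 <- sd(resid3)
print(sd_resid3)  # should be close to 3
```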
All of the following lme models fit without problem, giving the expected result.
l1 <- lme(
fixed = y1 ~ 1,
random = ~ 1 | id,
data = data,
method = "ML"
)
l2 <- lme(
fixed = y2 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id
),
data = data,
method = "ML"
)
l3 <- lme(
fixed = y3 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id,
nugget = TRUE
),
data = data,
method = "ML"
)
As far as I know, the following nlme code specifies exactly the same models as above. The first runs without issues. But the ones with a correlation structure crash R / RStudio. No warning or error message is provided. Fiddling with the arguments and with nlmeControl does not help, though I do think nlmeControl could be the place to search for a solution.
nlme(
model = y1 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
data = data,
start = list(
fixed = fixed.effects(l1),
random = setNames(random.effects(l1), "b0")
),
method = "ML"
)
nlme(
model = y2 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x
),
data = data,
start = list(
fixed = fixed.effects(l2),
random = setNames(random.effects(l2), "b0")
),
method = "ML"
)
nlme(
model = y3 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
group = ~ id,
correlation = corExp(
form = ~ x,
nugget = TRUE
),
data = data,
start = list(
fixed = fixed.effects(l3),
random = setNames(random.effects(l3), "b0")
),
method = "ML"
)
Has anyone experienced this before? Does my example code give the same problem on your computer? What are good strategies to change nlmeControl to attempt to remedy this?

rootogram() error when checking for overdispersion in GAM

I have run the below GAM and am trying to plot a rootogram() using the countreg package to check for overdispersion, but get the error Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp : number of items to replace is not a multiple of replacement length.
I understand what the error message is telling me (the lengths of two vectors/objects do not match), but I am none the wiser as to how to fix it. Any help or suggestions would be appreciated. Has anyone had this problem before and, if so, how did you fix it?
This may be arising due to a peculiarity in my data as I have never previously had a problem producing rootograms when using other datasets.
# I cannot fit a rootogram from the following GAM
> knots2 <- list(nMonth = c(0.5, 12.5))
> sup15 <- gam(Number ~ State + Virus + State*Virus + s(nMonth, bs = "cc", k = 12, by = Virus) + s(Time, k = 60, by = Virus),
data = supply.pad,
family = nb(),
method = "REML",
knots = knots2)
> root_nb <- rootogram(sup15, style = "hanging", plot = FALSE)
Error in X[, pstart[i] - 1 + 1:object$nsdf[i]] <- Xp :
number of items to replace is not a multiple of replacement length
# But can fit a rootogram from the below GAM. Note that these are different datasets but pretty much the same code.
> knots1 <- list(month = c(0.5, 12.5))
> gam10 <- gam(n ~ State + s(month, bs = "cc", k = 12) + s(time),
data = rhdv.gp.pad,
family = nb(),
method = "REML",
knots = knots1)
> root_nb1 <- rootogram(gam10, style = "hanging", plot = FALSE)

Problem with updating terms in the multinom function

I am trying to add1 all interaction terms on top of a multinomial baseline model using multinom() but it shows the error
trying + x1:x2
Error in if (trace) { : argument is not interpretable as logical
Called from: nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE,
censored = censored, rang = 0, ...)
What is the problem here? I appreciate any input. Here is a reproducible example:
require(nnet)
data <- data.frame(y=sample(1:3, 24, replace = TRUE),
x1 = c(rep(1,12), rep(2,12)),
x2 = rep(c(rep(1,4), rep(2,4), rep(3,4)),2),
x3=rnorm(24),
z1 = sample(1:10, 24, replace = TRUE))
m0 <- multinom(y ~ x1 + x2 + x3 + z1, data = data)
m1 <- add1(m0, scope = .~. + .^2, test="Chisq")
My end goal is to see which terms are appropriate to drop by later adding the line m1[order(add1.m1$'Pr(>Chi)'),].

mgcv::gamm() and MuMIn::dredge() errors

I've been trying to fit multiple GAMs using the package mgcv within a function, and crudely select the most appropriate model through model selection procedures. But my function runs the first model then doesn't seem to recognise the input data dat again.
I get the error
Error in is.data.frame(data) : object 'dat' not found.
I think this is a scoping problem and I've looked here, and here for help but cannot figure it out.
Code and data are as follows (hopefully reproducible):
https://github.com/cwaldock1/Help/blob/master/test_gam.csv
library(mgcv)
library(MuMIn) # provides dredge() and get.models()
# Function to fit multiple models
best.mod <- function(dat) {
  # Set up control structure
  ctrl <- list(niterEM = 0, msVerbose = TRUE, optimMethod = "L-BFGS-B")
  # AR(1)
  m1 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1|Year, p = 1),
                               control = ctrl)), subset = 1)[[1]]
  # AR(2)
  m2 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1|Year, p = 2),
                               control = ctrl)), subset = 1)[[1]]
  # AR(3)
  m3 <- get.models(dredge(gamm(Temp ~ s(Month, bs = "cc") + s(Date, bs = 'cr') + Year,
                               data = dat, correlation = corARMA(form = ~ 1|Year, p = 3),
                               control = ctrl)), subset = 1)[[1]]
  ### Select best model to work with based on unselective AIC criteria
  if (AIC(m2$lme) > AIC(m1$lme)) { mod = m1 } else { mod = m2 }
  if (AIC(mod$lme) > AIC(m3$lme)) { mod = m3 } else { mod = mod }
  return(mod$gam)
}
mod2 <- best.mod(dat = test_gam)
Any help would be greatly appreciated.
Thanks,
Conor
get.models evaluates in the model's formula environment, which in gamm is
(always?) .GlobalEnv, while it should be the function's environment (i.e.
sys.frame(sys.nframe())).
So, instead of
get.models(ms, 1)
use
eval(getCall(ms, 1))
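The scoping effect can be seen without mgcv or MuMIn. The sketch below is a base-R analogy only: fit_inside and dat_local_only are hypothetical names, and lm stands in for gamm. Re-evaluating the stored call in the global environment fails because the data argument exists only inside the function, whereas a plain eval() from within the function finds it.

```r
fit_inside <- function(dat_local_only) {
  m  <- lm(y ~ x, data = dat_local_only)
  cl <- getCall(m)  # the unevaluated call: lm(formula = y ~ x, data = dat_local_only)

  # Evaluating the call in the global environment: 'dat_local_only' is not there
  global_try <- tryCatch(
    eval(cl, envir = globalenv()),
    error = function(e) conditionMessage(e)
  )

  # eval() defaults to the calling frame, i.e. this function's environment
  local_fit <- eval(cl)

  list(global_try = global_try, local_fit = local_fit)
}

set.seed(1)
out <- fit_inside(data.frame(x = 1:10, y = (1:10) + rnorm(10)))
print(out$global_try)      # an "object ... not found" message
print(coef(out$local_fit)) # the refit succeeds inside the function
```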

Errors in segmented package: breakpoints confusion

Using the segmented package to create a piecewise linear regression I am seeing an error when I try to set my own breakpoints; it seems only when I try to set more than two.
(EDIT) Here is the code I am using:
# data
bullard <- structure(list(Rt = c(0, 4.0054, 25.1858, 27.9998, 35.7259, 39.0769,
45.1805, 45.6717, 48.3419, 51.5661, 64.1578, 66.828, 111.1613,
114.2518, 121.8681, 146.0591, 148.8134, 164.6219, 176.522, 177.9578,
180.8773, 187.1846, 210.5131, 211.483, 230.2598, 262.3549, 266.2318,
303.3181, 329.4067, 335.0262, 337.8323, 343.1142, 352.2322, 367.8386,
380.09, 388.5412, 390.4162, 395.6409), Tem = c(15.248, 15.4523,
16.0761, 16.2013, 16.5914, 16.8777, 17.3545, 17.3877, 17.5307,
17.7079, 18.4177, 18.575, 19.8261, 19.9731, 20.4074, 21.2622,
21.4117, 22.1776, 23.4835, 23.6738, 23.9973, 24.4976, 25.7585,
26.0231, 28.5495, 30.8602, 31.3067, 37.3183, 39.2858, 39.4731,
39.6756, 39.9271, 40.6634, 42.3641, 43.9158, 44.1891, 44.3563,
44.5837)), .Names = c("Rt", "Tem"), class = "data.frame", row.names = c(NA,
-38L))
library(segmented)
# create a linear model
out.lm <- lm(Tem ~ Rt, data=bullard)
o<-segmented(out.lm, seg.Z=~Rt, psi=list(Rt=c(200,300)), control=seg.control(display=FALSE))
Using the psi option, I have tried the following:
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200)) -- OK
psi = list(x = c(200, 300)) -- OK
psi = list(x = c(100, 300)) -- OK
psi = list(x = c(120, 150, 300)) -- error 1 below
psi = list(x = c(120, 300)) -- OK
psi = list(x = c(120, 150)) -- OK
psi = list(x = c(150, 300)) -- OK
psi = list(x = c(100, 200, 300)) -- error 2 below
(1) Error in segmented.lm(out.lm, seg.Z = ~Rt, psi = list(Rt = c(120, 150, :
only 1 datum in an interval: breakpoint(s) at the boundary or too close
(2) Error in diag(Cov[id, id]) : subscript out of bounds
I have already listed my data at this question, but as a guide the limits on the x data are about 0--400.
A second question that pertains to this one is: how do I actually fix the breakpoints using this segmented package?
The issue here seems to be poor error trapping in the segmented package. Having a look at the code for segmented.lm allows a bit of debugging. For example, in the case of psi = list(x = c(100, 200, 300)), an augmented linear model is fitted as shown below:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Call:
lm(formula = Tem ~ Rt + U1.Rt + U2.Rt + U3.Rt + psi1.Rt + psi2.Rt +
psi3.Rt, data = mf)
Coefficients:
(Intercept) Rt U1.Rt U2.Rt U3.Rt psi1.Rt
15.34303 0.04149 0.04591 742.74186 -742.74499 1.02252
psi2.Rt psi3.Rt
NA NA
As you can see, the fit has NA values which then result in a degenerate variance-covariance matrix (called Cov in the code). The function doesn't check for this and tries to pull out diagonal entries from Cov and fails with the error message shown. At least the first error, although perhaps not overly helpful, is caught by the function itself and suggests that the break-points are too close.
In the absence of better error trapping in the function, I think that all you can do is adopt a trial and error approach (and avoid break points which are too close). For example, psi = list(x = c(50, 200, 300)) seems to work ok.
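The failure mode can be reproduced in miniature with a plain lm fit. This is an analogy with made-up data, not the segmented source: an aliased (NA) coefficient is dropped from the unscaled covariance matrix, so indexing that matrix by the full set of coefficient names goes out of bounds.

```r
set.seed(1)
d <- data.frame(x = 1:10)
d$z <- 2 * d$x               # perfectly collinear with x, so its coefficient is NA
d$y <- d$x + rnorm(10)

fit <- lm(y ~ x + z, data = d)
print(coef(fit))             # z is NA

# The unscaled covariance matrix only covers the estimable coefficients
Cov <- summary(fit)$cov.unscaled
print(dim(Cov))              # 2 x 2: (Intercept) and x

# Asking for the entry of the aliased coefficient fails
id  <- c("x", "z")
res <- tryCatch(diag(Cov[id, id]), error = function(e) conditionMessage(e))
print(res)
```

Checking anyNA(coef(fit)) before touching the covariance matrix is one way such a case could be trapped early.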
@jaySf: if you use while and tryCatch, you can make the command repeat until it runs without an error. I'm guessing the run-to-run variation is down to the randomiser settings in the function, which can be seen in seg.control.
lm.model <- lm(xdat ~ ydat, data = x)
fit.ok <- FALSE
while (!fit.ok) {
  tryCatch({
    # seg.Z must name the same predictor that appears in the lm() formula
    s <- segmented(lm.model, seg.Z = ~ydat, psi = NA)
    fit.ok <- TRUE
  }, error = function(e) {
    # swallow the error and try again
  })
}
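A loop like this never terminates if the fit fails every time, so it may be worth capping the number of attempts. A minimal sketch of that idea follows; retry_until_ok and flaky_fit are hypothetical helpers written for illustration, with flaky_fit standing in for the segmented call.

```r
# Call f() repeatedly until it succeeds or max_attempts is reached
retry_until_ok <- function(f, max_attempts = 20) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(value = result, attempts = attempt))
    }
  }
  stop("no successful fit after ", max_attempts, " attempts")
}

# Stand-in for a fit that fails on its first two calls
calls <- 0
flaky_fit <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated failure")
  "fitted"
}

out <- retry_until_ok(flaky_fit)
print(out$attempts)
```

In real use, f would wrap the segmented call, for example function() segmented(lm.model, seg.Z = ~ydat, psi = NA).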
