Problem with updating terms in the multinom function - r

I am trying to use add1() to test all two-way interaction terms on top of a multinomial baseline model fitted with multinom(), but it fails with the following output and error:
trying + x1:x2
Error in if (trace) { : argument is not interpretable as logical
Called from: nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE,
censored = censored, rang = 0, ...)
What is the problem here? I appreciate any input. Here is a reproducible example:
library(nnet)
set.seed(1)  # make the random example reproducible
data <- data.frame(y=sample(1:3, 24, replace = TRUE),
x1 = c(rep(1,12), rep(2,12)),
x2 = rep(c(rep(1,4), rep(2,4), rep(3,4)),2),
x3=rnorm(24),
z1 = sample(1:10, 24, replace = TRUE))
m0 <- multinom(y ~ x1 + x2 + x3 + z1, data = data)
m1 <- add1(m0, scope = .~. + .^2, test="Chisq")
My end goal is to see which terms are appropriate to drop, by then running m1[order(m1$'Pr(>Chi)'),].
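A workaround that avoids add1() entirely, assuming that a likelihood-ratio test for each interaction term is what you need, is to refit the model once per candidate term with update() and compare deviances yourself. This is a sketch, not a diagnosis of the trace error. It relies on the deviance and edf components that multinom fits carry; the names terms_to_try, lr_rows and res are hypothetical.

```r
library(nnet)

set.seed(1)
data <- data.frame(y  = sample(1:3, 24, replace = TRUE),
                   x1 = c(rep(1, 12), rep(2, 12)),
                   x2 = rep(c(rep(1, 4), rep(2, 4), rep(3, 4)), 2),
                   x3 = rnorm(24),
                   z1 = sample(1:10, 24, replace = TRUE))

# Baseline model; trace = FALSE silences the optimiser's iteration log
m0 <- multinom(y ~ x1 + x2 + x3 + z1, data = data, trace = FALSE)

# All two-way interaction labels, e.g. "x1:x2"
terms_to_try <- combn(c("x1", "x2", "x3", "z1"), 2, paste, collapse = ":")

# Refit once per candidate term and compare with m0 via a likelihood-ratio test
lr_rows <- lapply(terms_to_try, function(tt) {
  m_new <- update(m0, as.formula(paste(". ~ . +", tt)), trace = FALSE)
  lr <- deviance(m0) - deviance(m_new)   # drop in deviance
  extra_df <- m_new$edf - m0$edf         # extra parameters used by the term
  data.frame(term = tt, df = extra_df, LR = lr,
             p.value = pchisq(lr, extra_df, lower.tail = FALSE))
})
res <- do.call(rbind, lr_rows)

# Candidate interaction terms ordered by p-value, smallest first
res[order(res$p.value), ]
```

Because update() is given trace = FALSE as a literal value, each refit is an ordinary multinom() call, so the ordering step works on res directly.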

Related

nlme with correlation structure not fitting and crashes R

I have a mixed model with a non-linear term, so I would like to use the nlme() function from the R package nlme instead of lme(). However, switching to nlme(), even without adding anything to the model, causes RStudio and R to crash.
I have found that even simulated data, which lme() fits easily, causes this behaviour (on my computer).
Let's start by loading the libraries and setting up a data.frame with the grouping id and spatial coordinate x.
library(nlme)
nid <- 300
nx <- 10
data <- expand.grid(
x = seq(nx),
id = seq(nid)
)
Now, let's add correlated error and uncorrelated error as separate columns, as well as a random intercept value per id. The output of arima.sim needs to be rescaled to unit standard deviation.
data$ec <- c(
replicate(
nid,
as.numeric(
arima.sim(
model = list(
order = c(1, 0, 0),
ar = 0.5
),
n = nx
)
)
)
)
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)
data$random <- rep(rnorm(nid), each = nx)
Now we can create three dependent variables for three models. The first is a mixed model with uncorrelated (regular) error, the second includes an exponential (AR1) correlation structure, and the third combines both. I am adding an intercept of 1, a random-effect sd of 2 and a total residual-error sd of 3 (for y3, sqrt(8)^2 + 1^2 = 9 = 3^2).
data$y1 <- 1 + 2 * data$random + 3 * data$eu
data$y2 <- 1 + 2 * data$random + 3 * data$ec
data$y3 <- 1 + 2 * data$random + sqrt(8) * data$ec + sqrt(1) * data$eu
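To make "the expected result" concrete, here is a sketch that re-runs the simulation and checks the residual sd and the within-id lag-1 autocorrelation, then computes the corExp parameters the design implies. It assumes the parameterisation in the corExp documentation: correlation exp(-d / range) without a nugget and (1 - nugget) * exp(-d / range) with one. The names resid3, ec_mat, lag1, range_expected and nugget_expected are hypothetical.

```r
set.seed(1)
nid <- 300
nx <- 10
data <- expand.grid(x = seq(nx), id = seq(nid))

# Same simulation as above: AR(1) errors per id, rescaled to unit sd
data$ec <- c(replicate(nid, as.numeric(
  arima.sim(model = list(order = c(1, 0, 0), ar = 0.5), n = nx))))
data$ec <- data$ec / sd(data$ec)
data$eu <- rnorm(nid * nx)

# Residual part of y3 should have sd close to sqrt(8 + 1) = 3
resid3 <- sqrt(8) * data$ec + sqrt(1) * data$eu
sd(resid3)

# Pooled lag-1 autocorrelation of ec within id, should be close to ar = 0.5
ec_mat <- matrix(data$ec, nrow = nx)   # one column per id (x varies fastest)
lag1 <- cor(as.vector(ec_mat[-1, ]), as.vector(ec_mat[-nx, ]))
lag1

# exp(-d / range) == 0.5^d gives range = 1 / log(2); for y3 the correlation at
# d > 0 is (8/9) * 0.5^d, i.e. (1 - nugget) * exp(-d / range) with nugget = 1/9
range_expected  <- -1 / log(0.5)
nugget_expected <- 1 / (8 + 1)
c(range = range_expected, nugget = nugget_expected)
```

So l2 should report a range near 1.44, and l3 a similar range with a nugget near 0.11; the nlme fits would be expected to match these if they ran.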
All of the following lme models fit without problem, giving the expected result.
l1 <- lme(
fixed = y1 ~ 1,
random = ~ 1 | id,
data = data,
method = "ML"
)
l2 <- lme(
fixed = y2 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id
),
data = data,
method = "ML"
)
l3 <- lme(
fixed = y3 ~ 1,
random = ~ 1 | id,
correlation = corExp(
form = ~ x | id,
nugget = TRUE
),
data = data,
method = "ML"
)
As far as I know, the following nlme code specifies exactly the same models as above. The first runs without issues, but the ones with a correlation structure crash R / RStudio without any warning or error message. Fiddling with the arguments and with nlmeControl does not help, though I suspect nlmeControl is where a solution would be found.
nlme(
model = y1 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
groups = ~ id,
data = data,
start = list(
fixed = fixed.effects(l1),
random = setNames(random.effects(l1), "b0")
),
method = "ML"
)
nlme(
model = y2 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
groups = ~ id,
correlation = corExp(
form = ~ x
),
data = data,
start = list(
fixed = fixed.effects(l2),
random = setNames(random.effects(l2), "b0")
),
method = "ML"
)
nlme(
model = y3 ~ b0,
fixed = b0 ~ 1,
random = b0 ~ 1,
groups = ~ id,
correlation = corExp(
form = ~ x,
nugget = TRUE
),
data = data,
start = list(
fixed = fixed.effects(l3),
random = setNames(random.effects(l3), "b0")
),
method = "ML"
)
Has anyone experienced this before? Does my example code give the same problem on your computer? What are good strategies to change nlmeControl to attempt to remedy this?

Generalized estimating equations working by themselves but not within functions (R)

I am trying to write a function to run GEE using the geepack package. It works fine when called directly but not inside a function; see the example below:
library(geepack)
library(pstools)
df <- data.frame(study_id = c(1:20),
leptin = runif(20),
insulin = runif(20),
age = runif(20, min = 20, max = 45),
sex = sample(c(0,1), size = 20, replace = TRUE))
#Works
geepack::geeglm(leptin ~ insulin + age + sex, id = study_id, data = df)
#Doesn't work
model_function_covariates_gee <- function(x,y) {
M1 <- paste0(x, "~", y, "+ age + sex")
M1_fit <- geepack::geeglm(M1, id = study_id, data = df)
s <- summary(M1_fit)
return(s)
}
model_function_covariates_gee("leptin", "insulin")
Error message:
Error in mcall$formula[3] <- switch(match(length(sformula), c(0, 2, 3)), :
incompatible types (from language to character) in subassignment type fix
Does anyone know why this is? I've fiddled around with it but can't get it to work. Thanks in advance.
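The error message shows geeglm() assigning a language object into mcall$formula, which here is a character string because M1 is built with paste0(). A minimal sketch of the fix, assuming that is the cause, converts the string with as.formula() before calling geeglm(). The extra data argument (defaulting to df) is an assumption added for illustration so the function does not silently depend on a global variable; it returns the fit rather than the summary so the fit can be inspected.

```r
library(geepack)

set.seed(1)
df <- data.frame(study_id = 1:20,
                 leptin = runif(20),
                 insulin = runif(20),
                 age = runif(20, min = 20, max = 45),
                 sex = sample(c(0, 1), size = 20, replace = TRUE))

model_function_covariates_gee <- function(x, y, data = df) {
  # Build a real formula object; geeglm() manipulates it as a language object
  M1 <- as.formula(paste0(x, " ~ ", y, " + age + sex"))
  # id = study_id is looked up inside `data` when the model frame is built
  geepack::geeglm(M1, id = study_id, data = data)
}

fit <- model_function_covariates_gee("leptin", "insulin")
summary(fit)
```

reformulate(c(y, "age", "sex"), response = x) is an alternative way to build the same formula without pasting strings.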

Creating Survival Trees with MST package: Undefined Columns Error?

I am trying to create a survival tree with the MST package in R, following this paper.
I replicated their example with randomly generated data and it works just fine. I adjusted my data to fit the same model: it has the same columns and the same data types.
I keep getting this error:
Error in `[.data.frame`(mf_data[col.split.var], , 3) : undefined columns selected
with the following line of code:
fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
I have looked through all of the documentation and didn't find anything, and I can't understand why this error appears.
The code from the paper looks like this:
set.seed(186117)
data <- rmultime(N = 200, K = 4, beta = c(-1, 0.8, 0.8, 0, 0),cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential", rho = 0.65)$dat
test <- rmultime(N = 100, K = 4, beta = c(-1, 0.8, 0.8, 0, 0), cutoff = c(0.5, 0.3, 0, 0), model = "marginal.multivariate.exponential",rho = 0.65)$dat
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
I tried running your code and I do get an error, although not the one you are getting. After looking at it, I'm fairly sure you need to use the [edit] feature of SO to correct your question:
> fit <- MST(formula = Surv(time,status) ~ x1 + | id), data = data)
Error: unexpected '|' in "fit <- MST(formula = Surv(time,status) ~ x1 + |"
The formula given is obviously wrong and there is an unnecessary closing parenthesis. I am able to get the error you report with:
> fit <- MST(formula = Surv(time,status) ~ x1 | id, data = data)
[1] "No test sample supplied, changed selection.method = 'bootstrap'"
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
... but not with the original code:
fit <- MST(formula = Surv(time, status) ~ x1 + x2 + x3 + x4 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
I also see an error with x1 + x2 | id on the RHS of the formula, but not with three variables:
> fit <- MST(formula = Surv(time, status) ~ x1 +x2 | id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
Error in `[.data.frame`(mf_data[col.split.var], , 3) :
undefined columns selected
> fit <- MST(formula = Surv(time, status) ~ x1 +x2+x3| id,data, test, method = "marginal", minsplit = 100, minevents = 20,selection.method = "test.sample")
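So I think this is a bug that the developers had not anticipated. The failing call in the error message, `[.data.frame`(mf_data[col.split.var], , 3), appears to select the third split variable, which does not exist when fewer than three covariates are given. Here's how to obtain the maintainer's email address to report it: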
> maintainer("MST")
[1] "Peter Calhoun <calhoun.peter#gmail.com>"
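The error text itself can be reproduced in base R, which supports the reading that MST indexes a third split variable. In this sketch split_vars is a hypothetical stand-in for the internal mf_data[col.split.var], which we only see in the traceback.

```r
# Stand-in for the split-variable data frame with only two covariates
split_vars <- data.frame(x1 = rnorm(5), x2 = rnorm(5))

# Asking for column 3 fails with the same message MST reports
msg <- tryCatch(split_vars[, 3], error = conditionMessage)
msg

# With a third covariate the same indexing succeeds
split_vars$x3 <- rnorm(5)
third <- split_vars[, 3]
```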

Getting error when using plot.gbm to produce marginal plots

Total novice to R -
I am trying to make some marginal plots from a BRT I fitted with the gbm package and keep getting the same error.
Below is my code; boosted.tree_LRFF is the output of gbm.fit:
> plot.gbm(boosted.tree_LRFF,
+ i.var= 5,
+ n.trees = train.model$finalModel$tuneValue$n.trees,
+ continuous.resolution = 100,
+ return.grid = FALSE,
+ type = "link")
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
In this first section I just re-create the dataset used to fit a gbm in the package documentation ("gbm.pdf"):
library(gbm)
N <- 1000
X1 <- runif(N)
X2 <- 2 * runif(N)
X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1])
X4 <- factor(sample(letters[1:6], N, replace = TRUE))
X5 <- factor(sample(letters[1:3], N, replace = TRUE))
X6 <- 3 * runif(N)
mu <- c(-1, 0, 1, 2)[as.numeric(X3)]
SNR <- 10 # signal-to-noise ratio
Y <- X1^1.5 + 2 * X2^0.5 + mu
sigma <- sqrt(var(Y) / SNR)
Y <- Y + rnorm(N, 0, sigma)
# introduce some missing values
X1[sample(1:N, size = 500)] <- NA
X4[sample(1:N, size = 300)] <- NA
data <- data.frame(Y = Y, X1 = X1, X2 = X2, X3 = X3, X4 = X4, X5 = X5, X6 = X6)
boosted.tree_LRFF <-
gbm(Y ~ X1 + X2 + X3 + X4 + X5 + X6,
data = data,
var.monotone = c(0, 0, 0, 0, 0, 0),
distribution = "gaussian",
n.trees = 1000,
shrinkage = 0.05,
interaction.depth = 3,
bag.fraction = 0.5,
train.fraction = 0.5,
n.minobsinnode = 10,
cv.folds = 3,
keep.data = TRUE,
verbose = FALSE,
n.cores = 1)
Now I plot the fitted marginal effect of variable X5 (i.var = 5), similar to your plot:
plot(boosted.tree_LRFF,
i.var = 5,
n.trees = boosted.tree_LRFF$n.trees,
continuous.resolution = 100,
return.grid = FALSE,
type = "link")
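I think your error is due to the n.trees argument. You can enter it either as a constant or take it from a fitted gbm object. In my example I took it from boosted.tree_LRFF, which appears to be the name of your fitted object (although my data are different). Your call takes n.trees from a different object, train.model, so it may not match the model you are plotting; the warnings from min() and max() show that the values to be plotted were empty or all NA, which is why plot.window() cannot find a finite ylim.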
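One way to make sure n.trees belongs to the model being plotted, sketched under the assumption that the model was fitted with cross-validation (cv.folds > 1) as in the example above, is to take it from gbm.perf() on that same object. With return.grid = TRUE, plot() returns the plotting values instead of drawing them, so you can check them before plotting. The toy data and the names dat, fit, best_iter and grid are hypothetical.

```r
library(gbm)

set.seed(42)
n <- 500
# Hypothetical toy data, a reduced version of the gbm documentation example
dat <- data.frame(
  X1 = runif(n),
  X2 = 2 * runif(n),
  X5 = factor(sample(letters[1:3], n, replace = TRUE))
)
dat$Y <- dat$X1^1.5 + 2 * sqrt(dat$X2) + as.numeric(dat$X5) + rnorm(n, sd = 0.3)

fit <- gbm(Y ~ X1 + X2 + X5, data = dat, distribution = "gaussian",
           n.trees = 300, shrinkage = 0.05, interaction.depth = 2,
           cv.folds = 3, n.cores = 1, verbose = FALSE)

# Take n.trees from the fitted object itself: the CV-optimal iteration
best_iter <- gbm.perf(fit, method = "cv", plot.it = FALSE)

# Return the grid for the third variable (X5) instead of drawing it
grid <- plot(fit, i.var = 3, n.trees = best_iter,
             return.grid = TRUE, type = "link")
best_iter
grid
```

Dropping return.grid = TRUE with the same n.trees draws the plot.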

How to export all coefficients of a penalized regression model from package `penalized`? Need them for reporting rolling regression estimates

I have been able to run regressions with some coefficients constrained to be positive, but I run into a problem when doing many rolling regressions. Here is my sample code:
library(penalized)
set.seed(1)
x1=rnorm(100)*10
x2=rnorm(100)*10
x3=rnorm(100)*10
y=sin(x1)+cos(x2)-x3+rnorm(100)
data <- data.frame(y, x1, x2, x3)
win <- 10
coefs <- matrix(NA, ncol=4, nrow=length(y))
for(i in 1:(length(y)-win)) {
d <- data[(1+i):(win+i),]
p <- win+i
# Linear regression on the current window d
coefs[p,] <- as.vector(coef(penalized(d$y, ~ x1 + x2 + x3, ~1,
lambda1=0, lambda2=0, positive = c(FALSE, FALSE, TRUE), data=d)))}
This is how I usually fill a matrix with coefficients from a rolling regression, but now I get this error:
Error in coefs[p, ] <- as.vector(coef(penalized(y, ~x1 + x2 + x3, ~1, :
number of items to replace is not a multiple of replacement length
I assume this error occurs because the penalized regression does not always return an intercept plus 3 coefficients. Is there a way to get penalized() to report zero coefficients as well, or another way to fill the matrix / data.frame?
Perhaps you are unaware of the which argument of coef() for "penfit" objects. Have a look at:
getMethod(coef, "penfit")
#function (object, ...)
#{
# .local <- function (object, which = c("nonzero", "all", "penalized",
# "unpenalized"), standardize = FALSE)
# {
# coefficients(object, which, standardize)
# }
# .local(object, ...)
#}
#<environment: namespace:penalized>
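We can set which = "all" to report all coefficients. The default, which = "nonzero", drops coefficients that are exactly zero. Here positive = c(F, F, T) constrains the x3 coefficient to be non-negative while the true effect of x3 is -1, so that coefficient is typically fitted as 0 and dropped, which causes the "number of items to replace is not a multiple of replacement length" error.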
The following works:
library(penalized)
set.seed(1)
x1 = rnorm(100)*10
x2 = rnorm(100)*10
x3 = rnorm(100)*10
y = sin(x1) + cos(x2) - x3 + rnorm(100)
data <- data.frame(y, x1, x2, x3)
win <- 10
coefs <- matrix(NA, ncol=4, nrow=length(y))
for(i in 1:(length(y)-win)) {
  d <- data[(1+i):(win+i), ]   # current rolling window
  p <- win + i
  pen <- penalized(d$y, ~ x1 + x2 + x3, ~1, lambda1 = 0, lambda2 = 0,
                   positive = c(FALSE, FALSE, TRUE), data = d, trace = FALSE)
  beta <- coef(pen, which = "all")   # intercept plus all three slopes, zeros included
  coefs[p,] <- unname(beta)
}
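If a model's coef() method had no such argument, a general alternative is to pad the returned vector by name before storing it. The sketch below uses a hypothetical coefficient vector rather than a real penalized fit, and pad_coefs, beta_nonzero and all_names are hypothetical names; it also reproduces the original length-mismatch error for comparison.

```r
# Hypothetical output of a fit that dropped a zero coefficient (x3)
beta_nonzero <- c("(Intercept)" = 0.12, x1 = -0.03, x2 = 0.05)
all_names <- c("(Intercept)", "x1", "x2", "x3")

# Fill missing coefficients with 0, matching by name rather than position
pad_coefs <- function(beta, all_names) {
  out <- setNames(numeric(length(all_names)), all_names)
  out[names(beta)] <- beta
  out
}

coefs <- matrix(NA_real_, nrow = 2, ncol = length(all_names),
                dimnames = list(NULL, all_names))

# Assigning the short vector directly fails, as in the question
err <- tryCatch({ coefs[2, ] <- beta_nonzero; NULL }, error = conditionMessage)

# The padded vector always has one entry per column
coefs[1, ] <- pad_coefs(beta_nonzero, all_names)
coefs
```

Matching by name also protects against a method returning coefficients in a different order.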
