I am using R-INLA to run the following model (Treatment, Animal.1 and Animal.2 are factors and Encounter.Length is continuous):
formula <- Encounter.Length ~ Treatment + f(Animal.1, model = "iid", n = n.animal) +
  f(Animal.2, copy = "Animal.1")
m.1 <- inla(formula, data = inla.dat)
However, after running this code I get the following error message:
Error in inla(formula, data = inla.dat) :
In f(Animal.1): 'covariate' must match 'values', and both must either be 'numeric', or 'factor'/'character'.
I am new to using INLA and want to know what this error message means and how to fix it.
Answer (from r-inla.help): The levels of B are not a subset of the levels of A (which is used to define the model that B copies). So you must define the model on the union of the levels.
For example:
n <- 3
A <- as.factor(letters[1:n])
B <- as.factor(letters[1+1:n])
y <- 1:n
This does not work:
inla(y ~ -1 + f(A) + f(B, copy = "A"), data = data.frame(A, B))
But this does:
values <- as.factor(unique(c(levels(A), levels(B))))
inla(y ~ -1 + f(A, values = values) + f(B, copy = "A"),
     data = list(A = A, B = B, values = values))
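Applied to the original model, the fix looks like the sketch below (assuming inla.dat contains both factors; the values argument replaces n = n.animal by fixing the support of the effect directly):
# Sketch: define the iid effect on the union of both animals' levels so that
# f(Animal.2, copy = "Animal.1") can find every level it needs
animal.values <- as.factor(unique(c(levels(inla.dat$Animal.1),
                                    levels(inla.dat$Animal.2))))
formula <- Encounter.Length ~ Treatment +
  f(Animal.1, model = "iid", values = animal.values) +
  f(Animal.2, copy = "Animal.1")
m.1 <- inla(formula,
            data = c(as.list(inla.dat), list(animal.values = animal.values)))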
I have a dataset comprising 44 variables and ~86,000 rows.
Of the 44 variables, 34 contain missing data, ranging from ~2% to ~25% missing. Of the variables with missing data, 7 are at level 1, 25 at level 2, and 2 at level 3. The 10 remaining variables comprise three level identifiers and level 1 variables without missing values.
I've been trying to impute the incomplete data following the sample script in the documentation. However, when I run the mice imputation function, I get the error Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20, which I don't understand. It may sound like I haven't defined a level ID variable, but I fail to see how my code differs noticeably from the example.
Some help understanding the error message would be much appreciated.
My code:
library(dplyr)
library(mice)
library(miceadds)

my_data  # the 44-column data frame described above

x <- paste0("x", 1:7)
y <- paste0("y", 1:25)
z <- c("z1", "z2")

#----- specify levels of variables (only relevant for variables
#      with missing values)
level <- character(ncol(my_data))
names(level) <- colnames(my_data)
level[y] <- "id2"  # level 2 identifier
level[z] <- "id3"  # level 3 identifier

#----- specify predictor matrix
predMatrix <- my_data %>%
  make.predictorMatrix()
# remove cluster indicator variables from predictor matrix
predMatrix[, c("id2", "id3")] <- 0
# set -2 for the level identifier of the level 3 variables (z),
# because a "2lonly" method is used
predMatrix[c(z), "id3"] <- -2

#----- specify imputation methods
impMethod <- my_data %>%
  make.method()
# method for lower-level variables (x and y)
impMethod[c(x, y)] <- "ml.lmer"
# method for variables at the top level (z)
impMethod[c(z)] <- "2lonly.norm"

#----- specify hierarchical structure of imputation models
levels_id <- list()
#** hierarchical structure for the level 1 variables
l1 <- lapply(x, function(v) c("id2", "id3"))
names(l1) <- x
#** hierarchical structure for the level 2 variables
l2 <- lapply(y, function(v) c("id3"))
names(l2) <- y
levels_id <- c(levels_id, l1, l2)
rm(l1, l2)

# run mice
imp <- mice(my_data, m = 5, maxit = 10, method = impMethod,
            predictorMatrix = predMatrix, levels_id = levels_id,
            variables_levels = level)
Output:
iter imp variable
1 1 x1
boundary (singular) fit: see ?isSingular
# (message repeated for all level 1 variables ...)
x7
boundary (singular) fit: see ?isSingular
y1 y2 y3 (all level 2 and 3 variables except z2)
y24 y25 z1
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 4.3615e-20
Traceback:
1. mice(my_data, m = 5, maxit = 10, method = impMethod, predictorMatrix = predMatrix,
. levels_id = levels_id, variables_levels = level)
2. sampler(data, m, ignore, where, imp, blocks, method, visitSequence,
. predictorMatrix, formulas, blots, post, c(from, to), printFlag,
. ...)
3. sampler.univ(data = data, r = r, where = where, type = type,
. formula = ff, method = theMethod, yname = j, k = k, calltype = calltype,
. user = user, ignore = ignore, ...)
4. do.call(f, args = args)
5. mice.impute.2lonly.norm(y = c(1, 1, #[... an extremely long printout of list items follows ...]
. BCBG04 = "IDCNTSCH"))
6. .imputation.level2(y = y, ry = ry, x = x, type = type, wy = wy,
. method = "norm", ...)
7. mice.impute.norm(y = y2, ry = ry2, x = x2, wy = wy2, ...)
8. .norm.draw(y, ry, x, ...)
9. estimice(x[ry, , drop = FALSE], y[ry], ...)
10. solve(xtx + diag(pen))
11. solve.default(xtx + diag(pen))
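A minimal diagnostic sketch (assuming my_data is loaded; the predictor selection below is illustrative): the traceback ends in solve(xtx + diag(pen)), i.e. inverting X'X for the imputation model, so a near-singular, highly collinear predictor matrix is the usual suspect. Its condition number can be checked with base R:
# Illustrative check: a huge condition number of X'X (the reciprocal reported
# in the error is ~4.4e-20) indicates (nearly) collinear predictors
preds <- setdiff(colnames(my_data), c("id2", "id3"))  # hypothetical selection
cc <- complete.cases(my_data[, preds])
X <- model.matrix(~ ., data = my_data[cc, preds])
kappa(crossprod(X))  # values above ~1e15 signal numerical singularity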
I am writing a sub-routine to return the output of longitudinal mixed-effects models. I want to be able to pass elements from lists of variable names into lme/lmer as the outcome and predictor variables. I would also like to be able to specify contrasts within these mixed-effects models, but I am having trouble getting the contrasts() argument to recognise the strings as the variable names referred to in the model specification within the same lme/lme4 call.
Here's some toy data:
library(nlme)  # needed for lme() below

set.seed(345)
A0 <- rnorm(4, 2, .5)
B0 <- rnorm(4, 2 + 3, .5)
A1 <- rnorm(4, 6, .5)
B1 <- rnorm(4, 6 + 2, .5)
A2 <- rnorm(4, 10, .5)
B2 <- rnorm(4, 10 + 1, .5)
A3 <- rnorm(4, 14, .5)
B3 <- rnorm(4, 14 + 0, .5)
score <- c(A0, B0, A1, B1, A2, B2, A3, B3)
id <- rep(1:8, times = 4, length = 32)
time <- factor(rep(0:3, each = 8, length = 32))
group <- factor(rep(c("A", "B"), times = 2, each = 4, length = 32))
df <- data.frame(id = id, group = group, time = time, score = score)
Now the following call to lme works just fine, with contrasts specified (I know these are the default so this is all purely pedagogical).
mod <- lme(score ~ group*time, random = ~1|id, data = df, contrasts = list(group = contr.treatment(2), time = contr.treatment(4)))
The following also works, passing strings as variable names into lme using the reformulate() function.
t <- "time"
g <- "group"
dv <- "score"
mod1R <- lme(reformulate(paste0(g,"*",t), response = "score"), random = ~1|id, data = df)
But if I want to specify contrasts, as in the first example, it doesn't work:
mod2R <- lme(reformulate(paste0(g,"*",t), response = "score"), random = ~1|id, data = df, contrasts = list(g = contr.treatment(2), t = contr.treatment(4)))
# Error in `contrasts<-`(`*tmp*`, value = contrasts[[i]]) : contrasts apply only to factors
How do I get lme to recognise that the strings specified in the contrasts argument refer to the variables passed into the reformulate() function?
You should be able to use setNames() on the list of contrasts to apply the full names to the list:
# Using a %>% pipe so need to load magrittr
library(magrittr)
mod2R <- lme(reformulate(paste0(g, "*", t), response = "score"),
             random = ~ 1 | id,
             data = df,
             contrasts = list(g = contr.treatment(2), t = contr.treatment(4)) %>%
               setNames(c(g, t)))
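Equivalently, without magrittr, you can apply setNames() when building the list (a small variation on the same idea; dv is the response name defined above):
# Name the contrast list with the actual variable names up front
ctr <- setNames(list(contr.treatment(2), contr.treatment(4)), c(g, t))
mod2R <- lme(reformulate(paste0(g, "*", t), response = dv),
             random = ~ 1 | id, data = df, contrasts = ctr)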
The data in the following example are from here
library(tidyverse)
library(lme4)
dat <- read.table("aids.dat2", header = TRUE) %>%
  filter(day <= 90) %>%
  mutate(log10copy = log10(lgcopy)) %>%
  na.omit()
> head(dat)
patid day cd4 lgcopy cd8 log10copy
2 11542 2 159.84 4.361728 619.38 0.6396586
3 11542 7 210.60 3.531479 666.90 0.5479566
4 11542 16 204.12 2.977724 635.04 0.4738844
5 11542 29 172.48 2.643453 407.68 0.4221716
6 11542 57 270.94 2.113943 755.78 0.3250933
8 11960 2 324.72 3.380211 856.08 0.5289438
Running the following code gives me the error: Error in eval(expr, envir, enclos) : object 'log10copy' not found, but log10copy is clearly one of the columns in my data set?
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1 - b1*day) + exp(p2 - b2*day + 1) +
                          (1|p1) + (1|b1) + (1|p2) + (1|b2), data = aids.dat)
I want to fit a model with 4 fixed effects on p1, b1, p2, b2 and 4 random effects on the same set of parameters.
You have several problems here:
1) The starting values must be a named vector.
2) The data argument in nlmer should receive dat as its value, not aids.dat as in your example.
start <- c(p1 = 10, b1 = 0.5, p2 = 6, b2 = 0.005)
lme4.fit <- lme4::nlmer(log10copy ~ exp(p1 - b1*day) + exp(p2 - b2*day + 1) ~
                          (p1|patid) + (b1|patid) + (p2|patid) + (b2|patid),
                        data = dat, start = start)
This will now trigger the following error:
Error: is.matrix(gr <- attr(val, "gradient")) is not TRUE
As explained in the documentation:
Currently, the Nonlin(..) formula part must not only return a numeric
vector, but also must have a "gradient" attribute, a matrix. The
functions SSbiexp, SSlogis, etc, see selfStart, provide this (and
more). Alternatively, you can use deriv() to automatically produce
such functions or expressions.
You can then adapt the example provided in the documentation:
## a. Define formula
nform <- ~ exp(p1 - b1*input) + exp(p2 - b2*input + 1)
## b. Use deriv() to construct function:
nfun <- deriv(nform, namevec = c("p1", "b1", "p2", "b2"),
              function.arg = c("input", "p1", "b1", "p2", "b2"))
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
                          (p1|patid) + (b1|patid) + (p2|patid) + (b2|patid),
                        data = dat, start = start)
You will then get the following error:
Error in fn(nM$xeval()) : prss failed to converge in 300 iterations
This might mean that your model is too complex for your data...
Or maybe I made a mistake in the specification, as I don't know nlmer very well (I just tried to apply the documentation...) nor do I know your model/question.
When you change the optimizer, the convergence problems seem to be gone...
See here for recommendations about "troubleshooting" (including convergence problems) with lme4
lme4.fit <- lme4::nlmer(log10copy ~ nfun(day, p1, b1, p2, b2) ~
                          (p1|patid) + (b1|patid) +
                          (p2|patid) + (b2|patid),
                        data = dat,
                        start = start,
                        control = nlmerControl(optimizer = "bobyqa"))
I'm having trouble understanding whether I need to be consistent with the categorical/factor encodings of variables. By consistency I mean ensuring that the mapping between integers and levels is the same in the training sample and in any new testing sample.
This answer seems to suggest that it is not necessary. On the contrary, this answer suggests that it is indeed necessary.
Suppose I have a training sample with a variable xcat that can take the values a, b, c. The expected result is that the y variable will tend to take values close to 1 when xcat is a, 2 when xcat is b, and 3 when xcat is c.
First I'll create the dataframe, pass it to h2o and then encode with the function as.factor:
library(h2o)
localH2O <- h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
n <- 20
y <- sample(1:3, size = n, replace = TRUE)
xcat <- letters[y]
xnum <- sample(1:10, size = n, replace = TRUE)
y <- y + rnorm(n = 20, mean = 0, sd = 0.3)  # add noise around 1, 2, 3
df <- data.frame(xcat = xcat, xnum = xnum, y = y)
df.hex <- as.h2o(df, destination_frame = "df.hex")
# Encode as factor. You will get: a = 1, b = 2, c = 3
df.hex[, "xcat"] <- as.factor(df.hex[, "xcat"])
Now I'll estimate a glm model and predict on the same sample:
x <- c("xcat", "xnum")
glm <- h2o.glm(y = "y", x = x, training_frame = df.hex,
               family = "gaussian", seed = 1234)
glm.fit <- h2o.predict(object = glm, newdata = df.hex)
glm.fit gives the expected results (no surprises here).
Now I'll create a new test dataset that only has a and c, no b value:
xcat2 <- c("c", "c", "a")
xnum2 <- c(2, 3, 1)
y <- c(1, 2, 1)  # not really needed
df.test <- data.frame(xcat = xcat2, xnum = xnum2, y = y)
df.test.hex <- as.h2o(df.test, destination_frame = "df.test.hex")
df.test.hex[, "xcat"] <- as.factor(df.test.hex[, "xcat"])
Running str(df.test.hex$xcat) shows that this time the factor encoding has assigned 2 to c and 1 to a. This looked like it could be trouble, but then the fitting works as expected:
test.fit <- h2o.predict(object = glm, newdata = df.test.hex)
test.fit
# gives 2.8, 2.79, 1.21 as expected
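The two mappings can also be inspected side by side with h2o.levels(), which returns the level ordering of a categorical column (the commented values are what the code above should produce):
# Compare the categorical mappings of the two frames
h2o.levels(df.hex$xcat)       # expected: "a" "b" "c"
h2o.levels(df.test.hex$xcat)  # expected: "a" "c"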
What's going on here? Is it that the glm model carries around the information about the levels of the x variables, so it doesn't mind if the internal encoding differs between the training data and the new test data? Is that the general case for all h2o models?
From looking at one of the answers I linked above, it seems that at least some R models do require consistency.
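For what it's worth, base R's lm() seems to handle this by storing the training levels in the fit and remapping new data by level name (a quick check with the same df and df.test as above):
# Base R analogue: lm() keeps the training factor levels in fit$xlevels and
# predict() remaps newdata onto them by name, not by internal integer code
fit <- lm(y ~ xcat + xnum, data = df)
fit$xlevels                      # $xcat: "a" "b" "c"
predict(fit, newdata = df.test)  # works even though df.test lacks "b"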
Thanks and best!
I'm using the gamlss package in R to implement worm plots for studying the residuals.
The function wp() has an argument xvar which is used for bucketing.
Assume I have a numeric vector x1: passing it as xvar = x1 behaves differently than passing xvar = ~x1, because the second form is treated as a formula. The buckets created in the two cases differ from each other.
Code:
library(gamlss)

glc <- gamlss.control(n.cyc = 200)
myseed <- 12345
set.seed(myseed)  # this will make results reproducible

# generate data
N <- 10000  # this is the sample size
dd <- data.frame(x1 = rpois(N, 1),
                 x2 = rnorm(N, .7, .3),
                 x3 = log(rgamma(N, shape = 6, scale = 10)),
                 x4 = sample(letters[1:3], N, replace = TRUE),
                 x5 = sample(letters[3:6], N, replace = TRUE),
                 ind = rbinom(N, size = 1, prob = 0.5))

# generate the response distribution
dd$y_wei1 <- rweibull(N, scale = exp(.3*dd$x1 + .8*dd$x3), shape = 5)

m1 <- gamlss(formula = y_wei1 ~ x1 + x3 + x4 + x5,
             data = dd,
             family = "WEI",
             K = 2,
             control = glc)
# Case 1.
wp(object = m1, xvar = x1, n.inter = 4)
# Case 2.
wp(object = m1, xvar = ~x1, n.inter = 4)
Edit:
I have observed that this happens only when the overlap argument is set to 0, because when overlap = 0 another function (check.overlap) is called internally. Why is this function called?
The function has been written such that xvar = ~x1 indicates that x1 is a factor/character variable, so grouping occurs based on its unique values. When the user calls it with xvar = x1, bins are created based on the range of x1, and those bins are used to generate the worm plots.
The difference arises because internally there is a check.overlap function which is applied only if x1 is numeric. In case of overlapping intervals, it clips them so that they do not overlap. This step is skipped if the user calls it as xvar = ~x1.
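A sketch of the two bucketing schemes described above (illustrative base R, not the actual wp() internals):
# xvar = x1: numeric path -> bins cut from the range of x1, later adjusted by
# check.overlap() when overlap = 0
range_bins <- cut(dd$x1, breaks = 4)  # 4 intervals, mirroring n.inter = 4
table(range_bins)
# xvar = ~x1: formula path -> one group per distinct value of x1
value_bins <- factor(dd$x1)
table(value_bins)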