error message lsmeans for beta mixed regression model with glmmTMB - r

I am analyzing the ratio of (biomass of one part of a plant community) vs. (total plant community biomass) across different treatments in time (i.e. repeated measures) in R. Hence, it seems natural to use beta regression with a mixed component (available with the glmmTMB package) in order to account for repeated measures.
My problem is about computing post hoc comparisons across my treatments with the function lsmeans from the lsmeans package. glmmTMB objects are not handled by the lsmeans function so Ben Bolker on recommended to add the following code before loading the packages {glmmTMB} and {lsmeans}:
recover.data.glmmTMB <- function(object, ...) {
fcall <- getCall(object)
recover.data(fcall,delete.response(terms(object)),
attr(model.frame(object),"na.action"), ...)}
lsm.basis.glmmTMB <- function (object, trms, xlev, grid, vcov.,
mode = "asymptotic", component="cond", ...) {
if (mode != "asymptotic") stop("only asymptotic mode is available")
if (component != "cond") stop("only tested for conditional component")
if (missing(vcov.))
V <- as.matrix(vcov(object)[[component]])
else V <- as.matrix(.my.vcov(object, vcov.))
dfargs = misc = list()
if (mode == "asymptotic") {
dffun = function(k, dfargs) NA
}
## use this? misc = .std.link.labels(family(object), misc)
contrasts = attr(model.matrix(object), "contrasts")
m = model.frame(trms, grid, na.action = na.pass, xlev = xlev)
X = model.matrix(trms, m, contrasts.arg = contrasts)
bhat = fixef(object)[[component]]
if (length(bhat) < ncol(X)) {
kept = match(names(bhat), dimnames(X)[[2]])
bhat = NA * X[1, ]
bhat[kept] = fixef(object)[[component]]
modmat = model.matrix(trms, model.frame(object), contrasts.arg = contrasts)
nbasis = estimability::nonest.basis(modmat)
}
else nbasis = estimability::all.estble
list(X = X, bhat = bhat, nbasis = nbasis, V = V, dffun = dffun,
dfargs = dfargs, misc = misc)
}
Here is my code and data:
trt=c(rep("T5",13),rep("T4",13),
rep("T3",13),rep("T1",13),rep("T2",13),rep("T1",13),
rep("T2",13),rep("T3",13),rep("T5",13),rep("T4",13))
year=rep(2005:2017,10)
plot=rep(LETTERS[1:10],each=13)
ratio=c(0.0046856237844411,0.00100861922394448,0.032516291436091,0.0136507743972955,0.0940240065096705,0.0141337428305094,0.00746709315018945,0.437009092691189,0.0708021091805216,0.0327952505849285,0.0192685194751524,0.0914696394299481,0.00281889216102303,0.0111928453399615,0.00188119596836005,NA,0.000874623692966351,0.0181192859074754,0.0176635391424644,0.00922358069727823,0.0525280029990213,0.0975006760149882,0.124726170684951,0.0187132600944396,0.00672592365451266,0.106399234215126,0.0401776844073239,0.00015382736648373,0.000293356424756535,0.000923659501144292,0.000897412901472504,0.00315930225856196,0.0636501228611642,0.0129422445492391,0.0143526630252398,0.0136775931834926,0.00159292971508751,0.0000322313783211749,0.00125352390811532,0.0000288862579879126,0.00590690336494395,0.000417043974238875,0.0000695808216192379,0.001301299696752,0.000209355138230326,0.000153151660178623,0.0000646279598274632,0.000596704590065324,9.52943306579156E-06,0.000113476446629278,0.00825405312309618,0.0001025984082064,0.000887617767039489,0.00273668802742924,0.00469409165130462,0.00312377000134233,0.0015579322817235,0.0582615988387306,0.00146933878743163,0.0405139497779372,0.259097955479886,0.00783997376383192,0.110638003652979,0.00454029511918275,0.00728290246595241,0.00104674197030363,0.00550563937846687,0.000121380392484705,0.000831904606687671,0.00475778829159394,0.000402799910756391,0.00259524300745195,0.000210249875492504,0.00550104485802363,0.000272849546913495,0.0025389089622392,0.00129370075116459,0.00132810234020792,0.00523285954007915,0.00506230599388357,0.00774104695265855,0.00098348404576587,0.174079173227248,0.0153486840317039,0.351820365452281,0.00347674458928481,0.147309225196026,0.0418825705903947,0.00591271021100856,0.0207139520537443,0.0563647804012055,0.000560012457272534,0.00191564842393647,0.01493480083524,0.00353400674061077,0.00771828473058641,0.000202009136938048,0.112695841130448,0.00761492172670762,0.038797330459115,0.217367765362878,0.0680958660605668,0.0100870294641921,0.00493875324236991,0.00136539944656238,0.00264262100866192,0.0847732305020654,0.00460985241335143,0.235802638543116,0.16336020383325,0.225776236687456,0.0204568107372349,0.0455390585228863,0.130969863489582,0.00679523322812889,0.0172325334280024,0.00299970176999806,0.00179347656925317,0.00721658257996989,0.00822443690003783,0.00913096724026346,0.0105920192618379,0.0158013204589482,0.00388803567197835,0.00366268607026078,0.0545418725650633,0.00761485067129418,0.00867583194858734,0.0188232707241144,0.018652666214789)
dat=data.frame(trt,year,plot,ratio)
require(glmmTMB)
require(lsmeans)
mod=glmmTMB(ratio~trt*scale(year)+(1|plot),family=list(family="beta",link="logit"),data=dat)
summary(mod)
ls=lsmeans(mod,pairwise~trt)`
Finally, I get the following error message that I've never encountered and on which I could find no information:
In model.matrix.default(trms, m, contrasts.arg = contrasts) :
variable 'plot' is absent, its contrast will be ignored
Could anyone shine their light? Thanks!

This is not an error message, it's a (harmless) warning message. It occurs because the hacked-up method I wrote doesn't exclude factor variables that are only used in the random effects. You should worry more about this output:
NOTE: Results may be misleading due to involvement in interactions
which is warning you that you are evaluating main effects in a model that contains interactions; you have to think about this carefully to make sure you're doing it right.

Related

"Error: Matrix must have equal dimensions" despite seemingly equal dimensions

pred <- predict(fit, x, type="response", s=cv$lambda.min)
confusion_matrix <- confusionMatrix(data = pred, reference = testXsp)
Error in confusionMatrix.matrix(data = pred, reference = testXsp) :
matrix must have equal dimensions
dim(pred)
[1] 751864 1
dim(testXsp)
[1] 751864 1
dim(testXsp) == dim(pred)
[1] TRUE TRUE
The dimensions seem to be the same, then why am I getting this error message?
confusionMatrix argument data must be square if it is a matrix.
> caret:::confusionMatrix.matrix
function (data, positive = NULL, prevalence = NULL, mode = "sens_spec",
...)
{
if (length(unique(dim(data))) != 1) {
stop("matrix must have equal dimensions")
}
classTable <- as.table(data, ...)
confusionMatrix(classTable, positive, prevalence = prevalence,
mode = mode)
}
<bytecode: 0x126452f88>
<environment: namespace:caret>
Note that the method for class matrix does not even take a reference argument. It is the default method that uses reference. Perhaps you should review the help page for confusionMatrix?
One possibility here is that there are one or more NA values contained in your prediction matrix. Try using the following command:
na.omit(pred)
Afterwards, rerun the above code. If this does not work, please post the package you are using to fit your model. This will allow for a more detailed solution!
Best wishes,
-Matt

How to conduct catboost grid search using GPU in R?

I'm setting up a grid search using the catboost package in R. Following the catboost documentation (https://catboost.ai/docs/), the grid search for hyperparameter tuning can be conducted using the 3 separate commands in R,
fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(7,8,9,10), learning_rate = c(0.1,0.2,0.3,0.4), iterations = c(10,100,1000))
report <- train(df.scale, as.factor(make.names(as.matrix(tier1))), method = catboost.caret, logging_level = 'Verbose', preProc = NULL, tuneGrid = grid, trControl = fit_control)
searching across different values for depth, learning rate, and the number of iterations. These commands seem well enough, it's just I can't figure out where to input the option for the task_type = "GPU". Would appreciate any help on how to specify using the GPU for finding the optimal parameters using R.
It can be done the following way:
fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(7,8,9,10), learning_rate = c(0.1,0.2,0.3,0.4), iterations = c(10,100,1000))
report <- train(df.scale, as.factor(make.names(as.matrix(tier1))), method = catboost.caret, logging_level = 'Verbose', preProc = NULL, tuneGrid = grid, trControl = fit_control,
task_type = "GPU")
This works due to ellipsis mechanics. All arguments that are unknown to caret.train itself are eventually passed to catboost.caret$fit and taken as training parameters for catboost. The exact place in catboost code where it happens is here:
...
catboost.caret$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
param <- c(param, list(...)) # all ellipsis args are taken to param
if (is.null(param$loss_function)) {
...
If you pass an unknown parameter this way, catboost will trigger an error:
report <- train(x, as.factor(make.names(y)),
method = catboost.caret,
logging_level = 'Verbose', preProc = NULL,
tuneGrid = grid, trControl = fit_control, what_is_this = "GPU")
> warnings()
Warning messages:
1: model fit failed for Fold1: depth=4, learning_rate=0.1, l2_leaf_reg=0.001, rsm=1, border_count=64, iterations=100 Error in catboost.train(pool, test_pool, param) :
catboost/private/libs/options/plain_options_helper.cpp:501: Unknown option {what_is_this} with value "GPU"
It looks like you are using the caret package to perform the training. In this case, it looks like the caret package does not pass any additional arguments to the catboost.train function so it may not support the GPU functionality. You can see from the code in caret for this method that the ... argument is not passed to the catboost.train function.
#' Fit model based on input data
#'
#' #param x, y: the current data used to fit the model
#' #param wts: optional instance weights (not applicable for this particular model)
#' #param param: the current tuning parameter values
#' #param lev: the class levels of the outcome (or NULL in regression)
#' #param last: a logical for whether the current fit is the final fit
#' #param weights: weights
#' #param classProbs: a logical for whether class probabilities should be computed
#'
#' #noRd
catboost.caret$fit <- function(x, y, wts, param, lev, last, weights, classProbs, ...) {
param <- c(param, list(...))
if (is.null(param$loss_function)) {
param$loss_function <- "RMSE"
if (is.factor(y)) {
param$loss_function <- "Logloss"
if (length(lev) > 2) {
param$loss_function <- "MultiClass"
}
y <- as.double(y) - 1
}
}
test_pool <- NULL
if (!is.null(param$test_pool)) {
test_pool <- param$test_pool
if (class(test_pool) != "catboost.Pool")
stop("Expected catboost.Pool, got: ", class(test_pool))
param <- within(param, rm(test_pool))
}
pool <- catboost.from_data_frame(x, y, weight = wts)
model <- catboost.train(pool, test_pool, param)
model$lev <- lev
return(model)
}
Depending on your level of proficiency in R and caret, you can add your own model to caret by basically copying the model in the caret github location and modify it to accept the GPU argument which should go into the parameter list per their documentation

how to solve negative subscript error in R?

I am trying to normalize the data frame before prediction but I get this error :
Error in seq_len(nrows)[i] :
only 0's may be mixed with negative subscripts
Called from: top level
Here is my code :
library('caret')
load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)
foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels,k=foldCount)
for (foldIndex in 1:foldCount){
cat("----- Start Fold -----\n")
#holding out samples of one fold in each iterration
testFold = DummyDataSet[folds[[foldIndex]],]
testLabels = classifyLabels[folds[[foldIndex]]]
trainFolds = DummyDataSet[-folds[[foldIndex]],]
trainLabels = classifyLabels[-folds[[foldIndex]]]
#Zero mean unit variance normalization to ONLY numerical data
for (k in 1:ncol(trainFolds)){
if (!is.integer(trainFolds[,k])){
params = meanStdCalculator(trainFolds[,k])
trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
}
}
meanStdCalculator = function(data){
Avg = mean(data)
stdDeviation = sqrt(var(data))
return(c(Avg,stdDeviation))
}
cat("----- Start Fold -----\n")
}
where trainFolds is a fold creating by caret package and its type is data.frame.
I have already read these links :
R Debugging
Subset
Negative Subscripts
but I couldn't find out what is wrong with the indexes?
anybody can help me?

Predict with a VECM with exogen variables

After having estimated a VECM model with stationary exogen variables, I would like to compute a prediction with the predict function and the newdata argument. I'm using the Dynts library that offers the possibility to compute VECM models with exogeneous variables, but I don't see how I can use the predict function with newdata for the integrated variables AND the exogeneous ones.
The following code doesn't work. Any idea ?
library(tsDyn)
Fact1<-rnorm(100,0,10)
x<-rnorm(100,0,10)
y<-rnorm(100,0,15)
i<-1:100
Yniv2<-sapply(i,function(k) sum(x[1:k]))
Facti1<-Yniv2+y
Yniv2<-Yniv2[1:99]
plot(Yniv2,type="l")#variable macro que l'on cherche à prévoir à l'instant t
lines(Facti1,col="red")#variable macro cointégrée avec Y dont on dispose l'obs en t
lines(Fact1,col="green")#variable stationnaire qui explique également Y
exog_met1v1<-Fact1[2:99]
exog_i1<-cbind(Yniv2[1:98],Facti1[1:98])
mdl<-VECM(exog_i1, 1, r=1, include = "const", estim = "ML", LRinclude = "const", exogen = exog_met1v1)
newexogi1 <-cbind(Yniv2[1:99],Facti1[1:99])
new <- Fact1[2:100]
newdata<-cbind(newexogi1,new)
Prev_H_1<-data.frame(predict(mdl, newdata))[,1] #pbbb
First error if I want the global fit
Please provide newdata with nrow=lag+1 (note lag=p in VECM representation corresponds to p+1 in VAR rep)
Second error if I provide just the last observations
newexogi1 <-cbind(Yniv2[98:99],Facti1[98:99])
new <- Fact1[99:100]
newdata<-cbind(newexogi1,new)
Prev_H_1<-data.frame(predict(mdl, newdata))[,1] #pbbb
Erreur dans TVAR.gen(B = B, nthresh = 0, type = "simul", n = n, lag = lag, :
Matrix B badly specified: expected 5 elements ( (lagK+ n inc) (nthresh+1) ) but has 6
I made some modifications, (subject to future changes!), but here you go:
## install development version:
library(devtools)
install_github("MatthieuStigler/tsDyn", ref="Dev94", subdir="tsDyn")
## use these arguments:
predict(mdl, newdata=newexogi1, exoPred=new, n.ahead=2)

glmulti wrapper for lmer does not produce results

I am using a glmulti wrapper for glmer (binomial) and the summary is:
This is glmulti 1.0.7, Apr. 2013.
Length Class Mode
0 NULL NULL
Following what has been done on this this thread, though this is for lmer,
glmulti runs indefinitely when using genetic algorithm with lme4, I get the same result as above. Could it be that the versions have changed since and the wrapping has to be done differently? The following is the dummy code (lifted form the link above):
x = as.factor(round(runif(30),1))# dummy grouping factor
yind = runif(30,0,10) # mock dependent variable
a = runif(30) # dummy covariate
b = runif(30) # another dummy covariate
c = runif(30) # an another one
d = runif(30)
tmpdata <- data.frame(x=x,yind=yind,a=a,b=b,c=c,d=d)
lmer.glmulti <- function (formula, data, random = "", ...) {
lmer(paste(deparse(formula), random), data = data, REML=F, ...)
}
summary(glmulti(formula = yind~a*b*c*d,
data = tmpdata,
random = '+(1|x)',
level = 2,
method = 'h',
crit = 'aicc',
marginality = TRUE,
fitfunc = lmer.glmulti))
lme4 version: 1.1.5
glmulti version: 1.0.7
"R version 3.0.2 (2013-09-25)"
SOLUTION
This works:
lmer.glmulti <- function (formula, data, random, ...) {
lmer(paste(deparse(formula), random), data = data)
}
glmulti(y = yind~a*b*c*d,
data = tmpdata,
random = '+(1|x)',
level = 2,
method = 'h',
crit = 'aicc',
marginality = TRUE,
fitfunc = lmer.glmulti)
packageVersion('lme4')
‘1.1.5’
packageVersion('glmulti')
‘1.0.7’
R.version: 3.1.0
FYI: From the package maintainer:
"fitfunc must be the name of a function so your other call including the function definition in the glmulti call cannot work."
"you named the first argument to glmulti 'formula', where it must be unnamed or 'y'... Sorry. But y is a formula (if passing a string it is the dependent variable only). "

Resources