"Error: Matrix must have equal dimensions" despite seemingly equal dimensions - r

> pred <- predict(fit, x, type="response", s=cv$lambda.min)
> confusion_matrix <- confusionMatrix(data = pred, reference = testXsp)
Error in confusionMatrix.matrix(data = pred, reference = testXsp) :
  matrix must have equal dimensions
> dim(pred)
[1] 751864 1
> dim(testXsp)
[1] 751864 1
> dim(testXsp) == dim(pred)
[1] TRUE TRUE
The dimensions seem to be the same, then why am I getting this error message?

The data argument of confusionMatrix must be a square matrix if it is passed as a matrix:
> caret:::confusionMatrix.matrix
function (data, positive = NULL, prevalence = NULL, mode = "sens_spec",
    ...)
{
    if (length(unique(dim(data))) != 1) {
        stop("matrix must have equal dimensions")
    }
    classTable <- as.table(data, ...)
    confusionMatrix(classTable, positive, prevalence = prevalence,
        mode = mode)
}
<bytecode: 0x126452f88>
<environment: namespace:caret>
Note that the method for class matrix does not even take a reference argument. It is the default method that uses reference. Perhaps you should review the help page for confusionMatrix?
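To make the default method dispatch (the one that does use reference), both arguments should be factors of predicted and observed classes with the same levels. Here is a minimal sketch, assuming a binary outcome, that testXsp holds the true 0/1 labels, and that a 0.5 cutoff on the predicted probabilities is appropriate; none of those assumptions comes from the question itself.
# sketch only: assumes binary classes, testXsp = true 0/1 labels, 0.5 cutoff
pred_class <- factor(ifelse(as.numeric(pred) > 0.5, 1, 0), levels = c(0, 1))
true_class <- factor(as.numeric(testXsp), levels = c(0, 1))
confusionMatrix(data = pred_class, reference = true_class)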

One possibility here is that there are one or more NA values contained in your prediction matrix. Try using the following command:
na.omit(pred)
Afterwards, rerun the above code. If this does not work, please post the package you are using to fit your model. This will allow for a more detailed solution!
Best wishes,
-Matt

Related

Pass arguments to the R Phyloseq subset_taxa wrapper

I'll explain the end goal, and what I'm trying as a test first. (Because I'm likely going about it the wrong way.)
I am using the phyloseq package to visualize microbiome data. I want to "automate" it to an extent by having users choose levels of analysis and have my script generate the visualizations without someone hand typing in each combination.
The issue is passing variables into the subset function. I mostly get these errors (depending on which combination of paste0, eval, parse, as.logical, expression, noquote, etc. I've tried):
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
A user would set the levels of analysis. So let's say for now there are two levels, and selecting the second level automatically means you want the first level as well. (I haven't worked on that part yet, but I wanted to explain it up front.)
# Set lineage level
lin_level <- 1
lin_list <- c("k__Kingdom", "p__Phylum", "c__Class", "o__Order", "f__Family")
lin_select <- lin_list[lin_level]
sub_lin <- lin_list[lin_level + 1]

# Kingdom
king_list <- "k__Bacteria"

# Set Phylum list
if (lin_select == "p__Phylum"){
  phylum_list <- c("p__Firmicutes", "p__Proteobacteria", "p__Bacteroidetes",
                   "p__Actinobacteria", "p__Tenericutes")
}

subgroup <- "All"
From here, the script would ultimately get to the graphing section. If lin_level is set to 1, it would look like this:
FIXED
gphic = subset_taxa(physeq1, Kingdom=="k__Bacteria")
title = paste0(subgroup," ", "Bacteria-only")
plot_bar(gpsfb, "Phylum", "Abundance", "Phylum",
title=title, facet_grid="Type~.")
AUTOMATED
gphic = subset_taxa(physeq1, (substring(lin_select,4)) == king_list)
title = paste0(subgroup," ", (substring(king_list,4)),"-only")
plot_bar(gpsfb, (substring(sub_lin,4)), "Abundance", (substring(sub_lin,4)),
title=title, facet_grid="Type~.")
But, trying to pass (substring(lin_select,4)) == king_list as an argument results in errors.
I've searched through the various threads on this issue, but haven't been able to get the different answers to work. Ultimately I need to run the graphing section once for Kingdom, and then again for each item in the Phylum list. But before I can get there, I need to be able to pass the arguments into the subset function.
Things I've tried:
test <- paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\"")
noquote(test)
[1] Phylum=="p__Bacteroidetes"
gphic = subset_taxa(physeq1, noquote(test))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, paste0(substring(lin_select,4),"==","\"","p__Bacteroidetes","\""))
Error in subset.data.frame(oldDF, ...) : 'subset' must be logical
gphic = subset_taxa(physeq1, as.logical(test))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(noquote(test))
[1] NA
gphic = subset_taxa(physeq1, as.logical(noquote(test)))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
noquote(test)
[1] Phylum=="p__Bacteroidetes"
as.logical(noquote(test))
[1] NA
as.logical(as.character(noquote(test)))
[1] NA
test2 <- eval(parse(text= test))
Error in eval(parse(text = test)) : object 'Phylum' not found
test2 <- eval(test)
gphic = subset_taxa(physeq1, as.logical(test2))
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
as.logical(test2)
[1] NA
And a lot of other permutations trying to sub in different things, but you get the idea.
gphic = subset_taxa(physeq1, eval(as.name(level_tax)) == king_list)
Here, level_tax is the loop variable. Say level_tax = "Order": we convert the string "Order" into a variable name with as.name(level_tax) (or as.symbol(level_tax)), and then use eval(), which takes an expression and evaluates it in the specified environment.
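Building on that, a hedged sketch of the loop over the phylum list, using the objects from the question (physeq1, sub_lin, phylum_list, subgroup) and assuming the eval(as.name(...)) pattern above also works with a loop variable; which rank you plot at each step is up to you.
# sketch: rank_name and the plotting rank are assumptions, not from the question
rank_name <- substring(sub_lin, 4)   # "Phylum" when lin_level is 1
for (taxon in phylum_list) {
  gphic <- subset_taxa(physeq1, eval(as.name(rank_name)) == taxon)
  title <- paste0(subgroup, " ", substring(taxon, 4), "-only")
  print(plot_bar(gphic, rank_name, "Abundance", rank_name,
                 title = title, facet_grid = "Type~."))
}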

Error message from lsmeans for a beta mixed regression model with glmmTMB

I am analyzing the ratio of (biomass of one part of a plant community) vs. (total plant community biomass) across different treatments in time (i.e. repeated measures) in R. Hence, it seems natural to use beta regression with a mixed component (available with the glmmTMB package) in order to account for repeated measures.
My problem is about computing post hoc comparisons across my treatments with the lsmeans function from the lsmeans package. glmmTMB objects are not handled by lsmeans, so Ben Bolker recommended adding the following code before loading the glmmTMB and lsmeans packages:
recover.data.glmmTMB <- function(object, ...) {
  fcall <- getCall(object)
  recover.data(fcall, delete.response(terms(object)),
               attr(model.frame(object), "na.action"), ...)
}

lsm.basis.glmmTMB <- function (object, trms, xlev, grid, vcov.,
                               mode = "asymptotic", component = "cond", ...) {
  if (mode != "asymptotic") stop("only asymptotic mode is available")
  if (component != "cond") stop("only tested for conditional component")
  if (missing(vcov.))
    V <- as.matrix(vcov(object)[[component]])
  else V <- as.matrix(.my.vcov(object, vcov.))
  dfargs = misc = list()
  if (mode == "asymptotic") {
    dffun = function(k, dfargs) NA
  }
  ## use this? misc = .std.link.labels(family(object), misc)
  contrasts = attr(model.matrix(object), "contrasts")
  m = model.frame(trms, grid, na.action = na.pass, xlev = xlev)
  X = model.matrix(trms, m, contrasts.arg = contrasts)
  bhat = fixef(object)[[component]]
  if (length(bhat) < ncol(X)) {
    kept = match(names(bhat), dimnames(X)[[2]])
    bhat = NA * X[1, ]
    bhat[kept] = fixef(object)[[component]]
    modmat = model.matrix(trms, model.frame(object), contrasts.arg = contrasts)
    nbasis = estimability::nonest.basis(modmat)
  }
  else nbasis = estimability::all.estble
  list(X = X, bhat = bhat, nbasis = nbasis, V = V, dffun = dffun,
       dfargs = dfargs, misc = misc)
}
Here is my code and data:
trt=c(rep("T5",13),rep("T4",13),
rep("T3",13),rep("T1",13),rep("T2",13),rep("T1",13),
rep("T2",13),rep("T3",13),rep("T5",13),rep("T4",13))
year=rep(2005:2017,10)
plot=rep(LETTERS[1:10],each=13)
ratio=c(0.0046856237844411,0.00100861922394448,0.032516291436091,0.0136507743972955,0.0940240065096705,0.0141337428305094,0.00746709315018945,0.437009092691189,0.0708021091805216,0.0327952505849285,0.0192685194751524,0.0914696394299481,0.00281889216102303,0.0111928453399615,0.00188119596836005,NA,0.000874623692966351,0.0181192859074754,0.0176635391424644,0.00922358069727823,0.0525280029990213,0.0975006760149882,0.124726170684951,0.0187132600944396,0.00672592365451266,0.106399234215126,0.0401776844073239,0.00015382736648373,0.000293356424756535,0.000923659501144292,0.000897412901472504,0.00315930225856196,0.0636501228611642,0.0129422445492391,0.0143526630252398,0.0136775931834926,0.00159292971508751,0.0000322313783211749,0.00125352390811532,0.0000288862579879126,0.00590690336494395,0.000417043974238875,0.0000695808216192379,0.001301299696752,0.000209355138230326,0.000153151660178623,0.0000646279598274632,0.000596704590065324,9.52943306579156E-06,0.000113476446629278,0.00825405312309618,0.0001025984082064,0.000887617767039489,0.00273668802742924,0.00469409165130462,0.00312377000134233,0.0015579322817235,0.0582615988387306,0.00146933878743163,0.0405139497779372,0.259097955479886,0.00783997376383192,0.110638003652979,0.00454029511918275,0.00728290246595241,0.00104674197030363,0.00550563937846687,0.000121380392484705,0.000831904606687671,0.00475778829159394,0.000402799910756391,0.00259524300745195,0.000210249875492504,0.00550104485802363,0.000272849546913495,0.0025389089622392,0.00129370075116459,0.00132810234020792,0.00523285954007915,0.00506230599388357,0.00774104695265855,0.00098348404576587,0.174079173227248,0.0153486840317039,0.351820365452281,0.00347674458928481,0.147309225196026,0.0418825705903947,0.00591271021100856,0.0207139520537443,0.0563647804012055,0.000560012457272534,0.00191564842393647,0.01493480083524,0.00353400674061077,0.00771828473058641,0.000202009136938048,0.112695841130448,0.00761492172670762,0.038797330459115,0.217367765362878,0.0680958660605668,0.0100870294641921,0.00493875324236991,0.00136539944656238,0.00264262100866192,0.0847732305020654,0.00460985241335143,0.235802638543116,0.16336020383325,0.225776236687456,0.0204568107372349,0.0455390585228863,0.130969863489582,0.00679523322812889,0.0172325334280024,0.00299970176999806,0.00179347656925317,0.00721658257996989,0.00822443690003783,0.00913096724026346,0.0105920192618379,0.0158013204589482,0.00388803567197835,0.00366268607026078,0.0545418725650633,0.00761485067129418,0.00867583194858734,0.0188232707241144,0.018652666214789)
dat=data.frame(trt,year,plot,ratio)
require(glmmTMB)
require(lsmeans)
mod=glmmTMB(ratio~trt*scale(year)+(1|plot),family=list(family="beta",link="logit"),data=dat)
summary(mod)
ls=lsmeans(mod,pairwise~trt)
Finally, I get the following error message that I've never encountered and on which I could find no information:
In model.matrix.default(trms, m, contrasts.arg = contrasts) :
variable 'plot' is absent, its contrast will be ignored
Could anyone shed some light on this? Thanks!
This is not an error message, it's a (harmless) warning message. It occurs because the hacked-up method I wrote doesn't exclude factor variables that are only used in the random effects. You should worry more about this output:
NOTE: Results may be misleading due to involvement in interactions
which is warning you that you are evaluating main effects in a model that contains interactions; you have to think about this carefully to make sure you're doing it right.
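One hedged way to handle the interaction explicitly (a sketch, not part of the answer above): centre year in the data instead of calling scale() inside the formula, then ask lsmeans for the treatment contrasts at chosen covariate values via its at argument. This still relies on the recover.data/lsm.basis hacks defined earlier, and the family specification is copied from the question (newer glmmTMB versions use beta_family()).
# sketch only: cyear and mod2 are new names, not from the question
dat$cyear <- as.numeric(scale(dat$year))
mod2 <- glmmTMB(ratio ~ trt * cyear + (1 | plot),
                family = list(family = "beta", link = "logit"), data = dat)
lsmeans(mod2, pairwise ~ trt, at = list(cyear = 0))   # contrasts at the mean year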

R - numerical errors with analytical gradient?

I've got the following code:
theta = 0.05
n = 1000
m = 200
r = rnorm(2000)

# ER check function
nu = Vectorize(function(a, tau){ return(abs(tau - (a < 0)) * a^2) })

# Selecting 10 lowest sum values (lowest10 function returns indices)
lowest10 = function(x){
  values = sort(x)[1:min(10, length(x))]
  indices = match(values, x)
  return(indices)
}

sym.expectile = function(beta, e, abs.r){ return(beta[1] + beta[2]*e + beta[3]*abs.r) }

ERsum = function(beta, tau, start, end){
  y = r[(start+1):end]
  X1 = rep(1, n-1)
  X3 = abs(r[start:(end-1)])
  X2 = c()
  X2[1] = e.sym.optimal[start-m]
  for (i in 2:(n-1)){
    X2[i] = sym.expectile(beta, X2[i-1], X3[i-1])
  }
  X = matrix(c(X1, X2, X3), ncol = 3)
  res = y - X %*% beta
  sum.nu = mean(nu(res, tau))
  return(sum.nu)
}

ERsum.gr = function(beta, tau, start, end){
  y = r[(start+1):end]
  X1 = rep(1, n-1)
  X3 = abs(r[start:(end-1)])
  X2 = c()
  X2[1] = e.sym.optimal[start-m]
  for (i in 2:(n-1)){
    X2[i] = sym.expectile(beta, X2[i-1], X3[i-1])
  }
  X = matrix(c(X1, X2, X3), ncol = 3)
  partial.beta0 = c()
  for (i in 1:(n-1)){ partial.beta0[i] = -(1 - beta[2]^(i))/(1 - beta[2]) }
  gr.beta0 = 2/T*sum(abs(tau - (y < X %*% beta))*(y - X %*% beta)*partial.beta0)/1000
  partial.beta1 = c()
  partial.beta1[1] = -X2[1]
  for (i in 2:(n-1)){ partial.beta1[i] = partial.beta1[i-1]*beta[2] - X2[i] }
  gr.beta1 = 2/T*sum(abs(tau - (y < X %*% beta))*(y - X %*% beta)*partial.beta1)/1000
  partial.beta2 = c()
  partial.beta2[1] = -X3[1]
  for (i in 2:(n-1)){ partial.beta2[i] = partial.beta2[i-1]*beta[2] - X3[i] }
  gr.beta2 = 2/T*sum(abs(tau - (y < X %*% beta))*(y - X %*% beta)*partial.beta2)/1000
  c(gr.beta0, gr.beta1, gr.beta2)
}

beta = matrix(nrow = 1e4, ncol = 3)
beta[,1] = runif(1e4, -1, 0)  # beta0
beta[,2] = runif(1e4,  0, 1)  # beta1
beta[,3] = runif(1e4, -1, 0)  # beta2

e.sym.optimal = c()
tau.found.sym.optim = 0.02234724
library('expectreg')
e.sym.optimal[1] = expectile(r[1:m], tau.found.sym.optim)

ERsums.sym = c()
for (i in 1:nrow(beta)){
  ERsums.sym[i] = ERsum(beta[i,], tau.found.sym.optim, m+1, m+n)
}

initialbeta.esym = beta[lowest10(ERsums.sym),]

intermedietebeta.esym = matrix(ncol = 3, nrow = 10)
for (i in 1:10){
  intermedietebeta.esym[i,] = optim(initialbeta.esym[i,], ERsum,
                                    gr = ERsum.gr, tau = tau.found.sym.optim,
                                    start = m+1, end = m+n,
                                    method = "BFGS")$par
}
I tried to replace the optim function with optimx, but got the following error:
Error: Gradient function might be wrong - check it!
To check whether my gradient is correct, I evaluated it both with the grad function from numDeriv and by calling my ERsum.gr function directly. For the sample vector
> beta
[1] -0.8256490 0.7146256 -0.4945032
I obtained the following results:
> grad(function(beta) ERsum(c(beta[1],beta[2],beta[3]),tau.found.sym.optim,m+1,m+n),beta)
[1] -0.6703170 2.8812666 -0.5573101
> ERsum.gr2(beta,tau.found.sym.optim,m+1,m+n)
[1] -0.6696467 2.8783853 -0.5567527
So here is my question: is it possible that these differences are just numerical errors from rounding in partial.beta0, partial.beta1 and partial.beta2, which are simply the components of the sum representing the gradient? I think so, because if my analytical formula for the gradient missed something, the discrepancies would probably be much larger, but how can I be sure? And if that is the case, is there any way to obtain more accurate values of the gradient?
You've got further problems down the line even if you solve the question of whether that is really a proper gradient, which I see as too complex to tackle. If you take out the gr argument and try to run with only optimx instead of optim, you get:
Error in intermedietebeta.esym[i, ] <- optimx(initialbeta.esym[i, ], ERsum, :
number of items to replace is not a multiple of replacement length
This probably relates to the fact that optimx does not return the same structure as is returned by optim:
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS")$par
NULL
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS") # leave out `$par`
p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtimes
BFGS -1.0325 0.2978319 0.04921863 0.09326904 102 100 NA 1 TRUE FALSE 3.366
If you disagree with the decision to allow a default gradient estimate, then you need to narrow down your debugging to the function that throws the error:
Error: Gradient function might be wrong - check it!
> traceback()
3: stop("Gradient function might be wrong - check it! \n", call. = FALSE)
2: optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower,
upper, hessian, optcfg$ctrl, have.bounds = optcfg$have.bounds,
usenumDeriv = optcfg$usenumDeriv, ...)
1: optimx(initialbeta.esym[i, ], ERsum, gr = ERsum.gr, tau = tau.found.sym.optim,
start = m + 1, end = m + n, method = "BFGS")
Then look at the code for optimx:::optimx.check (there is no help page for it). This is the section that does the checking:
if (!is.null(ugr) && !usenumDeriv) {
    gname <- deparse(substitute(ugr))
    if (ctrl$trace > 0)
        cat("Analytic gradient from function ", gname, "\n\n")
    fval <- ufn(par, ...)
    gn <- grad(func = ufn, x = par, ...)
    ga <- ugr(par, ...)
    teps <- (.Machine$double.eps)^(1/3)
    if (max(abs(gn - ga))/(1 + abs(fval)) >= teps) {
        stop("Gradient function might be wrong - check it! \n",
            call. = FALSE)
        optchk$grbad <- TRUE
    }
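Knowing the exact test, you can reproduce it by hand to see how far the analytic gradient is from the numerical one relative to the tolerance. A sketch using numDeriv and the objects defined in the question:
# sketch: par0 is just the first of the ten starting vectors from the question
library(numDeriv)
par0 <- initialbeta.esym[1, ]
fval <- ERsum(par0, tau.found.sym.optim, m + 1, m + n)
gn   <- grad(function(b) ERsum(b, tau.found.sym.optim, m + 1, m + n), par0)
ga   <- ERsum.gr(par0, tau.found.sym.optim, m + 1, m + n)
teps <- (.Machine$double.eps)^(1/3)        # roughly 6e-6
max(abs(gn - ga)) / (1 + abs(fval))        # optimx stops if this reaches teps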

how to solve negative subscript error in R?

I am trying to normalize the data frame before prediction, but I get this error:
Error in seq_len(nrows)[i] :
  only 0's may be mixed with negative subscripts
Called from: top level
Here is my code:
library('caret')

load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)

foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels, k = foldCount)

for (foldIndex in 1:foldCount){
  cat("----- Start Fold -----\n")
  # holding out samples of one fold in each iteration
  testFold = DummyDataSet[folds[[foldIndex]],]
  testLabels = classifyLabels[folds[[foldIndex]]]
  trainFolds = DummyDataSet[-folds[[foldIndex]],]
  trainLabels = classifyLabels[-folds[[foldIndex]]]

  # Zero mean unit variance normalization to ONLY numerical data
  for (k in 1:ncol(trainFolds)){
    if (!is.integer(trainFolds[,k])){
      params = meanStdCalculator(trainFolds[,k])
      trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
      testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
    }
  }

  meanStdCalculator = function(data){
    Avg = mean(data)
    stdDeviation = sqrt(var(data))
    return(c(Avg, stdDeviation))
  }

  cat("----- Start Fold -----\n")
}
where trainFolds is a fold created by the caret package and its type is data.frame.
I have already read these links:
R Debugging
Subset
Negative Subscripts
but I couldn't figure out what is wrong with the indexing. Can anybody help me?
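A quick diagnostic sketch (using only the objects defined above, and not claiming this is the cause): check that every fold returned by createFolds is a vector of strictly positive, non-missing row indices before it is used for negative subscripting such as DummyDataSet[-folds[[foldIndex]],].
# sketch: TRUE for any fold here would explain the mixed-subscript error
str(folds)
sapply(folds, function(idx) any(idx <= 0 | is.na(idx)))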

Check the return values from function Gammad and Truncate (from package distr and truncdist)

After searching the forum, I did not find similar questions. If you find one, please let me know; I would really appreciate it.
In R, I need to check the return values from the functions Gammad and Truncate (from the distr and truncdist packages).
That is, if they fail to generate the Gammad or truncated distribution, a failure value or exception should be returned so that I can handle it.
G0 <- Gammad(scale = s, shape = sh)
# what if Gammad() fails ?
TG <- Truncate(G0, lower = lowerbound, upper = upperbound)
# what if Truncate() fails ?
Thanks!
From the rgamma help page: "Invalid arguments will result in return value NaN, with a warning."
If this is what you see, you could use
ow <- options("warn")
options(warn = 2)    # promote warnings to errors so try() catches them
G0 <- try(Gammad(scale = s, shape = sh), silent = TRUE)
if (inherits(G0, "try-error")) {
  # handle invalid arguments here
}
options(ow)          # restore the previous warning setting
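An alternative sketch that avoids changing the global warn option is to wrap the calls in tryCatch() and handle warnings and errors directly (s, sh, lowerbound and upperbound are assumed to be defined as in the question):
# sketch: returns NULL on any warning or error from Gammad()/Truncate()
G0 <- tryCatch(
  Truncate(Gammad(scale = s, shape = sh), lower = lowerbound, upper = upperbound),
  warning = function(w) { message("warning: ", conditionMessage(w)); NULL },
  error   = function(e) { message("error: ", conditionMessage(e)); NULL }
)
if (is.null(G0)) {
  # handle the failure here
}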
