Estimate CV error of R's lda using the package cvTools

I'm trying to use the package cvTools to estimate the classification error of a LDA fit. I've used cvTools successfully in the past with other models, but with LDA it doesn't work as expected.
library(MASS)     # lda()
library(cvTools)  # cvFit()

data <- rbind(data.frame(cls='A', x=rnorm(10, mean=5)),
              data.frame(cls='B', x=rnorm(10, mean=10)))
cv.fit <- cvFit(
  lda,
  formula=cls ~ x,
  data=data,
  cost=function(true, pred) {
    mean(true != pred)  # misclassification rate
  },
  K=10
)
It seems cvTools calls the generic predict function internally. predict.lda doesn't return a simple vector of class labels, but rather a list with components class, posterior and x.
If I set a breakpoint in the cost function, I see that true is a vector of class labels, but pred is that list coerced to a vector.
So my question is how I can still use cvTools with LDA.
My first idea was to provide a new predict method by subclassing lda in the hope that cvTools will then call my predict function which returns a vector of class labels:
lda.fit <- lda(cls ~ x, data)
class(lda.fit) <- c('ldaCV', class(lda.fit))

predict.ldaCV <- function(m, newdata) {
  MASS:::predict.lda(m, newdata)$class
}
cv.fit <- cvFit(
  lda.fit,
  data=data,
  y=data$cls,
  cost=function(true, pred) {
    mean(true != pred)
  },
  K=10
)
Nothing changes, however: pred is still the same mangled list.
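One workaround worth trying (an untested sketch, not from the original thread): instead of re-classing a fitted object, hand cvFit a wrapper fitting function whose result carries a custom class, so that the internal predict call dispatches to a method returning bare class labels. The names ldaWrapper and predict.ldaWrapper are made up here.

library(MASS)
library(cvTools)

# Hypothetical wrapper: fit via lda() but tag the result with our own class,
# so that predict() dispatches to predict.ldaWrapper below.
ldaWrapper <- function(formula, data, ...) {
  fit <- lda(formula, data = data, ...)
  class(fit) <- c('ldaWrapper', class(fit))
  fit
}

# Return only the class labels, which is what the cost function compares.
predict.ldaWrapper <- function(object, newdata, ...) {
  class(object) <- class(object)[-1]  # drop the tag to reach predict.lda
  predict(object, newdata, ...)$class
}

cv.fit <- cvFit(
  ldaWrapper,
  formula=cls ~ x,
  data=data,
  y=data$cls,
  cost=function(true, pred) mean(true != pred),
  K=10
)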

Related

prediction on test set for Gaussian Process Regression in R

The mlegp package explains how to fit a Gaussian process, but the example code in the package only demonstrates using the predict method to reconstruct the original functional output. Can someone help me understand how to predict on a test set using GPR?
The function predict.gp (which gets called when you use predict on an mlegp object) takes a newData argument, see ?predict.gp:
Usage:

     ## S3 method for class 'gp'
     predict(object, newData = object$X, se.fit = FALSE, ...)

Arguments:

  object: an object of class ‘gp’

 newData: an optional data frame or matrix with rows corresponding to
          inputs for which to predict. If omitted, the design matrix
          ‘X’ of ‘object’ is used.

     ...
Consider the simple model
library(mlegp)
x = -5:5
y = sin(x) + rnorm(length(x), sd = 0.1)
fit = mlegp(x, y)
Then
predict(fit)
and
predict(fit, newData = fit$X)
give the same result. You can then change newData according to your test data.
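So, to predict on a test set, just pass the test inputs as newData. A minimal sketch continuing the example above (the grid x.test is made up; I'm assuming se.fit = TRUE returns a list with fit and se.fit, per the usual predict convention):

# Hypothetical test grid; one column, to match the one-dimensional design matrix.
x.test <- matrix(seq(-5, 5, by = 0.25), ncol = 1)

pred <- predict(fit, newData = x.test, se.fit = TRUE)
pred$fit     # predicted mean at each test input
pred$se.fit  # pointwise standard errors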

glmulti: assigning a predict function for glmer with two nested random variables

I'm trying to use glmulti with glmer for model averaging and to get model-averaged predictions. I've followed the examples in the glmulti documentation ('Using glmulti with any type of statistical model, with examples', included with the package), updates provided on this website (glmulti and linear mixed models), and the package maintainer's blog (https://vcalcagnoresearch.wordpress.com/package-glmulti/). I've managed to create a wrapper for the glmer function:
glmer.glmulti <- function(formula, data, family, random = "") {
  glmer(paste(deparse(formula), random), data = data, family = binomial)
}
And I've managed to assign a getfit method for glmer so I can get the model averaged coefficients:
setMethod('getfit', 'merMod', function(object, ...) {
  summ = summary(object)$coef
  summ1 = summ[, 1:2]
  if (length(dimnames(summ)[[1]]) == 1) {
    summ1 = matrix(summ1, nr = 1,
                   dimnames = list(c("(Intercept)"), c("Estimate", "Std. Error")))
  }
  cbind(summ1, df = rep(10000, length(fixef(object))))
})
The next step is to assign a predict function for glmer. This is the example provided in the package documentation:
predict.mer = function(objectmer, random=random, newdata, withRandom=F, se.fit=F, ...) {
  if (missing(newdata) || is.null(newdata)) {
    DesignMat <- model.matrix(objectmer)
  } else {
    DesignMat = model.matrix(delete.response(terms(objectmer)), newdata)
  }
  output = DesignMat %*% fixef(objectmer)
  if (withRandom) {
    z = unlist(ranef(objectmer))
    if (missing(newdata) || is.null(newdata)) {
      Zt <- objectmer@Zt
    } else {
      Zt <- as(as.factor(newdata[, names(ranef(objectmer))]), "sparseMatrix")
    }
    output = as.matrix(output + t(Zt) %*% z)
  }
  if (se.fit) {
    pvar <- diag(DesignMat %*% tcrossprod(vcov(objectmer), DesignMat))
    if (withRandom) {
      pvar <- pvar + VarCorr(objectmer)[[1]]
    }
    output = list(fit=output, se.fit=sqrt(pvar))
  }
  return(output)
}
Then, to get the model averaged predictions (bab is a fitted glmulti object in the example):
> predict(bab, se.fit=T, withR=T)
Error in predict.merMod(coffee[[i]], se.fit = se.fit, ...) :
cannot calculate predictions with both standard errors and random effects
I've also tried:
> predict(bab, se.fit=T, withR=F)
Error in predict.merMod(coffee[[i]], se.fit = se.fit, ...) :
cannot calculate predictions with both standard errors and random effects
And:
> predict(bab, se.fit=F, withR=T)
Error in waou %*% t(matrix(unlist(preds), nrow = nbpo)) :
non-conformable arguments
In addition: There were 50 or more warnings (use warnings() to see the first 50)
I'm not quite sure what's wrong, although it may be something obvious. I gather that the lme4 package has been updated and changed since this example was written, so it may be something to do with that(?).
Another possibility is that the documentation says that this function will only handle one random variable. My model has two nested random variables: x ~ y + z + w + (1|u/v).
I need to a) get this working, b) update the function to handle two random variables. Any suggestions would be much appreciated.
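For what it's worth, one direction to try while the predict path is broken (an untested sketch, not from the thread): skip predict() entirely and build fixed-effects-only model-averaged predictions from glmulti's averaged coefficients. The column name "Estimate" and the data frame mydata are assumptions here.

# Hypothetical sketch: model-averaged fixed-effect predictions by hand.
avg <- coef(bab)                      # glmulti's model-averaged coefficient table
b   <- avg[, "Estimate"]              # assumed column name for the averages
names(b) <- rownames(avg)

X <- model.matrix(~ y + z + w, data = mydata)  # fixed part of x ~ y + z + w + (1|u/v)
X <- X[, names(b), drop = FALSE]      # align columns with the averaged terms
eta <- drop(X %*% b)                  # linear predictor
p <- plogis(eta)                      # inverse-logit, since the models are binomial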

Getting Error Bootstrapping to test predictive model

rsq <- function(formula, Data1, indices) {
  d <- Data1[indices,] # allows boot to select sample
  fit <- lm(formula, Data1=d)
  return(summary(fit)$r.square)
}

results = boot(data = Data1, statistic = rsq, R = 500)
When I execute the code, I get the following error:
Error in Data1[indices,] : incorrect number of dimensions
Background info: I am creating a predictive model using Linear Regressions. I would like to test my Predictive Model and through some research, I decided to use the Bootstrapping Method.
Credit goes to @Rui Barradas; see the comments on the original post.
If you read the help page for boot::boot you will see that the statistic function it calls takes the data as its first argument and the indices as its second, with any other arguments after that. So change the order in your function definition to rsq <- function(Data1, indices, formula).
Another problem I had was that I hadn't defined the function before calling boot.
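Putting this together, a corrected sketch (the formula y ~ x is a placeholder; note also that lm's argument is data=, not Data1=):

library(boot)

# boot() calls statistic(data, indices), so those must be the first two
# arguments; any other named arguments (here, formula) are passed through
# boot(...).
rsq <- function(Data1, indices, formula) {
  d <- Data1[indices, ]          # boot selects the bootstrap sample
  fit <- lm(formula, data = d)   # 'data', not 'Data1'
  summary(fit)$r.square
}

results <- boot(data = Data1, statistic = rsq, R = 500,
                formula = y ~ x)  # placeholder formula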

can mice() handle crr()? Fine-Gray model

My question is whether it is possible to pool a multiply imputed data set from mice() across Fine-Gray model fits from crr(), and whether that is statistically correct...
Example:
library(survival)
library(mice)
library(cmprsk)
test1 <- as.data.frame(list(time=c(4,3,1,1,2,2,3,5,2,4,5,1, 4,3,1,1,2,2,3,5,2,4,5,1),
                            status=c(1,1,1,0,2,2,0,0,1,1,2,0, 1,1,1,0,2,2,0,0,1,1,2,0),
                            x=c(0,2,1,1,NA,NA,0,1,1,2,0,1, 0,2,1,1,NA,NA,0,1,1,2,0,1),
                            sex=c(0,0,0,NA,1,1,1,1,NA,1,0,0, 0,0,0,NA,1,1,1,1,NA,1,0,0)))
dat <- mice(test1,m=10, seed=1982)
# Cox regression: cause 1
models.cox1 <- with(dat, coxph(Surv(time, status==1) ~ x + sex))
summary(pool(models.cox1))

# Cox regression: cause 1 or 2
models.cox <- with(dat, coxph(Surv(time, status==1 | status==2) ~ x + sex))
models.cox
summary(pool(models.cox))

#### crr()
# Fine-Gray model
models.FG <- with(dat, crr(ftime=time, fstatus=status,
                           cov1=test1[,c("x","sex")],
                           failcode=1, cencode=0, variance=TRUE))
summary(pool(models.FG))
# Error in pool(models.FG) : Object has no vcov() method.
models.FG
There are a couple of things that need to be done to get this to work.
Your initial data and imputation.
library(survival)
library(mice)
library(cmprsk)
test1 <- as.data.frame(list(time=c(4,3,1,1,2,2,3,5,2,4,5,1, 4,3,1,1,2,2,3,5,2,4,5,1),
                            status=c(1,1,1,0,2,2,0,0,1,1,2,0, 1,1,1,0,2,2,0,0,1,1,2,0),
                            x=c(0,2,1,1,NA,NA,0,1,1,2,0,1, 0,2,1,1,NA,NA,0,1,1,2,0,1),
                            sex=c(0,0,0,NA,1,1,1,1,NA,1,0,0, 0,0,0,NA,1,1,1,1,NA,1,0,0)))
dat <- mice(test1,m=10, print=FALSE)
mice requires a vcov method, but crr models have none; however, the covariance matrix can be accessed through the fitted model's var component. So we write our own vcov method to extract it, along with a coef method.
vcov.crr <- function(object, ...) object$var # or getS3method('vcov','coxph')
coef.crr <- function(object, ...) object$coef
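With these two methods defined, an individual crr fit exposes what pool() needs. A quick check on one completed data set (untested sketch; as.matrix is used since crr expects a covariate matrix):

d1 <- complete(dat, 1)  # first imputed data set
fit1 <- crr(ftime = d1$time, fstatus = d1$status,
            cov1 = as.matrix(d1[, c("x", "sex")]),
            failcode = 1, cencode = 0, variance = TRUE)
coef(fit1)  # works via coef.crr above
vcov(fit1)  # works via vcov.crr above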
There is also an error in how the model is passed to with.mids: your code has cov1=test1[,c("x","sex")], but cov1 should really use the imputed data. I am not sure how to write this correctly as an expression, since cov1 requires a matrix of the relevant variables, but you can easily hard-code a function.
# This function comes from mice:::with.mids
Andreus_with <- function(data, ...) {
  call <- match.call()
  if (!is.mids(data))
    stop("The data must have class mids")
  analyses <- as.list(1:data$m)
  for (i in 1:data$m) {
    data.i <- complete(data, i)
    analyses[[i]] <- crr(ftime=data.i[,'time'], fstatus=data.i[,'status'],
                         cov1=data.i[,c("x","sex")],
                         failcode=1, cencode=0, variance=TRUE)
  }
  object <- list(call = call, call1 = data$call, nmis = data$nmis,
                 analyses = analyses)
  oldClass(object) <- c("mira", "matrix")
  return(object)
}
EDIT:
The mice internals have changed since this answer; it now uses the broom package to extract elements from the fitted crr model. So tidy and glance methods for crr models are required:
tidy.crr <- function(x, ...) {
  co = coef(x)
  data.frame(term = names(co),
             estimate = unname(co),
             std.error = sqrt(diag(x$var)),
             stringsAsFactors = FALSE)
}

glance.crr <- function(x, ...) { }
The above code then allows the data to be pooled.
models.FG <- Andreus_with(dat)
summary(pool(models.FG))
Note that this gives warnings about df.residual not being defined, so large samples are assumed. I'm not familiar with crr, so perhaps a more sensible value can be extracted; it would then be added to the tidy method. (mice version ‘3.6.0’)

predict() for glm.fit does not work. why?

I've built a glm model in R using the glm.fit() function:
m <- glm.fit(x = as.matrix(df[,x.id]), y = df[,y.id], family = gaussian())
Afterwards, I tried to make some predictions, using (I am not sure that I chose s correctly):
predict.glm(m, x, s = 0.005)
And got an error:
Error in terms.default(object) : no terms component nor attribute
Here https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html I found some sort of solution to a problem:
predict.glm.fit <- function(glmfit, newmatrix) {
  newmatrix <- cbind(1, newmatrix)                 # prepend an intercept column
  coef <- rbind(1, as.matrix(glmfit$coef))         # prepend a matching coefficient of 1
  eta <- as.matrix(newmatrix) %*% as.matrix(coef)  # linear predictor
  exp(eta)/(1 + exp(eta))                          # inverse-logit (assumes a binomial fit)
}
But I cannot figure out whether it is possible to use glm.fit and predict afterwards, and why or why not. And how should one choose s correctly?
N.B. The problem can be avoided by using the glm() function, but glm() asks for a formula, which is not always convenient. Still, if someone wants to use glm.fit and predict afterwards, here is one solution: https://stat.ethz.ch/pipermail/r-help/2004-September/058242.html
You should be using glm, not glm.fit. glm.fit is the workhorse of glm, but glm returns an object of class c("glm", "lm"), for which there is a predict.glm method. Then you only have to apply predict to the object returned by glm (possibly with new data and the desired type of prediction specified), and the generic predict function will select the correct method.
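A minimal sketch of that route, assuming x.id and y.id index the predictor and response columns of df (reformulate() builds the formula from column names, which sidesteps the inconvenience mentioned above):

# Build the formula programmatically instead of writing it by hand.
f <- reformulate(names(df)[x.id], response = names(df)[y.id])

m <- glm(f, data = df, family = gaussian())
predict(m, newdata = df, type = "response")  # dispatches to predict.glm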
