R variable not found, but specifically defined - r

I have written a function to run phylogenetic generalized least squares, and everything looks like it should work fine, but for some reason, a specific variable which is defined in the script (W) keeps coming up as undefined. I have stared at this code for hours and cannot figure out where the problem is.
Any ideas?
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(0, 4), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)
returns "Error in eval(expr, envir, enclos) : object 'W' not found"
even though W is specifically defined!
Thanks!

The error's occuring in the gls calls in myou and corMatrins2: you have to pass in W as a column in dat because gls is looking for it there (when you put weights = ~W as a formula like that it looks for dat$W and can't find it).
Just change data=dat to data=cbind(dat,W=W) in both functions.

The example is not reproducible for me, as lowerB and upperB are not defined, however, perhaps the following will work for you, cbinding dat with W:
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(lowerB, upperB), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)

Related

How to calculate the cross-validated R2 on a LASSO regression?

I am using this code to fit a model using LASSO regression.
library(glmnet)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data<-data.frame(IV1,IV2,IV3,IV4,IV5,DV)
x <-model.matrix(DV~.-IV5 , data)[,-1]
y <- data$DV
AB<-glmnet(x=x, y=y, alpha=1)
plot(AB,xvar="lambda")
lambdas = NULL
for (i in 1:100)
{
fit <- cv.glmnet(x,y)
errors = data.frame(fit$lambda,fit$cvm)
lambdas <- rbind(lambdas,errors)
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex = which(lambdas[2]==min(lambdas[2]))
bestlambda = lambdas[bestindex,1]
fit <- glmnet(x,y,lambda=bestlambda)
I would like to calculate some sort of R2 using the training data. I assume that one way to do this is using the cross-validation that I performed in choosing lambda. Based off of this post it seems like this can be done using
r2<-max(1-fit$cvm/var(y))
However, when I run this, I get this error:
Warning message:
In max(1 - fit$cvm/var(y)) :
no non-missing arguments to max; returning -Inf
Can anyone point me in the right direction? Is this the best way to compute R2 based off of the training data?
The function glmnet does not return cvm as a result on fit
?glmnet
What you want to do is use cv.glmnet
?cv.glmnet
The following works (note you must specify more than 1 lambda or let it figure it out)
fit <- cv.glmnet(x,y,lambda=lambdas[,1])
r2<-max(1-fit$cvm/var(y))
I'm not sure I understand what you are trying to do. Maybe do this?
for (i in 1:100)
{
fit <- cv.glmnet(x,y)
errors = data.frame(fit$lambda,fit$cvm)
lambdas <- rbind(lambdas,errors)
r2[i]<-max(1-fit$cvm/var(y))
}
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
bestindex = which(lambdas[2]==min(lambdas[2]))
bestlambda = lambdas[bestindex,1]
r2[bestindex]

Function lipsitz.test {generalhoslem} is not working for object clm{ordinal}

I am now tring to test the goodness of fit of an ordianl model using lipsitz.test {generalhoslem}. According to the document, the function can deal with both polr and clm. However, when I try to use clm in the lipsitz.testfunction, an error occurs. Here is an example
library("ordinal")
library(generalhoslem)
data("wine")
fm1 <- clm(rating ~ temp * contact, data = wine)
lipsitz.test(fm1)
Error in names(LRstat) <- "LR statistic" :
'names' attribute [1] must be the same length as the vector [0]
In addition: Warning message:
In lipsitz.test(fm1) :
n/5c < 6. Running this test when n/5c < 6 is not recommended.
Is there any solution to fix this? Thanks a lot.
I'm not sure if this is off-topic and should be on CrossValidated. It's partly a problem with the coding of the test and partly about the statistics of the test itself.
There are two problems. I've just spotted a bug in the code when using clm and will push a fix to CRAN (corrected code below).
There does however appear to be a more fundamental problem with the example data. Basically, the Lipsitz test requires fitting a new model with dummy variables of the groupings. When fitting the new model with this example, the model fails and so some of the coefficients are not calculated. If using polr, the new model gets the warning that it is rank-deficient; if using clm, the new model gets a message that two coefficients are not fitted due to singularities. I think this example data set is just unsuitable for this kind of analysis.
The corrected code is below and I have used a larger example dataset on which the test runs.
lipsitz.test <- function (model, g = NULL) {
oldmodel <- model
if (class(oldmodel) == "polr") {
yhat <- as.data.frame(fitted(oldmodel))
} else if (class(oldmodel) == "clm") {
predprob <- oldmodel$model[, 2:ncol(oldmodel$model)]
yhat <- predict(oldmodel, newdata = predprob, type = "prob")$fit
} else warning("Model is not of class polr or clm. Test may fail.")
formula <- formula(oldmodel$terms)
DNAME <- paste("formula: ", deparse(formula))
METHOD <- "Lipsitz goodness of fit test for ordinal response models"
obs <- oldmodel$model[1]
if (is.null(g)) {
g <- round(nrow(obs)/(5 * ncol(yhat)))
if (g < 6)
warning("n/5c < 6. Running this test when n/5c < 6 is not recommended.")
}
qq <- unique(quantile(1 - yhat[, 1], probs = seq(0, 1, 1/g)))
cutyhats <- cut(1 - yhat[, 1], breaks = qq, include.lowest = TRUE)
dfobs <- data.frame(obs, cutyhats)
dfobsmelt <- melt(dfobs, id.vars = 2)
observed <- cast(dfobsmelt, cutyhats ~ value, length)
if (g != nrow(observed)) {
warning(paste("Not possible to compute", g, "rows. There might be too few observations."))
}
oldmodel$model <- cbind(oldmodel$model, cutyhats = dfobs$cutyhats)
oldmodel$model$grp <- as.factor(vapply(oldmodel$model$cutyhats,
function(x) which(observed[, 1] == x), 1))
newmodel <- update(oldmodel, . ~ . + grp, data = oldmodel$model)
if (class(oldmodel) == "polr") {
LRstat <- oldmodel$deviance - newmodel$deviance
} else if (class(oldmodel) == "clm") {
LRstat <- abs(-2 * (newmodel$logLik - oldmodel$logLik))
}
PARAMETER <- g - 1
PVAL <- 1 - pchisq(LRstat, PARAMETER)
names(LRstat) <- "LR statistic"
names(PARAMETER) <- "df"
structure(list(statistic = LRstat, parameter = PARAMETER,
p.value = PVAL, method = METHOD, data.name = DNAME, newmoddata = oldmodel$model,
predictedprobs = yhat), class = "htest")
}
library(foreign)
dt <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
fm3 <- clm(ses ~ female + read + write, data = dt)
lipsitz.test(fm3)
fm4 <- polr(ses ~ female + read + write, data = dt)
lipsitz.test(fm4)

Caret and rpart - definining method

i am trying to familiarize myself with the caret package. I would previously use rpart directly - e.g. with the following syntax
fit_rpart=rpart(y~.,data=dt1,method="anova").
i have specified anova as i am aiming for regression (rather than classification)
with caret - i would the following syntax:
rpart_fit <- train(y ~ ., data = dt1, method = "rpart",trControl=fitControl)
my question is, as the method slot is already used, where/how can i still specify method="anova"?
Many thanks in advance!
You can make a custom method using the current rpart code. First, get the current code:
library(caret)
rpart_code <- getModelInfo("rpart", regex = FALSE)[[1]]
You then just add the extra option to the code. This method is somewhat convoluted since it handles a bunch of different cases, but here is the edit:
rpart_code$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
cpValue <- if(!last) param$cp else 0
theDots <- list(...)
if(any(names(theDots) == "control")) {
theDots$control$cp <- cpValue
theDots$control$xval <- 0
ctl <- theDots$control
theDots$control <- NULL
} else ctl <- rpart.control(cp = cpValue, xval = 0)
## check to see if weights were passed in (and availible)
if(!is.null(wts)) theDots$weights <- wts
modelArgs <- c(list(formula = as.formula(".outcome ~ ."),
data = if(is.data.frame(x)) x else as.data.frame(x),
control = ctl,
method = "anova"),
theDots)
modelArgs$data$.outcome <- y
out <- do.call("rpart", modelArgs)
if(last) out <- prune.rpart(out, cp = param$cp)
out
}
then test:
library(rpart)
set.seed(445)
mod <- train(pgstat ~ age + eet + g2 + grade + gleason + ploidy,
data = stagec,
method = rpart_code,
tuneLength = 8)
Max
In caret 'method' refers to the type of model you would like to use, so for example rpart or lm (linear regression) or rf (random forest).
What you're referring to is defined as 'metric' in caret.
If your y-variable is a continuous variable, the metric will be default set to maximizing RMSE. So you don't have to do anything.
You could also explicitly specify this by:
rpart_fit <- train(y ~ ., data = dt1, method = "rpart",trControl=fitControl, metric="RMSE")

R pglm probit estimation: uninformative error "object 'Li' not found"

Using the setup:
id <- c(1,1,2,2,3,4)
t <- c(1,2,1,2,1,1)
x <- c(1,2,2,1,2,1)
y <- c(1,0,0,0,1,0)
df <- data.frame(id, t, x, y)
Preparing fitted values from an OLS regression as the starting values:
tstart <- lm(y ~ x, data = df)
Running probit:
tfit <- pglm(y ~ x, data = df, index = c("id","t"),
model = "within", family = binomial('probit'), start = tstart$fitted.values)
Returns the error
Error in lnl.binomial(param = start, y = y, X = X, id = id, model = model, :
object 'Li' not found
This error seems very uninformative to me.
I am not even looking for some object 'Li' in any of the calls, and have no idea what this object should be.
The traceback makes it seem to occur in the function:
9: lnl.binomial(param = start, y = y, X = X, id = id, model = model,
link = link, rn = rn) at <text>#1
But trying to look at the code of the function for when the error occurs reveals that there is not even a function
lnl.binomial()
Where did I go wrong about this?

predict with kernlab package error Error in .local(object, ...) : test vector does not match model R

I'm testing the kernlab package in a regression problem. It seems it's a common issue to get 'Error in .local(object, ...) : test vector does not match model ! when passing the ksvm object to the predict function. However I just found answers to classification problems or custom kernels that are not applicable to my problem (I'm using a built-in one for regression). I'm running out of ideas here, my sample code is:
data <- matrix(rnorm(200*10),200,10)
tr <- data[1:150,]
ts <- data[151:200,]
mod <- ksvm(x = tr[,-1],
y = tr[,1],
kernel = "rbfdot", type = 'nu-svr',
kpar = "automatic", C = 60, cross = 3)
pred <- predict(mod,
ts
)
You forgot to remove the y variable in the test set, and so it fails because the number of predictors don't match. This will work:
predict(mod,ts[,-1])
You can use pred <- predict(mod, ts) if ts is a dataframe.
It would be
data <- setNames(data.frame(matrix(rnorm(200*10),200,10)),
c("Y",paste("X", 1:9, sep = "")))
tr <- data[1:150,]
ts <- data[151:200,]
mod <- ksvm(as.formula("Y ~ ."), data = tr,
kernel = "rbfdot", type = 'nu-svr',
kpar = "automatic", C = 60, cross = 3)
pred <- predict(mod, ts)

Resources