caret: Error when using anything but LOOCV with rpart - r

I'm trying to use the R caret package for model generation, and I want to use some form of cross-validation. I found that the only resampling method that works together with rpart is LOOCV (leave-one-out cross-validation).
The following code produces the warning:
library(caret)
data(trees)
formula=Volume~Girth+Height
train(formula, data=trees, method='rpart')
Warning message: In nominalTrainWorkflow(dat = trainData, info =
trainInfo, method = method, : There were missing values in
resampled performance measures.
What does this warning mean and how do I make it go away? I searched the internet and found not a single hit for this message. I traced it down to the rpart model generation; all other model-generation methods work fine.
Everything works fine if I use LOOCV.
I traced the warning down to the workflows.R file, but I do not understand why this warning gets thrown.
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] earth_3.2-3 plotrix_3.4 plotmo_1.3-1
[4] leaps_2.9 doMC_1.2.5 multicore_0.1-7
[7] iterators_1.0.6 forecast_3.20 RcppArmadillo_0.3.0.2
[10] Rcpp_0.9.10 fracdiff_1.4-1 tseries_0.10-28
[13] zoo_1.7-7 quadprog_1.5-4 caret_5.15-023
[16] foreach_1.4.0 cluster_1.14.2 reshape_0.8.4
[19] plyr_1.7.1 lattice_0.20-6 mda_0.4-2
[22] class_7.3-3 rpart_3.1-52 data.table_1.8.0
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0

With help from the R mailing list and the caret author, I found the following explanation:
The warning occurs if the generated model is constant, meaning that the model yields the same value for every input. In that case the calculation of R^2 fails, and caret calculates R^2 by default. As caret does not use the R^2 value for model selection, you can ignore this warning.
Two questions remain:
It is not clear to me why the R^2 calculation has to fail if the model is constant. The code in caret explicitly returns a missing value if there are not at least two different values in the model predictions. I replaced the R^2 calculation with a self-written one which does not have this limitation.
The question of why rpart sometimes generates a constant model is still open, especially why it only generates constant models for resampling methods other than LOOCV.
In short: you can ignore the warning and, if you need to, write your own R^2 to get rid of it.
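A minimal sketch of the constant-prediction case, using base R only. It assumes that R^2 is computed as the squared correlation between predictions and observations; that is an assumption about caret's internals, not something confirmed above. The helper `r2_sse` and the toy numbers are hypothetical and only illustrate what a self-written R^2 without this limitation could look like.

```r
# Observed values and a constant prediction, as a tree without any split would produce.
obs  <- c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7)
pred <- rep(mean(obs), length(obs))

# Correlation-based R^2: pred has zero variance, so cor() returns NA (with a warning).
r2_cor <- suppressWarnings(cor(pred, obs))^2

# Hypothetical alternative (not part of caret): R^2 as 1 - SSE/SST.
r2_sse <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}
r2_alt <- r2_sse(pred, obs)

print(r2_cor)
print(r2_alt)
```

With this definition a constant model that predicts the mean gets a defined value instead of a missing one, which is why no missing values would show up in the resampled performance measures.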

Related

Function `Boot` from `R` package `car` can not find .carEnv

While using the Boot function from the car package I get the error message
Error in get(".y.boot", envir = .carEnv) : object '.carEnv' not found
I suspect I have inadvertently changed/set something in my OS and have no idea what it might be. Running the code below returns an error on my desktop but runs without error on a laptop running the same OS (Yosemite) as well as a desktop running Windows 7 (all using R-3.1.2). The code that triggers the message is
library(car)
swiss.lm <- lm(Fertility ~ Education, data = swiss)
BC <- Boot(swiss.lm, R = 999, method = "case") # No Problems
BR <- Boot(swiss.lm, R = 999, method = "residual") # Problems now
Error in get(".y.boot", envir = .carEnv) : object '.carEnv' not found
I have reinstalled R but the error still appears when running the above code. Any suggestions as to what I have done and how to get the code to run and find the environment would be most appreciated. TIA!
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] car_2.0-24
loaded via a namespace (and not attached):
[1] boot_1.3-15 grid_3.1.2 lattice_0.20-29 lme4_1.1-7 MASS_7.3-37 Matrix_1.1-5
[7] mgcv_1.8-4 minqa_1.2.4 nlme_3.1-119 nloptr_1.0.4 nnet_7.3-9 parallel_3.1.2
[13] pbkrtest_0.4-2 quantreg_5.11 Rcpp_0.11.4 SparseM_1.6 splines_3.1.2 tools_3.1.2
This looks like a reproducible bug in the car package. According to the package news, the changes in the latest version (2.0-24) are related to .carEnv handling.
I tried to get around the issue by simply assigning .carEnv before the call to Boot with
.carEnv <- car:::.carEnv
This makes the Boot function execute without errors, but I am not sure of any other effects.
The package maintainer emailed me and indicated a bug had been introduced in 2.0-24 and that he would attempt to fix the bug.

invalid or not-yet-implemented 'Matrix' subsetting in Shiny

When I run my shiny application, I got an error message saying
Error in prob[tw, uni.c] :
invalid or not-yet-implemented 'Matrix' subsetting
That same code ran without error when it was not on Shiny. Any idea how I can troubleshoot this?
I'm not sure how to reproduce the data here, but prob is of class dgCMatrix from the Matrix package, tw is a single integer, and uni.c is a numeric vector.
EDIT:
sessionInfo() output:
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252 LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C LC_TIME=English_Singapore.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_0.10.1 Matrix_1.1-4
loaded via a namespace (and not attached):
[1] bitops_1.0-6 caTools_1.17.1 digest_0.6.4 grid_3.1.1 htmltools_0.2.6 httpuv_1.3.0 lattice_0.20-29
[8] Rcpp_0.11.3 RJSONIO_1.3-0 tools_3.1.1 xtable_1.7-4
It turned out to be a bug in my code that is exposed by how Shiny works.
Outside Shiny, the function where the code resides worked seamlessly, being fed input from another function in the right format.
In Shiny, I expected the function in server.R to receive its input only after the submitButton is pressed, with sensible input keyed into the field. Apparently, even before the button is pressed for the first time, the default value in the input field (which was not a sensible one) was passed to my function. My function did not handle that default value well, which caused the error. Both changing the default value and building extra error-checking into my function solved the issue.
Apologies for the confusion; this was a learning experience to be careful with default values and with Shiny processing sequence.
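The extra error-checking mentioned above could look roughly like the following sketch. The helper `safe_subset` is hypothetical, the base matrix is only a stand-in for the dgCMatrix, and the `NA` input is just one example of a non-sensible default; the actual default value is not stated above.

```r
# Hypothetical helper: validate the indices before subsetting a matrix-like object.
safe_subset <- function(prob, tw, uni.c) {
  ok <- length(tw) == 1 && is.numeric(tw) && !is.na(tw) &&
    tw >= 1 && tw <= nrow(prob) &&
    length(uni.c) > 0 && is.numeric(uni.c) && !anyNA(uni.c) &&
    all(uni.c >= 1 & uni.c <= ncol(prob))
  if (!ok) return(NULL)  # the caller decides how to report the bad input
  prob[tw, uni.c]
}

prob <- matrix(seq_len(12) / 12, nrow = 3)     # stand-in for the sparse matrix
good <- safe_subset(prob, 2, c(1, 3))          # valid row and column indices
bad  <- safe_subset(prob, NA_real_, c(1, 3))   # e.g. an unusable default input

print(good)
print(is.null(bad))
```

Inside a Shiny server function, a similar guard could probably also be expressed with `req()`, which stops a reactive quietly until its inputs are usable.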

"non-numeric argument to binary operator" error from getReturns

For some reason, code that I usually run in RStudio is no longer working. I'm hoping that someone has had a similar experience and understands what's going on.
getReturns(c('C','BAC'), start='2004-01-01', end='2008-12-31')
This results in:
Error in unclass(e1) + unclass(e2) :
non-numeric argument to binary operator
I can't find anything online nor on stackoverflow that addresses this issue. Also, I saw that the most recent documentation, from July 2014 doesn't mention anything either:
http://cran.r-project.org/web/packages/stockPortfolio/stockPortfolio.pdf
Does anyone have any idea what's going on here?
It's probably a function name clash issue. Running
timeSeries::getReturns(c('C','BAC'), start='2004-01-01', end='2008-12-31')
gives me the error, but running
stockPortfolio::getReturns(c('C','BAC'), start='2004-01-01', end='2008-12-31')
works fine.
How did this happen?
You must have loaded the stockPortfolio package, and then loaded either timeSeries or another package that depends upon timeSeries. Have a look through your console for a message that looks like
The following object is masked from ‘package:stockPortfolio’:
getReturns
Use the double colon operator (as shown above) to explicitly tell R which package to look in.
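To check for this kind of masking, you can ask R which attached packages define a name and which one the unqualified name resolves to. The sketch below uses `filter` from the stats package purely as a stand-in for `getReturns`, so that it runs without stockPortfolio or timeSeries installed.

```r
# Which attached packages define a function called "filter"?
# (`filter` is only a stand-in for `getReturns`.)
hits <- utils::find("filter", mode = "function")

# Which namespace does the unqualified name currently resolve to?
resolved <- environmentName(environment(get("filter", mode = "function")))

# The double colon looks the function up in the named package directly,
# regardless of what else is on the search path.
same_function <- identical(stats::filter,
                           get("filter", envir = asNamespace("stats")))

print(hits)
print(resolved)
```

If `hits` lists more than one package, the first entry is the one an unqualified call will use.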
I have a similar problem using stockPortfolio in an R Markdown document.
Code that works in an R file does not work in the Rmd file.
```{r p3}
recordState()
ff <- allFunds1$Fund
returns <-stockPortfolio::getReturns(ff,freq="month")
save(allFunds1,file='allFunds1.rda')
```
gives the error message and traceback
Error in unclass(e1) + unclass(e2) : non-numeric argument to binary operator
5. structure(unclass(e1) + unclass(e2), class = "Date")
4. `+.Date`(as.Date(origin, ...), x)
3. as.Date.numeric(uDates, origin = minDate)
2. as.Date(uDates, origin = minDate)
1. stockPortfolio::getReturns(ff, freq = "month")
My recordState function saves the results of search() and sessionInfo() in the chunk:
[1] "search:"
[1] ".GlobalEnv" "tools:rstudio" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"
[1] "sessionInfo():"
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Yosemite 10.10.5
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 digest_0.6.11 dplyr_0.5.0
[4] rprojroot_1.2 assertthat_0.1 R6_2.2.0
[7] xtable_1.8-2 DBI_0.5-1 backports_1.0.5
[10] magrittr_1.5 evaluate_0.10 stringi_1.1.2
[13] stockPortfolio_1.2 rmarkdown_1.3 tools_3.3.2
[16] stringr_1.1.0 readr_1.0.0 yaml_2.1.14
[19] htmltools_0.3.5 knitr_1.15.1 tibble_1.2
The original posting suggests that this error can result from confusing stockPortfolio::getReturns with the function in timeSeries but I have used the full name and do not have either of the libraries loaded.

LMERConvenienceFunctions error on back and forward fitting functions: model not a mer object

I tried using bfFixefLMER_t.fnc and fitLMER.fnc from the LMERConvenienceFunctions package. In both cases, I get an error that "the input model is not a mer object".
I tried out the examples from http://artax.karlin.mff.cuni.cz/r-help/library/LMERConvenienceFunctions/html/00Index.html. I get the same errors.
For example when I run from the example
fitLMER.fnc(mB, backfit.on = "t", item = FALSE,
ran.effects = c("(FreqB | Subject)",
"(LengthB | Subject)", "(WMC | Item)"))
this is the result I get.
Warning in fitLMER.fnc(mB, backfit.on = "t", item = FALSE, ran.effects = c("(FreqB | Subject)", :resetting argument "method" to "t"
**backfitting fixed effects**
Warning in bfFixefLMER_t.fnc(model = model, item = item, method = method, :factor variable with more than two levels in model terms, backfitting on t-values is not appropriate, please use function "bfFixefLMER_F.fnc" instead.
Error in bfFixefLMER_t.fnc(model = model, item = item, method = method, : the input model is not a mer object
Has anyone had this experience with these functions?
There are functions that back fit fixed effects and forward fit random effects.
Is there a way to do forward fitting of fixed effects for glmer models? Or is this statistically meaningless? I work in ecological modelling and my understanding of advanced statistics is limited, so an explanation in layman's terms would be much appreciated.
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] LMERConvenienceFunctions_2.0 lme4_0.99999911-8
[3] RcppEigen_0.3.1.2.1 Rcpp_0.10.4
[5] Matrix_1.0-12 lattice_0.20-23
[7] LCFdata_1.0
loaded via a namespace (and not attached):
[1] grid_3.0.1 MASS_7.3-28 minqa_1.2.1 nlme_3.1-111 rpart_4.1-2
[6] splines_3.0.1 tools_3.0.1
You are using an (older) version of the overhauled lme4 package that returns merMod objects instead of mer objects, and hence is not compatible with LMERConvenienceFunctions. I get the same error when using the soon-to-be-released version 1.0-4.
If I install the latest version from CRAN instead (0.999999-2), no errors arise. I suggest removing your current lme4 and installing the latest from CRAN, and checking its version:
> detach("package:lme4",unload=TRUE)
> remove.packages("lme4")
> install.packages("lme4")
> packageVersion("lme4")
[1] ‘0.999999.2’
This should fix your problems. Be aware, however, that you will lose the advantages of the new version.
Also, in the coming days the new lme4 should appear on CRAN, breaking LMERConvenienceFunctions again if you update your packages. I guess, however, that the authors of LMERConvenienceFunctions will update their package soon to be compatible again.
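A quick way to see which kind of object a fitted model is, without relying on version numbers, is to check its class. The helper `lme4_generation` below is hypothetical, and the two objects are stand-ins that only carry a class attribute; real fits would come from lmer() or glmer().

```r
# Hypothetical helper: report which generation of lme4 produced a fitted model.
lme4_generation <- function(model) {
  if (inherits(model, "merMod")) {
    "merMod (overhauled lme4)"
  } else if (inherits(model, "mer")) {
    "mer (older lme4)"
  } else {
    "not an lme4 model"
  }
}

# Stand-in objects carrying only a class attribute.
fake_new <- structure(list(), class = "merMod")
fake_old <- structure(list(), class = "mer")

print(lme4_generation(fake_new))
print(lme4_generation(fake_old))
```

If the check reports a merMod object, the "not a mer object" error from LMERConvenienceFunctions 2.0 is to be expected.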

R - XCMS package Error

I have the following problem:
Error in .C("NetCDFOpen", as.character(filename), ncid = integer(1), status = integer(1), :
C symbol name "NetCDFOpen" not in DLL for package "xcms"
This is how I get the error:
nc <- xcms:::netCDFOpen(cdfFile)
ncData <- xcms:::netCDFRawData(nc)
xcms:::netCDFClose(nc)
I don't know why this doesn't work, although it should. For further info feel free to ask. Free .cdf files can be found in the TargetSearchData package.
Code example:
## The directory with the NetCDF GC-MS files
cdfpath <- file.path(find.package("TargetSearchData"), "gc-ms-data")
cdfpath
I don't think that it should work, as you are implying. First, you are using a non-exported function through the ::: operator. In addition, as stated by the error message, there is no NetCDFOpen symbol defined in the dll/so files.
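Whether a function belongs to a package's public interface can be checked against the namespace exports. The helper `is_exported` is hypothetical, and the stats package is used only as a stand-in so that the sketch runs without xcms installed.

```r
# Hypothetical helper: is `fn` part of the public interface of package `pkg`?
is_exported <- function(pkg, fn) {
  fn %in% getNamespaceExports(pkg)
}

exported     <- is_exported("stats", "median")            # a documented, exported function
not_exported <- is_exported("stats", "no_such_function")  # a name that stats does not export

print(exported)
print(not_exported)
```

Assuming xcms is installed, `is_exported("xcms", "netCDFOpen")` would show whether that function is meant to be called directly. Anything reachable only through ::: is internal and may change or break between versions.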
Using the standard input functionality from xcms, works smoothly:
> library("xcms")
> cdfpath <- file.path(find.package("TargetSearchData"), "gc-ms-data")
> cdfFile <- dir(cdfpath, full.names=TRUE)[1]
> xs <- xcmsSet(cdfFile)
7235eg04: 135:168 185:314 235:444 285:580
> xr <- xcmsRaw(cdfFile)
If you really want to input your data manually, you should use the functionality from the mzR package, which xcms depends on:
> openMSfile(cdfFile)
Mass Spectrometry file handle.
Filename: /home/lgatto/R/x86_64-unknown-linux-gnu-library/2.16/TargetSearchData/gc-ms-data/7235eg04.cdf
Number of scans: 4400
Finally, do pay attention to always provide the output of sessionInfo, to assure that you are using the latest version. In my case:
> sessionInfo()
R Under development (unstable) (2012-10-23 r61007)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.9.4 xcms_1.35.1 mzR_1.5.1
[4] Rcpp_0.9.15
loaded via a namespace (and not attached):
[1] Biobase_2.19.0 BiocGenerics_0.5.1 codetools_0.2-8 parallel_2.16.0
[5] tools_2.16.0
although it might be different for you if you use the stable versions of R and Bioconductor (currently 2.15.2/2.11).
Hope this helps.

Resources