R "caret" package running into compile environment error - r

I have tried searching for an answer and absolutely can't seem to find any solution to this. I have recently installed the "caret" package for R, but when I try to train any models at all, it gives me the following error:
Warning: namespace ‘compiler’ is not available and has been replaced
by .GlobalEnv when processing object ‘sep’
Error in comp(expr, env = envir, options = list(suppressUndefined = TRUE)) :
could not find function "makeCenv"
From what I can tell, this has to do with the built-in compiler package. Most of the answers I have seen mention the doMC package, but I don't think this is relevant here because I'm not parallelizing anything. I have been able to run the same code successfully on other machines, so I'm completely stumped as to what the problem might be. Here's some sample code that causes the error for me:
library(caret)
fit.knn <- train(Species ~ ., data=iris, method="knn")
It doesn't seem to matter what method I use, or what data I train on. I have tried reinstalling the package, and reinstalling R as well. I'm running R v.3.3.2 on Windows, and the caret package is the most recent version. Any help would be appreciated!

I have finally found the solution to this issue. I had been using an idiosyncratic setup where I had set my library location to a global folder so I wouldn't have to move all my packages from the "3.2" folder to the "3.3" folder, etc. every time I updated R. My solution worked, but had the side effect of removing the default library location in Program Files (the one that includes the base packages).
That seemed to work completely fine for everything I did, so I never noticed an issue until I started using caret. For some reason, the built-in compiler package (used by some dependency of caret) doesn't work properly with that setup, and needs an explicit reference to the Program Files library location in .libPaths(). After changing things back to the default, everything works fine.
So although this is an obscure error and probably not something many people will experience, for those arriving here from Google: check .libPaths() and make sure there are two locations there: your user folder (in "My Documents" by default) and the default library location that includes all the base packages (in "Program Files").
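As a quick sanity check from the R console (the paths below are typical Windows defaults, shown only as an illustration):
.libPaths()
# Expect two entries, e.g.:
# [1] "C:/Users/<you>/Documents/R/win-library/3.3"
# [2] "C:/Program Files/R/R-3.3.2/library"
# If the Program Files entry is missing, adding it back makes the base
# library (including compiler) visible again:
.libPaths(c(.libPaths(), "C:/Program Files/R/R-3.3.2/library"))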

Related

Kernel LDA in Julia (trouble installing the package)

I want to use Kernel LDA in Julia 1.6.1.
I found the repo.
https://github.com/remusao/LDA.jl
I read README.md and typed
] add LDA
but it does not work:
The following package names could not be resolved:
LDA (not found in project, manifest or registry)
I also tried all of the following commands; none of them work either:
add https://github.com/remusao/LDA.jl
add https://github.com/remusao/LDA.jl.git
Pkg.clone("https://github.com/remusao/LDA.jl.git")
What is the problem? How can I install LDA.jl in Julia?
The package you have linked, https://github.com/remusao/LDA.jl, has had no commits in over eight years. Among other things, it lacks a Project.toml file, which is necessary for installation in modern Julia.
Since Julia was only about one year old and at version 0.2 back in 2013, when this package last saw maintenance, the language has changed drastically since then, such that the code in this package would likely no longer work even if you could get it to install.
If you can't find any alternative to this package for your work, forking it and upgrading it to work with modern Julia would be a nice intermediate-beginner project.

Developing R package, testing with `foreach`, while running simulations at same time with different package version

I write almost all my R code in packages at work (and use git). I make heavy use of devtools, in particular shortcuts like load_all, etc., as I update functions used in a package. I have a rough understanding of devtools, in that load_all makes a temporary copy of the package, and I really like this workflow for testing function updates in packages.
Is there a nice easy way/workflow for running simulations depending on the package, while developing it at the same time, without "breaking" those simulations?
I suspect there is an easy solution that I've overlooked.
Right now what I do is:
1. Get the package "mypackage" up to a point where it's ready for running simulations. Copy the whole folder containing the project, and run the simulations in the copied folder under a new package name ("mypackage2"). The simulation scripts include library(mypackage2) but NOT library(mypackage). This annoyingly means I need to update library(mypackage) calls to library(mypackage2) calls. If I instead run simulations using library(mypackage) and avoid "mypackage2" entirely, then I need to make sure the currently built version of mypackage is the 'old' one that doesn't reflect the updates in step 2 below (but step 2 requires rebuilding the package too!). Handling all this gets messy.
2. While the simulations are running in the copied folder, I update the functions in "mypackage", either by using load_all or by rebuilding the package. I often need to rebuild the package (load_all alone, without rebuilding, isn't a workable solution when testing updates) because I want to test functions that run small parallel simulations with doParallel and foreach, etc. (on Windows), and any functions I modify and want to test need the latest built "mypackage" in the child processes, since foreach spawns new R processes that load "mypackage". I understand that when a package is built in R, it gets stored in ..\R\R-3.6.1\library, and future R sessions calling library(mypackage) will use that version of the package.
What I'd ideally like to be able to do is, in the same original folder, run simulations with a version of mypackage, and then update the code in the package while simulations are stopped/started, confident my development changes won't break the simulations which are running a specific version of the package.
Is there a simple way of doing the above, without having to re-copy folders (and make things like "mypackage2")?
Thanks!
The issue described here is sort of similar to what I am facing: Specify package location in foreach.
The problem is that if I run a simulation that takes several days using "mypackage", with many calls to foreach, and update and rebuild "mypackage" while testing changes, future foreach calls from the running simulation may pick up the new, updated version of the package, which would be a disaster.
I think the answers in the other question do apply, but you need to do some extra steps.
Let's say you have a version of the package you want to test. You'd still create a specific folder for that version, but you leave it empty. Here I'll use /tmp/mypkg2 as an example. While having your project open in RStudio, you execute:
withr::with_libpaths(c("/tmp/mypkg2", .libPaths()), devtools::install())
That will install that version of the package to the provided folder. You could then have a wrapper script, say wrapper.R, with something like:
pkg_path <- commandArgs(trailingOnly = TRUE)[1L]
cat("Using package at", pkg_path, "\n")
.libPaths(c(pkg_path, .libPaths()))
library(doParallel)
workers <- makeCluster(detectCores())
registerDoParallel(workers)
# We need to modify the lib path in each worker too
parallel::clusterExport(workers, "pkg_path")
parallel::clusterEvalQ(workers, .libPaths(c(pkg_path, .libPaths())))
# ... Your code calling your package and doing stuff
parallel::stopCluster(workers)
Afterwards, from the command line (outside of R/RStudio), you could type (assuming Rscript is in your path):
Rscript path/to/wrapper.R /tmp/mypkg2
This way, the actual testing code can stay the same (including calls to library), and R will automatically search first in pkg_path, loading your specific package version, and then search the standard locations for any dependencies you may have.
I don't fully understand your use case (as to why you want to do this), but what I normally do when testing two versions of a package is to push the most recent version to my dev branch on GitHub and then use devtools::load_all() to test what I'm currently working on. Then, by using remotes::install_github() and specifying the dev branch, you can run the GitHub version with mypackage::func and the devtools version with func.
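A minimal sketch of that two-version setup (the repo path "yourname/mypackage" and the branch name are illustrative, not from the answer):
# Install the dev branch from GitHub into the default library
remotes::install_github("yourname/mypackage", ref = "dev")
# In the package project, load the in-progress source for interactive testing
devtools::load_all()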

Profiling an installed R package with source line numbers?

I'd like to profile functions in an installed R package (data.table) using Rprof() with line.profiling=TRUE. Normally, installed packages are byte compiled, and line numbers are not available for byte-compiled code. The usual instructions for line profiling with Rprof() require using source() or eval(parse()) so that srcref attributes are present.
How can I load data.table so that line numbers are active? My naive attempts to first load the package with library(data.table) and then source('data.table.R') fail because some of the compiled C functions are not found when I attempt to use the package, presumably because library() is using a different namespace. Maybe there is some way to source() into the correct namespace?
Alternatively, perhaps I can build a modified version of data.table that is not byte compiled, and then load that in a way that keeps line numbers? What alterations would I have to make, and how would I then load it? I started by setting ByteCompile: FALSE and then trying R CMD INSTALL -l ~/R/lib --build data.table, but this still seems to be byte compiled.
I'm eager to make this work and will pursue any suggestions. I'm running R 3.2.1 on Linux, have full control over the machine, and can install anything else that is required.
Edit:
A more complete description of the problem I was trying to solve (and the solution for it) is here: https://github.com/Rdatatable/data.table/issues/1249
I ended up doing essentially what Joshua suggested: recompiling the package with "KeepSource: TRUE" in the DESCRIPTION. For my purposes, I also found "ByteCompile: FALSE" to be helpful, although this might not apply generally. I also changed the version number so I could see that I was using my modified version.
Then I installed to a different location with "R CMD INSTALL data.table -l ~/R/lib", and loaded with "library(data.table, lib.loc='~/R/lib')". When used with the patches given in the link, I got the line numbers of the allocations as I desired. But if anyone knows a solution that doesn't require recompilation, I'm sure others would appreciate it if you shared.
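In sketch form, the rebuild described above looks like this (the version number and library path are illustrative):
# In DESCRIPTION:
#   KeepSource: TRUE
#   ByteCompile: FALSE
#   Version: 1.9.5.99  (bumped so the modified build is identifiable)
# From the shell:
#   R CMD INSTALL data.table -l ~/R/lib
# In R, load the modified build from its separate location:
library(data.table, lib.loc = "~/R/lib")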
You should be able to get line numbers even if the package is byte-compiled. But, as it says in ?Rprof (emphasis added):
Individual statements will be recorded in the profile log if line.profiling is TRUE, and if the code being executed was parsed with source references. See parse for a discussion of source references. By default the statement locations are not shown in summaryRprof, but see that help page for options to enable the display.
That means you need to set KeepSource: TRUE either in the DESCRIPTION file or via the --with-keep.source argument to R CMD INSTALL.
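With the package reinstalled that way, line-level profiling looks roughly like this; the lines argument of summaryRprof is the display option its help page mentions (the output file name is illustrative):
Rprof("profile.out", line.profiling = TRUE)
# ... run code that exercises the package ...
Rprof(NULL)  # stop profiling
summaryRprof("profile.out", lines = "show")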

Undocumented data sets: '.Random.seed' (R CMD check)

I'm building an R package and have run into a perplexing warning during R CMD check:
* checking for missing documentation entries ... WARNING
Undocumented data sets:
‘.Random.seed’
The package has one small data set, which is documented. I'm using R 3.1.1 and RStudio 0.98.1062 on OS X Yosemite, but I get the same error on Windows 7 (and from CRAN). The project also has a vignette that is built with knitr. devtools etc. are all up to date. The file '.Random.seed' doesn't exist in the "data" folder before building, and my reasoning is that it's getting transiently written to disk during the build process by...something. I tried adding '.Random.seed' to .Rbuildignore without success, presumably because it doesn't exist when the build process begins.
Has anyone encountered this before?
Ran into this problem as well. You have almost certainly solved it by now, but I'll post an answer in case somebody else hits the same problem. At some point, you generated a random number or set the seed during the creation of the .Rdata file (or at least that's what happened to me). Simply load the workspace from your data folder, rm(.Random.seed), and save it again. You're done. Easy as pie.
http://www.inside-r.org/r-doc/base/set.seed
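A sketch of that cleanup (the data set name mydata and the file path are hypothetical):
load("data/mydata.RData")  # brings mydata plus the stray .Random.seed into the workspace
rm(.Random.seed)  # drop the undocumented seed object
save(mydata, file = "data/mydata.RData")  # re-save only the documented data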
I had a similar problem when I was uploading the my.csv dataset; if anyone faces a similar problem, the answer is in here.

Infinite loop caused by require(gWidgetstcltk)

When I require(gWidgetstcltk), I get an infinite loop, with a seemingly endless stream of error messages that look like this:
error reading package index file /Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk2/tklibs/ttktheme_clearlooks/pkgIndex.tcl: can't find package tile
error reading package index file /Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk2/tklibs/ttktheme_clearlooks/pkgIndex.tcl: too many nested evaluations (infinite loop?)
(On each iteration the path is different.) The ends of these messages seem to be the important parts: can't find package tile and too many nested evaluations (infinite loop?).
I installed the packages as usual using install.packages() and the files referred to seem to be present. gWidgets seems to load just fine. I'm running R 2.14.1 via RStudio 0.96.231 on OS X 10.7.4. What is going wrong here?
Update: I now see that the problem is coming from the tcltk2 package.
That shouldn't happen. First off, I'd say try uninstalling the package and then re-installing it; there might have been an error during the process. Another thing you should do is select "Install All Dependencies" when you do this (or use install.packages(______, dependencies = TRUE)). Have you installed all of the package's relevant dependencies? Perhaps this library requires another library which you don't have.
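For the package in question, that would be:
# Reinstall, pulling in all dependencies (including suggested packages)
install.packages("gWidgetstcltk", dependencies = TRUE)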
