Does R travis work on large data packages? - r

I seem to be running into a memory allocation issue when using R travis-ci on an R package that depends on a 90 MB data package (i.e., that's where it gets its data from):
* installing *source* package ‘my_package’ ...
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error in system2(file.path(R.home("bin"), "R"), c(if (nzchar(arch)) paste0("--arch=", :
cannot popen ' '/home/travis/R-bin/lib/R/bin/R' --no-save --slave 2>&1 < '/tmp/RtmpGLG3uQ/file2f65432e469d'', probable reason 'Cannot allocate memory'
* removing ‘/home/travis/R/Library/my_package’
Warning in q("no", status = 1, runLast = FALSE) :
system call failed: Cannot allocate memory
Error: Command failed (1)
Execution halted
The command "./travis-tool.sh github_package my_github_handle/my_package" failed and exited with 1 during .
Your build has been stopped.
Is it because travis-ci doesn't work with large data packages like this, or is it some other issue?
Related posts: https://github.com/travis-ci/travis-ci/issues/5713, https://github.com/travis-ci/travis-ci/issues/3656
Here is my .travis.yml file:
language: r
cache: packages
warnings_are_errors: true
sudo: required
before_install:
- curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh
- chmod 755 ./travis-tool.sh
install:
- ./travis-tool.sh aptget_install r-cran-xml
- ./travis-tool.sh install_github hadley/devtools
- ./travis-tool.sh install_deps
- ./travis-tool.sh github_package my_github_handle/my_package
r_github_packages:
- my_github_handle/my_package
Note that both of my R packages (the main R package and the data package it requires) are on GitHub.

The latter half of the .travis.yml is not needed. Use:
language: r
cache: packages
warnings_are_errors: true
sudo: false
For other package dependencies, use devtools' Remotes: field in DESCRIPTION to specify repositories, or create your own repository and use it (Disclaimer: I wrote this article).
Travis images under this setup are restricted to 4 GB of RAM. For more information on the virtualization environments, see:
https://docs.travis-ci.com/user/ci-environment/#Virtualization-environments
This performs well with large data packages.

Related

R packrat snapshot: upgrading package gives "stale" errors

I'm using packrat to freeze all versions of dependencies for an application. Sometimes I run into troubles with "staleness".
For instance, today I upgraded one package to a newer version. I did this by launching R in the packrat-managed project:
% R --quiet
Packrat mode on. Using library in directory:
- "~/git/myapp/app/packrat/lib"
> install.packages('MyPackage')
Installing package into ‘/Users/kwilliams/git/myapp/app/packrat/lib/x86_64-apple-darwin17.7.0/3.5.3’
(as ‘lib’ is unspecified)
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3537k 100 3537k 0 0 5530k 0 --:--:-- --:--:-- --:--:-- 5527k
* installing *source* package ‘MyPackage’ ...
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (MyPackage)
The downloaded source packages are in
‘/private/var/folders/zp/hj5hqfw970z0_78mrb_802lm0001z9/T/RtmpzfYDUz/downloaded_packages’
However, when I try to generate a new snapshot file, nothing happens:
> packrat::snapshot()
Snapshot written to '/Users/kwilliams/git/myapp/app/packrat/packrat.lock'
(The file is no different than before - the old version of MyPackage is still listed.)
I verified that the new version was indeed installed, and tried the snapshot again:
> packageVersion('MyPackage')
[1] ‘7.4’
> packrat::snapshot()
The following packages are stale:
_
MyPackage 7.4
These packages must be updated by calling packrat::restore() before
snapshotting. If you are sure you want the installed versions of these
packages to be snapshotted, call packrat::snapshot() again with
ignore.stale=TRUE.
--
Snapshot operation was cancelled, no changes were made.
Huh? Not sure why the different results between the two times.
status() does seem to know the situation correctly:
> packrat::status()
The following packages are out of sync between packrat and your current library:
packrat library
MyPackage 7.3.1-22287 7.4
Use packrat::snapshot() to set packrat to use the current library, or use
packrat::restore() to reset the library to the last snapshot.
I figure I'll force it, so I add ignore.stale=TRUE:
> packrat::snapshot(ignore.stale=TRUE)
Upgrading these packages already present in packrat:
from to
MyPackage 7.3.1-22287 7.4
Fetching sources for MyPackage (7.4) ... FAILED
Error in snapshotSources(project, activeRepos(project), allRecordsFlat) :
Errors occurred when fetching source files:
Error in getSourceForPkgRecord(pkgRecord, sourceDir, availablePkgs, repos) :
Could not find sources for MyPackage (7.4).
Bummer. Might this have something to do with the fact that this is a locally-created package, installed from a local CRAN-alike? This would be a packrat bug, because (as noted above) install.packages() can find the source package just fine.
So I think there are two potential packrat bugs here:
Inability to snapshot the newly installed package
Inability to download source for the package
FWIW, I think the first problem is identical to the situation here: https://groups.google.com/forum/#!topic/packrat-discuss/HvD45u6w4Zg, in which Kevin Ushey (author/maintainer of packrat) says "it's possible that the logic around 'stale' packages can just go away."
Here are the workarounds I'm using to get back on my way:
As mentioned above, use ignore.stale=TRUE to force the snapshot even when it thinks things are stale.
Copy the source package manually to packrat/src/MyPackage/.
Now it succeeds:
> packrat::snapshot(ignore.stale=TRUE)
Upgrading these packages already present in packrat:
from to
MyPackage 7.3.1-22287 7.4
Snapshot written to '/Users/kwilliams/git/myapp/app/packrat/packrat.lock'
The packrat/packrat.lock file has been updated correctly:
% git diff
diff --git a/app/packrat/packrat.lock b/app/packrat/packrat.lock
index 6c17020..f717d29 100644
--- a/app/packrat/packrat.lock
+++ b/app/packrat/packrat.lock
@@ -30,9 +30,9 @@ Hash: 9772da3bc51603a19a2b75f008fd63e3
Package: MyPackage
Source: source
-Version: 7.3.1-22287
+Version: 7.4
SourcePath: lib/MyPackage
-Hash: 4fe20417f5711b3c7c90a4efe3bb4bc7
+Hash: 880a308537e8de571106893e839386f6
...
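The manual copy workaround above could be sketched as a small shell helper. This is only a sketch under assumptions: the tarball name MyPackage_7.4.tar.gz and the paths in the example call are hypothetical, and the function name stage_packrat_source is made up for illustration. Only the packrat/src/MyPackage/ target directory comes from the description above.

```shell
# Copy a source tarball into the place where packrat looks for package sources.
# $1 = packrat project directory, $2 = source tarball, $3 = package name
stage_packrat_source() {
  mkdir -p "$1/packrat/src/$3" && cp "$2" "$1/packrat/src/$3/"
}

# Example call (hypothetical paths; uncomment to run):
# stage_packrat_source "$HOME/git/myapp/app" "$HOME/MyPackage_7.4.tar.gz" MyPackage
```

After staging the tarball, rerun packrat::snapshot(ignore.stale=TRUE) from the R session.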

Compile R package "arulesSequence" for older release

I want to use arulesSequences for sequence mining. I have to use it with the Oracle R distribution, version R 3.3.0 (the latest release). The problem is that the latest version of the arulesSequences package requires R >= 3.3.2, so I get this error:
Error: this is R 3.3.0, package arulesSequences needs >=3.3.2
So I decided to compile the source code of an older release. I downloaded an older version of the package that needs R 3.2.5 or above. I know that this package depends on arules, so I have already installed it. I used the following steps to compile the arulesSequences package.
In the source directory I ran this command:
R CMD build arulesSequences
The output of this command is:
c:\rr\arulesSequences_0.2-17>R CMD build arulesSequences
* checking for file 'arulesSequences/DESCRIPTION' ... OK
* preparing 'arulesSequences':
* checking DESCRIPTION meta-information ... OK
* cleaning src
Warning in cleanup_pkg(pkgdir, Log) :
unable to run 'make clean' in 'src'
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* looking to see if a 'data/datalist' file should be added
* building 'arulesSequences_0.2-17.tar.gz'
A file named 'arulesSequences_0.2-17.tar.gz' gets created, but when I check the package as below I get the following output:
c:\rr\arulesSequences_0.2-17\arulesSequences>R CMD check arulesSequences
* using log directory 'c:/rr/arulesSequences_0.2-17/arulesSequences/arulesSequences.Rcheck'
using R version 3.4.0 (2017-04-21)
using platform: x86_64-w64-mingw32 (64-bit)
using session charset: ISO8859-1
checking for file 'arulesSequences/DESCRIPTION' ... OK
this is package 'arulesSequences' version '0.2-17'
checking package namespace information ... OK
checking package dependencies ... ERROR
Package required but not available: 'arules'
See section 'The DESCRIPTION file' in the 'Writing R Extensions'
manual.
* DONE
Status: 1 ERROR
I know the arules package is installed; I checked it. It seems the build process was not successful. Do you have any idea how to solve this?
You first have to install the C/C++ compiler for R (gcc), which is part of R's additional build tools.
To do that, in RStudio go to File -> New File -> C++ File.
It will show a dialog.
Click Yes.
To compile a package under Windows, you have to set repos to NULL and type to "source".
You can use this command to do that:
install.packages("SOURCEADDRESS", type = "source", repos = NULL)
As @EugèneAdell mentioned above, you have to install arules first, then arulesSequences.
Instead of building, take the archived packages that seem to be OK for your R version and install them. On my Linux machine, this gives:
wget http://cran.univ-paris1.fr/src/contrib/Archive/arules/arules_1.5-0.tar.gz
R CMD INSTALL $HOME/arules_1.5-0.tar.gz
* installing to library ‘/home/ruser/R-3.2.5/lib64/R/library’
* installing *source* package ‘arules’ ...
...
** testing if installed package can be loaded
* DONE (arules)
wget http://cran.univ-paris1.fr/src/contrib/Archive/arulesSequences/arulesSequences_0.2-17.tar.gz
R CMD INSTALL $HOME/arulesSequences_0.2-17.tar.gz
* installing to library ‘/home/ruser/R-3.2.5/lib64/R/library’
* installing *source* package ‘arulesSequences’ ...
...
** testing if installed package can be loaded
* DONE (arulesSequences)
R
> library(arulesSequences)
Loading required package: arules
Loading required package: Matrix
Attaching package: ‘arules’
A more recent version of arules may also work; I just took the first one from the 1.5 series.

rjava dependent package installation Segmentation fault (core dumped)

I am trying to reinstall a package that I was previously able to install and use. I was building a package of my own after my computer unexpectedly restarted and then I started to have problems loading the rpgraph package. So I decided to uninstall it and to reinstall it. When I did so I got the following error:
library(devtools)
library(rJava)
install_github("Albluca/rpgraph")
Downloading GitHub repo Albluca/rpgraph#master
from URL https://api.github.com/repos/Albluca/rpgraph/zipball/master
Installing rpgraph
Running command /usr/lib/R/bin/R
Arguments:
CMD
INSTALL
/tmp/Rtmp5OrtLL/devtools505a703b3ccd/Albluca-rpgraph-de04f96
--library=/home/gonzalo/R/x86_64-pc-linux-gnu-library/3.4
--install-tests
installing source package ‘rpgraph’ ...
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Segmentation fault (core dumped)
ERROR: loading failed
removing ‘/home/gonzalo/R/x86_64-pc-linux-gnu-library/3.4/rpgraph’
Installation failed: run(bin, args = real_cmdargs, stdout_line_callback = real_callback(stdout), stderr_line_callback = real_callback(stderr), stdout_callback = real_block_callback, stderr_callback = real_block_callback, echo_cmd = echo, echo = show, spinner = spinner, error_on_status = fail_on_status, timeout = timeout) : System command error
I tried reinstalling Java and R (from CRAN) from scratch, but something that remains on my system still prevents the package from being installed. Also, since the error is not very informative, I have no idea how to narrow down where the problem is.
Thanks for any help you can provide.
I experienced the same problem while installing the libraries venneuler and wordnet and solved the issue using the solution proposed by Kenneth. In my case the option -Xss2560k was enough to solve it:
export _JAVA_OPTIONS="-Xss2560k"
In addition: instead of running the export command from the terminal the java option can be set directly from the R session with the following command:
options(java.parameters = "-Xss2560k")
This seems to be a bug in recent kernel versions. The same problem happens with other R libraries that involve Java, and also with other software.
See https://lists.ubuntu.com/archives/ubuntu-devel-discuss/2017-June/017507.html and https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698919.
This workaround does the trick for me:
export _JAVA_OPTIONS="-Xss2560k -Xmx2g"
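If exporting the option for the whole shell session is undesirable, the same workaround can be limited to a single command. A minimal sketch, assuming a POSIX shell with env available; the helper name with_java_opts is hypothetical, and the option values are the ones quoted above:

```shell
# Run one command with the JVM options set only for that command,
# leaving the rest of the shell session untouched.
with_java_opts() {
  env _JAVA_OPTIONS="-Xss2560k -Xmx2g" "$@"
}

# Example (hypothetical invocation; uncomment to run):
# with_java_opts Rscript -e 'devtools::install_github("Albluca/rpgraph")'
```

This keeps the stack-size setting away from unrelated Java programs started from the same terminal.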

R can't find packages installed by travis

We're trying to add some unit tests to the caret package that get run by travis, but not on CRAN. This saves build time on CRAN and reduces the number of dependencies they have to install to check our package, while letting us run a more complete test suite on travis.
I thought I could simply install the requirements for the test using the r_packages: line in my travis.yml file:
r_packages:
- ROSE
- DMwR
However, my NOT_CRAN=TRUE builds are still failing. (NOT_CRAN=FALSE runs fine as the problematic tests are skipped)
This is really strange, as when I look at the build logs, I see travis successfully installing all the packages I need:
* installing *source* package ‘ROSE’ ...
** package ‘ROSE’ successfully unpacked and MD5 sums checked
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (ROSE)
But when my tests run that depend on those packages, R can't find them:
> library(testthat)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
>
> test_check("caret")
[1] "Reduced dimension to 3 by default. "
1 package is needed for this model and is not installed. (ROSE). Would you like to try to install it now?
1. Error: check appropriate sampling calls by name -----------------------------
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"),
warning = function(c) invokeRestart("muffleWarning"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: caret:::parse_sampling(i) at test_sampling_options.R:14
5: checkInstall(pkgs)
6: stop()
testthat results ================================================================
OK: 62 SKIPPED: 0 FAILED: 1
1. Error: check appropriate sampling calls by name
Error: testthat unit tests failed
Execution halted
(I think) the relevant line of code is here in caret's source code:
good <- rep(TRUE, length(pkg))
for(i in seq(along = pkg)){
tested <- try(find.package(pkg[i]), silent = TRUE)
if(class(tested)[1] == "try-error") good[i] <- FALSE
}
Why can't the find.package function find packages installed by travis? Do they go in a special, separate library somewhere?
Also, as an aside, how do I make my travis builds for r less verbose? By default they seem to print way too much information (e.g. it echoes all code run by the tests and manual, even code that doesn't error).
When testing your package on Travis, R CMD check appears to be looking for installed packages in the wrong place(s).
I created a small test package to figure this out:
When testing the package on Travis using R CMD check, .libPaths() contains:
c("/tmp/RtmpnQE1WM/RLIBS_29bd3940b7fa",
"/usr/lib/R/library")
When testing the package on Travis using devtools::test(), .libPaths() contains:
c("/usr/local/lib/R/site-library",
"/usr/lib/R/site-library",
"/usr/lib/R/library")
By default, R packages on Travis are installed into /usr/local/lib/R/site-library (i.e. the first entry of .libPaths()). Clearly, R CMD check is looking in the wrong place(s).
In principle, we could use the --library argument for R CMD check to point to the right place, however when you use --as-cran then --library defaults to /usr/lib/R/library.
The easiest solution is probably to install all packages (in particular the "additional" packages ROSE and DMwR) into /usr/lib/R/library. There are many ways to do that. One solution is to add
sudo mkdir -p /usr/lib/R/library
echo 'R_LIBS=/usr/lib/R/library:/usr/lib/R/site-library/' > ~/.Renviron
sudo chmod 2777 /usr/lib/R/library
to the before_install section of your .travis.yml file.
You could clone the r-travis repo and just source from your copy. That would allow you to make it less verbose.
As to "packages not found": dunno. But the Travis instance is a vanilla Ubuntu installation so you can control things by echo'ing into a suitable ~/.Rprofile etc pp. I have found the old r-travis setup to be more convenient for me and recently blogged about one way to dramatically cut test times down by relying more on pre-built r-cran-* .deb packages.
Michael has well over 1000 in his repo, and you could build your own too via a PPA. Time permitting I may write another blog post detailing that...

Trouble installing an R package using R CMD install

I just ran the following command after running R CMD build pkg and R CMD check pkg and it completed without errors.
R CMD install -t /home/wdkrnls/R/x86_64-unknown-linux-gnu-library/3.1 pkg_0.1.0.tar.gz
However, I still can't use it via library(pkg) from R. Looking in the library directory, all I see is the tarball, no pkg directory. When I try and untar it and then load in R, I get the error:
Error in library(e2pa) : 'e2pa' is not a valid installed package
Alternatively, when I try to install with
R CMD install -l /home/wdkrnls/R/x86_64-unknown-linux-gnu-library/3.1 pkg_0.1.0.tar.gz
It tells me -l is an invalid option.
Another failed possibility:
R CMD install -t /home/wdkrnls/R/x86_64-unknown-linux-gnu-library/3.1/pkg pkg_0.1.0.tar.gz
install: accessing `/home/wdkrnls/R/x86_64-unknown-linux-gnu-library/3.1/pkg': No such file or directory
What is the right way to install a package into a personal library in R?
Unix commands are case-sensitive, and
R CMD install ....
as you typed it invokes /usr/bin/install, a different program from the R-internal script INSTALL, which the actually mandated form
R CMD INSTALL ...
uses. See all the relevant docs: it is always UPPERCASE.
Once you have the correct script, -l ... is recognised:
edd@max:~$ R CMD INSTALL -l /tmp/demo git/drat_0.0.2.4.tar.gz
* installing *source* package ‘drat’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (drat)
edd@max:~$
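Putting this together for the personal-library case in the question, a minimal sketch might look like the following. The helper name install_to_lib is hypothetical; the -l option and the uppercase INSTALL are as shown above, and the library directory is created first in case it does not exist yet.

```shell
# Install a package tarball into a specific library directory.
# $1 = library directory, $2 = package tarball
install_to_lib() {
  mkdir -p "$1" && R CMD INSTALL -l "$1" "$2"
}

# Example call, using the paths from the question (uncomment to run):
# install_to_lib "$HOME/R/x86_64-unknown-linux-gnu-library/3.1" pkg_0.1.0.tar.gz
```

From R, the package can then be loaded with library(pkg, lib.loc = "...") if that directory is not already on .libPaths().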
