Config/reticulate not setting up environment - ModuleNotFoundError: No module named 'pandas' - r

I am trying to build an R package that wraps an internal python module; however, it seems to not be properly installing the dependencies like I would expect. According to the documentation, I should be able to define the Config/reticulate field in the DESCRIPTION file and it will handle setting up the python environment; however, it doesn't seem that's happening.
Reticulate dependency vignette
This is somewhat of a migration project, so currently we're writing R objects to file, then using system(command) to run the python code before reading the results back into R. There are reasons it's done this way. The plan is to leverage more of the reticulate tools to reduce read/write, but I don't have the time to make those changes right now.
I've replaced system() with py_run_file() without success, and py_module_available('pandas') returns FALSE after loading the package. If I initialize reticulate in the directory, this resolves the problem, but that doesn't solve my package distribution problem (internal package).
I guess the question is:
How do I ensure reticulate::configure_environment() actually runs when distributing a package?
Below is a rough reproducible example; making these changes to a new RStudio package template should reproduce the failure.
/R/hello.R:
RunDemoPy <- function() {
  pyScript <- system.file("python/pyDemo.py", package = "pyTestpkg")
  reticulate::py_run_file(pyScript)
}
DESCRIPTION FIELD:
Config/reticulate:
  list(
    packages = list(
      list(package = "pandas")
    )
  )
Imports:
    reticulate
R/zzz.R
.onLoad <- function(libname, pkgname) {
  packageStartupMessage("On Load - config environment")
  reticulate::configure_environment(pkgname)
}
.onUnload <- function(libname) {
}
.onAttach <- function(libname, pkgname) {
}
inst/python/pyDemo.py:
import os
import pandas as pd
pdf = pd.DataFrame()
pdf.to_csv("demo.csv")
print("Dataframe done")
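One thing that can be ruled out cheaply is whether the Config/reticulate field is readable at all. DESCRIPTION uses the DCF format, so the continuation lines of a multi-line field have to start with whitespace. The sketch below uses only base R: it writes a throwaway DESCRIPTION to a temporary file (an assumption for illustration, not the real package file), reads the field back with read.dcf(), and evaluates it. It does not show what reticulate does with the field, only that the field itself is well formed.

```r
# Write a throwaway DESCRIPTION; the temp file stands in for the real one.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: pyTestpkg",
  "Version: 0.0.1",
  "Config/reticulate:",
  "  list(",
  "    packages = list(",
  "      list(package = \"pandas\")",
  "    )",
  "  )",
  "Imports:",
  "    reticulate"
), desc_path)

# read.dcf() returns a character matrix with one column per requested field.
fields <- read.dcf(desc_path, fields = c("Config/reticulate", "Imports"))

# The field holds R code, so parse and evaluate it to get the list back.
config <- eval(parse(text = fields[1, "Config/reticulate"]))

# Collect the Python package names declared in the field.
py_packages <- vapply(config$packages, function(p) p$package, character(1))
print(py_packages)
```

Pointing desc_path at system.file("DESCRIPTION", package = "pyTestpkg") instead would run the same check against the installed package.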

Related

How to import sf to package to run a function that depends on lwgeom?

I'm building a package that imports {sf}, and more specifically I use st_length() in one of my functions.
I initially added only {sf} to my package "Imports", but when I checked it I got a few {lwgeom} related errors:
Running examples in 'gtfstools-Ex.R' failed
The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: get_trip_speed
> ### Title: Get trip speed
> ### Aliases: get_trip_speed
>
> ### ** Examples
>
> data_path <- system.file("extdata/spo_gtfs.zip", package = "gtfstools")
>
> gtfs <- read_gtfs(data_path)
>
> trip_speed <- get_trip_speed(gtfs)
Error in sf::st_length(trips_geometries) :
package lwgeom required, please install it first
This error happens when the examples are running, but some similar errors happen with the tests.
Then I added {lwgeom} to Imports. The check runs fine, but in the end I get a note: NOTE: Namespaces in Imports field not imported from: 'lwgeom'
What's the best practice when dealing with cases like this? Should I just keep track of this note and send it as a comment to CRAN during the package submission process?
You can consider adding the {lwgeom} package in Suggests field of your package DESCRIPTION file. It should do the trick.
The Suggests != Depends article by Dirk Eddelbuettel refers to a relevant bit of Writing R Extensions (WRE) that might be useful to this case.
Section 1.1.3.1 (suggested packages) reads (as of 2021-03-12):
Note that someone wanting to run the examples/tests/vignettes may not have a suggested package available (and it may not even be possible to install it for that platform). The recommendation used to be to make their use conditional via if(require("pkgname")): this is OK if that conditioning is done in examples/tests/vignettes, although using if(requireNamespace("pkgname")) is preferred, if possible.
However, using require for conditioning in package code is not good practice as it alters the search path for the rest of the session and relies on functions in that package not being masked by other require or library calls. It is better practice to use code like
if (requireNamespace("rgl", quietly = TRUE)) {
   rgl::plot3d(...)
} else {
   ## do something else not involving rgl.
}
So while just adding {lwgeom} to Suggests works, we may stumble upon the issue where someone that runs a "lean installation" (i.e. without suggested packages) of my package won't be able to use the functions that rely on {lwgeom}.
More importantly, if an author of a package that I am importing decides to run a reverse dependency check on my package while not installing suggested packages, the check would fail because I'd have a few examples, tests and vignettes bits failing due to not having {lwgeom} available.
Thus, in addition to listing it in Suggests, I added some checks on examples and vignettes like suggested by WRE:
*examples/vignette context*
# the examples below require the 'lwgeom' package to be installed
if (requireNamespace("lwgeom", quietly = TRUE)) {
  ... do something ...
}
In the functions that require {lwgeom} I added:
if (!requireNamespace("lwgeom", quietly = TRUE))
  stop(
    "The 'lwgeom' package is required to run this function. ",
    "Please install it first."
  )
And added this bit to the tests of such functions (using {testthat}):
if (!requireNamespace("lwgeom", quietly = TRUE)) {
  expect_error(
    set_trip_speed(gtfs, "CPTM L07-0", 50),
    regexp = paste0(
      "The \\'lwgeom\\' package is required to run this function\\. ",
      "Please install it first\\."
    )
  )
  skip("'lwgeom' package required to run set_trip_speed() tests.")
}
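If several functions need the same guard, the check can be pulled into one small helper. This is only a sketch: require_suggested() is a hypothetical name, not something from {sf}, {lwgeom} or my package, and the error message simply mirrors the one used above.

```r
# Hypothetical helper: stop with a clear message if a suggested package
# is not installed; return TRUE invisibly otherwise.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "The '", pkg, "' package is required to run this function. ",
      "Please install it first.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this passes silently.
require_suggested("stats")

# A package name that does not exist triggers the error; capture its message.
result <- tryCatch(
  require_suggested("notARealPackage123"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Each function that depends on {lwgeom} would then start with require_suggested("lwgeom"), and the tests can match against a single message.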

Extended R package can't correctly communicate with its 'parent' R package

I am trying to build a package that extends another package. However at its most basic level I am doing something wrong. I build a simple example that presents the same issue:
I have two packages, packageA and packageB. packageA has a single R file in the R folder that reads:
local.env.A <- new.env()
setVal <- function()
{
  local.env.A$test <- 1
}
getVal <- function()
{
  if(!exists("test", envir = local.env.A)) stop("test does not exist")
  return(local.env.A$test)
}
For packageB I have the following single R file in the R folder:
# refers to package A
setVal()
getValinA <- function()
{
  return(getVal())
}
I want both packageA and packageB to be available for end users, so I set packageB to depend on packageA (in the DESCRIPTION file). When packageB is loaded, e.g. with library(packageB), I expect it to run setVal() and thus set the test value. However, when I then try to retrieve that value with getValinA(), it hits the stop():
> library(packageB)
Loading required package: PackageA
> getValinA()
Error in getVal() : test does not exist
I am pretty sure it is related to environments, but I am not sure how. Please help!
With thanks to @Roland, the answer was very simple. I had assumed (assumptions assumptions assumptions!) that library(packageB) would run every top-level statement in the package, including my setVal() call. It does not: top-level code in a package's R files is evaluated when the package is built and installed, not each time it is loaded. If you want a function to run at load time, call it from .onLoad:
.onLoad <- function(libname, pkgname)
{
  setVal()
}
By convention, the .onLoad function goes in an R file called zzz.R. Unless you specify a collation order, the R files are sourced alphabetically, so zzz.R comes last, after all the other functions in your package have been defined.
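The ordering can be mimicked in a plain R session. This is only a simulation under stated assumptions: everything lives in one script rather than two packages, and .onLoad() is called by hand as a stand-in for what R does when the namespace is loaded. I also pass inherits = FALSE to exists(), which the original getVal() does not, so that only local.env.A itself is searched.

```r
# Package-private state, as in packageA.
local.env.A <- new.env()

setVal <- function() {
  local.env.A$test <- 1
}

getVal <- function() {
  # inherits = FALSE: look only in local.env.A, not in enclosing environments.
  if (!exists("test", envir = local.env.A, inherits = FALSE)) {
    stop("test does not exist")
  }
  local.env.A$test
}

# packageB's load hook.
.onLoad <- function(libname, pkgname) {
  setVal()
}

# Before the hook has run, the value is missing.
before <- tryCatch(getVal(), error = function(e) conditionMessage(e))

# Calling the hook by hand simulates loading the namespace.
.onLoad(NULL, "packageB")
after <- getVal()

print(before)
print(after)
```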

RUnit: could not find function "checkEquals"

I am creating an R package with the standard directory hierarchy. Inside the R directory, I create a test subdirectory.
In the R directory, I create a uTest.R file containing:
uTest <- function() {
  test.suite <- defineTestSuite('test',
                                dirs = file.path('R/test'))
  test.result <- runTestSuite(test.suite)
  printTextProtocol(test.result)
}
In the R/test directory, I create a runit.test.R file containing:
test.validDim <- function() {
  testFile <- "test/mat.csv"
  generateDummyData(testFile,
                    10,
                    10)
  checkEquals(validDim(testFile), TRUE)
}
I build and install my package using R CMD INSTALL --no-multiarch --with-keep.source RMixtComp in Rstudio. When I try to launch the function uTest(), I get this error message:
1 Test Suite :
test - 1 test function, 1 error, 0 failures
ERROR in test.validDim: Error in func() : could not find function "checkEquals"
However, if I call library(RUnit) before calling uTest(), everything works fine. I added RUnit to the Imports field of the DESCRIPTION file, and import(RUnit) to the NAMESPACE file.
How can I call uTest() directly after loading my package, without manually loading RUnit?
You should not add RUnit to the Depends (or Imports) field in the DESCRIPTION file (despite the comment to the contrary). Doing so implies that the RUnit package is necessary in order to use your package, which is likely not the case. In other words, putting RUnit in Depends or Imports implies RUnit needs to be installed (Imports) and on the users' search path (Depends) in order for them to use your package.
You should add RUnit to the Suggests field in the DESCRIPTION file, then modify your uTest function as below:
uTest <- function() {
  stopifnot(requireNamespace("RUnit"))
  test.suite <- RUnit::defineTestSuite('test', dirs = file.path('R/test'))
  test.result <- RUnit::runTestSuite(test.suite)
  RUnit::printTextProtocol(test.result)
}
Doing this allows you to use RUnit for your tests, but does not require users to have RUnit installed (and possibly on their search path) in order to use your package. Obviously, they'll need RUnit if they wish to run your tests.

R function call without loading package

I want to use functions from the Bioconductor packages hypergraph and hyperdraw without loading the packages. When running an example from the hyperdraw vignette
dh1 <- hypergraph::DirectedHyperedge("A", "B", "R1")
dh2 <- hypergraph::DirectedHyperedge(c("A", "B"), c("C", "D"), "R2")
hg <- hypergraph::Hypergraph(LETTERS[1:5], list(dh1, dh2))
hgbph <- hyperdraw::graphBPH(hg)
I get the error:
Error in hyperdraw::graphBPH(hg) : could not find function "hyperedges"
If I try to load hyperedges:
hyperedges <- hyperdraw:::hyperedges
I get the error
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'hyperedges' not found
When I load both packages using library or require, I get no error (in running the above code without hypergraph:: and hyperdraw::).
The reason I do not want to load the packages is that I am building a package which uses hyperdraw and hypergraph in only one function, and I'd rather put these packages into Suggests than into Depends in my DESCRIPTION file.
Does anyone have an idea how to solve this?
hyperdraw has this in its DESCRIPTION file
Depends: R (>= 2.9.0), methods, grid, graph, hypergraph, Rgraphviz
and it's relying on finding hypergraph::hyperedges on the search() path. Personally, I think hyperdraw should include a line
importFrom(hypergraph, hyperedges)
in its NAMESPACE file. Currently, the best thing to do is to add Depends: hyperdraw to your DESCRIPTION file, and to importFrom(hyperdraw, <whatever functions you need>). I have contacted the maintainer of hyperdraw to ask them to update the NAMESPACE as above; you could then merely Imports: hyperdraw. I think you're just making work for yourself and frustrating your users by trying to use Suggests or other approaches to subvert the need for formal dependencies.
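The distinction that bites here is between a namespace being loaded and a package being attached to the search path. Calling a function with :: loads the namespace but attaches nothing, so code that expects to find a function via search() will not see it. A minimal base R sketch, using the tools package purely as a stand-in (it has nothing to do with hyperdraw or hypergraph):

```r
pkg <- "tools"

# Calling through "::" loads the namespace but does not attach the package.
value <- tools::file_ext("report.csv")

# Loaded: the namespace is in memory and reachable via "::".
is_loaded <- pkg %in% loadedNamespaces()

# Attached: the package's exports are visible on the search path.
is_attached <- paste0("package:", pkg) %in% search()

print(value)
print(c(loaded = is_loaded, attached = is_attached))
```

In a fresh session I would expect loaded to be TRUE and attached to be FALSE, unless something has already called library(tools).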

rJava classpath in an R package... works on some systems... not others

I have built a package for R that wraps R around some Java classes. On my development laptop (Ubuntu) this package loads properly and works great. On two other machines (one Ubuntu, one Debian) I have tried to use this package and the classpath is not being set by the .jpackage() call.
All three machines are running R 2.12.1 and rJava 0.8-8, which I believe is the most recent.
The entire package is up at Google Code but here's the contents of the zzz.R file which works to set the class path on one machine but not others:
##' @import rJava
.onLoad <- function(lib, pkg) {
  pathToSdk <- paste(system.file(package = "GSRadR"), "/gsrad_sample/lib/", sep = "")
  jarPaths <- c(paste(pathToSdk, "clima_core-1.0.0.jar", sep = ""),
                paste(pathToSdk, "clima_GSRAD-1.0.0.jar", sep = ""),
                paste(pathToSdk, "colt-1.0.jar", sep = ""),
                paste(pathToSdk, "commons-lang-2.0.jar", sep = ""),
                paste(pathToSdk, "junit-3.8.1.jar", sep = ""),
                paste(pathToSdk, "log4j-1.2.8.jar", sep = ""),
                paste(pathToSdk, "xqore.jar", sep = "")
  )
  .jpackage(pkg, morePaths = jarPaths)
  attach(javaImport(c("java.lang", "java.io")))
  packageStartupMessage(paste("GSRadR loaded. The classpath is: ", paste(.jclassPath(), collapse = " ")))
}
On my laptop this returns the following:
> require(GSRadR)
Loading required package: GSRadR
Loading required package: rJava
GSRadR loaded. The classpath is: /home/jal/R/library/rJava/java /home/jal/R/library/GSRadR/gsrad_sample/lib/clima_core-1.0.0.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/clima_GSRAD-1.0.0.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/colt-1.0.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/commons-lang-2.0.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/junit-3.8.1.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/log4j-1.2.8.jar /home/jal/R/library/GSRadR/gsrad_sample/lib/xqore.jar
But on my other machines it returns only:
> require(GSRadR)
Loading required package: GSRadR
Loading required package: rJava
GSRadR loaded. The classpath is: /usr/lib/R/site-library/rJava/java
Any tips on what might cause the .jpackage() call to work differently on different machines? I've built packages using rJava before and used the same template for the .onLoad() function with no problems.
Edit
So on one of the machines where this was not working, I tried to simply add a path to the class path the "non package" way. And that failed:
> .jaddClassPath("/home/jal/R/x86_64-pc-linux-gnu-library/2.12/GSRadR/gsrad_sample/lib/clima_core-1.0.0.jar")
> .jclassPath()
[1] "/usr/lib/R/site-library/rJava/java"
Um... so I can't add anything to the class path. But why?
Edit II
When I was loading my custom library onto one of the machines that was not working, I was using a temporary library location, like so:
install.packages("/tmp/GSRadR_0.01.tar.gz", lib = "/my/path")
then loading the library like this:
require(GSRadR, lib.loc = "/my/path")
I discovered, through trial and error, that if I remove the lib= bit it would work properly. So why would loading an R package that uses rJava into a custom library location keep the .jaddClassPath() function from working?
I may be able to work around this, but I'd love to know what's causing this odd (at least to me) behavior.
I suspect that the directory or file in the first edit doesn't exist: /home/jal/R/x86_64-pc-linux-gnu-library/2.12/GSRadR/gsrad_sample/lib/clima_core-1.0.0.jar. (Also, are you sure that you want to add that particular file, or the directory?)
Try file.info("/home/jal/R/x86_64-pc-linux-gnu-library/2.12/GSRadR/gsrad_sample/lib/clima_core-1.0.0.jar").
In my case, I tried .jaddClassPath("/willy/wonka") and it didn't work. But when I tried .jaddClassPath("/home/voldemort"), it worked. (Let Java be your horcrux.)
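To make a missing file obvious at load time, the jar paths could be checked with file.exists() before they are handed to rJava. The sketch below uses only base R and never touches the classpath; existing_jars() is a hypothetical helper name, and the temporary directory stands in for the package's lib folder.

```r
# Hypothetical helper: keep only the jar paths that exist on disk and
# warn about the rest, so a wrong library location is noticed early.
existing_jars <- function(lib_dir, jar_names) {
  jar_paths <- file.path(lib_dir, jar_names)
  found <- file.exists(jar_paths)
  if (any(!found)) {
    warning(
      "Missing jar files: ", paste(jar_paths[!found], collapse = ", "),
      call. = FALSE
    )
  }
  jar_paths[found]
}

# Demo with a temporary directory holding one of the two expected jars.
demo_dir <- tempfile("gsrad_lib")
dir.create(demo_dir)
file.create(file.path(demo_dir, "colt-1.0.jar"))

kept <- suppressWarnings(
  existing_jars(demo_dir, c("colt-1.0.jar", "xqore.jar"))
)
print(basename(kept))
```

In .onLoad the returned vector could then be passed as morePaths to .jpackage(), assuming the lib directory is resolved with system.file() as in the question.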

Resources