Getting call hierarchy in gperftools for an R package

I can use gperftools to produce a call graph, as in this question, for instance.
Now I would like to get a call graph for bind_rows() in the dplyr R package in order to track down this bug.
I compiled both R and dplyr with CPPFLAGS/CXXFLAGS=-g -fvar-tracking-assignments and LDFLAGS=-lprofiler -lunwind.
When I run the following:
CPUPROFILE="samples.log" R --vanilla <<< "library(dplyr)
ll = lapply(1:1e5, function(x) as.list(setNames(runif(5), letters[1:5])))
print(system.time(bind_rows(ll)))"
pprof --gif /usr/lib/R/bin/exec/R samples.log > out.gif
All I get is a single flat node with no call hierarchy (output image omitted).
How can I get the call hierarchy so I know which call in dplyr's bind_rows() source file is the bottleneck?
edit: It seems that the --focus option is what I need here. But how do I connect this to RecursiveRelease?
pprof --focus=rbind__impl --gif /usr/lib/R/bin/exec/R samples.log > out.gif
edit: After recompiling Rcpp with -g and linking with -lprofiler, I could get the following: flame.svg, where 8% of the samples get a good stack trace but most of them still don't. Could this be because some library is loaded without -lprofiler support?
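One way to test that hypothesis is to check which of the shared objects loaded into the R process actually link libprofiler; a minimal sketch, assuming a Linux system with ldd on the PATH:
## For every DLL R has loaded, run ldd and report whether it links
## against libprofiler (sketch only; assumes Linux and ldd).
for (dll in getLoadedDLLs()) {
  path <- dll[["path"]]
  if (nzchar(path) && file.exists(path)) {
    out <- system(paste("ldd", shQuote(path)), intern = TRUE)
    cat(path,
        if (any(grepl("libprofiler", out))) "links libprofiler"
        else "does NOT link libprofiler", "\n")
  }
}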

Related

How can I switch virtualenv from reticulate in R?

In the R package reticulate there is a function use_virtualenv(), but it does not look like I can call it twice with different virtualenvs; the second call is always ignored.
Is there a way to deactivate the first virtualenv so I can call use_virtualenv("venv2") with the expected behavior?
#initialize
require(reticulate)
virtualenv_create("venv1")
virtualenv_create("venv2")
#call first virtualenv
use_virtualenv("venv1")
py_config() #show venv1 specs
#call second virtualenv
use_virtualenv("venv2")
py_config() # still show venv1 specs, I want venv2 here
I think unloadNamespace("reticulate") could work, but in my case the first call is made by another package...
In short: by restarting the R session! You can't switch virtualenvs in reticulate once one has been chosen!
I tried it (but chose "venv2" first).
> use_python("venv1", T)
Error in use_python("venv1", T) :
Specified version of python 'venv1' does not exist.
> use_python("~/.virtualenvs/venv1", T)
ERROR: The requested version of Python ('~/.virtualenvs/venv1') cannot
be used, as another version of Python
('/home/josephus/.virtualenvs/venv2/bin/python') has already been
initialized. Please restart the R session if you need to attach
reticulate to a different version of Python.
Error in use_python("~/.virtualenvs/venv1", T) :
failed to initialize requested version of Python
So reticulate's message says that one has to start a new R session to choose a new virtual environment.
This must apply to use_virtualenv(<xxx>, T) too, though it is not as verbose as use_python(<xxx>, T).
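A small sketch of the defensive pattern this suggests: test whether Python has already been initialized before selecting an environment (py_available(initialize = FALSE) reports the current state without itself initializing Python):
library(reticulate)
if (py_available(initialize = FALSE)) {
  message("Python already initialized; restart the R session to switch virtualenvs.")
} else {
  use_virtualenv("venv2", required = TRUE)
  py_config()  # should now report venv2
}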

running all examples in r package

I am developing a package in RStudio. Many of my examples need updating, so I am going through each one. The only way to check the examples is by running devtools::check(), but of course this runs all the checks, and it takes a while.
Is there a way of just running the examples so I don't have to wait?
Try the following code to run all examples:
devtools::run_examples()
You can also do this without devtools; admittedly it's a bit more circuitous.
package <- "rgl"
# this gives a key-value mapping of the various `\alias{}`es
# in each Rd file to that file's canonical name
aliases <- readRDS(system.file("help", "aliases.rds", package = package))
# or sapply(unique(aliases), example, package=package, character.only=TRUE),
# but I think the for loop is superior in this case.
for (topic in unique(aliases)) example(topic, package = package, character.only = TRUE)
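If you only want one topic's examples while you iterate on its Rd file, example() also works for a single alias (the topic shown is just an illustration from rgl):
# run the examples for a single help topic
example("surface3d", package = "rgl", character.only = TRUE)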

R: Patching a package function and reloading base libraries

Occasionally one wants to patch a function in a package, without recompiling the whole package.
For example, in Emacs ESS, the function install.packages() might get stuck if tcltk is not loaded. One might want to patch install.packages() in order to require tcltk before installation and unload it after the package setup.
A patched version of install.packages(), built in a variable temp, might be:
## Get original args without ending NULL
temp <- rev(rev(deparse(args(install.packages)))[-1])
temp <- paste(paste(temp, collapse = "\n"),
              ## Add code to load tcltk
              "{",
              "  wasloaded <- 'package:tcltk' %in% search()",
              "  require(tcltk)",
              ## Add original body without braces
              paste(rev(rev(deparse(body(install.packages))[-1])[-1]), collapse = "\n"),
              ## Unload tcltk if it was not loaded before by the user
              "  if(!wasloaded) detach('package:tcltk', unload=TRUE)",
              "}\n",
              sep = "\n")
## Eval patched function
temp <- eval(parse(text = temp))
# temp
Now we want to replace the original install.packages() and perhaps insert the code in Rprofile.
To this end it is worth noting that:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... install.packages() source follows (quite lengthy)
That is, the function is stored inside the package environment and the namespace of utils. These environments are sealed, and therefore the binding for install.packages() must be unlocked before being replaced:
## Override original function
unlockBinding("install.packages", as.environment("package:utils"))
assign("install.packages", temp, envir=as.environment("package:utils"))
unlockBinding("install.packages", asNamespace("utils"))
assign("install.packages", temp, envir=asNamespace("utils"))
rm(temp)
Using getAnywhere() again, we get:
getAnywhere("install.packages")
# A single object matching 'install.packages' was found
# It was found in the following places
# package:utils
# namespace:utils
# with value
#
# ... the *new* install.packages() source follows
It seems that the patched function is placed in the right place.
Unfortunately, running it gives:
Error in install.packages(xxxxx) :
could not find function "getDependencies"
getDependencies() is a function inside the same utils package, but it is not exported and is therefore not accessible outside its namespace.
So, despite the output of getAnywhere("install.packages"), the patched install.packages() is still misplaced: it was built by eval() in the global environment, so its enclosing environment is no longer the utils namespace where those helpers live.
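An aside worth testing: re-homing the patched function before the rm(temp) step should make the unexported helpers visible again.
## Sketch: make lookups of unexported utils helpers (like getDependencies)
## resolve again by giving the patched function the utils namespace as its
## enclosing environment.
environment(temp) <- asNamespace("utils")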
Short of that, we need to reload the utils library to obtain the desired effect, which also requires unloading the other libraries that import it.
detach("package:stats", unload=TRUE)
detach("package:graphics", unload=TRUE)
detach("package:grDevices", unload=TRUE)
detach("package:utils", unload=TRUE)
library(utils)
install.packages() works now.
Of course, we need to reload the other libraries too. Given the dependencies, using
library(stats)
should reload everything. But there is a problem when reloading the graphics library, at least on Windows:
library(graphics)
# Error in FUN(X[[i]], ...) :
# no such symbol C_contour in package path/to/library/graphics/libs/x64/graphics.dll
Which is the correct way of (re)loading the graphics library?
Patching functions in packages is a low-level operation that should be avoided, because it may break internal assumptions of the execution environment and lead to unpredictable behavior or crashes. If there is a problem with tcltk/ESS (I didn't try to reproduce it), perhaps it should be fixed, or there may be a workaround. Changing locked bindings, in particular, is something to avoid.
If you really wanted to run some code at the start/end of say install.packages, you can use trace. It will do some of the low-level operations mentioned in the question, but the good part is you don't have to worry about fixing this whenever some new internals of R change.
trace(install.packages,
      tracer = quote(cat("Starting install.packages\n")),
      exit = quote(cat("Ending install packages.\n"))
)
Replace tracer and exit accordingly; maybe exit is not needed anyway, and maybe you don't need to unload the package. Still, trace is a very useful tool for debugging.
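Applied to the original tcltk problem, an untested sketch could look like the following; one caveat is that install.packages() itself calls on.exit() internally, which may clobber the exit tracer, so verify that the exit code actually runs:
trace(install.packages,
      tracer = quote({
        ## evaluated in install.packages' frame on entry
        wasloaded <- "package:tcltk" %in% search()
        require(tcltk)
      }),
      exit = quote(
        ## evaluated in the same frame on exit, so wasloaded is visible
        if (!wasloaded) detach("package:tcltk", unload = TRUE)
      ))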
I am not sure if that will solve your problem (whether it would work with ESS), but in general you can also wrap install.packages in a function you define, say, in your workspace:
install.packages <- function(...) {
  cat("Entry.\n")
  on.exit(cat("Exit.\n"))
  utils::install.packages(...)
}
This is the cleanest option indeed.
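Combining the wrapper idea with the tcltk handling from the question gives a sketch like the following (with the same caveat that unloading tcltk may not be needed at all):
install.packages <- function(...) {
  ## remember whether the user already had tcltk attached
  wasloaded <- "package:tcltk" %in% search()
  require(tcltk)
  ## detach it again on exit, unless it was attached beforehand
  on.exit(if (!wasloaded) detach("package:tcltk", unload = TRUE))
  utils::install.packages(...)
}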

parallel::clusterExport - non exported library/package function

Similar to this question: how do I clusterExport a package's non-exported function to a cluster? For some reason this passed my tests before submitting to CRAN, but it isn't working in production. Obviously, I want to fix this and resubmit to CRAN.
library(imputeMulti)
library(parallel)
imputeMulti:::count_compare # function to be exported
nnodes <- 2L
cl <- parallel::makeCluster(nnodes)
parallel::clusterExport(cl, varlist= c("count_compare")) # fails -- but initially passed tests
parallel::clusterExport(cl, varlist= c("count_compare"), envir= as.environment("package:imputeMulti")) # also fails
I'm using clusterExport to avoid the CRAN/R CMD check note "use of ::: in package". Obviously, I could export count_compare, but that's not a desirable choice.
Any help appreciated!
Adding test information:
devtools::test("imputeMulti", "count_levels")
Loading imputeMulti
Testing imputeMulti
int- count_levels works: ...............................
DONE ===========================================================================================================================================
You can use an equivalent call to clusterCall to do this.
parallel::clusterCall(cl, assign, "count_compare", count_compare, envir = .GlobalEnv)
See the definition of clusterExport to verify this is doing the same thing.
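For reference, a simplified paraphrase of clusterExport's core loop (myClusterExport is a hypothetical name; see the real source for details), which shows why the clusterCall above is equivalent:
## Roughly what parallel::clusterExport does: fetch each value locally,
## then assign it into the workers' global environments.
myClusterExport <- function(cl, varlist, envir = .GlobalEnv) {
  for (name in varlist)
    parallel::clusterCall(cl, assign, name, get(name, envir = envir),
                          envir = .GlobalEnv)
}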
Based on my tests and work, I do not see a way to use parallel::clusterExport on a non-exported library function.
The following works, but results in 1 R CMD check note:
R CMD check results
0 errors | 0 warnings | 1 note
checking dependencies in R code ... NOTE
There are ::: calls to the package's namespace in its code. A package
almost never needs to use ::: for its own objects:
'count_compare'
count_compare <- imputeMulti:::count_compare
parallel::clusterExport(cl, varlist = c("count_compare"), envir = as.environment(1))
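As an alternative sketch, exporting directly from the package's namespace environment avoids ::: syntax altogether and should avoid the NOTE (an assumption, based on the check scanning specifically for ::: calls):
## Unexported objects are visible in the namespace environment even though
## they are absent from the attached "package:imputeMulti" environment.
parallel::clusterExport(cl, varlist = "count_compare",
                        envir = asNamespace("imputeMulti"))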
Perhaps one of the developers of library(testthat) can provide a solution or an update regarding the problem of the testthat framework not catching this. Based on Hadley's R Journal article (p. 7 in the journal numbering), I'm guessing it has to do with how environments are used for testing. But that's just a guess.
Note: Hadley has confirmed via email correspondence that this is the reason for the testthat miss.

Problems building an R package (NAs introduced by coercion)

I have written the Boggler package, which includes a Play.Boggle() function that calls, on line 87, a progress bar script using shell:
shell(cmd = sprintf('Rscript.exe R/progress_bar.R "%i"', time.limit + 1), wait=FALSE)
Everything works fine when sourcing the files individually and then calling the main Play.Boggle() function, but when I try to check/build the package (under Win7-64 using RStudio), I get a failure message; here's what 00install.out reports:
** preparing package for lazy loading
Warning in eval(expr, envir, enclos) : NAs introduced by coercion
Error in time.limit:0 : NA/NaN argument
To make sure the argument "%i" (time.limit + 1) was correctly passed to progress_bar.R, I added a cat(time.limit) to the script (commenting the rest out to make sure the package would build without errors) and redirected its output to a log file like this:
'Rscript.exe R/progress_bar.R "%i" > out.log'
Conclusion: the time limit is indeed passed along as expected. So I can't figure out why I get this "NA/NaN argument" error message. It must have something to do with lazy loading, a concept that I haven't fully got my head around yet.
So my question is: what can I do to successfully check/build this package with full functionality (including progress_bar.R)?
Note: On GitHub, the progress_bar.R script is there, but all its content is commented out so that the package can be installed successfully. The shell(...) call is still active, doing nothing but executing an empty script.
So the problem arises when trying to build or check, in which case all R scripts are executed, as pointed out by Roland. A simple workaround allows the package to check/build without any problems. The fix is just to add the following lines to progress_bar.R after it tries to retrieve the command args (lines 10-11):
if (time.limit %in% c(NA, NaN))
  time.limit <- 10 # or any minimal number
There are surely other ways to go about this. But this being a game programmed for fun, I'll happily go with that patch. Hopefully this can be of help to someone down the road, and I won't have wasted 50 precious rep points in vain on that bounty! :D
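For instance, one such alternative is parsing the argument defensively in progress_bar.R itself (variable names here are illustrative, not the script's actual code):
## Read the trailing command-line argument and fall back when it is missing
## or non-numeric (as happens when the script is evaluated during check/build).
cmd.args <- commandArgs(trailingOnly = TRUE)
time.limit <- suppressWarnings(as.numeric(cmd.args[1]))
if (is.na(time.limit)) time.limit <- 10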