Output to pdf not working with ReferenceClasses methods in R? - r

Output to pdf not working with ReferenceClasses methods in R?
This is an example taken from the ReferenceClasses R doc, with some minor
modification:
mEdit = setRefClass("mEdit", fields = list(data="matrix", edits="list"))
mEdit$methods(
edit = function(i, j, value) {
backup = list(i, j, data[i, j])
data[i, j] <<- value
edits <<- c(edits, list(backup))
invisible(value)
}
)
mEdit$methods(
undo = function() {
prev = edits
if(length(prev)) {
prev = prev[[length(prev)]]
}
else {
stop("No more edits to undo!")
}
edit(prev[[1]], prev[[2]], prev[[3]])
length(edits) <<- length(edits) - 2
invisible(prev)
}
)
mEdit$methods(
show = function() {
message("ClassName: ", classLabel(class(.self)))
message("Data:")
methods::show(data)
message("Undo list length: ", length(edits))
}
)
mEdit$methods(
.DollarNames.mEdit = function(x, pattern) {
grep(pattern, getRefClass(class(x))$methods(), value=TRUE)
}
)
x = matrix(1:24, 3, 8)
xx = mEdit(data=x)
xx$edit(2,2,0)
xx$show()
xx$edit(3, 5, 1)
xx$show()
xx$undo()
xx$show()
mv = setRefClass(
"matrixViewer",
fields=c("viewerDevice", "viewerFile"),
contains="mEdit"
)
mv$methods(
.DollarNames.mEdit = function(x, pattern) {
grep(pattern, getRefClass(class(x))$methods(), value=TRUE)
}
)
mv$methods(
view = function() {
## dd = dev.cur();
## dev.set(viewerDevice)
## devAskNewPage(FALSE)
image(
data,
main=paste("After", length(edits), "edits")
)
## dev.set(dd)
}
)
mv$methods(
edit = function(i,j, value) {
callSuper(i,j, value)
view()
}
)
mv$methods(
initialize = function(file="./mv.pdf", ...) {
viewerFile <<- file
## pdf(viewerFile)
## viewerDevice <<- dev.cur()
## dev.set(dev.prev())
callSuper(...)
}
)
mv$methods(
finalize = function() {
dev.off(viewerDevice)
}
)
x = matrix(rnorm(64, 0, 34), 8, 8)
xx = mv(file="/tmp/x.pdf", data=x)
xx$edit(2,2,0)
xx$edit(3, 5, 1)
xx$edit(4, 4, 2.3)
xx$undo()
xx$view()
Note that I have commented out those lines concerning switch
of output devices, so it uses the default device all through,
otherwise when the view method
is called, the plot is not written to the pdf file at all.
Any idea why this is happening?

Call rm on xx and then call garbage collection. finalize will then be called which will invoke dev.off and the pdf will be written. This assumes everything is uncommented.
rm(xx)
gc()
Also your .DollarNames should be
.DollarNames.mEdit = function(x, pattern) {
grep(pattern, getRefClass(class(x))$methods(), value=TRUE)
}
.DollarNames.matrixViewer = function(x, pattern) {
grep(pattern, getRefClass(class(x))$methods(), value=TRUE)
}
and are not methods of the Reference class. They are external functions seperate to the Reference classes.
So the main takeaway here is that finalize is not called until the object is garbage collected.

Related

Inside R6 class definition: 'object not found' (or: how to define 'local' objects in R6 classes)

I want to define an R6 class that sets up, updates and closes a progress bar. For these 3 tasks, I have 3 functions. The first, setup_progressbar(), calls R's txtProgressbar() which returns an object (say, pb) which needs to be passed on to the second and third functions, update_progressbar() and close_progressbar(). But the object pb is not found by the latter two functions.
library(R6)
myprogressbar <- R6Class("my_progress_bar",
public = list(
n = numeric(1),
initialize = function(n) {
stopifnot(n >= 1)
self$n <- n
},
setup_progressbar = function() {
pb <- txtProgressBar(max = self$n)
},
update_progressbar = function(i) {
setTxtProgressBar(pb, i)
},
close_progressbar = function () {
close(pb)
cat("\n")
}
))
mypb <- myprogressbar$new(10)
mypb$setup_progressbar()
mypb$update_progressbar(3) # Error in setTxtProgressBar(pb, i) : object 'pb' not found
I tried to add pb to self in the hope it would be found, but then I obtain "cannot add bindings to a locked environment".
Note: In my actual (non-minimal) example, the i is found/provided/visible, so that's not an additional problem (most likely this is just a problem in the above minimal working example once fixed beyond the 'pb' not found error).
The following works:
library(R6)
myprogressbar <- R6Class("my_progress_bar",
public = list(
n = numeric(1),
pb = NULL, # provide as argument
initialize = function(n, pb = NULL) { # provide with default so that $new() doesn't require 'pb'
stopifnot(n >= 1)
self$n <- n
},
setup_progressbar = function() {
self$pb <- txtProgressBar(max = self$n)
},
update_progressbar = function(i) {
setTxtProgressBar(self$pb, i)
},
close_progressbar = function () {
close(self$pb)
cat("\n")
}
))
mypb <- myprogressbar$new(10)
mypb$setup_progressbar()
mypb$update_progressbar(3)

How to best combine unique and match in R?

I found myself often writing code such as
#' #param x input vector
#' #param ... passed to [slow_fun()]
fast_fun <- function(x, ...) {
u <- unique(x)
i <- match(x, u)
v <- slow_fun(u, ...)
v[i]
}
To accelerate a slow vectorized "pure" function where each input entry could theoretically be computed individually and where input is expected to contain many duplicates.
Now I wonder whether this is the best way to achieve such a speedup or is there some function (preferrably in base R or the tidyverse) which does something like unique and match at the same time?
Benchmarks so far
Thanks for the provided answers. I've written a small benchmark suite to compare the approaches:
method <- list(
brute = slow_fun,
unique_match = function(x, ...) {
u <- unique(x)
i <- match(x, u)
v <- slow_fun(u, ...)
v[i]
},
unique_factor = function(x, ...) {
if (is.character(x)) {
x <- factor(x)
i <- as.integer(x)
u <- levels(x)
} else {
u <- unique(x)
i <- as.integer(factor(x, levels = u))
}
v <- slow_fun(u, ...)
v[i]
},
unique_match_df = function(x, ...) {
u <- unique(x)
i <- if (is.numeric(x)) {
match(data.frame(t(round(x, 10))), data.frame(t(round(u, 10))))
} else {
match(data.frame(t(x)), data.frame(t(u)))
}
v <- slow_fun(u, ...)
v[i]
},
rcpp_uniquify = function(x, ...) {
iu <- uniquify(x)
v <- slow_fun(iu[["u"]], ...)
v[iu[["i"]]]
}
)
exprs <- lapply(method, function(fun) substitute(fun(x), list(fun = fun)))
settings$bench <- lapply(seq_len(nrow(settings)), function(i) {
cat("\rBenchmark ", i, " / ", nrow(settings), sep = "")
x <- switch(
settings$type[i],
integer = sample.int(
n = settings$n_distinct[i],
size = settings$n_total[i],
replace = TRUE
),
double = sample(
x = runif(n = settings$n_distinct[i]),
size = settings$n_total[i],
replace = TRUE
),
character = sample(
x = stringi::stri_rand_strings(
n = settings$n_distinct[i],
length = 20L
),
size = settings$n_total[i],
replace = TRUE
)
)
microbenchmark::microbenchmark(
list = exprs
)
})
library(tidyverse)
settings %>%
mutate(
bench = map(bench, summary)
) %>%
unnest(bench) %>%
group_by(n_distinct, n_total, type) %>%
mutate(score = median / min(median)) %>%
group_by(expr) %>%
summarise(mean_score = mean(score)) %>%
arrange(mean_score)
Currently, the rcpp-based approach is best in all tested settings on my machine but barely manages to exceed the unique-then-match method.
I suspect a greater advantage in performance the longer x becomes, because unique-then-match needs two passes over the data while uniquify() only needs one pass.
|expr | mean_score|
|:---------------|----------:|
|rcpp_uniquify | 1.018550|
|unique_match | 1.027154|
|unique_factor | 5.024102|
|unique_match_df | 36.613970|
|brute | 45.106015|
Maybe you can try factor + as.integer like below
as.integer(factor(x))
I found a cool, and fast, answer recently,
match(data.frame(t(x)), data.frame(t(y)))
As always, beware when working with floats. I recommend something like
match(data.frame(t(round(x,10))), data.frame(t(round(y))))
in such cases.
I've finally managed to beat unique() and match() using Rcpp to hand-code the algorithm in C++ using a std::unordered_map as core bookkeeping data structure.
Here is the source code, which can be used in R by writing it into a file and running Rcpp::sourceCpp on it.
#include <Rcpp.h>
using namespace Rcpp;
template <int T>
List uniquify_impl(Vector<T> x) {
IntegerVector idxes(x.length());
typedef typename Rcpp::traits::storage_type<T>::type storage_t;
std::unordered_map<storage_t, int> unique_map;
int n_unique = 0;
// 1. Pass through x once
for (int i = 0; i < x.length(); i++) {
storage_t curr = x[i];
int idx = unique_map[curr];
if (idx == 0) {
unique_map[curr] = ++n_unique;
idx = n_unique;
}
idxes[i] = idx;
}
// 2. Sort unique_map by its key
Vector<T> uniques(unique_map.size());
for (auto &pair : unique_map) {
uniques[pair.second - 1] = pair.first;
}
return List::create(
_["u"] = uniques,
_["i"] = idxes
);
}
// [[Rcpp::export]]
List uniquify(RObject x) {
switch (TYPEOF(x)) {
case INTSXP: {
return uniquify_impl(as<IntegerVector>(x));
}
case REALSXP: {
return uniquify_impl(as<NumericVector>(x));
}
case STRSXP: {
return uniquify_impl(as<CharacterVector>(x));
}
default: {
warning(
"Invalid SEXPTYPE %d (%s).\n",
TYPEOF(x), type2name(x)
);
return R_NilValue;
}
}
}

R: Values saved into a list within %dopar% / foreach are not available downstream in global environment

I am trying to run the following code in parallel using dopar / foreach , but I can't figure out how to actually save the values into the list and have them appear in the global environment further down in the script.
I have that first line of code to initialize the seurat.object list. I am importing the list into the foreach. and assigning a new value to each of the list's elements in there too, using <<-, which should mean it will be saved into the global environment. Why is the updated seurat.objects list not preserved outside of the foreach?
1a. Scale only (without nUMI regression):
1b. Scale with nUMI regression and store in a new object:
seurat.objects <- list(scaled=NULL, scaled.regressed=NULL)
registerDoFuture()
cl <- makeCluster(2, outfile="")
plan(cluster, workers = cl)
result <- foreach(object=names(seurat.objects),
.export = ls(.GlobalEnv)) %dopar% {
selectObject(object)
if( ! file.exists(object.path)) {
if(object == "scaled") {
assign('seurat.objects[["scaled"]]', ScaleData(seurat.object,
do.scale = T, do.center = T, display.progress = F))
}
if(object == "scaled.regressed") {
assign('seurat.objects[["scaled.regressed"]]',
ScaleData(seurat.object,
vars.to.regress = "nUMI",
do.scale = T, do.center = T, display.progress = F))
}
saveRDS(seurat.objects[[object]], file=object.path)
} else { # Found scaled .Rds
x <- readRDS(object.path)
seurat.objects[[object]] <<- x
rm(x)
}
}
stopCluster(cl)
The selectObject function is defined before the above code, as follows:
selectObject <- function(object) {
if(object == "scaled") {
scaling <<- "_scaleOnly"
pca.result <<- "pca.scaled"
object.path <<- path.scaled.object
pca.result.path <<- paste0(clustering.path, "2_pca/pcaObject_",
age, scaling, ".Rds")
}
if(object == "scaled.regressed") {
scaling <<- "_scale_nUMIregress"
pca.result <<- "pca.scaled.regressed"
object.path <<- path.scaled.regressed.object
pca.result.path <<- paste0(clustering.path, "2_pca/pcaObject_",
age, scaling, ".Rds")
}
}
When I try to inspect the contents of seurat.objects, the list in which the data should have been stored, I get:
> seurat.objects
$scaled
NULL
$scaled.regressed
NULL

R lazy evaluation paradox (R bug?)

I have multiple functions handing around arguments that may be missing.
e.g. i have
mainfunction <- function(somearg) {
mytest(somearg)
fun <- function() { subfunction(somearg) }
fun()
}
with the interesting aspect that the only interaction of mytest(somearg) with the arg is that it tests if the argument isn’t missing:
mytest = function(somearg) {
print(missing(somearg))
}
subfunction then again tests if it’s missing and treats it accordingly:
subfunction = function(somearg) {
if (missing(somearg))
somearg = NULL
else
somearg = matrix(somearg, cols = 2)
# somearg is used here…
}
the kicker is that, with somearg missing, this doesn’t work: matrix(somearg, cols = 2) throws
argument "somearg" is missing, with no default
during debugging, i found the following:
at the start of mainfunction, missing(somearg) returns TRUE
in mytest, missing(somearg) returns TRUE
insubfunction, missing(somearg) returns FALSE (!!!!)
therefore the matrix branch is hit, but in reality, somearg is missing, so it fails…
wat.
the #BenBolker way:
mainfunction <- function(somearg = NULL) {
mytest(somearg)
fun <- function() { subfunction(somearg) }
fun()
}
mytest = function(somearg) {
print(is.null(somearg))
}
subfunction = function(somearg) {
if (is.null(somearg))
somearg = 1:10
else
somearg = matrix(somearg, ncol = 2)
somearg
}
Another way, using explicit missing argument
mainfunction <- function(somearg) {
is_missing <- missing(somearg)
mytest(is_missing)
fun <- function() { subfunction(somearg, is_missing) }
fun()
}
mytest = function(x) { print(x) }
subfunction = function(somearg, is_arg_missing) {
if (is_arg_missing)
somearg = 1:10
else
somearg = matrix(somearg, ncol = 2)
somearg
}
A third way, using plain missing arg passing:
mainfunction <- function(somearg) {
is_missing <- missing(somearg)
mytest(somearg)
fun <- function() {
if (is_missing) subfunction() else
subfunction(somearg)
}
fun()
}
mytest = function(somearg) {
print(missing(somearg))
}
subfunction = function(somearg) {
if (missing(somearg))
somearg = 1:10
else
somearg = matrix(somearg, ncol = 2)
somearg
}

Is there a destructor in R reference class?

Just as a test:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
dfunc = function(i) {
message("In dfunc, I save x and y...")
obj = .self
base::save(obj, file="/tmp/obj.rda")
}
)
myclass$methods(
print = function() {
if (.self$x > 10) {
stop("x is too large!")
}
message(paste("x: ", .self$x))
message(paste("y: ", .self$y))
}
)
myclass$methods(
initialize = function(x=NULL, y=NULL, obj=NULL) {
if(is.null(obj)) {
.self$x = x
.self$y = y
}
else {
.self$x = obj$x
.self$y = obj$y
}
}
)
myclass$methods(
finalize = function() {
message("I am finalizing this thing...")
}
)
Then try to create and remove an object:
u = myclass(15, 6)
u$print()
rm(u)
The finalize function is not called at all...
When you call rm you just remove the object reference from the enviroment, but you don't destroy the element.
That is the work of the garbage collector that is designed to automatically destroy objects when they have nomore reference (like in this case). Anyway, the garbage collector is triggered by some special events (e.g. too much memory used etc.), so it is not automatically invoked when you call rm (it will be probably called later later).
Anyway, you can force the garbage collector, even if this is usually discouraged, by calling gc().
u = myclass(15, 6)
rm(u)
gc()
# > I am finalizing this thing...
As you can see by running the above code, your finalize method is indeed called after gc()

Resources