Change default argument(s) of S3 Methods in R - r

Is it possible to change default argument(s) of S3 Methods in R?
It's easy enough to change arguments using formals ...
# return default arguments of table
> args(table)
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no",
"ifany", "always"), dnn = list.names(...), deparse.level = 1)
# Update an argument
> formals(table)$useNA <- "always"
# Check change
> args(table)
function (..., exclude = if (useNA == "no") c(NA, NaN), useNA = "always",
dnn = list.names(...), deparse.level = 1)
But not S3 methods ...
# View default argument of S3 method
> formals(utils:::str.default)$list.len
[1] 99
# Attempt to change
> formals(utils:::str.default)$list.len <- 99
Error in formals(utils:::str.default)$list.len <- 99 :
object 'utils' not found

At #nicola's generous prompting here is an answer-version of the comments:
You can edit S3 methods and other non-exported functions using assignInNamespace(). This lets you replace a function in a given namespace with a new user-defined function (fixInNamespace() will open the target function in an editor to let you make a change).
# Take a look at what we are going to change
formals(utils:::str.default)$list.len
#> [1] 99
# extract the whole function from utils namespace
f_to_edit <- utils:::str.default
# make the necessary alterations
formals(f_to_edit)$list.len<-900
# Now we substitute our new improved version of str.default inside
# the utils namespace
assignInNamespace("str.default", f_to_edit, ns = "utils")
# and check the result
formals(utils:::str.default)$list.len
#> [1] 900
If you restart your R session you'll recover the defaults (or you can put them back manually in the current session).

Related

R: Collect All Function Definitions from a Library

I am working with R. I found this previous post on stackoverflow which shows how to get a "list" of all functions that belong to a given library:
How to find all functions in an R package?
For example:
#load desired library
library(ParBayesianOptimization)
#find out all functions from this library
getNamespaceExports("ParBayesianOptimization")
[1] "addIterations" "getLocalOptimums" "bayesOpt" "getBestPars" "changeSaveFile" "updateGP"
The above code tells me the name of all functions that are used in the "ParBayesianOptimization" library. From here, I could manually inspect each one of these functions - for example:
# manually inspect any one of these functions
getAnywhere(bayesOpt)
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
#function stats here
function (FUN, bounds, saveFile = NULL, initGrid, initPoints = 4,
iters.n = 3, iters.k = 1, otherHalting = list(timeLimit = Inf,
minUtility = 0), acq = "ucb", kappa = 2.576, eps = 0,
parallel = FALSE, gsPoints = pmax(100, length(bounds)^3),
convThresh = 1e+08, acqThresh = 1, errorHandling = "stop",
plotProgress = FALSE, verbose = 1, ...)
{
startT <- Sys.time()
optObj <- list()
etc etc etc ...
saveFile = saveFile, verbose = verbose, ...)
return(optObj)
}
#function ends here
<bytecode: 0x000001cbb4145db0>
<environment: namespace:ParBayesianOptimization>
Goal : Is it possible to take each one of these functions and create a notepad file with their full definitions?
Something that would look like this:
My attempt:
I thought I could first make an "object" in R that contained all the functions found in this library:
library(plyr)
a = getNamespaceExports("ParBayesianOptimization")
my_list = do.call("rbind.fill", lapply(a, as.data.frame))
X[[i]]
1 addIterations
2 getLocalOptimums
3 bayesOpt
4 getBestPars
5 changeSaveFile
6 updateGP
Then, I could manually create an "assignment arrow":
header_text <- rep("<-")
Then, "paste" this to each function name:
combined_list <- as.character(paste(my_list, header_text, sep = ""))
But this is not looking correct:
combined_list
[1] "c(\"addIterations\", \"getLocalOptimums\", \"bayesOpt\", \"getBestPars\", \"changeSaveFile\", \"updateGP\")<- "
The goal is to automate the process of manually copying/pasting :
function_1 = getAnywhere("first function ParBayesianOptimization library")
function_2 = getAnywhere("second function ParBayesianOptimization library")
etc
final_list = c(function_1, function_2 ...)
And removing the generic description from each function:
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
In the end, if I were to "call" the final_list object, all the functions from this library should get recreated and reassigned.
Can someone please show me how to do this?
Thanks
You can use the dump function for this
pkg <- "ParBayesianOptimization"
dump(getNamespaceExports(pkg), file="funs.R", envir = asNamespace(pkg))
This code will help you write the function definitions of all the functions in a library to a text file.
fn_list <- getNamespaceExports("ParBayesianOptimization")
for(i in seq_along(fn_list)) {
header <- paste('\n\n####Function', i, '\n\n\n')
cat(paste0(header, paste0(getAnywhere(fn_list[i]), collapse = '\n'), '\n\n'),
file = 'function.txt', append = TRUE)
}

How to reuse sparklyr context with mclapply?

I have a R code that does some distributed data preprocessing in sparklyr, and then collects the data to R local dataframe to finally save the result in the CSV. Everything works as expected and now I plan to re-use the spark context across multiple input files processing.
My code looks similar to this reproducible example:
library(dplyr)
library(sparklyr)
sc <- spark_connect(master = "local")
# Generate random input
matrix(rbinom(1000, 1, .5), ncol=1) %>% write.csv('/tmp/input/df0.csv')
matrix(rbinom(1000, 1, .5), ncol=1) %>% write.csv('/tmp/input/df1.csv')
# Multi-job input
input = list(
list(name="df0", path="/tmp/input/df0.csv"),
list(name="df1", path="/tmp/input/df1.csv")
)
global_parallelism = 2
results_dir = "/tmp/results2"
# Function executed on each file
f <- function (job) {
spark_df <- spark_read_csv(sc, "df_tbl", job$path)
local_df <- spark_df %>%
group_by(V1) %>%
summarise(n=n()) %>%
sdf_collect
output_path <- paste(results_dir, "/", job$name, ".csv", sep="")
local_df %>% write.csv(output_path)
return (output_path)
}
If I execute the function of a job inputs in sequential way with lapply everything works as expected:
> lapply(input, f)
[[1]]
[1] "/tmp/results2/df0.csv"
[[2]]
[1] "/tmp/results2/df1.csv"
However, if I plan to run it in parallel to maximize usage of spark context (if df0 spark processing is done and the local R is working on it, df1 can be already processed by spark):
> library(parallel)
> library(MASS)
> mclapply(input, f, mc.cores = global_parallelism)
*** caught segfault ***
address 0x560b2c134003, cause 'memory not mapped'
[[1]]
[1] "Error in as.vector(x, \"list\") : \n cannot coerce type 'environment' to vector of type 'list'\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in as.vector(x, "list"): cannot coerce type 'environment' to vector of type 'list'>
[[2]]
NULL
Warning messages:
1: In mclapply(input, f, mc.cores = global_parallelism) :
scheduled core 2 did not deliver a result, all values of the job will be affected
2: In mclapply(input, f, mc.cores = global_parallelism) :
scheduled core 1 encountered error in user code, all values of the job will be affected
When I'm doing similar with Python and ThreadPoolExcutor, the spark context is shared across threads, same for Scala and Java.
Is this possible to reuse sparklyr context in parallel execution in R?
Yeah, unfortunately, the sc object, which is of class spark_connection, cannot be exported to another R process (even if forked processing is used). If you use the future.apply package, part of the future ecosystem, you can see this if you use:
library(future.apply)
plan(multicore)
## Look for non-exportable objects and given an error if found
options(future.globals.onReference = "error")
y <- future_lapply(input, f)
That will throw:
Error: Detected a non-exportable reference (‘externalptr’) in one of the
globals (‘sc’ of class ‘spark_connection’) used in the future expression

Meaning of this warning; "Warning message: In get(object, envir = currentEnv, inherits = TRUE) : restarting interrupted promise evaluation"

I have written a function in R which extracts data from a database and builds a new table.
My new table is labeled with the date of the extract (build_date_0).
When I'm debugging my function I get the following warning when I look at my date string:
Browse[2]> build_date_0
[1] "2019-05-01"
Warning message:
In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
Questions:
What does this warning mean / what is happening (step-by-step/basics)?
Should I care?
In general how can I find out more about this error?
This is my code:
build_account_db = function(conn = connection_object
,various_inputs = 24){
browser()
# create connection objects
tabs_1 = dplyr::tbl(conn,in_schema("DB_1","VIEW_W") # some table
# create date string
build_date_0 = lubridate::today() %>% as.character()
build_date = str_replace_all(build_date_0,"-+","_")
db_name_1 = paste0('DATABASE.tab_1_',build_date)
db_name_2 = paste0('DATABASE.tab_2_',build_date)
# build query
query_text_1 = tabs_1 %>% select(COL_1) # some query
query_text_1 = tabs_1 %>% select(COL_2)
# build new tables
create_db = DBI::dbSendQuery(conn_t,paste('CREATE TABLE',db_name_1,'AS (',query_text_1,') WITH DATA PRIMARY INDEX (ID_1)'))
create_db2 = DBI::dbSendQuery(conn_t,paste('CREATE TABLE',db_name_2,'AS (',query_text_2,') WITH DATA PRIMARY INDEX (ID_1)'))
}
When I check a variable, I may or may not get this warning (it varies, even if I restart R, and run my code again with a cleared environment)
Browse[2]> build_date
[1] "2019-02-28 11:00:00 AEDT"
Warning message:
In get(object, envir = currentEnv, inherits = TRUE) :
restarting interrupted promise evaluation
What I've tried: I read this question, but it's more about suppressing the error. Also, google.
I found this link on promises and evaluation in R helpful for a related problem: https://mailund.dk/posts/promises-and-lazy-evaluation/. I wonder if after build_date_0 = lubridate::today() %>% as.character() if you add a call to just build_date_0 if that would solve the promise? Good luck!

‘l’ is no list of MALDIquant::MassPeaks objects

I'm following this tutorial about MALDIquant package but i'm getting an error, before executing this line
peaks <- binPeaks(peaks, tolerance=0.002)
The Error is :
Error: binPeaks(peaks, tolerance = 0.002) : ‘l’ is no list of MALDIquant::MassPeaks objects!
When i do class(peaks) :
> class(peaks)
[1] "MassPeaks"
attr(,"package")
[1] "MALDIquant"
binPeaks just works on list of MassPeaks objects. Your class output shows that you have just a single MassPeaks object.
## load package
library("MALDIquant")
## create two MassPeaks objects
p <- list(createMassPeaks(mass=seq(100, 500, 100), intensity=1:5),
createMassPeaks(mass=c(seq(100.2, 300.2, 100), 395), intensity=1:4))
binnedPeaks <- binPeaks(p, tolerance=0.002)
## but not:
binPeaks(p[[1]])
# Error: binPeaks(peaks, tolerance = 0.002) : ‘l’ is no list of MALDIquant::MassPeaks objects!
Please look at ?binPeaks for details or write me an email (I am the maintainer, you will find my mail address at CRAN.

Importing package namespace into default namespace [duplicate]

This question already has an answer here:
How can I read the source code for an R function?
(1 answer)
Closed 9 years ago.
Frequently when I am working with R and I want to find out what the function does, I type in the name of the function and scroll through the code. However, sometimes when I type in the name of the function I get a response that does not tell me anything.
> library(limma)
> plotMDS #can't get to the code
function (x, ...)
UseMethod("plotMDS")
<environment: namespace:limma>
> limma:::plotMDS
function (x, ...)
UseMethod("plotMDS")
<environment: namespace:limma>
> heatmap #im expecting something more like this
function (x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL,
distfun = dist, hclustfun = hclust, reorderfun = function(d,
w) reorder(d, w), add.expr, symm = FALSE, revC = identical(Colv,
"Rowv"), scale = c("row", "column", "none"), na.rm = TRUE,
margins = c(5, 5), ColSideColors, RowSideColors, cexRow = 0.2 +
1/log10(nr), cexCol = 0.2 + 1/log10(nc), labRow = NULL,
labCol = NULL, main = NULL, xlab = NULL, ylab = NULL, keep.dendro = FALSE,
verbose = getOption("verbose"), ...)
{
scale <- if (symm && missing(scale))
"none"
else match.arg(scale)
/* ... many lines removed ... */
}
invisible(list(rowInd = rowInd, colInd = colInd, Rowv = if (keep.dendro &&
doRdend) ddr, Colv = if (keep.dendro && doCdend) ddc))
}
<bytecode: 0x16199b8>
<environment: namespace:stats>
Thus, I was wondering if there is a way to import a package's namespace into the default namespace so I can look at code in functions (and debug things easier). I've been reading up on namespace but most of the time it is written for developers so it is talking about how to export namespaces for packages.
plotMDS is the generic function. What you access via plotMDS and limma:::plotMDS is exactly the same thing, the latter just less-efficiently. What you want to get at are the methods for this generic function.
To see the list of method for plotMDS try
methods(plotMDS)
That will return a vector of function names. I can't install limma so here is what we see for the base plot generic [in my current session]:
> methods(plot)
[1] plot.acf* plot.correspondence* plot.data.frame*
[4] plot.decomposed.ts* plot.default plot.dendrogram*
[7] plot.density plot.ecdf plot.factor*
[10] plot.formula* plot.function plot.hclust*
[13] plot.histogram* plot.HoltWinters* plot.isoreg*
[16] plot.lda* plot.lm plot.mca*
[19] plot.medpolish* plot.mlm plot.ppr*
[22] plot.prcomp* plot.princomp* plot.profile*
[25] plot.profile.nls* plot.ridgelm* plot.spec
[28] plot.stepfun plot.stl* plot.table*
[31] plot.ts plot.tskernel* plot.TukeyHSD
Non-visible functions are asterisked
To access the code of non-starred functions we just enter the full function name, e.g.
> plot.density
function (x, main = NULL, xlab = NULL, ylab = "Density", type = "l",
zero.line = TRUE, ...)
{
....
To see the code for starred functions/methods you need the pkg:::function structure, e.g. for the plot.data.frame method
> plot.data.frame
Error: object 'plot.data.frame' not found
> graphics:::plot.data.frame
function (x, ...)
{
....
If you don't know which namespace a method belongs to, then use getAnywhere, e.g.
> getAnywhere(plot.data.frame)
A single object matching ‘plot.data.frame’ was found
It was found in the following places
registered S3 method for plot from namespace graphics
namespace:graphics
with value
function (x, ...)
{
....
The printed results indicate the relevant namespace (in this case graphics) plus return the value of the function, or the code.
This is a really crude alternative, but it does what was requested:
Firstly, copy the contents of the namespace to a list in the global environment:
L <- as.list(asNamespace("yourpackage"))
Now you can either navigate L or copy all its contents to equally named objects in the global environment with this:
invisible(lapply(names(L), function(x) eval(parse(text=paste0(x,"<-L[['",x,"']]")), globalenv())))
warning: this will overwrite whatever object you have defined with the same name! So use with care.

Resources