R: Collect All Function Definitions from a Library

I am working with R. I found this previous post on Stack Overflow which shows how to get a "list" of all functions that belong to a given library:
How to find all functions in an R package?
For example:
#load desired library
library(ParBayesianOptimization)
#find out all functions from this library
getNamespaceExports("ParBayesianOptimization")
[1] "addIterations" "getLocalOptimums" "bayesOpt" "getBestPars" "changeSaveFile" "updateGP"
The above code gives me the names of all functions exported by the "ParBayesianOptimization" library. From here, I could manually inspect each one of these functions, for example:
# manually inspect any one of these functions
getAnywhere(bayesOpt)
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
#function stats here
function (FUN, bounds, saveFile = NULL, initGrid, initPoints = 4,
iters.n = 3, iters.k = 1, otherHalting = list(timeLimit = Inf,
minUtility = 0), acq = "ucb", kappa = 2.576, eps = 0,
parallel = FALSE, gsPoints = pmax(100, length(bounds)^3),
convThresh = 1e+08, acqThresh = 1, errorHandling = "stop",
plotProgress = FALSE, verbose = 1, ...)
{
startT <- Sys.time()
optObj <- list()
etc etc etc ...
saveFile = saveFile, verbose = verbose, ...)
return(optObj)
}
#function ends here
<bytecode: 0x000001cbb4145db0>
<environment: namespace:ParBayesianOptimization>
Goal : Is it possible to take each one of these functions and create a notepad file with their full definitions?
Something that would look like this:
My attempt:
I thought I could first make an "object" in R that contained all the functions found in this library:
library(plyr)
a = getNamespaceExports("ParBayesianOptimization")
my_list = do.call("rbind.fill", lapply(a, as.data.frame))
X[[i]]
1 addIterations
2 getLocalOptimums
3 bayesOpt
4 getBestPars
5 changeSaveFile
6 updateGP
Then, I could manually create an "assignment arrow":
header_text <- rep("<-")
Then, "paste" this to each function name:
combined_list <- as.character(paste(my_list, header_text, sep = ""))
But this is not looking correct:
combined_list
[1] "c(\"addIterations\", \"getLocalOptimums\", \"bayesOpt\", \"getBestPars\", \"changeSaveFile\", \"updateGP\")<- "
The goal is to automate the process of manually copying/pasting :
function_1 = getAnywhere("first function ParBayesianOptimization library")
function_2 = getAnywhere("second function ParBayesianOptimization library")
etc
final_list = c(function_1, function_2 ...)
And removing the generic description from each function:
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
In the end, if I were to "call" the final_list object, all the functions from this library should get recreated and reassigned.
Can someone please show me how to do this?
Thanks

You can use the dump function for this:
pkg <- "ParBayesianOptimization"
dump(getNamespaceExports(pkg), file="funs.R", envir = asNamespace(pkg))
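A minimal sketch of the same dump() idea, run against the base package tools so that it works without installing anything extra. The choice of tools, the temporary file, and the is.function filter are assumptions made for illustration (exports can include non-function objects); they are not part of the answer above.

```r
# Sketch: dump every exported *function* of a package to one file.
# "tools" stands in for the package of interest (assumption).
pkg <- "tools"
ns <- asNamespace(pkg)
exports <- getNamespaceExports(pkg)

# Keep only exports that are functions; data objects are skipped.
is_fun <- vapply(exports,
                 function(nm) is.function(getExportedValue(pkg, nm)),
                 logical(1))
fun_names <- sort(exports[is_fun])

# Write "name <-" followed by the deparsed definition for each function.
out_file <- tempfile(fileext = ".R")
dump(fun_names, file = out_file, envir = ns)

dumped <- readLines(out_file)
head(dumped)
```

The resulting file contains plain assignments with no getAnywhere() preamble. Sourcing it would recreate the functions in the calling environment, but they would no longer live in the package namespace, so calls to unexported internals may fail.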

This code writes the definitions of all the functions in a library to a text file:
fn_list <- getNamespaceExports("ParBayesianOptimization")
for(i in seq_along(fn_list)) {
header <- paste('\n\n####Function', i, '\n\n\n')
cat(paste0(header, paste0(getAnywhere(fn_list[i]), collapse = '\n'), '\n\n'),
file = 'function.txt', append = TRUE)
}
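If the getAnywhere() preamble ("A single object matching ... was found") is unwanted, one possible variation is to fetch each function directly and deparse it. This is only a sketch: the package tools, the two function names, and the temporary output file are illustrative assumptions.

```r
# Sketch: write "name <- <definition>" blocks without the getAnywhere() header.
pkg <- "tools"                                    # assumption: any installed package
fn_names <- c("file_ext", "file_path_sans_ext")   # assumption: two of its exports
out <- tempfile(fileext = ".txt")

for (nm in fn_names) {
  fn <- getExportedValue(pkg, nm)
  # Header line, then the assignment arrow, then the deparsed body.
  cat("#### ", nm, "\n", nm, " <- ", sep = "", file = out, append = TRUE)
  cat(deparse(fn), sep = "\n", file = out, append = TRUE)
  cat("\n", file = out, append = TRUE)
}

written <- readLines(out)
```

Here deparse() returns the function source as a character vector, so each definition lands in the file as text that R can parse again.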


R - Parallel Processing and ldply error

I am trying to use the below code to make API calls in a parallel process to speed up the API calls. (I know this isn't the best way to speed up API calls but it works)
It only fails when I try to use parallel, otherwise it works. In the ldply function I am getting the below error:
Error in do.ply(i) :
task 1 failed - "object of type 'closure' is not subsettable"
In addition:
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
Any help would be appreciated!
One <- 26
cl<-makeCluster(4)
registerDoSNOW(cl)
func.time <- Sys.time()
## API CALL ONE FOR "kline"
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[1],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline <- data.frame(text_content %>% fromJSON())
kline$symbol <- pairs[1]
## API FUNCTION TO BE APPLIED FOR REST
loopfunction <- function(i){
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[i],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline_temp <- data.frame(text_content %>% fromJSON())
kline_temp$symbol <- pairs[i]
kline <- rbind(kline,kline_temp)
return(kline)
}
## DPLY PARALLEL FUNCTION
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, .parallel = T, .paropts = c("httr", "jsonlite", "dplyr"))) ## "One" is a variable created earlier
stopCluster(cl)
func.end.time <- Sys.time()
func.tot.time <- func.end.time - func.time
Your question isn't fully reproducible, so the following is an educated guess.
Your loopfunction() references an object called pairs. It seems from your script that a variable called pairs is defined somewhere in your local environment. However, when loopfunction() is passed to ldply(), it no longer has access to that variable (ordinarily it would, but parallelization requires fresh R environments to be created). Having failed to find an object called pairs in the environment, R continues searching and finds a match in graphics::pairs(). This is a plotting function, not a subsettable object like a vector or data frame. Hence the error message, "object of type 'closure' is not subsettable".
I'm not especially familiar with how ldply implements parallel processing, but you could probably modify your function definition like this:
loopfunction <- function(i, pairs) {
...[body of function]...
}
And pass pairs as an extra parameter in your ldply call:
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, pairs = pairs, .parallel = T, .paropts = list(.packages = c("httr", "jsonlite", "dplyr"))))
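The explanation above can be reproduced without any cluster or network access. In this sketch the symbol vector and the simplified function body are hypothetical stand-ins for the question's data and API call; only the lookup behavior is the point.

```r
# With no user-defined `pairs` visible, the name resolves to the plotting
# function graphics::pairs, and subsetting a function fails.
err <- tryCatch(graphics::pairs[1],
                error = function(e) conditionMessage(e))
err

# Passing the data explicitly removes the dependence on the caller's
# environment. `symbols` is made-up illustration data.
loopfunction <- function(i, pairs) {
  paste0("/api/v1/klines?symbol=", pairs[i], "&interval=1m&limit=1")
}
symbols <- c("BTCUSDT", "ETHUSDT")
paths <- vapply(seq_along(symbols), loopfunction, character(1),
                pairs = symbols)
paths
```

Because pairs is now an argument, the function behaves the same whether it runs in the main session or on a worker.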

R CMD check not recognizing S3 generic methods

I'm writing a package. The code runs when I execute it in RStudio, but when I run R CMD check on the package, it doesn't recognize the S3 methods.
I have a generic method:
count_kmers <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE) {
UseMethod("count_kmers", obj)
}
And then the class-specific methods:
count_kmers.character <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE) {...}
count_kmers.AAStringSet <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE){...}
Now an example that I run in my documentation is:
seqs <- AAStringSet(c("seq1"="MLVVD",
"seq2"="PVVRA",
"seq3"="LVVR"))
## Count the kmers and generate a dataframe of the frequencies
freqs <- count_kmers(seqs, klen = 3, parallel = FALSE)
head(freqs)
If I run the code interactively it works, but if I check it using R CMD check it complains:
Error in UseMethod("count_kmers", obj) :
no applicable method for 'count_kmers' applied to an object of class "c('AAStringSet', 'XStringSet', 'XRawList', 'XVectorList', 'List', 'Vector', 'list_OR_List', 'Annotated')"
The AAStringSet is an object from the Biostrings package. But that doesn't matter, even if I pass a character string to count_kmers, I'm receiving the same error but saying:
no applicable method for 'count_kmers' applied to an object of class "character".
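The question does not show the package's NAMESPACE file, so the following is only a guess at one common cause: S3 methods that are defined but not registered. Code sourced interactively finds count_kmers.character in the global environment, whereas R CMD check runs the examples against the installed namespace, where unregistered methods are not visible to UseMethod(). The sketch below uses a simplified, hypothetical signature (single string input, no parallel arguments) and roxygen2-style tags.

```r
# Sketch: generic plus method, each tagged for export/registration.
# With roxygen2, these tags should produce NAMESPACE lines such as
#   export(count_kmers)
#   S3method(count_kmers, character)

#' @export
count_kmers <- function(obj, klen = 6) {
  UseMethod("count_kmers")
}

#' @export
count_kmers.character <- function(obj, klen = 6) {
  # Simplified body: assumes `obj` is a single string.
  starts <- seq_len(max(nchar(obj) - klen + 1, 0))
  table(substring(obj, starts, starts + klen - 1))
}

freqs <- count_kmers("MLVVD", klen = 3)
freqs
```

If the NAMESPACE is written by hand instead, the S3method() directive would need to be added manually for each method.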

R - XGBoost: Error building DMatrix

I am having trouble using XGBoost in R.
I am reading a CSV file with my data:
get_data = function()
{
#Loading Data
path = "dados_eye.csv"
data = read.csv(path)
#Dividing into two groups
train_porcentage = 0.05
train_lines = nrow(data)*train_porcentage
train = data[1:train_lines,]
test = data[train_lines:nrow(data),]
rownames(train) = c(1:nrow(train))
rownames(test) = c(1:nrow(test))
return (list("test" = test, "train" = train))
}
This function is called by main.R:
lista_dados = get_data()
#machine = train_svm(lista_dados$train)
#machine = train_rf(lista_dados$train)
machine = train_xgt(lista_dados$train)
The problem is here, in train_xgt:
train_xgt = function(train_data)
{
data_train = data.frame(train_data[,1:14])
label_train = data.frame(factor(train_data[,15]))
print(is.data.frame(data_train))
print(is.data.frame(label_train))
dtrain = xgb.DMatrix(data_train, label=label_train)
machine = xgboost(dtrain, num_class = 4 ,max.depth = 2,
eta = 1, nround = 2,nthread = 2,
objective = "binary:logistic")
return (machine)
}
This is the Error:
becchi@ubuntu:~/Documents/EEG_DATA/Dados_Eye$ Rscript main.R
[1] TRUE
[1] TRUE
Error in xgb.DMatrix(data_train, label = label_train) :
xgb.DMatrix: does not support to construct from list
Calls: train_xgt -> xgb.DMatrix
Execution halted
becchi@ubuntu:~/Documents/EEG_DATA/Dados_Eye$
As you can see, they are both data frames.
I don't know what I am doing wrong, please help!
Just convert the data frame to a matrix first using as.matrix(), and then pass it to xgb.DMatrix().
Check that all columns contain numeric data. I think the error could be caused by a column stored as factor or character, which cannot be converted to a numeric matrix. If you have factor variables, you can use one-hot encoding to convert them into dummy variables.
Try:
dtrain = xgb.DMatrix(as.matrix(sapply(data_train, as.numeric)), label=label_train)
instead of just:
dtrain = xgb.DMatrix(data_train, label=label_train)
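The conversion steps can be sketched on a small made-up data frame. The column names and values here are hypothetical, and the final xgb.DMatrix() call is guarded so the example still runs when xgboost is not installed. The sketch also assumes that the label should be a plain numeric vector with classes numbered from 0, rather than a data frame holding a factor.

```r
# Hypothetical toy data: one numeric feature, one factor feature, one label.
df <- data.frame(
  x1 = c(1.5, 2.0, 3.2),
  x2 = factor(c("a", "b", "a")),
  y  = factor(c("low", "high", "low"))
)

# One-hot encode the factor column; the result is a numeric matrix.
features <- model.matrix(~ . - 1, data = df[, c("x1", "x2")])

# Turn the factor label into integer codes starting at 0.
labels <- as.integer(df$y) - 1L

# Build the DMatrix only if xgboost is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = features, label = labels)
}
```

Factor levels are sorted alphabetically, so "high" becomes 0 and "low" becomes 1 in this toy example.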

DEoptim error: objective function result has different length than parameter matrix due to foreachArgs specification

I have a very odd DEoptim error that I have "fixed", but do not understand.
I do not have issues when I use DEoptim's parallel capabilities with the parallel package (i.e., pType=1). However when I use foreach instead (which I must use on the grid computing setup that is available to me), I have issues. Below is an MRE of a much simplified version of the issue that I had. pType=1 works, pType=2 when foreachArgs is specified returns an error:
objective function result has different length than parameter matrix
When I do not specify foreachArgs the issue goes away. Does anyone have thoughts about the root cause of this issue?
library(zoo)
library(parallel)
library(doParallel)
library(DEoptim)
myfunc1 <- function(params){
s <- myfunc2(params,ncal,n_left_cens,astats, X_ret, disc_length, X_acq, POP_0, POP_ann_growth)
loss_func(s)
}
myfunc2 = function(params,ncal,n_left_cens,astats, X_ret, disc_length, X_acq, POP_0, POP_ann_growth){
sum(params) + ncal + n_left_cens + astats + X_ret + disc_length + X_acq + POP_0 + POP_ann_growth
}
loss_func = function(s){
s
}
# General setup
ncal = 1
n_left_cens = 1
astats= 1
disc_length = 1
POP_0 = 1
POP_ann_growth = 1
X_acq = 1
X_ret = 1
params = c(1,1)
W = 1
paral = TRUE
itermax=100
ncores = detectCores()
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
trace=TRUE
# bounds for search for DEoptim
lower = rep(-1,length(params))
upper = lower*-1
# parallel: works
pType = 1
parVar = c("myfunc1","myfunc2","loss_func","W","ncal","n_left_cens","astats","X_ret","disc_length",
"X_acq","POP_0","POP_ann_growth")
foreachArguments <- list("myfunc1","myfunc2","loss_func","ncal","n_left_cens","astats","X_ret","disc_length",
"X_acq","POP_0","POP_ann_growth")
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl=clusters, varlist=foreachArguments, envir=environment())
results <- DEoptim(fn=myfunc1,lower=lower,upper=upper,
DEoptim.control(itermax=itermax,trace=trace,parallelType=pType,
parVar=parVar))
showConnections(all = TRUE)
closeAllConnections()
# foreach with foreachArgs specified: doesn't work
pType = 2
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl=clusters, varlist=foreachArguments, envir=environment())
results <- DEoptim(fn=myfunc1,lower=lower,upper=upper,
DEoptim.control(itermax=itermax,trace=trace,parallelType=pType,
foreachArgs=foreachArguments))
showConnections(all = TRUE)
closeAllConnections()
# foreach with foreachArgs unspecified: works
pType = 2
foreachArguments <- list("myfunc1","myfunc2","loss_func","ncal","n_left_cens","astats","X_ret","disc_length",
"X_acq","POP_0","POP_ann_growth")
clusters <- makeCluster(ncores, type = cltype)
registerDoParallel(clusters)
clusterExport(cl=clusters, varlist=foreachArguments, envir=environment())
results <- DEoptim(fn=myfunc1,lower=lower,upper=upper,
DEoptim.control(itermax=itermax,trace=trace,parallelType=pType))
showConnections(all = TRUE)
closeAllConnections()
From ?DEoptim.control:
foreachArgs: A list of named arguments for the ‘foreach’ function from
the package ‘foreach’. The arguments ‘i’, ‘.combine’ and ‘.export’ are
not possible to set here; they are set internally.
Which you seem to be conflating with the behavior of parVar:
parVar: Used if ‘parallelType=1’; a list of variable names (as strings)
that need to exist in the environment for use by the objective function
or are used as arguments by the objective function.
You need to specify the arguments passed to foreach as name = value pairs. For example:
foreachArguments <- list(.export = c("myfunc1", "myfunc2", "loss_func", "ncal",
"n_left_cens", "astats", "X_ret", "disc_length", "X_acq","POP_0","POP_ann_growth"))
I'm not sure what's causing that specific error, but the fix is "don't do that." ;)
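The difference between a list of variable names and a list of name = value pairs can be seen without DEoptim at all. In this sketch show_args is a hypothetical helper that simply reports which argument names it received; it is not part of DEoptim or foreach.

```r
# An unnamed list of strings (what parVar expects) versus a named list of
# arguments (what foreachArgs expects).
unnamed <- list("myfunc1", "myfunc2", "loss_func")
named   <- list(.verbose = TRUE)

# Hypothetical helper: returns the names of whatever arguments it is given.
show_args <- function(...) names(list(...))

unnamed_names <- do.call(show_args, unnamed)  # no names: positional arguments
named_names   <- do.call(show_args, named)    # ".verbose"
```

When a list like unnamed is spliced into a function call, its elements arrive as anonymous positional arguments, which is presumably not what a function expecting named options intends to receive.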
Here's an example of how you'd actually use the foreachArgs argument. Note that I'm setting the .verbose argument to make foreach print diagnostics:
library(doParallel)
library(DEoptim)
clusters <- makeCluster(detectCores())
registerDoParallel(clusters)
obj_func <- function(params) { sum(params) }
results <- DEoptim(fn=obj_func, lower=c(-1, -1), upper=c(1, 1),
DEoptim.control(parallelType=2, foreachArgs=list(.verbose=TRUE)))
stopCluster(clusters)

Unused arguments in R error

I am new to R and am trying to run an example given in the "rebmix-help" PDF. It uses the galaxy dataset; here is the code:
library(rebmix)
devAskNewPage(ask = TRUE)
data("galaxy")
write.table(galaxy, file = "galaxy.txt", sep = "\t",eol = "\n", row.names = FALSE, col.names = FALSE)
REBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, InformationCriterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table))
Table <- REBMIX[[i, j, k]]$summary
else Table <- merge(Table, REBMIX[[i, j,k]]$summary, all = TRUE, sort = FALSE)
}
}
}
It gives me this error:
unused argument (InformationCriterion = InformationCriterion[j])
Please help.
I'm running R 3.0.2 (Windows), and the REBMIX function defined by the rebmix library has no argument named InformationCriterion; the argument is called Criterion.
In brief, invoke REBMIX as:
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
It looks as though there have been substantial changes to the rebmix package since the example mentioned in the OP was created. Among the most noticeable changes is the use of S4 classes.
There's also an updated demo in the rebmix package using the galaxy data (see demo("rebmix.galaxy")).
To get the above example to produce results (note: I am not familiar with this package or the rebmix algorithm!):
Change the argument to Criterion, as mentioned by @Giupo
Use the S4 slot access operator @ instead of $
Don't name the results object REBMIX, because that's already the function name
library(rebmix)
data("galaxy")
## Don't re-name the REBMIX object!
myREBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
myREBMIX[[i, j, k]] <- REBMIX(Dataset = list(galaxy),
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table)) {
Table <- myREBMIX[[i, j, k]]@summary
} else {
Table <- merge(Table, myREBMIX[[i, j,k]]@summary, all = TRUE, sort = FALSE)
}
}
}
}
This may be late, but I ran into a similar problem a few minutes ago. The usual cause of this kind of error message is a version mismatch.
You may be using a different version of the R package than the tutorial, so the argument names in your installed version can differ from those in the tutorial code.
So check the package version before you try to edit anything by hand. It can also happen that an old version of the package is still on the library path and overrides the new one. That was exactly my situation, since I had manually installed the old and new versions separately.
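The checks suggested above can be sketched with base R tools. Here stats::sd stands in for the function whose arguments changed, and the argument name foo is a deliberately wrong, made-up name; for the question itself one would inspect rebmix and REBMIX instead, assuming that package is installed.

```r
# Reproduce the error class: calling a function with an argument it lacks.
msg <- tryCatch(stats::sd(1:3, foo = 1),
                error = function(e) conditionMessage(e))
msg

# List the argument names the installed version actually accepts.
arg_names <- names(formals(stats::sd))
arg_names

# Check which version is installed and where it was loaded from.
ver <- packageVersion("stats")
loc <- find.package("stats")

# The library search path; an old copy earlier in this list would win.
lib_paths <- .libPaths()
```

Comparing arg_names against the tutorial's call shows directly which argument has been renamed or removed.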
