I'm writing a package. The code runs when I execute it in RStudio, but when R CMD check runs it, the S3 methods are not recognized.
I have a generic method:
count_kmers <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE) {
UseMethod("count_kmers", obj)
}
And then the class-specific methods:
count_kmers.character <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE) {...}
count_kmers.AAStringSet <- function(obj, klen = 6, parallel = TRUE,
nproc = ifelse(parallel, comm.size(), 1),
distributed = FALSE){...}
Now an example that I run in my documentation is:
seqs <- AAStringSet(c("seq1"="MLVVD",
"seq2"="PVVRA",
"seq3"="LVVR"))
## Count the kmers and generate a dataframe of the frequencies
freqs <- count_kmers(seqs, klen = 3, parallel = FALSE)
head(freqs)
If I run this as ordinary code it works, but when I check it with R CMD check it complains:
Error in UseMethod("count_kmers", obj) :
no applicable method for 'count_kmers' applied to an object of class "c('AAStringSet', 'XStringSet', 'XRawList', 'XVectorList', 'List', 'Vector', 'list_OR_List', 'Annotated')"
The AAStringSet is an object from the Biostrings package. But that doesn't matter, even if I pass a character string to count_kmers, I'm receiving the same error but saying:
no applicable method for 'count_kmers' applied to an object of class "character".
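One common cause of this symptom, offered here as an assumption because the package's NAMESPACE file is not shown, is that the methods are defined but never registered for dispatch (for example, a missing S3method(count_kmers, character) directive, or a missing @export tag when roxygen2 generates the NAMESPACE). The sketch below reproduces the same kind of error in plain R with a hypothetical generic, count_things, and shows that registering the method makes dispatch succeed.

```r
# Hypothetical generic, standing in for count_kmers.
count_things <- function(obj, ...) UseMethod("count_things")

# Define the method in a place dispatch cannot see, mimicking a method that
# lives inside a package namespace but was never registered.
hidden <- new.env()
assign("count_things.character", function(obj, ...) nchar(obj), envir = hidden)

# Dispatch fails with a "no applicable method" error.
unregistered_msg <- tryCatch(
  count_things("MLVVD"),
  error = function(e) conditionMessage(e)
)

# Register the method for the generic; this is roughly what an
# S3method() directive in NAMESPACE arranges when a package is loaded.
registerS3method(
  "count_things", "character",
  get("count_things.character", envir = hidden),
  envir = environment(count_things)
)
registered_result <- count_things("MLVVD")
```

If this is the cause, re-generating the NAMESPACE so that each method is registered should let the documentation example run under R CMD check.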
I am working with R. I found this previous post on Stack Overflow, which shows how to get a list of all functions that belong to a given library:
How to find all functions in an R package?
For example:
#load desired library
library(ParBayesianOptimization)
#find out all functions from this library
getNamespaceExports("ParBayesianOptimization")
[1] "addIterations" "getLocalOptimums" "bayesOpt" "getBestPars" "changeSaveFile" "updateGP"
The above code lists the names of all functions exported by the "ParBayesianOptimization" library. From here, I could manually inspect each of these functions, for example:
# manually inspect any one of these functions
getAnywhere(bayesOpt)
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
#function stats here
function (FUN, bounds, saveFile = NULL, initGrid, initPoints = 4,
iters.n = 3, iters.k = 1, otherHalting = list(timeLimit = Inf,
minUtility = 0), acq = "ucb", kappa = 2.576, eps = 0,
parallel = FALSE, gsPoints = pmax(100, length(bounds)^3),
convThresh = 1e+08, acqThresh = 1, errorHandling = "stop",
plotProgress = FALSE, verbose = 1, ...)
{
startT <- Sys.time()
optObj <- list()
etc etc etc ...
saveFile = saveFile, verbose = verbose, ...)
return(optObj)
}
#function ends here
<bytecode: 0x000001cbb4145db0>
<environment: namespace:ParBayesianOptimization>
Goal: Is it possible to take each of these functions and create a Notepad (plain-text) file with their full definitions?
Something that would look like this:
My attempt:
I thought I could first make an "object" in R that contained all the functions found in this library:
library(plyr)
a = getNamespaceExports("ParBayesianOptimization")
my_list = do.call("rbind.fill", lapply(a, as.data.frame))
X[[i]]
1 addIterations
2 getLocalOptimums
3 bayesOpt
4 getBestPars
5 changeSaveFile
6 updateGP
Then, I could manually create an "assignment arrow":
header_text <- rep("<-")
Then, "paste" this to each function name:
combined_list <- as.character(paste(my_list, header_text, sep = ""))
But this is not looking correct:
combined_list
[1] "c(\"addIterations\", \"getLocalOptimums\", \"bayesOpt\", \"getBestPars\", \"changeSaveFile\", \"updateGP\")<- "
The goal is to automate the process of manually copying/pasting :
function_1 = getAnywhere("first function ParBayesianOptimization library")
function_2 = getAnywhere("second function ParBayesianOptimization library")
etc
final_list = c(function_1, function_2 ...)
And removing the generic description from each function:
A single object matching ‘bayesOpt’ was found
It was found in the following places
package:ParBayesianOptimization
namespace:ParBayesianOptimization
with value
In the end, if I were to "call" the final_list object, all the functions from this library should get recreated and reassigned.
Can someone please show me how to do this?
Thanks
You can use the dump function for this:
pkg <- "ParBayesianOptimization"
dump(getNamespaceExports(pkg), file="funs.R", envir = asNamespace(pkg))
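As a rough check that a dumped file really does recreate the functions, here is a minimal sketch of a round trip. It assumes the base package tools as a stand-in for ParBayesianOptimization, and it dumps only two of its functions (file_ext and file_path_sans_ext) to keep the example small; the temporary file and the restored environment are illustrative names.

```r
pkg <- "tools"                                   # stand-in package (assumption)
fn_names <- c("file_ext", "file_path_sans_ext")  # or getNamespaceExports(pkg)

# Write "name <- function(...) {...}" definitions to a temporary file.
out_file <- tempfile(fileext = ".R")
dump(fn_names, file = out_file, envir = asNamespace(pkg))

# Recreate the functions in a fresh environment by sourcing the file.
restored <- new.env()
source(out_file, local = restored)

restored_ext <- restored$file_ext("funs.R")
```

Sourcing the file assigns each function under its original name, which is the "recreated and reassigned" behaviour asked for. Functions that rely on unexported helpers from their package may not work once they are detached from the namespace.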
This code writes the definitions of all exported functions in a package to a text file:
fn_list <- getNamespaceExports("ParBayesianOptimization")
for(i in seq_along(fn_list)) {
header <- paste('\n\n####Function', i, '\n\n\n')
cat(paste0(header, paste0(getAnywhere(fn_list[i]), collapse = '\n'), '\n\n'),
file = 'function.txt', append = TRUE)
}
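If the getAnywhere header lines are unwanted, one alternative is to deparse each function directly and write the result with a comment header. This is only a sketch: it assumes the base package tools as a stand-in and uses two of its functions, so the package, function names, and output file are illustrative.

```r
pkg <- "tools"                                   # stand-in package (assumption)
ns <- asNamespace(pkg)
fn_names <- c("file_ext", "file_path_sans_ext")  # or getNamespaceExports(pkg)

out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "w")
for (i in seq_along(fn_names)) {
  fn <- get(fn_names[i], envir = ns)
  if (!is.function(fn)) next                     # skip exported data objects

  # deparse() returns the source as text, without the getAnywhere banner.
  code <- deparse(fn)
  code[1] <- paste(fn_names[i], "<-", code[1])   # prepend the assignment arrow

  writeLines(c(sprintf("#### Function %d: %s", i, fn_names[i]), code, ""), con)
}
close(con)
```

Because the headers are written as R comments, the resulting file can still be parsed or sourced as ordinary R code.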
I am currently trying to run a parallelized RQA with the following code.
library(snow)
library(doSNOW)
library(crqa)
my_wincrqa = function(x, y){
wincrqa(x, y, windowstep = 1000, windowsize = 2000,
radius = .2, delay = 4, embed = 2, rescale = 0, normalize = 0,
mindiagline = 2, minvertline = 2, tw = 0, whiteline = F,
side = "both", method = "crqa", metric = "euclidean", datatype = "continuous")
}
cl<-makeCluster(11,type="SOCK")
start_time <- Sys.time()
WCRQA_list = clusterMap(cl, my_wincrqa, HR_list, RR_list)
end_time <- Sys.time()
end_time - start_time
Unfortunately, I get this:
Error in checkForRemoteErrors(val) : 2 nodes produced errors; first
error: could not find function "wincrqa"
I know there is probably some error in how I set up the parallel processing, but I am not able to resolve it. I also tried something similar using the parallel package.
I am happy for any help!
Best,
Johnson
The issue is that you’ve loaded and attached the ‘crqa’ package in your main execution environment, but the cluster nodes are running code in separate, isolated R sessions — they don’t see the same loaded packages or global variables!
The easiest solution is to replace use of wincrqa with a fully qualified name, i.e. to use crqa::wincrqa inside your function.
Alternatively, it is possible to attach the ‘crqa’ package on all cluster nodes prior to executing the function:
clusterEvalQ(cl, library(crqa))
WCRQA_list = clusterMap(cl, my_wincrqa, HR_list, RR_list)
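Both options can be tried without crqa installed. The sketch below is an illustration under assumptions: it uses the base parallel package instead of snow, a two-worker cluster, and tools::file_ext as a stand-in for wincrqa; the helper names are hypothetical.

```r
library(parallel)

cl <- makeCluster(2)
paths <- c("a.txt", "b.csv")
marks <- c("!", "?")

# Option 1: namespace-qualified call, so workers need no library() call.
ext_qualified <- function(path, mark) paste0(tools::file_ext(path), mark)
res_qualified <- clusterMap(cl, ext_qualified, paths, marks)

# Option 2: attach the package on every worker, then call it unqualified.
attached <- clusterEvalQ(cl, library(tools))
ext_attached <- function(path, mark) paste0(file_ext(path), mark)
res_attached <- clusterMap(cl, ext_attached, paths, marks)

stopCluster(cl)
```

The qualified form keeps the function self-contained, which helps when the same function is reused with a cluster that was set up elsewhere.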
I am trying to use the below code to make API calls in a parallel process to speed up the API calls. (I know this isn't the best way to speed up API calls but it works)
It only fails when I try to use parallel, otherwise it works. In the ldply function I am getting the below error:
Error in do.ply(i) :
task 1 failed - "object of type 'closure' is not subsettable"
In addition:
Warning messages:
1: : ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: : ... may be used in an incorrect context: ‘.fun(piece, ...)’
any help would be appreciated!
One <- 26
cl<-makeCluster(4)
registerDoSNOW(cl)
func.time <- Sys.time()
## API CALL ONE FOR "kline"
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[1],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline <- data.frame(text_content %>% fromJSON())
kline$symbol <- pairs[1]
## API FUNCTION TO BE APPLIED FOR REST
loopfunction <- function(i){
url <- "https://api.binance.com"
path <- paste("/api/v1/klines?symbol=",pairs[i],"&interval=1m&limit=1", sep = "")
raw.results <- GET(url = url, path = path)
text_content <- content(raw.results, as = "text", encoding = "UTF-8")
kline_temp <- data.frame(text_content %>% fromJSON())
kline_temp$symbol <- pairs[i]
kline <- rbind(kline,kline_temp)
return(kline)
}
## DPLY PARALLEL FUNCTION
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, .parallel = T, .paropts = c("httr", "jsonlite", "dplyr"))) ## "ONE" is a list variable created earlier
stopCluster(cl)
func.end.time <- Sys.time()
func.tot.time <- func.end.time - func.time
Your question isn't fully reproducible, so the following is an educated guess.
Your loopfunction() references an object called pairs. It seems from your script that a variable called pairs is defined somewhere in your local environment. However, when loopfunction() is passed to ldply(), it no longer has access to that variable (ordinarily, it would, but parallelization requires fresh R environments to be created). Having failed to find a variable called pairs in that environment, R continues searching along the search path and finds a match in graphics::pairs(). This is a plotting function, not a subsettable object like a vector or data frame. Hence the error message, "object of type 'closure' is not subsettable".
I'm not especially familiar with how ldply implements parallel processing, but you could probably modify your function definition like this:
loopfunction <- function(i, pairs) {
...[body of function]...
}
And pass pairs as an extra parameter in your ldply call:
kline2 <- data.frame(ldply(2:(One - 1), .fun = loopfunction, pairs = pairs, .parallel = T, .paropts = list(.packages = c("httr", "jsonlite", "dplyr"))))
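The lookup behaviour described above can be reproduced in a plain session, without plyr or a cluster. In this minimal sketch the function names and the symbol values are hypothetical, and it assumes no variable named pairs exists in the calling environment.

```r
# `pairs` is a free variable here. With no such variable defined, R finds the
# plotting function of that name on the search path and tries to subset it.
get_symbol_bad <- function(i) pairs[i]
bad_msg <- tryCatch(get_symbol_bad(1), error = function(e) conditionMessage(e))

# Passing the data explicitly removes the dependence on the environment.
get_symbol_ok <- function(i, pairs) pairs[i]
symbols <- c("BTCUSDT", "ETHUSDT", "BNBUSDT")  # hypothetical trading pairs
ok_result <- vapply(2:3, get_symbol_ok, character(1), pairs = symbols)
```

Renaming the variable to something that does not shadow an existing function (say, pair_symbols) would also turn the confusing message into a clearer "object not found" error.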
EDIT: New version of rslurm makes the solution very easy. See my answer below.
Apologies for the somewhat longer than desired MWE, and a title that I realize after submitting the question may be needlessly complicated. I believe the real issue is getting the environment of a RefClass object into rslurm::slurm_apply.
MWE
Here I define a toy reference class called BankAccount. It has two fields and two methods.
The fields are transactions, a list of all transactions associated with the account and suspicion_threshold the value above which the bank will investigate the transaction.
The two methods are is_suspicious which compares the transactions with the suspicion_threshold on the local machine and is_suspicious_slurm, which uses rslurm::slurm_apply to spread many calls to is_suspicious over a cluster of computers managed by SLURM. You can imagine if there were many transactions or if the is_suspicious function were more complex, this might be necessary.
So, here's the setup
BankAccount <- setRefClass(
Class = 'BankAccount',
fields = list(
transactions = 'numeric',
suspicion_threshold = 'numeric'
)
)
BankAccount$methods(
is_suspicious = function(start_idx = 1, stop_idx = length(transactions)) {
return(start_idx + which(transactions[start_idx:stop_idx] > suspicion_threshold) - 1)
}
)
BankAccount$methods(
is_suspicious_slurm = function(num_nodes) {
usingMethods(is_suspicious)
t <- length(transactions)
t_per_n <- floor(t/num_nodes)
starts <- seq(from = 1, length.out = num_nodes, by = t_per_n)
stops <- seq(from = t_per_n, length.out = num_nodes, by = t_per_n)
stops[num_nodes] <- t
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes,
add_objects = .self)
results_list <- rslurm::get_slurm_out(slr_job = sjob,
outtype = "raw",
wait = TRUE)
return(unlist(results_list))
}
)
Now, on my local machine I can run:
library(RCexampleforSE)
set.seed(27599)
b <- BankAccount$new()
b$transactions <- rnorm(n = 500)
b$suspicion_threshold <- 2
b$is_suspicious()
and it works as expected:
62 103 155 171 182 188 297 398 493 499
If I run:
b$is_suspicious_slurm(num_nodes = 3)
I get an error, since my personal computer is not connected to a SLURM cluster.
sh: squeue: command not found
Cannot submit; no SLURM workload manager on path
Submission scripts output in directory _rslurm_13ba46e3c70b0
Error in rslurm::get_slurm_out(slr_job = sjob, outtype = "raw", wait = TRUE):
slr_job has not been submitted
If I logon to my university cluster, which uses SLURM, and run the same script, the setup and local methods work just as they did on my personal computer. When I run:
b$is_suspicious_slurm(num_nodes = 3)
it sends jobs to the cluster, as hoped for:
Submitted batch job 6363868
But these jobs error immediately with the following error message in slurm_0.out, slurm_1.out, and slurm_2.out:
Error in attr(, "mayCall") : argument 1 is empty
Execution halted
Thoughts and Attempts
I figure the job probably needs, but doesn't have available, the BankAccount object. So I tried passing it in as add_objects parameter to rslurm::slurm_apply:
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes,
add_objects = .self)
I also tried it in quotes and inside eval(), neither of which worked.
How can I make the object accessible to the worker jobs created with rslurm::slurm_apply?
Version 0.4.0 of rslurm completely solved this problem.
Define is_suspicious_slurm() as:
BankAccount$methods(
is_suspicious_slurm = function(num_nodes) {
usingMethods(is_suspicious)
t <- length(transactions)
t_per_n <- floor(t/num_nodes)
starts <- seq(from = 1, length.out = num_nodes, by = t_per_n)
stops <- seq(from = t_per_n, length.out = num_nodes, by = t_per_n)
stops[num_nodes] <- t
sjob <- rslurm::slurm_apply(f = is_suspicious,
params = data.frame(start_idx = starts,
stop_idx = stops),
nodes = num_nodes)
results_list <- rslurm::get_slurm_out(slr_job = sjob,
outtype = "raw",
wait = TRUE)
return(unlist(results_list))
}
)
The only change is that in the call to rslurm::slurm_apply, the add_objects parameter is not specified. It does not need to be specified because, as @Ian pointed out:
"...you don't need to pass self at all when slurm_apply sends the serialized function, which appears to include both ".self" and "transactions" in the enclosing environment."
EDIT: OP's answer is all you need to know.
The add_objects parameter is used for passing a character vector, not the objects themselves. All the objects are then saved in one RData file, assuming they can be found by name. In theory, you should be able to use add_objects = c('.self') within your method definition.
The key here is, "assuming they can be found". I will edit this post once a pending update to the rslurm package (which should make that finding more successful) is released.
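The "found by name" idea can be illustrated with base R's save(), which also accepts a character vector of object names. This is only an analogy under assumptions, not rslurm's actual code; the object names and the worker environment are made up for the example.

```r
suspicion_threshold <- 2
transactions <- c(0.5, 3.1, 1.2)

# Objects are passed as names, looked up, and written to a single file.
rdata_file <- tempfile(fileext = ".RData")
save(list = c("suspicion_threshold", "transactions"), file = rdata_file)

# A fresh environment stands in for a worker session loading that file.
worker_env <- new.env()
load(rdata_file, envir = worker_env)
loaded_names <- sort(ls(worker_env))

# A name that cannot be found where save() looks raises an error.
missing_msg <- tryCatch(
  save(list = "no_such_object", file = tempfile()),
  error = function(e) conditionMessage(e)
)
```

The failure case is the relevant one here: a name such as .self only resolves if the lookup happens in an environment where that name is actually bound.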
Be very careful passing objects to cluster nodes: they do not come back. Not only will any side effects be lost, there's no inter-node communication implemented by rslurm.
Also be careful with which :) Your is_suspicious method will be wrong for arguments that don't start at 1. Try this version:
BankAccount$methods(
is_suspicious = function(i = 1:length(transactions)) {
idx <- which(transactions[i] > suspicion_threshold)
i[idx]
}
)
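To see how this version maps positions back to the original indices, here is a small standalone sketch with made-up data. which() returns positions within the subset, and indexing i with them recovers positions in the full vector, even when i is not contiguous.

```r
transactions <- c(0.5, 3.1, 1.2, 2.7, 0.1, 4.0)  # made-up values
suspicion_threshold <- 2

i <- c(2, 5, 6)                                  # arbitrary subset of indices
idx <- which(transactions[i] > suspicion_threshold)  # positions within the subset
flagged <- i[idx]                                # positions in the full vector
```

Here idx is 1 and 3 (the first and third elements of the subset), while flagged is 2 and 6, the matching positions in transactions.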
I am new to R and am trying to run an example given in the rebmix help PDF. It uses the galaxy dataset; here is the code:
library(rebmix)
devAskNewPage(ask = TRUE)
data("galaxy")
write.table(galaxy, file = "galaxy.txt", sep = "\t",eol = "\n", row.names = FALSE, col.names = FALSE)
REBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, InformationCriterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table))
Table <- REBMIX[[i, j, k]]$summary
else Table <- merge(Table, REBMIX[[i, j,k]]$summary, all = TRUE, sort = FALSE)
}
}
}
It gives me this error:
unused argument (InformationCriterion = InformationCriterion[j])
Please help.
I'm running R 3.0.2 (Windows), and in my version of the rebmix library the REBMIX function has no argument named InformationCriterion; the corresponding argument is called Criterion.
In brief, invoke REBMIX as:
REBMIX[[i, j, k]] <- REBMIX(Dataset = "galaxy.txt",
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
It looks as though there have been substantial changes to the rebmix package since the example mentioned in the OP was created. Among the most noticeable changes is the use of S4 classes.
There's also an updated demo in the rebmix package using the galaxy data (see demo("rebmix.galaxy"))
To get the above example to produce results (note: I am not familiar with this package or the rebmix algorithm):
Change the argument to Criterion, as mentioned by @Giupo
Use the S4 slot access operator @ instead of $
Don't name the results object REBMIX, because that's already the function name
library(rebmix)
data("galaxy")
## Don't re-name the REBMIX object!
myREBMIX <- array(list(NULL), c(3, 3, 3))
Table <- NULL
Preprocessing <- c("histogram", "Parzen window", "k-nearest neighbour")
InformationCriterion <- c("AIC", "BIC", "CLC")
pdf <- c("normal", "lognormal", "Weibull")
K <- list(7:20, 7:20, 2:10)
for (i in 1:3) {
for (j in 1:3) {
for (k in 1:3) {
myREBMIX[[i, j, k]] <- REBMIX(Dataset = list(galaxy),
Preprocessing = Preprocessing[k], D = 0.0025,
cmax = 12, Criterion = InformationCriterion[j],
pdf = pdf[i], K = K[[k]])
if (is.null(Table)) {
Table <- myREBMIX[[i, j, k]]@summary
} else {
Table <- merge(Table, myREBMIX[[i, j,k]]@summary, all = TRUE, sort = FALSE)
}
}
}
}
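For readers new to S4, the difference between @ and $ can be seen in a minimal sketch that does not need rebmix. The class FitResult and its summary slot are hypothetical stand-ins for the object REBMIX returns.

```r
library(methods)

# Hypothetical S4 class with a single slot, mimicking a fitted-model object.
setClass("FitResult", representation(summary = "data.frame"))
fit <- new("FitResult", summary = data.frame(IC = 1.5))

# Slots are read with @ (or the equivalent slot() function).
via_at <- fit@summary
via_slot <- slot(fit, "summary")

# $ is not defined for a plain S4 object, so this raises an error.
dollar_msg <- tryCatch(fit$summary, error = function(e) conditionMessage(e))
```

slotNames(fit) lists the available slots, which is a quick way to find out what an unfamiliar S4 result contains.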
I guess this is late, but I ran into a similar problem a few minutes ago, and it showed me what is often behind this kind of error message: a version conflict.
You may be using a different version of the R package than the tutorial does, so the argument names in your installed version can differ from those in the tutorial's code.
So check the package version before you try to edit anything manually. It can also happen that an old version of the package is still on the library path and overrides the new one. That was exactly my situation, since I had manually installed the old and new versions separately.
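A quick way to run these checks from the console is sketched below. It uses the base package stats and its function sd as stand-ins, so the package and function names are assumptions to be replaced with rebmix and REBMIX.

```r
pkg <- "stats"   # stand-in for the package in question (assumption)
fn_name <- "sd"  # stand-in for the function raising "unused argument"

installed_version <- packageVersion(pkg)  # version R actually loads
install_path <- find.package(pkg)         # library directory it comes from
library_paths <- .libPaths()              # all library paths, in search order

# Argument names accepted by the installed version of the function.
arg_names <- names(formals(get(fn_name, envir = asNamespace(pkg))))
has_old_name <- "InformationCriterion" %in% arg_names
```

If install_path points at an unexpected library directory, an older copy of the package is probably shadowing the one you meant to use.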