How to capture warnings with the console output in R?

I am trying to capture the complete console log of my R script: everything in chronological order, with warnings printed as they occur. I tried this:
options(warn = 1)
tmpSinkfileName <- tempfile()
sink(tmpSinkfileName, split = TRUE)
cat("Doing something\n")
warning("Hi here")
cat("Doing something else\n")
warning("Hi there")
sink()
console.out <- readChar(tmpSinkfileName, file.info(tmpSinkfileName)$size)
unlink(tmpSinkfileName)
cat(console.out)
# Doing something
# Doing something else
warnings()
# NULL
but unfortunately the warnings are missing from console.out. How can I capture them as well? According to the docs, options(warn = 1) should print warnings as they occur, yet they were not captured by sink().

Almost got it, but it is rather convoluted, and it is annoying that, unlike standard output, the message stream cannot be split, i.e. redirected to a file while still appearing on the console (UNIX tee behaviour)!
options(warn = 1)                          # print warnings as they occur
tmpSinkfileName <- tempfile()
tmpFD <- file(tmpSinkfileName, open = "wt")
sink(tmpFD, split = TRUE)                  # tee standard output to the file
sink(tmpFD, type = "message")              # redirect messages/warnings to the same file
cat("Doing something\n")
warning("Hi here")
cat("Doing something else\n")
warning("Hi there")
sink(type = "message")                     # restore the message stream
sink()                                     # restore standard output
console.out <- readChar(tmpSinkfileName, file.info(tmpSinkfileName)$size)
unlink(tmpSinkfileName)
cat(console.out)
If I try
sink(tmpFD, type = "message", split = TRUE)
it says
Error in sink(tmpFD, type = "message", split = TRUE) : cannot split
the message connection
which is pretty annoying!
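One way to approximate tee behaviour for the message stream is to leave it unsinked and log warnings to the file yourself from a calling handler. A minimal sketch (not muffling the warning, so it still prints to the console via the normal mechanism while a copy lands in the log in chronological position):
options(warn = 1)
tmpSinkfileName <- tempfile()
tmpFD <- file(tmpSinkfileName, open = "wt")
sink(tmpFD, split = TRUE)                  # stdout is tee'd to the file
withCallingHandlers(
  {
    cat("Doing something\n")
    warning("Hi here")
    cat("Doing something else\n")
    warning("Hi there")
  },
  warning = function(w) {
    # write the warning text to the log; do not muffle it,
    # so it is still printed to the console as usual
    cat("Warning:", conditionMessage(w), "\n", file = tmpFD)
  }
)
sink()
close(tmpFD)
cat(readLines(tmpSinkfileName), sep = "\n")
unlink(tmpSinkfileName)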

I wrote the following function to capture output and messages:
create_log <- function(logfile_name, path) {
  if (file.exists(paste0(path, logfile_name))) {
    file.remove(paste0(path, logfile_name))
  }
  fid <- file(paste0(path, logfile_name), open = "wt")
  sink(fid, type = "message", split = FALSE)   # the message stream cannot be split
  sink(fid, append = TRUE, type = "output", split = TRUE)
  warning("Use closeAllConnections() at the end of the script")
}
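For completeness, a hypothetical usage of create_log() (assuming the current directory is writable); closeAllConnections() both restores the streams and closes the log file:
create_log("run.log", "./")
cat("Doing something\n")
warning("Hi here")
closeAllConnections()                  # restore output/message streams, close the file
cat(readLines("./run.log"), sep = "\n")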

Related

Error when trying to download species models in R with ebirdst::ebirdst_download()

I've requested an access key and set it. Sys.getenv("EBIRDST_KEY") returns the correct key. ebirdst_download(species = "Sharp-tailed Grouse") returns an error
Error in ebirdst_download(species = "Sharp-tailed Grouse") : Cannot
access Status and Trends data URL. Ensure that you have a working
internet connection and a valid API key for the Status and Trends
data.
My internet is working. Looking at the source code, I believe the error is generated because the function reads a URL, essentially read_json(stringr::str_glue("{api_url}list-obj/{species}?key={key}")), which for some reason returns NULL.
https://rdrr.io/github/CornellLabofOrnithology/ebirdst/src/R/ebirdst-loading.R
But I can get it to work if I use the following lines of the source code. I'm not sure why the ebirdst function is failing.
species<-"Sharp-tailed Grouse"
path = rappdirs::user_data_dir("ebirdst")
species <- get_species(species)
which_run <- which(ebirdst::ebirdst_runs$species_code == species)
run <- ebirdst::ebirdst_runs$run_name[which_run]
key<-Sys.getenv("EBIRDST_KEY")
api_url <- "https://st-download.ebird.org/v1/"
list_obj_url <- stringr::str_glue("{api_url}list-obj/{species}?key={key}")
files <-jsonlite::read_json(list_obj_url, simplifyVector = TRUE)
files <- data.frame(file = files)
files <- files[!stringr::str_detect(files$file, "\\.db$"), , drop = FALSE]
files$src_path <- stringr::str_glue("{api_url}fetch?objKey={files$file}",
"&key={key}")
files$dest_path <- file.path(path, files$file)
files$exists <- file.exists(files$dest_path)
dirs <- unique(dirname(files$dest_path))
for (d in dirs) {
dir.create(d, showWarnings = FALSE, recursive = TRUE)
}
old_timeout <- getOption("timeout")
options(timeout = max(3000, old_timeout))
for (i in seq_len(nrow(files))) {
dl_response <- utils::download.file(files$src_path[i],
files$dest_path[i],
mode = "wb")
if (dl_response != 0) {
stop("Error downloading file: ", files$file[i])
}
}
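If the packaged function keeps failing, it may help to hit the endpoint directly and inspect what actually comes back, rather than letting the wrapper swallow it. A small sketch using the list_obj_url built above:
## Hypothetical debugging step: inspect the raw API response
resp <- tryCatch(
  jsonlite::read_json(list_obj_url, simplifyVector = TRUE),
  error = function(e) conditionMessage(e)
)
str(resp)   # NULL or an error message points at the URL/key rather than the connection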

How to read in parallel multiple chunks from the same connection in R?

I have a .bz2 file and I want to read it and do some processing. The file cannot be loaded in memory. I want to do some computations on the chunks I read; they can be performed independently of one another, so I thought I would try to do it in parallel.
I tried the following:
library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
con <- file("myfile.bz2", "r")
parLapply(cl, con,
          function(x)
            print(head(read.csv(x, nrows = 100, stringsAsFactors = FALSE,
                                header = FALSE, colClasses = "character",
                                fill = FALSE), 16)))
## Doesn't work: Error in checkForRemoteErrors(val) :
##   one node produced an error: 'file' must be a character string or connection
parLapply(cl, list(con, con, con),
          function(x)
            print(head(read.csv(x, nrows = 100, stringsAsFactors = FALSE,
                                header = FALSE, colClasses = "character",
                                fill = FALSE), 16)))
## Doesn't work: Error in checkForRemoteErrors(val) :
##   3 nodes produced errors; first error: invalid connection
Can this somehow work?
Also any other recommendation as to how to go about it would be really helpful, as I am new to the world of parallel processing.
You cannot and must not use connections from one R process in another R process - connections are unique to the R session where they are created.
Internally, they are just integer indices, and there is very little in R that protects you from mistakenly trying to use them in other R processes. If you want the details, see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81.
FWIW, if you use the future framework for parallelization and set R option future.globals.onReference to "error", then it protects you against this mistake (https://cran.r-project.org/web/packages/future/vignettes/future-4-non-exportable-objects.html). For example,
library(future.apply)
options(future.globals.onReference = "error")

library(parallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)

cat("Hello world\n", file = bzfile("myfile.bz2", open = "wb"))
con <- file("myfile.bz2", "r")
y <- future_lapply(list(con, con, con), FUN = function(x) {
  data <- read.csv(x, nrows = 100, stringsAsFactors = FALSE,
                   header = FALSE, colClasses = "character", fill = FALSE)
  print(head(data), 16)
})
Error: Detected a non-exportable reference ('externalptr') in one of the globals (<unknown>) used in the future expression
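A pattern that does work (a sketch, not from the original answer): read the chunks sequentially in the master session, where the connection lives, and parallelize only the per-chunk computation:
library(parallel)
cl <- makeCluster(detectCores() - 1)
con <- file("myfile.bz2", "r")    # file() decompresses bz2 transparently when reading
chunks <- list()
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = 100, header = FALSE, colClasses = "character"),
    error = function(e) NULL      # EOF raises "no lines available in input"
  )
  if (is.null(chunk)) break
  chunks[[length(chunks) + 1]] <- chunk
}
close(con)
results <- parLapply(cl, chunks, function(x) nrow(x))  # placeholder computation
stopCluster(cl)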

File paths with drake on a shared drive

I am encountering some odd drake behaviour that I just can't figure out. I am trying to add an .Rmd file to my drake plan. I am working on a remote machine AND on a network drive on that machine. If I try to add the .Rmd file to my plan like this:
> library(drake)
> library(rmarkdown)
>
> list.files()
[1] "drake_testing.Rproj" "foo.png" "report.Rmd"
>
> plan <- drake_plan(
+ png("foo.png"),
+ plot(iris$Sepal.Length ~ iris$Sepal.Width),
+ dev.off(),
+ report = render(
+ input = knitr_in("report.Rmd"),
+ output_file = "report.html",
+ quiet = TRUE
+ )
+
+ )
>
> plan
# A tibble: 4 x 2
target command
<chr> <expr>
1 drake_target_1 png("foo.png")
2 drake_target_2 plot(iris$Sepal.Length ~ iris$Sepal.Width)
3 drake_target_3 dev.off()
4 report render(input = knitr_in("report.Rmd"), output_file = "report.html", quiet = TRUE)
>
> ## Turn your plan into a set of instructions
> config <- drake_config(plan)
Error: The specified file is not readable: report.Rmd
>
> traceback()
13: stop(txt, obj, call. = FALSE)
12: .errorhandler("The specified file is not readable: ", object,
mode = errormode)
11: digest::digest(object = file, algo = config$hash_algorithm, file = TRUE,
serialize = FALSE)
10: rehash_file(file, config)
9: rehash_storage(target = target, file = file, config = config)
8: FUN(X[[i]], ...)
7: lapply(X = X, FUN = FUN, ...)
6: weak_mclapply(X = keys, FUN = FUN, mc.cores = jobs, ...)
5: lightly_parallelize_atomic(X = X, FUN = FUN, jobs = jobs, ...)
4: lightly_parallelize(X = knitr_files, FUN = storage_hash, jobs = config$jobs,
config = config)
3: cdl_get_knitr_hash(config)
2: create_drake_layout(plan = plan, envir = envir, verbose = verbose,
jobs = jobs_preprocess, console_log_file = console_log_file,
trigger = trigger, cache = cache)
1: drake_config(plan)
I have tried the following permutations to make this work:
Move the .Rmd to the local drive and refer to it by its full path.
Add file.path inside and outside of knitr_in to build a full path.
Try file_in for each of the scenarios above.
I have also tried debugging, but I get a little lost when drake turns the file name into a hash and then back into the basename of the file (i.e. report.Rmd). The error ultimately happens when digest::digest is called.
Does anyone have experience attempting to figure out something like this?
I think the answer depends on whether you get the same error when you call digest("report.Rmd", file = TRUE) on its own outside drake_config(plan). If it errors (which I am betting it does) there may be something strange about your file system that clashes with R. If that is the case, then there is unfortunately nothing drake can do.
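Something along these lines isolates digest from drake:
library(digest)
digest("report.Rmd", file = TRUE)   # the same hashing call drake makes internally;
                                    # if this errors too, the file system is the culprit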
I also suggest some changes to your plan:
plan <- drake_plan(
  plot_step = {
    png(file_out("foo.png"))
    plot(iris$Sepal.Length ~ iris$Sepal.Width)
    dev.off()
  },
  report = render(
    input = knitr_in("report.Rmd"),
    output_file = "report.html",
    quiet = TRUE
  )
)
Or better yet, compartmentalize your work in reusable functions:
plot_foo <- function(filename) {
  png(filename)
  plot(iris$Sepal.Length ~ iris$Sepal.Width)
  dev.off()
}
plan <- drake_plan(
  foo = plot_foo(file_out("foo.png")),
  report = render(
    input = knitr_in("report.Rmd"),
    output_file = "report.html",
    quiet = TRUE
  )
)
A target is a skippable workflow step with a meaningful return value and/or output file(s). png() and dev.off() are part of the plotting step, and file_out() tells drake to watch foo.png for changes. Also, it is good practice to name your targets. Usually, the return values of targets are meaningful, just like variables in R.
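Once the plan is defined, running it and inspecting a target looks like this:
make(plan)    # builds only the targets that are outdated
readd(foo)    # fetch the cached value of the foo target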

R - `try` in conjunction with capturing ALL console output?

Here's a piece of code I'm working with:
install.packages('BiocManager'); BiocManager::install('UniProt.ws')
requireNamespace('UniProt.ws')
uniprot_object <- UniProt.ws::UniProt.ws(
  UniProt.ws::availableUniprotSpecies(
    pattern = '^Homo sapiens$')$`taxon ID`)
query_results <- try(
  UniProt.ws::select(
    x = uniprot_object,
    keys = 'BAA08084.1',
    keytype = 'EMBL/GENBANK/DDBJ',
    columns = c('ENSEMBL','UNIPROTKB')))
This particular key/keytype combination is non-productive and produces the following output:
Getting mapping data for BAA08084.1 ... and ACC
error while trying to retrieve data in chunk 1:
no lines available in input
continuing to try
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) :
attempt to set 'colnames' on an object with less than two dimensions
Of the two [eE]rrors reported, only the second is a 'proper' R error object and, given the use of try, is accordingly captured in the variable query_results.
I am, however, desperate to capture the other error bit (no lines available in input) to inform downstream programmatic processes.
After playing with a plethora of capture.output, sink, purrr::quietly, etc. options found by startpaging (googling), I still fail to capture that bit. How can I do that?
As @Csd suggested, you could use tryCatch. The message that you are after is printed by the message() function in R, not stop(), so try() will ignore it. To capture output from message(), use code like this:
query_results <- tryCatch(
  UniProt.ws::select(
    x = uniprot_object,
    keys = 'BAA08084.1',
    keytype = 'EMBL/GENBANK/DDBJ',
    columns = c('ENSEMBL','UNIPROTKB')),
  message = function(e) conditionMessage(e))
This will abort evaluation when it gets any message, and return the message in query_results. If you are doing more than debugging, you probably want the message saved, but evaluation to continue. In that case, use withCallingHandlers instead. For example,
saveMessages <- c()
query_results <- withCallingHandlers(
  UniProt.ws::select(
    x = uniprot_object,
    keys = 'BAA08084.1',
    keytype = 'EMBL/GENBANK/DDBJ',
    columns = c('ENSEMBL','UNIPROTKB')),
  message = function(e)
    saveMessages <<- c(saveMessages, conditionMessage(e)))
When I run this version, query_results is unchanged (because the later error aborted execution), but the messages are saved:
saveMessages
[1] "Getting mapping data for BAA08084.1 ... and ACC\n"
[2] "error while trying to retrieve data in chunk 1:\n no lines available in input\ncontinuing to try\n"
Based on @user2554330's most excellent answer, I constructed an ugly thing that does exactly what I want:
try to execute the statement
don't fail fatally
leave no ugly messages
allow me access to errors and messages
So here it is in all its despicable glory:
saveMessages <- c()
query_results <- suppressMessages(
  withCallingHandlers(
    try(
      UniProt.ws::select(
        x = uniprot_object,
        keys = 'BAA08084.1',
        keytype = 'EMBL/GENBANK/DDBJ',
        columns = c('ENSEMBL','UNIPROTKB')),
      silent = TRUE),
    message = function(e)
      saveMessages <<- c(saveMessages, conditionMessage(e))))
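With both pieces captured, downstream logic can branch on them, e.g. (a sketch):
if (inherits(query_results, "try-error")) {
  if (any(grepl("no lines available", saveMessages))) {
    # non-productive key/keytype combination: handle it here
  }
}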

starting a function in asynch mode in R, as a separate process

I am looking for the ability to start R processes asynchronously from within R, something like the function below:
startFunctionInAsynchMode <- function(workingDir, filesToSource, functionName, ...) {
  # workingDir    - the dir that should be set as wd
  # filesToSource - vector of file names to be sourced
  # functionName  - the actual function to be run asynchronously
  # ...           - other parameters to be passed to the function
  # Return value  - should be the system process id of the started process
}
Would anyone have quick ideas? I checked packages like parallel, but they don't seem to fit.
Thanks in advance
Here is an implementation using R CMD BATCH. The basic version is tested, with some open items remaining (see the comments above getFunctionCall()).
library(stringr)  # str_split(), str_detect()
library(plyr)     # ldply()

startFunctionInAsynchMode <- function(workingDir, filesToSource, functionName, ...) {
  wd <- getwd()
  setwd(workingDir)
  fs <- makeFiles()
  scriptFile <- fs$ScriptFile
  # Write a script that sources the given files and then calls the function
  cat(file = scriptFile, paste0("source(\"", filesToSource, "\")", collapse = "\n"))
  cat(file = scriptFile, "\n", append = TRUE)
  functionCall <- getFunctionCall(functionName, as.list(match.call()), startIndex = 5)
  cat(file = scriptFile, functionCall, append = TRUE)
  # Launch the script in a detached background R process
  commandsToRun <- paste0("(R CMD BATCH ", scriptFile, " ", fs$LogFile, " --slave ) &")
  print(commandsToRun)
  system(commandsToRun)
  Sys.sleep(5)
  pids <- getPids(scriptFile, "--restore")
  cat(file = fs$KillScript, paste0("kill -9 ", pids$PID[1]))
  setwd(wd)
  return(as.character(pids$PID[1]))
}
makeFiles <- function() {
  res <- list()
  dir.create("./temp/tempRgen", recursive = TRUE, showWarnings = FALSE)
  tf <- tempfile("rGen-", "./temp/tempRgen", fileext = "")
  res$ScriptFile <- paste0(tf, ".R")
  res$LogFile <- paste0(tf, ".log")
  res$KillScript <- paste0(tf, "-kill.sh")
  file.create(res$KillScript, showWarnings = FALSE)
  file.create(res$ScriptFile, showWarnings = FALSE)
  res
}
# Open items to be handled:
# 1. Named arguments
# 2. Non-string arguments
getFunctionCall <- function(functionName, argList, startIndex) {
  res <- paste0(functionName, "(")
  if (!is.null(argList)) {
    if (length(argList) >= startIndex) {
      first <- TRUE
      for (i in startIndex:length(argList)) {
        if (first) {
          first <- FALSE
        } else {
          res <- paste0(res, ",")
        }
        res <- paste0(res, "\"", argList[[i]], "\"")
      }
    }
  }
  res <- paste0(res, ")")
  res
}
getPids <- function(grepFor, refineWith) {
  numCols <- length(unlist(str_split(system("ps aux", intern = TRUE)[1], "\\s+")))
  psOutput <- system(paste0("ps auxww | grep ", grepFor), intern = TRUE)
  psOutput <- psOutput[str_detect(psOutput, refineWith)]
  pidDf <- ldply(psOutput, parseEachPsLine)
  # Remove the process that actually grep-ed for the search string
  pidDf <- pidDf[!str_detect(pidDf$COMMAND, "grep"), ]
  return(pidDf)
}
parseEachPsLine <- function(line) {
  tabular <- read.table(textConnection(line), header = FALSE, sep = " ")
  tabular <- tabular[!is.na(tabular)]
  psTitles <- c("USER", "PID", "CPU", "MEM", "VSZ", "RSS", "TTY", "STAT",
                "START", "TIME", "COMMAND")
  psColNames <- setNames(seq(1, length(psTitles)), psTitles)
  COMMAND <- paste0(tabular[(psColNames["COMMAND"]):length(tabular)], collapse = " ")
  return(data.frame("PID" = tabular[psColNames["PID"]],
                    "STARTED" = tabular[psColNames["START"]],
                    "COMMAND" = COMMAND,
                    "STATUS" = tabular[psColNames["STAT"]]))
}
