How to get the queue number from CONDOR into your R job

I think I have a simple problem because I was looking up and down the internet and couldn't find someone else asking this question:
My university has a Condor set-up. I want to run several repetitions of the same code (e.g. 100 times). My R code has a routine to store the results in a file, i.e.:
write.csv(res, file=paste(paste(paste(format(Sys.time(), '%y%m%d'),'res', queue, sep="_"), sep='/'),'.csv',sep='',collapse=''))
res are my results (a data.frame), I indicate that this file contains the results with 'res' and finally I want to add the queue number of this calculation (otherwise files would be replaced, wouldn't they?). It should look like: 140109_res_1.csv, 140109_res_2.csv, ...
My submit file to condor looks like this:
universe = vanilla
executable = /usr/bin/R
arguments = --vanilla
log = testR.log
error = testR.err
input = run_condor.r
output = testR$(Process).txt
requirements = (opsys == "LINUX") && (arch == "X86_64") && (HAS_R_2_13 =?= True)
request_memory = 1000
should_transfer_files = YES
transfer_executable = FALSE
when_to_transfer_output = ON_EXIT
queue 3
I wonder how do I get the 'queue' number into my R code? I tried a simple example with
print(queue)
print(Queue)
But there is no object found called queue or Queue. Any suggestions?
Best wishes,
Marco

Okay, I solved the problem. This is how it goes:
I had to change my submit file. I changed the slot arguments to:
arguments = --vanilla --args $(Process)
Now the process number is forwarded to the R code. The value arrives as a character string, so you should convert it to a numeric value (also check whether a number like 10 is passed on as '1' and '0', in which case you should collapse the values first). You retrieve it with the following line:
run <- commandArgs(TRUE)
Here is an example of the code I let run.
> run <- commandArgs(TRUE)
> run
[1] "0"
> class(run)
[1] "character"
> try(as.numeric(run))
[1] 0
> try(run <- as.numeric(paste(run, collapse='')) )
> try(print(run))
[1] 0
> try(write(run, paste(run,'csv', sep='.')))
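As a rough sketch of how the retrieved value could feed the file name from the question: result_file below is a hypothetical helper (not part of Condor or R), and it assumes $(Process) arrives as the first trailing argument. Condor numbers processes from 0, so add 1 if you want the files to start at 1.

```r
# Hypothetical helper: build the result file name from the Condor process number.
result_file <- function(args = commandArgs(trailingOnly = TRUE),
                        date = Sys.Date()) {
  # $(Process) arrives as a single character string such as "0" or "10"
  run <- suppressWarnings(as.integer(args[1]))
  if (is.na(run)) stop("expected the Condor process number as first argument")
  paste0(format(date, "%y%m%d"), "_res_", run, ".csv")
}

# Inside the job you would call result_file() with no arguments;
# here the arguments are supplied by hand to show the result.
fname <- result_file(args = "10", date = as.Date("2014-01-09"))
print(fname)
```

In the job script this would be used as write.csv(res, file = result_file()).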
You can also find information on how to pass variables/arguments to your code here: http://research.cs.wisc.edu/htcondor/manual/v7.6/condor_submit.html
I hope this helps anyone.
Cheers and thanks for all other commenters!
Marco

Related

R loop completes only 3 iterations out of 2504

I've written a function to download multiple files from NOAA's database. Firstly, I've got sites which is a list of site ID's that I want to download off the website. It looks like this:
> head(sites)
[[1]]
[1] "9212"
[[2]]
[1] "10158"
[[3]]
[1] "11098"
> length(sites)
[1] 2504
My function is shown below.
tested<-lapply(seq_along(sites), function(x) {
no<-sites[[x]]
data=GET(paste0('https://www.ncdc.noaa.gov/paleo-search/data/search.json?xmlId=', no))
v<-content(data)
check=GET(v$statusUrl)
j<-content(check)
URL<-j$archive
download.file(URL, destfile=paste0('./tree_ring/', no, '.zip'))
})
The weird issue is that it works for the first three sites (downloads properly), but then it stops after the three sites and throws the following error:
Error in charToRaw(URL) : argument must be a character vector of length 1
I've tried manually downloading the 4th and 5th site (using the same code as above, but not within function) and it works fine. What could be going on here?
EDIT 1: Showing more site ID's as requested
> dput(sites[1:6])
list("9212", "10158", "11098", "15757", "15777", "15781")
I converted your code to a for loop so I could see the most recent values of all your variables when things fail.
The fails aren't consistently on the 4th site. Running your code a few times, sometimes it fails on 2, or 3, or 4. When it fails, if I look at j, I see this:
$message
[1] "finalizing archive"
$status
[1] "working"
If I re-run check=GET(v$statusUrl); j<-content(check) a few seconds later, then I see
$archive
[1] "https://www.ncdc.noaa.gov/web-content/paleo/bundle/1986420067_2020-04-23.zip"
$status
[1] "complete"
So, I think it takes the server a little bit of time to prepare the file for download, and sometimes R asks for the file before it's ready, which causes an error. A simple fix might look like this:
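library(httr)  # provides GET() and content()
# Ask the server for the current status of the archive request
check_status <- function(v) {
  check <- GET(v$statusUrl)
  content(check)
}
for(x in seq_along(sites)) {
  no<-sites[[x]]
  data=GET(paste0('https://www.ncdc.noaa.gov/paleo-search/data/search.json?xmlId=', no))
  v<-content(data)
  try_counter <- 0
  j <- check_status(v)
  # Poll until the archive is ready, at most 100 times (about 10 seconds)
  while(j$status != "complete" && try_counter < 100) {
    Sys.sleep(0.1)
    j <- check_status(v)
    try_counter <- try_counter + 1
  }
  URL<-j$archive
  download.file(URL, destfile=paste0(no, '.zip'))
}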
If the status isn't ready, this version will wait 0.1 seconds before checking again, up to 10 seconds.
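The same wait-and-retry idea can be pulled out into a small helper. This is only a sketch: poll_until and fake_status are hypothetical names, and fake_status stands in for the server so the example runs without a network connection.

```r
# Hypothetical generic helper: call check() until done(result) is TRUE
# or max_tries attempts have been made.
poll_until <- function(check, done, max_tries = 100, wait = 0.1) {
  for (i in seq_len(max_tries)) {
    result <- check()
    if (isTRUE(done(result))) return(result)
    Sys.sleep(wait)
  }
  stop("gave up after ", max_tries, " attempts")
}

# Fake status endpoint standing in for the server:
# reports "working" twice, then "complete".
calls <- 0
fake_status <- function() {
  calls <<- calls + 1
  if (calls < 3) list(status = "working")
  else list(status = "complete", archive = "file.zip")
}

j <- poll_until(fake_status, function(r) identical(r$status, "complete"), wait = 0)
print(j$archive)
```

Unlike the loop above, this version stops with an error when the limit is reached instead of going on to download from a missing URL.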

R - Trycatch is saving warning instead of returning function output

I am trying to download records from Twitter using rtweet. One issue is that the Twitter server makes you wait 15 minutes after every 18000 records. So, after record number 18000, I receive a data frame with all the records and a nice warning telling me to wait for a bit. search_tweets has an argument for downloading more than 18000 records, called retryonratelimit. However, this isn't working, so I am exploring other options.
I have written a function incorporating tryCatch to address this. However, when the warning at 18000 records pops up, tryCatch saves the warning rather than the data frame that should be returned before the warning, something it would not do if 17999 records were downloaded.
library(rtweet)
library(RDCOMClient)
library(profvis)
TwitScrape = function(SearchTerm){
ReturnDF = tryCatch({
TempList=NULL
Temp = search_tweets(SearchTerm,n=18000)
TempList = list(as.data.frame(Temp), SearchTerm)
return(TempList)
},
warning = function(TempList){
Comb=NULL
MAXID = min(TempList[[1]]$status_id)
message("Delay for 15 minutes to accommodate server download limits")
pause(901)
TempWarn = search_tweets(TempList[[2]],n=18000, max_id=MAXID)
TempWarn = as.data.frame(TempWarn)
Comb = rbind(TempList[[1]], TempWarn)
CombList = list(Comb, TempList[[2]])
return(CombList)
}
)
}
Searches = c("#MUFC","#LFC", "#MCFC")
TestExpandList=NULL
TestExpand=NULL
TestExpand2=NULL
for (i in seq_along(Searches)){
TestExpandList = TwitScrape(SearchTerm = Searches[i])
TestExpand = TestExpandList[[1]]
TestExpand$Cat = Searches[i]
TestExpand$DownloadDate = Sys.Date()
TestExpand2 = rbind(TestExpand2, TestExpand)
}
I hope this makes sense. If I can offer any more information please let me know. In summary, why is tryCatch saving my warning rather than the data frame I want?
I am not 100% sure what you would like to achieve, but it seems you are misunderstanding how tryCatch works.
The argument of the warning handler warning = function(TempList) is the warning itself. You have named it TempList, but that doesn't make it your TempList variable; the handler still just receives the warning condition.
Your function TwitScrape returns ReturnDF only implicitly, as the value of its last expression, because there is no explicit return at the end. I guess that is still what you want, and that is OK.
I would try to restructure your solution without tryCatch.
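One possible way to restructure this, sketched with base R only: withCallingHandlers lets the handler note the warning and then resume the call, so the return value is not lost. noisy_fetch is a hypothetical stand-in for search_tweets (it warns and still returns data), and the message text is taken from the question.

```r
# Hypothetical stand-in for search_tweets(): warns, then still returns data.
noisy_fetch <- function(n) {
  warning("Rate limit exceeded - 88")
  data.frame(id = seq_len(n))
}

fetch_with_warning_flag <- function(n) {
  hit_limit <- FALSE
  result <- withCallingHandlers(
    noisy_fetch(n),
    warning = function(w) {
      # w is the warning condition object, not the data
      if (grepl("Rate limit exceeded", conditionMessage(w))) {
        hit_limit <<- TRUE
      }
      # resume noisy_fetch() right after the warning() call
      invokeRestart("muffleWarning")
    }
  )
  list(data = result, hit_limit = hit_limit)
}

out <- fetch_with_warning_flag(3)
print(out$hit_limit)
print(nrow(out$data))
```

With the flag in hand, the caller can decide whether to pause and fetch the next batch, without relying on what warnings() happens to hold.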
Thanks for your comments. RolandASc, you were right. I went back to the drawing board. See the working TwitScrape function below:
TwitScrape = function(SearchTerm){
DF=NULL
DF = search_tweets(SearchTerm,n=18001)
Warn = warnings()
if (names(Warn[1]) == "Rate limit exceeded - 88"){
message("paused")
pause(910)
DF2 = search_tweets(SearchTerm,n=18000, max_id = min(DF$status_id))
DF3 = rbind(DF, DF2)
return(DF3)
}
else {
return(DF)
}}

Prompt 'Yes' every time to getFilings

I am going to download the 2005 10-Ks for several corporations in R using the EDGAR package. I have a mini loop to test which is working:
for (CIK in c(789019, 777676, 849399)){
getFilings(2005,CIK,'10-K')
}
However each time this runs I get a yes/no prompt and I have to type 'yes':
Total number of filings to be downloaded=1. Do you want to download (yes/no)? yes
Total number of filings to be downloaded=1. Do you want to download (yes/no)? yes
Total number of filings to be downloaded=1. Do you want to download (yes/no)? yes
How can I prompt R to answer 'yes' for each run? Thank you
Please remember to include a minimal reproducible example in your question, including library(...) and all other necessary commands:
library(edgar)
report <- getMasterIndex(2005)
We can bypass the prompt by doing some code surgery. Here, we retrieve the code for getFilings, and replace the line that asks for the prompt with just a message. We then write the new function (my_getFilings) to a temporary file, and source that file:
x <- capture.output(dput(edgar::getFilings))
x <- gsub("choice <- .*", "cat(paste(msg3, '\n')); choice <- 'yes'", x)
x <- gsub("^function", "my_getFilings <- function", x)
writeLines(x, con = tmp <- tempfile())
source(tmp)
Everything downloads fine:
for (CIK in c(789019, 777676, 849399)){
my_getFilings(2005, CIK, '10-K')
}
list.files(file.path(getwd(), "Edgar filings"))
# [1] "777676_10-K_2005" "789019_10-K_2005" "849399_10-K_2005"
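The same kind of code surgery can be tried on a toy function first. This is a sketch under assumptions: ask_then_run is a hypothetical function that only mimics a yes/no prompt, and it is patched in memory with deparse and eval(parse()) rather than through a temporary file.

```r
# Hypothetical function with an interactive prompt, standing in for getFilings.
ask_then_run <- function() {
  choice <- readline("Download (yes/no)? ")
  if (choice == "yes") "downloaded" else "skipped"
}

# Turn the function into source lines, replace the prompt line,
# and rebuild a new function from the edited text.
src <- deparse(ask_then_run)
src <- gsub("choice <- .*", "choice <- \"yes\"", src)
auto_run <- eval(parse(text = src))

print(auto_run())
```

This approach is fragile by nature: it depends on the exact text of the function being patched, so it can break when the package is updated.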

Rscript: How to inject options for an R script [duplicate]

I've got an R script for which I'd like to be able to supply several command-line parameters (rather than hardcode parameter values in the code itself). The script runs on Windows.
I can't find info on how to read parameters supplied on the command-line into my R script. I'd be surprised if it can't be done, so maybe I'm just not using the best keywords in my Google search...
Any pointers or recommendations?
Dirk's answer here is everything you need. Here's a minimal reproducible example.
I made two files: exmpl.bat and exmpl.R.
exmpl.bat:
set R_Script="C:\Program Files\R-3.0.2\bin\RScript.exe"
%R_Script% exmpl.R 2010-01-28 example 100 > exmpl.batch 2>&1
Alternatively, using Rterm.exe:
set R_TERM="C:\Program Files\R-3.0.2\bin\i386\Rterm.exe"
%R_TERM% --no-restore --no-save --args 2010-01-28 example 100 < exmpl.R > exmpl.batch 2>&1
exmpl.R:
options(echo=TRUE) # if you want see commands in output file
args <- commandArgs(trailingOnly = TRUE)
print(args)
# trailingOnly=TRUE means that only your arguments are returned, check:
# print(commandArgs(trailingOnly=FALSE))
start_date <- as.Date(args[1])
name <- args[2]
n <- as.integer(args[3])
rm(args)
# Some computations:
x <- rnorm(n)
png(paste(name,".png",sep=""))
plot(start_date+(1L:n), x)
dev.off()
summary(x)
Save both files in the same directory and start exmpl.bat. In the result you'll get:
example.png with some plot
exmpl.batch with all that was done
You could also add an environment variable %R_Script%:
"C:\Program Files\R-3.0.2\bin\RScript.exe"
and use it in your batch scripts as %R_Script% <filename.r> <arguments>
Differences between RScript and Rterm:
Rscript has simpler syntax
Rscript automatically chooses architecture on x64 (see R Installation and Administration, 2.6 Sub-architectures for details)
Rscript needs options(echo=TRUE) in the .R file if you want to write the commands to the output file
A few points:
Command-line parameters are accessible via commandArgs(), so see help(commandArgs) for an overview.
You can use Rscript.exe on all platforms, including Windows. It will support commandArgs(). littler could be ported to Windows but lives right now only on OS X and Linux.
There are two add-on packages on CRAN -- getopt and optparse -- which were both written for command-line parsing.
Edit in Nov 2015: New alternatives have appeared and I wholeheartedly recommend docopt.
Add this to the top of your script:
args<-commandArgs(TRUE)
Then you can refer to the arguments passed as args[1], args[2] etc.
Then run
Rscript myscript.R arg1 arg2 arg3
If your args are strings with spaces in them, enclose within double quotes.
Try library(getopt) ... if you want things to be nicer. For example:
library(getopt)
# One row per option: long name, short flag, argument mask, type, description
spec <- matrix(c(
  'in'   , 'i', 1, "character", "file from fastq-stats -x (required)",
  'gc'   , 'g', 1, "character", "input gc content file (optional)",
  'out'  , 'o', 1, "character", "output filename (optional)",
  'help' , 'h', 0, "logical",   "this help"
), ncol=5, byrow=TRUE)
opt <- getopt(spec)
# 'in' is a reserved word in R, so use opt[["in"]] rather than opt$in
if (!is.null(opt$help) || is.null(opt[["in"]])) {
  cat(paste(getopt(spec, usage=TRUE), "\n"))
  q()
}
Since optparse has been mentioned a couple of times in the answers, and it provides a comprehensive kit for command line processing, here's a short simplified example of how you can use it, assuming the input file exists:
script.R:
library(optparse)
option_list <- list(
make_option(c("-n", "--count_lines"), action="store_true", default=FALSE,
help="Count the line numbers [default]"),
make_option(c("-f", "--factor"), type="integer", default=3,
help="Multiply output by this number [default %default]")
)
parser <- OptionParser(usage="%prog [options] file", option_list=option_list)
args <- parse_args(parser, positional_arguments = 1)
opt <- args$options
file <- args$args
if(opt$count_lines) {
print(paste(length(readLines(file)) * opt$factor))
}
Given an arbitrary file blah.txt with 23 lines.
On the command line:
Rscript script.R -h outputs
Usage: script.R [options] file
Options:
-n, --count_lines
Count the line numbers [default]
-f FACTOR, --factor=FACTOR
Multiply output by this number [default 3]
-h, --help
Show this help message and exit
Rscript script.R -n blah.txt outputs [1] "69"
Rscript script.R -n -f 5 blah.txt outputs [1] "115"
You need littler (pronounced 'little r').
Dirk will be by in about 15 minutes to elaborate ;)
In bash, you can construct a command line like the following:
$ z=10
$ echo $z
10
$ Rscript -e "args<-commandArgs(TRUE);x=args[1]:args[2];x;mean(x);sd(x)" 1 $z
[1] 1 2 3 4 5 6 7 8 9 10
[1] 5.5
[1] 3.027650
$
You can see that the bash shell substitutes "10" for the variable $z, that commandArgs picks this value up as args[2], and that R then evaluates the range x=1:10 successfully.
FYI: there is a function args(), which retrieves the arguments of R functions, not to be confused with a vector of arguments named args
If you need to specify options with flags, (like -h, --help, --number=42, etc) you can use the R package optparse (inspired from Python):
http://cran.r-project.org/web/packages/optparse/vignettes/optparse.pdf.
At least this is how I understand your question, because I found this post when looking for an equivalent of the bash getopt, or perl Getopt, or python argparse and optparse.
I just put together a nice data structure and chain of processing to generate this switching behaviour, no libraries needed. I'm sure it will have been implemented numerous times over, and came across this thread looking for examples - thought I'd chip in.
I didn't even particularly need flags (the only flag here is a debug mode, creating a variable which I check for as a condition of starting a downstream function: if (!exists("debug.mode")) {...} else {print(variables)}). The flag-checking lapply statements below produce the same result as:
if ("--debug" %in% args) debug.mode <- T
if ("-h" %in% args || "--help" %in% args) cat(help.prompt)
where args is the variable read in from the command-line arguments (a character vector, equivalent to c('--debug','--help') when you supply both flags, for instance).
It's reusable for any other flag and you avoid all the repetition, and no libraries so no dependencies:
args <- commandArgs(TRUE)
flag.details <- list(
"debug" = list(
def = "Print variables rather than executing function XYZ...",
flag = "--debug",
output = "debug.mode <- T"),
"help" = list(
def = "Display flag definitions",
flag = c("-h","--help"),
output = "cat(help.prompt)") )
flag.conditions <- lapply(flag.details, function(x) {
paste0(paste0('"',x$flag,'"'), sep = " %in% args", collapse = " || ")
})
flag.truth.table <- unlist(lapply(flag.conditions, function(x) {
if (eval(parse(text = x))) {
return(T)
} else return(F)
}))
help.prompts <- lapply(names(flag.truth.table), function(x){
# joins 2-space-separated flags with a tab-space to the flag description
paste0(c(paste0(flag.details[x][[1]][['flag']], collapse=" "),
flag.details[x][[1]][['def']]), collapse="\t")
} )
help.prompt <- paste(c(unlist(help.prompts),''),collapse="\n\n")
# The following lines handle the flags, running the corresponding 'output' entry in flag.details for any supplied
flag.output <- unlist(lapply(names(flag.truth.table), function(x){
if (flag.truth.table[x]) return(flag.details[x][[1]][['output']])
}))
eval(parse(text = flag.output))
Note that in flag.details here the commands are stored as strings, then evaluated with eval(parse(text = '...')). Optparse is obviously desirable for any serious script, but minimal-functionality code is good too sometimes.
Sample output:
$ Rscript check_mail.Rscript --help
--debug Print variables rather than executing function XYZ...
-h --help Display flag definitions
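For comparison, a library-free parser can also be written without eval(parse()). The following is a minimal sketch, not a replacement for optparse: parse_flags is a hypothetical helper that only understands long flags of the form --name or --name=value and treats everything else as a positional argument.

```r
# Hypothetical helper: split arguments into named options and positionals.
parse_flags <- function(args) {
  is_flag <- grepl("^--", args)
  flags <- sub("^--", "", args[is_flag])
  keys <- sub("=.*$", "", flags)
  # "--number=42" keeps "42"; a bare "--debug" becomes "TRUE"
  vals <- ifelse(grepl("=", flags), sub("^[^=]*=", "", flags), "TRUE")
  list(options = as.list(setNames(vals, keys)),
       positional = args[!is_flag])
}

# In a script you would pass commandArgs(TRUE); a fixed vector is used here.
opts <- parse_flags(c("--debug", "--number=42", "input.txt"))
print(opts$options$number)
print(opts$positional)
```

All values come back as character strings, so convert them (for example with as.integer) before use.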

Get function's title from documentation

I would like to get the title of a base function (e.g.: rnorm) in one of my scripts. That is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the RD files as \title{} or the top line in documentation.
Is there any simple way to do this without calling Rd_db function from tools and parse all RD files -- as having a very big overhead for this simple stuff? Other thing: I tried with parse_Rd too, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function to parse the (offline) documentation would be the best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"
If you look at the code for help, you see that the function index.search seems to be what is pulling in the location of the help files, and that the default for the associated find.package() function is NULL. It turns out that index.search is neither documented nor exported, so I tested the usual suspects for which package it was in (base, tools, utils), and ended up with utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse, and it sounds as though you already know how to do that.
EDIT: @Hadley has provided a method for getting all of the help text once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1", utils:::index.search("rnorm",
find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"
It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the topic you're interested in. You can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed code is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
# Find "file" name given topic name/alias
if (is.null(file)) {
topics <- pkg_topics_index(package)
topic_page <- subset(topics, alias == topic, select = file)$file
if(length(topic_page) < 1)
topic_page <- subset(topics, file == topic, select = file)$file
stopifnot(length(topic_page) >= 1)
file <- topic_page[1]
}
rdb_path <- file.path(system.file("help", package = package), package)
tools:::fetchRdDB(rdb_path, file)
}
pkg_topics_index <- function(package) {
help_path <- system.file("help", package = package)
file_path <- file.path(help_path, "AnIndex")
if (length(readLines(file_path, n = 1)) < 1) {
return(NULL)
}
topics <- read.table(file_path, sep = "\t",
stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)
names(topics) <- c("alias", "file")
topics[complete.cases(topics), ]
}
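Putting the two answers together, a get.title function along the lines of the question could be sketched as follows. This is only a sketch: it relies on the unexported functions utils:::index.search and tools:::fetchRdDB used above, which may change between R versions, and it assumes the topic belongs to a package that is currently loaded.

```r
# Sketch: look up the \title of a help topic from the installed help database.
get.title <- function(topic) {
  path <- utils:::index.search(topic, find.package())
  if (length(path) < 1) stop("no help page found for ", topic)
  help_dir <- dirname(path[1])          # .../library/<pkg>/help
  pkg <- basename(dirname(help_dir))    # <pkg>
  rd <- tools:::fetchRdDB(file.path(help_dir, pkg), basename(path[1]))
  # Each top-level element of the Rd object carries its tag as an attribute
  tags <- vapply(rd, function(el) attr(el, "Rd_tag"), character(1))
  title <- rd[[which(tags == "\\title")[1]]]
  trimws(paste(unlist(title), collapse = ""))
}

print(get.title("rnorm"))
```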
