A better way to extract functions from an R script? - r

Say I have a file "myfuncs.R" with a few functions in it:
A <- function(x) x
B <- function(y) y
C <- function(z) z
I want to place all the functions contained within "myfuncs.R" into their own files, named appropriately. I have a simple Bash-shell script to extract functions and place them in separate files:
split -p "function\(" myfuncs.R tmpfunc
grep "function(" tmpfunc* | awk '{
# strip first-instances of function assignment
sub("<-", " ")
sub("=", " ")
sub(":", " ") # and colon introduced by grep
mv=$1
mvto=sprintf("func_%s.R",$2)
print "mv", mv, mvto
}' | sh
leaving me with:
func_A.R
func_B.R
func_C.R
But this script has obvious limitations. For example, it misbehaves when function 'A' contains a nested function:
A <- function(x){
Aa <- function(x){x}
return(Aa)
}
and it fails outright if the whole function is on a single line.
Does anyone know of a more robust, less error-prone way to do this?

Source your functions and then call package.skeleton().
A separate file will be created for each function.
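If a full package skeleton is more than you need, the same idea (let R do the parsing rather than grep) can be sketched directly. This is only a minimal sketch: split_functions() is a hypothetical helper name, the func_<name>.R naming is taken from the question, and it assumes that sourcing "myfuncs.R" has no unwanted side effects, because the script's top-level code is executed.

```r
# Hypothetical helper (not part of any package): write every top-level
# function defined in `path` to its own file, func_<name>.R, in `outdir`.
split_functions <- function(path, outdir = ".") {
  env <- new.env()
  sys.source(path, envir = env)   # evaluate the script in a scratch environment
  objs <- ls(env, all.names = TRUE)
  is_fn <- vapply(objs, function(n) is.function(get(n, envir = env)), logical(1))
  fns <- objs[is_fn]
  for (n in fns) {
    # dump() writes "<name> <- <deparsed function>" to the file
    dump(n, file = file.path(outdir, sprintf("func_%s.R", n)), envir = env)
  }
  invisible(fns)
}
```

Because only top-level objects are inspected, a nested function such as Aa stays inside the file for A, and one-line functions are handled like any other. Comments and original formatting are not guaranteed to survive, since the functions are deparsed.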

Related

Why is parse(text=) throwing an error when there is a comma in the text?

Inside of a Shiny app I am combining levels to make a ID that will be used for ggplot faceting.
Here is an example of what works when doing it by hand:
paste0(data[[1]],', ',data[[2]])
However, the user can select whatever and how ever many identifying columns they want, so I am automating it like this:
First I build combo_ID_build from whatever columns they have selected in UI1 (which is numeric with column numbers for whatever the user selected).
for (i in UI1){
combo_ID_build<-paste0(combo_ID_build,"data[[",i,"]]")
if(i != tail(UI1,1)){
combo_ID_build<-paste0(combo_ID_build,",', ',")
}
}
Then I use eval(parse())
ID <- paste0(eval(parse(text = combo_ID_build)))
So for this example, the user selects columns 1 and 2, so combo_ID_build would be "data[[1]],', ',data[[2]]" where str(combo_ID_build) is chr.
However, when parsed this throws an error, "Error in parse(text = combo_ID_build) : :1:10: unexpected ','
1: data[[1]],"
I tried escaping the comma with \ and \\ but it only throws an error about the backslashes being unrecognized escape strings.
What am I doing wrong here?
Your eval(parse()) call tries to parse and evaluate an expression. You run
eval(parse(text = "data[[1]],', ',data[[2]]"))
That's equivalent to running the command
data[[1]], ', ', data[[2]]
which is indeed a syntax error. It doesn't matter that you wrapped it in paste0: when you have foo(bar()), foo() runs on the output of bar(). If bar() can't run on its own, there will be an error. You have paste0(eval(parse())), so paste0() runs on the output of eval(). Whatever you pass to eval() therefore needs to run on its own before paste0() is called.
eval(parse()) should usually be avoided. In this case, I think you're looking for do.call, which lets you call a function on a set of arguments in a list. Let's also use paste with sep = ", " to simplify things:
# no loop or eval/parse needed; each selected column becomes one argument to paste()
combo_ID_build = do.call(what = paste, args = c(unname(as.list(data[UI1])), sep = ", "))
In this particular case, we can simplify even more by using the utility function interaction() which is made for this use case:
combo_ID_build = interaction(data[UI1], sep = ", ")
I'll leave you with this:
fortunes::fortune(106)
# If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
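To make the do.call idea concrete, here is a small self-contained sketch. The data frame, its column names, and the helper name make_id() are invented for illustration; only data and UI1 come from the question.

```r
# Hypothetical helper: paste the selected columns together row by row.
make_id <- function(data, cols, sep = ", ") {
  # unname() so column names are not mistaken for named arguments of paste()
  do.call(paste, c(unname(as.list(data[cols])), sep = sep))
}

# Made-up example data standing in for the Shiny app's data
data <- data.frame(site = c("A", "B"), year = c(2020, 2021), value = c(1.5, 2.5))
UI1 <- c(1, 2)   # column numbers chosen by the user

ID <- make_id(data, UI1)
print(ID)
# [1] "A, 2020" "B, 2021"
```

The number of selected columns can vary freely, because do.call simply passes however many list elements there are as separate arguments.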

R: Parse a line like a shell command line

I have a key=value file like follows:
a=foo
b="foo bar"
c="foo \"bar\""
d="foo \'bar\'"
Values appear to be quoted/escaped when there are special characters with a logic similar to the shell.
To parse the file into a named vector, the straight way is:
txt <- readLines("dat")
spl <- strsplit(txt, "=")
vec <- setNames(sapply(spl, `[`, 2),
sapply(spl, `[`, 1))
which gives:
a b c d
"foo" "\"foo bar\"" "\"foo \\\"bar\\\"\"" "\"foo \\'bar\\'\""
Just as the quotes in the shell command mkdir "new folder" are not literal, neither are the quotes above.
Therefore, I reparse vec with:
prs <- function(el) if(substr(el, 1, 1) == "\"") eval(parse(text= el)) else el
sapply(vec, prs)
That is:
a b c d
"foo" "foo bar" "foo \"bar\"" "foo 'bar'"
I wonder if, rather than my naive prs function, there is an established one, perhaps from a library, to parse CLI-like lines. What I have found so far assumes R scripts, where commandArgs() has already done the tokenising.

How to add variables to several functions in R and run them in the command line

I have a script that is composed of several functions. A summarised example of my script looks like that
>Test.R
massive.process_1 <- function() {
set.seed(123)
x <- do_something()
save(x, file = '/home/Result1.RData')
}
massive.process_2 <- function() {
set.seed(4)
x <- do_something()
save(x, file = '/home/Result2.RData')
}
massive.process_1()
massive.process_2()
I have to execute this script, but instead of 2 massive.process functions I need to run 100 of them, changing the seed value and the name of the saved data in each step. I could do it manually by adding 100 massive.process functions, but is there any way to script this so that I avoid typing 100 functions?
Many thanks
My bash file to run it is the following:
#!/bin/bash
echo Started analysis at: `date`
rfile="Test.R"
Rscript $rfile
echo Finished analysis at: `date`
Adding to Dennis's answer...
To change the filename you can use "paste".
massive.process <- function(i) {
set.seed(i)
x <- do_something()
outname = paste("/home/Result", i, ".RData", sep="")
save(x, file = outname)
x
}
for (i in 1:100){
massive.process(i);
}
or
X = lapply(1:100, massive.process)
If you use the list approach, to access the ith x, just use X[[i]] (X[i] returns a one-element list instead).
Another way to write the lapply loop is with an anonymous function. This might make it clearer what's going on.
X = lapply(1:100, function(i){
massive.process(i)
})
The previous notation is the same, just more compact.
Why not add the seed as a parameter to the functions?
massive.process <- function(seedValue) {...}
And it would probably be a good idea to implement the loop in R instead of using a shell script.
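Since the script is launched with Rscript, the number of runs could also be passed on the command line instead of being hard-coded. The following is only a sketch under assumptions: do_something() is a placeholder for the real computation, run_all() and the outdir argument are hypothetical names, and results go to tempdir() here rather than /home so the example runs anywhere.

```r
# Placeholder for the real computation (assumption, not from the question)
do_something <- function() rnorm(3)

massive.process <- function(i, outdir) {
  set.seed(i)                                   # seed changes with each run
  x <- do_something()
  outname <- file.path(outdir, paste0("Result", i, ".RData"))
  save(x, file = outname)                       # file name changes with each run
  outname
}

# Hypothetical driver: the first command-line argument is the number of runs
run_all <- function(args = commandArgs(trailingOnly = TRUE), outdir = tempdir()) {
  n <- if (length(args) >= 1) as.integer(args[1]) else 100L
  vapply(seq_len(n), massive.process, character(1), outdir = outdir)
}

# Equivalent to calling "Rscript Test.R 3" and then run_all() inside the script
files <- run_all(args = "3")
```

With this layout the bash wrapper would only need to change its last command to something like Rscript $rfile 100.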

Passing multiple arguments via command line in R

I am trying to pass multiple file path arguments via command line to an Rscript which can then be processed using an arguments parser. Ultimately I would want something like this
Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10 --anotheroption helloworld -- etc...
passed through the command line and have the result as an array in R when parsed
args$inputfiles = "fileA.txt", "fileB.txt", "fileC.txt"
I have tried several parsers, including optparse and getopt, but neither of them seems to support this functionality. I know argparse does, but it is currently not available for R version 2.15.2.
Any ideas?
Thanks
Although it wasn't released on CRAN when this question was asked, a beta version of the argparse package is up there now, and it can do this. It is basically a wrapper around the popular Python module of the same name, so you need to have a recent version of Python installed to use it. See the install notes for more info. The basic example included sums an arbitrarily long list of numbers, which should not be hard to modify so that you can grab an arbitrarily long list of input files.
> install.packages("argparse")
> library("argparse")
> example("ArgumentParser")
The way you describe command line options is different from the way that most people would expect them to be used. Normally, a command line option would take a single parameter, and parameters without a preceding option are passed as arguments. If an argument would take multiple items (like a list of files), I would suggest parsing the string using strsplit().
Here's an example using optparse:
library (optparse)
option_list <- list ( make_option (c("-f","--filelist"),default="blah.txt",
help="comma separated list of files (default %default)")
)
parser <-OptionParser(option_list=option_list)
arguments <- parse_args (parser, positional_arguments=TRUE)
opt <- arguments$options
args <- arguments$args
myfilelist <- strsplit(opt$filelist, ",")
print (myfilelist)
print (args)
Here are several example runs:
$ Rscript blah.r -h
Usage: blah.r [options]
Options:
-f FILELIST, --filelist=FILELIST
comma separated list of files (default blah.txt)
-h, --help
Show this help message and exit
$ Rscript blah.r -f hello.txt
[[1]]
[1] "hello.txt"
character(0)
$ Rscript blah.r -f hello.txt world.txt
[[1]]
[1] "hello.txt"
[1] "world.txt"
$ Rscript blah.r -f hello.txt,world.txt another_argument and_another
[[1]]
[1] "hello.txt" "world.txt"
[1] "another_argument" "and_another"
$ Rscript blah.r an_argument -f hello.txt,world.txt,blah another_argument and_another
[[1]]
[1] "hello.txt" "world.txt" "blah"
[1] "an_argument" "another_argument" "and_another"
Note that for the strsplit, you can use a regular expression to determine the delimiter. I would suggest something like the following, which would let you use commas or colons to separate your list:
myfilelist <- strsplit (opt$filelist,"[,:]")
At the top of your script test.R, put this:
args <- commandArgs(trailingOnly = TRUE)
hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
unlist(strsplit(x, ' '))[-1]
})
options.names <- sapply(listoptions,function(x){
option <- unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)
to get :
$inputfiles
[1] "fileA.txt" "fileB.txt" "fileC.txt"
$printvar
[1] "yes"
$size
[1] "10"
$anotheroption
[1] "helloworld"
After searching around, and wanting to avoid writing a new package from scratch, I figured the best way to input multiple arguments using the package optparse is to separate input files by a character that is unlikely to appear in a file name (for example, a colon):
Rscript test.R --inputfiles fileA.txt:fileB.txt:fileC.txt etc...
File names can also have spaces in them as long as the spaces are escaped (the shell then passes the whole string through as a single argument):
Rscript test.R --inputfiles file\ A.txt:file\ B.txt:fileC.txt etc...
Ultimately, it would be nice to have a package (possibly a modified version of optparse) that would support multiple arguments as mentioned in the question and below:
Rscript test.R --inputfiles fileA.txt fileB.txt fileC.txt
One would think such trivial features would be implemented into a widely used package such as optparse
Cheers
@agstudy's solution does not work properly if the input arguments are lists of the same length. By default, sapply will collapse results of the same length into a matrix rather than a list. The fix is simple enough: just explicitly set simplify to FALSE in the sapply call that parses the arguments.
args <- commandArgs(trailingOnly = TRUE)
hh <- paste(unlist(args),collapse=' ')
listoptions <- unlist(strsplit(hh,'--'))[-1]
options.args <- sapply(listoptions,function(x){
unlist(strsplit(x, ' '))[-1]
}, simplify=FALSE)
options.names <- sapply(listoptions,function(x){
option <- unlist(strsplit(x, ' '))[1]
})
names(options.args) <- unlist(options.names)
print(options.args)
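The same parsing can be wrapped in a function that takes the argument vector explicitly, which makes the list-versus-matrix behaviour easy to check without going through Rscript. This is a sketch only: parse_opts() is a hypothetical name, lapply is used in place of sapply(..., simplify=FALSE), and, like the code above, it assumes that neither option values nor file names contain spaces or "--".

```r
# Hypothetical helper: turn c("--a", "1", "2", "--b", "3") into
# list(a = c("1", "2"), b = "3")
parse_opts <- function(args) {
  hh <- paste(args, collapse = " ")
  listoptions <- unlist(strsplit(hh, "--"))[-1]   # drop the empty first piece
  tokens <- lapply(listoptions, function(x) unlist(strsplit(x, " ")))
  values <- lapply(tokens, function(tk) tk[-1])   # lapply always returns a list
  names(values) <- vapply(tokens, function(tk) tk[1], character(1))
  values
}

# Two options with the same number of values: still a list, not a matrix
opts <- parse_opts(c("--a", "1", "2", "--b", "3", "4"))
print(opts)
```

Inside a script, the call would be parse_opts(commandArgs(trailingOnly = TRUE)).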
I had this same issue, and the workaround that I developed is to adjust the input command line arguments before they are fed to the optparse parser, by concatenating whitespace-delimited input file names together using an alternative delimiter such as a "pipe" character, which is unlikely to be used as part of a file name.
The adjustment is then reversed at the end again, by removing the delimiter using str_split().
Here is some example code:
#!/usr/bin/env Rscript
library(optparse)
library(stringr)
# ---- Part 1: Helper Functions ----
# Function to collapse multiple input arguments into a single string
# delimited by the "pipe" character
insert_delimiter <- function(rawarg) {
# Identify index locations of arguments with "-" as the very first
# character. These are presumed to be flags. Prepend with a "dummy"
# index of 0, which we'll use in the index step calculation below.
flagloc <- c(0, which(str_detect(rawarg, '^-')))
# Additionally, append a second dummy index at the end of the real ones.
n <- length(flagloc)
flagloc[n+1] <- length(rawarg) + 1
concatarg <- c()
# Counter over the output command line arguments, with multiple input
# command line arguments concatenated together into a single string as
# necessary
ii <- 1
# Counter over the flag index locations
for(ij in seq(1,length(flagloc)-1)) {
# Calculate the index step size between consecutive pairs of flags
step <- flagloc[ij+1]-flagloc[ij]
# Case 1: empty flag with no arguments
if (step == 1) {
# Ignore dummy index at beginning
if (ij != 1) {
concatarg[ii] <- rawarg[flagloc[ij]]
ii <- ii + 1
}
}
# Case 2: standard flag with one argument
else if (step == 2) {
concatarg[ii] <- rawarg[flagloc[ij]]
concatarg[ii+1] <- rawarg[flagloc[ij]+1]
ii <- ii + 2
}
# Case 3: flag with multiple whitespace delimited arguments (not
# currently handled correctly by optparse)
else if (step > 2) {
concatarg[ii] <- rawarg[flagloc[ij]]
# Concatenate multiple arguments using the "pipe" character as a delimiter
concatarg[ii+1] <- paste0(rawarg[(flagloc[ij]+1):(flagloc[ij+1]-1)],
collapse='|')
ii <- ii + 2
}
}
return(concatarg)
}
# Function to remove "pipe" character and re-expand parsed options into an
# output list again
remove_delimiter <- function(rawopt) {
outopt <- list()
for(nm in names(rawopt)) {
if (typeof(rawopt[[nm]]) == "character") {
outopt[[nm]] <- unlist(str_split(rawopt[[nm]], '\\|'))
} else {
outopt[[nm]] <- rawopt[[nm]]
}
}
return(outopt)
}
# ---- Part 2: Example Usage ----
# Prepare list of allowed options for parser, in standard fashion
option_list <- list(
make_option(c('-i', '--inputfiles'), type='character', dest='fnames',
help='Space separated list of file names', metavar='INPUTFILES'),
make_option(c('-p', '--printvar'), type='character', dest='pvar',
help='Valid options are "yes" or "no"',
metavar='PRINTVAR'),
make_option(c('-s', '--size'), type='integer', dest='sz',
help='Integer size value',
metavar='SIZE')
)
# This is the customary pattern that optparse would use to parse command line
# arguments, however it chokes when there are multiple whitespace-delimited
# options included after the "-i" or "--inputfiles" flag.
#opt <- parse_args(OptionParser(option_list=option_list),
# args=commandArgs(trailingOnly = TRUE))
# This works correctly
opt <- remove_delimiter(parse_args(OptionParser(option_list=option_list),
args=insert_delimiter(commandArgs(trailingOnly = TRUE))))
print(opt)
Assuming the above file were named fix_optparse.R, here is the output result:
> chmod +x fix_optparse.R
> ./fix_optparse.R --help
Usage: ./fix_optparse.R [options]
Options:
-i INPUTFILES, --inputfiles=INPUTFILES
Space separated list of file names
-p PRINTVAR, --printvar=PRINTVAR
Valid options are "yes" or "no"
-s SIZE, --size=SIZE
Integer size value
-h, --help
Show this help message and exit
> ./fix_optparse.R --inputfiles fileA.txt fileB.txt fileC.txt --printvar yes --size 10
$fnames
[1] "fileA.txt" "fileB.txt" "fileC.txt"
$pvar
[1] "yes"
$sz
[1] 10
$help
[1] FALSE
>
A minor limitation with this approach is that if any of the other arguments have the potential to accept a "pipe" character as a valid input, then those arguments will not be treated correctly. However I think you could probably develop a slightly more sophisticated version of this solution to handle that case correctly as well. This simple version works most of the time, and illustrates the general idea.

loop loading pairs of files

I am writing a loop that takes two files per run, e.g. a0.txt and b0.txt. I am running this over 100 files that run from a0.txt and b0.txt to a999.txt and b999.txt. The pattern I use works perfectly if I do the run for files a0 and b0 to a9 and b9 with only file pairs 0-9 in the directory. But when I put more files in the directory and do the run from 0:10, the loop fails and confuses vectors in files. I think this is because of the pattern I use, i.e.
list.files(pattern=paste('.', x, '\\.txt', sep=''))
This only looks for files that match '.', x, '\\.txt'.
So if '.' matches a and x=1, it finds file a1. But I think it gets confused between a0 and a10 when I do the run over more files, and I cannot seem to find an appropriate pattern that also handles files up to a999 and b999.
Can anyone help with a better way to do this? Code below.
dostuff <- function(x)
{
files <- list.files(pattern=paste('.', x, '\\.txt', sep=''))
a <- read.table(files[1],header=FALSE) #file a0.txt
G <- a$V1-a$V2
b <- read.table(files[2],header=FALSE) #file b0.txt
as.factor(b$V2)
q <- tapply(b$V3,b$V2,Fun=length)
H <- b$V1-b$V2
model <- lm(G~H)
return(model$coefficients[2],q)
}
results <- sapply(0:10,dostuff)
Error in tapply(b$V3, b$V2, FUN = length) : arguments must have same length
How about getting the files directly, without searching? For example:
dostuff <- function(x)
{
a.filename <- paste('a', x, '.txt', sep='') # a<x>.txt
b.filename <- paste('b', x, '.txt', sep='') # b<x>.txt
a <- read.table(a.filename, header=FALSE)
# [...]
b <- read.table(b.filename, header=FALSE)
# [...]
}
But the error message says the problem is caused by the call to tapply rather than anything about incorrect file names, and I have literally no idea how that could happen, since I thought a data frame (which read.table creates) always has the same number of rows for each column. Did you copy-paste that error message out of R? (I have a feeling there might be a typo, so that it was, for example, q <- tapply(a$V3,b$V2,Fun=length). But I could easily be wrong.)
Also, as.factor(b$V2) doesn't modify b$V2, it just returns a factor representing b$V2: after you call as.factor, b$V2 is still a vector. You need to assign it to something, e.g.:
V2.factor <- as.factor(b$V2)
If the beginning of the two files is always the same (a,b in your example); you could use this information in the pattern:
x <- 1
list.files(pattern=paste('[ab]', x, '\\.txt', sep=''))
# [1] "a1.txt" "b1.txt"
x <- 11
list.files(pattern=paste('[ab]', x, '\\.txt', sep=''))
# [1] "a11.txt" "b11.txt"
Edit: and you should include the ^ as well, as Wojciech proposed. ^ matches the beginning of a line or in your case the beginning of the filename.
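The difference between the original pattern and an anchored one can be checked with grepl on a plain character vector, without touching the file system. The file names below are invented for illustration; list.files(pattern = ...) applies the same kind of regular expression to the names in a directory.

```r
# Made-up file names standing in for the directory contents
files <- c("a1.txt", "b1.txt", "a11.txt", "b21.txt", "a10.txt")
x <- 1

# Original pattern: '.' matches ANY character, including a digit
loose <- paste0(".", x, "\\.txt")
# Anchored pattern: exactly one of a/b, then x, then ".txt", nothing else
strict <- paste0("^[ab]", x, "\\.txt$")

loose_hits <- files[grepl(loose, files)]
strict_hits <- files[grepl(strict, files)]

print(loose_hits)
# [1] "a1.txt"  "b1.txt"  "a11.txt" "b21.txt"
print(strict_hits)
# [1] "a1.txt" "b1.txt"
```

With the loose pattern, files[1] and files[2] are no longer guaranteed to be the a/b pair for x once names such as a11.txt or b21.txt are present, which is consistent with the loop reading mismatched files.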
