Overwrite a method to extend it, using the original implementation

Whether in a package or occasionally in base R, I sometimes want to add a little flavor to an existing function. Most of the time this is a minor change to what should happen at the start or at the end of the function (silly example: I'd like the cat function to include a newline at the end by default).
Now I know I can simply overwrite an existing method by assigning my new implementation to its name, BUT: how, then, can I still use the old one? In the case of cat, I would have to do something like:
cat <- function(..., file = "", sep = " ", fill = FALSE, labels = NULL,
                append = FALSE)
{
  cat(..., "\n", file = file, sep = sep, fill = fill, labels = labels,
      append = append)
}
This means using the 'old' cat in the implementation of the new one. Now if I understand anything about how calling and late binding in R work, this will simply fail (infinite recursion).
So is there a way of achieving this (in the spirit of object-oriented overrides of functions), without resorting to
- giving my new function another name (I want it to 'just work'),
- saving the old function under some other name (then, when I recreate this function in another R session, I might forget the extra step),
- using the entire source of the original function (as @Andrie said: it is important to have the most elegant solution possible)?
Is there a paradigm for this? Or how could I go about this in the safest way possible? Or am I just wishing for too much?
Edit Given @Andrie's answer: this can be done quite simply. However, Andrie's trick will not work if I want to alter the behaviour of a function in a package that is called by another function in that same package.
As an example: I have made numerous additions to the plotting functions of the glmnet package. But if you look at plot.cv.glmnet and the like, you will see that they forward the call to another function within that package, so I'd really need to inject my new version into the package (which, by the way, can be done with reassignInPackage). But then of course the namespace prefixing will fail, because I've just replaced the namespaced version. This example is not as contrived as it might seem: I've been there quite a few times. On the other hand, maybe somebody will/can argue that I should drop my requirements in that case? Which would be the best way to go then?
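For concreteness, this is the kind of pattern I mean (only a rough sketch: it uses utils::getFromNamespace/assignInNamespace rather than reassignInPackage, and the body of the wrapper is a placeholder):

library(glmnet)

# Capture the original namespaced function before replacing it,
# so the wrapper can still delegate to it without recursing.
orig_plot_cv_glmnet <- getFromNamespace("plot.cv.glmnet", "glmnet")

my_plot_cv_glmnet <- function(x, ...) {
  # ... my own additions would go here (placeholder) ...
  orig_plot_cv_glmnet(x, ...)
}

# Inject the replacement so that internal callers inside glmnet pick it up too.
utils::assignInNamespace("plot.cv.glmnet", my_plot_cv_glmnet, ns = "glmnet")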

If I understand you correctly, I think it's a simple matter of referring to namespace::function, i.e. in this case use base::cat inside your function.
For example:
cat <- function(..., file = "", sep = " ", fill = FALSE, labels = NULL,
                append = FALSE)
{
  base::cat(..., "\n", file = file, sep = sep, fill = fill, labels = labels,
            append = append)
}
> cat("Hello", "world!")
Hello world!
> cat("Text on line 2")
Text on line 2

Related

How to define that some arguments in a data frame might not be used?

I have this data frame:
df <- data.frame(ref_ws = ref_ws,
                 turb_ws = turb_ws,
                 ref_wd = ref_wd,
                 fcf = turb_ws / ref_ws,
                 ref_fi = ref_fi,
                 shear = shear,
                 turbulence_intensity = turbulence_intensity,
                 inflow = inflow,
                 veer = veer)
that is part of a function where I define optional arguments (shear, turbulence_intensity, inflow and veer):
trial_plots <- function(ref_ws, turb_ws, ref_wd, shear, turbulence_intensity, inflow, veer)
The variables ref_ws, turb_ws and ref_wd are mandatory, but the others are optional.
Each optional argument should generate its own plot, but only when that argument is supplied in the call.
For example, if shear is not supplied, I want the function to carry on and try to generate the next plot for turbulence_intensity, and so on.
At the moment this is the error:
Error in data.frame(ref_ws = ref_ws, turb_ws = turb_ws,ref_wd = ref_wd, :
argument "veer" is missing, with no default
How can I define these arguments to be optional?
Hadley recommends using NULL as the default value and testing it with is.null() in the function body:
Sometimes you want to add a non-trivial default value, which might take several lines of code to compute. Instead of inserting that code in the function definition, you could use missing() to conditionally compute it if needed. However, this makes it hard to know which arguments are required and which are optional without carefully reading the documentation. Instead, I usually set the default value to NULL and use is.null() to check if the argument was supplied.
From the Advanced R book.
I think it's useful advice and I personally use it a lot.
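A minimal sketch of that pattern for the function in the question (argument names taken from the question; the plot calls are just placeholders):

trial_plots <- function(ref_ws, turb_ws, ref_wd,
                        shear = NULL, turbulence_intensity = NULL,
                        inflow = NULL, veer = NULL) {
  # Mandatory arguments go into the data frame unconditionally
  df <- data.frame(ref_ws = ref_ws,
                   turb_ws = turb_ws,
                   ref_wd = ref_wd,
                   fcf = turb_ws / ref_ws)
  # Each optional plot is drawn only when the argument was supplied
  if (!is.null(shear)) {
    plot(ref_ws, shear, main = "Shear")
  }
  if (!is.null(turbulence_intensity)) {
    plot(ref_ws, turbulence_intensity, main = "Turbulence intensity")
  }
  if (!is.null(inflow)) {
    plot(ref_ws, inflow, main = "Inflow")
  }
  if (!is.null(veer)) {
    plot(ref_ws, veer, main = "Veer")
  }
  invisible(df)
}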

Using strings from loops as parts of function commands and variable names in R

How does one use a string coming from a loop
- to generate new variables,
- as part of a function call,
- as a function argument,
- as part of an if statement
in R?
Specifically, as an example (the code obviously doesn't work, but I'd like to end up with something no less intelligible than what is below),
list_dist <- c("unif","norm")
for (dist in list_dist){
paste("rv",dist,sep="") = paste("r",dist,sep="")(100,0,1)
paste("meanrv",dist,sep="") = mean(paste("rv",dist,sep=""))
if (round(paste("meanrv",dist,sep=""),3) != 0){
print("Not small enough")
}
}
Note: this is just an example; I do need to use some kind of loop to avoid writing huge scripts.
I have managed to use strings as in the example above, but only with eval/parse/text/paste, building the whole statement (i.e. the whole "line") inside paste instead of pasting only the variable-name part or the function part, which makes the code ugly, illegible and inefficient to write.
The replies I've seen to similar questions don't specifically address this kind of use of strings coming from loops.
I'm sure there must be a more efficient and flexible way to deal with this, as there is in some other languages.
Thanks in advance!
Resist the temptation of creating variable names programmatically. Instead, structure your data properly into lists:
list_dist = list(unif = runif, norm = rnorm)
distributions = lapply(list_dist, function (f) f(100, 0, 1))
means = unlist(lapply(distributions, mean))
# … etc.
As you can see, this also gets rid of the loop, by using list functions instead.
Your last step can also be vectorised:
if (any(round(means, 3) != 0))
    warning('not small enough')
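Because lapply() keeps the list names, you can then look results up by name rather than pasting variable names together, for example:

means[["norm"]]          # mean of the rnorm sample
distributions$unif[1:5]  # first few draws from runif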
try this:
list_dist <- list(unif = runif, norm = rnorm)
for (i in 1:length(list_dist)) {
  assign(paste("rv", names(list_dist)[i], sep = ""), list_dist[[i]](100, 0, 1))
  assign(paste("meanrv", names(list_dist)[i], sep = ""),
         mean(get(paste("rv", names(list_dist)[i], sep = ""))))
  if (round(get(paste("meanrv", names(list_dist)[i], sep = "")), 3) != 0) {
    print("Not small enough")
  }
}
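If you go this route, the objects created by assign() end up in the calling environment and can be inspected afterwards, e.g.:

ls(pattern = "^(rv|meanrv)")  # rvunif, rvnorm, meanrvunif, meanrvnorm
meanrvunif                    # same value as mean(rvunif)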

Set R to include missing data – How can I set the `useNA="ifany"` option for `table()` as default?

In many cases being aware of missing data is crucial and ignoring them can seriously impair your analysis.
Therefore I'd like to set useNA = "ifany" as the default for table(), ideally in a way similar to options(stringsAsFactors = FALSE).
I found the ugly hack below, but there must be a better way, preferably without having to define a function.
https://stat.ethz.ch/pipermail/r-help/2010-January/223871.html
tableNA <- function(x) {
  varname <- deparse(substitute(x))
  assign(varname, x)
  tabNA <- table(get(varname), useNA = "always")
  names(attr(tabNA, "dimnames")) <- varname
  return(tabNA)
}
Well, you do need to define a function1, but you can reuse the existing name (and make the definition much leaner):
table = function (..., useNA = 'ifany') base::table(..., useNA = useNA)
This will make the new functionality available under the old name – but only in your code, so it’s “safe” (i.e. it doesn’t change packages’ use of table).
We use ... to allow arbitrary arguments to be passed, and we give useNA the desired default value of 'ifany'. Inside the function, we just call the “real” table function. But in order to avoid calling ourselves, we specify the namespace in which it’s found: base. And we just pass all the arguments untouched.
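A quick illustration of the new default (output approximately as shown):

x <- c(1, 2, NA, 2)
table(x)
# x
#    1    2 <NA>
#    1    2    1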
1 Just look at the source code of table – it doesn’t query any option in setting the argument, so there can be no way of determining the setting of that argument via an option.

How to avoid prepending .self when using eval in a reference class in R?

I need to use eval to call a reference class method. Below is a toy example:
MyClass <- setRefClass("MyClass",
  fields = c("my_field"),
  methods = list(
    initialize = function() {
      my_field <<- 3
    },
    hello = function() {
      "hello"
    },
    run = function(user_defined_text) {
      eval(parse(text = user_defined_text))
    }
  )
)

p <- MyClass$new()
p$run("hello()")       # Error: could not find function "hello" - doesn't work
p$run(".self$hello()") # "hello" - it works
p$run("hello()")       # "hello" - now it works?!

p <- MyClass$new()
p$run("my_field")      # 3 - no need to add .self
I guess I could do eval(parse(text = paste0(".self$", user_defined_text))), but I don't really understand:
why is .self needed to eval methods, but not fields?
why is .self no longer needed after it has been used once?
'Why' questions are always challenging to answer; usually the answer is 'because'. In ?setRefClass we eventually find:
Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method
requires a particular other method, the first method should
include a call to '$usingMethods()' with the name of the other
method as an argument. Declaring the methods this way is essential
if the other method is used indirectly (e.g., via 'sapply()' or
'do.call()'). If it is called directly, code analysis will find
it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.
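If that mechanism is indeed what bites here, a sketch of the declaration the help page describes (untested against the example above):

MyClass <- setRefClass("MyClass",
  fields = "my_field",
  methods = list(
    initialize = function() {
      my_field <<- 3
    },
    hello = function() {
      "hello"
    },
    run = function(user_defined_text) {
      usingMethods("hello")  # declare the indirect use so hello() gets installed
      eval(parse(text = user_defined_text))
    }
  )
)

p <- MyClass$new()
p$run("hello()")  # should now find hello() on the first call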
I'm not sure this is entirely helpful in your case, where the user is apparently able to specify any method. Offering a little unasked editorial comment, I'm not sure 'why' you'd want to write a method that would parse input text to methods; I've never used that paradigm myself.

Creating Automation in R

I have created a script that analyzes a set of raw data and converts it into many different formats based on different parameters and functions. I have 152 more raw data sheets to go, and all I will have to do is run my script on each one. However, there will be times when I decide I need to change a variable or parameter, so I would like to have a parameter list at the top of my script that affects the rest of the functions in my soon-to-be very large script.
Global variables aren't the answer to this problem; this is best illustrated with an example:
exceedes <- function(L = NULL, R = NULL)
{
  if (is.null(L) | is.null(R))
  {
    print("mycols: invalid L,R.")
    return(NULL)
  }
  options(na.rm = TRUE)
  test <- mean(L, na.rm = TRUE) - R * sd(L, na.rm = TRUE)
  test1 <- ifelse(is.na(L), NA, ifelse(L > test, 1, 0))
  return(test1)
}

L <- ROCC[, 2]
R <- .08
ROCC$newcolumn <- exceedes(L, R)
names(ROCC)[names(ROCC) == "newcolumn"] <- "Exceedes1"

L <- ROCC[, 2]
R <- .16
ROCC$newcolumn <- exceedes(L, R)
names(ROCC)[names(ROCC) == "newcolumn"] <- "Exceedes2"

L <- ROCC[, 2]
R <- .24
ROCC$newcolumn <- exceedes(L, R)
names(ROCC)[names(ROCC) == "newcolumn"] <- "Exceedes3"
So in the above example, I would like a way, at the top of my script, to change the range of R and have it affect the rest of the script, because this function will be repeated 152 times. The only way I can think of doing it is to copy and paste the function over and over with a different variable each time and set it globally. But I have to imagine there is a simpler way; perhaps my function needs to be rearranged?
File names and output names. I am not sure whether this is possible, but say all my input CSVs come in a format where one dataset is titled 123, the next 124, another 125, and so on. Can R be made to pick up the next dataset and write it out to a specific folder on my computer, without me having to type read.csv(file = "123.csv") and then write.csv(example, file = "123.csv") each time?
General formatting of the automation script. Before I dive into my automation, my plan was to copy and paste the script 152 times and then change the file name and output name in each copy. This sounds ridiculous, but with my lack of programming skills I am not sure of a better way to do it. Any ideas?
Thanks for all the help in advance.
You can rerun the function with different parameters by constructing a vector of parameters (say R):
R <- seq(0.1, 1, by = 0.01)
and then run your exceedes function length(R) times using sapply.
exceedes <- function(R, L) {} #notice the argument order
sapply(X = R, FUN = exceedes, L = ROCC[, 2])
You can pass other arguments to your function (e.g. file.name) and use it to create whatever file name you need.
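For the file handling, a sketch of the kind of batching loop you could wrap around it (the folder names and the 123.csv naming scheme are assumptions taken from the question, and exceedes is assumed to use the reordered argument list above):

in_dir  <- "raw_data"
out_dir <- "results"
files   <- list.files(in_dir, pattern = "\\.csv$")

for (f in files) {
  ROCC <- read.csv(file.path(in_dir, f))
  # add one Exceedes column per value in the R vector defined above
  for (i in seq_along(R)) {
    ROCC[[paste0("Exceedes", i)]] <- exceedes(R[i], L = ROCC[, 2])
  }
  write.csv(ROCC, file.path(out_dir, f), row.names = FALSE)
}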
