Is it possible to wrap an R function to amend its functionality?
Here's a toy example to explain what I mean. Consider this function sum2:
sum2 <- function (x) if (length(x) == 1) { cat(x); sum(x) } else sum(x)
It does what sum does, with a tiny modification. Suppose I'd like to redefine sum itself to do what sum2 does here. How can I do this in a general way, without knowing anything about the internals of the function I'm wrapping?
I would like to do this to temporarily "fix" a package function without having to modify and reinstall the package. I would like to check its inputs and return a special value in case the input satisfies some condition.
(For those who are deeply familiar with Mathematica, I'm looking for something similar to the Gayley-Villegas trick.)
You need to be careful with this. All packages now have namespaces, and functions in a package call other functions from the same namespace. Your approach will probably work when you call the function from the main command prompt, but functions inside the package will still call the original function, not your modification.
Look at the help for assignInNamespace and related functions for ways to make the changes within the Namespace. The trace function is another way to modify a function in place, adding some additional code to the existing function.
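The namespace caveat can be seen with a small experiment. This is only a sketch: it masks var() in the global environment with a deliberately wrong stand-in (my own choice for illustration) and shows that stats::sd(), which calls var() internally, is unaffected.

```r
# Mask var() in the global environment with a deliberately wrong version.
var <- function(x, ...) 0

masked     <- var(c(1, 2, 3))        # uses the global replacement
unaffected <- stats::sd(c(1, 2, 3))  # sd() still finds the original var()
                                     # inside namespace:stats

print(masked)
print(unaffected)

rm(var)  # remove the mask so the original var() is visible again
```

A change made with assignInNamespace, by contrast, is aimed at the binding that the package's own functions look up.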
Something along these lines has worked:
sum2 <- sum
sum <- function (x) if (length(x) == 1) { cat(x); sum2(x) } else sum2(x)
What I did not realize is that I could just store the original definition of sum in sum2 so I can call it from the redefined sum.
As Matthew notes this won't override sum when it is called as base::sum.
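The same idea can be written once, in a general way, as a function that builds the wrapper. This is a minimal sketch, not an established API: wrap_with_check, condition and special are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: returns a new function that checks its arguments
# first and otherwise falls back to the original function.
wrap_with_check <- function(f, condition, special) {
  force(f)  # capture the original definition now, before any rebinding
  function(...) {
    if (condition(...)) return(special)
    f(...)
  }
}

# Return NA instead of 0 when sum() is called with nothing to add up.
sum_checked <- wrap_with_check(sum, function(...) length(c(...)) == 0, NA)

sum_checked(1, 2, 3)  # falls through to the original sum()
sum_checked()         # condition is met, so the special value is returned
```

Because the original is captured inside the closure, assigning the result back to the name sum at the prompt would not cause infinite recursion, but the namespace caveat above still applies.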
This has probably been answered already and in that case, I am sorry to repeat the question, but unfortunately, I couldn't find an answer to my problem. I am currently trying to work on the readability of my code and trying to use functions more frequently, yet I am not that familiar with it.
I have a data.frame in which some columns contain NAs that I want to interpolate with, in this case, a simple Kalman filter.
require(imputeTS)
#some test data
col <- c("Temp","Prec")
df_a <- data.frame(c(10,13,NA,14,17),
c(20,NA,30,NA,NA))
names(df_a) <- col
#this is my function I'd like to use
gapfilling <- function(df,col){
  print(sum(is.na(df[,col])))
  df[,col] <- na_kalman(df[,col])
}
#this is my for-loop to loop through the columns
for (i in col) {
  gapfilling(df_a, i)
}
I have two problems:
My for loop works, yet it doesn't overwrite the data.frame. Why?
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
How can I achieve this without a for-loop? As far as I am aware you should avoid for-loops if possible and I am sure it's possible in my case, I just don't know how.
You most definitely do not have to avoid for loops. What you should avoid is using a loop to perform actions that could be vectorized. Loops in R are generally fine: they are (much) slower than loops in compiled languages such as C++, but comparable to loops in languages such as Python.
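To illustrate what "could be vectorized" means, here is a small sketch with made-up data: squaring each element in a loop versus applying the operation to the whole vector at once.

```r
x <- c(1.5, 2, 3.25)

# Loop version: one element at a time.
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- x[i]^2
}

# Vectorized version: the operator works on the whole vector.
out_vec <- x^2

all.equal(out_loop, out_vec)
```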
My for loop works, yet it doesn't overwrite the data.frame. Why?
This is a problem with overwriting values within a function, or what is referred to as scope. Basically any assignment is restricted to its current environment (or scope). Take the example below:
f <- function(x){
  a <- x
  cat("a is equal to ", a, "\n")
  return(3)
}
x <- 4
f(x)
a is equal to 4
[1] 3
print(a)
Error in print(a) : object 'a' not found
As you can see, "a" definitely exists while the function is running, but it stops existing once the function call has finished. It is restricted to the environment (or scope) of the function, and here that environment only lasts for the duration of the call.
To alleviate this, you have to overwrite the value in the global environment
for (i in col) {
  df_a[, i] <- gapfilling(df_a, i)
}
Now for readability (not speed) one could change this to an lapply call:
df_a[, col] <- lapply(df_a[, col], na_kalman)
I want to stress that this is not faster than using a loop. lapply iterates over each column, just as a loop would. A speed-up would only be possible if, say, na_kalman were written to take multiple columns at once and saved time using optimized C or C++ code.
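Both fixes can be tried without installing imputeTS. In the sketch below, fill_mean is a hypothetical stand-in for na_kalman (it simply replaces NAs with the column mean), and gapfilling is changed to return the modified data frame explicitly; both are assumptions made for illustration.

```r
# Stand-in for imputeTS::na_kalman (assumption: any function that takes a
# numeric vector and returns it with the NAs filled behaves the same way here).
fill_mean <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

cols <- c("Temp", "Prec")
df_a <- data.frame(Temp = c(10, 13, NA, 14, 17),
                   Prec = c(20, NA, 30, NA, NA))
df_b <- df_a  # second copy for the lapply variant

# Variant 1: the function returns the modified copy, the caller reassigns it.
gapfilling <- function(df, col, fill = fill_mean) {
  df[, col] <- fill(df[, col])
  df
}
for (i in cols) {
  df_a <- gapfilling(df_a, i)
}

# Variant 2: lapply over the columns and assign the result back.
df_b[, cols] <- lapply(df_b[, cols], fill_mean)

identical(df_a, df_b)
```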
Can anyone help me understand what a wrapper function in r is? I would really appreciate if you could explain it with the help of examples on building one's own wrapper function and when to use one.
Thanks in advance.
Say I want to use mean() but I want to set some default arguments, and my use case doesn't allow me to add additional arguments when I'm actually calling mean().
I could create a wrapper function:
mean_noNA <- function(x) {
  return(mean(x, na.rm = TRUE))
}
mean_noNA is a wrapper for mean() where we have set na.rm to TRUE.
Now we could use mean_noNA(x) the same as mean(x, na.rm = TRUE).
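If the wrapper should still accept the other arguments of mean(), one common pattern is to forward them with "...". This variant is my own extension of the example above, not part of the original answer.

```r
# Wrapper that fixes na.rm = TRUE but passes everything else through.
mean_noNA <- function(x, ...) {
  mean(x, na.rm = TRUE, ...)
}

plain   <- mean_noNA(c(1, NA, 3))
trimmed <- mean_noNA(c(1, 2, 3, 100, NA), trim = 0.25)  # trim is forwarded to mean()

print(plain)
print(trimmed)
```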
Wrapper functions occur in any programming language, and the term just means that you are "wrapping" one function inside another function that alters how it works in some useful way. When we refer to a "wrapper" function, we mean a function whose main purpose is to call some internal function; there may be some alteration or additional computation in the wrapper, but this is sufficiently minor that the original function constitutes the bulk of the computation.
As an example, consider the following wrapper function for the log function in R. One of the drawbacks of the original function is that it does not work properly for negative numeric inputs (it gives NaN with a warning message). We can remedy this by creating a "wrapper" function that turns it into the complex logarithm:
Log <- function(x, base = exp(1)) {
  LOG <- base::log(as.complex(x), base = base)
  if (all(Im(LOG) == 0)) { LOG <- Re(LOG) }
  LOG
}
The function Log is a "wrapper" for log that adjusts it so that it will now accept numeric or complex inputs, including negative numeric inputs. In the event that it receives a non-negative numeric or a complex input, it gives the same output as the original log function. However, if it is given a negative numeric input, it gives the complex output that should be returned by the complex logarithm.
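A short usage sketch, repeating the definition so it runs on its own, shows both branches: a positive input comes back as an ordinary numeric, and a negative input comes back as a complex number.

```r
Log <- function(x, base = exp(1)) {
  LOG <- base::log(as.complex(x), base = base)
  if (all(Im(LOG) == 0)) { LOG <- Re(LOG) }
  LOG
}

pos <- Log(10)  # imaginary part is zero, so a plain numeric is returned
neg <- Log(-1)  # complex result; the imaginary part is pi

print(pos)
print(neg)
```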
Currently I'm working on an R project which includes the following code.
vec <- 1:25
fib <- function(x) {
  if (x==0) return(0)
  if (x==1) return(1)
  if (x==2) return(2)
  return(fib(x-1)+fib(x-2))
}
lapply(vec,fib)
I just want to know how R computes the Fibonacci function in code like this. More simply, when it gets to the number 25 in the vector "vec", does R compute the whole function again, or can R compute fib(25) using the values of fib(24) and fib(23), since they have already been computed?
It will compute all the recursive values one by one by default, but you can use an external package like memoise to cache previous values, or do it yourself. Have a look at the following blog which shows this using a Fibonacci function as well.
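The "do it yourself" option can be sketched with an environment used as a cache. The names fib_cache and fib_memo are my own, and the base cases are copied from the question (including fib(2) returning 2); this is one possible approach, not what the memoise package does internally.

```r
# Cache of already computed values, keyed by the argument as a string.
fib_cache <- new.env()

fib_memo <- function(x) {
  key <- as.character(x)
  cached <- fib_cache[[key]]
  if (!is.null(cached)) return(cached)  # reuse an earlier result

  val <- if (x == 0) 0
         else if (x == 1) 1
         else if (x == 2) 2
         else fib_memo(x - 1) + fib_memo(x - 2)

  fib_cache[[key]] <- val  # remember the result for later calls
  val
}

vec <- 1:25
res <- lapply(vec, fib_memo)
```

With the cache in place, each value is computed only once, so by the time lapply reaches 25 the results for 24 and 23 are simply looked up.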
I have this function
ANN<-function (x,y){
  DV<-rep(c(0:1),5)
  X1<-c(1:10)
  X2<-c(2:11)
  ANN<-neuralnet(x~y,hidden=10,algorithm='rprop+')
  return(ANN)
}
I need the function run like
formula=X1+X2
ANN(DV,formula)
and get the result of the function. So the problem is how to tell the function to USE the objects that are created while the function runs. I need to run more combinations of x and y through lapply, so I need it this way. Any advice on how to achieve this? Thanks
I've edited my answer, this still works for me. Does it work for you? Can you be specific about what sort of errors you are getting?
New response:
ANN<-function (y){
X1<-c(1:10)
DV<-rep(c(0:1),5)
X2<-c(2:11)
dat <- data.frame(X1,X2)
ANN<-neuralnet(DV ~y,hidden=10,algorithm='rprop+',data=dat)
return(ANN)
}
formula<-X1+X2
ANN(formula)
If you want to specify the two parts of the formula separately, you should still pass them as formulas.
library(neuralnet)
ANN<-function (x,y){
  DV<-rep(c(0:1),5)
  X1<-c(1:10)
  X2<-c(2:11)
  formula<-update(x,y)
  ANN<-neuralnet(formula,data=data.frame(DV,X1,X2),
                 hidden=10,algorithm='rprop+')
  return(ANN)
}
ANN(DV~., ~X1+X2)
And assuming you're using neuralnet() from the neuralnet library, it seems the data= is required so you'll need to pass in a data.frame with those columns.
Formulas are special because they are not evaluated unless explicitly requested. This is different from just using a symbol, which is evaluated to something in the proper frame as soon as you use it. This means there's a big difference between DV (a "name") and DV~. (a formula). The latter is safer for passing around to functions and evaluating in a different context. Things get much trickier with symbols/names.
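The point about unevaluated formulas, and the lapply use case from the question, can be tried without neuralnet. In this sketch lm() from base R is used purely as a stand-in model function (an assumption for illustration), and rhs_list is a hypothetical list of right-hand sides.

```r
# Creating a formula does not look up DV, X1 or X2, so this works even
# though none of them exist as variables yet.
f <- update(DV ~ ., ~ X1 + X2)
print(f)
print(all.vars(f))

# The data only matter once a model function evaluates the formula.
dat <- data.frame(DV = rep(0:1, 5), X1 = 1:10, X2 = 2:11)

# Loop over several right-hand sides; lm() stands in for neuralnet().
rhs_list <- list(~ X1, ~ X2)
fits <- lapply(rhs_list, function(rhs) lm(update(DV ~ ., rhs), data = dat))
```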
I was wondering if the apply family could be used in R with a regressive input.
Say I have:
apply(MyMatrix,1,MyFunc,MyMatrix)
I know that apply is essentially a loop, so in the above example could it run one iteration of MyFunc over the first line of MyMatrix modifying MyMatrix globally and then select the modified MyMatrix for the next iteration ? I realize that normal loops could be used here but I just wanted to know if there is a way to do it like this.
Thanks
I don't believe so. Even modifying MyMatrix globally won't change the MyMatrix passed to your function. R functions don't operate that way. Your object behaves as if it were copied when it's passed into a function (R makes the actual copy when the object is modified), so the function works on its own instance. It's not passed by reference.
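A small sketch with a made-up matrix shows the effect: the function changes its own copy, and the caller's object stays as it was.

```r
m <- matrix(1:4, nrow = 2)

# Modifies the matrix it was given and returns the modified version.
modify <- function(mat) {
  mat[1, 1] <- 99L
  mat
}

res <- modify(m)

res[1, 1]  # the returned copy carries the change
m[1, 1]    # the original in the calling environment is untouched
```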
Unfortunately, the *apply family of functions is not able to work in this manner. (This has been a frustration to me at times as well, but I've come to appreciate and work with it.)
There are two impediments to this:
The *apply family of functions deal with the value of MyMatrix when you make the call, iterate over the rows (in this example), and then join the results (based on the dimensions of each output). It is not re-evaluated each time.
Even if it did re-evaluate it, MyFunc is only given one row (in this example) at a time, not the whole matrix. (Your second reference to MyMatrix appears to be working around this.)
To do what I think you're describing, your MyFunc function needs to accept as arguments the entire matrix and the row on which you are operating, and return just the row in question, like so:
MyFunc <- function(rownum, mtx) {
  # ...
  mtx[rownum,]
}
Using that premise, you could do:
for (rr in seq.int(nrow(MyMatrix))) {
  MyMatrix[rr,] <- MyFunc(rr, MyMatrix)
}
or, if you must stay with the *apply family:
MyMatrix.new <- sapply(seq.int(nrow(MyMatrix)), MyFunc, MyMatrix)
You might want the transpose (t()) of the return from sapply() here.
If MyFunc returns the whole matrix instead of just one row, this can still be done, though a little differently.
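One possible reading of "a little differently" is Reduce(), which feeds the result of each step into the next one. In this sketch, step is a hypothetical function that takes the current matrix and a row number and returns the whole updated matrix; the update rule (add the current column sums to the row) is made up for illustration.

```r
MyMatrix <- matrix(1:6, nrow = 3)

# Hypothetical update: returns the whole matrix with one row changed,
# based on the current state of the matrix.
step <- function(mtx, rownum) {
  mtx[rownum, ] <- mtx[rownum, ] + colSums(mtx)
  mtx
}

# Reduce passes the matrix returned by one step into the next step.
via_reduce <- Reduce(step, seq_len(nrow(MyMatrix)), MyMatrix)

# The equivalent explicit loop, for comparison.
via_loop <- MyMatrix
for (rr in seq_len(nrow(via_loop))) {
  via_loop <- step(via_loop, rr)
}

identical(via_reduce, via_loop)
```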
I know of no way to directly do what you suggest.