Overriding a method of data.table - failing to perfectly forward arguments - r

I'm able to successfully modify the behaviour of [.data.frame, but fail to do so for [.data.table.
For data.frame:
# Exact same signature as "[.data.frame":
"[.my.data.frame" <- function (x, i, j,
                               drop = if (missing(i)) TRUE
                                      else length(cols) == 1) {
  if (!missing(j) && j == 8) {
    cat("Oy vey\n")
  }
  NextMethod()
}
df <- data.frame(a=1,b=2)
class(df) <- c("my.data.frame", class(df))
# Works as expected:
df[1,2] # 2
df[1,8] # Oy Vey NULL
df[1,] # 1 2
However, for (the considerably more complicated) data.table:
# Exact same signature as "[.data.table":
"[.my.data.table" <- function (x, i, j, by, keyby, with = TRUE,
                               nomatch = getOption("datatable.nomatch"),
                               mult = "all", roll = FALSE,
                               rollends = if (roll == "nearest") c(TRUE, TRUE)
                                          else if (roll >= 0) c(FALSE, TRUE) else c(TRUE, FALSE),
                               which = FALSE, .SDcols, verbose = getOption("datatable.verbose"),
                               allow.cartesian = getOption("datatable.allow.cartesian"),
                               drop = NULL, on = NULL) {
  if (!missing(j) && j == 8) {
    cat("Oy vey\n")
  }
  NextMethod()
}
library(data.table)
dt <- data.table(a=1,b=2)
class(dt) <- c("my.data.table", class(dt))
dt[1,2] # ERROR: i is not found in calling scope and it is not a column of type logical. When the first argument inside DT[...] is a single symbol, data.table looks for it in calling scope.
I know better than to pass arguments to NextMethod. It looks like I must call [.data.table explicitly, capture and pass the arguments as unevaluated promises - but all my attempts with quote, substitute or match.call have so far failed. Any insight would be appreciated.

I've found a partial solution, posting here in hope someone might improve on it.
"[.my.data.table" <- function (x, ...) {
  # Modifications and tests galore - which can be tricky with this signature
  class(x) <- class(x)[-1]
  ret <- x[...]
  class(x) <- c("my.data.table", class(x))
  ret
}
I still consider this partial, because actually doing something in the function probably involves at least something like arglist <- list(...), and this fails when [ is called with an empty argument, like this:
dt[1,]
Other directions are still very welcome.
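One way around the list(...) failure is to inspect the arguments without evaluating them. The sketch below is a hypothetical helper (empty_args is not part of data.table or base R) that uses substitute(list(...)) to capture the arguments as unevaluated expressions, where a blank slot such as the column position in dt[1, ] shows up as the "empty argument" marker instead of raising an error. It only illustrates the capture step; wiring it into a [.my.data.table method is left as an assumption.

```r
# Hypothetical helper: report which arguments arriving through ... are empty,
# like the blank column slot in dt[1, ].
empty_args <- function(...) {
  # substitute(list(...)) returns the call with the arguments unevaluated;
  # an empty slot is kept as the empty-argument marker rather than erroring
  # the way list(...) does.
  args <- as.list(substitute(list(...)))[-1L]
  # quote(expr = ) yields that same marker, so identical() can detect it.
  vapply(seq_along(args),
         function(k) identical(args[[k]], quote(expr = )),
         logical(1))
}

empty_args(1, )
empty_args(1, 2)
```

The same unevaluated list also lets you look at j (for example, to check whether it is the literal 8) without forcing evaluation in the wrong scope.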

Related

Pass optional arguments to function, three dots

I'm confused how ... works.
tt = function(...) {
return(x)
}
Why doesn't tt(x = 2) return 2?
Instead it fails with the error:
Error in tt(x = 2) : object 'x' not found
Even though I'm passing x as argument ?
Because everything you pass in the ... stays in the .... Variables you pass that aren't explicitly captured by a parameter are not expanded into the local environment. The ... should be used for values your current function doesn't need to interact with at all, but that some later function does need, so they can easily be passed along inside the .... It's meant for a scenario like
ss <- function(x) {
  x
}
tt <- function(...) {
  return(ss(...))
}
tt(x=2)
If your function needs the variable x to be defined, it should be a parameter:
tt <- function(x, ...) {
  return(x)
}
If you really want to expand the dots into the current environment (and I strongly suggest that you do not), you can do something like
tt <- function(...) {
  list2env(list(...), environment())
  return(x)
}
If you define the three dots as an argument of your function and want it to work, you need to tell your function where the dots actually go. In your example you neither define x as an argument nor use ... anywhere in the body of your function. An example that actually works is:
tt <- function(x, ...){
  mean(x, ...)
}
x <- c(1, 2, 3, NA)
tt(x)
#[1] NA
tt(x, na.rm = TRUE)
#[1] 2
Here ... refers to any other arguments that the function mean might take; in addition, you have a regular argument x. In the first example, tt(x) just returns mean(x), while in the second example, tt(x, na.rm = TRUE) passes the second argument na.rm = TRUE on to mean, so tt returns mean(x, na.rm = TRUE).
Another idiom that R's own programmers use a lot is list(...), as in
tt <- function(...) {
  args <- list(...) # As in this
  if ("x" %in% names(args))
    return(args$x)
  else
    return("Something else.")
}
tt(x = 2)
#[1] 2
tt(y = 1, 2)
#[1] "Something else."
I believe this is one of their favorite ways of handling the dots arguments, if not the favorite.
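A small extension of the list(...) idiom is to merge the named arguments over a list of defaults with base R's modifyList(), so a missing x falls back to a default instead of needing an explicit if/else. This is a sketch; tt_defaults and its default value are illustrative, and modifyList() silently ignores unnamed arguments such as the 2 in tt_defaults(y = 1, 2).

```r
# Hypothetical variant of tt(): named arguments in ... override the defaults.
tt_defaults <- function(...) {
  defaults <- list(x = "Something else.")
  # modifyList() copies every *named* element of list(...) over defaults;
  # unnamed elements are skipped.
  args <- modifyList(defaults, list(...))
  args$x
}

tt_defaults(x = 2)
tt_defaults(y = 1, 2)
```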

Can I override `$` or `[[` to throw an error instead of NULL when asking for a missing list element?

My hunch is this is an abuse of the R language and there's a good reason this doesn't happen. But I find this to be a perpetual source of insidious errors in code that I'm trying to debug:
MWE
list.1 <- list(a=1,b=2,c=list(d=3))
list.2 <- list(b=4,c=list(d=6,e=7))
input.values <- list(list.1,list.2)
do.something.to.a.list <- function(a.list) {
  a.list$b <- a.list$c$d + a.list$a
  a.list
}
experiment.results <- lapply(input.values,do.something.to.a.list)
use.results.in.some.other.mission.critical.way <- function(result) {
  result <- result^2
  patient.would.survive.operation <- mean(c(-5,result)) >= -5
  if(patient.would.survive.operation) {
    print("Congrats, the patient would survive! Good job developing a safe procedure.")
  } else {
    print("Sorry, the patient won't make it.")
  }
}
lapply(experiment.results, function(x)
  use.results.in.some.other.mission.critical.way(x$b))
YES I am aware this is a stupid example and that I could just add a check for the existence of the element before trying to access it. But I'm not asking to know what I could do, if I had perfect memory and awareness at all times, to work slowly around the fact that this feature is inconvenient and causes me lots of headache. I'm trying to avoid the headache altogether, perhaps at the cost of code speed.
So: what I want to know is...
(a) Is it possible to do this? My initial attempt failed, and I got stuck trying to read the C internals for "$" to understand how to handle the arguments correctly.
(b) If so, is there a good reason not to (or to) do this?
Basically, my idea is that instead of writing every single function that depends on a non-NULL return from list access to check really carefully, I can write just one function to check carefully and trust that the rest of the functions won't get called with unmet preconditions, because the failed list access will fail fast.
You can override almost anything in R (except certain special values: NULL, NA, NA_integer_, NA_real_, NA_complex_, NA_character_, NaN, Inf, TRUE and FALSE, as far as I'm aware).
For your specific case, you could do this:
`$` <- function(x, i) {
  if (is.list(x)) {
    i_ <- deparse(substitute(i))
    x_ <- deparse(substitute(x))
    if (i_ %in% names(x)) {
      eval(substitute(base::`$`(x, i)), envir = parent.frame())
    } else {
      stop(sprintf("\"%s\" not found in `%s`", i_, x_))
    }
  } else {
    eval(substitute(base::`$`(x, i)), envir = parent.frame())
  }
}
`[[` <- function(x, i) {
  if (is.list(x) && is.character(i)) {
    x_ <- deparse(substitute(x))
    if (i %in% names(x)) {
      base::`[[`(x, i)
    } else {
      stop(sprintf("\"%s\" not found in `%s`", i, x_))
    }
  } else {
    base::`[[`(x, i)
  }
}
Example:
x <- list(a = 1, b = 2)
x$a
#[1] 1
x$c
#Error in x$c : "c" not found in `x`
col1 <- "b"
col2 <- "d"
x[[col1]]
#[1] 2
x[[col2]]
#Error in x[[col2]] : "d" not found in `x`
It will slow your code down quite a bit:
microbenchmark::microbenchmark(x$a, base::`$`(x, a), times = 1e4)
#Unit: microseconds
# expr min lq mean median uq max neval
# x$a 77.152 81.398 90.25542 82.814 85.2915 7161.956 10000
# base::`$`(x, a) 9.910 11.326 12.89522 12.033 12.3880 4042.646 10000
I've limited this to lists (which will include data.frames) and have implemented selection with [[ by numeric and character vectors, but this may not fully represent the ways in which $ and [[ can be used.
Note that for [[ you could use @rawr's simpler code:
`[[` <- function(x, i) if (is.null(res <- base::`[[`(x, i))) stop("NULL") else res
but this will throw an error for a member of a list which is NULL rather than just not defined. e.g.
x <- list(a = NULL, b = 2)
x[["a"]]
This may of course be what is desired.
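Regarding (b): masking `$` and `[[` globally affects every call in the session, which is where the slowdown in the benchmark above comes from. A narrower option is to opt individual lists into strict access with an S3 class. The sketch below assumes a hypothetical class name, strict_list; S3 dispatch passes the name to a `$` method as a character string, and .subset2() does the underlying [[ lookup without dispatching again. Unlike the NULL-based check, a member that exists but is NULL is returned without error.

```r
# Hypothetical constructor: a plain list tagged with the class "strict_list".
strict_list <- function(...) structure(list(...), class = "strict_list")

# $ method: stop if the name is absent (no partial matching), else return it.
`$.strict_list` <- function(x, name) {
  if (!(name %in% names(x))) {
    stop(sprintf("\"%s\" not found", name), call. = FALSE)
  }
  .subset2(x, name)  # .subset2 is [[ without S3 dispatch
}

# [[ method: same check for character indices; numeric ones go straight through.
`[[.strict_list` <- function(x, i, ...) {
  if (is.character(i) && !(i %in% names(x))) {
    stop(sprintf("\"%s\" not found", i), call. = FALSE)
  }
  .subset2(x, i, ...)
}

x <- strict_list(a = 1, b = 2)
x$a
x[["b"]]
```

Only objects created through strict_list() are affected; nested elements such as a.list$c in the question would need to be strict_list objects too.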

How to get Vectorize return the results invisibly?

I have a drawing function f that should not return any output.
f <- function(a=0) invisible(NULL)
f(10)
After vectorizing f, it does return NULL.
f_vec <- Vectorize(f)
f_vec(10)
[[1]]
NULL
How can I prevent this, i.e. make the output invisible here as well.
I could of course use a wrapper to suppress it.
f_wrapper <- function(a=0) {
  dummy <- f_vec(a)
}
f_wrapper(10)
Is there a way to avoid the wrapper and get what I want straight away?
Yeah there is. This new version of Vectorize will do it:
Vectorize_2 <- function (FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, USE.NAMES = TRUE) {
  arg.names <- as.list(formals(FUN))
  arg.names[["..."]] <- NULL
  arg.names <- names(arg.names)
  vectorize.args <- as.character(vectorize.args)
  if (!length(vectorize.args))
    return(FUN)
  if (!all(vectorize.args %in% arg.names))
    stop("must specify names of formal arguments for 'vectorize'")
  FUNV <- function() {
    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
    names <- if (is.null(names(args)))
      character(length(args))
    else names(args)
    dovec <- names %in% vectorize.args
    invisible(do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]),
                                  SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES)))
  }
  formals(FUNV) <- formals(FUN)
  FUNV
}
But how did I know to do this? Did I spend 20 minutes writing a brand-new version of Vectorize? NOPE! I just ran dput(Vectorize) to see the R code behind Vectorize and added invisible() where necessary. You can do this with any function written in R (primitives such as `[` only print a .Primitive stub). You don't even need dput: just type Vectorize at the prompt!
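If copying the body of Vectorize feels fragile (its source may differ between R versions), a hedged alternative is a small factory that calls the real Vectorize() and wraps the result in invisible(). Vectorize_invisible is a hypothetical name. The trade-off is that the returned function's formals are just ..., so args() no longer shows the original argument names, although named arguments still pass through.

```r
# Hypothetical factory: vectorize FUN with base::Vectorize(), then make every
# call return its value invisibly.
Vectorize_invisible <- function(FUN, ...) {
  FUNV <- Vectorize(FUN, ...)         # extra arguments go to Vectorize()
  function(...) invisible(FUNV(...))  # forward everything, hide the result
}

f <- function(a = 0) invisible(NULL)
f_vec <- Vectorize_invisible(f)
f_vec(10)           # nothing is printed at the console
res <- f_vec(1:2)   # the value is still there when assigned
```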

Conditionally remove data frames from environment

How can I drop data frames with less than 3 variables? I tried this:
`1001.AFG.1.A`<-data.frame(x = 1, y = 1:10)
apply(ls(), function(x) {if (dim(x)[2]<3) rm(x)})
The error message is:
Error in match.fun(FUN) : argument "FUN" is missing, with no default
1) The first line produces a named logical vector, to.rm, with one component per object, which is TRUE if that object should be removed and FALSE otherwise. Thus names(to.rm)[to.rm] are the objects to be removed, so feed that into rm. Splitting it into two steps lets one review to.rm before actually performing the rm.
to.rm <- unlist(eapply(.GlobalEnv, function(x) is.data.frame(x) && ncol(x) < 3))
rm(list = names(to.rm)[to.rm], envir = .GlobalEnv)
If this is entered directly into the global environment (i.e. not placed in a function), then envir = .GlobalEnv in the last line is the default and can be omitted.
2) Another way is to iterate through the object names of env as shown. We have provided a verbose argument to show what it is doing and a dryrun argument to show what it would remove without actually removing anything.
rm2 <- function(env = .GlobalEnv, verbose = FALSE, dryrun = FALSE, all.names = FALSE) {
  for(nm in ls(env, all.names = all.names)) {
    obj <- get(nm, env)
    if (is.data.frame(obj) && ncol(obj) < 3) {
      if (verbose || dryrun) cat("removing", nm, "\n")
      if (!dryrun) rm(list = nm, envir = env)
    }
  }
}
rm2(dryrun = TRUE)
rm2(verbose = TRUE)
Update: Added envir argument to rm in (1). It was already in (2).
Update 2: Minor improvements to (2).
You may want to try:
sapply(ls(), function(x) {
  if (is.data.frame(get(x)) && dim(get(x))[2] < 3) rm(list = x, envir = .GlobalEnv)
})
If you want to suppress the printed output, you can do:
invisible(sapply(ls(), function(x) {
  if (is.data.frame(get(x)) && dim(get(x))[2] < 3) rm(list = x, envir = .GlobalEnv)
}))
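Combining the review-first idea of (1) with an env argument as in (2), here is a sketch using base R's Filter() to collect the names first and then remove them. narrow_dfs and min_cols are hypothetical names; running it against a scratch environment from new.env() is a safe way to try it before pointing it at .GlobalEnv.

```r
# Hypothetical helper: names of data frames in env with fewer than min_cols columns.
narrow_dfs <- function(env = .GlobalEnv, min_cols = 3) {
  nms <- ls(env, all.names = TRUE)
  Filter(function(nm) {
    obj <- get(nm, envir = env)
    # && short-circuits, so ncol() is only called on data frames
    is.data.frame(obj) && ncol(obj) < min_cols
  }, nms)
}

# Try it on a scratch environment first.
e <- new.env()
assign("1001.AFG.1.A", data.frame(x = 1, y = 1:10), envir = e)
assign("wide", data.frame(a = 1, b = 2, c = 3), envir = e)
assign("v", 1:5, envir = e)

narrow_dfs(e)                           # review the candidates
rm(list = narrow_dfs(e), envir = e)     # then remove them
ls(e)
```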

Passing arguments to iterated function through apply

I have a function like this dummy-one:
FUN <- function(x, parameter){
  if (parameter == 1){
    z <- x  # placeholder: DO SOMETHING WITH "x"
  }
  if (parameter == 2){
    z <- x  # placeholder: DO OTHER STUFF WITH "x"
  }
  return(z)
}
Now, I would like to use the function on a dataset using apply.
The problem is that apply(data,1,FUN(parameter=1))
won't work, as FUN doesn't know what "x" is.
Is there a way to tell apply to call FUN with "x" as the current row/col?
You want apply(data,1,FUN,parameter=1). Note the ... in the function definition:
> args(apply)
function (X, MARGIN, FUN, ...)
NULL
and the corresponding entry in the documentation:
...: optional arguments to ‘FUN’.
You can make an anonymous function within the call to apply so that FUN will know what "x" is:
apply(data, 1, function(x) FUN(x, parameter = 1))
See ?apply for examples at the bottom that use this method.
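Both answers can be checked side by side. The sketch below assumes a hypothetical concrete FUN, with sum() and prod() standing in for the question's placeholder operations, and applies it to each row of a small matrix.

```r
# Hypothetical stand-in for the question's FUN: parameter selects the operation.
FUN <- function(x, parameter) {
  if (parameter == 1) {
    z <- sum(x)    # placeholder for "DO SOMETHING WITH x"
  } else if (parameter == 2) {
    z <- prod(x)   # placeholder for "DO OTHER STUFF WITH x"
  } else {
    stop("parameter must be 1 or 2")
  }
  z
}

m <- matrix(1:6, nrow = 2)   # rows are 1 3 5 and 2 4 6

# Named arguments after FUN are forwarded by apply() through its ...
via_dots <- apply(m, 1, FUN, parameter = 1)
# An anonymous function fixes parameter and receives each row as x
via_lambda <- apply(m, 1, function(x) FUN(x, parameter = 1))
```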
Here's a practical example of passing arguments using the ... object and *apply. It's slick, and this seemed like an easy example to explain its use. An important point to remember: when a function takes ..., any argument meant to travel through the ... should be named, so R understands what you're trying to put where. For example, I called times <- fperform(longfunction, 10, noise = 5000); leaving off noise = would not have done what I wanted, because the unnamed 5000 travels through ... to runtime(), where it is matched positionally to display instead of reaching longfunction. My personal style is to name all of the arguments whenever a ... is involved, just to be safe.
You can see that the argument noise is being defined in the call to fperform(FUN = longfunction, ntimes = 10, noise = 5000) but isn't being used for another 2 levels with the call to diff <- rbind(c(x, runtime(FUN, ...))) and ultimately fun <- FUN(...)
# Made this to take up time
longfunction <- function(noise = 2500, ...) {
  lapply(seq(noise), function(x) {
    z <- noise * runif(x)
  })
}
# Takes a function and clocks the runtime
runtime <- function(FUN, display = TRUE, ...) {
  before <- Sys.time()
  fun <- FUN(...)
  after <- Sys.time()
  if (isTRUE(display)) {
    print(after-before)
  } else {
    after-before
  }
}
# Vectorizes runtime() to allow for multiple tests
fperform <- function(FUN, ntimes = 10, ...) {
  out <- sapply(seq(ntimes), function(x) {
    diff <- rbind(c(x, runtime(FUN, ...)))
  })
}
times <- fperform(FUN = longfunction, ntimes = 10, noise = 5000)
avgtime <- mean(times[2,])
print(paste("Average Time difference of ", avgtime, " secs", sep=""))
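To see why naming matters when values pass through several layers of ..., here is a minimal sketch with hypothetical functions that mirror fperform() and runtime(): the inner function has a named argument (display) before its own ..., so an unnamed value forwarded to it is matched positionally to display.

```r
# Hypothetical inner function, shaped like runtime(): display comes first.
inner_fn <- function(display = TRUE, noise = 2500, ...) {
  c(display = display, noise = noise)
}
# Hypothetical outer function, shaped like fperform(): forwards ... unchanged.
outer_fn <- function(FUN, ntimes = 10, ...) {
  inner_fn(...)
}

outer_fn(identity, 10, noise = 5000)  # noise is set as intended
outer_fn(identity, 10, 5000)          # 5000 is bound to display instead
```

No error is raised in the second call; the value simply ends up in the wrong parameter, which is why naming every argument that travels through ... is the safer habit.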

Resources