Does anyone know of a package or function that lets me run different sanity checks on the classes of the data, or check that the variables have matching lengths?
Any suggestions beyond the basic approach are welcome:
f1 <- function(data, x, y) {
  if (!is.data.frame(data)) stop("data must be a data.frame!")
  if (!is.vector(x)) stop("x must be a vector!")
  # ... code ...
}
I am looking for something along these lines (any other suggestions welcome):
f2 <- function(data, x, y) {
  # check() is the hypothetical function I'm looking for
  check(
    data = data.frame,
    err1 = "data must be a data.frame",
    x = vector,
    err2 = "x must be a vector",
    ...
  )
  # ... code ...
}
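One possibility, assuming R 4.0.0 or later: base R's stopifnot() uses the names of its arguments as custom error messages, which gets fairly close to the check() interface sketched above without any extra package. The name f2 and the length check are only illustrative.

```r
# Sketch assuming R >= 4.0.0: the name of each argument to stopifnot()
# becomes the error message when that condition is not TRUE.
f2 <- function(data, x, y) {
  stopifnot(
    "data must be a data.frame"         = is.data.frame(data),
    "x must be a vector"                = is.vector(x),
    "x and y must have the same length" = length(x) == length(y)
  )
  # ... code ...
  invisible(TRUE)
}

f2(data.frame(a = 1), 1:3, 4:6)                      # passes silently
msg <- tryCatch(f2(list(), 1:3, 4:6), error = conditionMessage)
print(msg)                                           # "data must be a data.frame"
```

The conditions are checked in order and the first failure stops the function, so the names play the role of err1, err2 and so on in the desired check() call.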
Related
In R (3.4.3), I'm trying to make my code more succinct before I move it into a package. I'm struggling to find a simple way to check the actual arguments passed to multiple functions within an environment, especially when some of those arguments have default values. My goal is to avoid pasting redundant code into each function while maintaining readability and not running into scoping issues. Here is an example (imagine these functions being within the same environment/script):
Current Code
foo_add <- function(x, y = x, divisible = TRUE) {
  if (missing(x) || !is.numeric(x)) stop("define x as a number")
  if (!is.numeric(y)) y <- x
  if (!is.logical(divisible)) stop("define divisible as TRUE or FALSE")
  # ... rest of the function ...
}
foo_subtract <- function(x, y = x, divisible = TRUE) {
  if (missing(x) || !is.numeric(x)) stop("define x as a number")
  if (!is.numeric(y)) y <- x
  if (!is.logical(divisible)) stop("define divisible as TRUE or FALSE")
  # ... rest of the function ...
}
foo_divide <- function(x, y = x, divisible = TRUE) {
  if (missing(x) || !is.numeric(x)) stop("define x as a number")
  if (!is.numeric(y)) y <- x
  if (!is.logical(divisible)) stop("define divisible as TRUE or FALSE")
  # ... rest of the function ...
}
You can see that I'm repeating code just to verify that the useR defined the arguments correctly. Below, I've added a list return just to check whether things worked. My rendition of a cleaner version:
Desired Code
foo_add <- function(x, y = x, divisible = TRUE) {
  do.call("check_args", mget(names(formals())))
  list(x, y, divisible)
}
foo_subtract <- function(x, y = x, divisible = TRUE) {
  do.call("check_args", mget(names(formals())))
  list(x, y, divisible)
}
foo_divide <- function(x, y = x, divisible = TRUE) {
  do.call("check_args", mget(names(formals())))
  list(x, y, divisible)
}
check_args <- function(x, y, divisible) {
  if (missing(x) || !is.numeric(x)) stop("define x as a number")
  if (!is.numeric(y)) y <- x
  if (!is.logical(divisible)) stop("define divisible as TRUE or FALSE")
}
Basically, what I want to do is always check whether the actual arguments are specified correctly, while handling default and missing arguments. From my basic knowledge, I kind of want to make a constructor-esque function for actual arguments. I also want to be able not only to error-check, but also to modify actual argument values and return them to the parent function for use (e.g., the condition if(!is.numeric(y)) y <- x).
The Desired Code seems to work; I just want to make sure, given how new I am to R programming, that I'm accounting for everything, specifically when I modify objects within check_args() and then pass them to a parent function using do.call. Is there a cleaner way to go about doing this?
Updated solution: I've created a somewhat extensible solution; see here.
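A note on the modification part: assignments inside check_args() happen in check_args()'s own frame, so the y <- x there does not change y in foo_add(). A minimal sketch, assuming it is acceptable to pass the arguments explicitly instead of using do.call() with mget(), is to have check_args() return the (possibly modified) values and copy them back into the caller with list2env():

```r
# check_args() validates its inputs and returns them, possibly modified,
# as a named list; assignments inside it never reach the caller by themselves.
check_args <- function(x, y, divisible) {
  # missing() also detects a bare symbol that is missing in the caller
  if (missing(x) || !is.numeric(x)) stop("define x as a number")
  if (!is.numeric(y)) y <- x
  if (!is.logical(divisible)) stop("define divisible as TRUE or FALSE")
  list(x = x, y = y, divisible = divisible)
}

foo_add <- function(x, y = x, divisible = TRUE) {
  # copy the checked values back into foo_add()'s own frame
  list2env(check_args(x, y, divisible), envir = environment())
  list(x, y, divisible)
}

foo_add(3, "a")   # y falls back to x, so this returns list(3, 3, TRUE)
```

The same one-line call can be pasted into foo_subtract() and foo_divide(), so the checking logic still lives in one place.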
I'm trying to read a function call as a string and evaluate this function within another function. I'm using eval(parse(text = )) to evaluate the string. The function I'm calling in the string doesn't seem to have access to the environment in which it is nested. In the code below, my "isgreater" function finds the object y, defined in the global environment, but can't find the object x, defined within the function. Does anybody know why, and how to get around this? I have already tried adding the argument envir = .GlobalEnv to both of my evals, to no avail.
str <- "isgreater(y)"
isgreater <- function(y) {
  return(eval(y > x))
}
y <- 4
test <- function() {
  x <- 3
  return(eval(parse(text = str)))
}
test()
Error:
Error in eval(y > x) : object 'x' not found
Thanks to @MrFlick and @r2evans for their useful and thought-provoking comments. As for a solution, I've found that the following code works: x must be passed into the function as an argument and cannot be a default value. In the code below, my function generates a list of results, with the x variable being changed within the function. If anyone knows why this is, I would love to know.
str <- "isgreater(y, x)"
isgreater <- function(y, x) {
  return(eval(y > x))
}
y <- 50
test <- function() {
  list <- list()
  for (i in 1:100) {
    x <- i
    bool <- eval(parse(text = str))
    list <- append(list, bool)
  }
  return(list)
}
test()
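As for why: R uses lexical scoping, so a free variable such as x inside isgreater() is looked up in the environment where isgreater() was defined (the global environment), not in the frame of test() where eval(parse(...)) runs the call. The sketch below contrasts passing x explicitly with a hypothetical isgreater_dyn() that deliberately looks x up in its caller's frame via parent.frame(). The explicit version is the cleaner choice; the dynamic one is shown only to illustrate the scoping rule.

```r
# Passing x explicitly: no scoping surprises.
isgreater2 <- function(y, x) y > x

# Hypothetical: fetch x from the frame that *called* this function.
isgreater_dyn <- function(y) {
  x <- get("x", envir = parent.frame())
  y > x
}

y <- 4
test <- function(str) {
  x <- 3                   # local to test()
  eval(parse(text = str))  # the parsed call is evaluated in test()'s frame
}

test("isgreater2(y, x)")   # TRUE: test()'s x is passed in
test("isgreater_dyn(y)")   # TRUE: parent.frame() is test()'s frame
```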
After considering the points made by @r2evans, I have elected to change my approach to the problem so that I do not arrive at this string-parsing step. Thanks a lot, everyone.
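I offer the following code not as a solution, but rather as an insight into how R "works". The code does things that are quite dangerous and should only be examined for its demonstration of how to assert a value for x. Unfortunately, that assertion does away with the x value of 3 (test()'s local x is no longer used at all). Note also that because isgreater was defined at top level, environment(isgreater) is the global environment, so the assignment creates or overwrites a global x: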
str <- "isgreater(y)"
isgreater <- function(y) {
  return(eval(y > x))
}
y <- 4
test <- function() {
  environment(isgreater)$x <- 5
  return(eval(parse(text = str)))
}
test()
#[1] FALSE
The environment<- function is used in the R6 programming paradigm. Take a look at ?R6 if you are interested in working with a more object-oriented set of structures and syntax. (I will note that when I first ran your code, there was an object named x in my workspace and some of my efforts were able to succeed to the extent of not throwing an error, but they were finding that length-10000 vector and filling up my console with logical results until I escaped the console. Yet another argument for passing both x and y to isgreater.)
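A less dangerous way to give isgreater() its own value of x, assuming you control where the function is created, is a closure: a small factory function (make_isgreater() below is a hypothetical name) whose frame holds x becomes the enclosing environment of the returned function, so nothing in the global environment is modified.

```r
# The returned function's enclosing environment is make_isgreater()'s frame,
# which holds the x supplied at creation time.
make_isgreater <- function(x) {
  force(x)              # evaluate x now rather than lazily later
  function(y) y > x
}

isgreater5 <- make_isgreater(5)
isgreater5(4)                                 # FALSE
isgreater5(6)                                 # TRUE
get("x", envir = environment(isgreater5))     # 5
```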
I have a subclass of data.frame that needs an extra argument when subsetting. NextMethod() passes extra arguments along, which generates an error because the next method recognizes neither the argument itself, nor the 'dots' arguments.
Example:
class(Theoph) <- c('special', 'data.frame')
`[.special` <- function(x, i, j, drop, k, ...) {
  y <- NextMethod()
  attr(y, 'k') <- k
  y
}
Theoph[1:5, k = 'head']
Result:
Error in `[.data.frame`(Theoph, 1:5, k = "head") :
unused argument (k = k)
Can I make 'k' invisible downstream? I've tried removing it, defining it as NULL, passing only the arguments of interest, and writing a wrapper. The subset operator [ is a particularly difficult generic because of some non-default argument-matching rules.
Since in this case you know what the next method is, why not just call it?
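class(Theoph) <- c('special', 'data.frame')
`[.special` <- function(x, i, j, drop, k, ...) {
  # Call the data.frame method directly so that k is never passed on.
  # Forward drop only when the caller supplied it, so [.data.frame keeps its
  # own default (drop when one column is left, but not when one row is left).
  y <- if (missing(drop)) `[.data.frame`(x, i, j) else `[.data.frame`(x, i, j, drop = drop)
  attr(y, 'k') <- k
  y
}
Theoph[1:5, k = 'head']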
However, I would be cautious about this sort of approach, since [ is a rather special function, and I don't think it actually includes ... in its argument list. (It looks like it does in the docs, but I think this is a simplification and it's not using the standard ... object.)
I have a function like this dummy one:
FUN <- function(x, parameter) {
  if (parameter == 1) {
    z <- DO SOMETHING WITH "x"
  }
  if (parameter == 2) {
    z <- DO OTHER STUFF WITH "x"
  }
  return(z)
}
Now, I would like to use the function on a dataset using apply. The problem is that apply(data,1,FUN(parameter=1)) won't work, as FUN doesn't know what "x" is. Is there a way to tell apply to call FUN with "x" as the current row/column?
You want apply(data,1,FUN,parameter=1). Note the ... in the function definition:
> args(apply)
function (X, MARGIN, FUN, ...)
NULL
and the corresponding entry in the documentation:
...: optional arguments to ‘FUN’.
You can make an anonymous function within the call to apply so that FUN will know what "x" is:
apply(data, 1, function(x) FUN(x, parameter = 1))
See ?apply for examples at the bottom that use this method.
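A runnable version of both suggestions, using a hypothetical stand-in for FUN (sum() for parameter 1, mean() for parameter 2) and a small made-up matrix as data:

```r
# Hypothetical stand-in for FUN: parameter selects the operation.
FUN <- function(x, parameter) {
  if (parameter == 1) {
    z <- sum(x)
  } else if (parameter == 2) {
    z <- mean(x)
  } else {
    stop("parameter must be 1 or 2")
  }
  z
}

data <- matrix(1:6, nrow = 2)   # rows are 1 3 5 and 2 4 6

# Extra arguments after FUN are forwarded through apply's `...`
r1 <- apply(data, 1, FUN, parameter = 1)
# Equivalent anonymous-function form
r2 <- apply(data, 1, function(x) FUN(x, parameter = 1))
r1   # 9 12
```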
Here's a practical example of passing arguments using the ... object and *apply. It's slick, and this seemed like an easy example to explain the use. An important point to remember is that arguments meant to travel through ... should be passed by name, so R understands what you're trying to put where. For example, I could have called times <- fperform(longfunction, 10, noise = 5000), but if I had left off noise =, the unnamed 5000 would have been passed through ... and then matched positionally to runtime()'s display argument instead of reaching longfunction(). That does not throw an error: it silently runs longfunction() with its default noise and skips printing the time. My personal style is to name all of the arguments whenever ... is used, just to be safe.
You can see that the argument noise is defined in the call fperform(FUN = longfunction, ntimes = 10, noise = 5000) but isn't used until two levels further down, in the call diff <- rbind(c(x, runtime(FUN, ...))) and ultimately fun <- FUN(...).
# Made this to take up time
longfunction <- function(noise = 2500, ...) {
  lapply(seq(noise), function(x) {
    z <- noise * runif(x)
  })
}
# Takes a function and clocks the runtime
runtime <- function(FUN, display = TRUE, ...) {
  before <- Sys.time()
  fun <- FUN(...)
  after <- Sys.time()
  if (isTRUE(display)) {
    print(after - before)
  } else {
    after - before
  }
}
# Vectorizes runtime() to allow for multiple tests
fperform <- function(FUN, ntimes = 10, ...) {
  out <- sapply(seq(ntimes), function(x) {
    diff <- rbind(c(x, runtime(FUN, ...)))
  })
}
times <- fperform(FUN = longfunction, ntimes = 10, noise = 5000)
avgtime <- mean(times[2, ])
print(paste("Average Time difference of ", avgtime, " secs", sep = ""))
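The naming pitfall can be shown in a few lines. show_dots() and noise_of() below are hypothetical, cut-down versions of runtime() and longfunction() that return values instead of timing anything: an unnamed value intended for ... is matched positionally to display, the formal that precedes ..., and never reaches the inner function.

```r
# Cut-down stand-in for runtime(): display comes before `...`
show_dots <- function(FUN, display = TRUE, ...) {
  list(display = display, result = FUN(...))
}
# Cut-down stand-in for longfunction(): just returns noise
noise_of <- function(noise = 2500, ...) noise

named   <- show_dots(noise_of, noise = 5000)  # noise goes through `...`
unnamed <- show_dots(noise_of, 5000)          # 5000 is matched to display

named$result      # 5000
unnamed$display   # 5000
unnamed$result    # 2500, the default: noise never arrived
```

An alternative design is to place such options after ... (for example function(FUN, ..., display = TRUE)), because formals after ... can only be matched by their exact name.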
How can I write this all in one line?
mydata is a "zoo" series, and limit is a numeric vector of the same length:
library(zoo)
tmp <- ave(coredata(mydata), as.Date(index(mydata)),
           FUN = function(x) cummax(x) - x)
tmp <- (tmp < limit)
final <- ave(tmp, as.Date(index(mydata)),
             FUN = function(x) cumprod(x))
I've tried to use two vectors as arguments to ave(...), but it seems to accept just one, even if I join them into a matrix.
This is just an example, but any other function could be used.
Here I need to compare the value of cummax(mydata) - mydata with a numeric vector, and once it exceeds the limit I'll keep zeros until the end of the day. The cummax is calculated from the beginning of each day.
If limit were a single number instead of a vector (with different possible numbers) I could write it:
ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cumprod((cummax(x) - x) < limit))
But I can't pass a vector longer than x there (it would need to have the same length as each day), and I don't know how to supply it as another argument to ave().
It seems this routine imposes an intraday stop-loss based on maximum drawdown. So I assume you want to be able to pass the variable limit as a second argument to your aggregation function, which currently takes only one argument due to the way ave works.
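One base-R workaround (a swapped-in technique, not something from the thread): ave() hands FUN only one vector, but if that vector is the row index, FUN can use it to subset as many same-length vectors as needed. In the sketch, the made-up vectors prices, limit and day stand in for coredata(mydata), limit and as.Date(index(mydata)).

```r
prices <- c(1, 3, 2, 0, 5, 4, 6, 2)  # stand-in for coredata(mydata)
limit  <- c(1, 1, 1, 1, 2, 2, 2, 2)  # per-row limit, same length as prices
day    <- c(1, 1, 1, 1, 2, 2, 2, 2)  # stand-in for as.Date(index(mydata))

# FUN receives the row indices of one day and subsets both vectors with them
final <- ave(seq_along(prices), day, FUN = function(i) {
  x <- prices[i]
  cumprod((cummax(x) - x) < limit[i])
})
final   # 1 1 0 0 1 1 1 0
```

With the question's objects, prices would be replaced by coredata(mydata) and day by as.Date(index(mydata)); this is untested against an actual zoo series.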
If putting all this in one line is not an absolute must, I can share a function I've written that generalizes aggregation via "cut variables". Here's the code:
mtapplylist2 <- function(t, IDX, DEF, MoreArgs=NULL, ...)
{
  if (mode(DEF) != "list")
  {
    cat("Definition must be list type\n")
    return(NULL)
  }
  a <- c()
  colnames <- names(DEF)
  for (i in seq_along(DEF))
  {
    def <- DEF[[i]]
    func <- def[1]
    if (mode(func) == "character") { func <- get(func) }
    cols <- def[-1]
    # build the argument list for mapply: the function first,
    # then one split-by-group list per input column
    arglist <- list()
    arglist[[1]] <- func
    for (j in seq_along(cols))
    {
      col <- cols[j]
      grp <- split(t[, col], IDX)
      arglist[[1 + j]] <- grp
    }
    arglist[["MoreArgs"]] <- MoreArgs
    v <- do.call("mapply", arglist)
    # print(class(v)); print(v)
    # is.matrix() instead of class(v) == "matrix": since R 4.0.0 a matrix has
    # class c("matrix", "array"), making the if() condition length 2
    # (an error since R 4.2.0)
    if (is.matrix(v))
    {
      a <- cbind(a, as.vector(v))
    } else {
      a <- cbind(a, v)
    }
  }
  colnames(a) <- colnames
  return(a)
}
And you can use it like this:
# assuming you have the data in the data.frame
df <- data.frame(date=rep(1:10,10), ret=rnorm(100), limit=rep(c(0.25,0.50),50))
dfunc <- function(x, ...) { return(cummax(x)-x ) }
pfunc <- function(x,y, ...) { return((cummax(x)-x) < y) }
# assumes you have the function declared in the same namespace
def <- list(
"drawdown" = c("dfunc", "ret"),
"hasdrawdown" = c("pfunc", "ret", "limit")
);
# from R console
> def <- list("drawdown" = c("dfunc", "ret"),"happened" = c("pfunc","ret","limit"))
> dim( mtapplylist2(df, df$date, def) )
[1] 100 2
Notice that each element of the "def" list specifies:
the computed column name (the element's name),
the name of the vectorized function, as a string (the first entry),
the names of the variables in the input data.frame that are passed to the function (the remaining entries).
If you look at the guts of the mtapplylist2 function, the key components are split and mapply. These functions are sufficiently fast (I think split is implemented in C).
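The core split() + mapply() idea can be seen in isolation. This is a minimal sketch on a small made-up data.frame, reusing the pfunc() logic from the usage example: each input column is split by group, and the function is called once per group with the matching pieces.

```r
df <- data.frame(date  = c(1, 1, 1, 2, 2, 2),
                 ret   = c(1, 3, 2, 5, 4, 6),
                 limit = c(1, 1, 1, 2, 2, 2))
pfunc <- function(x, y, ...) (cummax(x) - x) < y

# one call of pfunc per date, with that date's ret and limit values
res <- mapply(pfunc, split(df$ret, df$date), split(df$limit, df$date),
              SIMPLIFY = FALSE)
# unsplit() puts the per-group results back in the original row order
df$hasdrawdown <- unsplit(res, df$date)
df$hasdrawdown   # TRUE TRUE FALSE TRUE TRUE TRUE
```

unsplit() is used here so the result lines up with the input rows. In mtapplylist2, as.vector(v) concatenates the results group by group, which matches the row order only when t is already sorted by IDX (it is not in the df example above, where date is rep(1:10, 10)).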
This works with functions requiring multiple arguments, and also for functions returning vector of the same size or aggregated value.
Try it out and let me know if this solves your problem.