Check the number of arguments before a certain argument in R

Suppose we want our function to be able to deal with two scenarios:
somefun = function(x, y, method, ...) {
res = dowith(x, y)
res
}
somefun = function(z, method, ...) {
x = z$v1
y = z$v2
res = dowith(x, y)
res
}
How can we make somefun aware of the difference between these two situations?

If you can guarantee that x is never going to be a list, you could just use is.list(x) to determine which version of the function is being called. Otherwise, you can use missing:
somefun<-function(x,y,method,...){
if(missing(y)){
cat("Using List version\n")
y<-x$y
x<-x$x
}
else{
cat("Using normal version\n")
}
c(x,y)
}
> somefun(list(x=1,y=2),method="a method")
Using List version
[1] 1 2
> somefun(1,2,method="a method")
Using normal version
[1] 1 2
However, be aware that if you do this, and you want to use the list version of the function, then method and everything after it have to be passed in by name, otherwise R is going to bind them to y:
> somefun(list(x=1,y=2),"a method")
Using normal version
$x
[1] 1
$y
[1] 2
[[3]]
[1] "a method"
> somefun(list(x=1,y=2),method="a method",5)
Using normal version
$x
[1] 1
$y
[1] 2
[[3]]
[1] 5
> somefun(list(x=1,y=2),method="a method",q=5)
Using List version
[1] 1 2
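If x is never a list in the two-vector form, the two checks can be combined so that a positionally passed method is not silently swallowed as y. This is only a sketch under that assumption; somefun3 and the returned list are made up for illustration:

```r
# Sketch: assumes a list 'x' always means the "list version" of the call.
somefun3 <- function(x, y, method = NULL, ...) {
  if (is.list(x)) {
    # In the list version, a value bound to 'y' by position was
    # really meant to be 'method', so move it over.
    if (!missing(y) && is.null(method)) method <- y
    y <- x$y
    x <- x$x
  } else if (missing(y)) {
    stop("'y' is required when 'x' is not a list")
  }
  list(values = c(x, y), method = method)
}

somefun3(list(x = 1, y = 2), "a method")  # method ends up as "a method"
somefun3(1, 2, method = "a method")       # normal version
```

The trade-off is that the function now guesses the caller's intent, so it is worth documenting that behaviour.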

I don't know of an automatic way to do this, but when dealing with these types of situations, it is sometimes helpful to use switch. Here's a basic example:
somefun <- function(x, y = NULL, type = c("DF", "vecs"), method = NULL, ...) {
switch(type,
DF = sum(x[["v1"]], x[["v2"]]),
vecs = sum(x, y),
stop("'type' must be either 'DF' or 'vecs'"))
}
somefun(x = 10, y = 3, type="vecs")
# [1] 13
somefun(x = data.frame(v1 = 2, v2 = 4), type="DF")
# [1] 6
somefun(x = data.frame(v1 = 2, v2 = 4), type = "meh")
# Error in somefun(x = data.frame(v1 = 2, v2 = 4), type = "meh") :
# 'type' must be either 'DF' or 'vecs'
In the above, we're expecting that the user must enter a type argument where the acceptable values are "DF" or "vecs", and where a different set of operations has been defined for each option.
Of course, I would also script out a set of different scenarios and add some condition checking at the start of the function to make sure things work as expected. For instance, if you expect that most of the time people will pass a data.frame, you could do something like if (is.null(y) && is.null(type)) temp <- "DF" (or insert a try-type statement in there). At the end of the day, it also comes down to whether you can predict a sensible set of default values.
If your functions are complicated, you might want to separate out the steps that go into the switches into separate functions as this would probably lead to more readable (and more easily reusable) code.
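A rough sketch of that idea, with the per-type work moved into helpers and type guessed when it is omitted. The names somefun2, sum_from_df and sum_from_vecs are invented for this example, and the guessing rule is just one possible default:

```r
# Hypothetical helpers, one per input style
sum_from_df   <- function(df) sum(df[["v1"]], df[["v2"]])
sum_from_vecs <- function(x, y) sum(x, y)

somefun2 <- function(x, y = NULL, type = c("DF", "vecs"), ...) {
  # If 'type' was not supplied, guess it from the inputs
  if (missing(type)) {
    type <- if (is.null(y) && is.list(x)) "DF" else "vecs"
  }
  # match.arg() rejects anything that is not one of the declared choices
  type <- match.arg(type)
  switch(type,
         DF   = sum_from_df(x),
         vecs = sum_from_vecs(x, y))
}

somefun2(10, 3)                          # treated as "vecs"
somefun2(data.frame(v1 = 2, v2 = 4))     # treated as "DF"
```

Note that match.arg() produces its own error message for an invalid type, so the explicit stop() branch is no longer needed in this variant.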

Related

How to custom print/show variables (with custom class) in my R package

Many R functions return objects that are printed to the console in a special manner. For instance, t_results = t.test(c(1,2,3), c(1,2,4)) will assign a list to the t_results variable, but when I enter this variable in the console, or call it as print(t_results) or show(t_results), it prints some plain text information (such as Welch Two Sample t-test... etc.) instead of returning the actual list. (This is a base R function, but I've seen this implemented in many custom user R packages just as well.)
My question is: how do I do this for objects created in my own custom R package? I've read several related questions and answers (e.g., this, this, and this), which do give a general idea (using setMethod for my custom classes), but none of them makes it clear to me what exactly I need to do to make it work properly in a custom R package. I also cannot find any official documentation or tutorial on the matter.
To give an example of what I want to do, here is a very simple function from my hypothetical R package, which simply returns a small data.frame (with an arbitrary class name I add to it, here 'my_df_class'):
my_main_function = function() {
my_df = data.frame(a = c('x1', 'y2', 'z2'),
b = c('x2', 'y2', 'z2'))
class(my_df) = c(class(my_df), 'my_df_class')
return(my_df)
}
I would like to have this printed/shown e.g. like this:
my_print_function = function(df) {
cat('My results:', df$a[2], df$a[3])
}
# see my_print_function(my_main_function())
What exactly has to be done to make this work for my R package (i.e., that when someone installs my R package, assigns the my_main_function() results to a variable, and prints/shows that variable, it would be done via my_print_function())?
Here is a small explanation, adding to the amazing answer posted by @nya:
First, you are dealing with S3 classes. With these classes, we can have one method manipulating the objects differently depending on the class the object belongs to.
Below is a simple class and how it operates:
Class contains numbers,
The class values to be printed like 1k, 2k, 100k, 1M,
The values can be manipulated numerically.
-- Let's call the class my_numbers
Now we will define the class constructor:
my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))
Note that we added the class 'numeric', i.e. the class my_numbers INHERITS from the numeric class.
We can create an object of the said class as follows:
b <- my_numbers(c(100, 2000, 23455, 24567654, 2345323))
b
[1] 100 2000 23455 24567654 2345323
attr(,"class")
[1] "my_numbers" "numeric"
Nothing special has happened. Only an attribute of class has been added to the vector. You can easily remove/strip off the attribute by calling c(b)
c(b)
[1] 100 2000 23455 24567654 2345323
vector b is just a normal vector of numbers.
Note that the class attribute could have been added in any of the following ways (and many more):
class(b) <- c('my_numbers', 'numeric')
attr(b, 'class') <- c('my_numbers', 'numeric')
attributes(b) <- list(class = c('my_numbers', 'numeric'))
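To make the inheritance claim concrete, here is a small sketch that inspects the object with base R only. It repeats the constructor from above so that it runs on its own:

```r
my_numbers <- function(x) structure(x, class = c('my_numbers', 'numeric'))
b <- my_numbers(c(100, 2000))

inherits(b, 'my_numbers')   # TRUE: first entry of the class vector
inherits(b, 'numeric')      # TRUE: second entry, the "parent" class
class(unclass(b))           # unclass() drops the attribute, leaving a plain numeric
attributes(c(b))            # NULL: c() has stripped the class as well
```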
Where is the magic?
I will write a simple function with recursion. Don't worry about the function implementation. We will just use it as an example.
my_numbers_print = function(x, ..., digs=2, d = 1, L = c('', 'K', 'M', 'B', 'T')){
ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
sprintf(paste0('%.',digs,'f%s'), x, L[d]))
}
my_numbers_print(b)
[1] "100.00" "2.00K" "23.45K" "24.57M" "2.35M"
There is no magic still. That's just a normal function called on b.
Instead of calling the function my_numbers_print, we could write another function with the name print.my_numbers, i.e. method.class_name. (Note that I added the parameter quote = FALSE.)
print.my_numbers = function(x, ..., quote = FALSE){
print(my_numbers_print(x), quote = quote)
}
b
[1] 100.00 2.00K 23.45K 24.57M 2.35M
Now b has been printed nicely. We can still do math on b
b^2
[1] 10.00K 4.00M 550.14M 603.57T 5.50T
Can we add b to a dataframe?
data.frame(b)
b
1 100
2 2000
3 23455
4 24567654
5 2345323
b reverts to plain numeric instead of keeping its class. That is because another function needs a method as well, i.e. the format function.
Ideally, the correct way to do this is to create a format method first and then have the print method call it. (This answer is becoming too long.)
Summary : Everything Put Together
# Create a my_numbers class definition function
my_numbers = function(x) structure(x, class = c('my_numbers', 'numeric'))
# format the numbers
format.my_numbers = function(x,...,digs =1, d = 1, L = c('', 'K', 'M', 'B', 'T')){
ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
sprintf(paste0('%.',digs,'f%s'), x, L[d]))
}
#printing the numbers
print.my_numbers = function(x, ...) print(format(x), quote = FALSE)
# ensure class is maintained after extraction to allow for sort/order etc
'[.my_numbers' = function(x, ..., drop = FALSE) my_numbers(NextMethod('['))
b <- my_numbers(c(2000, 100, 20, 23455, 24567654, 2345323))
data.frame(x = sort(-b) / 2)
x
1 -12.3M
2 -1.2M
3 -11.7K
4 -1.0K
5 -50.0
6 -10.0
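The easiest way to use a specific function for a class is to define it as an S3 method of the print generic, named print.<class>. The signature should match the generic, i.e. (x, ...):
print.my_df_class = function(x, ...) {
cat('My results:', x$a[2], x$a[3], '\n')
invisible(x)
}
Note that because my_main_function() puts 'data.frame' ahead of 'my_df_class' in the class vector (class(my_df) = c(class(my_df), 'my_df_class')), print() dispatches to print.data.frame first and shows the ordinary data.frame output.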
print(my_main_function())
# a b
# 1 x1 x2
# 2 y2 y2
# 3 z2 z2
You can either use print.my_df_class(), or modify the my_main_function() class assignment.
my_main_function = function() {
my_df = data.frame(a = c('x1', 'y2', 'z2'),
b = c('x2', 'y2', 'z2'))
class(my_df) = 'my_df_class'
return(my_df)
}
Then you can use print without the class specification at the end to get a class-specific response.
print(my_main_function())
# My results: y2 z2
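Another option, sketched below, is to keep the data.frame class but put 'my_df_class' first, so the custom method wins dispatch while data.frame behaviour stays available. For a package, the method also has to be registered: either an S3method(print, my_df_class) line in NAMESPACE or, assuming you use roxygen2, an @export tag above the method. The as.character() call is my own addition to guard against factor columns on older R versions:

```r
my_main_function <- function() {
  my_df <- data.frame(a = c('x1', 'y2', 'z2'),
                      b = c('x2', 'y2', 'z2'))
  # Custom class goes first so print() finds print.my_df_class before print.data.frame
  class(my_df) <- c('my_df_class', class(my_df))
  my_df
}

#' @export
print.my_df_class <- function(x, ...) {
  cat('My results:', as.character(x$a[2:3]), '\n')
  invisible(x)  # print methods conventionally return their argument invisibly
}

res <- my_main_function()
res                          # auto-printing uses print.my_df_class
inherits(res, 'data.frame')  # still a data.frame for everything else
```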

Disable partial name identification of function arguments

I am trying to make a function in R that outputs a data frame in a standard way, but that also allows the user to add the personalized columns that he deems necessary (the goal is to make a data format for paleomagnetic data, for which there is common information that everybody uses, and some more unusual information that the user might like to keep in the format).
However, I realized that if the user wants a column header that is a prefix of one of the defined arguments of the data formatting function (e.g. 'sheep', which is a prefix of the 'sheepc' argument, see example below), the function interprets it as the defined argument (through partial name matching, see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or at least to give the user a warning saying that he cannot use this name?
PS I realize this question is similar to Disabling partial variable names in subsetting data frames, but I would like to avoid toying with the options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
# I use the sheeta, sheetb and sheepc arguments for computations
# (more complex than shown below, but here they are just there to give an example)
a <- sum(sheeta, sheetb)
df1 <- data.frame(standard = rep(a, sheepc))
df2 <- as.data.frame(list(...))
if(nrow(df1) == nrow(df2)){
res <- cbind(df1, df2)
return(res)
} else {
stop("Extra elements should be of length ", sheepc)
}
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc): argument 'times' incorrect
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ to their tags to make name collisions less likely.
Edit:
"would it be possible to achieve the same effect without having the ... at the beginning ?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
stopifnot("partial matching detected" = identical(sys.call(), match.call()))
list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1
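One caveat with comparing sys.call() to match.call(): as far as I can tell it also rejects perfectly valid positional calls such as fun(3), because match.call() adds the argument name. A narrower check, sketched here with the hypothetical name fun2, only looks at the names the caller actually typed and flags those that are strict prefixes of a formal argument:

```r
fun2 <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...) {
  typed <- names(sys.call())[-1]   # argument names exactly as typed (NULL if none)
  typed <- typed[nzchar(typed)]    # ignore positional (unnamed) arguments
  formal_names <- setdiff(names(formals(fun2)), "...")
  # A typed name that is not a formal but is a prefix of one was partially matched
  is_prefix <- vapply(typed, function(nm) {
    !(nm %in% formal_names) && any(startsWith(formal_names, nm))
  }, logical(1))
  if (any(is_prefix)) {
    stop("argument name(s) partially matched a formal argument: ",
         paste(typed[is_prefix], collapse = ", "))
  }
  list(sheepc = sheepc, extras = list(...))
}

fun2(3, ball = rep(1, 3))    # fine: positional argument plus an extra column
try(fun2(sheep = rep(1, 3))) # error: 'sheep' is a prefix of 'sheepc'
```

Ambiguous prefixes such as sheet still fail earlier, during argument matching itself, so they never reach this check.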

declare a variable without registering it to the environment

In some R script, I use some dummy variable in a for loop.
The variable has no purpose itself, so I don't need it recorded at all.
For instance :
database = read.csv("data/somefile.csv")
for (i in 1:ncol(database)) {
name <- names(database)[i]
if (name %in% some_vector) {
label(database[, i]) <- some_function(database$somecolumn)
}
}
In RStudio, the "Global Environment" tab keeps track of the variables i and name (and shows the last value each one had), although they are of no use at all.
Is there any elegant way to declare my variables so that they are not tracked in the global environment?
Use local for all your workspace hygiene needs.
foo <- local({
x <- 0
for(i in 1:nrow(mtcars))
x <- x + mtcars$mpg[i]
x
})
foo now contains the result of the calculation, and the temporary variables i and x are discarded.
To hide objects from RStudio's object explorer, you can prefix with . like
.x = 2
Downsides. This still creates .x and keeps it in memory, where it might take up space or accidentally be used again after you've forgotten about it. It also hides from the standard "clear workspace" command rm(list = ls()). See ?ls for a way of handling this.
Aside. Generally, I would not create any variables like this, instead wrapping any operation involving temporary objects in a function as @Aurèle suggested and not leaning too heavily on what RStudio's object browser shows me.
The only case so far where I've used dot-prefixed objects is for interactive use in a function, like:
f = function(x, y, debug.obj = FALSE){
dx = dim(x)
dy = dim(y)
if (!(length(dx) == 2 && length(dy) == 2 && dx[2] == dy[1])){
if (debug.obj){
.debug.f <<- list(dx = dx, dy = dy)
stop("Dims don't match. See .debug.f")
}
stop("Dims don't match.")
}
x %*% y
}
# example usage
f(matrix(1,1,1), matrix(2,2,2), debug.obj = TRUE)
# Error in f(matrix(1, 1, 1), matrix(2, 2, 2), debug.obj = TRUE) :
# Dims don't match. See .debug.f
.debug.f
# $dx
# [1] 1 1
#
# $dy
# [1] 2 2
Even this might be a bad idea, though.
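For the loop in the question, the function-wrapping approach mentioned above might look like the sketch below. relabel(), its arguments, and the use of a plain "label" attribute (instead of the label<- replacement function, which I assume comes from a package such as Hmisc) are illustrative choices, not part of the original code:

```r
# Hypothetical helper: i and name live only inside the function call
relabel <- function(df, columns, make_label) {
  for (i in seq_along(df)) {
    name <- names(df)[i]
    if (name %in% columns) {
      attr(df[[i]], "label") <- make_label(name)
    }
  }
  df  # return the modified copy; the caller decides where to store it
}

database <- data.frame(a = 1:3, b = 4:6)
database <- relabel(database, columns = "a", make_label = toupper)
attr(database$a, "label")
```

Only database is assigned in the calling environment; the loop variables disappear when relabel() returns.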

How to explicitly call the default value of a function argument in R?

How can I tell R to use the default value of a function argument without i) omitting the argument in the function call and ii) without knowing what the default value is?
I know I can use the default value of mean in rnorm():
rnorm(n = 100) # by omitting the argument
# or
rnorm(n = 100, mean = 0) # by including it in the call with the default value
But assume I don't know the default value but want to include it explicitly in the function call. How can I achieve that?
You can access the argument list and default values via:
> formals(rnorm)
$n
$mean
[1] 0
$sd
[1] 1
formals("rnorm") also works. Some simple examples:
> rnorm(10,mean = formals(rnorm)$mean)
[1] -0.5376897 0.4372421 0.3449424 -0.9569394 -1.1459726 -0.6109554 0.1907090 0.2991381 -0.2713715
[10] -1.4462570
> rnorm(10,mean = formals(rnorm)$mean + 3)
[1] 2.701544 2.863189 1.709289 2.987687 2.848045 5.136735 2.559616 3.827967 3.079658 5.016970
Obviously, you could store the result of formals(rnorm) ahead of time as well.
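A short sketch of storing the defaults first. One thing to keep in mind is that formals() returns unevaluated expressions, so a default that is itself a call needs eval(); log is used here only as an example of that, and args() is needed because log is a primitive:

```r
# Store the defaults once and reuse them
rnorm_defaults <- formals(rnorm)
set.seed(1)
a <- rnorm(5, mean = rnorm_defaults$mean, sd = rnorm_defaults$sd)
set.seed(1)
b <- rnorm(5)   # same draws, because the defaults were passed explicitly above

# A default that is a call must be evaluated before use
log_defaults <- formals(args(log))
log_defaults$base                      # the unevaluated call exp(1)
log(8, base = eval(log_defaults$base)) # same as log(8)
```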
As @joran has already pointed out, formals() exposes the default values. However, as I understand the question, what you're really after is the construction of the call expression. To that end, it is useful to combine formals() with as.call() to produce the call itself. The following function does just that, by producing a function that produces "argument-completed calls" for a given function name f:
drop_missing <- function(sig) {
sig[!sapply(sig, identical, quote(expr =))]
}
complete_call <- function(f) {
nm <- as.name(f)
sig <- formals(args(f))
make_call <- function() {
args <- match.call()[-1]
sig[names(args)] <- args
as.call(c(nm, drop_missing(sig)))
}
formals(make_call) <- sig
make_call
}
Example usage:
complete_call("log")(1)
#> log(x = 1, base = exp(1))
complete_call("rnorm")(10)
#> rnorm(n = 10, mean = 0, sd = 1)
complete_call("rnorm")()
#> rnorm(mean = 0, sd = 1)
Remarks:
1) The output is a language object. To execute the call, you need to evaluate it, e.g.,
eval(complete_call("rnorm")(10))
#> [1] -0.89428324 -1.78405483 -1.83972728 ... (output truncated)
2) If you want complete_call() to accept a function, rather than the name of a function, you could write nm <- as.name(deparse(substitute(f))) in place of the given assignment. However, that would not work in a nested call, where you would get as.name("f") for nm, because of R's rules for lexical scoping.
3) Without the call to args() in the assignment of sig, complete_call() would only work for closures, since primitive and builtin functions don't have formals.

print list names while iterating in lapply not working in 3.2

I'm trying to output the list names every time I run the function through lapply. I posted this question earlier, and the answer provided by @Ananda Mahto worked fine until I upgraded R to version 3.2.0. It no longer works, and I get the following error message: Error in eval.parent(quote(names(X)))[substitute(x)[[3]]] : invalid subscript type 'symbol'
x <- ts(rnorm(40,5), start = c(1961, 1), frequency = 12)
y <- ts(rnorm(50,20), start = c(1971, 1), frequency = 12)
z <- ts(rnorm(50,39), start = c(1981, 1), frequency = 12)
a <- ts(rnorm(50,59), start = c(1991, 1), frequency = 12)
dat.list <- list(x=x,y=y,z=z,a=a)
abc <- function(x) {
r <- mean(x)
print(eval.parent(quote(names(X)))[substitute(x)[[3]]])
return(r)
}
forl <- lapply(dat.list, abc)
I'm not sure what the issue is; as far as I can tell from checking the syntax, nothing relevant has changed in the new version of R. Any help is greatly appreciated. I'm open to new ideas as well.
If you restructure it a little, you can get the same effect with a simpler appearance:
set.seed(42)
## using your dat.list construction code from above
abc <- function(x) { r <- mean(x); return(r); }
forl <- mapply(function(n, x) {
message(n)
abc(x)
}, names(dat.list), dat.list, SIMPLIFY=FALSE)
## x
## y
## z
## a
forl
## $x
## [1] 4.960464
## $y
## [1] 20.1141
## $z
## [1] 38.87175
## $a
## [1] 58.89825
I'm presuming you wanted the output as a list rather than a vector, hence SIMPLIFY=FALSE to mimic lapply. Otherwise, the result is a simpler vector.
It's not as generic in that you have to explicitly pass the names as well as the dat, though you can create a sandwich function at the cost of having to reimplement lapply functionality.
A personal coding preference I'm using here is the use of message over print. The rationale is that the user has the option to use suppressMessages (outside the mapply, assuming it could/would be buried in a function), whereas suppressing the output of a print call is a bit more work.
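If you would rather stay with lapply itself, one alternative is to iterate over the names instead of the elements. The sketch below uses toy vectors in place of the ts objects so that it runs on its own; setNames(nm = ...) is only there so the result keeps the list names:

```r
# Toy stand-in for dat.list (plain vectors instead of ts objects)
dat.list <- list(x = c(1, 2, 3), y = c(10, 20), z = 5)
abc <- function(x) mean(x)

# Loop over the names, look each element up by name, and report it
forl <- lapply(setNames(nm = names(dat.list)), function(n) {
  message(n)
  abc(dat.list[[n]])
})

# The messages can be silenced from the outside when not wanted
quiet <- suppressMessages(lapply(setNames(nm = names(dat.list)),
                                 function(n) { message(n); abc(dat.list[[n]]) }))
```

This does not rely on lapply's internals, which is what broke the original approach after the upgrade.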
