I am trying to make a function in R that outputs a data frame in a standard way, but that also allows the user to add whatever personalized columns they deem necessary (the goal is to make a data format for paleomagnetic data, for which there is common information that everybody uses, and some more unusual information that a user might like to keep in the format).
However, I realized that if the user wants the header of their data to be a prefix of one of the defined arguments of the data formatting function (e.g. 'sheep', which is a prefix of the 'sheepc' argument, see example below), the function interprets it as the defined argument (through partial matching of argument names, see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or to at least give a warning to the user saying that they cannot use this name?
PS I realize this question is similar to Disabling partial variable names in subsetting data frames, but I would like to avoid toying with the options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
# I use the sheeta, sheetb and sheepc arguments for computations
# (more complex than shown below, but here they are just there to give an example)
a <- sum(sheeta, sheetb)
df1 <- data.frame(standard = rep(a, sheepc))
df2 <- as.data.frame(list(...))
if(nrow(df1) == nrow(df2)){
res <- cbind(df1, df2)
return(res)
} else {
stop("Extra elements should be of length ", sheepc)
}
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc): argument 'times' incorrect
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ to their tags to make name collisions less likely.
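A minimal sketch of that advice, assuming the body from the question: the ... comes first, the named parameters are checked with stopifnot(), and one of them gets a leading dot. The name fun_dots and the choice of .sheepc are only illustrations, not part of the original function.

```r
# Hypothetical variant of the question's function: `...` comes first, so the
# parameters after it can only be matched by their full, exact names.
fun_dots <- function(..., sheeta = 1, sheetb = 2, .sheepc = 3) {
  # Assertion checks for the non-... parameters
  stopifnot(
    is.numeric(sheeta), is.numeric(sheetb),
    is.numeric(.sheepc), length(.sheepc) == 1, .sheepc >= 1
  )
  a <- sum(sheeta, sheetb)
  df1 <- data.frame(standard = rep(a, .sheepc))
  extra <- list(...)
  if (length(extra) == 0) return(df1)
  df2 <- as.data.frame(extra)
  if (nrow(df1) != nrow(df2)) {
    stop("Extra elements should be of length ", .sheepc)
  }
  cbind(df1, df2)
}

# 'sheep' is now treated as an extra column, not as a prefix of '.sheepc'
res <- fun_dots(sheep = rep(1, 3))
print(res)
```

The trade-off is that sheeta, sheetb and .sheepc can no longer be passed by position.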
Edit:
"would it be possible to achieve the same effect without having the ... at the beginning ?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
stopifnot("partial matching detected" = identical(sys.call(), match.call()))
list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1
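Since the question also asked for a warning rather than an error, here is a rough sketch of a more targeted check, assuming the three-parameter signature from the question. The name fun_warn and the prefix test are my own illustration. Note that it can only see prefixes that match a single formal argument: an ambiguous prefix such as sheet fails during argument matching, before the body runs.

```r
# Hypothetical helper logic: warn when a name typed by the caller is not a
# formal argument name but is a prefix of one (i.e. R partially matched it).
fun_warn <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...) {
  formal_names <- setdiff(names(formals(sys.function())), "...")
  typed <- names(sys.call())[-1]                  # names exactly as typed
  typed <- typed[nzchar(typed) & !(typed %in% formal_names)]
  is_prefix <- vapply(
    typed,
    function(nm) any(startsWith(formal_names, nm)),
    logical(1)
  )
  if (any(is_prefix)) {
    warning(
      "argument name(s) ",
      paste0("'", typed[is_prefix], "'", collapse = ", "),
      " were partially matched to a formal argument",
      call. = FALSE
    )
  }
  list(sheepc = sheepc, extra = list(...))
}

# Capture the warning text instead of printing it
msg <- tryCatch(fun_warn(sheep = rep(1, 3)),
                warning = function(w) conditionMessage(w))
out <- fun_warn(ball = rep(1, 3))   # no prefix clash, no warning
```

Unlike the identical(sys.call(), match.call()) test, this does not trigger on arguments passed by position, but it is only a heuristic on names.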
Background
Packages can include a lot of functions. Some of them require informative error messages, and perhaps some comments in the function to explain what/why is happening. An example, f1 in a hypothetical f1.R file. All documentation and comments (both why the error and why the condition) in one place.
f1 <- function(x){
if(!is.character(x)) stop("Only characters supported")
# user input ...
# .... NaN problem in g()
# ....
# ratio of magnitude negative integer i base ^ i is positive
if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) stop("oof, an error")
log(x)
}
f1(-1)
#> Error in f1(-1) : oof, an error
I create a separate conds.R that defines an error function e (and, in the same way, w for warnings, s for suggestions, etc.), for example:
e <- function(x){
switch(
as.character(x),
"1" = "Only character supported",
# user input ...
# .... NaN problem in g()
# ....
"2" = "oof, and error") |>
stop()
}
Then in, say, f.R script I can define f2 as
f2 <- function(x){
if(!is.character(x)) e(1)
# ratio of magnitude negative integer i base ^ i is positive
if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) e(2)
log(x)
}
f2(-1)
#> Error in e(2) : oof, and error
This does throw the error, and on top of that gives a nice traceback and a rerun-with-debug option in the console. Further, as a package maintainer I would prefer this, as it avoids having to write terse if statements with one-line error messages, or to align comments in a tryCatch statement.
Question
Is there a reason (not opinion on syntax) to avoid writing a conds.R in a package?
There is no reason to avoid writing conds.R. This is very common and good practice in package development, especially as many of the checks you want to do will be applicable across many functions (like asserting the input is character, as you've done above). Here's a nice example from dplyr.
library(dplyr)
df <- data.frame(x = 1:3, x = c("a", "b", "c"), y = 4:6)
names(df) <- c("x", "x", "y")
df
#> x x y
#> 1 1 a 4
#> 2 2 b 5
#> 3 3 c 6
df2 <- data.frame(x = 2:4, z = 7:9)
full_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
nest_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
traceback()
#> 7: stop(fallback)
#> 6: signal_abort(cnd)
#> 5: abort(c(glue("Input columns in `{input}` must be unique."), x = glue("Problem with {err_vars(vars[dup])}.")))
#> 4: check_duplicate_vars(x_names, "x")
#> 3: join_cols(tbl_vars(x), tbl_vars(y), by = by, suffix = c("", ""), keep = keep)
#> 2: nest_join.data.frame(df, df2, by = "x")
#> 1: nest_join(df, df2, by = "x")
Here, both functions rely on code written in join-cols.R. Both call join_cols(), which in turn calls check_duplicate_vars(), whose source code I've copied here:
check_duplicate_vars <- function(vars, input, error_call = caller_env()) {
dup <- duplicated(vars)
if (any(dup)) {
bullets <- c(
glue("Input columns in `{input}` must be unique."),
x = glue("Problem with {err_vars(vars[dup])}.")
)
abort(bullets, call = error_call)
}
}
Although different in syntax from what you wrote, it's designed to provide the same behaviour, and it shows that it is possible to include this in a package; I see no reason (from my understanding) not to do so. However, I would add a few syntax points based on your code above:
I would bundle the check (the if() statement) together with the error raising inside the helper, to avoid repeating yourself wherever you use the function.
It's often nicer to include the name of the variable or argument passed in so the error message is explicit, as in the dplyr example above. This makes it clearer to the user what is causing the problem, in this case, that the x column is not unique in df.
The message #> Error in e(2) : oof, and error in your example is more obscure to the user, especially as e() is likely not exported in the NAMESPACE and they would need to parse the source code to understand where the error is generated. If you use stop(..., call. = FALSE) or pass the calling environment through the nested functions, like in join-cols.R, then you can keep unhelpful information out of the error message. This is for instance suggested in Hadley's Advanced R:
By default, the error message includes the call, but this is typically not useful (and recapitulates information that you can easily get from traceback()), so I think it’s good practice to use call. = FALSE
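To make the first two points and the call. = FALSE suggestion concrete, here is a small base R sketch. The names check_character and f3 are made up for illustration; this is not how dplyr does it (dplyr uses abort() with an error_call, as shown above).

```r
# Hypothetical helper for a conds.R file: the check and the error live together,
# and the message names the offending argument.
check_character <- function(x, arg = deparse(substitute(x))) {
  if (!is.character(x)) {
    stop("`", arg, "` must be a character vector, not ", class(x)[1], ".",
         call. = FALSE)   # drop the 'Error in check_character(...)' prefix
  }
  invisible(x)
}

# A function elsewhere in the package only needs one line per check
f3 <- function(input) {
  check_character(input)
  toupper(input)
}

ok  <- f3("abc")
msg <- tryCatch(f3(1), error = function(e) conditionMessage(e))
```

The default for arg relies on deparse(substitute(x)), so the message reports the name used by the calling function (here input).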
I have been playing around with R6 a bit and tried to implement a replacement function (similar in spirit to base::`diag<-`()). I wasn't hugely surprised to learn that the following does not work
library(R6)
r6_class <- R6Class("r6_class",
public = list(
initialize = function(x) private$data <- x,
elem = function(i) private$data[i],
`elem<-` = function(i, val) private$data[i] <- val
),
private = list(
data = NULL
)
)
test <- r6_class$new(1:5)
test$elem(2)
#> [1] 2
test$elem(2) <- 3
#> Error in test$elem(2) <- 3 :
#> target of assignment expands to non-language object
What does this correspond to in prefix notation? All of the following work as expected, so I guess it's none of these
test$`elem<-`(2, 3)
`$`(test, "elem<-")(2, 3)
I'm less interested in possible workarounds, but more in understanding why the above is invalid.
You are allowed to have nested complex assignments, e.g.
names(x)[3] <- "c"
but
test$elem(2) <- 3
is not of that form. It would be legal syntax as
elem(test,2) <- 3
which would expand to
`*tmp*` <- test
test <- `elem<-`(`*tmp*`, 2, 3)
but in the original form it would have to expand to
`*tmp*` <- 2
2 <- `test$elem<-`(`*tmp*`, 3)
(I've used test$elem<- in backticks to suggest it's the assignment version of the function returned by test$elem. That's not really right, there is no such thing.) The main problem is that the object being modified is 2, so you get the error message you saw: you're not allowed to modify 2.
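For the legal nested form, the expansion can be checked by hand in base R. This is only a sketch of what the evaluator does (the real thing handles `*tmp*` internally), with x and y as throwaway example vectors:

```r
# Nested complex assignment, as written by the user
x <- c(a = 1, b = 2, d = 3)
names(x)[3] <- "c"

# Roughly what it expands to: the target of every step is a variable
y <- c(a = 1, b = 2, d = 3)
`*tmp*` <- y
y <- `names<-`(`*tmp*`, value = `[<-`(names(`*tmp*`), 3, value = "c"))
rm(`*tmp*`)

same <- identical(x, y)
```

In both versions the thing being modified is a named variable (x or y), which is exactly what test$elem(2) <- 3 lacks.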
If you want to do this in R6, I think you could do it something like this. Define a global function
`elem<-` <- function(x, arg, value) x$`elem<-`(arg, value)
and change the definition of your class elem<- method to
`elem<-` = function(i, val) { private$data[i] <- val; self }
Not all that convenient to need two definitions for every assignment method, but it appears to work.
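Here is a runnable sketch of the two-definition idea. To keep it free of dependencies I use a plain environment as a stand-in for the R6 object (R6 objects are environments too); make_store is a hypothetical constructor, not part of R6.

```r
# Stand-in for the R6 class: an environment holding data and two methods
make_store <- function(x) {
  self <- new.env()
  self$data <- x
  self$elem <- function(i) self$data[i]
  # The "method" modifies the environment in place and returns it
  self$`elem<-` <- function(i, val) {
    self$data[i] <- val
    self
  }
  self
}

# Global replacement function that forwards to the method
`elem<-` <- function(x, arg, value) x$`elem<-`(arg, value)

test <- make_store(1:5)
elem(test, 2) <- 3      # legal: the target of the assignment is `test`
second <- test$elem(2)
```

Returning self from the method matters, because the result of `elem<-` is assigned back to test.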
I am working through the section on vectors in "The Book on R", which has given the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if the input length(c(3,2,8,1)) is going to give you the output 4 anyway, why would you add in x=? What is the purpose of x=? At first I thought it had to do with variables but R did not reflect that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=c(5:13)) not work, but instead give an error:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'
R has named arguments for functions. Check this section of R's doc for some information on the subject.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4
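A short sketch of the points above, using a made-up function power() so the argument names are easy to see:

```r
# Hypothetical function with two named arguments, one with a default
power <- function(base, exponent = 2) base^exponent

by_position <- power(3, 3)                   # positional: base = 3, exponent = 3
by_name     <- power(exponent = 3, base = 2) # named: order does not matter
with_default <- power(3)                     # exponent falls back to 2

# Naming an argument does not create a variable in the caller's environment
x_created <- local({
  n <- length(x = c(3, 2, 8, 1))
  exists("x", inherits = FALSE)
})
```

Here x_created is FALSE: x = inside the call only says which parameter of length() receives the vector.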
By "replacement functions" I mean those mentioned in this thread What are Replacement Functions in R?, ones that look like 'length<-'(x, value). When I was working with such functions I encountered something weird. It seems that a replacement function only works when variables are named according to a certain rule.
Here is my code:
a <- c(1,2,3)
I will try to change the first element of a, using one of the 3 replacement functions below.
'first0<-' <- function(x, value){
x[1] <- value
x
}
first0(a) <- 5
a
# returns [1] 5 2 3.
The first one works pretty well... but then when I change the name of arguments in the definition,
'first1<-' <- function(somex, somevalue){
somex[1] <- somevalue
somex
}
first1(a) <- 9
# Error in `first1<-`(`*tmp*`, value = 9) : unused argument (value = 9)
a
# returns [1] 5 2 3
It fails to work, though the following code is OK:
a <- 'first1<-'(a, 9)
a
# returns [1] 9 2 3
Some other names work well, too, if they are similar to x and value, it seems:
'first2<-' <- function(x11, value11){
x11[1] <- value11
x11
}
first2(a) <- 33
a
# returns [1] 33 2 3
This doesn't make sense to me. Do the names of variables actually matter or did I make some mistakes?
There are two things going on here. First, the only real rule of replacement functions is that the new value will be passed as a parameter named value and it will be the last parameter. That's why when you specify the signature function(somex, somevalue), you get the error unused argument (value = 9) and the assignment doesn't work.
Secondly, things work with the signature function(x11, value11) thanks to partial matching of parameter names in R. Consider this example
f<-function(a, value1234=5) {
print(value1234)
}
f(value=5)
# [1] 5
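Note that 5 is printed and no "unused argument" error is raised: the name value was partially matched to the parameter value1234. This behavior is defined under argument matching in the language definition.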
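A slightly more explicit sketch of the matching rules, with throwaway functions g and h (my own names), where the supplied value differs from the default so the effect is visible:

```r
# Partial matching: 'value' is a prefix of 'value1234'
g <- function(a, value1234 = 5) value1234
partial <- g(value = 7)

# Exact matches are resolved before partial ones
h <- function(val = 1, value = 2) c(val, value)
exact <- h(value = 9)

# A prefix that fits several parameters is an error
ambiguous <- tryCatch(h(va = 9), error = function(e) conditionMessage(e))
```

So partial is 7, exact is c(1, 9), and the last call fails because va matches both val and value.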
Another way to see what's going on is to print the call signature of what's actually being called.
'first0<-' <- function(x, value){
print(sys.call())
x[1] <- value
x
}
a <- c(1,2,3)
first0(a) <- 5
# `first0<-`(`*tmp*`, value = 5)
So the first parameter is actually passed as an unnamed positional parameter, and the new value is passed as the named parameter value=. This is the only parameter name that matters.
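The same rule applies when the replacement function takes extra arguments: they go between the object and value. A small sketch, where nth is a hypothetical name:

```r
# Replacement function with an extra positional argument 'n';
# the new value must still arrive in a parameter called 'value', placed last.
`nth<-` <- function(x, n, value) {
  x[n] <- value
  x
}

a <- c(1, 2, 3)
nth(a, 2) <- 10   # evaluated as a <- `nth<-`(a, 2, value = 10)
```

After this, a is c(1, 10, 3).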
Suppose we want our function to be able to deal with two scenarios:
somefun = function(x, y, method, ...) {
res = dowith(x, y)
res
}
somefun = function(z, method, ...) {
x = z$v1
y = z$v2
res = dowith(x, y)
res
}
How can we make somefun aware of the difference between these two situations?
If you can guarantee that x is never going to be a list, you could just use is.list(x) to determine which version of the function is being called. Otherwise, you can use missing:
somefun<-function(x,y,method,...){
if(missing(y)){
cat("Using List version\n")
y<-x$y
x<-x$x
}
else{
cat("Using normal version\n")
}
c(x,y)
}
> somefun(list(x=1,y=2),method="a method")
Using List version
[1] 1 2
> somefun(1,2,method="a method")
Using normal version
[1] 1 2
>
However, be aware that if you do this, and you want to use the list version of the function, then method and everything after it have to be passed in by name, otherwise R is going to bind them to y:
> somefun(list(x=1,y=2),"a method")
Using normal version
$x
[1] 1
$y
[1] 2
[[3]]
[1] "a method"
> somefun(list(x=1,y=2),method="a method",5)
Using normal version
$x
[1] 1
$y
[1] 2
[[3]]
[1] 5
> somefun(list(x=1,y=2),method="a method",q=5)
Using List version
[1] 1 2
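For completeness, the is.list(x) variant mentioned at the start could look roughly like this. It assumes, as in the question, that the list carries its components as v1 and v2, and it only works if a plain x is never itself a list; somefun_list is just a name for this sketch.

```r
# Dispatch on the type of the first argument instead of on missing(y)
somefun_list <- function(x, y, method, ...) {
  if (is.list(x)) {      # data frames are lists too, so they take this branch
    y <- x$v2
    x <- x$v1
  }
  sum(x, y)              # stand-in for dowith(x, y)
}

from_list <- somefun_list(list(v1 = 1, v2 = 2), method = "a method")
from_vecs <- somefun_list(1, 2, method = "a method")
```

With this version a positional second argument is simply overwritten in the list case, so the binding issue shown above changes, but an unwanted y is silently ignored.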
I don't know of an automatic way to do this, but when dealing with these types of situations, it is sometimes helpful to use switch. Here's a basic example:
somefun <- function(x, y = NULL, type = c("DF", "vecs"), method = NULL, ...) {
switch(type,
DF = sum(x[["v1"]], x[["v2"]]),
vecs = sum(x, y),
stop("'type' must be either 'DF' or 'vecs'"))
}
somefun(x = 10, y = 3, type="vecs")
# [1] 13
somefun(x = data.frame(v1 = 2, v2 = 4), type="DF")
# [1] 6
somefun(x = data.frame(v1 = 2, v2 = 4), type = "meh")
# Error in somefun(x = data.frame(v1 = 2, v2 = 4), type = "meh") :
# 'type' must be either 'DF' or 'vecs'
In the above, we're expecting that the user must enter a type argument where the acceptable values are "DF" or "vecs", and where a different set of operations has been defined for each option.
Of course, I would also script out a set of different scenarios and use some condition checking at the start of the function to make sure things will be working as expected. For instance, if you expect that most of the time people will be inputting a data.frame, you could do something like if (is.null(y) && is.null(type)) type <- "DF" (or insert a try type statement in there). At the end of the day, it also comes down to whether you can predict a sensible set of default values.
If your functions are complicated, you might want to separate out the steps that go into the switches into separate functions as this would probably lead to more readable (and more easily reusable) code.
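As a sketch of those last two suggestions, here the branches are moved into small helpers and type is guessed when the user leaves it out. The helper names and the guessing rule are assumptions for illustration only.

```r
# One helper per scenario keeps the switch short and the pieces reusable
sum_df   <- function(x) sum(x[["v1"]], x[["v2"]])
sum_vecs <- function(x, y) sum(x, y)

somefun2 <- function(x, y = NULL, type = NULL, method = NULL, ...) {
  # Guess a sensible default when 'type' is not supplied
  if (is.null(type)) {
    type <- if (is.data.frame(x) && is.null(y)) "DF" else "vecs"
  }
  switch(type,
         DF   = sum_df(x),
         vecs = sum_vecs(x, y),
         stop("'type' must be either 'DF' or 'vecs'"))
}

guessed_df   <- somefun2(data.frame(v1 = 2, v2 = 4))
guessed_vecs <- somefun2(10, 3)
explicit     <- somefun2(x = 10, y = 3, type = "vecs")
```

An explicit type still overrides the guess, so the original calling style keeps working.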