Background
Packages can include many functions. Some of them require informative error messages, and perhaps comments in the function body to explain what is happening and why. As an example, consider f1 in a hypothetical f1.R file, with all documentation and comments (both why the error and why the condition) in one place:
f1 <- function(x){
  if(!is.numeric(x)) stop("Only numeric supported")
  # user input ...
  # .... NaN problem in g()
  # ....
  # ratio of magnitude negative integer i base ^ i is positive
  if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) stop("oof, an error")
  log(x)
}
f1(-1)
#> Error in f1(-1) : oof, an error
Instead, I create a separate conds.R, specifying one function per condition type (e for errors, and likewise w for warnings, s for suggestions), for example:
e <- function(x){
  switch(
    as.character(x),
    "1" = "Only numeric supported",
    # user input ...
    # .... NaN problem in g()
    # ....
    "2" = "oof, an error") |>
    stop()
}
Then, in say an f.R script, I can define f2 as:
f2 <- function(x){
  if(!is.numeric(x)) e(1)
  # ratio of magnitude negative integer i base ^ i is positive
  if(x < .Machine$longdouble.min.exp / .Machine$longdouble.min.exp) e(2)
  log(x)
}
f2(-1)
#> Error in e(2) : oof, an error
This does throw the error, and on top of it gives a nice traceback and a rerun-with-debug option in the console. Further, as a package maintainer I would prefer this approach, as it avoids having to choose between writing terse if statements with one-line error messages and aligning comments inside a tryCatch statement.
Question
Is there a reason (not opinion on syntax) to avoid writing a conds.R in a package?
There is no reason to avoid writing conds.R. This is very common and good practice in package development, especially as many of the checks you want to do will be applicable across many functions (like asserting the type of the input, as you've done above). Here's a nice example from dplyr.
library(dplyr)
df <- data.frame(x = 1:3, x = c("a", "b", "c"), y = 4:6)
names(df) <- c("x", "x", "y")
df
#> x x y
#> 1 1 a 4
#> 2 2 b 5
#> 3 3 c 6
df2 <- data.frame(x = 2:4, z = 7:9)
full_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
nest_join(df, df2, by = "x")
#> Error: Input columns in `x` must be unique.
#> x Problem with `x`.
traceback()
#> 7: stop(fallback)
#> 6: signal_abort(cnd)
#> 5: abort(c(glue("Input columns in `{input}` must be unique."), x = glue("Problem with {err_vars(vars[dup])}.")))
#> 4: check_duplicate_vars(x_names, "x")
#> 3: join_cols(tbl_vars(x), tbl_vars(y), by = by, suffix = c("", ""), keep = keep)
#> 2: nest_join.data.frame(df, df2, by = "x")
#> 1: nest_join(df, df2, by = "x")
Here, both functions rely on code written in join-cols.R. Both call join_cols(), which in turn calls check_duplicate_vars(), whose source code I've copied below:
check_duplicate_vars <- function(vars, input, error_call = caller_env()) {
  dup <- duplicated(vars)
  if (any(dup)) {
    bullets <- c(
      glue("Input columns in `{input}` must be unique."),
      x = glue("Problem with {err_vars(vars[dup])}.")
    )
    abort(bullets, call = error_call)
  }
}
Although different in syntax from what you wrote, it's designed to provide the same behaviour, and it shows that this approach is possible to include in a package, with (to my understanding) no reason not to do so. However, I would add a few points based on your code above:
I would bundle the check (the if() statement) together with the error raising inside one function in the package, to avoid repeating yourself everywhere you use the check; see the sketch after the quote below.
It's often nicer to include the name of the variable or argument passed in, so the error message is explicit, as in the dplyr example above. This makes it clearer to the user what is causing the problem: in this case, that the x column is not unique in df.
The traceback showing #> Error in e(2) : oof, an error in your example is obscure to the user, especially as e() is likely not exported in the NAMESPACE, so they would need to read the source code to understand where the error is generated. If you use stop(..., call. = FALSE), or pass the calling environment through the nested functions as in join-cols.R, you can keep unhelpful information out of traceback(). This is, for instance, suggested in Hadley's Advanced R:
By default, the error message includes the call, but this is typically not useful (and recapitulates information that you can easily get from traceback()), so I think it’s good practice to use call. = FALSE
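For example, here is a minimal sketch of the first point (check_character() and its message are made-up names for illustration, not taken from dplyr):
# conds.R: bundle the condition check with the error raising
# (check_character() is a hypothetical helper name)
check_character <- function(x, arg = "x") {
  if (!is.character(x)) {
    # call. = FALSE keeps the internal helper call out of the error message
    stop("`", arg, "` must be a character vector, not ", class(x)[1],
         call. = FALSE)
  }
  invisible(x)
}

f3 <- function(x) {
  check_character(x) # one line per call site, no repeated if() + stop()
  # ... rest of the function
}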
Related
I am trying to understand why my code produces a different result when run with reprex::reprex() than when run directly from the script, and how to consistently reproduce the output of the reprex() call. The issue emerges within the filter() call.
Example 1 shows that my function filters the data.frame rows based on a column's matches with another vector when I select and copy the code, then run it with reprex::reprex() in RStudio.
Example 2 (screenshot from the console output) shows that running the exact same code directly in the script throws a 'match' requires vector arguments error.
Example 3 shows, with a slight modification of the function, that !!sym() appears to be creating some sort of time series object. Omitting sym() and replacing == with %in% has the same consequence.
UPDATE:
The issue did not replicate on others' machines, nor on my own. I swapped out of an RStudio project to a single .R file and it still persisted. However, when I pressed Ctrl+Shift+F10 to restart R and detach libraries, data, etc., the discrepancy vanished. This suggested that I was dealing with some sort of namespace issue. Upon returning to the RStudio project, the issue returned. However, calling dplyr::filter() within the function resolved the issue, reinforcing that it is a namespace issue.
While the accepted answer provides some solutions and correctly identifies the issue, the outstanding question (for another post) is why the namespace precedence was not applied in this case when I loaded the package immediately beforehand.
Example 1: !!sym() produces a vector for %in% as expected when code is run with reprex::reprex()
# Packages
library(dplyr)
library(rlang)
# Example data
mydat <- data.frame(type = c("a","b","c","a","c"))
myvec <- c("a","c")
# Example function
foo <- function(df, type_var = "type", vec){
  df %>%
    filter(!!sym(type_var) %in% vec)
}
# Call function
foo(df = mydat, type_var = "type", vec = myvec)
#> type
#> 1 a
#> 2 c
#> 3 a
#> 4 c
Example 2: Console output shows type error when run from within an R script
Example 3: slightly modified function shows that !!sym() is creating a time series object?!
# Example function
foo <- function(df, type_var = "type", vec){
  df %>%
    filter(!!sym(type_var) == "a")
}
# Apply function
foo(df = mydat, type_var = "type", vec = myvec)
#> Time Series:
#> Start = 1
#> End = 5
#> Frequency = 1
#> [,1]
#> [1,] 0
#> [2,] 0
#> [3,] 0
#> [4,] 0
#> [5,] 0
It's related to which version of filter() is being used: whether it comes from stats or dplyr. I suspect you have an ~/.Rprofile somewhere that is loading some packages, and these are being loaded sometimes and not others.
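One quick way to see which filter() a session resolves to (the output below is what a dplyr-first session might print; yours will vary):
# Where is `filter` found, in search-path order? The first hit wins.
find("filter")
#> [1] "package:dplyr" "package:stats"
# The namespace of the winning version:
environment(filter)
#> <environment: namespace:dplyr>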
Changing example 3 to
foo <- function(df, type_var = "type", vec){
  df %>%
    dplyr::filter(!!sym(type_var) == "a")
}
# Apply function
foo(df = mydat, type_var = "type", vec = myvec)
yields:
type
1 a
2 a
Similarly changing example 1 to:
library(dplyr)
library(rlang)
# Example data
mydat <- data.frame(type = c("a","b","c","a","c"))
myvec <- c("a","c")
# Example function
foo <- function(df, type_var = "type", vec){
  df %>%
    dplyr::filter(!!sym(type_var) %in% vec)
}
# Call function
foo(df = mydat, type_var = "type", vec = myvec)
gives:
type
1 a
2 c
3 a
4 c
Beware of namespace collisions when running R in the console, via Rscript, etc.; they can make bugs hard to track down. filter and lag are the chief culprits (source: I almost had to retract a journal paper because lag was imported from the wrong namespace in an Rscript and failed in a weird and silent way).
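If you want such collisions to fail loudly instead of silently, one option is the conflicted package, which turns ambiguous calls into errors until you declare a preference. A minimal sketch (the exact error text may differ by version):
library(conflicted)
library(dplyr)
filter(mydat, type == "a")
#> Error: [conflicted] `filter` found in 2 packages ...
# Declare the winner once per script/session:
conflict_prefer("filter", "dplyr")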
I am trying to make a function in R that outputs a data frame in a standard way, but that also allows users to add the personalized columns they deem necessary (the goal is to make a data format for paleomagnetic data, for which there is common information that everybody uses, plus more unusual fields that a user might like to keep in the format).
However, I realized that if the user wants the header of their data to be a prefix of one of the defined arguments of the data formatting function (e.g. a 'sheep' column, whose name is a prefix of the 'sheepc' argument; see the example below), the function interprets it as the defined argument (through partial name matching; see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or at least to warn the user that this name cannot be used?
PS: I realize this question is similar to Disabling partial variable names in subsetting data frames, but I would like to avoid toying with the global options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
  # I use the sheeta, sheetb and sheepc arguments for computations
  # (more complex than shown below, but here they are just there to give an example)
  a <- sum(sheeta, sheetb)
  df1 <- data.frame(standard = rep(a, sheepc))
  df2 <- as.data.frame(list(...))
  if(nrow(df1) == nrow(df2)){
    res <- cbind(df1, df2)
    return(res)
  } else {
    stop("Extra elements should be of length ", sheepc)
  }
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc): argument 'times' incorrect
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ prefix to their names to make name collisions less likely.
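For example, a quick sketch of such checks (the specific assertions are placeholders; use whatever your computations actually require):
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{
  # assertion checks on the non-... parameters
  stopifnot(
    is.numeric(sheeta),
    is.numeric(sheetb),
    is.numeric(sheepc)
  )
  # ... rest of the function body as before
}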
Edit:
"would it be possible to achieve the same effect without having the ... at the beginning ?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
  stopifnot("partial matching detected" = identical(sys.call(), match.call()))
  list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1
I am trying to use recode and mutate_all to recode columns. However, for some reason, I am getting an error. I do believe this post is similar to "how to recode (and reverse code) variables in columns with dplyr", but the answer in that post used an lapply function.
Here's what I tried after reading dplyr package's help pdf.
by_species <- matrix(c(1,2,3,4), 2, 2)
tbl_species <- as_data_frame(by_species)
tbl_species %>% mutate_all(funs(. * 0.4))
# A tibble: 2 x 2
V1 V2
<dbl> <dbl>
1 0.4 1.2
2 0.8 1.6
So, this works well.
However, this doesn't work:
grades <- matrix(c("A","A-","B","C","D","B-","C","C","F"), 3, 3)
tbl_grades <- as_data_frame(grades)
tbl_grades %>% mutate_all(funs(dplyr::recode(., A = '4.0')))
I get this error:
Error in vapply(dots[missing_names], function(x) make_name(x$expr), character(1)) :
values must be length 1,
but FUN(X[[1]]) result is length 3
Can someone please explain what the problem is and why the above code isn't working?
I'd appreciate any help.
Thanks
@Mir has done a good job describing the problem. Here's one possible workaround: since the problem is in generating the name, you can supply your own name.
tbl_grades %>% mutate_all(funs(recode = recode(., A = '4.0')))
Now, this does add columns rather than replace them. Here's a function that will "forget" that you supplied those names:
dropnames <- function(x) {
  if (is(x, "lazy_dots")) {
    attr(x, "has_names") <- FALSE
  }
  x
}
tbl_grades %>% mutate_all(dropnames(funs(recode = dplyr::recode(., A = '4.0'))))
This should behave like the original. Although really you might prefer
tbl_grades %>% mutate_all(dropnames(funs(recode(., A = '4.0'))))
because dplyr often has special C++ versions of some functions that it can use if it recognizes them (like lag, for example), and this will not happen if you also specify the namespace (if you use dplyr::lag).
If we call it without the dplyr:: then it works fine.
funs(recode(., A = '4.0'))
<fun_calls>
$ recode: recode(., A = "4.0")
tbl_grades %>% mutate_all(funs(recode(., A = '4.0')))
# A tibble: 3 x 3
V1 V2 V3
<chr> <chr> <chr>
1 4.0 C C
2 A- D C
3 B B- F
The issue lies in the funs call. If we extract that part out the same error appears.
funs(dplyr::recode(., A = '4.0'))
Error in vapply(dots[missing_names], function(x) make_name(x$expr), character(1)) :
values must be length 1, but FUN(X[[1]]) result is length 3
The issue boils down to the fact that :: is a function itself (see ?`::`). To visualize this a little better, we can look at both the infix and prefix ways of writing the function.
`::`(dplyr, recode)
function (.x, ..., .default = NULL, .missing = NULL)
{
UseMethod("recode")
}
<environment: namespace:dplyr>
dplyr::recode
function (.x, ..., .default = NULL, .missing = NULL)
{
UseMethod("recode")
}
<environment: namespace:dplyr>
funs attempts to extract the function names of its arguments by grabbing the first element of the call object and calling as.character on it. The first element of the call object is the calling function and subsequent elements are the argument values. For example:
as.call(quote(recall(., A = '4.0')))
recall(., A = "4.0")
as.call(quote(recall(., A = '4.0')))[[1]]
recall
as.call(quote(recall(., A = '4.0')))[[2]]
.
as.call(quote(recall(., A = '4.0')))[[3]]
"4.0"
as.call(quote(recall(., A = '4.0')))[[4]]
Error in as.call(quote(recall(., A = "4.0")))[[4]] :
subscript out of bounds
This runs into issues when dplyr::recode is used because this creates a nested call object. When we grab the first element, we get not just a name of a function, but an entire function call.
as.call(quote(dplyr::recall(., A = '4.0')))
dplyr::recall(., A = "4.0")
as.call(quote(dplyr::recall(., A = '4.0')))[[1]]
dplyr::recall
as.call(quote(dplyr::recall(., A = '4.0')))[[1]][[1]]
`::`
as.call(quote(dplyr::recall(., A = '4.0')))[[1]][[2]]
dplyr
as.call(quote(dplyr::recall(., A = '4.0')))[[1]][[3]]
recall
In contrast to when recode is called without dplyr::.
as.call(quote(recall(., A = '4.0')))[[1]][[1]]
Error in as.call(quote(recall(., A = "4.0")))[[1]][[1]] :
object of type 'symbol' is not subsettable
Because the first element when dplyr:: is included is a whole function call, as.character results in a vector that has both the name of a function and its arguments.
as.call(quote(dplyr::recall(., A = '4.0')))[[1]] %>% as.character()
[1] "::" "dplyr" "recall"
funs() reasonably expects the name of the function to have only one element, not three, and thus errors out.
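You can see the difference directly by calling as.character() on the first element of each call:
as.character(quote(recode(., A = '4.0'))[[1]])
#> [1] "recode"
as.character(quote(dplyr::recode(., A = '4.0'))[[1]])
#> [1] "::"     "dplyr"  "recode"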
I am working through the section on vectors in "The Book on R", which gives the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if the input length(c(3,2,8,1)) is going to give you the output 4 anyway, why would you add in x=? What is the purpose of x=? At first I thought it had to do with variables but R did not reflect that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=5:13) not work, instead giving an error:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'
R has named arguments for functions. Check this section of R's doc for some information on the subject.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4
I searched for a reference to learn about replacement functions in R, but I haven't found one yet. I'm trying to understand the concept of replacement functions in R. I have the code below, but I don't understand it:
"cutoff<-" <- function(x, value){
x[x > value] <- Inf
x
}
and then we call cutoff with:
cutoff(x) <- 65
Could anyone explain what a replacement function is in R?
When you call
cutoff(x) <- 65
you are in effect calling
x <- "cutoff<-"(x = x, value = 65)
The name of the function has to be quoted as it is a syntactically valid but non-standard name and the parser would interpret <- as the operator not as part of the function name if it weren't quoted.
"cutoff<-"() is just like any other function (albeit with a weird name); it makes a change to its input argument on the basis of value (in this case it is setting any value in x greater than 65 to Inf (infinite)).
The magic is really being done when you call the function like this
cutoff(x) <- 65
because R is parsing that and pulling out the various bits to make the real call shown above.
More generically we have
FUN(obj) <- value
R finds function "FUN<-"() and sets up the call by passing obj and value into "FUN<-"() and arranges for the result of "FUN<-"() to be assigned back to obj, hence it calls:
obj <- "FUN<-"(obj, value)
A useful reference for this information is the R Language Definition Section 3.4.4: Subset assignment ; the discussion is a bit oblique, but seems to be the most official reference there is (replacement functions are mentioned in passing in the R FAQ (differences between R and S-PLUS), and in the R language reference (various technical issues), but I haven't found any further discussion in official documentation).
Gavin provides an excellent discussion of the interpretation of the replacement function. I wanted to provide a reference since you also asked for that: R Language Definition Section 3.4.4: Subset assignment.
As a complement to the accepted answer, I would like to note that replacement functions can also be defined for non-standard functions, namely operators (see ?Syntax) and control flow constructs (see ?Control).
Note also that it is perfectly acceptable to design a generic and associated methods for replacement functions.
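For instance, a quick sketch with a made-up flavor<- generic and a default method:
`flavor<-` <- function(x, value) UseMethod("flavor<-")
`flavor<-.default` <- function(x, value) {
  attr(x, "flavor") <- value
  x
}
x <- 1:3
flavor(x) <- "sweet" # dispatches to `flavor<-.default`
attr(x, "flavor")
#> [1] "sweet"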
operators
When defining a new class it is common to define S3 methods for $<-, [[<- and [<-; some examples are data.table:::`$<-.data.table`, data.table:::`[<-.data.table`, or tibble:::`$<-.tbl_df`.
However, for any other operator we can write a replacement function; some examples:
`!<-` <- function(x, value) !value
x <- NULL # x needs to exist before replacement functions are used!
!x <- TRUE
x
#> [1] FALSE
`==<-` <- function(e1, e2, value) replace(e1, e1 == e2, value)
x <- 1:3
x == 2 <- 200
x
#> [1] 1 200 3
`(<-` <- function(x, value) sapply(x, value, USE.NAMES = FALSE)
x <- c("foo", "bar")
(x) <- toupper
x
#> [1] "FOO" "BAR"
`%chrtr%<-` <- function(e1, e2, value) {
  chartr(e2, value, e1)
}
x <- "woot"
x %chrtr% "o" <- "a"
x
#> [1] "waat"
we can even define <-<-, but the parser will prevent its usage if we call x <- y <- z, so we need to use the left to right assignment symbol
`<-<-` <- function(e1, e2, value){
  paste(e2, e1, value)
}
x <- "b"
"a" -> x <- "c"
x
#> [1] "a b c"
Fun fact, <<- can have a double role
x <- 1:3
x < 2 <- NA # this fails but `<<-` was called!
#> Error in x < 2 <- NA: incorrect number of arguments to "<<-"
# ok let's define it then!
`<<-` <- function(x, y, value){
  if (missing(value)) {
    eval.parent(substitute(.Primitive("<<-")(x, y)))
  } else {
    replace(x, x < y, value)
  }
}
x < 2 <- NA
x
#> [1] NA 2 3
x <<- "still works"
x
#> [1] "still works"
control flow constructs
These are in practice seldom encountered (in fact, I'm responsible for the only practical use I know of, in defining for<- for my package pbfor), but R is flexible enough, or crazy enough, to allow us to define them. However, to actually use them, due to the way control flow constructs are parsed, we need to use the left-to-right assignment ->.
`repeat<-` <- function(x, value) replicate(value, x)
x <- "foo"
3 -> repeat x
x
#> [1] "foo" "foo" "foo"
function<-
function<- can be defined in principle, but to the extent of my knowledge we can't do anything with it:
`function<-` <- function(x,value){NULL}
3 -> function(arg) {}
#> Error in function(arg) {: target of assignment expands to non-language object
Remember: in R, every operation is a function call (therefore assignment operations too), and everything that exists is an object.
Replacement functions act as if they modify their arguments in place, such as in
colnames(d) <- c("Input", "Output")
They have the identifier <- at the end of their name, and they return a modified copy of the argument object (non-primitive replacement functions) or the same object (primitive replacement functions).
At the R prompt, the following will not work:
> `second` <- function(x, value) {
+ x[2] <- value
+ x
+ }
> x <- 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> second(x) <- 9
Error in second(x) <- 9: couldn't find function "second<-"
As you can see, R is searching the environment not for second but for second<-.
So let's do the same thing, but using such a function identifier instead:
> `second<-` <- function(x, value) {
+ x[2] <- value
+ x
+ }
Now, the assignment at the second position of the vector works:
> second(x) <- 9
> x
[1] 1 9 3 4 5 6 7 8 9 10
I also wrote a simple script to list all replacement functions in the R base package; find it here.
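The idea is simple enough to sketch here as well (assuming we just scan for exported names that end in "<-"):
# List replacement functions in base: names ending in "<-"
base_names <- ls("package:base")
repl <- base_names[endsWith(base_names, "<-")]
head(repl)
#> e.g. "$<-" "[<-" "[[<-" "attr<-" "attributes<-" "body<-"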