"Argument x" in length function - r

Am working through the section on vectors in "The Book on R", which has given the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if the input length(c(3,2,8,1)) is going to give you the output 4 anyway, why would you add in x=? What is the purpose of x=? At first I thought it had to do with variables but R did not reflect that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=c(5:13)) does not work but gives an error:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'

R has named arguments for functions. Check this section of R's doc for some information on the subject.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
'
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4

Related

Disable partial name idenfication of function arguments

I am trying to make a function in R that outputs a data frame in a standard way, but that also allows the user to have the personalized columns that he deams necessary (the goal is to make a data format for paleomagnetic data, for which there are common informations that everybody use, and some more unusual that the user might like to keep in the format).
However, I realized that if the user wants the header of his data to be a prefix of one of the defined arguments of the data formating function (e.g. via the 'sheep' argument, that is a prefix of the 'sheepc' argument, see example below), the function interprets it as the defined argument (through partial name identification, see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or to at least give a warning to the user saying that he cannot use this name ?
PS I realize this question is similar to Disabling partial variable names in subsetting data frames, but I would like to avoid toying with the options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
# I use the sheeta, sheetb and sheepc arguments for computations
# (more complex than shown below, but here thet are just there to give an example)
a <- sum(sheeta, sheetb)
df1 <- data.frame(standard = rep(a, sheepc))
df2 <- as.data.frame(list(...))
if(nrow(df1) == nrow(df2)){
res <- cbind(df1, df2)
return(res)
} else {
stop("Extra elements should be of length ", sheep)
}
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc): argument 'times' incorrect
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ to their tags to make name collisions less likely.
Edit:
"would it be possible to achieve the same effect without having the ... at the beginning ?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
stopifnot("partial matching detected" = identical(sys.call(), match.call()))
list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1

Assign value to indices of nested lists stored as strings in R

I have a dataframe of nested list indices, which have been stored as strings. To give a simplified example:
df1 <- data.frame(x = c("lst$x$y$a", "lst$x$y$b"), stringsAsFactors = F)
These are then coordinates for the following list:
lst <- list(x=list(y=list(a="foo",b="bar",c="")))
I'd like replace values or assign new values to these elements using the indices in df1.
One attempt was
do.call(`<-`, list(eval(parse(text = df1[1,1])), "somethingelse"))
but this doesn't seem tow work. Instead it assigns "something" to foo.
I'm not too happy with using eval(parse(text=)) (maintaining code will become a nightmare), but recognise I may have little choice.
Any tips welcome.
Let's consider 3 situations:
Case 1
do.call(`<-`, list("lst$x$y$a", "somethingelse"))
This will create a new variable named lst$x$y$a in your workspace, so the following two commands will call different objects. (The former is the object you store in lst, and the latter is the new variable. You need to call it with backticks because its name will confuse R.)
> lst$x$y$a # [1] "foo"
> `lst$x$y$a` # [1] "somethingelse"
Case 2
do.call(`<-`, list(parse(text = "lst$x$y$a"), "somethingelse"))
You mostly get what you expect with this one but an error still occurs:
invalid (do_set) left-hand side to assignment
Let's check:
> parse(text = "lst$x$y$a") # expression(lst$x$y$a)
It belongs to the class expression, and the operator <- seems not to accept this class to the left-hand side.
Case 3
This one will achieve what you want:
do.call(`<-`, list(parse(text = "lst$x$y$a")[[1]], "somethingelse"))
If put [[1]] behind an expression object, a call object will be extracted and take effect in the operator <-.
> lst
# $x
# $x$y
# $x$y$a
# [1] "somethingelse"
#
# $x$y$b
# [1] "bar"
#
# $x$y$c
# [1] ""

Sequential evaluation of named arguments in R

I am trying to understand how to succinctly implement something like the argument capture/parsing/evaluation mechanism that enables the following behavior with dplyr::tibble() (FKA dplyr::data_frame()):
# `b` finds `a` in previous arg
dplyr::tibble(a=1:5, b=a+1)
## a b
## 1 2
## 2 3
## ...
# `b` can't find `a` bc it doesn't exist yet
dplyr::tibble(b=a+1, a=1:5)
## Error in eval_tidy(xs[[i]], unique_output) : object 'a' not found
With base:: classes like data.frame and list, this isn't possible (maybe bc arguments aren't interpreted sequentially(?) and/or maybe bc they get evaluated in the parent environment(?)):
data.frame(a=1:5, b=a+1)
## Error in data.frame(a = 1:5, b = a + 1) : object 'a' not found
list(a=1:5, b=a+1)
## Error: object 'a' not found
So my question is: what might be a good strategy in base R to write a function list2() that is just like base::list() except that it allows tibble() behavior like list2(a=1:5, b=a+1)??
I'm aware that this is part of what "tidyeval" does, but I am interested in isolating the exact mechanism that makes this trick possible. And I'm aware that one could just say list(a <- 1:5, b <- a+1), but I am looking for a solution that does not use global assignment.
What I've been thinking so far: One inelegant and unsafe way to achieve the desired behavior would be the following -- first parse the arguments into strings, then create an environment, add each element to that environment, put them into a list, and return (suggestions for better ways to parse ... into a named list appreciated!):
list2 <- function(...){
# (gross bc we are converting code to strings and then back again)
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
# (terrible bc commas aren't allowed except to separate args!!!)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# (icky bc all args must have names)
for (arg in argstrings){
eval(parse(text=arg), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
This allows us to derive the basic behavior, but it doesn't generalize well at all (see comments in list2() definition):
list2(a=1:5, b=a+1)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We could introduce hacks to fix little things like producing names when they aren't supplied, e.g. like this:
# (still gross but at least we don't have to supply names for everything)
list3 <- function(...){
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# if a name isn't supplied, create one of the form `v1`, `v2`, ...
ctr <- 0
for (arg in argstrings){
ctr <- ctr+1
if (grepl("^[a-zA-Z_] ?= ?", arg))
eval(parse(text=arg), envir=env)
else
eval(parse(text=paste0("v", ctr, "=", arg)), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
Then instead of this:
# evaluates `a+b-2`, but doesn't include in `env`
list2(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We get this:
list3(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
##
## $v3
## [1] 1 3 5 7 9
But it feels like there will still be problematic edge cases even if we fix the issue with commas, with names, etc.
Anyone have any ideas/suggestions/insights/solutions/etc.??
Many thanks!
The reason data.frame(a=1:5, b=a+1) doesn't work is a scoping issue, not an evaluation order issue.
Arguments to a function are normally evaluated in the calling frame. When you say a+1, you are referring to the variable a in the frame that made the call to data.frame, not the column that you are about to create.
dplyr::data_frame does very non-standard evaluation, so it can mix up frames as you saw. It appears to look first in the frame corresponding to the object that is under construction, and second in the usual place.
One way to use the dplyr semantics with a base function is to do both,
e.g.
do.call(data.frame, as.list(dplyr::data_frame(a = 1:5, b = a+1)))
but this is kind of useless: you can convert a tibble to a dataframe directly, and this can't be used with other base functions, since it forces all arguments to the same length.
To write your list2 function, I'd recommend looking at the source of dplyr::data_frame, and do everything it does except the final conversion to a tibble. It's source is deceptively short:
function (...)
{
xs <- quos(..., .named = TRUE)
as_tibble(lst_quos(xs, expand = TRUE))
}
This is deceptive, because lst_quos is a private function in the tibble package, so you'll need your own copy of that, plus any private functions it calls, etc. Unless of course you don't mind using private functions, then here's your list2:
list2 <- function(...) {
xs <- rlang::quos(..., .named = TRUE)
tibble:::lst_quos(xs, expand = TRUE)
}
This will work until the tibble maintainer chooses to change lst_quos, which he's free to do without warning (since it's private). It wouldn't be acceptable code in a CRAN package because of this fragility.

Concerning R, when defining a Replacement Function, do the arguments have to be named as/like "x" and "value"?

By "replacement functions" I mean those mentioned in this thread What are Replacement Functions in R?, ones that look like 'length<-'(x, value). When I was working with such functions I encountered something weird. It seems that a replacement function only works when variables are named according to a certain rule.
Here is my code:
a <- c(1,2,3)
I will try to change the first element of a, using one of the 3 replacement functions below.
'first0<-' <- function(x, value){
x[1] <- value
x
}
first0(a) <- 5
a
# returns [1] 5 2 3.
The first one works pretty well... but then when I change the name of arguments in the definition,
'first1<-' <- function(somex, somevalue){
somex[1] <- somevalue
somex
}
first1(a) <- 9
# Error in `first1<-`(`*tmp*`, value = 9) : unused argument (value = 9)
a
# returns [1] 5 2 3
It fails to work, though the following code is OK:
a <- 'first1<-'(a, 9)
a
# returns [1] 9 2 3
Some other names work well, too, if they are similar to x and value, it seems:
'first2<-' <- function(x11, value11){
x11[1] <- value11
x11
}
first2(a) <- 33
a
# returns [1] 33 2 3
This doesn't make sense to me. Do the names of variables actually matter or did I make some mistakes?
There are two things going on here. First, the only real rule of replacement functions is that the new value will be passed as a parameter named value and it will be the last parameter. That's why when you specify the signature function(somex, somevalue), you get the error unused argument (value = 9) and the assignment doesn't work.
Secondly, things work with the signature function(x11, value11) thanks to partial matching of parameter names in R. Consider this example
f<-function(a, value1234=5) {
print(value1234)
}
f(value=5)
# [1] 5
Note that 5 is returned. This behavior is defined under argument matching in the language definition.
Another way to see what's going on is to print the call signature of what's actually being called.
'first0<-' <- function(x, value){
print(sys.call())
x[1] <- value
x
}
a <- c(1,2,3)
first0(a) <- 5
# `first0<-`(`*tmp*`, value = 5)
So the first parameter is actually passed as an unnamed positional parameter, and the new value is passed as the named parameter value=. This is the only parameter name that matters.

Retrieving arguments of a function call with default values in R

Is there a way to retrieve function arguments from an evaluated formula that are not specified in the function call?
For example, consider the call seq(1, 10). If I wanted to get the first argument, I could use quote() and simply use quote(seq(1,10))[[1]]. However, this only works if the argument is defined at the function call (instead of having a default value) and I need to know its exact position.
In this example, is there some way to get the by argument from seq(1, 10) without a lengthy list of if statements to see if it is defined?
The first thing to note is that all of the named arguments you're after (from, to, by, etc.) belong to seq.default(), the method that is dispatched by your call to seq(), and not to seq() itself. (seq() itself only has one formal, ...).
From there you can use these two building blocks
## (1) Retrieves pairlist of all formals
formals(seq.default)
# [long pairlist object omitted to save space]
## (2) Matches supplied arguments to formals
match.call(definition = seq.default, call = quote(seq.default(1,10)))
# seq.default(from = 1, to = 10)
to do something like this:
modifyList(formals(seq.default),
as.list(match.call(seq.default, quote(seq.default(1,10))))[-1])
# $from
# [1] 1
#
# $to
# [1] 10
#
# $by
# ((to - from)/(length.out - 1))
#
# $length.out
# NULL
#
# $along.with
# NULL
#
# $...

Resources