In R: Why is there no complete list of every argument a function can use?

I've been using R for about 3 years, and one of its main advantages (in my opinion) is the wide range of questions and answers one can find on Stack Overflow and similar websites.
One thing that is missing, and that kind of annoys me, is a complete list of every argument a function can use (plus the possible values of those arguments).
For example, the R documentation lists all the "main" arguments, and in many cases it says "... further arguments passed to or from other methods". How can I know which arguments are meant by "..."?
When searching Stack Overflow for a way to get the result I want from an analysis, I sometimes stumble upon these additional arguments, which can be very helpful. Still, it takes a lot of time to find them hidden in other users' answers. Sometimes I have used a workaround that would have been unnecessary if I had known about some additional function arguments.
Is anyone else experiencing the same thing?
(It's difficult to mention examples but I remember having that trouble when using the leaflet functions for the first time.)
Tim

The most direct answer is that we often don't know what arguments one might want to pass via the ... argument. In fact, that is the point of ...: it does not require us to know in advance what arguments may be passed to it.
Consider, for example, the print generic in base R. It is defined as
print(x, ...)
So what are the arguments that can be passed to ...?
print.factor defines
print(x, quote = FALSE, max.levels = NULL,
width = getOption("width"), ...)
print.table defines
print(x, digits = getOption("digits"), quote = FALSE,
na.print = "", zero.print = "0", justify = "none", ...)
Notice that the print methods for factor and table objects don't share the same arguments. In fact, every print method may be defined with a different set of arguments, and R uses the class of the object to decide which method, and hence which set of arguments, applies to a given call to print.
When a developer creates a new print method, R CMD check (and therefore CRAN) requires that the method have at least the same arguments as the generic, so every print method has the arguments x and ....
How do I know what arguments may be acceptable to ...?
First, read and follow the documentation. In glm, you find that the ... argument accepts arguments to "form the default control argument." This references the control argument, which then references the glm.control function. Opening ?glm.control shows the arguments epsilon, maxit and trace.
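A minimal sketch of that chain, using the built-in mtcars data (the model itself is only an illustration): passing maxit and epsilon through glm's ... should give the same control settings as building them explicitly with glm.control().

```r
# Arguments given to glm() via ... are collected and handed to glm.control()
fit_dots <- glm(am ~ wt, data = mtcars, family = binomial,
                maxit = 50, epsilon = 1e-10)

# The same settings supplied explicitly through the control argument
fit_ctrl <- glm(am ~ wt, data = mtcars, family = binomial,
                control = glm.control(epsilon = 1e-10, maxit = 50))

# Both fits store the control list that was actually used
fit_dots$control
identical(fit_dots$control, fit_ctrl$control)
```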
As another example, the documentation for ggplot2's geom_line states that ... arguments are passed on to the layer function; use ?layer to see which arguments are available.
If the documentation simply says "to other methods," you are probably looking at a generic function whose behavior is dispatched differently for different types of objects. In that case, look at the help page for the method that matches your object's class (for example ?print.factor).
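To see the formal arguments of a specific method rather than of the generic, one option is to fetch the method with getS3method() and inspect it with formals() or args(). A short sketch using the two methods discussed above (the number of methods listed will depend on your R version and loaded packages):

```r
# The generic only advertises x and ...
names(formals(print))

# Each method has its own arguments, but all of them keep x and ...
names(formals(getS3method("print", "factor")))
names(formals(getS3method("print", "table")))

# How many print methods are currently available
length(methods("print"))
```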

Related

How to find valid arguments for a function in R's help documentation (meaning of ...)

This question may seem basic, but it has bothered me for quite a while. The help document for many functions has ... as one of its arguments, but somehow I can never get my head around this ... thing.
For example, suppose I have created a model, say model_xgboost, and want to make predictions on a dataset, say data_tbl, using the predict() function, and I want to know the syntax. So I look at its help document, which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
The syntax and its examples didn't really enlighten me, as I still have no idea what the valid arguments are for this function. An online course uses something like the following, which works:
data_tbl %>%
predict(model_xgboost, new_data = .)
However, I cannot find the new_data argument anywhere in the help page. Instead, its Details section mentions a newdata argument, which doesn't work if I replace new_data = . with newdata = .:
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
How do I know exactly what argument(s) / syntax can be used for a function like this?
Why new_data but not newdata in this example?
I might be missing something here, but is there any reference or resource, in plain English, on how to use and interpret a help document? (A lot of documentation, including R help files, seems to give just a brief sentence such as "additional arguments affecting the predictions produced".)
@CarlWitthoft's answer is good; I want to add a little nuance about this particular function. The ?predict help page is so vague because predict() is a generic function in R: that is, a function that can be applied to a variety of object types, using a slightly different (but appropriate) method in each case. As such, the ?predict help page only lists object (which is required as the first argument of all methods) and ..., because different predict methods can take very different arguments/options.
If you call methods("predict") in a clean R session (before loading any additional packages), you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods. I don't know what class your object is (check with class(model_xgboost)), but assuming it's of class model_fit, we can look at ?predict.model_fit to see
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to pass the new data as new_data (and, reading a bit further down, that it needs to be "A rectangular data object, such as a data frame").
The help page for predict says
Most prediction methods which are similar to those for linear
models have an argument ‘newdata’ specifying the first place to
look for explanatory variables to be used for prediction
I don't know for sure why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata; presumably it is in line with the tidyverse style guide, which says
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but the parsnip/tidymodels authors have evidently realized that people are likely to make it and added an informative error message, as shown in your example (and noted elsewhere).
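To see why a misspelled argument name matters, here is a small base-R sketch. It uses lm() and the built-in BOD data instead of a parsnip model, so nothing needs to be installed. predict.lm() names its argument newdata; an argument called new_data does not match any formal, so it falls into ... and the new data is not used, although the call still runs.

```r
fit <- lm(demand ~ Time, data = BOD)
class(fit)   # "lm", so predict() dispatches to predict.lm()

# The method's own argument list shows the name it expects: newdata
head(names(formals(getS3method("predict", "lm"))))

new_pts <- data.frame(Time = 8)

# Correct name: a single prediction for the single new row
predict(fit, newdata = new_pts)

# Wrong name: new_data ends up in ..., so predict.lm() returns the
# fitted values for the six rows of BOD instead of predicting Time = 8
length(predict(fit, new_data = new_pts))
```

This is the kind of silent mistake that the parsnip check shown in the question turns into an explicit error.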
Among other things, the presence of ... in a function definition means the caller can supply any additional arguments (values, functions, etc.). In some cases the main function does not use ... itself at all, but passes it on to functions called inside it. A simple example:
foo <- function(x, ...) {
  y <- x^2
  # any extra arguments (e.g. col, type, main) are passed straight to plot()
  plot(x, y, ...)
}
Some functions accept another function as an argument (for example lapply or sapply); in that case, which arguments make sense in ... depends on the function you passed in.
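A minimal sketch of that pattern: base sapply() forwards its ... to FUN, and a hypothetical wrapper, summarise_each(), does the same with whatever function it is given. Which extra arguments are valid (here na.rm) is decided by the function that finally receives them, not by the wrapper.

```r
xs <- list(a = c(1, NA, 3), b = c(10, 20))

# sapply(X, FUN, ...) passes na.rm = TRUE on to mean()
sapply(xs, mean, na.rm = TRUE)

# Hypothetical wrapper: its own ... is forwarded unchanged to f via vapply()
summarise_each <- function(xs, f, ...) {
  vapply(xs, f, FUN.VALUE = numeric(1), ...)
}

summarise_each(xs, mean, na.rm = TRUE)
summarise_each(xs, max, na.rm = TRUE)
```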

In help files, is there any universal meaning or logic behind the "..." arguments?

I notice that in different packages it sometimes refers to a variable, a column to sort by, an object, etc. For example, in dplyr it usually refers to variables, but in ggplot it's not even used.
Is there any logic behind this? Is there any universality? Or can it be just anything?
It can accept an arbitrary number of actual arguments. In base R these are often arguments to other functions, and the Arguments section of the function's help page should indicate which subsequent functions will receive them. ... can be the only formal argument, the first, the last, or it can be intercalated in the parameter list. The items in list(...) generally should be named. You can access the items in the "dots" in a few different ways: list(...) collects them into a list, ..1, ..2, etc. refer to them by position, and ...length() and ...elt(n) give their number and the nth element. Generally, it is the R cognoscenti who design functions with these forms.
When ... is intercalated (or first), the formal arguments that follow it must be named in full when the function is called; they cannot be matched positionally or by partial name. If you type ?'...' you will be shown the Reserved page, which in turn has a link to dotsMethods {methods}.
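A small sketch of those tools, using a hypothetical function dots_demo() (...length() is available in current versions of R). Because sep comes after ..., it can only be set by its full name: an unnamed "-" is collected into ... instead.

```r
dots_demo <- function(..., sep = " ") {
  list(
    n      = ...length(),                        # how many arguments landed in ...
    first  = if (...length() > 0) ..1 else NULL, # the first one, by position
    named  = names(list(...)),                   # NULL when nothing was named
    pasted = paste(..., sep = sep)               # forward ... to paste()
  )
}

dots_demo("a", "b", sep = "-")$pasted   # "a-b": sep matched by name
dots_demo("a", "b", "-")$pasted         # "a b -": "-" went into ... instead
dots_demo(x = "a", y = "b")$named       # "x" "y"
```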
I found it difficult to search for [r] with "..." using the SO search box, but searching for "[r] dotsMethods" brought up 10 hits.

Function argument matching: by name vs by position

What is the difference between these lines of code?
mean(some_argument)
mean(x = some_argument)
The output is the same, but does explicitly naming x have any advantages?
People typically don't name commonly used arguments, such as the x in mean, but almost always name the na.rm argument when removing missing values.
While neglecting the argument name makes for compact code, here are four (related) reasons for including the names of arguments rather than relying on their position.
Re-order arguments as needed. When you refer to arguments by name, you can re-order them arbitrarily and still get the desired result. Sometimes this is useful: for example, when running a loop over one of the arguments, you might prefer to put the looped argument at the front of the call.
It is typically safer / more future-proof. For example, if some user-written function or package re-orders its arguments in an update and you relied on their positions, your code would break. In the best scenario, you would get an error. In the worst scenario, the function would run but return an incorrect result. Including the argument names greatly reduces the chances of running into either case.
For greater code clarity. If an argument is rarely used or you want to be explicit for future readers of your code (including you 2 months from now), adding the names can make for easier reading.
Ability to skip arguments. If you only want to change the third argument, referring to it by name is probably preferable to supplying the first two positionally.
See also the R Language Definition: 4.3.2 Argument matching
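A minimal sketch of why position matters, using mean() on a vector with a missing value. For mean.default(), the second formal argument is trim, not na.rm, so a positional TRUE is matched to trim; match.call() shows the matching R performs.

```r
x <- c(1, NA, 3)

mean(x, na.rm = TRUE)       # 2
mean(na.rm = TRUE, x = x)   # 2: named arguments can come in any order

# How R matches a positional TRUE against the method's formals
match.call(mean.default, quote(mean(x, TRUE)))   # mean(x = x, trim = TRUE)

# ...which is why this call fails instead of removing the NA
tryCatch(mean(x, TRUE), error = function(e) conditionMessage(e))
```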

How to check all the arguments of a function in R

Usually, the argument list in a package's HTML help ends with
... other arguments passed.
But how can we print all the arguments of a function in R?
If I understand your question correctly, then I would say: most of the time, it is not possible to list all the possible arguments that may be passed under the ... 'section' of the function.
Look at the help page for the very simple plot function (?plot). There, only two arguments, x and y, are given. However, under ... there is a range of additional possible arguments; most of them are (as the help page says) graphical parameters. In the lm case (?lm), I understand that the arguments in ... are passed to lm.fit and lm.wfit; check those help pages to see which parameters these functions take.
I guess the main problem is that, to know ALL the possible arguments that may be passed under ..., you have to check ALL the functions that those arguments may be passed to. Since the number of such functions may be very large, so may the number of arguments that work in .... So we don't want to list them all on the "top-level" help page.
I hope that makes sense...
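A rough sketch of "following the dots by hand" for lm(): a hypothetical helper, all_formals(), collects the named formal arguments of a function and of the functions its ... is documented to feed (for lm(), the help page points to lm.fit() and lm.wfit()). It only reads argument lists; it cannot discover by itself where ... is forwarded, so that list still has to come from the documentation.

```r
# Hypothetical helper: union of the named formals of several functions
all_formals <- function(...) {
  fns <- list(...)
  per_fn <- lapply(fns, function(f) setdiff(names(formals(f)), "..."))
  unique(unlist(per_fn))
}

# Top-level signature of lm()
args(lm)

# lm()'s own arguments plus those of the fitting functions it passes ... to
all_formals(lm, lm.fit, lm.wfit)
```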

Confused about R terminology: Attributes, parameters, and arguments

Once and for all I want to get the R terminology right. However, none of the books I have read has been of much help, and it seems to me that the authors sometimes choose the names arbitrarily. So my question is: when exactly are the terms "attribute", "parameter", and "argument" used?
From what I read and understood so far, a parameter is what a function can take as input. For example if I have a function that calculates the sum of two values, sum(value1, value2), 'value1' and 'value2' are the function's parameters.
When we call a function, the values passed to it are called its arguments. For the sum example, 23 and 48 would be the function arguments for:
sum(23, 48).
So basically we say parameter when we define a function, and argument when we call the function (the arguments are passed to the function's parameters).
But what about "attributes"? From what I understand, attributes are the equivalent of parameters in methods (and methods are functions of a class object)?
For example, if I have something like:
heatmap(myData, Colv=NA, Rowv=NA)
... , would 'myData' be an argument or attribute? And what about Colv=NA and Rowv=NA? Isn't heatmap() a function and thus everything in the parentheses should be called arguments?
Suppose we have:
f <- function(x) x + 1
comment(f) <- "my function"
f(3)
**Arguments** We distinguish between formal arguments and actual arguments. In the above, x is the formal argument to f. The names of the formal arguments of f are given by:
> names(formals(f))
[1] "x"
The actual arguments to a function vary from one call to another and in the above example there is a single actual argument 3.
The function args can be used to display a function's entire signature, including the formal arguments and their default values. If you are debugging a function, you can enter match.call() to see the current call with each actual argument matched to its formal argument name.
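To illustrate the last point, a hypothetical function g() that simply returns match.call() shows how the actual arguments of one particular call were matched to the formals, including an argument collected by ...:

```r
g <- function(x, scale = 1, ...) match.call()

names(formals(g))        # the formal arguments: "x" "scale" "..."
g(3, 2, extra = TRUE)    # g(x = 3, scale = 2, extra = TRUE)
```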
**Attributes** The attributes of an R object are given by attributes(f), like this:
> attributes(f)
$srcref
function(x) x + 1
$comment
[1] "my function"
One subtlety: an object's class is normally stored as its class attribute, but f has no explicit class attribute, so attributes(f) does not list one. class() still reports its implicit class:
> class(f)
[1] "function"
**Parameters** Sometimes function arguments are referred to as parameters, and sometimes only those arguments that are held fixed are called parameters, but this usage comes more from mathematics and statistics than from R.
In statistical models the model is typically a function of the data and the model parameters often via the likelihood. For example, here:
> lm(demand ~ Time, BOD)
Call:
lm(formula = demand ~ Time, data = BOD)
Coefficients:
(Intercept)         Time
      8.521        1.721
the linear regression coefficients of Intercept and Time (viz. 8.521 and 1.721) are often referred to as model parameters.
As DWin has already pointed out, the various values influencing graphics in R are also termed parameters and can be displayed via:
> par()
and the corresponding concepts in other R graphics systems are often also referred to as parameters.
I suppose colloquial use of the term "attribute" might refer to several features of data objects, but there is a very specific meaning in R. An attribute is a value returned by either of the functions attributes or attr. These are critical to the language in that classes and names are stored as attributes. There are two corresponding assignment functions, attributes<- and attr<-, that allow additional attributes to be assigned in support of class-specific objectives.
?attributes
?attr
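A short sketch of attributes in this specific sense: attr<- attaches a custom attribute (the name "units" is an arbitrary choice for illustration), and a data frame shows that names and class are themselves stored as attributes.

```r
x <- c(a = 1, b = 2)
attr(x, "units") <- "cm"   # attach a custom attribute
attributes(x)              # $names and $units

df <- data.frame(v = 1:2)
names(attributes(df))      # includes "names", "class" and "row.names"
attr(df, "class")          # "data.frame"
```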
There is a par function which sets graphical "parameters" that control base graphics behavior. So that would be an R-specific use of "parameter" that might be slightly different from the use of "argument", which is generally applied to the formal arguments of functions.
?par
There is a function args which, applied to a function name or an anonymous function, will return its arguments along with their default values (as a closure with an empty body, which gets printed on the console just as a user would type it in a function definition). The function formals returns the same "argument" information in the form of a pairlist.
?args
?formals
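For example, with the base function sd() (chosen only because its signature is short):

```r
args(sd)                  # prints the signature: function (x, na.rm = FALSE)
is.null(body(args(sd)))   # TRUE: args() returns a function with an empty body
formals(sd)               # pairlist: $x (no default) and $na.rm = FALSE
formals(sd)$na.rm         # FALSE
```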
I realize I am implicitly arguing with Matthew, whose R skills are excellent. Unlike him, I think that attributes and arguments have more specific meanings in the context of R and that careful authors will make an effort to keep their meanings separate. I would not have a problem understanding someone who uses parameter as a synonym for argument if the context were clearly a discussion of applying a function, since that is the typical parlance in mathematics. I agree with the conclusion of your last sentence: those are 'arguments' and most emphatically not attributes. The attributes of an object returned by heatmap are:
> attributes(hv) #from first example in ?heatmap
#$names
# [1] "rowInd" "colInd" "Rowv" "Colv"
But only some of the arguments became attributes and then only after being assigned to the returned value during the function execution.
I am not sure how analogous R is to Python, but I think most of the terms should be consistent across different languages. From what I read and learned in the last couple of days, a parameter is basically what a function takes as its input when you define it:
my_function <- function(param1, param2) {
  # function body goes here
}
and it is called argument if you are invoking a function with certain input values (that are passed to the function as parameters):
my_function(arg1, arg2)
Functions that are part of a class are called methods, and an attribute can be either a value or a method associated with a class object (a so-called instance).
So whether we call something an argument or an attribute depends on what we are calling: a function or a method. But I would now say that argument is the appropriate term when we call the heatmap function, for example:
heatmap(my_data)
Attribute: an object's properties, e.g. a Person has String fName, lName;
Parameter: appears in a function/method definition, e.g. public void setName(String fName, String lName)
Argument: the value passed for a method/function's parameter when invoking/calling the method/function, e.g. myPerson.setName("Michael", "Jackson")
