Function argument matching: by name vs by position


What is the difference between these lines of code?
mean(some_argument)
mean(x = some_argument)
The output is the same, but does the explicit mention of x have any advantages?

People typically don't add argument names for commonly used arguments, such as the x in mean, but almost always name the na.rm argument when removing missing values.
While neglecting the argument name makes for compact code, here are four (related) reasons for including the names of arguments rather than relying on their position.
Re-order arguments as needed. When you refer to the arguments by name, you can arbitrarily re-order the arguments and still produce the desired result. Sometimes it is useful to re-order your arguments. For example, when running a loop over one of the arguments, you might prefer to put the looped argument in the front of the function.
It is typically safer / more future-proof. As an example, if some user-written function or package re-orders the arguments in an update, and you relied on the positions of the arguments, this would break your code. In the best case, you would get an error. In the worst case, the function would run but return an incorrect result. Including the argument names greatly reduces the chances of running into either case.
For greater code clarity. If an argument is rarely used or you want to be explicit for future readers of your code (including you 2 months from now), adding the names can make for easier reading.
Ability to skip arguments. If you want to only change the third argument, then referring to it by name is probably preferable.
See also the R Language Definition: 4.3.2 Argument matching
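The reasons above can be illustrated with a small sketch. The function scale_values and its arguments are hypothetical, made up only for this example; they do not come from any package.

```r
# Hypothetical function, defined only to illustrate argument matching.
scale_values <- function(x, center = 0, scale = 1, digits = 2) {
  round((x - center) / scale, digits)
}

v <- c(10, 20, 30)

# Positional call: correct only for as long as the argument order never changes.
by_position <- scale_values(v, 20, 10)

# Named call: the arguments can appear in any order.
by_name <- scale_values(scale = 10, center = 20, x = v)

# Naming also lets you skip 'center' and 'scale' and change only 'digits'.
only_digits <- scale_values(v / 3, digits = 1)

# Both calling styles give the same result here.
identical(by_position, by_name)
```

If a later version of such a function swapped center and scale, only the positional call would silently change its result.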

Related

Why doesn't R throw an error when I use only the initial part of my column name in a data frame?

I have a data frame containing various columns along with sender_bank_flag. I ran the below two queries on my data frame.
sum(s_50k_sample$sender_bank_flag, na.rm=TRUE)
sum(s_50k_sample$sender_bank, na.rm=TRUE)
I got the same output from both the queries even though there is no such column as sender_bank in my data frame. I expected to get an error for the second code. Didn't know R has such a functionality! Does anyone know what exactly is this functionality & how can it be better utilized?

It is probably worthwhile to consolidate all the comments into an answer.
Both my comment and BenBolker's point to doc page ?Extract:
Under Recursive (list-like) objects:
Both "[[" and "$" select a single element of the list. The main difference is that "$" does not allow computed indices, whereas "[[" does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of "[[" can be controlled using the exact argument.
Under Character indices:
Character indices can in some circumstances be partially matched (see ?pmatch) to the names or dimnames of the object being subsetted (but never for subassignment). Unlike S (Becker et al p. 358), R never uses partial matching when extracting by "[", and partial matching is not by default used by "[[" (see argument exact).
Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by "$". Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).
Note that the manual is rich in information; make sure you digest it fully. I formatted the quoted content and added Stack Overflow threads where relevant.
The links provided in phiver's comment are worth reading in the long term.
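A minimal sketch of the behaviour quoted from ?Extract, using a small made-up data frame that stands in for the asker's s_50k_sample (the values are invented for illustration):

```r
# Made-up stand-in for the data frame in the question.
df <- data.frame(sender_bank_flag = c(1, 0, NA, 1))

# "$" partially matches the unique prefix "sender_bank".
via_dollar <- sum(df$sender_bank, na.rm = TRUE)

# "[[" is exact by default, so the same prefix finds nothing (NULL).
via_exact <- df[["sender_bank"]]

# Partial matching with "[[" has to be requested explicitly.
via_partial <- sum(df[["sender_bank", exact = FALSE]], na.rm = TRUE)

# Turn the silent partial match into a warning, then restore the old setting.
old <- options(warnPartialMatchDollar = TRUE)
msg <- tryCatch(df$sender_bank, warning = function(w) conditionMessage(w))
options(old)
```

Using df[["sender_bank_flag"]] in scripts avoids relying on partial matching altogether.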

In help files, is there any universal meaning or logic behind the "..." arguments?

I notice in different packages it sometimes refers to a variable, a column to sort by, an object, etc. Like in dplyr it usually refers to a variable, but in ggplot, it's not even used.
Is there any logic behind this? Is there any universality? Or can it just be anything?

It can accept an arbitrary number of arguments. In base R these are often arguments to other functions. The Arguments description in the function's help page should indicate which subsequent functions will be getting these arguments. It can be the only argument, the first, the last, or it can sit in the middle of the parameter list.
The items in list(...) should generally be named. You can access the items in the "dots" in a few different ways: list(...) is the most common, and ..1, ..2 and ...length() also work. Generally, it is the R-cognoscenti that will be designing functions with these forms. When ... is in the middle (or first), the parameters that follow it in the formals must be named in full when the function is called; they cannot be matched positionally or by a partial name.
If you type ?'...' you will be shown the Reserved page, which in turn has a link to dotsMethods {methods}.
I found it difficult to search for [r] with "..." using a SO search panel but searching on "[r] dotsMethods" brought up 10 hits.
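As a rough sketch of the points in this answer: describe_dots below is a hypothetical function written only for illustration, while paste() is the real base R function, whose first formal argument is the dots. Note that ...length() assumes R 3.5.0 or later.

```r
# Hypothetical function: inspects whatever arrives in '...'.
describe_dots <- function(...) {
  args <- list(...)        # evaluate the dots and collect them into a list
  list(
    n     = ...length(),   # number of arguments in the dots (R >= 3.5.0)
    names = names(args),   # names supplied by the caller, if any
    first = ..1            # the first element of the dots
  )
}

info <- describe_dots(a = 1, b = "x")

# paste() has '...' as its first formal, so 'sep' is only matched
# when it is named; an unnamed "-" is just one more string to paste.
joined_named      <- paste("a", "b", sep = "-")
joined_positional <- paste("a", "b", "-")
```

Here joined_named and joined_positional differ, which is the practical consequence of arguments after the dots having to be named.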

In R: Why is there no complete list of every argument a function can use?

I have been using R for about 3 years, and one of its main advantages (in my opinion) is the wide range of questions and assistance one can find on Stack Overflow and similar websites.
One thing that is missing and kind of annoys me is an entire list of every single argument a function can use (plus possible values of those arguments).
For example: In R documentation all "main" arguments are listed and in many cases the documentation says "... further arguments passed to or from other methods". How can I know which arguments are meant by "..."?
When searching on Stack Overflow for a way to get a desired result from an analysis, I sometimes stumble upon these additional arguments, which can be very helpful in many cases. It still takes a lot of time to find these arguments hidden in other users' answers. Sometimes I used a workaround that would have been unnecessary if I had known about some additional function arguments.
Is anyone else experiencing the same thing?
(It's difficult to mention examples but I remember having that trouble when using the leaflet functions for the first time.)
Tim

The most direct answer is that we often don't know what arguments one might want to pass to .... In fact, the point of ... arguments is that we do not need to know in advance what arguments may be passed through them.
Consider, for example, the print generic in base R. It is defined as
print(x, ...)
So what are the arguments that can be passed to ...?
print.factor defines
print(x, quote = FALSE, max.levels = NULL,
      width = getOption("width"), ...)
print.table defines
print(x, digits = getOption("digits"), quote = FALSE,
      na.print = "", zero.print = "0", justify = "none", ...)
Notice that the print methods for factor and table objects don't share the same arguments. In fact, every print method may be defined with a different set of arguments. R then uses the class of the object to determine which set of arguments to apply to print.
When a developer creates a new print method, R CMD check (and therefore CRAN) requires that the method contain at least the same arguments as the generic. So every print method has arguments x and ....
How do I know what arguments may be acceptable to ...?
First, read and follow the documentation. In glm, you find that the ... argument accepts arguments to "form the default control argument." This references the control argument, which then references the glm.control function. Opening ?glm.control shows the arguments epsilon, maxit and trace.
As another example, the documentation for ggplot2's geom_line states that ... arguments are passed to the layer function. Use ?layer to see what arguments are available.
If the documentation simply specifies "to other methods," then you are probably looking at a method that is dispatched with different behaviors for different types of objects.
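One way to explore this from the console, sketched here with base R tools only (the variable names are arbitrary), is to compare the formal arguments of the generic with those of individual methods:

```r
# The print generic only declares x and '...'.
generic_args <- names(formals(print))

# Each method adds its own arguments. getS3method() retrieves a method
# by generic and class name, even when the method is not exported.
factor_args <- names(formals(utils::getS3method("print", "factor")))
table_args  <- names(formals(utils::getS3method("print", "table")))

# Arguments that glm() can take through '...' to build its 'control' list.
control_args <- names(formals(stats::glm.control))

# Arguments that print.factor adds on top of the generic.
setdiff(factor_args, generic_args)
```

Calling methods("print") lists the print methods available in the current session, which gives a starting point for deciding which method's help page to read.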

Why do the R functions mean() and sum() behave differently with vectors vs. raw strings?

I was wondering if there was an underlying programming logic as to why some basic R functions behave differently towards raw data input into them vs. vectors.
For example, if I do this
mean(1,2,3)
I don't get the correct answer, and don't get an error
But if I do this
sum(1,2,3)
I do get the right answer, even though I'd assume proper syntax would be sum(c(1,2,3))
And if I do this
sd(1,2,3)
I get an error Error in sd(1, 2, 3) : unused argument (3)
I'm interested in what, if any, the underlying programming logic of these different behaviors is. (I'm sure if I rooted around in the source code I could figure out exactly why they behave differently, but I want to know if there is a reason why the code might have been written that way.)
Practically, I'm teaching a basic R class and want to explain to my students why things work that way; they get a bit tired of me saying "That's just how R works, live with it; and always put things in vectors to make life easy."
EDITS: I have bolded some sections to add emphasis. My question is largely about software design, not how these particular functions happen to operate or how to determine their exact operation. That is, not "what arguments do these functions accept" but "why do simple mathematical functions in R appear (to a biologist) to have been designed differently".

The second argument taken by mean's default method is trim, which is not an argument of sum, so mean(1,2,3) treats 1 as x, 2 as trim and 3 as na.rm. The first argument of sum is ..., so, I believe, the function will try to compute the sum of all values entered as unnamed arguments.
mean and sum are generic functions, so they get dispatched differently depending on an object's class.
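The three calls from the question can be compared side by side. This is only a sketch of the matching described above; the variable names are arbitrary.

```r
# mean(1, 2, 3): only the first value is x; 2 and 3 are matched
# positionally to 'trim' and 'na.rm' of mean.default().
m_scalar <- mean(1, 2, 3)
m_vector <- mean(c(1, 2, 3))

# sum(1, 2, 3): every unnamed value lands in '...', so all are summed.
s_scalar <- sum(1, 2, 3)
s_vector <- sum(c(1, 2, 3))

# sd(x, na.rm = FALSE) has no '...', so a third argument is an error.
sd_result <- tryCatch(sd(1, 2, 3), error = function(e) conditionMessage(e))

c(m_scalar = m_scalar, m_vector = m_vector,
  s_scalar = s_scalar, s_vector = s_vector)
```

Wrapping the values in c() gives the intended result for all three functions, which is why that habit is a safe default.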

What is the impact of not calling the arguments while calling a function [closed]

My question is about the difference between two ways to pass the arguments of a function,
for instance
function1(obj1, obj2, obj3, obj4, obj5)
or
function1(arg1=obj1, arg2=obj2, arg3=obj3, arg4=obj4, arg5=obj5)
Is there a rule/convention/document for that?
I can see at least 2 situations where the first way is not great
If we want to add a new argument, we are forced to add it at the end of the list, which may not make sense (as I like to group arguments that go together).
Arguments with defaults have to be put at the end of the list; otherwise you have to supply them even when you want the default value.
Any ideas on that?

For me the issue is simple: reproducible results require reproducible and explicit function calls.
In my case, I use named arguments having learned that another person may insert a new parameter in their function if they so choose, which caused my code to break.
I also tend to store parameters in a list and use these when calling a function, e.g. someCrazyFunction(stuff = stuff, eps = Par$eps, tol = Par$tol, verbose = Par$verbose, strict = Par$strict, debug = Par$debug)
If I don't do this, I am not doing my part to ensure reproducible results. It is only a few keystrokes and I don't have to worry if the author of the function or package moves arguments around, inserts new arguments, deletes some arguments (which I'll notice because R will tell me that some object is not needed), or otherwise makes seemingly harmless changes. If they make such a change, then how can someone else who looks at my code be sure of how to reproduce the same call as it was at the time I made it?
Lesson: Debugging is far more painful than the few keystrokes needed to ensure reproducibility.
(Minor update) This question and the selected answer from elsewhere on SO exemplify a particular aspect of this implicit contract between the package creator and the person with a dependency on a package. If I develop a dependency on a given function and the author simply shuffles the arguments, then my code should work perfectly regardless. They made no explicit contract to not move things around, and I can assume no implicit contract that it will behave that way. I only assume that they will not change the definitions of arguments.
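A small sketch of the "parameters in a list" habit described above. fit_stub is a hypothetical stand-in for someCrazyFunction, and the values in Par are invented; do.call() is shown as one possible way to pass the whole list by name, not as what the answer's author necessarily does.

```r
# Hypothetical stand-in for someCrazyFunction.
fit_stub <- function(stuff, eps, tol, verbose = FALSE) {
  list(n = length(stuff), eps = eps, tol = tol, verbose = verbose)
}

# Parameters kept together in one named list.
Par <- list(eps = 1e-6, tol = 1e-3, verbose = TRUE)

# Fully explicit call, as in the answer.
explicit <- fit_stub(stuff = 1:3, eps = Par$eps, tol = Par$tol,
                     verbose = Par$verbose)

# do.call() uses the list names as argument names, so this is
# also a by-name call and does not depend on argument order.
via_list <- do.call(fit_stub, c(list(stuff = 1:3), Par))

identical(explicit, via_list)
```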

From a function implementer's point of view, you must always add new parameters to the end and name them so they don't have a prefix in common with existing arguments.
This is because people are free to use positional matching and partial names. A fact of R life...
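To see why a shared prefix matters, consider this sketch. Both functions are hypothetical: solve_v1 stands for an existing release and solve_v2 for an update that adds an argument starting with the same letters.

```r
# Hypothetical version 1 of a function.
solve_v1 <- function(x, tolerance = 1e-8) tolerance

# Hypothetical version 2 adds an argument sharing the prefix "tol".
solve_v2 <- function(x, tolerance = 1e-8, tol_scale = 1) tolerance * tol_scale

# A caller who relied on partial matching: 'tol' uniquely matches 'tolerance'.
ok <- solve_v1(1, tol = 0.1)

# After the update, 'tol' matches two formals and the call fails.
broken <- tryCatch(solve_v2(1, tol = 0.1),
                   error = function(e) conditionMessage(e))
```

A caller who had written tolerance = 0.1 in full would not be affected by the update.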

Function arguments in R can be matched via position or by name and you as the person punching things into the keyboard are afforded some flexibility in how you decide to use or abuse that. One of the immediate benefits to using the named arguments is that you can change the order of the arguments within the function as you see fit. i.e.
function1(arg1=obj1, arg2=obj2, arg3=obj3, arg4=obj4, arg5=obj5)
and
function1(arg5=obj5, arg4=obj4, arg3=obj3, arg2=obj2, arg1=obj1)
will evaluate in the same fashion, while
function1(obj1, obj2, obj3, obj4, obj5)
and
function1(obj5, obj4, obj3, obj2, obj1)
will not. Function arguments can also be partially matched, and are matched using the following criteria, in this order:
exact match for named argument
partial match for named argument
positional match
This can obviously lead to some unintended consequences if you aren't careful with the partial matching. I believe that if an argument is matched by name, it is "removed" from the positional search, though I can't find a specific reference for that at the moment. As a note of common use, I tend to see people use positional matching for the first argument in a function, and then specify others that may be optional afterwards by name. Again, this is mostly personal convention and habit as far as I'm concerned.
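The matching order can be checked with a small experiment. show_args is a hypothetical function written for this purpose; the results are consistent with the idea that arguments matched by name are taken out of the positional search.

```r
# Hypothetical function that simply reports how its arguments were matched.
show_args <- function(value, verbose = FALSE, n = 1) {
  list(value = value, verbose = verbose, n = n)
}

# 'n' is matched exactly by name first; the unnamed 10 then fills
# the first formal that is still free, which is 'value'.
a <- show_args(n = 3, 10)

# 'verb' partially matches 'verbose'; 10 again goes to 'value'.
b <- show_args(verb = TRUE, 10)

# Purely positional call for comparison.
p <- show_args(10, TRUE, 3)

str(a)
```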

In a functional language like R, a function can have thirty or so parameters. In that case, argument names are handy in combination with default parameter values.
Other than that, especially for short list of parameters, argument names do not make good sense.
