Suppressing parentheses in a function that uses the pipe operator - r

Can somebody explain why I have to suppress the parentheses for the function is.factor in the command shown below? Student-data was read from a .csv file. I can see the structure of Student-data and I want to select only the factor variables. The command works fine but I cannot see why I cannot write the parentheses. I saw an example in the forum. Sorry if the question is silly or has been asked before. I could not find any similar question.
studentData%>%select_if(is.factor)

It's not the pipe, %>%, that requires you to "drop the brackets"; it's select_if. From the documentation:
.predicate "A predicate function to be applied to the columns or a logical vector. The variables for which .predicate is or returns TRUE are selected. This argument is passed to rlang::as_function() and thus supports quosure-style lambda functions and strings representing function names."
You're not evaluating the function here. You're passing an R object. (Functions are objects, just as data.frames or scalars are.) The evaluation happens later, in the guts of select_if. Including the brackets would tell R to call is.factor with no arguments and hand the result of that call to select_if. That's not correct. select_if needs the function itself so that it can call it later, once per column.
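A minimal base-R sketch of the same idea, so it runs without dplyr. The data frame contents and the helper keep_if() are made up for illustration; keep_if() only imitates the way select_if receives a predicate and calls it later, and is not dplyr's implementation.

```r
# Hypothetical stand-in for the asker's studentData
studentData <- data.frame(
  name  = c("Ann", "Bob", "Cy"),
  group = factor(c("a", "b", "a")),
  score = c(90, 85, 70),
  stringsAsFactors = FALSE
)

# keep_if() is an illustrative helper, not a dplyr function
keep_if <- function(df, predicate) {
  # the predicate is called here, once per column
  keep <- vapply(df, predicate, logical(1))
  df[, keep, drop = FALSE]
}

# Pass the function object itself: no parentheses
factorCols <- keep_if(studentData, is.factor)
names(factorCols)

# With parentheses, R tries to call is.factor() with no argument and fails
callAttempt <- tryCatch(
  keep_if(studentData, is.factor()),
  error = function(e) "error"
)
callAttempt
```

The name is.factor on its own is just a value, like studentData; adding () turns it into a call.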

Related

Is attributes() a function in R?

Help files call attributes() a function. Its syntax looks like a function call. Even class(attributes) calls it a function.
But I see I can assign something to attributes(myobject), which seems unusual. For example, I cannot assign anything to log(myobject).
So what is the proper name for "functions" like attributes()? Are there any other examples of it? How do you tell them apart from regular functions? (Other than trying supposedfunction(x)<-0, that is.)
Finally, I guess attributes() implementation overrides the assignment operator, in order to become a destination for assignments. Am I right? Is there any usable guide on how to do it?
Very good observation indeed. It's an example of a replacement function. If you type apropos('attributes') in your R console, it will return
"attributes"
"attributes<-"
along with other outputs.
So, wherever attributes(x) appears on the left side of the assignment operator, you are not calling attributes; you are actually calling attributes<-. There are many functions in R like that, for example names(), colnames(), length(), etc. In your example, log doesn't have a replacement counterpart, hence it doesn't work the way you anticipated.
Definition (from the Advanced R book, link given below):
Replacement functions act like they modify their arguments in place,
and have the special name xxx<-. They typically have two arguments (x
and value), although they can have more, and they must return the
modified object
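As a rough sketch of that definition, here is a user-defined replacement function. The name second<- is only an example chosen for illustration; the names<- call at the end uses the base R replacement function directly.

```r
# A replacement function: special name xxx<-, arguments x and value,
# and it must return the modified object
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99        # R treats this like v <- `second<-`(v, value = 99)
v

# The built-in replacement functions work the same way
w <- c(a = 1, b = 2)
names(w) <- c("x", "y")                          # assignment syntax
w2 <- `names<-`(c(a = 1, b = 2), c("x", "y"))    # explicit call
identical(w, w2)
```

log(v) <- 0 fails for the opposite reason: there is no function called log<- to dispatch to.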
If you want to see the list of these functions, you can run
apropos('<-$')
and check out the other functions, which have the same kind of properties.
You can read about it here and here
I am hopeful that this solves your problem.

How to use apply() with my function

bmi<-function(x,y){
(x)/((y/100)^2)
}
bmi(70,177) works,
but with apply() it doesn't work:
apply(Student,1:2,bmi(Student$weight,Student$height))
Error in match.fun(FUN) :
'bmi(Student$weight, Student$height)' is not a function, character or symbol
It's a bit unclear what the goal is. If it's just to get an answer, then the comments do answer it. If on the other hand, the goal is to understand what you are doing wrong, then read on. I'd say the first error going from left to right is passing the whole dataframe. I would have only passed the 'height' and 'weight' columns.
The next error, again going from left to right, is the use of 1:2 as the second argument to apply. You obviously want to do this "by rows", which means you should use only 1, i.e. the first dimension of the dataframe.
And the third error is using a function call rather than the function name. bmi(Student$weight, Student$height) is evaluated to a numeric vector, which is not what an R function (meaning apply in this case) accepts when it is expecting a function name or an anonymous function, as illustrated in comments.
The fourth error is not assigning the value to a column in your dataframe. So this probably would have succeeded in making the desired extra column via the apply method. But, as noted in comments, this is not the most efficient method:
Student$bmi_val <- apply(Student[ ,c("weight", "height")], 1, function(r) bmi(r[["weight"]], r[["height"]]))
# didn't want my column name to be the same as the function name
The apply function was actually designed to work with matrices and arrays, so for many purposes it is ill-suited when used with dataframes. In this case, where all the arguments to the bmi function are numeric and you can control the order of the columns in the first argument to match the x and y positions, it's arguably an acceptable strategy, but not the most R-ish method. When working with dates or factor variables, you should definitely avoid apply.
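To make the points above concrete, here is a small sketch that compares the options. The Student values are invented for illustration; only bmi() comes from the question.

```r
bmi <- function(x, y) {
  x / ((y / 100)^2)
}

# Hypothetical data: weight in kg, height in cm
Student <- data.frame(weight = c(70, 82, 55), height = c(177, 180, 160))

# Passing a call instead of a function fails inside match.fun()
failed <- inherits(
  try(apply(Student, 1:2, bmi(Student$weight, Student$height)), silent = TRUE),
  "try-error"
)

# bmi() uses only vectorized arithmetic, so it can take whole columns
Student$bmi_vec <- bmi(Student$weight, Student$height)

# mapply() walks the two columns in parallel and accepts the function object
Student$bmi_map <- mapply(bmi, Student$weight, Student$height)

# apply() by rows: each row arrives as one named vector, so unpack it
Student$bmi_apply <- apply(
  Student[, c("weight", "height")], 1,
  function(r) bmi(r[["weight"]], r[["height"]])
)
```

All three columns should hold the same numbers; the vectorized call is the shortest and avoids converting the data frame to a matrix.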

Function argument matching: by name vs by position

What is the difference between these lines of code?
mean(some_argument)
mean(x = some_argument)
The output is the same, but does the explicit mention of x have any advantages?
People typically don't add argument names for commonly used arguments, such as the x in mean, but almost always name the na.rm argument when removing missing values.
While neglecting the argument name makes for compact code, here are four (related) reasons for including the names of arguments rather than relying on their position.
Re-order arguments as needed. When you refer to the arguments by name, you can arbitrarily re-order the arguments and still produce the desired result. Sometimes it is useful to re-order your arguments. For example, when running a loop over one of the arguments, you might prefer to put the looped argument in the front of the function.
It is typically safer / more future-proof. As an example, if some user-written function or package re-orders the arguments in an update, and you relied on the positions of the arguments, this would break your code. In the best scenario, you would get an error. In the worst scenario, the function would run but would return an incorrect result. Including the argument names greatly reduces the chances of running into either case.
For greater code clarity. If an argument is rarely used or you want to be explicit for future readers of your code (including you 2 months from now), adding the names can make for easier reading.
Ability to skip arguments. If you want to only change the third argument, then referring to it by name is probably preferable.
See also the R Language Definition: 4.3.2 Argument matching
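A short sketch of the reasons listed above. The function ratio() is a made-up example; mean() is called with its documented arguments x, trim and na.rm.

```r
# Hypothetical two-argument function where order matters
ratio <- function(numerator, denominator) numerator / denominator

byPosition <- ratio(2, 10)
byName     <- ratio(denominator = 10, numerator = 2)  # re-ordered, same result
swapped    <- ratio(10, 2)                            # positions swapped: runs, but gives a different answer

vals <- c(1, 2, NA, 4)

# Naming na.rm lets you skip trim, which sits between x and na.rm
named      <- mean(vals, na.rm = TRUE)
reordered  <- mean(na.rm = TRUE, x = vals)
# Purely positional: every argument before na.rm has to be spelled out
positional <- mean(vals, 0, TRUE)
```

The swapped call is the "worst scenario" from the answer: no error, just a silently different number.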

Why is (...) useful in R? [duplicate]

I'm trying to understand ... and/or (...) in R. I understand that somehow this is used for entering unknown or multiple parameters to a function, but when is this necessary and/or useful? I've searched rdocumentation for it, but found nothing. In the R Language Definition it is defined but in very abstract terms.
Hence I ask: why is ... useful? Is it not just sloppy coding? Wouldn't it be better to pass arguments explicitly?
It's called varargs (short for variadic arguments).
when is this necessary?
It's not strictly necessary, and it's not inherently sloppy. The "non-sloppy" alternative to varargs in any language is to pass an array or list as a single variable. So varargs is just syntactic sugar on top of a collection of things.
[when is it] useful?
Any time you want to save a few keystrokes and implicitly construct a list when calling a function.
Wouldn't it be better to pass arguments explicitly?
Depends. What are your criteria?
The ellipsis gives you the possibility to define functions with an unknown number of arguments/parameters.
It is necessary for functions like c or list, where the number of arguments given by the user is unknown.
If you type c or list in the R console you'll see that both of these functions use ... as arguments.
In these two cases it would be hard to pass arguments explicitly since the user can pass as many arguments as needed.
You can look at this post for more examples
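Two common uses of ... can be sketched as follows. Both function names, count_args() and summarise_vals(), are invented for this example.

```r
# 1. Accept any number of arguments and collect them into a list
count_args <- function(...) {
  args <- list(...)
  length(args)
}

nThree <- count_args(1, "a", TRUE)
nZero  <- count_args()

# 2. Forward extra arguments to other functions without naming them all
summarise_vals <- function(x, ...) {
  c(mean = mean(x, ...), max = max(x, ...))
}

# na.rm is never mentioned in summarise_vals(), yet it reaches mean() and max()
res <- summarise_vals(c(1, NA, 3), na.rm = TRUE)
```

The second pattern is where ... saves the most typing: the wrapper does not need to repeat every argument of the functions it calls.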

lapply-ing with the "$" function

I was going through some examples in hadley's guide to functionals, and came across an unexpected problem.
Suppose I have a list of model objects,
x=1:3;y=3:1; bah <- list(lm(x~y),lm(y~x))
and want to extract something from each (as suggested in hadley's question about a list called "trials"). I was expecting one of these to work:
lapply(bah,`$`,i='call') # or...
lapply(bah,`$`,call)
However, these return nulls. It seems like I'm not misusing the $ function, as these things work:
`$`(bah[[1]],i='call')
`$`(bah[[1]],call)
Anyway, I'm just doing this as an exercise and am curious where my mistake is. I know I could use an anonymous function, but think there must be a way to use syntax similar to my initial non-solution. I've looked through the places $ is mentioned in ?Extract, but didn't see any obvious explanation.
I just realized that this works:
lapply(bah,`[[`,i='call')
and this
lapply(bah,function(x)`$`(x,call))
Maybe this just comes down to some lapply voodoo that demands anonymous functions where none should be needed? I feel like I've heard that somewhere on SO before.
This is documented in ?lapply, in the "Note" section (emphasis mine):
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g. bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[0L]],
...), with 0L replaced by the current integer index. This is not
normally a problem, but it can be if FUN uses sys.call or
match.call or if it is a primitive function that makes use of the
call. This means that it is often safer to call primitive functions
with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x))
is required in R 2.7.1 to ensure that method dispatch for is.numeric
occurs correctly.
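Following the advice in that note, a wrapper function sidesteps the problem. The sketch below only covers the two forms the asker reported as working and checks that they agree; it makes no claim about how the failing `$` form behaves in any particular R version.

```r
x <- 1:3; y <- 3:1
bah <- list(lm(x ~ y), lm(y ~ x))

# `[[` evaluates its index argument, so the string "call" arrives intact
byBracket <- lapply(bah, `[[`, "call")

# A wrapper makes the `$` call explicit for each element
byWrapper <- lapply(bah, function(m) m$call)

identical(byBracket, byWrapper)
```

The difference is that `$` uses its second argument as a literal name instead of evaluating it, which is why it is sensitive to the form of the call that lapply builds.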

Resources