Use data.frame in custom function?

Functions that work with data.frames often let the user provide a dataset so that its columns can be referred to in a straightforward way. E.g.:
lm(mpg~cyl+gear,data=mtcars)
Instead of using mtcars$cyl in the formula, we can simply use cyl. How can I implement such behavior in my own custom-built functions?

There are several different techniques for this, described in Standard nonstandard evaluation rules.
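For example, a minimal sketch of one such technique, using substitute() and eval() to look up a bare column name inside a user-supplied data frame (the function and column names here are purely illustrative):

col_mean <- function(x, data) {
  # capture the unevaluated expression passed as x (e.g. cyl)
  expr <- substitute(x)
  # evaluate it in `data` first, falling back to the caller's environment
  vals <- eval(expr, data, parent.frame())
  mean(vals)
}

col_mean(cyl, mtcars)   # same result as mean(mtcars$cyl)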

Related

Is it possible to use function overloading in R?

Is it possible to use function overloading in R? I've seen answers to this question from 2013, but the language and its possibilities have evolved a lot since then.
In my specific case, I'd like the following:
formFeedback <- function(success, message)
formFeedback <- function(feedback_object)
Is this at all possible? If not, what's the best practice? Should I use two differently named functions, should I make all parameters optional, or should I force users to always pass an object and not support the other form at all?
No, there's no function overloading in R, though there are a few object systems that accomplish similar things. The simplest one is S3: it dispatches based on the class attribute of the first argument (though there are a few cases where it looks at other arguments).
So if your feedback_object has a class attribute, you could make your first form the default and your second one the method for that class, but you'd have to rename the first argument to match in all methods. The functions would need to have different names, formFeedback.default and formFeedback.yourclass.
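A minimal S3 sketch of that idea; the class name "feedback" and its fields are made up for illustration:

formFeedback <- function(x, ...) UseMethod("formFeedback")

# default method: used when x has no more specific class, e.g. a plain logical
formFeedback.default <- function(x, message) {
  paste0(if (x) "OK: " else "Error: ", message)
}

# method for objects carrying class "feedback"
formFeedback.feedback <- function(x, ...) {
  formFeedback(x$success, x$message)
}

formFeedback(TRUE, "saved")   # dispatches to formFeedback.default
fb <- structure(list(success = FALSE, message = "not saved"), class = "feedback")
formFeedback(fb)              # dispatches to formFeedback.feedback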
There's also a more complicated system called S4 that can dispatch on the types of all arguments.
You can also craft your own dispatch (this is the basis for several other object systems in packages), using whatever logic you like to decide which function to call.

Adding a Base function vs using a unique function name in Julia

I am porting a library to Julia and am curious what would be considered best practice for extending Base functions from a module.
This library contains functions like values(t::MyType) and keys(t::MyType) that take unique struct types but do not really do the same thing or return the same types as the Base functions.
What would be the best practice in this case?
1. Just add Base.values(t::MyType) and Base.keys(t::MyType) functions so they can be used without prefixes.
2. Change the function names to my_type_keys(t::MyType) and my_type_values(t::MyType).
3. Use their original names and require them to be prefixed as MyModule.values(t) and MyModule.keys(t).
If you extend Base functions, you should aim for them to do conceptually the same thing. Also, you should only extend Base functions to dispatch on types you define in your own package. The rule is that you can define methods for outside functions (e.g. from Base or some other package) for your own types, or define your own functions for outside types. But defining methods for outside functions on outside types is "type piracy".
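A small Julia sketch of that rule; MyType and its field are purely illustrative:

struct MyType
    lookup::Dict{String,Int}
end

# Fine: a method of a Base function for a type we own.
Base.keys(t::MyType) = keys(t.lookup)

# Type piracy: a method of a Base function for a type we don't own.
# Base.keys(v::Vector{Int}) = ...   # don't do this

keys(MyType(Dict("a" => 1)))   # callable without any prefix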
Given that you'd be defining them on your own types, it's not a problem that the return values differ, as long as the functions are conceptually the same.
With regard to options 2 and 3, you can do both. Option 3 requires that you don't import the Base functions explicitly (in that case you'd be extending them by defining new methods rather than defining new functions with the same name), that you don't export them, and most likely that you won't be using the Base functions inside your module (keys is really widely used, for example; you can use it, but you'd have to prefix it with Base. inside the module).
Option 2 is always safe, especially if you can find a better name than my_keys. But I think it is preferable to use the same name if the function is doing conceptually the same thing, as it simplifies the user experience a lot. You can assume most users will know the Base functions and will try them out if that's intuitive.

What is select_at() for?

I understand how to use dplyr::select_if() and dplyr::mutate_at(). But I don't understand what dplyr::select_at() provides that a basic select() doesn't provide.
As far as I understand, the verb_at() functions allow you to use the select helper functions (like matches() and starts_with()). But select() already uses the select helpers, so why would you use select_at() instead of just select()?
The primary benefit of select_at() (as opposed to the vanilla select()) is that it provides a .funs= parameter, so you can use a function, e.g. toupper(), to rename columns as you select them.
This makes a ton of sense for something like rename_at(). Providing similar functionality with select_at() makes sense from a tidyverse-style "everything works the same" perspective.
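A small sketch of the difference, using mtcars; the renaming via .funs is the part plain select() can't do:

library(dplyr)

# plain select(): the helpers work, but there is no renaming function
select(mtcars, starts_with("c"))

# select_at(): same selection, with .funs applied to the selected names
select_at(mtcars, vars(starts_with("c")), .funs = toupper)   # cyl -> CYL, carb -> CARB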

Fuzzy systems with same output from different rules

I'm new to fuzzy logic modeling. I'm using the package "sets" in R. Starting from a database of crisp values for 8 input variables and 1 output variable, I performed the fuzzification and assigned a membership function to each variable (inputs and output).
I'm now stuck on the definition of the fuzzy rules.
I would like to ask: if we have the same consequent from different rules, how are these rules processed?
I read that for this problem it's possible to assign a weight to each rule. Is that the correct way to proceed?
Is there someone who already has experience with this issue?
Thanks a lot in advance.
Michele
It is possible to assign rules with two different antecedents to the same output. I have previous Matlab experience, not R, but it is the same principle everywhere and the system evaluates like a charm. However, be careful if you want to use a neuro-fuzzy network (ANFIS), because it doesn't allow this particular feature.
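For what it's worth, here is a rough sketch of two rules sharing a consequent with the sets package; the variables and partitions are invented, and the constructors shown (fuzzy_partition(), fuzzy_rule(), etc.) follow the package's documented example, so double-check them against your installed version:

library(sets)
sets_options("universe", seq(0, 10, by = 0.1))

vars <- set(
  x1  = fuzzy_partition(varnames = c(low = 2, high = 8), sd = 1.5),
  x2  = fuzzy_partition(varnames = c(low = 2, high = 8), sd = 1.5),
  out = fuzzy_partition(varnames = c(small = 2, large = 8), sd = 1.5)
)

# Two rules with the same consequent: each fires on its own antecedent,
# and the resulting output fuzzy sets are aggregated before defuzzification.
rules <- set(
  fuzzy_rule(x1 %is% low, out %is% small),
  fuzzy_rule(x2 %is% low, out %is% small),
  fuzzy_rule(x1 %is% high && x2 %is% high, out %is% large)
)

sys <- fuzzy_system(vars, rules)
fi  <- fuzzy_inference(sys, list(x1 = 3, x2 = 7))
gset_defuzzify(fi, "centroid")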

Convention for combining GET parameters with AND?

I'm designing an API and I want to allow my users to combine a GET parameter with AND operators. What's the best way to do this?
Specifically I have a group_by parameter that gets passed to a Mongo backend. I want to allow users to group by multiple variables.
I can think of two ways:
?group_by=alpha&group_by=beta
or:
?group_by=alpha,beta
Is either one to be preferred? I've consulted a few API design references, but none of them seems to take a position on this.
There is no strict preference. The advantage of the first approach is that many frameworks will turn group_by into an array or similar structure for you, whereas with the second approach you need to parse out the values yourself. The second approach is also less verbose, which may matter if your query string is particularly large.
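For instance, a minimal sketch of parsing the second form server-side, here in R and assuming the raw query value has already been extracted as a string:

group_by_raw <- "alpha,beta"   # value of ?group_by=alpha,beta
fields <- trimws(strsplit(group_by_raw, ",", fixed = TRUE)[[1]])
fields
# [1] "alpha" "beta"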
With the first approach, you may also want to test that the query-string values reach your framework in the order the client sent them; some frameworks have a bug where that doesn't happen.
