Writing function methods for passing GroupedDataFrame in Julia

I have written a function like the following one:
gini(v::Array{<:Real,1}) = (2 * sum([x*i for (i,x) in enumerate(sort(v))]) / sum(sort(v)) - (length(v)+1))/(length(v))
This function works well when passing a Vector or a DataFrame. For example:
gini(collect(1:1:10))
# 0.3
or
using DataFrames # DataFrames v1.3.2
df = DataFrame(v = collect(1:1:10),
group = repeat([1, 2], 5))
combine(df, :v => gini)
#1×1 DataFrame
# Row │ v_gini
# │ Float64
#─────┼─────────
# 1 │ 0.3
However, unlike other functions that take vectors as an argument (e.g. Statistics.mean), it throws a MethodError when passing a GroupedDataFrame.
combine(groupby(df, :group), :v => gini)
# nested task error: MethodError: no method matching gini(::SubArray{Int64, 1, Vector{Int64}, Tuple{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}, false})
# Closest candidates are:
# gini(::Vector{<:Real})
How can I write functions like the one above that work when passing a GroupedDataFrame?

You need to change the method signature to:
gini(v::AbstractVector{<:Real})
The point is that combine passes each group's column as a view of the vector, which has type SubArray rather than Vector. Your method therefore needs to accept any AbstractVector, not just Vector.

Related

Disable partial name identification of function arguments

I am trying to make a function in R that outputs a data frame in a standard way, but that also allows the user to add whatever personalized columns he deems necessary (the goal is to make a data format for paleomagnetic data, for which some information is used by everybody, and some more unusual information is something the user might like to keep in the format).
However, I realized that if the user wants the header of his data to be a prefix of one of the defined arguments of the data formatting function (e.g. 'sheep', which is a prefix of the 'sheepc' argument, see example below), the function interprets it as the defined argument (through partial name identification, see http://adv-r.had.co.nz/Functions.html#lexical-scoping for more details).
Is there a way to prevent this, or to at least give a warning to the user saying that he cannot use this name?
PS I realize this question is similar to Disabling partial variable names in subsetting data frames, but I would like to avoid toying with the options of the future users of my function.
fun <- function(sheeta = 1, sheetb = 2, sheepc = 3, ...)
{
# I use the sheeta, sheetb and sheepc arguments for computations
# (more complex than shown below, but here they are just there to give an example)
a <- sum(sheeta, sheetb)
df1 <- data.frame(standard = rep(a, sheepc))
df2 <- as.data.frame(list(...))
if(nrow(df1) == nrow(df2)){
res <- cbind(df1, df2)
return(res)
} else {
stop("Extra elements should be of length ", sheepc)
}
}
fun(ball = rep(1,3))
#> standard ball
#> 1 3 1
#> 2 3 1
#> 3 3 1
fun(sheep = rep(1,3))
#> Error in rep(a, sheepc): argument 'times' incorrect
fun(sheet = rep(1,3))
#> Error in fun(sheet = rep(1, 3)) :
#> argument 1 matches multiple formal arguments
From the language definition:
If the formal arguments contain ‘...’ then partial matching is only
applied to arguments that precede it.
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3)
{<your function body>}
fun(sheep = rep(1,3))
# standard sheep
#1 3 1
#2 3 1
#3 3 1
Of course, your function should have assertion checks for the non-... parameters (see help("stopifnot")). You could also consider adding a . or _ to their tags to make name collisions less likely.
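As a rough sketch of what the complete function could look like with ... moved to the front and a few assertion checks added (the particular checks are my assumption about what the arguments should be, so adapt them to your real computations):

```r
fun <- function(..., sheeta = 1, sheetb = 2, sheepc = 3) {
  # Assumed checks: numeric inputs and a single non-negative row count
  stopifnot(is.numeric(sheeta), is.numeric(sheetb),
            is.numeric(sheepc), length(sheepc) == 1, sheepc >= 0)

  a <- sum(sheeta, sheetb)
  df1 <- data.frame(standard = rep(a, sheepc))

  extras <- list(...)
  if (length(extras) == 0) return(df1)  # nothing extra to bind

  df2 <- as.data.frame(extras)
  if (nrow(df1) != nrow(df2)) {
    stop("Extra elements should be of length ", sheepc)
  }
  cbind(df1, df2)
}

res <- fun(sheep = rep(1, 3))   # 'sheep' now ends up in ..., not in sheepc
names(res)
# [1] "standard" "sheep"
```

Keep in mind that arguments placed after ... can then only be set by their full names, e.g. fun(sheepc = 5).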
Edit:
"Would it be possible to achieve the same effect without having the ... at the beginning?"
Yes, here is a quick example with one parameter:
fun <- function(sheepc = 3, ...)
{
stopifnot("partial matching detected" = identical(sys.call(), match.call()))
list(...)
}
fun(sheep = rep(1,3))
# Error in fun(sheep = rep(1, 3)) : partial matching detected
fun(ball = rep(1,3))
#$ball
#[1] 1 1 1
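One caveat with comparing sys.call() and match.call(), as far as I can tell: match.call() also fills in the names of arguments passed by position, so fun(5) would trigger the same error. If that matters, and if a warning is enough, something along these lines might work. warn_on_partial is a made-up helper, and it assumes all named formals come before ...:

```r
# Hypothetical helper: warn when a supplied name is a strict prefix of a formal
warn_on_partial <- function(call, fn) {
  supplied <- as.character(names(as.list(call)))[-1]  # drop the function slot
  supplied <- supplied[nzchar(supplied)]              # keep named arguments only
  formal_names <- setdiff(names(formals(fn)), "...")
  for (s in setdiff(supplied, formal_names)) {
    hit <- formal_names[startsWith(formal_names, s)]
    if (length(hit) > 0) {
      warning("argument '", s, "' partially matches '", hit[1],
              "'; please choose another name", call. = FALSE)
    }
  }
  invisible(NULL)
}

fun <- function(sheepc = 3, ...) {
  cl <- sys.call()       # the call exactly as the user typed it
  fn <- sys.function()   # the function being called
  warn_on_partial(cl, fn)
  list(sheepc = sheepc, extras = list(...))
}

fun(ball = rep(1, 3))    # no warning
fun(sheep = rep(1, 3))   # warns, and sheepc is still set by partial matching
```

The warning does not stop the partial match from happening; it only tells the user that the column name was swallowed by sheepc.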

"Argument x" in length function

I am working through the section on vectors in "The Book of R", which gives the following examples:
length(x=c(3,2,8,1))
# [1] 4
length(x=5:13)
# [1] 9
foo <- 4
bar <- c(3,8.3,rep(x=32,times=foo),seq(from=-2,to=1,length.out=foo+1))
length(x=bar)
# [1] 11
But if the input length(c(3,2,8,1)) is going to give you the output 4 anyway, why would you add in x=? What is the purpose of x=? At first I thought it had to do with variables but R did not reflect that x was holding the vector (3,2,8,1) after I typed length(x=c(3,2,8,1)).
And why does length(y=c(5:13)) not work, giving this error instead:
Error in length(y = 5:13) : supplied argument name 'y' does not match 'x'
R has named arguments for functions. Check this section of R's doc for some information on the subject.
So x is just the name that was given to the first argument of function length, it has nothing to do with any variable in your environment that may be named x.
Overall, it's a pretty handy feature:
it allows you to pass arguments in any order (if you use the arg = ... syntax)
the function's writer can give hints to users about what type of arguments are expected
combined with auto-completion, it helps to remember a function's syntax and usage
and it is optional, since you can also pass arguments without naming them:
matrix(data = 1:12, ncol = 3) # is equivalent to:
matrix(1:12,,3)
You can also use it to write some really confusing stuff (of course, not recommended), such as:
x <- 1:3
length(x = x) # 3
length(x = (x <- 1:4)) # 4 ...
x # 1 2 3 4
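To make the points about argument order and optional naming concrete, here is a small sketch using matrix (my choice of function; any function with named arguments would do):

```r
# Same call, arguments given by position vs. by name in a different order
by_position <- matrix(1:12, 4, 3)
by_name     <- matrix(ncol = 3, nrow = 4, data = 1:12)
identical(by_position, by_name)
# [1] TRUE

# The argument names a function accepts can be looked up:
names(formals(matrix))   # data, nrow, ncol, byrow, dimnames
```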

Concerning R, when defining a Replacement Function, do the arguments have to be named as/like "x" and "value"?

By "replacement functions" I mean those mentioned in this thread What are Replacement Functions in R?, ones that look like 'length<-'(x, value). When I was working with such functions I encountered something weird. It seems that a replacement function only works when variables are named according to a certain rule.
Here is my code:
a <- c(1,2,3)
I will try to change the first element of a, using one of the 3 replacement functions below.
'first0<-' <- function(x, value){
x[1] <- value
x
}
first0(a) <- 5
a
# returns [1] 5 2 3.
The first one works pretty well... but then when I change the name of arguments in the definition,
'first1<-' <- function(somex, somevalue){
somex[1] <- somevalue
somex
}
first1(a) <- 9
# Error in `first1<-`(`*tmp*`, value = 9) : unused argument (value = 9)
a
# returns [1] 5 2 3
It fails to work, though the following code is OK:
a <- 'first1<-'(a, 9)
a
# returns [1] 9 2 3
Some other names work well, too, if they are similar to x and value, it seems:
'first2<-' <- function(x11, value11){
x11[1] <- value11
x11
}
first2(a) <- 33
a
# returns [1] 33 2 3
This doesn't make sense to me. Do the names of variables actually matter or did I make some mistakes?
There are two things going on here. First, the only real rule of replacement functions is that the new value will be passed as a parameter named value and it will be the last parameter. That's why when you specify the signature function(somex, somevalue), you get the error unused argument (value = 9) and the assignment doesn't work.
Secondly, things work with the signature function(x11, value11) thanks to partial matching of parameter names in R. Consider this example
f<-function(a, value1234=5) {
print(value1234)
}
f(value=5)
# [1] 5
Note that 5 is returned. This behavior is defined under argument matching in the language definition.
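Since the default in that example is also 5, the output alone does not show which value was printed. A variation with a different value, as a minimal sketch, makes the partial match visible:

```r
f <- function(a, value1234 = 5) value1234

f()            # nothing supplied, so the default is used
# [1] 5
f(value = 7)   # 'value' partially matches 'value1234'
# [1] 7
```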
Another way to see what's going on is to print the call signature of what's actually being called.
'first0<-' <- function(x, value){
print(sys.call())
x[1] <- value
x
}
a <- c(1,2,3)
first0(a) <- 5
# `first0<-`(`*tmp*`, value = 5)
So the first parameter is actually passed as an unnamed positional parameter, and the new value is passed as the named parameter value=. This is the only parameter name that matters.
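Because only the name value and its position as the last parameter are fixed, a replacement function can take extra arguments in between. As an illustration (the name nth<- is made up for this example):

```r
# Hypothetical replacement function with an extra index argument
'nth<-' <- function(x, n, value) {
  x[n] <- value
  x
}

a <- c(1, 2, 3)
nth(a, 2) <- 10   # called as `nth<-`(`*tmp*`, 2, value = 10)
a
# [1]  1 10  3
```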

Retrieving arguments of a function call with default values in R

Is there a way to retrieve function arguments from an evaluated formula that are not specified in the function call?
For example, consider the call seq(1, 10). If I wanted to get the first argument, I could use quote() and simply use quote(seq(1,10))[[2]] (element [[1]] is the function name). However, this only works if the argument is supplied in the function call (instead of falling back on a default value), and I need to know its exact position.
In this example, is there some way to get the by argument from seq(1, 10) without a lengthy list of if statements to see if it is defined?
The first thing to note is that all of the named arguments you're after (from, to, by, etc.) belong to seq.default(), the method that is dispatched by your call to seq(), and not to seq() itself. (seq() itself only has one formal, ...).
From there you can use these two building blocks
## (1) Retrieves pairlist of all formals
formals(seq.default)
# [long pairlist object omitted to save space]
## (2) Matches supplied arguments to formals
match.call(definition = seq.default, call = quote(seq.default(1,10)))
# seq.default(from = 1, to = 10)
to do something like this:
modifyList(formals(seq.default),
as.list(match.call(seq.default, quote(seq.default(1,10))))[-1])
# $from
# [1] 1
#
# $to
# [1] 10
#
# $by
# ((to - from)/(length.out - 1))
#
# $length.out
# NULL
#
# $along.with
# NULL
#
# $...
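If you need this more than once, the two steps can be wrapped in a small helper. This is only a sketch; full_args is a name I made up, and it expects you to pass the method that actually holds the formals (here seq.default):

```r
# Hypothetical helper: formals of fn, overridden by what the call supplies
full_args <- function(fn, call) {
  supplied <- as.list(match.call(definition = fn, call = call))[-1]
  modifyList(as.list(formals(fn)), supplied)
}

res <- full_args(seq.default, quote(seq.default(1, 10)))
res$to   # supplied in the call
# [1] 10
res$by   # not supplied, so you get the unevaluated default expression
# ((to - from)/(length.out - 1))
```

Note that defaults come back as unevaluated expressions, so a default such as by would still have to be evaluated with the other arguments in scope if you need its value.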

Function to apply arbitrary functions in R

How would one implement in R the function apply.func(func, arg.list), which takes an arbitrary function func and a suitable list arg.list as arguments, and returns the result of calling func with the arguments contained in arg.list. E.g.
apply.func(foo, list(x="A", y=1, z=TRUE))
is equivalent to
foo(x="A", y=1, z=TRUE)
Thanks!
P.S. FWIW, the Python equivalent of apply.func would be something like
def apply_func(func, arg_list):
    return func(*arg_list)
or
def apply_func(func, kwarg_dict):
    return func(**kwarg_dict)
or some variant thereof.
I think do.call is what you're looking for. You can read about it via ?do.call.
The classic example of how folks use do.call is to rbind data frames or matrices together:
d1 <- data.frame(x = 1:5,y = letters[1:5])
d2 <- data.frame(x = 6:10,y = letters[6:10])
do.call(rbind,list(d1,d2))
Here's another fairly trivial example using sum:
do.call(sum,list(1:5,runif(10)))
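Applied to the question, apply.func could then be a thin wrapper around do.call. A minimal sketch, where foo is only a stand-in because the question does not define it:

```r
apply.func <- function(func, arg.list) do.call(func, arg.list)

# foo is a placeholder function for illustration
foo <- function(x, y, z) paste(x, y, z)

apply.func(foo, list(x = "A", y = 1, z = TRUE))
# [1] "A 1 TRUE"
```

Named list elements are matched to arguments by name and unnamed ones by position, just as in a direct call.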
R allows functions to be passed as arguments to other functions. This means you can define apply.func as follows (where f is a function and ... stands for all other parameters):
apply.func <- function(f, ...) f(...)
You can then use apply.func with any function for which the parameters make sense:
apply.func(paste, 1, 2, 3)
# [1] "1 2 3"
apply.func(sum, 1, 2, 3)
# [1] 6
However, note that the following may not produce the result you expect, since mean takes a vector as its first argument, so only 1 is averaged while 2 and 3 are matched to its trim and na.rm arguments:
apply.func(mean, 1, 2, 3)
# [1] 1
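To get the mean of the three numbers through this version of apply.func, they would have to be passed as a single vector, for instance:

```r
apply.func <- function(f, ...) f(...)

apply.func(mean, c(1, 2, 3))   # one vector, matched to mean's first argument
# [1] 2
```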
Note that there is also a base R function called do.call which effectively does the same thing.
