custom ordering in Julia SortedSet - julia

In the Julia documentation for SortedSet, there is a reference to "ordering objects", which can be used in the constructor. I'm working on a project where I need to implement a custom sort on a set of structs. I'd like to use a functor for this, since there is additional state I need for my comparisons.
Here is a somewhat simplified version of the problem I want to solve. I have two structs, Point and Edge:
struct Point{T<:Real}
x::T
y::T
end
struct Edge{T<:Real}
first::Point{T}
second::Point{T}
end
I have a Point called 'vantage', and I want to order Edges by their distance from 'vantage'. Conceptually:
function edge_ordering(vantage::Point, e1::Edge, e2::Edge)
d1 = distance(vantage, e1)
d2 = distance(vantage, e2)
return d1 < d2
end
Are "ordering objects" functors (or functor-ish)? Is there some other conventional way of doing this sort of ordering in Julia?

An Ordering object can contain fields, so you can store your state there. Here is an example of a remainder ordering, which sorts integers by their remainder:
using DataStructures
struct RemainderOrdering <: Base.Order.Ordering
r::Int
end
import Base.Order.lt
lt(o::RemainderOrdering, a, b) = isless(a % o.r, b % o.r)
SortedSet(RemainderOrdering(3), [1,2,3]) # 3, 1, 2
I'm not sure how it is related to functors, so I may misunderstand your question. This is an alternative implementation that defines an Ordering functor. I made explanations in comments.
using DataStructures
import Base: isless, map
struct Foo # this is your structure
x::Int
end
struct PrimaryOrdered{T, F} # this is the functor, F is the additional state.
x::T
end
map(f::Base.Callable, x::T) where {T <: PrimaryOrdered} = T(f(x.x)) # this makes it a functor?
isless(x::PrimaryOrdered{T, F}, y::PrimaryOrdered{T, F}) where {T, F} =
F(x.x) < F(y.x) # do comparison with your additional state, here I assume it is a closure
const OrderR3 = PrimaryOrdered{Foo, x -> x.x % 3} # an ordering that sorts by the remainder modulo 3
a = OrderR3(Foo(2))
f(x::Foo) = Foo(x.x + 1) # this is a Foo -> Foo
a = map(f, a) # you can map f on a OrderR3 object
a == OrderR3(Foo(3)) # true
a = map(OrderR3 ∘ Foo, [1, 2, 3])
s = SortedSet(a)
map(x->x.x, s) # Foo[3, 1, 2]
As always, an MWE is important for a question to be understood better. You can include a piece of code to show how you want to construct and use your SortedSet, instead of the vague "state" and "functor".
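Applied to the structs from the question, the Ordering-object approach might look like the sketch below. The question never defines distance, so the midpoint-based version here is an assumption, and VantageOrdering is a hypothetical name. To keep the sketch free of package dependencies it uses Base's sort with the order keyword; the same ordering object should be usable as the first argument to SortedSet, as in the RemainderOrdering example.

```julia
import Base.Order: Ordering, lt

struct Point{T<:Real}
    x::T
    y::T
end

struct Edge{T<:Real}
    first::Point{T}
    second::Point{T}
end

# Assumed definition: distance from a point to the midpoint of an edge.
function distance(p::Point, e::Edge)
    mx = (e.first.x + e.second.x) / 2
    my = (e.first.y + e.second.y) / 2
    return hypot(mx - p.x, my - p.y)
end

# The ordering object carries the extra state (the vantage point).
struct VantageOrdering{T<:Real} <: Ordering
    vantage::Point{T}
end

# Extend Base.Order.lt so that sorting routines consult the vantage point.
lt(o::VantageOrdering, e1::Edge, e2::Edge) =
    distance(o.vantage, e1) < distance(o.vantage, e2)

vantage = Point(0.0, 0.0)
edges = [
    Edge(Point(5.0, 5.0), Point(7.0, 5.0)),
    Edge(Point(1.0, 0.0), Point(1.0, 2.0)),
    Edge(Point(-3.0, 0.0), Point(-3.0, 4.0)),
]

# Nearest edge first.
sorted_edges = sort(edges; order = VantageOrdering(vantage))
```

Because the vantage point lives in the ordering object rather than in a global, several orderings with different vantage points can coexist.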

The sorting is based on the isless method for the type. For instance, if you have a type that you want to sort on its b field, you can do
struct Foo{T}
a::T
b::T
end
Base.:isless(x::T,y::T) where {T<:Foo} = isless(x.b,y.b)
s=[Foo(1,2),Foo(2,-1)]
res=SortedSet(s)
#SortedSet(Foo[Foo(2, -1), Foo(1, 2)],
#Base.Order.ForwardOrdering())
Tuples are also sorted in order, so you can also use
sort(s, by=x->(x.b, x.a)) to sort by b, then a, without having to define isless for the type.
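A minimal sketch of the tuple-key variant, using only Base. The third element is added here purely to show the tie-break on a.

```julia
struct Foo{T}
    a::T
    b::T
end

s = [Foo(1, 2), Foo(2, -1), Foo(0, 2)]

# Tuples compare lexicographically, so this sorts by b and breaks ties by a.
sorted = sort(s; by = x -> (x.b, x.a))

# Collect the fields to inspect the resulting order.
order_seen = [(x.a, x.b) for x in sorted]
```

No isless method is defined for Foo here, so this leaves the type's default behaviour untouched.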

Related

How to dispatch based on the type of any of the splatted args?

Consider an existing function in Base, which takes in a variable number of arguments of some abstract type T. I have defined a subtype S<:T and would like to write a method which dispatches if any of the arguments is my subtype S.
As an example, consider function Base.cat, with T being an AbstractArray and S being some MyCustomArray <: AbstractArray.
Desired behaviour:
julia> v = [1, 2, 3];
julia> cat(v, v, v, dims=2)
3×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
julia> w = MyCustomArray([1,2,3])
julia> cat(v, v, w, dims=2)
"do something fancy"
Attempt:
function Base.cat(w::MyCustomArray, a::AbstractArray...; dims)
println("do something fancy")
end
But this only works if the first argument is MyCustomArray.
What is an elegant way of achieving this?
I would say that it is not possible to do it cleanly without type piracy (but if it is possible I would also like to learn how).
For example consider cat that you asked about. It has one very general signature in Base (actually not requiring A to be AbstractArray as you write):
julia> methods(cat)
# 1 method for generic function "cat":
[1] cat(A...; dims) in Base at abstractarray.jl:1654
You could write a specific method:
Base.cat(A::AbstractArray...; dims) = ...
and check if any of elements of A is your special array, but this would be type piracy.
Now the problem is that you cannot even write Union{S, T}: since S <: T, it is simplified to just T.
This would mean that you would have to use S explicitly in the signature, but then even:
f(::S, ::T) = ...
f(::T, ::S) = ...
is problematic and a compiler will ask you to define f(::S, ::S) as the above definitions lead to dispatch ambiguity. So, even if you wanted to limit the number of varargs to some maximum number you would have to annotate types for all divisions of A into subsets to avoid dispatch ambiguity (which is doable using macros, but grows the number of required methods exponentially).
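The ambiguity can be reproduced with a small sketch. The types T, S and Other below are stand-ins invented for illustration; they mirror the S <: T relationship from the discussion.

```julia
abstract type T end
struct S <: T end
struct Other <: T end

# The union collapses to the supertype, so it cannot be used to single out S.
union_collapses = Union{S, T} == T

f(::S, ::T) = "S first"
f(::T, ::S) = "S second"

# Unambiguous calls dispatch as expected.
r1 = f(S(), Other())
r2 = f(Other(), S())

# f(S(), S()) matches both methods equally well and raises a MethodError.
ambiguous = try
    f(S(), S())
    false
catch err
    err isa MethodError
end
```

Defining f(::S, ::S) explicitly would resolve this case, which is why the number of required methods grows with the number of argument positions.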
For general usage, I concur with Bogumił, but let me make an additional comment. If you have control over how cat is called, you can at least write some kind of trait-dispatch code:
struct MyCustomArray{T, N} <: AbstractArray{T, N}
x::Array{T, N}
end
HasCustom() = Val(false)
HasCustom(::MyCustomArray, rest...) = Val(true)
HasCustom(::AbstractArray, rest...) = HasCustom(rest...)
# `IsCustom` or something would be more elegant, but `Val` is quicker for now
Base.cat(::Val{true}, args...; dims) = println("something fancy")
Base.cat(::Val{false}, args...; dims) = cat(args...; dims=dims)
And the compiler is cool enough to optimize that away:
julia> args = (v, v, w);
julia> @code_warntype cat(HasCustom(args...), args...; dims=2);
Variables
#self#::Core.Compiler.Const(cat, false)
#unused#::Core.Compiler.Const(Val{true}(), false)
args::Tuple{Array{Int64,1},Array{Int64,1},MyCustomArray{Int64,1}}
Body::Nothing
1 ─ %1 = Main.println("something fancy")::Core.Compiler.Const(nothing, false)
└── return %1
If you don't have control over calls to cat, the only resort I can think of to make the above technique work is to overdub the methods containing such calls, replacing matching calls with the custom implementation. In that case you don't even need to overload cat, but can directly replace it with some mycat doing your fancy stuff.
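A rough sketch of the mycat idea, avoiding any new methods on Base.cat. The names mycat and has_custom are hypothetical, and the string return value merely stands in for the "fancy" behaviour.

```julia
struct MyCustomArray{T,N} <: AbstractArray{T,N}
    x::Array{T,N}
end

# Minimal AbstractArray interface so the wrapper behaves like an array.
Base.size(a::MyCustomArray) = size(a.x)
Base.getindex(a::MyCustomArray{T,N}, i::Vararg{Int,N}) where {T,N} = a.x[i...]

# Recursively scan the arguments for a MyCustomArray.
has_custom() = false
has_custom(::MyCustomArray, rest...) = true
has_custom(::Any, rest...) = has_custom(rest...)

# A separate function owned by this code, so no type piracy is involved.
function mycat(args...; dims)
    if has_custom(args...)
        return "do something fancy"
    else
        return cat(args...; dims = dims)
    end
end

v = [1, 2, 3]
w = MyCustomArray([1, 2, 3])
plain = mycat(v, v, v; dims = 2)
fancy = mycat(v, v, w; dims = 2)
```

The custom array may appear in any argument position, which is what the single-method attempt in the question could not handle.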

Evaluate expression with local variables

I'm writing a genetic program in order to test the fitness of randomly generated expressions. Shown here is the function to generate the expression as well as the main function. DIV and GT are defined elsewhere in the code:
function create_single_full_tree(depth, fs, ts)
"""
Creates a single AST with full depth
Inputs
depth Current depth of tree. Initially called from main() with max depth
fs Function Set - Array of allowed functions
ts Terminal Set - Array of allowed terminal values
Output
Full AST of typeof()==Expr
"""
# If we are at the bottom
if depth == 1
# End of tree, return function with two terminal nodes
return Expr(:call, fs[rand(1:length(fs))], ts[rand(1:length(ts))], ts[rand(1:length(ts))])
else
# Not end of expression, recursively go back through and create functions for each new node
return Expr(:call, fs[rand(1:length(fs))], create_single_full_tree(depth-1, fs, ts), create_single_full_tree(depth-1, fs, ts))
end
end
function main()
"""
Main function
"""
# Define functional and terminal sets
fs = [:+, :-, :DIV, :GT]
ts = [:x, :v, -1]
# Create the tree
ast = create_single_full_tree(4, fs, ts)
#println(typeof(ast))
#println(ast)
#println(dump(ast))
x = 1
v = 1
eval(ast) # Error out unless x and v are globals
end
main()
I am generating a random expression based on certain allowed functions and variables. As seen in the code, the expression can only have symbols x and v, as well as the value -1. I will need to test the expression with a variety of x and v values; here I am just using x=1 and v=1 to test the code.
The expression is being returned correctly, however, eval() can only be used with global variables, so it will error out when run unless I declare x and v to be global (ERROR: LoadError: UndefVarError: x not defined). I would like to avoid globals if possible. Is there a better way to generate and evaluate these generated expressions with locally defined variables?
Here is an example for generating an (anonymous) function. The result of eval can be called as a function and your variable can be passed as parameters:
myfun = eval(Expr(:->,:x, Expr(:block, Expr(:call,:*,3,:x) )))
myfun(14)
# returns 42
The dump function is very useful for inspecting the expression that the parser has created. For two input arguments you would use a tuple, for example, as args[1]:
julia> dump(parse("(x,y) -> 3x + y"))
Expr
head: Symbol ->
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Symbol x
2: Symbol y
typ: Any
2: Expr
[...]
Does this help?
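Applied to the question, the generated AST can be wrapped in an anonymous function of x and v and evaluated once. In this sketch DIV and GT are assumed stand-ins (the question defines its own elsewhere), and the AST is written by hand rather than generated randomly.

```julia
# Assumed stand-ins for the helpers the question defines elsewhere.
DIV(a, b) = b == 0 ? 1.0 : a / b
GT(a, b) = a > b ? 1 : 0

# A hand-built AST of the same shape the tree generator produces.
ast = Expr(:call, :+, Expr(:call, :DIV, :x, :v), Expr(:call, :GT, :x, -1))

# Wrap it as (x, v) -> ast and evaluate once to obtain a callable.
fexpr = Expr(:->, Expr(:tuple, :x, :v), ast)
fitness_fn = eval(fexpr)

# invokelatest avoids world-age errors when the call happens inside
# a function that was already running when eval created fitness_fn.
result = Base.invokelatest(fitness_fn, 4, 2)
```

The function can then be called with as many (x, v) pairs as needed, and neither variable has to exist as a global.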
In the Metaprogramming part of the Julia documentation, there is a sentence under the eval() and effects section which says
Every module has its own eval() function that evaluates expressions in its global scope.
Similarly, the REPL help ?eval will give you, on Julia 0.6.2, the following help:
Evaluate an expression in the given module and return the result. Every Module (except those defined with baremodule) has its own 1-argument definition of eval, which evaluates expressions in that module.
I assume, you are working in the Main module in your example. That's why you need to have the globals defined there. For your problem, you can use macros and interpolate the values of x and y directly inside the macro.
A minimal working example would be:
macro eval_line(a, b, x)
isa(a, Real) || (warn("$a is not a real number."); return :(throw(DomainError())))
isa(b, Real) || (warn("$b is not a real number."); return :(throw(DomainError())))
return :($a * $x + $b) # interpolate the variables
end
Here, the @eval_line macro does the following:
Main> @macroexpand @eval_line(5, 6, 2)
:(5 * 2 + 6)
As you can see, the values of macro's arguments are interpolated inside the macro and the expression is given to the user accordingly. When the user does not behave,
Main> @macroexpand @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
:((Main.throw)((Main.DomainError)()))
a user-friendly warning message is provided to the user at parse-time, and a DomainError is thrown at run-time.
Of course, you can do these things within your functions, again by interpolating the variables; you do not need to use macros. However, what you would like to achieve in the end is to combine eval with the output of a function that returns Expr. This is what the macro functionality is for. Finally, you would simply call your macros with an @ sign preceding the macro name:
Main> @eval_line(5, 6, 2)
16
Main> @eval_line([1,2,3], 7, 8)
WARNING: [1, 2, 3] is not a real number.
ERROR: DomainError:
Stacktrace:
[1] eval(::Module, ::Any) at ./boot.jl:235
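The remark that interpolation also works inside ordinary functions can be sketched as follows. The helper name eval_with is hypothetical; it splices the values of x and v into a let block around the expression, so no globals are needed.

```julia
# Evaluate `ast` with x and v bound locally via an interpolated let block.
function eval_with(ast, x, v)
    bound = :(let x = $x, v = $v; $ast end)
    return eval(bound)
end

ast = :(x * v - 1)
r1 = eval_with(ast, 3, 4)   # evaluates 3 * 4 - 1
r2 = eval_with(ast, 1, 1)   # evaluates 1 * 1 - 1
```

Each call goes through eval again, so this is convenient but slow if the same expression is evaluated many times.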
EDIT 1. You can take this one step further, and create functions accordingly:
macro define_lines(linedefs)
for (name, a, b) in eval(linedefs)
ex = quote
function $(Symbol(name))(x) # interpolate name
return $a * x + $b # interpolate a and b here
end
end
eval(ex) # evaluate the function definition expression in the module
end
end
Then, you can call this macro to create different line definitions in the form of functions to be called later on:
@define_lines([
("identity_line", 1, 0);
("null_line", 0, 0);
("unit_shift", 0, 1)
])
identity_line(5) # returns 5
null_line(5) # returns 0
unit_shift(5) # returns 1
EDIT 2. You can, I guess, achieve what you would like to achieve by using a macro similar to that below:
macro random_oper(depth, fs, ts)
operations = eval(fs)
oper = operations[rand(1:length(operations))]
terminals = eval(ts)
ts = terminals[rand(1:length(terminals), 2)]
ex = :($oper($ts...))
for d in 2:depth
oper = operations[rand(1:length(operations))]
t = terminals[rand(1:length(terminals))]
ex = :($oper($ex, $t))
end
return ex
end
which will give the following, for instance:
Main> @macroexpand @random_oper(1, [+, -, /], [1,2,3])
:((-)([3, 3]...))
Main> @macroexpand @random_oper(2, [+, -, /], [1,2,3])
:((+)((-)([2, 3]...), 3))
Thanks Arda for the thorough response! This helped, but part of me thinks there may be a better way to do this as it seems too roundabout. Since I am writing a genetic program, I will need to create 500 of these ASTs, all with random functions and terminals from a set of allowed functions and terminals (fs and ts in the code). I will also need to test each function with 20 different values of x and v.
In order to accomplish this with the information you have given, I have come up with the following macro:
macro create_function(defs)
for name in eval(defs)
ex = quote
function $(Symbol(name))(x,v)
fs = [:+, :-, :DIV, :GT]
ts = [x,v,-1]
return create_single_full_tree(4, fs, ts)
end
end
eval(ex)
end
end
I can then supply a list of 500 random function names in my main() function, such as ["func1", "func2", "func3", ...], and evaluate each of them with any x and v values there. This has solved my issue; however, it seems a very roundabout way of doing this, and may make it difficult to evolve each AST with each iteration.
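One less roundabout alternative, which nobody in this thread proposed, is to skip eval altogether and walk the Expr tree with a small recursive evaluator. The sketch below assumes that DIV and GT behave as defined here and that every node is a :call expression, as produced by create_single_full_tree; evaluate and funcs are hypothetical names.

```julia
# Assumed stand-ins for the helpers the question defines elsewhere.
DIV(a, b) = b == 0 ? 1.0 : a / b
GT(a, b) = a > b ? 1 : 0

# Map the symbols of the function set to actual functions.
funcs = Dict{Symbol,Function}(:+ => +, :- => -, :DIV => DIV, :GT => GT)

# Terminals: numbers evaluate to themselves, symbols are looked up in env.
evaluate(n::Number, env, funcs) = n
evaluate(s::Symbol, env, funcs) = env[s]

# Function nodes: evaluate the arguments recursively, then apply.
function evaluate(ex::Expr, env, funcs)
    ex.head === :call || error("unsupported expression head: $(ex.head)")
    f = funcs[ex.args[1]]
    args = [evaluate(a, env, funcs) for a in ex.args[2:end]]
    return f(args...)
end

ast = Expr(:call, :+, Expr(:call, :DIV, :x, :v), Expr(:call, :GT, :x, -1))
r1 = evaluate(ast, Dict(:x => 4, :v => 2), funcs)
r2 = evaluate(ast, Dict(:x => -5, :v => 0), funcs)
```

The ASTs stay plain data, so they can still be mutated and recombined between generations, and each one can be scored against many x and v values without generating named functions.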

Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:
x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])
I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:
DataFrame(A = [1,2,3], B = [4,5,6])
I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:
macro dict_to_df(x)
typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
return quote
DataFrame(
for k in keys(eval(x))
@eval ($k) = $(eval(x)[$k])
end
)
end
end
I also tried writing this as a function, which does work when all dictionary values have the same length:
function dict_to_df(x::Dict)
s = "DataFrame("
for k in keys(x)
v = x[k]
if typeof(v) <: AbstractString
v = string('"', v, '"')
end
s *= "$(k) = $(v),"
end
s = chop(s) * ")"
return eval(parse(s))
end
Is there a better, faster, or more idiomatic approach to this?
Another method could be
DataFrame(Any[values(x)...],Symbol[map(symbol,keys(x))...])
It was a bit tricky to get the types in order to access the right constructor. To get a list of the constructors for DataFrames I used methods(DataFrame).
The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:
DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
Finally, things would be cleaner if x had originally been with symbols. Then the code would go:
x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)
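The keyword-splatting mechanism itself can be tried without DataFrames. In this sketch column_names is a hypothetical stand-in for any function that accepts keyword arguments, the way DataFrame(A = ..., B = ...) does.

```julia
x = Dict("A" => [1, 2, 3], "B" => [4, 5, 6])

# Hypothetical stand-in for a keyword-accepting constructor.
column_names(; kwargs...) = sort(collect(keys(kwargs)))

# String keys must become Symbols before they can be splatted as keywords.
names_seen = column_names(; (Symbol(k) => v for (k, v) in x)...)

# The same splat builds a NamedTuple, which is handy for inspection.
nt = (; (Symbol(k) => v for (k, v) in x)...)
```

Dict iteration order is unspecified, so the column order obtained this way should not be relied upon.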

How to make type attribute be a function of other type attributes?

To declare a new composite type, we use the following syntax
type foo
a::Int64
b::Int64
end
and instantiate like such
x = foo(1,3)
Is there some way to have type attributes that are always just a function of other attributes? For example, is there some way to do the following (which is invalid syntax)...
type foo
a::Int64
b::Int64
c = a + b
end
My current workaround is just to define a function which calculates c and returns an instance of the type, like so...
type foo
a::Int64
b::Int64
c::Int64
end
function foo_maker(a, b)
return foo(a, b, a+b)
end
Is there a more elegant solution? Possibly one that can be contained within the type definition?
EDIT - 3/7/14
With Cristóvão's suggestion in mind, I've ended up declaring constructors like the following to allow for keyword args and attributes calculated upon instantiation
# Type with optional keyword argument structure
type LargeType
# Declare all the attributes in order up top
q::Int64
w::Int64
e::Int64
r::Int64
t::Int64
y::Int64
a::Number
b::Number
c::Number
# Declare Longer constructor with stuff going on in the body
LargeType(;q=1,w=1,e=1,r=1,t=1,y=1) = begin
# Large Constructor Example
a = round(r^t - log(pi))
b = a % t
c = a*b
# Return new instance with correctly ordered arguments
return new(q,w,e,r,t,y,a,b,c)
end
end
println(LargeType(r=2,t=5))
Try this:
julia> type foo
a::Int64
b::Int64
c::Int64
foo(a::Int64, b::Int64) = new(a, b, a+b)
end
julia> foo(1,2)
foo(1,2,3)
julia> foo(4,5,6)
no method foo(Int64, Int64, Int64)
However, that won't prevent one from manually changing a, b or c and rendering c inconsistent. To prevent that, if it presents no other problems, you can make foo immutable:
julia> immutable foo
...
There isn't any way to do this currently, but there might be in the future:
https://github.com/JuliaLang/julia/issues/1974
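For readers on Julia 1.x, where type and immutable were replaced by mutable struct and struct, the two ideas can be sketched as below. FooStored and FooComputed are hypothetical names. The second variant relies on overloading Base.getproperty, which arrived in later Julia versions than the ones discussed in this thread.

```julia
# Variant 1: store c, computed once by an inner constructor.
struct FooStored
    a::Int64
    b::Int64
    c::Int64
    FooStored(a::Integer, b::Integer) = new(a, b, a + b)
end

# Variant 2: do not store c at all; compute it on property access.
struct FooComputed
    a::Int64
    b::Int64
end

function Base.getproperty(p::FooComputed, name::Symbol)
    if name === :c
        return getfield(p, :a) + getfield(p, :b)
    else
        return getfield(p, name)
    end
end

stored = FooStored(1, 2)
computed = FooComputed(4, 5)

# The default three-argument constructor is suppressed by the inner one.
three_arg_rejected = try
    FooStored(4, 5, 6)
    false
catch err
    err isa MethodError
end
```

Both structs are immutable, so c cannot drift out of sync with a and b after construction.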

Higher-order type constructors and functors in Ocaml

Can the following polymorphic functions
let id x = x;;
let compose f g x = f (g x);;
let rec fix f = f (fix f);; (*laziness aside*)
be written for types/type constructors or modules/functors? I tried
type 'x id = Id of 'x;;
type 'f 'g 'x compose = Compose of ('f ('g 'x));;
type 'f fix = Fix of ('f (Fix 'f));;
for types but it doesn't work.
Here's a Haskell version for types:
data Id x = Id x
data Compose f g x = Compose (f (g x))
data Fix f = Fix (f (Fix f))
-- examples:
l = Compose [Just 'a'] :: Compose [] Maybe Char
type Natural = Fix Maybe -- natural numbers are fixpoint of Maybe
n = Fix (Just (Fix (Just (Fix Nothing)))) :: Natural -- n is 2
-- up to isomorphism composition of identity and f is f:
iso :: Compose Id f x -> f x
iso (Compose (Id a)) = a
Haskell allows type variables of higher kind. ML dialects, including Caml, allow type variables of kind "*" only. Translated into plain English,
In Haskell, a type variable g can correspond to a "type constructor" like Maybe or IO or lists. So the g x in your Haskell example would be OK (jargon: "well-kinded") if for example g is Maybe and x is Integer.
In ML, a type variable 'g can correspond only to a "ground type" like int or string, never to a type constructor like option or list. It is therefore never correct to try to apply a type variable to another type.
As far as I'm aware, there's no deep reason for this limitation in ML. The most likely explanation is historical contingency. When Milner originally came up with his ideas about polymorphism, he worked with very simple type variables standing only for monotypes of kind *. Early versions of Haskell did the same, and then at some point Mark Jones discovered that inferring the kinds of type variables is actually quite easy. Haskell was quickly revised to allow type variables of higher kind, but ML has never caught up.
The people at INRIA have made a lot of other changes to ML, and I'm a bit surprised they've never made this one. When I'm programming in ML, I might enjoy having higher-kinded type variables. But they aren't there, and I don't know any way to encode the kind of examples you are talking about except by using functors.
You can do something similar in OCaml, using modules in place of types, and functors (higher-order modules) in place of higher-order types. But it looks much uglier and it doesn't have type-inference ability, so you have to manually specify a lot of stuff.
module type Type = sig
type t
end
module Char = struct
type t = char
end
module List (X:Type) = struct
type t = X.t list
end
module Maybe (X:Type) = struct
type t = X.t option
end
(* In the following, I decided to omit the redundant
single constructors "Id of ...", "Compose of ...", since
they don't help in OCaml since we can't use inference *)
module Id (X:Type) = X
module Compose
(F:functor(Z:Type)->Type)
(G:functor(Y:Type)->Type)
(X:Type) = F(G(X))
let l : Compose(List)(Maybe)(Char).t = [Some 'a']
module Example2 (F:functor(Y:Type)->Type) (X:Type) = struct
(* unlike types, "free" module variables are not allowed,
so we have to put it inside another functor in order
to scope F and X *)
let iso (a:Compose(Id)(F)(X).t) : F(X).t = a
end
Well... I'm not an expert of higher-order-types nor Haskell programming.
But this seems to be OK in F# (which is closely related to OCaml); could you work with these:
type 'x id = Id of 'x;;
type 'f fix = Fix of ('f fix -> 'f);;
type ('f,'g,'x) compose = Compose of ('f ->'g -> 'x);;
The last one I wrapped to tuple as I didn't come up with anything better...
You can do it but you need to make a bit of a trick:
newtype Fix f = In{out:: f (Fix f)}
You can define cata afterwards:
cata :: (Functor f) => (f a -> a) -> Fix f -> a
cata f = f . fmap (cata f) . out
That will define a generic catamorphism for all functors, which you can use to build your own stuff. Example:
data ListFix a b = Nil | Cons a b
type List a = Fix (ListFix a)
instance Functor (ListFix a) where
fmap f Nil = Nil
fmap f (Cons a lst) = Cons a (f lst)
