Equality in Ocaml hashtables - hashtable

Are there hashtables in Ocaml that use == instead of = when testing for equality of keys?
For example:
# type foo = A of int;;
# let a = A(1);;
# let b = A(1);;
# a == b;;
- : bool = false
# a = b;;
- : bool = true
# let h = Hashtbl.create 8;;
# Hashtbl.add h a 1;;
# Hashtbl.add h b 2;;
# Hashtbl.find h a;;
- : int = 2
# Hashtbl.find h b;;
- : int = 2
I'd like a hashtable that can distinguish between a and b. Is that possible?

You can use custom hashtables:
module H = Hashtbl.Make(struct
type t = foo
let equal = (==)
let hash = Hashtbl.hash
end)
And then use H instead of Hashtbl in your code.

The solution in Thomas and cago's answers is functionally correct. One issue that may trouble you if you use their solution as-is is that you will get more collisions than expected if you hash many keys that are equal for (=) and different for (==). Indeed, all keys that are equal for (=) have the same hash for Hashtbl.hash, and end up in the same bucket, where they are they recognized as different (since you asked for (==) to be used as equality function) and create different bindings. In the worst cases, the hash-table will behave with the same complexity as an association list (which, by the way, is another data structure that you could be using, and then you wouldn't have to worry about providing a hash function).
If you can accept the key of a value changing occasionally (and therefore the value being impossible to retrieve from the hash-table, since the binding is then in the wrong bucket), you can use the following low-level function as hash:
external address_of_value: 'a -> int = "address_of_value"
Implemented in C as:
#include "caml/mlvalues.h"
value address_of_value(value v)
{
return (Val_long(((unsigned long)v)/sizeof(long)));
}
You would then use:
module H = Hashtbl.Make(struct
type t = foo
let equal = (==)
let hash = address_of_value
end);;

Related

Swapping Variables by pattern matching?

Assume you have 2 Integer Variables a and b
How would you swap them only if a > b by using a match expression?
If a <= b do not swap the ints.
In an imperative language:
if (a > b){
int temp=a;
a=b;
b=temp;
}
Doing the same in ocaml seems surprisingly hard.
I tried
let swap a b =
match a,b with
| a,b when a > b -> b,a
| a,b when a <= b -> a,b
I am trying to do this because in the following function call, I want to make sure that x is the bigger of the two variables.
One easy way :
let swap a b =
if (a>b) then (b,a)
else (a,b)
But this is not equivalent to the C code, your C code is swapping the value of the variable - this is how imperative language are doing.
In Ocaml, there is no side-effect (except if you use reference to some int). This swap function will return a tuple whose members are always ordered (the first member will be always smaller than the second order).
Without state, you cannot "swap" the values of the variables since the variables are immutable. Your best bet is to use a tuple and introduce new variables in the scope. Example:
let diff a b =
let (min, max) = if a <= b then (a, b) else (b, a)
in max - min
You can of course use the same identifiers and shadow the original variables:
let diff a b =
let (a, b) = if a <= b then (a, b) else (b, a)
in b - a
It doesn't really help with readability though.
Just for reference, if you'd like to swap the values in two refs, it would look like the following:
let swap a_ref b_ref =
let a, b = !a_ref, !b_ref in
a_ref := b;
b_ref := a
;;
which has the type val swap : 'a ref -> 'a ref -> unit.

Automatic detection of domain for dependent type function in Idris

Idris language tutorial has simple and understandable example of the idea of Dependent Types:
http://docs.idris-lang.org/en/latest/tutorial/typesfuns.html#first-class-types
Here is the code:
isSingleton : Bool -> Type
isSingleton True = Int
isSingleton False = List Int
mkSingle : (x : Bool) -> isSingleton x
mkSingle True = 0
mkSingle False = []
sum : (single : Bool) -> isSingleton single -> Int
sum True x = x
sum False [] = 0
sum False (x :: xs) = x + sum False xs
I decided to spend more time on this example. What bothers me in sum function is that I need to explicitly pass single : Bool value to function. I don't want to do this and I want compiler to guess what this boolean value should be. Hence I pass only Int or List Int to sum function there should be 1-to-1 correspondence between boolean value and type of argument (if I pass some other type this just mustn't type check).
Of course, I understand, this is not possible in general case. Such compiler tricks require my function isSingleton (or any other similar function) be injective. But for this case it should be possible as it seems to me...
So I started with next implementation: I just made single argument implicit.
sum : {single : Bool} -> isSingleton single -> Int
sum {single = True} x = x
sum {single = False} [] = 0
sum {single = False} (x :: xs) = x + sum' {single = False} xs
Well, it doesn't really solve my problem because I still need to call this function in the next way:
sum {single=True} 1
But I read in tutorial about auto keyword. Though I don't quite understand what auto does (because I didn't find description of it) I decided to patch my function just a little bit more:
sum' : {auto single : Bool} -> isSingleton single -> Int
sum' {single = True} x = x
sum' {single = False} [] = 0
sum' {single = False} (x :: xs) = x + sum' {single = False} xs
And it works for lists!
*DepFun> :t sum'
sum' : {auto single : Bool} -> isSingleton single -> Int
*DepFun> sum' [1,2,3]
6 : Int
But doesn't work for single value :(
*DepFun> sum' 3
When checking an application of function Main.sum':
List Int is not a numeric type
Can someone explain is it actually possible to achieve my goal in such injective function usages currently? I watched this short video about proving something is injective:
https://www.youtube.com/watch?v=7Ml8u7DFgAk
But I don't understand how I can use such proofs in my example.
If this is not possible what is the best way to write such functions?
The auto keyword basically tells Idris, "Find me any value of this type". So you're liable to get the wrong answer unless that type only contains one value. Idris sees {auto x : Bool} and fills it in with any old Bool, namely False. It doesn't use its knowledge of later arguments to help it choose - information doesn't flow from right to left.
One fix would be to make the information flow in the other direction. Rather using a universe-style construction as you have above, write a function accepting an arbitrary type and use a predicate to refine it to the two options you want. This way Idris can look at the type of the preceding argument and pick the only value of IsListOrInt whose type matches.
data IsListOrInt a where
IsInt : IsListOrInt Int
IsList : IsListOrInt (List Int)
sum : a -> {auto isListOrInt : IsListOrInt a} -> Int
sum x {IsInt} = x
sum [] {IsList} = 0
sum (x :: xs) {IsList} = x + sum xs
Now, in this case the search space is small enough (two values - True and False) that Idris could feasibly explore every option in a brute-force fashion and pick the first one that results in a program which passes the type checker, but that algorithm doesn't scale well when the types are much bigger than two, or when trying to infer multiple values.
Compare the left-to-right nature of the information flow in the above example with the behaviour of regular non-auto braces, which instruct Idris to find the result in a bidirectional fashion using unification. As you note, this could only succeed when the type functions in question are injective. You could structure your input as a separate, indexed datatype, and allow Idris to look at the constructor to find b using unification.
data OneOrMany isOne where
One : Int -> OneOrMany True
Many : List Int -> OneOrMany False
sum : {b : Bool} -> OneOrMany b -> Int
sum (One x) = x
sum (Many []) = 0
sum (Many (x :: xs)) = x + sum (Many xs)
test = sum (One 3) + sum (Many [29, 43])
Predicting when the machine will or won't be able to guess what you mean is an important skill in dependently-typed programming; you'll find yourself getting better at it with more experience.
Of course, in this case it's all moot because lists already have one-or-many semantics. Write your function over plain old lists; then if you need to apply it to a single value you can just wrap it in a singleton list.

F# - Treating a function like a map

Long story short, I came up with this funny function set, that takes a function, f : 'k -> 'v, a chosen value, k : 'k, a chosen result, v : 'v, uses f as the basis for a new function g : 'k -> 'v that is the exact same as f, except for that it now holds that, g k = v.
Here is the (pretty simple) F# code I wrote in order to make it:
let set : ('k -> 'v) -> 'k -> 'v -> 'k -> 'v =
fun f k v x ->
if x = k then v else f x
My questions are:
Does this function pose any problems?
I could imagine repeat use of the function, like this
let kvs : (int * int) List = ... // A very long list of random int pairs.
List.fold (fun f (k,v) -> set f k v) id kvs
would start building up a long list of functions on the heap. Is this something to be concerned about?
Is there a better way to do this, while still keeping the type?
I mean, I could do stuff like construct a type for holding the original function, f, a Map, setting key-value pairs to the map, and checking the map first, the function second, when using keys to get values, but that's not what interests me here - what interest me is having a function for "modifying" a single result for a given value, for a given function.
Potential problems:
The set-modified function leaks space if you override the same value twice:
let huge_object = ...
let small_object = ...
let f0 = set f 0 huge_object
let f1 = set f0 0 small_object
Even though it can never be the output of f1, huge_object cannot be garbage-collected until f1 can: huge_object is referenced by f0, which is in turn referenced by the f1.
The set-modified function has overhead linear in the number of set operations applied to it.
I don't know if these are actual problems for your intended application.
If you wish set to have exactly the type ('k -> 'v) -> 'k -> 'v -> 'k -> 'v then I don't see a better way(*). The obvious idea would be to have a "modification table" of functions you've already modified, then let set look up a given f in this table. But function types do not admit equality checking, so you cannot compare f to the set of functions known to your modification table.
(*) Reflection not withstanding.

Hashtable of mutable variable in Ocaml

I need to use hashtable of mutable variable in Ocaml, but it doesn't work out.
let link = Hashtbl.create 3;;
let a = ref [1;2];;
let b = ref [3;4];;
Hashtbl.add link a b;;
# Hashtbl.mem link a;;
- : bool = true
# a := 5::!a;;
- : unit = ()
# Hashtbl.mem link a;;
- : bool = false
Is there any way to make it works?
You can use the functorial interface, which lets you supply your own hash and equality functions. Then you write functions that are based only on the non-mutable parts of your values. In this example, there are no non-mutable parts. So, it's not especially clear what you're expecting to find in the table. But in a more realistic example (in my experience) there are non-mutable parts that you can use.
If there aren't any non-mutable parts, you can add them specifically for use in hashing. You could add an arbitrary unique integer to each value, for example.
It's also possible to do hashing based on physical equality (==), which has a reasonable definition for references (and other mutable values). You have to be careful with it, though, as physical equality is a little tricky. For example, you can't use the physical address of a value as your hash key--an address can change at any time due to garbage collection.
Mutable variables that may happen to have the same content can still be distinguished because they are stored at different locations in memory. They can be compared with the physical equality operator (==). However, OCaml doesn't provide anything better than equality, it doesn't provide a nontrivial hash function or order on references, so the only data structure you can build to store references is an association list of some form, with $\Theta(n)$ access time for most uses.
(You can actually get at the underlying pointer if you play dirty. But the pointer can move under your feet. There is a way to make use of it nonetheless, but if you need to ask, you shouldn't use it. And you aren't desperate enough for that anyway.)
It would be easy to compare references if two distinct references had a distinct content. So make it so! Add a unique identifier to your references. Keep a global counter, increment it by 1 each time you create a reference, and store the counter value with the data. Now your references can be indexed by their counter value.
let counter = ref 0
let new_var x = incr counter; ref (!counter, x)
let var_value v = snd !v
let update_var v x = v := (fst !v, x)
let hash_var v = Hashtbl.hash (fst !v)
For better type safety and improved efficiency, define a data structure containing a counter value and an item.
let counter = ref 0
type counter = int
type 'a variable = {
key : counter;
mutable data : 'a;
}
let new_var x = incr counter; {key = !counter; data = x}
let hash_var v = Hashtbl.hash v.key
You can put the code above in a module and make the counter type abstract. Also, you can define a hash table module using the Hashtbl functorial interface. Here's another way to define variables and a hash table structure on them with a cleaner, more modular structure.
module Counter = (struct
type t = int
let counter = ref 0
let next () = incr counter; !counter
let value c = c
end : sig
type t
val next : unit -> t
val value : t -> int
end)
module Variable = struct
type 'a variable = {
mutable data : 'a;
key : Counter.t;
}
let make x = {key = Counter.next(); data = x}
let update v x = v.data <- x
let get v = v.data
let equal v1 v2 = v1 == v2
let hash v = Counter.value v.key
let compare v1 v2 = Counter.value v2.key - Counter.value v1.key
end
module Make = functor(A : sig type t end) -> struct
module M = struct
type t = A.t Variable.variable
include Variable
end
module Hashtbl = Hashtbl.Make(M)
module Set = Set.Make(M)
module Map = Map.Make(M)
end
We need the intermediate module Variable because the type variable is parametric and the standard library data structure functors (Hashtbl.Make, Set.Make, Map.Make) are only defined for type constructors with no argument. Here's an interface that exposes both the polymorphic interface (with the associated functions, but no data structures) and a functor to build any number of monomorphic instances, with an associated hash table (and set, and map) type.
module Variable : sig
type 'a variable
val make : 'a -> 'a variable
val update : 'a variable -> 'a -> unit
val get : 'a variable -> 'a
val equal : 'a -> 'a -> bool
val hash : 'a variable -> int
val compare : 'a variable -> 'b variable -> int
end
module Make : functor(A : sig type t end) -> sig
module M : sig
type t = A.t variable.variable
val make : A.t -> t
val update : t -> A.t -> unit
val get : t -> A.t
val equal : t -> t -> bool
val hash : t -> int
val compare : t -> t -> int
end
module Hashtbl : Hashtbl.S with type key = M.t
module Set : Set.S with type key = M.t
module Map : Map.S with type key = M.t
end
Note that if you expect that your program may generate more than 2^30 variables during a run, an int won't cut it. You need two int values to make a 2^60-bit value, or an Int64.t.
Note that if your program is multithreaded, you need a lock around the counter, to make the incrementation and lookup atomic.

Recursive anonymous functions in SML

Is it possible to write recursive anonymous functions in SML? I know I could just use the fun syntax, but I'm curious.
I have written, as an example of what I want:
val fact =
fn n => case n of
0 => 1
| x => x * fact (n - 1)
The anonymous function aren't really anonymous anymore when you bind it to a
variable. And since val rec is just the derived form of fun with no
difference other than appearance, you could just as well have written it using
the fun syntax. Also you can do pattern matching in fn expressions as well
as in case, as cases are derived from fn.
So in all its simpleness you could have written your function as
val rec fact = fn 0 => 1
| x => x * fact (x - 1)
but this is the exact same as the below more readable (in my oppinion)
fun fact 0 = 1
| fact x = x * fact (x - 1)
As far as I think, there is only one reason to use write your code using the
long val rec, and that is because you can easier annotate your code with
comments and forced types. For examples if you have seen Haskell code before and
like the way they type annotate their functions, you could write it something
like this
val rec fact : int -> int =
fn 0 => 1
| x => x * fact (x - 1)
As templatetypedef mentioned, it is possible to do it using a fixed-point
combinator. Such a combinator might look like
fun Y f =
let
exception BlackHole
val r = ref (fn _ => raise BlackHole)
fun a x = !r x
fun ta f = (r := f ; f)
in
ta (f a)
end
And you could then calculate fact 5 with the below code, which uses anonymous
functions to express the faculty function and then binds the result of the
computation to res.
val res =
Y (fn fact =>
fn 0 => 1
| n => n * fact (n - 1)
)
5
The fixed-point code and example computation are courtesy of Morten Brøns-Pedersen.
Updated response to George Kangas' answer:
In languages I know, a recursive function will always get bound to a
name. The convenient and conventional way is provided by keywords like
"define", or "let", or "letrec",...
Trivially true by definition. If the function (recursive or not) wasn't bound to a name it would be anonymous.
The unconventional, more anonymous looking, way is by lambda binding.
I don't see what unconventional there is about anonymous functions, they are used all the time in SML, infact in any functional language. Its even starting to show up in more and more imperative languages as well.
Jesper Reenberg's answer shows lambda binding; the "anonymous"
function gets bound to the names "f" and "fact" by lambdas (called
"fn" in SML).
The anonymous function is in fact anonymous (not "anonymous" -- no quotes), and yes of course it will get bound in the scope of what ever function it is passed onto as an argument. In any other cases the language would be totally useless. The exact same thing happens when calling map (fn x => x) [.....], in this case the anonymous identity function, is still in fact anonymous.
The "normal" definition of an anonymous function (at least according to wikipedia), saying that it must not be bound to an identifier, is a bit weak and ought to include the implicit statement "in the current environment".
This is in fact true for my example, as seen by running it in mlton with the -show-basis argument on an file containing only fun Y ... and the val res ..
val Y: (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b
val res: int32
From this it is seen that none of the anonymous functions are bound in the environment.
A shorter "lambdanonymous" alternative, which requires OCaml launched
by "ocaml -rectypes":
(fun f n -> f f n)
(fun f n -> if n = 0 then 1 else n * (f f (n - 1))
7;; Which produces 7! = 5040.
It seems that you have completely misunderstood the idea of the original question:
Is it possible to write recursive anonymous functions in SML?
And the simple answer is yes. The complex answer is (among others?) an example of this done using a fix point combinator, not a "lambdanonymous" (what ever that is supposed to mean) example done in another language using features not even remotely possible in SML.
All you have to do is put rec after val, as in
val rec fact =
fn n => case n of
0 => 1
| x => x * fact (n - 1)
Wikipedia describes this near the top of the first section.
let fun fact 0 = 1
| fact x = x * fact (x - 1)
in
fact
end
This is a recursive anonymous function. The name 'fact' is only used internally.
Some languages (such as Coq) use 'fix' as the primitive for recursive functions, while some languages (such as SML) use recursive-let as the primitive. These two primitives can encode each other:
fix f => e
:= let rec f = e in f end
let rec f = e ... in ... end
:= let f = fix f => e ... in ... end
In languages I know, a recursive function will always get bound to a name. The convenient and conventional way is provided by keywords like "define", or "let", or "letrec",...
The unconventional, more anonymous looking, way is by lambda binding. Jesper Reenberg's answer shows lambda binding; the "anonymous" function gets bound to the names "f" and "fact" by lambdas (called "fn" in SML).
A shorter "lambdanonymous" alternative, which requires OCaml launched by "ocaml -rectypes":
(fun f n -> f f n)
(fun f n -> if n = 0 then 1 else n * (f f (n - 1))
7;;
Which produces 7! = 5040.

Resources