Pointer to a record in OCaml

I am implementing binary search trees in OCaml, trying to use as much imperative programming as possible.
I have the following data type:
type tKey = Key of int;;
type tBST = Null | Pos of node ref
and node = {mutable key : tKey; mutable left : tBST; mutable right : tBST};;
I am having trouble with this function:
let createNode k tree =
  tree := Pos ({key = k; left = Null; right = Null});;
Error: This record expression is expected to have type node ref
The field key does not belong to type ref
A binary search tree can be either Null (meaning an empty tree) or a Pos. A Pos tree is a pointer to a node, and a node is a structure of a key and 2 other trees (left and right).
My main goal here is to have a tree that stays modified after the functions are over: I pass the tree by reference, so that when createNode is over, the tBST I passed as a parameter has been modified.
Question: is it actually possible to do what I am trying in OCaml? If so, how could I change my function createNode and/or my data type to make this happen?
Thank you very much.

It is possible, but you need to create the Pos node with a reference explicitly:
Pos (ref {key = k; (*...*)})
Whether what you are trying to do is recommended practice in a language like OCaml is a different story, though.
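Applied to the original createNode, the fix looks like this (a minimal sketch, types unchanged):
let createNode k tree =
  tree := Pos (ref { key = k; left = Null; right = Null });;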

The question has already been answered. I would just like to add a side note: The use of ref seems superfluous in this case.
A value of type tBST is either Null or a mutable pointer. If it is Null it will remain Null; if it is non-Null, it will remain non-Null, but the actual pointer might change. That might well be what you intended, but I have my doubts. In particular, what tBST does not do is emulate C-style pointers (which are either null or really point somewhere). I suspect, though, that that was your intention.
The idiomatic way to emulate C-style pointers is to just use the built-in option type, like so:
type tBST = node option
A value of type node option is either None or Some n, where n is a pointer to a value of type node. You use tBST for mutable fields (of the record node), so you would effectively have mutable C-style pointers to nodes.
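Concretely, that could look like the following sketch (attach_left is a hypothetical helper, not from the question). Note that with this representation the mutation happens through a node's fields, so a root that can start out empty still needs a ref around it, as the next answer shows:
type tKey = Key of int
type tBST = node option
and node = { mutable key : tKey; mutable left : tBST; mutable right : tBST }

(* Hypothetical helper: attach a fresh leaf as the left child,
   mutating the parent's field in place. *)
let attach_left parent k =
  parent.left <- Some { key = k; left = None; right = None }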

Here is what you probably had in mind:
type tree = node option ref
and node = {
  mutable left : tree;
  mutable key : int;
  mutable right : tree;
};;

let t0 : tree = ref None;;
let t1 : tree = ref (Some { left = ref None; key = 1; right = ref None });;

let create_node key tree =
  tree := Some { left = ref None; key; right = ref None }
There is no need for a separate key type, but you can have one if you want, and with recent OCaml there is no runtime overhead for it (a single-constructor, single-field type can be declared [@@unboxed]).
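A quick usage sketch:
let () =
  let t : tree = ref None in
  create_node 42 t;
  match !t with
  | Some n -> assert (n.key = 42)
  | None -> assert false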

Related

Nim: How to pass an array of varying size to an argument of a foreign function calling into a .dll?

Below is a minimal example from wrapping OpenAL32.dll. The foreign function alcCreateContext has the argument attrlist, which takes a ptr to an array of type ALCint, or nil. The issue is that the array can be of different lengths depending on the number of different flags passed in. The array should be organized as [flag, int, flag, int, ...]. How can this be accomplished in a more dynamic way, allowing the inclusion of ALC_FREQUENCY for example? The array size is currently hard-coded into the procedure, and it's nasty.
when defined(windows):
  {.push cdecl, dynlib: "OpenAL32.dll", importc.}
else:
  {.push importc.}

type
  ALCint = cint
  ALCdevice* = pointer
  ALCcontext* = pointer

const
  ALC_MONO_SOURCES* = 0x00001010
  ALC_STEREO_SOURCES* = 0x00001011
  ALC_FREQUENCY* = 0x00001007

proc alcCreateContext*(device: ALCdevice; attrlist: ptr array[0..3, ALCint]): ALCcontext
proc alcOpenDevice*(devicename: cstring): ALCdevice

const attributes = [ALC_MONO_SOURCES.ALCint, 65536.ALCint, ALC_STEREO_SOURCES.ALCint, 65536.ALCint]
discard alcOpenDevice(nil).alcCreateContext(attributes.unsafeAddr)
I experimented with openArray and other containers. Is the solution some sort of cast? This is also the workaround for getting more than 256 sounds out of OpenAL.
Answer from PMunch, thank you. The foreign function now wants ptr UncheckedArray[ALCint], and when passing the argument, use cast[ptr UncheckedArray[ALCint]](attributes.unsafeAddr):
when defined(windows):
  {.push cdecl, dynlib: "OpenAL32.dll", importc.}
else:
  {.push importc.}

type
  ALCint = cint
  ALCdevice* = pointer
  ALCcontext* = pointer

const
  ALC_MONO_SOURCES* = 0x00001010
  ALC_STEREO_SOURCES* = 0x00001011
  ALC_FREQUENCY* = 0x00001007

proc alcCreateContext*(device: ALCdevice; attrlist: ptr UncheckedArray[ALCint]): ALCcontext
proc alcOpenDevice*(devicename: cstring): ALCdevice

const attributes = [ALC_MONO_SOURCES.ALCint, 65536.ALCint, ALC_STEREO_SOURCES.ALCint, 65536.ALCint]
discard alcOpenDevice(nil).alcCreateContext(cast[ptr UncheckedArray[ALCint]](attributes.unsafeAddr))
An array in C is simply a pointer to anywhere with one or more contiguous elements of the same type. So to pass a C array to a function, you simply need to get such a pointer. Say, for example, you have a seq of integers; then the address of the first element is a C array. Simply do mySeq[0].addr and you're good. Keep the lifecycle of the data in mind, though: if Nim doesn't find any more references to the sequence, the memory will get freed. You can also manually get a pointer with create (https://nim-lang.org/docs/system.html#create%2Ctypedesc), and you can cast such pointers to ptr UncheckedArray[T] to be able to use [] on the data in Nim.
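Building on that, here is a sketch (untested) that assembles the attribute list at runtime in a seq, so flags such as ALC_FREQUENCY can be appended without hard-coding the length; OpenAL attribute lists are conventionally terminated with a 0:
var attributes: seq[ALCint] = @[]
attributes.add([ALC_MONO_SOURCES.ALCint, 65536.ALCint])
attributes.add([ALC_STEREO_SOURCES.ALCint, 65536.ALCint])
attributes.add([ALC_FREQUENCY.ALCint, 44100.ALCint])
attributes.add(0.ALCint)  # terminator
discard alcOpenDevice(nil).alcCreateContext(
  cast[ptr UncheckedArray[ALCint]](attributes[0].addr))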

What's a good pattern to manage impossible states in Elm?

Maybe you can help. I'm an Elm beginner and I'm struggling with a rather mundane problem. I'm quite excited with Elm and I've been rather successful with smaller things, so now I tried something more complex but I just can't seem to get my head around it.
I'm trying to build something in Elm that uses a graph-like underlying data structure. I create the graph with a fluent/factory pattern like this:
sample : Result String MyThing
sample =
    MyThing.empty
        |> addNode 1 "bobble"
        |> addNode 2 "why not"
        |> addEdge 1 2 "some data here too"
When this code returns Ok MyThing, then the whole graph has been set up in a consistent manner, guaranteed, i.e. all nodes and edges have the required data and the edges for all nodes actually exist.
The actual code has more complex data associated with the nodes and edges, but that doesn't matter for the question. Internally, the nodes and edges are stored in Dicts keyed by Int:
type alias MyThing =
    { nodes : Dict Int String
    , edges : Dict Int { from : Int, to : Int, label : String }
    }
Now, in the users of the module, I want to access the various elements of the graph. But whenever I access one of the nodes or edges with Dict.get, I get a Maybe. That's rather inconvenient because by the virtue of my constructor code I know the indexes exist etc. I don't want to clutter upstream code with Maybe and Result when I know the indexes in an edge exist. To give an example:
getNodeTexts : Edge -> MyThing -> Maybe ( String, String )
getNodeTexts edge thing =
    case Dict.get edge.from thing.nodes of
        Nothing ->
            -- Yeah, actually this can never happen...
            Nothing

        Just fromNode ->
            case Dict.get edge.to thing.nodes of
                Nothing ->
                    -- Again, this can never actually happen because the builder code prevents it.
                    Nothing

                Just toNode ->
                    Just ( fromNode.label, toNode.label )
That's just a lot of boilerplate code to handle something I specifically prevented in the factory code. But what's even worse: Now the consumer needs extra boilerplate code to handle the Maybe--potentially not knowing that the Maybe will actually never be Nothing. The API is sort of lying to the consumer. Isn't that something Elm tries to avoid? Compare to the hypothetical but incorrect:
getNodeTexts : Edge -> MyThing -> ( String, String )
getNodeTexts edge thing =
    ( Dict.get edge.from thing.nodes |> .label
    , Dict.get edge.to thing.nodes |> .label
    )
An alternative would be not to use Int IDs but use the actual data instead--but then updating things gets very tedious as connectors can have many edges. Managing state without the decoupling through Ints just doesn't seem like a good idea.
I feel there must be a solution to this dilemma using opaque ID types but I just don't see it. I would be very grateful for any pointers.
Note: I've also tried to use both drathier and elm-community elm-graph libraries but they don't address the specific question. They rely on Dict underneath as well, so I end up with the same Maybes.
There is no easy answer to your question. I can offer one comment and a coding suggestion.
You use the magic words "impossible state", but as OOBalance has pointed out, you can still create an impossible state with your modelling. The normal meaning of "impossible state" in Elm relates precisely to modelling, e.g. when you use two Bools to represent 3 possible states. In Elm you can use a custom type for this, leaving no invalid combination of Bools in your code.
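For instance (a standard illustration with made-up names, not from your code), two Bools admit four combinations even though only three states are meaningful, while a custom type makes the fourth unrepresentable:
type alias LoadingFlags =
    { isLoading : Bool, hasFailed : Bool }

type LoadingState
    = Loading
    | Failed
    | Done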
As for your code, you can reduce its length (and perhaps complexity) with
getNodeTexts : Edge -> MyThing -> Maybe ( String, String )
getNodeTexts edge thing =
    Maybe.map2 (\n1 n2 -> ( n1.label, n2.label ))
        (Dict.get edge.from thing.nodes)
        (Dict.get edge.to thing.nodes)
From your description, it looks to me like those states actually aren't impossible.
Let's start with your definition of MyThing:
type alias MyThing =
    { nodes : Dict Int String
    , edges : Dict Int { from : Int, to : Int, label : String }
    }
This is a type alias, not a type – meaning the compiler will accept MyThing in place of {nodes : Dict Int String, edges : Dict Int {from : Int, to : Int, label : String}} and vice-versa.
So rather than construct a MyThing value safely using your factory functions, I can write:
import Dict
myThing =
    { nodes = Dict.empty
    , edges = Dict.fromList [ ( 0, { from = 0, to = 1, label = "Edge 0" } ) ]
    }
… and then pass myThing to any of your functions expecting MyThing, even though the nodes connected by Edge 0 aren't contained in myThing.nodes.
You can fix this by changing MyThing to be a custom type:
type MyThing
    = MyThing
        { nodes : Dict Int String
        , edges : Dict Int { from : Int, to : Int, label : String }
        }
… and exposing it using exposing (MyThing) rather than exposing (MyThing(..)). That way, no constructor for MyThing is exposed, and code outside of your module must use the factory functions to obtain a value.
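In the module header, that looks something like this (function names taken from your builder example):
module MyThing exposing (MyThing, empty, addNode, addEdge, getNodeTexts)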
The same applies to Edge, which I'm assuming is defined as:
type alias Edge =
    { from : Int, to : Int, label : String }
Unless it is changed to a custom type, it is trivial to construct arbitrary Edge values:
type Edge
    = Edge { from : Int, to : Int, label : String }
Then however, you will need to expose some functions to obtain Edge values to pass to functions like getNodeTexts. Let's assume I have obtained a MyThing and one of its edges:
myThing : MyThing
-- created using factory functions
edge : Edge
-- an edge of myThing
Now I create another MyThing value, and pass it to getNodeTexts along with edge:
myOtherThing : MyThing
-- a different value of type MyThing
nodeTexts = getNodeTexts edge myOtherThing
This should return Maybe.Nothing or Result.Err String, but certainly not (String, String) – the edge does not belong to myOtherThing, so there is no guarantee its nodes are contained in it.

Balanced tree for functional symbol table

I'm doing the exercises from "Modern Compiler Implementation in ML" (Andrew Appel). One of them (ex. 1.1 d) is to recommend a balanced-tree data structure for a functional symbol table. Appel mentions that such a data structure should rebalance on insertion but not on lookup. Being totally new to functional programming, I find this confusing. What is the key insight behind this requirement?
A tree that’s rebalanced on every insertion and deletion doesn’t need to rebalance on lookup, because lookup doesn’t modify the structure. If it was balanced before a lookup, it will stay balanced during and after.
In functional languages, insertion and rebalancing can be more expensive than in a procedural one. Because you can’t alter any node in place, you replace a node by creating a new node, then replacing its parent with a new node whose children are the new daughter and the unaltered older daughter, and then replace the grandparent node with one whose children are the new parent and her older sister, and so on up. You finish when you create a new root node for the updated tree and garbage-collect all the nodes you replaced. However, some tree structures have the desirable property that they need to replace no more than O(log N) nodes of the tree on an insertion and can re-use the rest. This means that the rotation of a red-black tree (for example) has not much more overhead than an unbalanced insertion.
Also, you will typically need to query a symbol table much more often than you update it. It therefore becomes less tempting to try to make insertion faster: if you’re inserting, you might as well rebalance.
The question of which self-balancing tree structure is best for a functional language has been asked here, more than once.
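To make the path-copying idea concrete, here is a small sketch (mine, not from the book) of persistent insertion into a plain, unbalanced BST in Standard ML. Only the nodes along the search path are rebuilt; every untouched subtree is shared with the old tree:
datatype tree = Leaf | Node of tree * int * tree

fun insert (x, Leaf) = Node (Leaf, x, Leaf)
  | insert (x, t as Node (l, y, r)) =
      if x < y then Node (insert (x, l), y, r)       (* rebuild left spine, share r *)
      else if y < x then Node (l, y, insert (x, r))  (* share l, rebuild right spine *)
      else t                                         (* key already present: share everything *)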
Since Davislor has already answered your question extensively, here are mostly some implementation hints. I would add that the choice of data structure for your symbol table is probably not relevant for a toy compiler. Compilation time only starts to become an issue when your compiler is used on a lot of code and the code is recompiled often.
Sticking to an O(n) insert/lookup data structure is fine in practice until it isn't.
Signature-wise, all you want is a key-value mapping, insert, and lookup:
signature SymTab =
sig
  type id
  type value
  type symtab
  val empty : symtab
  val insert : id -> value -> symtab -> symtab
  val lookup : id -> symtab -> value option
end
A simple O(n) implementation with lists might be:
structure ListSymTab : SymTab =
struct
  type id = string
  type value = int
  type symtab = (id * value) list
  val empty = []
  fun insert id value [] = [(id, value)]
    | insert id value ((id', value') :: symtab) =
        if id = id'
        then (id, value) :: symtab
        else (id', value') :: insert id value symtab
  fun lookup _ [] = NONE
    | lookup id ((id', value) :: symtab) =
        if id = id' then SOME value else lookup id symtab
end
You might use it like:
- ListSymTab.lookup "hello" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = SOME 42 : int option
Then again, maybe your symbol table doesn't map strings to integers, or you may have one symbol table for variables and one for functions.
You could parameterise the id/value types using a functor:
functor ListSymTabFn (X : sig
                            eqtype id
                            type value
                          end) : SymTab =
struct
  type id = X.id
  type value = X.value
  (* The rest is the same as ListSymTab. *)
end
And you might use it like:
- structure ListSymTab = ListSymTabFn(struct type id = string type value = int end);
- ListSymTab.lookup "world" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = NONE : int option
All you need for a list-based symbol table is that the identifiers/symbols can be compared for equality. For your balanced-tree symbol table, you need identifiers/symbols to be orderable.
Instead of implementing balanced trees from scratch, look e.g. at SML/NJ's RedBlackMapFn:
To create a structure implementing maps (dictionaries) over a type T [...]:
structure MapT = RedBlackMapFn (struct
                                  type ord_key = T
                                  val compare = compareT
                                end)
Try this example with T as string and compare as String.compare:
$ sml
Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
- structure MapS = RedBlackMapFn (struct
    type ord_key = string
    val compare = String.compare
  end);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
structure MapS : ORD_MAP?
- open MapS;
...
Opening the structure is an easy way to explore the available functions and their types.
We can then create a similar functor to ListSymTabFn, but one that takes an additional compare function:
functor RedBlackSymTabFn (X : sig
                                type id
                                type value
                                val compare : id * id -> order
                              end) : SymTab =
struct
  type id = X.id
  type value = X.value

  structure SymTabX = RedBlackMapFn (struct
                                       type ord_key = X.id
                                       val compare = X.compare
                                     end)

  (* The 'a map type inside SymTabX maps X.id to anything. *)
  (* We are, however, only interested in mapping to values. *)
  type symtab = value SymTabX.map

  (* Use other stuff in SymTabX for empty, insert, lookup. *)
end
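The elided operations follow directly from SML/NJ's ORD_MAP signature; a minimal sketch of the bindings that would replace the last comment inside the functor body:
  val empty = SymTabX.empty
  fun insert id value symtab = SymTabX.insert (symtab, id, value)
  fun lookup id symtab = SymTabX.find (symtab, id)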
Finally, you can use this as your symbol table:
structure SymTab = RedBlackSymTabFn (struct
                                       type id = string
                                       type value = int
                                       val compare = String.compare
                                     end);

F# map and distinct objects

I have some nondescript but distinct objects (specifically, unnamed variables in logic expressions) that I want to put in a map that associates them with their values. As I understand it, map needs to distinguish objects by some ordered field, so I can't just have
type Term =
    ...
    | Var
as this would not allow different variables to be distinguished from each other. Instead I could presumably have
type Term =
    ...
    | Var of int64
and then have a new_var function that increments a global int64 counter and returns a new variable with the incremented value. This seems slightly inelegant, but should work.
Is the global counter the recommended way to handle this, or is there a more idiomatic method?
It's not really a "map having to distinguish objects" thing - when you declare a type like this:
type Term =
    | Var
you have a type with a single valid value - Var. If you're saying you want to have objects that are distinct - this is not what you want. You can still use that type as a key in a map - not a particularly useful one though, since it will have at most a single element.
Using a counter is a good enough way to handle it. If you don't want a "global" one, you can roll it into a function using a ref cell to hold it:
type Term =
    | Var of int

let make =
    let counter = ref 0
    fun () ->
        counter := !counter + 1
        Term.Var (!counter)
Or use GUIDs if you don't care about the values and want the counter out of the picture:
type Term =
    | Var of System.Guid

let make () =
    Term.Var (System.Guid.NewGuid())
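Either version can then mint distinct keys for a Map. A small usage sketch (Term gets structural comparison for free, since both int and System.Guid are comparable):
let x = make ()
let y = make ()

// Two freshly minted variables are distinct, so both can live in the same Map.
let env = Map.ofList [ (x, "first"); (y, "second") ]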

Mutable vectors in struct

I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

class WeightedGraph(object):
    def __init__(self, initial_size):
        self.adjacency_list = [[] for i in range(initial_size)]
        self.size = initial_size
        self.edge_count = 0

    def add_edge(self, source, target, weight):
        self.adjacency_list[source].append(Edge(target, weight))
        self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
    target: uint,
    weight: f64
}

struct WeightedGraph {
    adjacency_list: ~Vec<~Vec<Edge>>,
    size: uint,
    edge_count: int
}

impl WeightedGraph {
    fn new(num_nodes: uint) -> WeightedGraph {
        let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
        WeightedGraph {
            adjacency_list: adjacency_list,
            size: num_nodes,
            edge_count: 0
        }
    }

    fn add_edge(mut self, source: uint, target: uint, weight: f64) {
        self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
        self.edge_count += 1;
    }
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something you have to opt into mutating it. get gives you a & pointer, to mutate you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2.0); // moving out of g
g.add_edge(0, 1, 3.0); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
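Putting those fixes together, the structure would look like this sketch in current Rust (usize replaces the old uint, and plain indexing gives mutable access through &mut self):
struct Edge {
    target: usize,
    weight: f64,
}

struct WeightedGraph {
    adjacency_list: Vec<Vec<Edge>>,
    size: usize,
    edge_count: usize,
}

impl WeightedGraph {
    fn new(num_nodes: usize) -> WeightedGraph {
        WeightedGraph {
            // One empty neighbor list per node; no boxing needed.
            adjacency_list: (0..num_nodes).map(|_| Vec::new()).collect(),
            size: num_nodes,
            edge_count: 0,
        }
    }

    // &mut self: mutate the graph in place instead of consuming it.
    fn add_edge(&mut self, source: usize, target: usize, weight: f64) {
        self.adjacency_list[source].push(Edge { target, weight });
        self.edge_count += 1;
    }
}
With this, let mut g = WeightedGraph::new(2); g.add_edge(0, 1, 2.0); g.add_edge(1, 0, 3.0); compiles, because add_edge borrows the graph mutably instead of moving it.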
get only returns immutable references, you have to use get_mut if you want to modify the data
You only need Vec<Vec<Edge>>, Vec is the right thing to use, ~[] was for that purpose in the past but now means something else (or will, not sure if that is changed already)
You also have to change the signature of add_edge to take &mut self because now you are moving the ownership of self to add_edge and that is not what you want
