Canonical way to represent idea of sum type of records that all extend a "base" record

I'm new to PureScript. I was searching for sealed classes in PureScript to get an idea of how one would implement this, but I don't think I have the necessary PS jargon yet.
What is the canonical way in PureScript to have a bunch of records that extend a "base" record, and then have one sum type representing a "sealed" collection of those?
Something like this in Kotlin:
sealed class Glazing(val name: String, val area: Int) {
    // Subclasses pass the shared properties up to the sealed base class.
    class Window(name: String, area: Int, val frameMaterial: String) : Glazing(name, area)
    class Door(name: String, area: Int, val isPreHung: Boolean) : Glazing(name, area)
}
In TypeScript, you'd probably do something like:
interface BaseGlazing { ... }
interface Door extends BaseGlazing { ... }
interface Window extends BaseGlazing { ... }
type Glazing = Door | Window
and then you'd either take `A extends BaseGlazing` or `Glazing` (and use type guards) to write either of the two functions described below.
Essentially, I want a base class (that is technically abstract), things that extend it, and then a sum type/discriminated union of the extensions. That way I can write, say, changeName :: Glazing -> Glazing (premised on the base class having a name prop), but also something like calculateTotalLightPenetration :: Array Glazing -> Number (premised on the discriminated union being one of Door or Window, since light penetration will be a different formula for doors vs. windows).

The idea of "inheritance" (aka "is-a" relationship) is technically possible to model in PureScript, but it's hard and awkward. And there is a good reason for it: inheritance is almost never (and I am tempted to say "never, period") the most convenient, efficient, or reliable way of modeling the domain. Even OOP apologists tend to recommend aggregation over inheritance these days.
One useful thing to observe is that you don't actually need inheritance. What you need is to solve some specific problem in your domain, and inheritance is just a solution that you naturally reach for, which is probably informed by your past experience.
And this leads us to an insight: the particular way to model whatever it is you're modeling would depend on what the actual problem is. Chances are, PureScript has a different mechanism for modeling that.
But if I base my thinking on the specifics you gave in your question (i.e. the changeName and calculateTotalLightPenetration functions), I would model it via aggregation: the "glazing" would be the surrounding type, and it would have, as one of its parts, the specific kind of glazing. This would look something like this:
import Prelude
import Data.Foldable (sum)

-- Shared fields live on the record; the variant-specific part is one field.
type Glazing = { name :: String, area :: Int, kind :: GlazingKind }
data GlazingKind = Window { frameMaterial :: String } | Door { isPreHung :: Boolean }

changeName :: Glazing -> Glazing
changeName g = g { name = "new name" }

calculateTotalLightPenetration :: Array Glazing -> Number
calculateTotalLightPenetration gs = sum $ individualPenetration <$> gs
  where
  individualPenetration g = case g.kind of
    Door _ -> 0.3
    Window _ -> 0.5
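For readers who do not know PureScript yet, here is a minimal sketch of the same aggregation idea in Rust. The snake_case names and the `new_name` parameter are my own; the 0.3 and 0.5 coefficients are the placeholder values from the answer above, not a real formula.

```rust
/// The variant-specific part of a glazing.
#[derive(Debug, Clone, PartialEq)]
enum GlazingKind {
    Window { frame_material: String },
    Door { is_pre_hung: bool },
}

/// Shared fields live on the outer struct; the variant is one of its parts.
#[derive(Debug, Clone, PartialEq)]
struct Glazing {
    name: String,
    area: u32,
    kind: GlazingKind,
}

/// Works for every kind of glazing because `name` is on the outer struct.
fn change_name(g: Glazing, new_name: &str) -> Glazing {
    Glazing { name: new_name.to_string(), ..g }
}

/// Placeholder coefficients, one per variant; the match is exhaustive.
fn individual_penetration(g: &Glazing) -> f64 {
    match g.kind {
        GlazingKind::Door { .. } => 0.3,
        GlazingKind::Window { .. } => 0.5,
    }
}

fn calculate_total_light_penetration(gs: &[Glazing]) -> f64 {
    gs.iter().map(individual_penetration).sum()
}
```

Adding a third variant to `GlazingKind` makes the `match` fail to compile until it is handled, which is roughly the guarantee a sealed class gives you.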

Related

performance of functional architecture with modules vs. class types, in F#

I understand that functional architecture is encouraged in F#, but I'm hitting a performance wall that doesn't exist with class types.
I have a state type with a bunch of fields in it and it is passed around a series of functions through the code pipeline.
At every state, when some transformation occurs, a new object is created.
Here is an example:
match ChefHelpers.evaluateOpening t brain with
| Some openOrder ->
    info $"open order received: {openOrder.Side}"
    match! ExchangeBus.submitOneOrderRetryAsync brain.Exchange openOrder with
    | Ok _ ->
        {brain.CookOrder.Instrument.PriceToString openOrder.Price} / sl.{stopLossOrder.Side.Letter} at {brain.CookOrder.Instrument.PriceToString stopLossOrder.Price}"
        let m = $"{t.Timestamp.Format}: send open order {openOrder.Side}, last price was {brain.CookOrder.Instrument.PriceToString brain.Analysis.LastPrice}"
        return Message m, ({ brain.WithStatus (Cooking MonitorForClose) with OpenOrder = Some openOrder })
    | Error e ->
        let m = $"{t.Timestamp}: couldn't open position:\n{e.Describe()}"
        return! ChefHelpers.cancelAllOrdersAsync brain (ChefError m) (ExchangeError (Other (e.Describe())))
| None ->
    return NoMessage, brain
Here, the object 'brain', which holds all the state, gets passed around, updated, etc.
This works very well when run live, since everything gets executed 2-3 times per second at most.
When I want to run the same code on static data to check behavior, etc., it's a different story, because I'm running it millions of times while waiting for it to finish.
All this code deals with small lists and does basic comparisons, arithmetic, etc., so the cost of rebuilding the main object sticks out and becomes painfully apparent.
I tried to rebuild some of that logic as an object type where the state is a bunch of mutable variables and the performance difference is dramatic.
I have a lot of code like this:
type A = { }
let a : A = ...
let a = doSomething1 a
let a = doSomething2 a
let a =
    match x with
    | true -> doSomething3 a
    | false -> a
etc
I'd say the whole tool architecture is built with code that looks like that, and there are a lot of these:
let a = { a with X = 3 }
But there is no concurrency in the pipeline and it is very linear. So, in the case of the last line, if I had a way to tell the compiler "it's the same object, it is not used anywhere else, edit it in place", the performance would be a lot better.
What could be strategies I could use to keep the code readable, but minimize that issue?
Is the problem the actual data copying? The main object has 18 fields, so I can't imagine it being larger than 200 bytes. Is it allocating space for it? Or does it create a lot of garbage collection?
It's not something straightforward to profile since it's a cost that's everywhere and inside dotnet.
Any feedback / ideas would be great, especially "you're doing it wrong, do X instead" :)
From a design perspective, 18 fields is actually a fairly large record, in my opinion. Perhaps you could factor that into sub-records, so you're not constantly re-allocating the entire thing? So instead of this:
type A =
    {
        X : int
        Field1 : int
        Field2 : int
        ...
        Field18 : int
    }
You could have this instead:
type Sub1 =
    {
        Field1 : int
        ...
        Field9 : int
    }
type Sub2 =
    {
        Field10 : int
        ...
        Field18 : int
    }
type A =
    {
        X : int
        Sub1 : Sub1
        Sub2 : Sub2
    }
Then the performance cost of let a = { a with X = 3 } would presumably be less.
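To make the "presumably less" concrete, here is a rough Rust analogue, not F#. F# records are reference types, so the nested sub-records are held by reference; I model that with `Rc`. The `fields` arrays and the `with_x` helper are made up for illustration. Updating `x` builds a new outer value that shares both sub-records instead of copying their contents.

```rust
use std::rc::Rc;

// Stand-ins for the two nine-field sub-records in the answer.
#[derive(Debug, Clone)]
struct Sub1 {
    fields: [i32; 9],
}

#[derive(Debug, Clone)]
struct Sub2 {
    fields: [i32; 9],
}

// The outer record holds the sub-records by reference-counted pointer.
#[derive(Debug, Clone)]
struct A {
    x: i32,
    sub1: Rc<Sub1>,
    sub2: Rc<Sub2>,
}

/// Rough equivalent of `{ a with X = x }`: one small new outer value,
/// with both sub-records shared rather than copied.
fn with_x(a: &A, x: i32) -> A {
    A {
        x,
        sub1: Rc::clone(&a.sub1),
        sub2: Rc::clone(&a.sub2),
    }
}
```

After `let b = with_x(&a, 3)`, `Rc::ptr_eq(&a.sub1, &b.sub1)` holds, so only the small outer value was rebuilt. Whether this actually helps in the F# program is something only a measurement on the real workload can tell.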
Bonus idea: You might also want to emulate the cool Haskell kids, and look into lenses, which are designed specifically for reading and updating parts of immutable data.

What's a good pattern to manage impossible states in Elm?

Maybe you can help. I'm an Elm beginner and I'm struggling with a rather mundane problem. I'm quite excited with Elm and I've been rather successful with smaller things, so now I tried something more complex but I just can't seem to get my head around it.
I'm trying to build something in Elm that uses a graph-like underlying data structure. I create the graph with a fluent/factory pattern like this:
sample : Result String MyThing
sample =
    MyThing.empty
        |> addNode 1 "bobble"
        |> addNode 2 "why not"
        |> addEdge 1 2 "some data here too"
When this code returns Ok MyThing, then the whole graph has been set up in a consistent manner, guaranteed, i.e. all nodes and edges have the required data and the edges for all nodes actually exist.
The actual code has more complex data associated with the nodes and edges but that doesn't matter for the question. Internally, the nodes and edges are stored in the Dict Int element.
type alias MyThing =
    { nodes : Dict Int String
    , edges : Dict Int { from : Int, to : Int, label : String }
    }
Now, in the users of the module, I want to access the various elements of the graph. But whenever I access one of the nodes or edges with Dict.get, I get a Maybe. That's rather inconvenient because by the virtue of my constructor code I know the indexes exist etc. I don't want to clutter upstream code with Maybe and Result when I know the indexes in an edge exist. To give an example:
getNodeTexts : Edge -> MyThing -> Maybe (String, String)
getNodeTexts edge thing =
    case Dict.get edge.from thing.nodes of
        Nothing ->
            -- Yeah, actually this can never happen...
            Nothing
        Just fromNode ->
            case Dict.get edge.to thing.nodes of
                Nothing ->
                    -- Again, this can never actually happen because the builder code prevents it.
                    Nothing
                Just toNode ->
                    Just ( fromNode.label, toNode.label )
That's just a lot of boilerplate code to handle something I specifically prevented in the factory code. But what's even worse: Now the consumer needs extra boilerplate code to handle the Maybe--potentially not knowing that the Maybe will actually never be Nothing. The API is sort of lying to the consumer. Isn't that something Elm tries to avoid? Compare to the hypothetical but incorrect:
getNodeTexts : Edge -> MyThing -> (String, String)
getNodeTexts edge thing =
    ( Dict.get edge.from thing.nodes |> .label
    , Dict.get edge.to thing.nodes |> .label
    )
An alternative would be not to use Int IDs but use the actual data instead--but then updating things gets very tedious as connectors can have many edges. Managing state without the decoupling through Ints just doesn't seem like a good idea.
I feel there must be a solution to this dilemma using opaque ID types but I just don't see it. I would be very grateful for any pointers.
Note: I've also tried to use both drathier and elm-community elm-graph libraries but they don't address the specific question. They rely on Dict underneath as well, so I end up with the same Maybes.
There is no easy answer to your question. I can offer one comment and a coding suggestion.
You use the magic words "impossible state" but as OOBalance has pointed out, you can create an impossible state in your modelling. The normal meaning of "impossible state" in Elm is precisely in relation to modelling e.g. when you use two Bools to represent 3 possible states. In Elm you can use a custom type for this and not leave one combination of bools in your code.
As for your code, you can reduce its length (and perhaps complexity) with
getNodeTexts : Edge -> MyThing -> Maybe ( String, String )
getNodeTexts edge thing =
    Maybe.map2 (\n1 n2 -> ( n1.label, n2.label ))
        (Dict.get edge.from thing.nodes)
        (Dict.get edge.to thing.nodes)
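The same "combine two optional lookups" shape exists in other languages too. As a sketch in Rust (the `Edge` struct and `node_texts` function are hypothetical stand-ins, and nodes are plain strings here), `Option::zip` plays the role of `Maybe.map2`:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the question's Edge; only the endpoints matter here.
struct Edge {
    from: u32,
    to: u32,
}

/// Returns both node texts, or `None` if either endpoint is missing.
fn node_texts(edge: &Edge, nodes: &HashMap<u32, String>) -> Option<(String, String)> {
    nodes
        .get(&edge.from)
        .zip(nodes.get(&edge.to))
        .map(|(a, b)| (a.clone(), b.clone()))
}
```

The result is still an `Option`, so this shortens the body but does not remove the `Maybe` from the API.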
From your description, it looks to me like those states actually aren't impossible.
Let's start with your definition of MyThing:
type alias MyThing =
    { nodes : Dict Int String
    , edges : Dict Int { from : Int, to : Int, label : String }
    }
This is a type alias, not a type – meaning the compiler will accept MyThing in place of {nodes : Dict Int String, edges : Dict Int {from : Int, to : Int, label : String}} and vice-versa.
So rather than construct a MyThing value safely using your factory functions, I can write:
import Dict
myThing = { nodes = Dict.empty, edges = Dict.fromList [(0, {from = 0, to = 1, label = "Edge 0"})] }
… and then pass myThing to any of your functions expecting MyThing, even though the nodes connected by Edge 0 aren't contained in myThing.nodes.
You can fix this by changing MyThing to be a custom type:
type MyThing
    = MyThing
        { nodes : Dict Int String
        , edges : Dict Int { from : Int, to : Int, label : String }
        }
… and exposing it using exposing (MyThing) rather than exposing (MyThing(..)). That way, no constructor for MyThing is exposed, and code outside of your module must use the factory functions to obtain a value.
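The opaque-type idea is not specific to Elm. As a sketch in Rust, private struct fields give the same effect: code outside the module cannot build the value by hand. The `graph` module, the `Graph` name, the edge representation and the error message are all assumptions of mine, not the asker's API.

```rust
mod graph {
    use std::collections::HashMap;

    /// Fields are private, so code outside this module cannot construct a
    /// `Graph` directly; it has to go through the functions below.
    pub struct Graph {
        nodes: HashMap<u32, String>,
        edges: Vec<(u32, u32, String)>,
    }

    impl Graph {
        pub fn empty() -> Self {
            Graph { nodes: HashMap::new(), edges: Vec::new() }
        }

        pub fn add_node(mut self, id: u32, label: &str) -> Self {
            self.nodes.insert(id, label.to_string());
            self
        }

        /// Rejects edges whose endpoints have not been added yet.
        pub fn add_edge(mut self, from: u32, to: u32, label: &str) -> Result<Self, String> {
            if !self.nodes.contains_key(&from) || !self.nodes.contains_key(&to) {
                return Err(format!("edge {} -> {} refers to a missing node", from, to));
            }
            self.edges.push((from, to, label.to_string()));
            Ok(self)
        }

        pub fn edge_count(&self) -> usize {
            self.edges.len()
        }
    }
}
```

Outside `graph`, writing `graph::Graph { nodes: ..., edges: ... }` is a compile error because the fields are private, so every `Graph` that exists went through `add_edge`'s check.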
The same applies to Edge, which I'm assuming is defined as:
type alias Edge =
    { from : Int, to : Int, label : String }
Unless it is changed to a custom type, it is trivial to construct arbitrary Edge values:
type Edge
    = Edge { from : Int, to : Int, label : String }
Then however, you will need to expose some functions to obtain Edge values to pass to functions like getNodeTexts. Let's assume I have obtained a MyThing and one of its edges:
myThing : MyThing
-- created using factory functions
edge : Edge
-- an edge of myThing
Now I create another MyThing value, and pass it to getNodeTexts along with edge:
myOtherThing : MyThing
-- a different value of type MyThing
nodeTexts = getNodeTexts edge myOtherThing
This should return Maybe.Nothing or Result.Err String, but certainly not (String, String) – the edge does not belong to myOtherThing, so there is no guarantee its nodes are contained in it.

How do I convert the "largest value in a Vec" example in the Rust book to not use the Copy trait?

I'm trying to accomplish an exercise "left to the reader" in the 2018 Rust book. The example they have, 10-15, uses the Copy trait. However, they recommend implementing the same without Copy and I've been really struggling with it.
Without Copy, I cannot use largest = list[0]. The compiler recommends using a reference instead. I do so, making largest into a &T. The compiler then complains that the largest used in the comparison is a &T, not T, so I change it to *largest to dereference the pointer. This goes fine, but then stumbles on largest = item, with complaints about T instead of &T. I switch to largest = &item. Then I get an error I cannot deal with:
error[E0597]: `item` does not live long enough
 --> src/main.rs:6:24
  |
6 |             largest = &item;
  |                        ^^^^ borrowed value does not live long enough
7 |         }
8 |     }
  |     - borrowed value only lives until here
  |
note: borrowed value must be valid for the anonymous lifetime #1 defined on the function body at 1:1...
I do not understand how to lengthen the life of this value. It lives and dies in the list.iter(). How can I extend it while still only using references?
Here is my code for reference:
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for &item in list.iter() {
        if item > *largest {
            largest = &item;
        }
    }
    largest
}
When you write for &item, this destructures each reference returned by the iterator, making the type of item T. You don't want to destructure these references, you want to keep them! Otherwise, when you take a reference to item, you are taking a reference to a local variable, which you can't return because local variables don't live long enough.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list.iter() {
        if item > largest {
            largest = item;
        }
    }
    largest
}
Note also how we can compare references directly, because references to types implementing PartialOrd also implement PartialOrd, deferring the comparison to their referents (i.e. it's not a pointer comparison, unlike for raw pointers).
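A small check of both points, as a sketch: the reference-returning `largest` works on a non-Copy element type such as `String`, and comparing references compares the values they point to. The `ref_greater` helper is hypothetical and only there to isolate the comparison.

```rust
/// Same shape as the function above: returns a reference into the slice,
/// so `T` does not need to be `Copy` or `Clone`. Panics on an empty slice.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}

/// Comparing two references defers to `PartialOrd` on the referents.
fn ref_greater<T: PartialOrd>(a: &T, b: &T) -> bool {
    a > b
}
```

With `let xs = [5, 1];`, `ref_greater(&xs[0], &xs[1])` is true because 5 > 1, even though `&xs[0]` sits at the lower address. Casting both to `*const i32` and comparing would order them by address instead.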

Mutable vectors in struct

I'm trying to get a graph clustering algorithm to work in Rust. Part of the code is a WeightedGraph data structure with an adjacency list representation. The core would be represented like this (shown in Python to make it clear what I'm trying to do):
class Edge(object):
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

class WeightedGraph(object):
    def __init__(self, initial_size):
        self.adjacency_list = [[] for i in range(initial_size)]
        self.size = initial_size
        self.edge_count = 0

    def add_edge(self, source, target, weight):
        self.adjacency_list[source].append(Edge(target, weight))
        self.edge_count += 1
So, the adjacency list holds an array of n arrays: one array for each node in the graph. The inner array holds the neighbors of that node, represented as Edge (the target node number and the double weight).
My attempt to translate the whole thing to Rust looks like this:
struct Edge {
    target: uint,
    weight: f64
}
struct WeightedGraph {
    adjacency_list: ~Vec<~Vec<Edge>>,
    size: uint,
    edge_count: int
}
impl WeightedGraph {
    fn new(num_nodes: uint) -> WeightedGraph {
        let mut adjacency_list: ~Vec<~Vec<Edge>> = box Vec::from_fn(num_nodes, |idx| box Vec::new());
        WeightedGraph {
            adjacency_list: adjacency_list,
            size: num_nodes,
            edge_count: 0
        }
    }
    fn add_edge(mut self, source: uint, target: uint, weight: f64) {
        self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
        self.edge_count += 1;
    }
}
But rustc gives me this error:
weightedgraph.rs:24:9: 24:40 error: cannot borrow immutable dereference of `~`-pointer as mutable
weightedgraph.rs:24 self.adjacency_list.get(source).push(Edge { target: target, weight: weight });
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So, 2 main questions:
1. How can I get the add_edge method to work?
I'm thinking that WeightedGraph is supposed to own all its inner data (please correct me if I'm wrong). But why can add_edge not modify the graph's own data?
2. Is ~Vec<~Vec<Edge>> the correct way to represent a variable-sized array/list that holds a dynamic list in each element?
The tutorial also mentions ~[int] as vector syntax, so should it be: ~[~[Edge]] instead? Or what is the difference between Vec<Edge> and ~[Edge]? And if I'm supposed to use ~[~[Edge]], how would I construct/initialize the inner lists then? (currently, I tried to use Vec::from_fn)
The WeightedGraph does own all its inner data, but even if you own something, you have to opt into mutating it. get gives you a & pointer; to mutate, you need a &mut pointer. Vec::get_mut will give you that: self.adjacency_list.get_mut(source).push(...).
Regarding ~Vec<Edge> and ~[Edge]: It used to be (until very recently) that ~[T] denoted a growable vector of T, unlike every other type that's written ~... This special case was removed and ~[T] is now just a unique pointer to a T-slice, i.e. an owning pointer to a bunch of Ts in memory without any growth capability. Vec<T> is now the growable vector type.
Note that it's Vec<T>, not ~Vec<T>; the ~ used to be part of the vector syntax but here it's just an ordinary unique pointer and represents completely unnecessary indirection and allocation. You want adjacency_list: Vec<Vec<Edge>>. A Vec<T> is a fully fledged concrete type (a triple data, length, capacity if that means anything to you), it encapsulates the memory allocation and indirection and you can use it as a value. You gain nothing by boxing it, and lose clarity as well as performance.
You have another (minor) issue: fn add_edge(mut self, ...), like fn add_edge(self, ...), means "take self by value". Since the adjacency_list member is a linear type (it can be dropped, it is moved instead of copied implicitly), your WeightedGraph is also a linear type. The following code will fail because the first add_edge call consumed the graph.
let g = WeightedGraph::new(2);
g.add_edge(1, 0, 2); // moving out of g
g.add_edge(0, 1, 3); // error: use of g after move
You want &mut self: Allow mutation of self but don't take ownership of it/don't move it.
get only returns immutable references; you have to use get_mut if you want to modify the data.
You only need Vec<Vec<Edge>>. Vec is the right thing to use; ~[] was for that purpose in the past but now means something else (or will; I'm not sure whether that has changed already).
You also have to change the signature of add_edge to take &mut self, because right now you are moving ownership of self into add_edge, and that is not what you want.
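For anyone reading this with a current compiler: the question's code predates Rust 1.0, and `~`, `uint` and `Vec::from_fn` no longer exist. Here is a sketch of the same structure with the advice above applied (plain `Vec<Vec<Edge>>`, `&mut self`) in today's syntax. Dropping the separate `size` field in favour of a `size()` method is my own simplification.

```rust
struct Edge {
    target: usize,
    weight: f64,
}

struct WeightedGraph {
    // One inner Vec of outgoing edges per node; no boxing needed.
    adjacency_list: Vec<Vec<Edge>>,
    edge_count: usize,
}

impl WeightedGraph {
    fn new(num_nodes: usize) -> WeightedGraph {
        WeightedGraph {
            adjacency_list: (0..num_nodes).map(|_| Vec::new()).collect(),
            edge_count: 0,
        }
    }

    fn size(&self) -> usize {
        self.adjacency_list.len()
    }

    /// Borrows the graph mutably instead of consuming it, so it can be
    /// called repeatedly. Panics if `source` is out of range.
    fn add_edge(&mut self, source: usize, target: usize, weight: f64) {
        self.adjacency_list[source].push(Edge { target, weight });
        self.edge_count += 1;
    }
}
```

With `let mut g = WeightedGraph::new(2);`, both `g.add_edge(1, 0, 2.0)` and a following `g.add_edge(0, 1, 3.0)` compile, because `g` is only borrowed for each call.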

How should I implement a Cayley Table in Haskell?

I'm interested in generalizing some computational tools to use a Cayley Table, meaning a lookup table based multiplication operation.
I could create a minimal implementation as follows:
data CayleyTable = CayleyTable {
    ct_name :: ByteString,
    ct_products :: V.Vector (V.Vector Int)
  } deriving (Read, Show)
instance Eq (CayleyTable) where
    (==) a b = ct_name a == ct_name b
data CTElement = CTElement {
    ct_cayleytable :: CayleyTable,
    ct_index :: !Int
  }
instance Eq (CTElement) where
    (==) a b = assert (ct_cayleytable a == ct_cayleytable b) $
        ct_index a == ct_index b
instance Show (CTElement) where
    show = ("CTElement" ++) . show . ct_index
-- Look up the index of the product in the shared table.
a **** b = assert (ct_cayleytable a == ct_cayleytable b) $
    (ct_products (ct_cayleytable a) V.! ct_index a) V.! ct_index b
There are however numerous problems with this approach, starting with the run time type checking via ByteString comparisons, but including the fact that read cannot be made to work correctly. Any idea how I should do this correctly?
I could imagine creating a family of newtypes CTElement1, CTElement2, etc. for Int with a CTElement typeclass that provides the multiplication and verifies their type consistency, except when doing IO.
Ideally, there might be some trick for passing around only one copy of this ct_cayleytable pointer too, perhaps using an implicit parameter like ?cayleytable, but this doesn't play nicely with multiple incompatible Cayley tables and gets generally obnoxious.
Also, I've gathered that an index into a vector can be viewed as a comonad. Is there any nice comonad instance for vector or whatever that might help smooth out this sort of type checking, even if ultimately doing it at runtime?
The thing you need to realize is that Haskell's type checker only checks types. So your CayleyTable needs to be a class.
class CayleyGroup g where
    cayleyTable :: g -> CayleyTable
    ... -- Any operations you cannot implement solely by knowing the Cayley table
data CayleyTable = CayleyTable {
    ...
  } deriving (Read, Show)
If the cayleyTable isn't known at compile time, you have to use rank-2 types, since the compiler needs to enforce the invariant that the CayleyTable exists when your code uses it.
manipWithCayleyTable :: Integral i => CayleyTable -> i -> (forall g. CayleyGroup g => g -> g) -> a
can be implemented, for example. It allows you to perform group operations on the CayleyTable. It works by combining i and CayleyTable to make a new type that it passes to its third argument.
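To illustrate the "one type per table" idea for the case where the table is known at compile time, here is a sketch in Rust rather than Haskell, with a trait standing in for the type class. The `Z2` and `Z3` groups, the `times` method and the trait layout are all my own; this does not cover the rank-2 case for tables only known at run time.

```rust
/// Each group is its own type and carries its table as an associated
/// constant, so elements of different groups cannot be multiplied together.
trait CayleyGroup: Copy + Sized {
    const TABLE: &'static [&'static [usize]];

    fn index(self) -> usize;
    fn from_index(i: usize) -> Self;

    /// Multiplication by table lookup.
    fn times(self, other: Self) -> Self {
        Self::from_index(Self::TABLE[self.index()][other.index()])
    }
}

/// Cyclic group of order 3; the table is addition mod 3.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Z3(usize);

impl CayleyGroup for Z3 {
    const TABLE: &'static [&'static [usize]] = &[&[0, 1, 2], &[1, 2, 0], &[2, 0, 1]];
    fn index(self) -> usize { self.0 }
    fn from_index(i: usize) -> Self { Z3(i) }
}

/// Cyclic group of order 2; the table is addition mod 2.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Z2(usize);

impl CayleyGroup for Z2 {
    const TABLE: &'static [&'static [usize]] = &[&[0, 1], &[1, 0]];
    fn index(self) -> usize { self.0 }
    fn from_index(i: usize) -> Self { Z2(i) }
}
```

Here `Z3(1).times(Z3(2))` is `Z3(0)`, while `Z3(1).times(Z2(1))` is rejected at compile time, and no table pointer is stored in the elements themselves.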
