Performance of functional architecture with modules vs. class types in F#

I understand that functional architecture is encouraged in F#, but I'm hitting a performance wall that doesn't exist with class types.
I have a state type with a bunch of fields in it and it is passed around a series of functions through the code pipeline.
At every stage, whenever a transformation occurs, a new object is created.
Here is an example:
match ChefHelpers.evaluateOpening t brain with
| Some openOrder ->
    info $"open order received: {openOrder.Side}"
    match! ExchangeBus.submitOneOrderRetryAsync brain.Exchange openOrder with
    | Ok _ ->
        {brain.CookOrder.Instrument.PriceToString openOrder.Price} / sl.{stopLossOrder.Side.Letter} at {brain.CookOrder.Instrument.PriceToString stopLossOrder.Price}"
        let m = $"{t.Timestamp.Format}: send open order {openOrder.Side}, last price was {brain.CookOrder.Instrument.PriceToString brain.Analysis.LastPrice}"
        return Message m, ({ brain.WithStatus (Cooking MonitorForClose) with OpenOrder = Some openOrder })
    | Error e ->
        let m = $"{t.Timestamp}: couldn't open position:\n{e.Describe()}"
        return! ChefHelpers.cancelAllOrdersAsync brain (ChefError m) (ExchangeError (Other (e.Describe())))
| None ->
    return NoMessage, brain
Here the object 'brain', which holds all the state, gets passed around, updated, etc.
This works very well when run live, since everything gets executed 2-3 times per second at most.
When I want to run the same code on static data to check behavior, it is a different story, because I'm running it millions of times while I wait for it to finish.
All this code deals with small lists and does basic comparisons, arithmetic, etc., so the cost of rebuilding the main object sticks out and becomes painfully apparent.
I tried to rebuild some of that logic as an object type where the state is a bunch of mutable variables and the performance difference is dramatic.
I have a lot of code like this:
type A = { }
let a : A = ...
let a = doSomething1 a
let a = doSomething2 a
let a =
    match x with
    | true -> doSomething3 a
    | false -> a
etc
I'd say the whole tool architecture is built with code that looks like that.
And there are a lot of these:
let a = { a with X = 3 }
But there is no concurrency in the pipeline and it is very linear. So, for that last line, if I had a way to tell the compiler "it's the same object, it is not used anywhere else, edit it in place", the performance would be a lot better.
What could be strategies I could use to keep the code readable, but minimize that issue?
Is the problem the actual copying of data? The main object has 18 fields, so I can't imagine it being larger than 200 bytes. Is it allocating space for it? Or does it create a lot of garbage collection work?
It's not something straightforward to profile since it's a cost that's everywhere and inside dotnet.
Any feedback / ideas would be great, especially "you're doing it wrong, do X instead" :)

From a design perspective, 18 fields is actually a fairly large record, in my opinion. Perhaps you could factor that into sub-records, so you're not constantly re-allocating the entire thing? So instead of this:
type A =
    {
        X : int
        Field1 : int
        Field2 : int
        ...
        Field18 : int
    }
You could have this instead:
type Sub1 =
    {
        Field1 : int
        ...
        Field9 : int
    }
type Sub2 =
    {
        Field10 : int
        ...
        Field18 : int
    }
type A =
    {
        X : int
        Sub1 : Sub1
        Sub2 : Sub2
    }
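Then the performance cost of let a = { a with X = 3 } would presumably be less, since the copy covers only three fields (X plus two references to the unchanged sub-records) instead of all nineteen.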
Bonus idea: You might also want to emulate the cool Haskell kids, and look into lenses, which are designed specifically for reading and updating parts of immutable data.
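For comparison, the wish in the question ("it's the same object, it is not used anywhere else, edit it in place") is roughly what ownership semantics express. The following is a minimal sketch in Rust rather than F#, and every name in it (Brain, Status, with_status, with_last_price, run_pipeline) is hypothetical. It only illustrates that a linear pipeline which consumes and returns its state can keep the "let a = f a" shape without creating a new heap object at every step.

```rust
// Hypothetical stand-in for the large state record from the question.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Idle,
    Monitoring,
}

#[derive(Debug, Clone, PartialEq)]
struct Brain {
    status: Status,
    last_price: f64,
    ticks_seen: u64,
}

impl Brain {
    // Each step takes ownership of the state, mutates it in place,
    // and hands it back, so call sites still read as `a = f(a)`.
    fn with_status(mut self, status: Status) -> Self {
        self.status = status;
        self
    }

    fn with_last_price(mut self, price: f64) -> Self {
        self.last_price = price;
        self.ticks_seen += 1;
        self
    }
}

// A linear pipeline over a slice of prices; no step keeps the old state.
fn run_pipeline(prices: &[f64]) -> Brain {
    let mut brain = Brain {
        status: Status::Idle,
        last_price: 0.0,
        ticks_seen: 0,
    };
    for &p in prices {
        brain = brain.with_last_price(p);
        if p > 100.0 {
            brain = brain.with_status(Status::Monitoring);
        }
    }
    brain
}
```

In F# itself, the nearest counterpart would probably be a record with mutable fields or a class with mutable members, which trades away some of the guarantees that immutability provides.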

Related

Canonical way to represent idea of sum type of records that all extend a "base" record

I'm new to PureScript. I was searching for sealed classes in PureScript to get an idea of how one would implement this, but I don't think I have the necessary PS jargon yet.
What is the canonical way in PureScript to have a bunch of records that extend a "base" record, and then have one sum type representing a "sealed" collection of those?
Something like in Kotlin,
sealed class Glazing(val name: String, val area: Int) {
    class Window(name: String, area: Int, val frameMaterial: String) : Glazing(name, area)
    class Door(name: String, area: Int, val isPreHung: Boolean) : Glazing(name, area)
}
In TypeScript, you'd probably do something like
interface BaseGlazing { ... }
interface Door extends BaseGlazing { ... }
interface Window extends BaseGlazing { ... }
type Glazing = Door | Window
and then you'd take either `A extends BaseGlazing` or `Glazing` (and use type guards) to write either of the two functions described below.
Essentially I want a base class (that is technically abstract), things that extend it, and then a sum type/discriminated union of the extensions. That way I can write, say, changeName :: Glazing -> Glazing (premised on the base class having a name prop), but also something like calculateTotalLightPenetration :: Array Glazing -> Number (premised on the discriminated union being one of Door or Window, since light penetration will be a different formula for doors vs. windows).
The idea of "inheritance" (aka "is-a" relationship) is technically possible to model in PureScript, but it's hard and awkward. And there is a good reason for it: inheritance is almost never (and I am tempted to say "never, period") the most convenient, efficient, or reliable way of modeling the domain. Even OOP apologists tend to recommend aggregation over inheritance these days.
One useful thing to observe is that you don't actually need inheritance. What you need is to solve some specific problem in your domain, and inheritance is just a solution that you naturally reach for, which is probably informed by your past experience.
And this leads us to an insight: the particular way to model whatever it is you're modeling would depend on what the actual problem is. Chances are, PureScript has a different mechanism for modeling that.
But if I base my thinking on the specifics you gave in your question (i.e. the changeName and calculateTotalLightPenetration functions), I would model it via aggregation: the "glazing" would be the surrounding type, and it would have, as one of its parts, the specific kind of glazing. This would look something like this:
type Glazing = { name :: String, area :: Int, kind :: GlazingKind }
data GlazingKind = Window { frameMaterial :: String } | Door { isPreHung :: Boolean }
changeName :: Glazing -> Glazing
changeName g = g { name = "new name" }
calculateTotalLightPenetration :: Array Glazing -> Number
calculateTotalLightPenetration gs = sum $ individualPenetration <$> gs
  where
    individualPenetration g = case g.kind of
      Door _ -> 0.3
      Window _ -> 0.5
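The same aggregation idea carries over to other languages with sum types. Below is a minimal sketch in Rust, assuming the same field names as the PureScript types above (converted to snake_case) and the same placeholder penetration constants; the extra new_name parameter on change_name is an addition made for illustration.

```rust
// The shared fields live in the outer struct; the variant-specific
// fields live in the `kind` enum, mirroring the PureScript types.
#[derive(Debug, Clone, PartialEq)]
enum GlazingKind {
    Window { frame_material: String },
    Door { is_pre_hung: bool },
}

#[derive(Debug, Clone, PartialEq)]
struct Glazing {
    name: String,
    area: i32,
    kind: GlazingKind,
}

// Works for every kind of glazing because `name` sits on the outer struct.
fn change_name(g: Glazing, new_name: &str) -> Glazing {
    Glazing {
        name: new_name.to_string(),
        ..g
    }
}

// Dispatches on the kind; the compiler checks that every variant is handled.
fn calculate_total_light_penetration(gs: &[Glazing]) -> f64 {
    gs.iter()
        .map(|g| match g.kind {
            GlazingKind::Door { .. } => 0.3,
            GlazingKind::Window { .. } => 0.5,
        })
        .sum()
}
```

Adding a third kind later means adding one enum variant, after which the match in calculate_total_light_penetration stops compiling until the new case is handled.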

Kotlin - very frequent data removal and addition to a list causes npe

I have a buffer that is actually an ArrayList<Object>.
The following happens asynchronously:
This buffer list changes very frequently, 15-50 times per second. Whenever there's an update, I remove the first element by position with buffer.removeAt(0) and add the new value at the end with buffer.add(new).
At some point I call a function that does a calculation over the buffer list, going through it element by element. During that pass I run into an NPE because an element has been removed asynchronously.
How can I solve this NPE? I was thinking of making a deep copy, but that would mean going through the buffer list and allocating data, so I could still run into the NPE while copying.
How are problems like this solved?
What would be a more optimized way, as this is going to consume a lot of memory?
Code:
private fun observeFrequentData() {
    frequentData.observe(owner, Observer { data ->
        if (accelerationData == null) return@Observer
        GlobalScope.launch {
            val a = data[0].toDouble()
            val b = data[1].toDouble()
            val c = a + b
            val timestamp = System.currentTimeMillis()
            val customObj = CustomObj(c, timestamp)
            if (buffer.size >= 5000) {
                buffer.removeAt(0)
            }
            buffer.add(acceleration)
        }
    })
}
fun getBuffer() {
    val mappedData = buffer.map { it.smth } // NPE, it == null
}
If you are doing lots of removals from index 0 and inserts at the end, then ArrayList is probably not the container to use.
You can consider using a LinkedList:
buffer.removeFirst();
and
buffer.add(acceleration);
Also note the following comment from the LinkedList documentation regarding synchronization.
Note that this implementation is not synchronized. If multiple threads
access a linked list concurrently, and at least one of the threads
modifies the list structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more elements; merely setting the value of an element is not a
structural modification.) This is typically accomplished by
synchronizing on some object that naturally encapsulates the list. If
no such object exists, the list should be "wrapped" using the
Collections.synchronizedList method. This is best done at creation
time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new LinkedList(...));
You can also use the synchronized keyword on that piece of code, as @patrickf suggested.
To take care of performance, instead of making the whole method synchronized, you can put just the 3 "buffer" related lines of code (size, removeAt and add) in a synchronized block.
Something like:
.
.
.
        synchronized(buffer) {
            if (buffer.size >= 5000) {
                buffer.removeAt(0)
            }
            buffer.add(acceleration)
        }
    }
})
Hope this helps!
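The two suggestions above (a container with cheap removal at the front, and one lock around the size check, removal, and insertion) can be combined. Here is a minimal sketch of that combination in Rust, not in Kotlin; BoundedBuffer, push, snapshot, and demo_buffer are hypothetical names, and the stored element is simplified to a plain f64. Readers take a copy while holding the lock, so they never observe a half-updated list.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical fixed-capacity buffer: the lock covers the size check,
// the removal, and the insertion as one atomic step.
struct BoundedBuffer {
    capacity: usize,
    items: Mutex<VecDeque<f64>>,
}

impl BoundedBuffer {
    fn new(capacity: usize) -> Self {
        BoundedBuffer {
            capacity,
            items: Mutex::new(VecDeque::with_capacity(capacity)),
        }
    }

    fn push(&self, value: f64) {
        let mut items = self.items.lock().unwrap();
        if items.len() >= self.capacity {
            items.pop_front(); // constant time, no shifting of later elements
        }
        items.push_back(value);
    }

    // Readers copy the contents while holding the lock, then work on the copy.
    fn snapshot(&self) -> Vec<f64> {
        self.items.lock().unwrap().iter().copied().collect()
    }
}

// Several writer threads push concurrently; returns the final snapshot.
fn demo_buffer(writers: usize, per_writer: usize, capacity: usize) -> Vec<f64> {
    let buffer = Arc::new(BoundedBuffer::new(capacity));
    let handles: Vec<_> = (0..writers)
        .map(|w| {
            let buffer = Arc::clone(&buffer);
            thread::spawn(move || {
                for i in 0..per_writer {
                    buffer.push((w * per_writer + i) as f64);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    buffer.snapshot()
}
```

The snapshot copies at most capacity elements per read, which is the price paid for letting the calculation run without holding the lock.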

"exponentially large number of cases" errors in latest Flow with common spread pattern

I frequently use the following pattern to create objects with null/undefined properties omitted:
const whatever = {
something: true,
...(a ? { a } : null),
...(b ? { b } : null),
};
As of flow release v0.112, this leads to the error message:
Computing object literal [1] may lead to an exponentially large number of cases to reason about because conditional [2] and conditional [3] are both unions. Please use at most one union type per spread to simplify reasoning about the spread result. You may be able to get rid of a union by specifying a more general type that captures all of the branches of the union.
It sounds to me like this isn't really a type error, just that Flow is trying to avoid some heavier computation. This has led to dozens of flow errors in my project that I need to address somehow. Is there some elegant way to provide better type information for these? I'd prefer not to modify the logic of the code, I believe that it works the way that I need it to (unless someone has a more elegant solution here as well). Asking here before I resort to // $FlowFixMe for all of these.
It's not as elegant to write, and I think Flow should handle the case that you've shown, but if you still want Flow to type check it you could try rewriting it like this:
/* @flow */
type A = {
  cat: number,
};
type B = {
  dog: string,
};
type Built = {
  something: boolean,
  a?: A,
  b?: B,
};
function buildObj(a?: ?A, b?: ?B): Built {
  const ret: Built = {
    something: true,
  };
  if (a) ret.a = a;
  if (b) ret.b = b;
  return ret;
}

How do I convert the "largest value in a Vec" example in the Rust book to not use the Copy trait?

I'm trying to accomplish an exercise "left to the reader" in the 2018 Rust book. The example they have, 10-15, uses the Copy trait. However, they recommend implementing the same without Copy and I've been really struggling with it.
Without Copy, I cannot use largest = list[0]. The compiler recommends using a reference instead. I do so, making largest into a &T. The compiler then complains that the largest used in the comparison is a &T, not T, so I change it to *largest to dereference the pointer. This goes fine, but then stumbles on largest = item, with complaints about T instead of &T. I switch to largest = &item. Then I get an error I cannot deal with:
error[E0597]: `item` does not live long enough
 --> src/main.rs:6:24
  |
6 |             largest = &item;
  |                        ^^^^ borrowed value does not live long enough
7 |         }
8 |     }
  |     - borrowed value only lives until here
  |
note: borrowed value must be valid for the anonymous lifetime #1 defined on the function body at 1:1...
I do not understand how to lengthen the life of this value. It lives and dies in the list.iter(). How can I extend it while still only using references?
Here is my code for reference:
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for &item in list.iter() {
        if item > *largest {
            largest = &item;
        }
    }
    largest
}
When you write for &item, this destructures each reference returned by the iterator, making the type of item T. You don't want to destructure these references, you want to keep them! Otherwise, when you take a reference to item, you are taking a reference to a local variable, which you can't return because local variables don't live long enough.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list.iter() {
        if item > largest {
            largest = item;
        }
    }
    largest
}
Note also how we can compare references directly, because references to types implementing PartialOrd also implement PartialOrd, deferring the comparison to their referents (i.e. it's not a pointer comparison, unlike for raw pointers).
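To check that the reference-based version really works for a type that is not Copy, here is a short usage sketch. The largest function is repeated from above so the example is self-contained; demo_largest and its sample data are made up for illustration.

```rust
// Same function as in the answer above: it returns a reference into the slice.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list.iter() {
        if item > largest {
            largest = item;
        }
    }
    largest
}

// `String` is not `Copy`, so this only compiles because nothing is
// moved out of the vector; the clone happens after the search is done.
fn demo_largest() -> (String, i32) {
    let words = vec![
        String::from("pear"),
        String::from("zebra"),
        String::from("apple"),
    ];
    let numbers = vec![34, 50, 25, 100, 65];
    (largest(&words).clone(), *largest(&numbers))
}
```

Like the original, this version still panics on an empty slice because of the list[0] access.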

OCaml: Does storing some values to be used later introduce "side effects"?

For a homework assignment, we've been instructed to complete a task without introducing any "side-effects". I've looked up "side-effects" on Wikipedia, and though I get that in theory it means "modifies a state or has an observable interaction with calling functions", I'm having trouble figuring out specifics.
For example, would creating a value that holds a non-compile time result be introducing side effects?
Say I had (might not be syntactically perfect):
val myList = (someFunction x y);;
if List.exists ((=) 7) myList then true else false;;
Would this introduce side-effects? I guess maybe I'm confused on what "modifies a state" means in the definition of side-effects.
No; a side-effect refers to e.g. mutating a ref cell with the assignment operator :=, or other things where the value referred to by a name changes over time. In this case, myList is an immutable value that never changes during the program, thus it is effect-free.
See also
http://en.wikipedia.org/wiki/Referential_transparency_(computer_science)
A good way to think about it is "have I changed anything which any later code (including running this same function again later) could ever possibly see other than the value I'm returning?" If so, that's a side effect. If not, then you can know that there isn't one.
So, something like:
let inc_nosf v = v+1
has no side effects because it just returns a new value which is one more than an integer v. So if you run the following code in the ocaml toplevel, you get the corresponding results:
# let x = 5;;
val x : int = 5
# inc_nosf x;;
- : int = 6
# x;;
- : int = 5
As you can see, the value of x didn't change: since we didn't save the return value, nothing visible got incremented. The function only produces a new return value; it does not modify x itself. So to save the result under the name x, we'd have to do:
# let x = inc_nosf x;;
val x : int = 6
# x;;
- : int = 6
This is because the inc_nosf function has no side effects: it only communicates with the outside world through its return value, not by making any other changes.
But something like:
let inc_sf r = r := !r+1
has side effects because it changes the value stored in the reference represented by r. So if you run similar code in the top level, you get this, instead:
# let y = ref 5;;
val y : int ref = {contents = 5}
# inc_sf y;;
- : unit = ()
# y;;
- : int ref = {contents = 6}
So, in this case, even though we still don't save the return value, it got incremented anyway. That means there must have been changes to something other than the return value. In this case, that change was the assignment using := which changed the stored value of the ref.
As a good rule of thumb, in OCaml, if you avoid using refs, records with mutable fields, classes, arrays, and hash tables, then you will avoid most risk of side effects. Strings need care only in older OCaml versions: they are immutable by default since OCaml 4.06, while earlier versions allowed modifying a string in place using functions like String.set or String.fill. Basically, any function which can modify a data type in place will cause a side effect.
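The same contrast can be sketched in a language where mutation has to be spelled out in the function signature. The following Rust sketch reuses the names inc_nosf and inc_sf from the OCaml examples above; demo_side_effects is a hypothetical helper added only to show the observable difference.

```rust
// Pure: the only observable result is the return value.
fn inc_nosf(v: i32) -> i32 {
    v + 1
}

// Side-effecting: mutates the caller's value through a mutable
// reference and returns nothing, like `r := !r + 1` in OCaml.
fn inc_sf(r: &mut i32) {
    *r += 1;
}

// Returns (x after the pure call, the pure call's result, y after the impure call).
fn demo_side_effects() -> (i32, i32, i32) {
    let x = 5;
    let result = inc_nosf(x); // x is unchanged by the call
    let mut y = 5;
    inc_sf(&mut y); // y changes although no return value was kept
    (x, result, y)
}
```

Here the &mut in the signature of inc_sf plays the role that the ref type plays in OCaml: it marks the place where a change other than the return value can happen.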
