How does Immutability leads to Scalability? - functional-programming

Can anyone give a deep insight into how immutability can lead to scalabilty? Or is it a misconception?

It doesn't, at least not really.
Immutability lets you make assumptions about what you can do with the data/memory/etc. and that can give you solutions that are more scalable. For example, if you have a bunch of immutable sets of data as inputs for a function you have a deterministic results which allows for caching and easy parallelisation. The entire story is a bit more complicated but that's the basic idea.
You should note that caching and parallelisation are not exclusive for immutable data, it just makes it easier to implement.

Related

Global State in Functional Programming (F#)

I want to compute some functions which are dependent on some variables (specific data on which I run the code) and global variables, which are unlikely to be changed, but I want to leave them user-tunable. Just to clarify with an example, suppose I want to declare the following function:
let multiplyByGain x =
x * gain
Where would you declare gain, being gain a global constant for the whole project. In a separate module with constants? That would couple the module with this code, though. Or would you use a curried version:
let multiblyByGain x gain =
x * gain
and then specialize for the specific values? But suppose you have many functions like that, you will have to inject gain to all of them (in a sort of linking module)?
In my specific problem this becomes more cumbersome because both x and gain are arrays which must have the same length, suppose I have to do a Array.zip, e.g.: what is the best practice in terms of functional design to address a global constant, as gain, in a general way?
P.S.: I have found this old postenter link description here, but addresses only a specific problem.
There is no single correct answer to the question and the best approach will depend on a variety of other constraints and requirements that you have. Also, it depends on whether you are asking specifically about F# or whether you are asking about functional programming more generally. I think there are three main points:
Keeping it simple.
Using a module that exposes gain as a global value, which has some initialization code to read configuration seems like a good default approach in F#. If this is changed only rarely (say, before you run the whole computation), then mutation is not going to cause you any troubles. You just need to be careful to avoid changing the values while some computation is still running. I think most F# programmers code tend to be quite pragmatic about this and this seems like the easiest thing to start with.
Unit testing.
If you want to unit ytest your multiplyByGain function with different gain as an argument, then you'll need some way of passing different values of gain to the function from your unit tests. In this case, having it as an additional parameter and using currying is nice, because you can just call it with other values of gain from your tests.
Functional programming.
Some functional language communities (especially Haskell and, sometimes, Scala) are way more strict about state. The purely functional way of keeping state would be to use monads (either the reader monad or some kind of free monad structure). This makes your code a lot more complicated (both conceptually and in terms of extra syntactic overhead), but it is a purely functional solution that eliminates state. In F#, this kind of approach is even more cumbersome, so it's not very common.

What is the need for immutable/persistent data structures in erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access over its objects, I don't see why do we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output, it
will likely return void or unit, and that when you call a method on
the object the object itself isn’t input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
how would this work?
factorial(1) -> 1;
factorial(X) ->
X*factorial(X-1).
if you run factorial(4), a single process will be running the same function. Each time the function will have it's own value of X, if the value of X was in the scope of the process and not the function recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block you would have a point, but it would be a headache to think about where data is immutable and where it isn't.

Why is functional programming good? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I've noticed that there are certain core concepts that a lot of functional programming fanatics cling to:
Avoiding state
Avoiding mutable data
Minimizing side effects
etc...
I'm not just wondering what other things make functional programming, but why these core ideas are good? Why is it good to avoid state, and the rest?
The simple answer is that if you don't have extra state to worry about, your code is simpler to reason about. Simpler code is easier to maintain. You don't need to worry about things outside a particular piece of code (like a function) to modify it. This has really useful ramifications for things like testing. If your code does not depend on some state, it becomes much easier to create automated tests for that code, since you do not need to worry about initializing some state.
Having stateless code makes it simpler to create threaded programs as well, since you don't need to worry about two threads of execution modifying/reading a shared piece of data at the same time. Your threads can run independent code, and this can save loads of development time.
Essentially, avoiding state creates simpler programs. In a way, there's less "moving parts" (i.e., ways lines of code can interact), so this will generally mean that the code is more reliable and contains less faults. Basically, the simpler the code, the less can go wrong. To me this is the essence of writing state-less code.
There are plenty of other reasons to create stateless, "functional" code, but they all boil down to simplicity for me.
In addition to what #Oleksi said, there is another important thing: referential transparency and transactional data structures. Of course, you do not need a functional programming language to do so, but it's a bit easier with them.
Purely functional data structures are guaranteed to remain the same - if one function returned a tree, it will always be the same tree, and all the further transforms would create new copies of it. It's much easier to backtrack to any previous version of a data structure this way, which is important for many essential algorithms.
Very generally, functional programming means:
encouraging the use of (first-class) functions
discouraging the use of (mutable) state
Why is mutation a problem? Think about it: mutation is to data structures what goto is to control flow. I.e., it allows you to arbitrarily "jump" to something completely different in a rather unstructured manner. Consequently, it is occasionally useful, but most of the time rather harmful to readability, testability, and compositionality.
One typical functional feature is "no subtyping". While it sounds a little bit odd to call this a feature, it is, for two (somehow related) reasons:
Subtyping relationships lead to a bunch of not-so-obvious problems. If you don't limit yourself to single or mixin inheritance, you end up with the diamond problem. More important is that you have to deal with variance (covariance, contravariance, invariance), which quickly becomes a nightmare, especially for type parameters (a.k.a. generics). There are several more reasons, and even in OO languages you hear statements like "prefer composition over inheritance".
On the other hand, if you simply leave out subtyping, you can reason much more detailled about your type system, which leads to the possibility to have (almost) full type inference, usually implemented using extensions of Hindley Milner type inference.
Of course sometimes you'll miss subtyping, but languages like Haskell have found a good answer to that problem: Type classes, which allow to define a kind of common "interface" (or "set of common operations") for several otherwise unrelated types. The difference to OO languages is that type classes can be defined "afterwards", without touching the original type definitions. It turns out that you can do almost everything with type classes that you can do with subtyping, but in a much more flexible way (and without preventing type inference). That's why other languages start to employ similar mechnisms (e.g. implicit conversions in Scala or extension methods in C# and Java 8)

How to identify that code is over abstracted?

What should be the measures that should be used to identify that code is over abstracted and very hard to understand and what should be done to reduce over abstraction?
"Simplicity over complexity, complexity over complicatedness"
So - there's a benefit to abstract something only if You are "de-leveling" complicatedness to complexity. Reasons to do that can vary: better modularity, better encapsulation etc.
Identifying over abstraction is a chicken and egg problem. In order to reduce over abstraction You need to understand actual reason behind code lines. That includes understanding idea of particular abstraction itself (in contrast to calling it over abstracted cause of lack of understanding). And that's not enough - You need to know a better, simpler solution to prove that it's over abstracted.
If You are looking for tool that could do it in Your place - look no more, only mind can reliably judge that.
I will give an answer that will get a LOT of down votes!
If the code is written in an OO language .. it is necessarily heavily over-abstracted. The purer the language the worse the problem.
Abstraction should be used with great caution. If in doubt always use concrete data structures. (You can always abstract later, this is easier than de-abstraction :)
You must be very certain you have the right abstraction in your current context, and you must be very sure that concept will stand the test of change. Abstraction has a high price in performance of both the code and the coder.
Some weak tests for over-abstraction: if the data structure is a product type (struct in C) and the programmer has written get and set method for each field, they have utterly failed to provide any real abstraction, disabled operators like C increment, for no purpose, and simply not understood that the struct field names are already the abstract representation of a product. Duplicating and laming up the interface is not a good idea.
A good test for the product case is whether there exist any data invariants to maintain. For example a pair of integers representing a rational number is almost sufficient, there's little need for any abstraction because all pairs are valid except when the denominator is zero. However for performance reasons one may choose to maintain an invariant, typically the denominator is required to be greater than zero, and the numerator and denominator are relatively prime. To ensure the invariant, the product representation is encapsulated: the initial value protected by a constructor and methods constrained to maintain the invariant.
To fix code I recommend these steps:
Document the representation invariants the abstraction is maintaining
Remove the abstraction (methods) if you can't find strong invariants
Rewrite code using the method to access the data directly.
This procedure only works for low level abstraction, i.e. abstraction of small values by classes.
Over abstraction at a higher level is much harder to deal with. Ideally you'd refactor the code repeatedly, checking to see after each step it continues to work. However this will be hard, and sometimes a major rewrite is required, rather than a refinement. It's probably not worth it unless the abstraction is so far off base it is not tenable to continue to maintain it.
Download Magento and have a look at the code, read some documents on it and have a look at their ERD: http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache
I'm not joking, this is over-abstraction.. trying to please everyone and cover every base is a terrible idea and makes life extremely difficult for everyone.
Personally I would say that "What is the ideal level of abstraction?" is a subjective question.
I don't like code that uses a new line for every atomic operation, but I also don't like 10 nested operations within one line.
I like the use of recursive functions, but I don't appreciate recursion for the sole sake of recursion.
I like generics, but I don't like (nested) generic functions that e.g. use different code for each specific type that's expected...
It is a matter of personal opinion as well as common sense. Does this answer your question?
I completely agree with what #ArnisLapsa wrote:
"Simplicity over complexity, complexity over complicatedness"
And that
an abstraction is used to "de-level" those, from complicated to complex
(and from complex to simpler)
Also, as stated by #MartinHemmings a good abstraction is quite subjective because we don't all think the same way. And actually our way of thinking change with time. So Something that someone find simple might looks complex to others, and even become simpler with more experiences. Eg. A monadic operation is something trivial for functional programmer, but can be seriously confusing for others. Similarly, a design with mutable object communicating with each other can be natural for some and feel un-trackable for others.
That being said, I would like to add a couple of indicators. Note that this applies to abstractions used in code-base, not "paradigm abstraction" such as everything-is-a-function, or everything-is-designed-as-objects. So:
To the people it concerns, the abstraction should be conceptually simpler than other alternatives, without looking at the implementation. If you find that thinking of all possible cases is simpler that reasoning using the abstraction, then this abstraction is not suitable (for you)
Its implementation should reason only about the abstraction, not the specific cases that it will be used for. As soon as the abstraction implementation has parts made for specific cases, it indicates an "unfit" abstraction. And increasing generalization to cope with each new case, is going the wrong way (and tends to fall to the next issue).
A very common indicator of over-abstraction I have found (and actually fell for) are abstractions that represent more than what is needed, now. As much as possible, they should allow to do exactly what is required, but nothing more. For example, say you're thinking of, or already have, a "2d point" abstraction for which you can define many operators you need. Then you have another need that could really be a "4d point" similar to the 2d. Don't start to use a "Ndimensionnal point" abstraction, especially thinking that you might later need it. Maybe you'll never have anything else than 2 and 4d (because it stays as "a good idea" in the backlog forever) but instead some requirements pops to convert 4d points into pairs of 2d points. That's going to be hard to generalize to n-dimensions. So, each abstraction can be checked to cover and only cover the actual needs. In my point example, the complexity "n-dimensional" is actually only used to cope with the 2 and 4d cases (and the 4d might not even be used that much).
Finally, in a more global point of view, a code-base that has many not related abstractions, is an indicator that the dev team tends to abstract every little issues. So probably many of them are or became over-abstracted.

Advantages of stateless programming?

I've recently been learning about functional programming (specifically Haskell, but I've gone through tutorials on Lisp and Erlang as well). While I found the concepts very enlightening, I still don't see the practical side of the "no side effects" concept. What are the practical advantages of it? I'm trying to think in the functional mindset, but there are some situations that just seem overly complex without the ability to save state in an easy way (I don't consider Haskell's monads 'easy').
Is it worth continuing to learn Haskell (or another purely functional language) in-depth? Is functional or stateless programming actually more productive than procedural? Is it likely that I will continue to use Haskell or another functional language later, or should I learn it only for the understanding?
I care less about performance than productivity. So I'm mainly asking if I will be more productive in a functional language than a procedural/object-oriented/whatever.
Read Functional Programming in a Nutshell.
There are lots of advantages to stateless programming, not least of which is dramatically multithreaded and concurrent code. To put it bluntly, mutable state is enemy of multithreaded code. If values are immutable by default, programmers don't need to worry about one thread mutating the value of shared state between two threads, so it eliminates a whole class of multithreading bugs related to race conditions. Since there are no race conditions, there's no reason to use locks either, so immutability eliminates another whole class of bugs related to deadlocks as well.
That's the big reason why functional programming matters, and probably the best one for jumping on the functional programming train. There are also lots of other benefits, including simplified debugging (i.e. functions are pure and do not mutate state in other parts of an application), more terse and expressive code, less boilerplate code compared to languages which are heavily dependent on design patterns, and the compiler can more aggressively optimize your code.
The more pieces of your program are stateless, the more ways there are to put pieces together without having anything break. The power of the stateless paradigm lies not in statelessness (or purity) per se, but the ability it gives you to write powerful, reusable functions and combine them.
You can find a good tutorial with lots of examples in John Hughes's paper Why Functional Programming Matters (PDF).
You will be gobs more productive, especially if you pick a functional language that also has algebraic data types and pattern matching (Caml, SML, Haskell).
Many of the other answers have focused on the performance (parallelism) side of functional programming, which I believe is very important. However, you did specifically ask about productivity, as in, can you program the same thing faster in a functional paradigm than in an imperative paradigm.
I actually find (from personal experience) that programming in F# matches the way I think better, and so it's easier. I think that's the biggest difference. I've programmed in both F# and C#, and there's a lot less "fighting the language" in F#, which I love. You don't have to think about the details in F#. Here's a few examples of what I've found I really enjoy.
For example, even though F# is statically typed (all types are resolved at compile time), the type inference figures out what types you have, so you don't have to say it. And if it can't figure it out, it automatically makes your function/class/whatever generic. So you never have to write any generic whatever, it's all automatic. I find that means I'm spending more time thinking about the problem and less how to implement it. In fact, whenever I come back to C#, I find I really miss this type inference, you never realise how distracting it is until you don't need to do it anymore.
Also in F#, instead of writing loops, you call functions. It's a subtle change, but significant, because you don't have to think about the loop construct anymore. For example, here's a piece of code which would go through and match something (I can't remember what, it's from a project Euler puzzle):
let matchingFactors =
factors
|> Seq.filter (fun x -> largestPalindrome % x = 0)
|> Seq.map (fun x -> (x, largestPalindrome / x))
I realise that doing a filter then a map (that's a conversion of each element) in C# would be quite simple, but you have to think at a lower level. Particularly, you'd have to write the loop itself, and have your own explicit if statement, and those kinds of things. Since learning F#, I've realised I've found it easier to code in the functional way, where if you want to filter, you write "filter", and if you want to map, you write "map", instead of implementing each of the details.
I also love the |> operator, which I think separates F# from ocaml, and possibly other functional languages. It's the pipe operator, it lets you "pipe" the output of one expression into the input of another expression. It makes the code follow how I think more. Like in the code snippet above, that's saying, "take the factors sequence, filter it, then map it." It's a very high level of thinking, which you don't get in an imperative programming language because you're so busy writing the loop and if statements. It's the one thing I miss the most whenever I go into another language.
So just in general, even though I can program in both C# and F#, I find it easier to use F# because you can think at a higher level. I would argue that because the smaller details are removed from functional programming (in F# at least), that I am more productive.
Edit: I saw in one of the comments that you asked for an example of "state" in a functional programming language. F# can be written imperatively, so here's a direct example of how you can have mutable state in F#:
let mutable x = 5
for i in 1..10 do
x <- x + i
Consider all the difficult bugs you've spent a long time debugging.
Now, how many of those bugs were due to "unintended interactions" between two separate components of a program? (Nearly all threading bugs have this form: races involving writing shared data, deadlocks, ... Additionally, it is common to find libraries that have some unexpected effect on global state, or read/write the registry/environment, etc.) I would posit that at least 1 in 3 'hard bugs' fall into this category.
Now if you switch to stateless/immutable/pure programming, all those bugs go away. You are presented with some new challenges instead (e.g. when you do want different modules to interact with the environment), but in a language like Haskell, those interactions get explicitly reified into the type system, which means you can just look at the type of a function and reason about the type of interactions it can have with the rest of the program.
That's the big win from 'immutability' IMO. In an ideal world, we'd all design terrific APIs and even when things were mutable, effects would be local and well-documented and 'unexpected' interactions would be kept to a minimum. In the real world, there are lots of APIs that interact with global state in myriad ways, and these are the source of the most pernicious bugs. Aspiring to statelessness is aspiring to be rid of unintended/implicit/behind-the-scenes interactions among components.
One advantage of stateless functions is that they permit precalculation or caching of the function's return values. Even some C compilers allow you to explicitly mark functions as stateless to improve their optimisability. As many others have noted, stateless functions are much easier to parallelise.
But efficiency is not the only concern. A pure function is easier to test and debug since anything that affects it is explicitly stated. And when programming in a functional language, one gets in the habit of making as few functions "dirty" (with I/O, etc.) as possible. Separating out the stateful stuff this way is a good way to design programs, even in not-so-functional languages.
Functional languages can take a while to "get", and it's difficult to explain to someone who hasn't gone through that process. But most people who persist long enough finally realise that the fuss is worth it, even if they don't end up using functional languages much.
Without state, it is very easy to automatically parallelize your code (as CPUs are made with more and more cores this is very important).
Stateless web applications are essential when you start having higher traffic.
There could be plenty of user data that you don't want to store on the client side for security reasons for example. In this case you need to store it server-side. You could use the web applications default session but if you have more than one instance of the application you will need to make sure that each user is always directed to the same instance.
Load balancers often have the ability to have 'sticky sessions' where the load balancer some how knows which server to send the users request to. This is not ideal though, for example it means every time you restart your web application, all connected users will lose their session.
A better approach is to store the session behind the web servers in some sort of data store, these days there are loads of great nosql products available for this (redis, mongo, elasticsearch, memcached). This way the web servers are stateless but you still have state server-side and the availability of this state can be managed by choosing the right datastore setup. These data stores usually have great redundancy so it should almost always be possible to make changes to your web application and even the data store without impacting the users.
My understanding is that FP also has a huge impact on testing. Not having a mutable state will often force you to supply more data to a function than you would have to for a class. There's tradeoffs, but think about how easy it would be to test a function that is "incrementNumberByN" rather than a "Counter" class.
Object
describe("counter", () => {
it("should increment the count by one when 'increment' invoked without
argument", () => {
const counter = new Counter(0)
counter.increment()
expect(counter.count).toBe(1)
})
it("should increment the count by n when 'increment' invoked with
argument", () => {
const counter = new Counter(0)
counter.increment(2)
expect(counter.count).toBe(2)
})
})
functional
describe("incrementNumberBy(startingNumber, increment)", () => {
it("should increment by 1 if n not supplied"){
expect(incrementNumberBy(0)).toBe(1)
}
it("should increment by 1 if n = 1 supplied"){
expect(countBy(0, 1)).toBe(1)
}
})
Since the function has no state and the data going in is more explicit, there are fewer things to focus on when you are trying to figure out why a test might be failing. On the tests for the counter we had to do
const counter = new Counter(0)
counter.increment()
expect(counter.count).toBe(1)
Both of the first two lines contribute to the value of counter.count. In a simple example like this 1 vs 2 lines of potentially problematic code isn't a big deal, but when you deal with a more complex object you might be adding a ton of complexity to your testing as well.
In contrast, when you write a project in a functional language, it nudges you towards keeping fancy algorithms dependent on the data flowing in and out of a particular function, rather than being dependent on the state of your system.
Another way of looking at it would be illustrating the mindset for testing a system in each paradigm.
For Functional Programming: Make sure function A works for given inputs, you make sure function B works with given inputs, make sure C works with given inputs.
For OOP: Make sure Object A's method works given an input argument of X after doing Y and Z to the state of the object. Make sure Object B's method works given an input argument of X after doing W and Y to the state of the object.
The advantages of stateless programming coincide with those goto-free programming, only more so.
Though many descriptions of functional programming emphasize the lack of mutation, the lack of mutation also goes hand in hand with the lack of unconditional control transfers, such as loops. In functional programming languages, recursion, in particularly tail recursion, replaces looping. Recursion eliminates both the unconditional control construct and the mutation of variables in the same stroke. The recursive call binds argument values to parameters, rather than assigning values.
To understand why this is advantageous, rather than turning to functional programming literature, we can consult the 1968 paper by Dijkstra, "Go To Statement Considered Harmful":
"The unbridled use of the go to statement has an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress."
Dijkstra's observations, however still apply to structured programs which avoid go to, because statements like while, if and whatnot are just window dressing on go to! Without using go to, we can still find it impossible to find the coordinates in which to describe the process progress. Dijkstra neglected to observe that bridled go to still has all the same issues.
What this means is that at any given point in the execution of the program, it is not clear how we got there. When we run into a bug, we have to use backwards reasoning: how did we end up in this state? How did we branch into this point of the code? Often it is hard to follow: the trail goes back a few steps and then runs cold due to a vastness of possibilities.
Functional programming gives us the absolute coordinates. We can rely on analytical tools like mathematical induction to understand how the program arrived into a certain situation.
For example, to convince ourselves that a recursive function is correct, we can just verify its base cases, and then understand and check its inductive hypothesis.
If the logic is written as a loop with mutating variables, we need a more complicated set of tools: breaking down the logic into steps with pre- and post-conditions, which we rewrite in terms mathematics that refers to the prior and current values of variables and such. Yes, if the program uses only certain control structures, avoiding go to, then the analysis is somewhat easier. The tools are tailored to the structures: we have a recipe for how we analyze the correctness of an if, while, and other structures.
However, by contrast, in a functional program there is no prior value of any variable to reason about; that whole class of problem has gone away.
Haskel and Prolog are good examples of languages which may be implemented as stateless programming languages. But unfortunately they are not so far. Both Prolog and Haskel have imperative implementations currently. See some SMT's, seem closer to stateless coding.
This is why you are having hard time seeing any benefits from these programing languages. Due to imperative implementations we have no performance and stability benefits. So the lack of stateless languages infrastructure is the main reason you feel no any stateless programming language due to its absence.
These are some benefits of pure stateless:
Task description is the program (compact code)
Stability due to absense of state-dependant bugs (the most of bugs)
Cachable results (a set of inputs always cause same set of outputs)
Distributable computations
Rebaseable to quantum computations
Thin code for multiple overlapping clauses
Allows differentiable programming optimizations
Consistently applying code changes (adding logic breaks nothing written)
Optimized combinatorics (no need to bruteforce enumerations)
Stateless coding is about concentrating on relations between data which then used for computing by deducing it. Basically this is the next level of programming abstraction. It is much closer to native language then any imperative programming languages because it allow describing relations instead of state change sequences.

Resources