Are Immutable Collections Inherently Less Efficient - functional-programming

Immutable objects are objects that cannot change state. They can be easier to test and debug, and are very useful in concurrent programming. However, current implementations of immutable collections have poor performance compared to their mutable relatives. For example, implementing an associative array as an immutable red-black tree has on average O(log(n)) Insert/Delete, while a hash table has on average O(1) Insert/Delete.
In general, are immutable collections provably less efficient than their mutable cousins, or will we someday find immutable implementations that are just as fast?

Okasaki demonstrates that it is often possible to develop immutable data structures with "equivalent asymptotic performance" to their imperative counterparts. The immutable structures likely have worse constant factors, however. But what you may pay in performance you get back in programmer time; as you said, it is much easier to work with and understand immutable collections. In this way the question is somewhat similar to the recurring question of why we use other languages when C is so fast: because it is easier, and we value programmer time.
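As a minimal illustration of that point (my own sketch, not an example taken from Okasaki): an immutable stack built on an OCaml list has O(1) push and pop, exactly matching a mutable stack asymptotically, because each push allocates a single new cell and shares the rest of the list.

(* Persistent stack: push and pop are O(1); older versions stay valid. *)
let push x stack = x :: stack          (* one new cons cell, tail shared *)
let pop = function
  | [] -> None
  | x :: rest -> Some (x, rest)        (* no copying; rest is shared *)

let s0 = []
let s1 = push 1 s0
let s2 = push 2 s1   (* s1 is still [1]; s2 = [2; 1] shares s1's cell *)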

Related

In functional languages, how is the concept of immutability applied to addresses in memory?

I'm trying to understand how a lot of basic computer science concepts are implemented in functional languages. The point that I can't currently understand is how functional languages and philosophies deal with addresses in memory.
In the context of a very base computer science concept like sorts, how are issues of immutability dealt with efficiently? I know that structural sharing is really needed to keep memory from blowing up. But in my mind this means that relatively simple concepts like selection sort can become quite complicated.
Can someone explain how a functional language deals with in place sorts? Is the idea of being "in place" thrown out and replaced with a data structure that supports structural sharing?
I'm really trying to understand how immutability fits with addresses in memory (think pointers). For example, in an in place sort data is not destroyed, but it is moved to new addresses. Is this considered mutation? I think the answer is yes. But then how can you do things like rotations to balance a binary tree? How do functional programmers think about pointers?
I know that this is a relatively hard question to answer, but I feel like it's a big issue with respect to really understanding the functional paradigm. Any insights or resources would be greatly appreciated.
Just to get this out of the way:
For example, in an in place sort data is not destroyed, but it is moved to new addresses.
This does not make any sense. If the data is "moved to new addresses", the algorithm, by definition, no longer works "in place".
There is a long tradition of functional programming languages that do not insist on 100% purity. Starting with Lisp, through ML, and on to OCaml, Scala, and Clojure - all these languages have mutable data structures. In "multi-paradigm" languages that have aspects of functional programming, like JavaScript, Python, and even Java, you also have mutable data structures. Haskell is rather the exception in its insistence on purity.
Most functional programming languages prefer persistent data structures and algorithms that work on immutable data structures. That is, instead of a mutable hash map, those languages would usually prefer some kind of balanced sorted tree, and instead of mutable list buffers, they would prefer immutable singly-linked lists. For sorting those lists, you could take merge-sort, which is nicely expressible as a pure functional program (but is not in-place, at least not without some considerable extra effort).
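As a sketch of that last point, here is one way a pure merge sort on immutable lists might look in OCaml; no element is ever overwritten, each phase simply builds new list cells (the names split, merge, and merge_sort are mine, not from any particular library).

(* Pure merge sort on immutable singly-linked lists. *)
let rec split = function
  | [] -> ([], [])
  | [x] -> ([x], [])
  | x :: y :: rest ->
      let (a, b) = split rest in
      (x :: a, y :: b)

let rec merge xs ys =
  match (xs, ys) with
  | ([], l) | (l, []) -> l
  | (x :: xs', y :: ys') ->
      if x <= y then x :: merge xs' ys' else y :: merge xs ys'

let rec merge_sort = function
  | ([] | [_]) as l -> l
  | l ->
      let (a, b) = split l in
      merge (merge_sort a) (merge_sort b)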
Even if you insist on purity, you can still treat the mutable memory of your computer just like yet another part of the mutable "outside world" - as if it were some kind of user input-output, system clock, network communication, or a random number generator. That is, to deal with mutable memory in a pure functional way, you would need two components: first, you would need a way to describe what is to be done with the mutable memory by constructing a "plan" - which is immutable; and then, you would need an interpreter that can take this immutable plan, and apply it to an actual mutable chunk of memory. That is, the interpreter that mutates memory becomes somewhat external to the core of the language, and is treated just like any other part of the "external mutable world".
In languages which do not insist on purity, you can implement both the little domain-specific language for constructing the immutable plans, as well as the interpreter that actually mutates the memory, thereby separating the pure parts from the impure side-effecty mutable parts. For example, Chiusano & Bjarnason in their book "Functional Programming in Scala" have a chapter 14.2.5 literally called "A purely functional in-place quicksort".
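A toy sketch of the "immutable plan, impure interpreter" idea (my own simplification, not the API from the book): the plan is an ordinary immutable value, and only the interpreter ever touches memory.

(* The plan is immutable data describing what to do to an array. *)
type plan =
  | Write of int * int        (* Write (index, value) *)
  | Swap of int * int         (* Swap (index_i, index_j) *)
  | Seq of plan list          (* perform these in order *)

(* The interpreter is the only place where mutation happens. *)
let rec run (arr : int array) = function
  | Write (i, v) -> arr.(i) <- v
  | Swap (i, j) ->
      let tmp = arr.(i) in
      arr.(i) <- arr.(j);
      arr.(j) <- tmp
  | Seq ps -> List.iter (run arr) ps

let () =
  let a = [| 3; 1; 2 |] in
  run a (Seq [ Swap (0, 1); Write (2, 99) ])   (* a is now [| 1; 3; 99 |] *)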
In general, in statically typed functional programming, immutability is not the goal in itself. The goal is rather to ensure that half-baked mutable data structures do not escape the narrow scope of the algorithm for which mutability is advantageous. If you find a way to ensure that, then you can write purely functional programs that make use of mutable memory.
Your confusion comes from promiscuously mixing levels of abstraction.
How is memory allocation handled in your favorite OO garbage-collected language (Python, Java, Ruby, etc)? You don't know. That detail is left to the compiler and/or runtime. You are confusing the semantics of a programming language with an implementation detail for a compiler of that language. I will grant that C/C++ blur the distinction considerably, but that blurring is probably the most salient feature of those languages at this point.
Consider a common associative data structure, the C struct:
struct address
{
    char number[10];
    char street[100];
    char city[50];
    char state[15];
};
We know, in advance, what this will look like in memory. But consider a similar data structure in, say, Java:
public class Record {
    public int number;
    public String street;
    public String city;
    public String state;
}
How is that going to be laid out in memory? You don't know. Even if you replace the Strings with character buffers, you still don't really know. Obviously javac and the JVM make it happen. It's no different with persistent data structures in functional languages: where stuff gets put in memory is up to the compiler and runtime, which are not bound by the semantics of the language being compiled.

What assumptions could a garbage collector for a pure, functional, eagerly-evaluated language safely make?

Clarifying the question a bit:
Garbage collectors such as those used by the JVM involve a lot of complexity as a result of the nature of the languages they support. What simplifications would be afforded to a garbage collector purpose-built for a pure, functional, eagerly-evaluated programming language compared to say, the JVM garbage collector?
I'm barely an expert in functional language design, but when thinking about your question, the following topics immediately come to mind:
most probably it would be a generational GC; at least I see no reason why it should not be. It could probably benefit from tuning for a large number of short-lived temporary objects.
no write barriers - due to immutability it is not possible to create a reference from an older object to a newer one. No old-to-young references means no need for remembered sets in a generational GC, and thus no write barriers to maintain them. This is a great simplification, in my humble opinion.
easier safe points - due to the nature of functional languages, function calls are much denser than in object-oriented programming; even loops may be defined as recursive function calls. This should make implementing GC safe points easier - for example, simply at each function entry. For example, read this article as a reference.
no pinning - if our hypothetical pure functional language does not support native-code interoperation, object pinning will not be necessary in the case of a compacting GC. This can greatly simplify its design.
no finalization - object finalization probably would not fit into a purely functional language; I feel it breaks referential transparency. And if we do not support native resources, it is not needed in the first place.

Are there any benefits to GC performance from immutable objects (functional languages)?

Functional languages are rising in popularity partly because they offer an effective way to utilize multi-core CPUs (the immutability invariant provides guarantees that enable some optimizations), but are there any benefits to garbage-collector performance from immutability?
UPDATE: During my search I found only one argument - the possibility of avoiding the write barrier in the GC algorithm (in the sweep stage only; during the compaction/defragmentation stage the write barrier is still needed, but that happens less often).
In the absence of pointer comparison, immutable objects can be passed by value, and thus heap allocation may not be necessary if they can live on the stack or be embedded in other objects. By eliminating them as referenceable objects, you eliminate references that the GC has to traverse.
Even with pointer comparison, immutable objects still encourage a programming style that can be friendlier to escape analysis / automatic stack allocation in some cases.
Additionally, immutable objects remove the need for the defensive copying that may be necessary when returning mutable data from a public interface.
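To make the defensive-copying point concrete, here is a small OCaml sketch (the types and function names are hypothetical, for illustration only): with a mutable internal representation, an accessor must copy before handing data out, whereas an immutable representation can be returned directly.

type mutable_history = { mutable entries : float array }
type immutable_history = { entries_list : float list }

(* Must copy, or callers could mutate our internal state: *)
let get_entries_mutable h = Array.copy h.entries

(* No copy needed; the caller cannot change the list: *)
let get_entries_immutable h = h.entries_list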
While I am still searching for benefits, I found some arguments against the performance of working with immutable objects.
Bergi suggests investigating the Haskell memory model. There, immutability requires the creation of an enormous number of new objects, so there is a lot of work for the GC; each allocation is fast, but the total amount of work is larger.
I also found another argument against immutable structures here (it is not directly related to GC, but it is related to one of the most important data structures):
http://concurrencyfreaks.blogspot.com/2013/10/immutable-data-structures-are-not-as.html
The article shows an example of a big tree that fits in the L2 CPU cache. After a modification, a mutable tree requires only 1 node insertion/deletion, whereas an immutable tree implementation requires inserting/deleting O(log(N)) nodes into the L2 cache. That may greatly reduce performance on tree structures.
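The O(log(N)) figure comes from path copying. As a rough sketch (an unbalanced binary search tree in OCaml, ignoring rebalancing), an insertion rebuilds only the nodes on the search path and shares every other subtree with the old version:

(* Path copying: only the O(height) nodes on the search path are rebuilt. *)
type tree = Leaf | Node of tree * int * tree

let rec insert x = function
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, v, r) as node ->
      if x < v then Node (insert x l, v, r)       (* new node; r is shared *)
      else if x > v then Node (l, v, insert x r)  (* new node; l is shared *)
      else node                                   (* already present: share everything *)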

Why is OCaml still reasonably fast while constantly creating new things?

OCaml is functional, so in many cases all the data is immutable, which means it is constantly creating new data, copying data to new memory, and so on.
However, it has the reputation of being fast.
Quite a number of talks about OCaml say that although it constantly creates new things, it is still fast. But I can't find an explanation of why anywhere.
Can someone summarise why it is fast even in the functional style?
Also, you should know that copies are not made nearly as often as you might think. Only the changed part of an immutable data structure has to be updated. For example, say you have an immutable set x. You then define y to be x with one additional item in it. The set y will share most of its underlying representation with x even though semantically x and y are completely different sets. The usual reference for this is Okasaki's Purely Functional Data Structures.
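As a concrete OCaml illustration of that sharing (using the standard library's Set module, which is an immutable balanced tree; the Int module here assumes a reasonably recent OCaml):

module IntSet = Set.Make (Int)

let x = IntSet.of_list [1; 2; 3]
let y = IntSet.add 4 x          (* x is untouched; y shares most of x's nodes *)

let () =
  assert (IntSet.cardinal x = 3);
  assert (IntSet.cardinal y = 4)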
I think the essence, as Jerry101 points out, is that you can make GC a lot faster if it's known to be working in an environment where virtually all objects are immutable and short-lived. You can use a generational collector, and virtually none of the objects make it out of the first generation. This is surprisingly fast.
OCaml has mutable values as well. For some cases (I would expect they are rare in practice) you could find that using mutability makes your code slower because GC is tuned for immutability.
OCaml doesn't have concurrent GC. That's something that would be great to see.
Another angle on this is that the OCaml implementors (Xavier Leroy et al) are excellent :-)
The Real World OCaml book seems to have a good description of GC in OCaml. Here's a link I found: https://realworldocaml.org/v1/en/html/understanding-the-garbage-collector.html
I'm not familiar with OCaml but here's some general background on programming language VM (including garbage collection) speed.
One aspect is to dig into the claims -- "fast" compared to what?
In one comparison, the summary is "D is slower than C++ but D programs are faster than C++ programs." The micro-benchmarks in D are slower but it's easier to see the big picture while programming and thus use better algorithms, avoid duplicate work, and not have to work around C++ rough edges.
Another aspect is to realize that modern garbage collectors are quite fast, that concurrent garbage collectors can do most of the work in another thread thus making use of multiple CPU cores in a way that saves time for the "mutator" threads, and that memory allocation in a GC environment is faster than C's malloc() because it doesn't have to search for a suitable memory gap.
Also functional languages are easy to parallelize.

Are there any functional languages that don't have garbage collection?

Or even heavily functional styles in non-functional / non-memory-managed languages.
What sort of techniques are there to deal with problems like intermediate garbage? Cleaning up memory allocated for laziness/thunks? Performance (since you can't easily share resources between immutable values if you have to track each one's lifetime in order to deallocate it - smart pointers?)?
You might be interested in programming languages with linear or uniqueness types; these can manage resources (and memory in particular). Recent examples: ATS and LinearML.
There have been attempts at "region-based memory management" (e.g. Cyclone), but they haven't taken off just yet -- regions allow for (earlier) memory reclamation, but they aren't enough on their own (e.g., there are programs which, when run with region-based memory management, will exhibit unacceptable performance). The two schemes could be mixed, I think.
Back to your question, some ATS programs can run without garbage collection. (I won't say that such programs are written in "functional" style, such as in SML, but in a mix of imperative and first-order functional style.)
The only relevant thing I can think of is how MLton eliminates a significant part of garbage collection with a region analysis. It should be possible, in theory, to implement a compiler that treats an unmanageable, un-annotated pointer leak as an error, and then one would be able to use many functional programming techniques in an entirely manual memory-management setting.
