Efficiency of list operations in functional languages - functional-programming

In functional languages like Racket or SML, we usually perform list operations through recursive calls (pattern matching, list append, list concatenation...). However, I'm not sure how these operations are generally implemented in functional languages. Will operations that create, update or delete elements in a list return a whole new copy of the list? I once read in a book an example of a disadvantage of functional programming: every time a database is updated, a whole new copy of the database is returned.
I questioned this example: since data in FP is inherently immutable, creating lists from existing lists should not create a whole new copy. Instead, a new list is simply a different collection of references to existing objects in other lists, based on filtering criteria.
For example, given list A = [a,b,c] and list B = [1,2,3], I create a new list that contains the first two elements of each, that is C = [a,b,1,2]. This new list simply contains references to a, b from A and 1, 2 from B. It should not be a new copy, because the data is immutable.
So, to update an element in a list, it should only take a linear amount of time to find the element, create a new value, and create a new list with the same elements as the old list except the updated one. To create a new list, the running environment merely updates the next pointer of the previous element. If a list holds non-atomic elements (e.g. lists, trees...), and only one atomic element inside one of those non-atomic elements is updated, this process is applied recursively to the non-atomic element until the atomic element is updated as described above. Is this how it should be implemented?
If someone creates a whole deep copy of a list every time a list is created from existing lists, or an element is added, updated or deleted, they are doing it wrong, aren't they?
Another thing: when the program environment is updated (e.g. a new key/value entry is added for a new variable, so we can refer to it later), that doesn't violate the immutability property of functional programming, does it?

You are absolutely correct! FP languages with immutable data will NEVER do a deep copy (unless they are really poorly implemented). As the data is immutable, there are never any problems in reusing it. It works in exactly the same way with all other structures. For example, if you are working with a tree structure, then at most the tree structure itself (its nodes) will be copied, never the data contained in it.
So while the copying sounds very expensive, it costs much less than you would first think if you are coming from an imperative/OO background (where you really do have to copy, as you have mutable data). And there are many benefits in having immutable data.
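To make the sharing concrete, here is a minimal sketch in Python (chosen only for illustration; Racket and SML implementations differ in detail). The names Cons, from_iterable, update and to_list are hypothetical helpers, not part of any library. Updating one position allocates new cells only for the prefix up to that position; the cells after it and all element values are reused.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable cell of a singly linked list (hypothetical helper)."""
    head: object
    tail: Optional["Cons"]


def from_iterable(items):
    """Build a cons list from any Python iterable."""
    result = None
    for item in reversed(list(items)):
        result = Cons(item, result)
    return result


def update(lst, index, value):
    """Return a new list with position `index` replaced by `value`.

    Only the cells before and at `index` are rebuilt; the rest of the
    list is shared with the original.
    """
    if lst is None:
        raise IndexError(index)
    if index == 0:
        return Cons(value, lst.tail)  # the old tail is reused, not copied
    return Cons(lst.head, update(lst.tail, index - 1, value))


def to_list(lst):
    """Convert a cons list to a Python list for inspection."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


payload = ["some", "large", "value"]
old = from_iterable([payload, "b", "c", "d"])
new = update(old, 1, "B")

# The first element is the same object in both lists (no deep copy),
# and the cells after the updated position are shared as well.
same_element = new.head is old.head
same_suffix = new.tail.tail is old.tail.tail
```

The old list stays valid and unchanged after the update, which is what makes this kind of sharing safe.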

Related

Why not use vectors and lists together?

This might be quite a stupid question, but knowing the poor efficiency of searching for an element inside a list (singly-linked or doubly-linked), why not use a vector or dynamic array to store the elements of the list in order, thereby making it easier to access elements?
Linked lists used to be more important because they are stored non-contiguously, which is better for memory management. Linked lists and vectors/arrays both have a search time complexity of O(n). It's only faster to access array elements if you know the index in advance. Linked lists are for niche cases where you are frequently inserting elements at the beginning of the sequence. Linked lists let you do this in O(1) time, as opposed to O(n) for arrays, because in an array the other elements need to be shifted.
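The trade-off described above can be sketched in Python. Node, prepend and find are hypothetical helpers written for this illustration, and the built-in list plays the role of the dynamic array.

```python
class Node:
    """One cell of a singly linked list (hypothetical helper)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def prepend(head, value):
    """O(1): allocate one node that points at the old head."""
    return Node(value, head)


def find(head, target):
    """O(n): walk the nodes; return the position of target, or -1."""
    position = 0
    node = head
    while node is not None:
        if node.value == target:
            return position
        node = node.next_node
        position += 1
    return -1


# Linked list: three O(1) insertions at the front give 1 -> 2 -> 3.
linked = None
for value in (3, 2, 1):
    linked = prepend(linked, value)

# Dynamic array: inserting at the front shifts every existing element.
array = [1, 2, 3]
array.insert(0, 0)      # O(n) because of the shift
third = array[2]        # O(1) when the index is already known
where = array.index(3)  # O(n) search, just like the linked list
```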

In OCaml, how to store "pointers" to elements of a list?

I have a list of some data type. However, I also want to index the elements of that list with a trie, so I can do more efficient lookups. But I don't want to store the same elements twice, so I want to store the elements in the list, and in the trie I store pointers to elements, in the leaf nodes. Is this possible? I could store the index of the element in the list, however getting an element of a linked list by index is slow, so that won't do.
Apologies if this is a misunderstanding of the OCaml memory model.
Just store the element. Under the hood, this doesn't copy the value, it just copies a pointer to the value, except for values that are stored in a single memory word (just like a pointer).
In other words, things like let b = a do not make a copy of a. They make b an alias of a.
Values are automatically shared in OCaml. The only case where you wouldn't want sharing is a mutable object (a reference, or a structure or object with mutable fields). If you want two mutable objects with the same current value, but such that an assignment will only affect one of them, then you need to make a copy.
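The same sharing behaviour can be observed in Python, used here only as an analogy to the OCaml model described above. In this sketch the records are hypothetical, and a dict keyed by first letter is an assumed stand-in for the trie from the question.

```python
import copy

# Hypothetical records; any objects would behave the same way.
records = [{"name": "apple"}, {"name": "avocado"}, {"name": "banana"}]

# A dict keyed by first letter stands in for the trie in the question.
index = {}
for record in records:
    index.setdefault(record["name"][0], []).append(record)

# The index holds the very same objects as the list, not copies.
shared = index["b"][0]

# A mutation through one reference is visible through the other,
# which is why sharing only matters for mutable values.
shared["name"] = "blueberry"

# An explicit copy gives an independent object.
independent = copy.copy(records[0])
independent["name"] = "apricot"
```

Storing the element in both structures therefore costs one extra reference per element, not a second copy of the data.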

What is side effects in functional programming?

I am new to Java 8, and I saw a definition related to functional programming: "A program created using only pure functions, no side effects allowed".
One of the side effects is "Modifying a data structure in place".
I don't understand this, because at some point we need to talk to a database to store, retrieve or update data.
If modifying a database is not functional, how do we talk to a database in functional programming?
"Modifying a data structure in place" means you directly manipulate the input data structure (e.g. a List). "Pure functions" means:
the result is only a function of its input and not of some other hidden state
the function can be applied multiple times on the same input, producing the same result. It will not change the input.
In Object Oriented Programming, you define behaviour of objects. Behaviour could be to provide read access to the state of the object, write access to it, or both. When combining operations of different concerns, you could introduce side effects.
For example, take a Stack and its pop() operation. It will produce different results for every call because it changes the state of the stack.
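A minimal sketch of that difference, written in Python for brevity rather than Java. The Stack class and pure_pop function are hypothetical names for this illustration: the first mutates hidden state, the second returns the top element together with a new stack and leaves its input alone.

```python
class Stack:
    """Stateful stack: pop() changes the object it is called on."""

    def __init__(self, items):
        self._items = list(items)

    def pop(self):
        return self._items.pop()  # side effect: the stack shrinks


def pure_pop(stack):
    """Pure variant: `stack` is a tuple and is never modified.

    Returns (top, rest), where `rest` is a new tuple without the top.
    """
    if not stack:
        raise IndexError("pop from empty stack")
    return stack[-1], stack[:-1]


stateful = Stack([1, 2, 3])
first_call = stateful.pop()
second_call = stateful.pop()   # same call, different result

immutable = (1, 2, 3)
first_pure = pure_pop(immutable)
second_pure = pure_pop(immutable)  # same input, same result
```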
In functional programming, you apply functions to immutable values. Functions represent a flow of data, not a change in state. So functions themselves are stateless. And the result of a function is either the original input or a different value than the input, but never a modified input.
OO also knows functions, but those aren't pure in all cases. Take sorting, for example: in non-functional programming you rearrange the elements of a list in the original data structure ("in place"). In Java, this is what Collections.sort() does.
In functional programming, you would apply the sort function on an input value (a List) and thereby produce a new value (a new List) with sorted values. The function itself has no state and the state of the input is not modified.
So to generalize: given the same input value, applying a function to this value produces the same result value
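The two styles of sorting can be contrasted in a short Python sketch (Python's list.sort() plays the role of Java's Collections.sort() here; sort_in_place and sort_pure are hypothetical wrapper names for this illustration).

```python
def sort_in_place(values):
    """Imperative style: rearranges the caller's list and returns nothing."""
    values.sort()


def sort_pure(values):
    """Functional style: returns a new sorted list; the input is untouched."""
    return sorted(values)


original = [3, 1, 2]
result = sort_pure(original)      # original is still [3, 1, 2]
again = sort_pure(original)       # same input, same result

mutated = [3, 1, 2]
sort_in_place(mutated)            # mutated itself is now sorted
```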
Regarding the database operations: the contents of the database represent a state, which is the combination of all its stored values, tables etc. (a "snapshot"). Of course you could apply a function to this data, producing new data. Typically you store the results of operations back to the db, thus changing the state of the entire system, but that doesn't mean you change the state of the function or its input data. Reapplying the function doesn't violate the pure-function constraints, because you apply the function to new input data. But looking at the entire system as a "data structure" would violate the constraint, because the function application changes the state of the "input".
So the entire database system could hardly be considered functional, but of course you could operate on the data in a functional way.
But Java allows you to do both (OO and FP) and even mix both paradigms, so you could choose whatever approach fits your needs best.
Or, to quote from this answer:
If you have several needs intermixed, mix your paradigms. Do not
restrict yourself to only using the lower right corner of your
toolbox.

LinkedHashMap's impl - Uses Double Linked List, not a Single Linked List; Why

According to the documentation of LinkedHashMap, a doubly linked list (DLL) is maintained internally.
I was trying to understand why a DLL was chosen over a singly linked list (SLL).
The biggest advantage I see with a DLL is traversing backwards, but I don't see any use case where LinkedHashMap exploits this advantage, since there is no previous() sort of operation like next() in the Iterable interface.
Can anyone explain why a DLL was chosen, and not an SLL?
It's because with an additional hash map, you can implement deletes in O(1). If you are using a singly linked list, delete would take O(n).
Consider a hash map for storing key/value pairs, and another internal hash map with keys pointing to nodes in the linked list. When deleting from a doubly linked list, I can get from the node straight to its previous element and make it point to the following element. With a singly linked list this is not possible without walking the list from the head to find the previous element.
http://www.quora.com/Java-programming-language/Why-is-a-Java-LinkedHashMap-or-LinkedHashSet-backed-by-a-doubly-linked-list
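The idea in this answer can be sketched as follows. This is an illustration in Python, not Java's actual LinkedHashMap implementation; LinkedMap and _Node are hypothetical names, and the Python dict is used purely as the hash lookup from key to node. Because each node knows both neighbours, removal is a constant number of pointer updates.

```python
class _Node:
    """List node carrying one entry plus links to both neighbours."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LinkedMap:
    """Insertion-ordered map: hash lookup plus a doubly linked list."""

    def __init__(self):
        self._nodes = {}               # key -> node
        self._head = _Node()           # sentinel; list is circular
        self._head.prev = self._head
        self._head.next = self._head

    def put(self, key, value):
        node = self._nodes.get(key)
        if node is not None:
            node.value = value         # existing key keeps its position
            return
        node = _Node(key, value)
        last = self._head.prev
        last.next = node
        node.prev = last
        node.next = self._head
        self._head.prev = node
        self._nodes[key] = node

    def remove(self, key):
        """O(1): find the node by hash, then unlink it via prev/next."""
        node = self._nodes.pop(key)    # raises KeyError if absent
        node.prev.next = node.next
        node.next.prev = node.prev
        return node.value

    def keys(self):
        """Keys in insertion order."""
        result = []
        node = self._head.next
        while node is not self._head:
            result.append(node.key)
            node = node.next
        return result


m = LinkedMap()
for k, v in (("a", 1), ("b", 2), ("c", 3)):
    m.put(k, v)
removed = m.remove("b")   # no traversal needed to find b's predecessor
```

With only a next pointer, remove() would have to walk from the head to find the node before the one being deleted, which is the O(n) cost mentioned above.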

Order of QObject children (strategy question)

For one of my projects I have a tree of QObject derived objects, which utilize QObject's parent/child functionality to build the tree.
This is very useful, since I make use of signals and slots, use Qt's guarded pointers and expect parent objects to delete children when they are deleted.
So far so good. Unfortunately now my project requires me to manage/change the order of children. QObject does not provide any means of changing the order of its children (exception: QWidget's raise() function - but that's useless in this case). So now I'm looking for a strategy of controlling the order of children. I had a few ideas, but I'm not sure about their pros & cons:
Option A: Custom sort index member variable
Use an int m_orderIndex member variable as a sort key and provide a sortedChildren() method which returns a list of QObjects sorted by this key.
Easy to implement into existing object structure.
Problematic when the QObject::children() method is overridden: this will lead to problems during loops when the items' order is changed, and it is also more expensive than the default implementation.
Should fall back to QObject object order if all sort keys are equal or 0/default.
Option B: Redundant list of children
Maintain a redundant list of children in a QList, and add children to it when they are created and destroyed.
Requires expensive tracking of added/deleted objects. This basically leads to a second child/parent tracking and lot of signals/slots. QObject does all of this internally already, so it might not be a good idea to do it again. Also feels like a lot of bloat is added for a simple thing like changing the order of children.
Good flexibility, since a QList of children can be modified as needed.
Allows a child to be in the QList more than one time, or not at all (even though it might be still a child of the QObject)
Option C: ...?
Any ideas or feedback, especially from people who already solved this in their own projects, is highly appreciated. Happy new year!
I spent a lot of time going through all these options in the past days and discussed them carefully with some other programmers. We decided to go for Option A.
Each of the objects we are managing is a child of a parent object. Since Qt does not provide any means of re-ordering these objects, we decided to add an int m_orderIndex property to each object, which defaults to 0.
Each object has an accessor function sortedChildren() which returns a QObjectList of the children. What we do in that function is:
Use the normal QObject::children() function to get a list of all available child objects.
dynamic_cast all objects to our "base class", which provides the m_orderIndex property.
If the object is castable, add it to a temporary object list.
Use qSort with a custom LessThan function to find out if qSort needs to change the order of two objects.
Return the temporary object list.
We did this for the following reasons:
Existing code (especially Qt's own code) can continue using children() without having to worry about side effects.
We can use the normal children() function in places where the order does not matter, without having any performance loss.
In the places where we need the ordered list of children, we simply replace children() by sortedChildren() and get the desired effect.
One of the good things about this approach is that the order of children does not change if all sort indices are set to zero.
Sorry for answering my own question, hope that enlightens people with the same problem. ;)
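The steps listed above can be sketched in Python, as a language-neutral illustration rather than the Qt code itself. Item stands in for the QObject-derived base class, order_index for m_orderIndex, and isinstance for the dynamic_cast check; all of these names are assumptions. Python's sorted() is stable, which is the property that keeps the original order when all indices are equal (that stability is an assumption of this sketch, not a statement about qSort).

```python
class Item:
    """Hypothetical stand-in for the QObject-derived base class."""

    def __init__(self, name, order_index=0):
        self.name = name
        self.order_index = order_index  # plays the role of m_orderIndex
        self._children = []

    def add_child(self, child):
        self._children.append(child)

    def children(self):
        """Unsorted children, in insertion order (like QObject::children())."""
        return list(self._children)

    def sorted_children(self):
        """Children that are Items, ordered by order_index."""
        # Keep only children of the expected base type (the dynamic_cast step).
        candidates = [c for c in self.children() if isinstance(c, Item)]
        # Stable sort: equal indices keep their original relative order.
        return sorted(candidates, key=lambda c: c.order_index)


parent = Item("parent")
parent.add_child(Item("a"))
parent.add_child(object())          # a child of some unrelated type
parent.add_child(Item("b"))
parent.add_child(Item("c", order_index=-1))

names = [child.name for child in parent.sorted_children()]
```

Callers that do not care about order keep using children(), so the sort cost is only paid where sorted_children() is called.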
What about something like...
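QObjectList listChildren = children();  // children() returns a const reference; this takes a copy
std::sort(listChildren.begin(), listChildren.end(), lessThan);  // lessThan: your own comparator (hypothetical name)
for (QObject *child : listChildren) child->setParent(TempParent);  // detach in sorted order
for (QObject *child : listChildren) child->setParent(OriginalParent);  // re-attach in sorted order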
A nasty hack: QObject::children() returns a reference-to-const. You could cast away the const-ness and thus manipulate the internal list directly.
This is pretty evil though, and has the risk of invalidating iterators which QObject keeps internally.
I do not have a separate option C, but comparing options A and B, you are talking about ~4 bytes (32-bit pointer, 32-bit integer) in either case, so I'd go with option B, as you can keep that list sorted.
To avoid the additional complexity of tracking children, you could cheat and keep your list sorted and tidy, but combine it with a sortedChildren method that filters out all the non-children. Complexity-wise, this ought to end up around O(n log m) (n = children, m = list entries, assuming that m >= n, i.e. children are always added) unless you have a large turnover of children. Let's call this option C.
Quicksort, in your suggested option A, gives you O(n^2) in the worst case, but also requires you to retrieve pointers, trace them, retrieve an integer, etc. The combined method only needs a list of the pointers (ought to be O(n)).
I had the same problem and I solved it by Option B. Tracking isn't that difficult, just create a method "void addChild(Type *ptr);" and another one to delete a childitem.
You don't suffer from evil redundancy if you exclusively store the children within the private/public child list (QList) of each item and drop the QObject base. It's actually quite easy to implement auto-child-free on free (though that requires an extra parent pointer).

Resources