Are Lists just a hack to represent sequences functionally? - functional-programming

I'm very interested in functional programming as a way to represent abstractions without lying, for the sake of convenience, about what they truly are. Something feels off to me about lists (in the way they are recursively defined in functional programming).
Why do functional programming languages generally have lists defined with an Empty case? Is a collection of things really a sum-type? Or is my conception of a list separate from what this is:
type List = Empty | Element head (List tail)
I know that pattern matching makes the above safe to do, but to me it seems analogous to making all types Option types by default.
Is there a term for the right-hand side of this List definition?
Also is this List something from Mathematics? Are we representing things which are truly sequences as lists for convenience?

A very natural way to define a type of finite lists is to state: a list is either empty, or the addition of an element at (say) the front of an existing list. This is what the recursive sum type you refer to represents. An option type is simply another kind of sum type that represents two possibilities: either it does not contain a payload, or it does. The two types, lists and options, represent different things.
I have sometimes seen constructors of recursive types, like lists, classified as base constructors or recursive constructors, paralleling their use in proofs by structural induction. This makes clear what you mean, although other people may use slightly different terms.
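For instance, here is a minimal Haskell sketch of that distinction (the names List, Nil, Cons and len are just illustrative, not the standard library's list):
data List a = Nil              -- base constructor: the empty list
            | Cons a (List a)  -- recursive constructor: an element in front of a list
-- Structural recursion mirrors structural induction: a base case,
-- then the recursive case in terms of the strictly smaller tail.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs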
On the one hand, the type of lists thus defined is a perfectly legitimate mathematical entity. On the other, you wonder whether lists are an adequate model of sequences. In programming terms, answering this question involves defining the abstract data type of sequences, including their desired properties, and proving that implementing sequences via lists satisfies those properties. For example, from the article you link:
The number of elements (possibly infinite) is called the length of the sequence.
So, if you want to represent infinite sequences, the type of finite lists by itself will be insufficient. The section on formal definitions in the same article considers sequences as functions, and this may be another way to model them. Finite lists are a simple, reasonable candidate to represent finite sequences.
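As a rough sketch of that function-based model, assuming positions are integers (Seq, naturals and fromList are made-up names for illustration):
-- A sequence modelled as a function from positions to elements;
-- unlike a finite list, this can represent infinite sequences.
type Seq a = Integer -> a

naturals :: Seq Integer
naturals n = n                 -- the sequence 0, 1, 2, ...

-- A finite list still determines a sequence, partial beyond its length.
fromList :: [a] -> Integer -> Maybe a
fromList xs n
  | n < 0 || n >= fromIntegral (length xs) = Nothing
  | otherwise                              = Just (xs !! fromIntegral n)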

Related

Functional Programming: why pair as a basic constructed unit?

The basic cons cell sticks together two arbitrary things and is the basic unit that allows the construction of linked lists and arbitrary data objects. Question: is there a reason to stick with this simplistic language design decision (for instance, in all Lisp families)?
Why not use fixed-length arrays for this purpose (or some nested stacks)? I can't foresee any problems with that, and there would be clear advantages: more densely "packed" memory, fewer pointer dereferences, and fewer "dead-weight" cons cells just to define the hierarchy of the data.
You have titled your question “Functional Programming: why pair as a basic constructed unit?”, but this title does not correctly reflect the fact that many important and well-known functional languages (e.g. Haskell, F#, Scala, SML, Clojure, etc.) have either algebraic data types or a different collection of data structures, in which the pair is just one of several kinds of constructors, if it is available at all. The situation is similar for other multi-paradigm languages that support functional programming, like C++, Java, Objective-C, Swift, etc.
In all these cases the pair, if present, is exactly as “basic” as an array, a record, a list, or any other kind of data constructor.
What is left is the family of Lisp languages, notably Common Lisp and Scheme, which, besides having a rich set of data structures like those cited in the comment by Rainer Joswig, use the pair for an important task: as the basic data constructor for representing programs.
The fact that Lisp code is an s-expression (that is, a list of lists and atoms) has fundamental consequences, the most notable being macro systems, which allow programmers to easily create new syntax, or even new domain-specific languages.
Renzo's answer about other structures in functional programming is spot on. Functional programming is about aligning programming with logic and mathematics, where expressions denote values, and there's no such thing as a side effect. (Of course, in practice, we need side effects for I/O, etc.) Functional programming doesn't require the singly linked list as a fundamental construct.
Lists are one of the things that make Lisps lispy, though.
One of the reasons that pairs are so common in the Lisp family of languages may be that ordered pairs are very easy to implement in the lambda calculus that Lisps are inspired by. (I say "inspired by" rather than "based on" because after the syntax and the use of lambda to denote anonymous functions, there's plenty of difference, and it's best not to assume that things about one apply to the other.) See the answer to Use of lambda for cons/car/cdr definition in SICP for a quick lesson in how cons, car, and cdr can be implemented using nothing but lexical closures.
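For illustration, here is a rough Haskell transliteration of the idea (the linked answer gives the Scheme version; the types here are deliberately simplified):
-- A pair represented purely as a closure: the "pair" remembers x and y
-- and hands them to whatever selector function it is applied to.
cons :: a -> b -> ((a -> b -> c) -> c)
cons x y = \selector -> selector x y

car :: ((a -> b -> a) -> a) -> a
car pair = pair (\x _ -> x)

cdr :: ((a -> b -> b) -> b) -> b
cdr pair = pair (\_ y -> y)

-- car (cons 1 2) evaluates to 1; cdr (cons 1 2) evaluates to 2.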

LISP - destructive and non-destructive constructs

What is the correct definition of destructive and non-destructive constructs in LISP (or in general)? I have tried to search for the actual meaning, but I have only found a lot of usage of these terms without any actual explanation.
My understanding is that a destructive function is one that changes the meaning of the construct (or variable): so when I pass a list as a parameter to a function which changes it, that is called a destructive operation, because it changes the initial list instead of returning a brand new one. Is this right, or are there exceptions?
So is, for example, set a destructive function (because it changes the value of x)? I think not, but I do not know how I would justify this.
(set 'x 1)
Sorry for probably a very basic question.... Thanks for any answers!
I would not read too much into the word 'destructive'.
In list processing, a destructive operation is one that potentially changes one or more of the input lists as a visible side effect.
Now, you can widen the meaning to operations over arrays, structures, CLOS objects, etc. You can also call variable assignment 'destructive' and so on.
In Common Lisp, it makes sense to talk about destructive operations over sequences (which are lists, strings, and vectors in general) and multi-dimensional arrays.
Practical Common Lisp distinguishes two kinds of destructive operations: for-side-effect operations and recycling operations.
set is destructive and for-side-effect: it always modifies its first argument. Beware that it changes the binding for a symbol, but not the thing currently bound to that symbol. setf can change either bindings or objects in place.
By contrast, nreverse is recycling: it is allowed to modify its argument list, although there's no guarantee that it will, so it should be used just like reverse (take the return value), except that the input argument may be "destroyed" and should no longer be used. [Scheme programmers may call this a "linear update" function.]

Is the concept of Algebraic Data Type akin to Class definitions in OO languages?

Both concepts allow new data types to be created.
The only difference I can see is that in functional languages, one can perform pattern matching on algebraic data types. But there is no comparable succinct feature for OO languages. Is this an accurate statement?
Algebraic data types are so named because they form an "initial algebra",
+ represents sum types (disjoint unions, e.g. Either).
• represents product types (e.g. structs or tuples)
X for the singleton type (e.g. data X a = X a)
1 for the unit type ()
and μ for the least fixed point (e.g. recursive types), usually implicit.
From these operators, all regular data types can be constructed.
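For instance, the ordinary list type can be read directly in this algebra (an informal sketch; the equation is notation, not code):
-- Informally:  List a  =  1 + a • (List a)  =  μ X. 1 + a • X
-- i.e. a list of a's is either the unit (the empty list)
-- or the product of an a with another list.
data List a = Nil | Cons a (List a)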
Algebraic data types also support parametric polymorphism -- meaning they can be used as containers for any underlying type, with static guarantees of safety. Additionally, ADTs are provided with uniform syntax for introducing and eliminating data types (via constructors and pattern matching). E.g.
-- this defines a tree
data Tree a = Empty | Node a (Tree a) (Tree a)
-- this constructs a tree
let x = Node 1 (Node 2 Empty) Empty
-- this deconstructs a tree
f Empty        = 0
f (Node a l r) = a + (f l) + (f r)
The richness and uniformity of algebraic data types, along with the fact they're immutable, distinguish them from OO objects, which largely:
only represent product types (so no recursive or sum-types)
do not support pattern matching
are mutable
do not support parametric polymorphism
I can see three major differences between algebraic data types and OO-style classes, not counting (im)mutability because that varies.
Algebraic data types allows sums as well as products, whereas OO-style classes only allow products.
OO-style classes allow you to bundle a complex data item with its accepted operations, whereas algebraic data types don't.
Algebraic data types don't distinguish between the data passed to the constructor and the data stored in the resulting value, whereas OO-style classes do (or can).
One thing I deliberately left out of that list was subtyping. While the vast majority of OO languages allow you to subclass (non-final, non-sealed, currently accessible) classes, and the vast majority of generally ML-family functional languages do not, it is clearly possible to forbid inheritance completely in a hypothetical OO (or at least OO-like) language, and it is likewise possible to produce subtyping and supertyping in algebraic data types. For a limited example of the latter, see this page on O'Haskell, which has been succeeded by Timber.
A class is more than just a type definition -- classes in most OO languages are really kitchen sink features that provide all sorts of loosely related functionality.
In particular, classes act as a kind of module, giving you data abstraction and namespacing. Algebraic data types don't have this built in; modularity is usually provided as a separate, orthogonal feature (usually a module system).
In some sense one can see it this way. Every language has only so many mechanisms to create user-defined types. In functional (ML- and Haskell-style) languages, the only one is the creation of an ADT (Haskell's newtype can be seen as a degenerate case of an ADT). In OO languages, it's classes. In procedural languages it is the struct or record.
It goes without saying that the semantics of a user-defined data type vary from language to language, and much more so between languages of different paradigms. Pharien's Flame has already outlined typical differences.
Is the concept of Algebraic Data Type akin to Class definitions in OO languages?
in functional languages, one can perform pattern matching on algebraic data types. But there is no comparable succinct feature for OO languages. Is this an accurate statement?
That is a part of it.
As Andreas said, classes are a kitchen sink feature in statically-typed object oriented languages derived from Simula like C++, Java and C#. Classes are a jack of all trades but master of none feature in this respect: they solve many problems badly.
Comparing vanilla ML and Haskell with OO as seen in C++, Java and C# you find:
Classes can contain other classes whereas algebraic datatypes can refer to each other but cannot contain the definitions of each other.
Class hierarchies can be arbitrarily deep whereas algebraic datatypes are one-level deep: the type contains its type constructors and that is it.
New classes can be derived from old classes so classes are extensible types whereas algebraic datatypes are usually (but not always) closed.
So ADTs are not really "akin" to classes because they only solve one specific problem: class hierarchies that are one level deep. In this sense we can see two approximate observations:
ADTs require composition over inheritance.
Classes make it easy to extend a type but hard to extend the set of member functions whereas ADTs make it easy to extend functions over the type but hard to extend the type.
You might also look at the GoF design patterns. They have been expressed using classes in C++. The functional equivalents are not always ADTs but, rather, things like lambda functions instead of the command pattern and higher-order functions like map and fold instead of the visitor pattern and so on.
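To make that last point concrete, here is a sketch of the fold-instead-of-visitor idea, reusing the Tree type from the earlier example (foldTree is an illustrative helper, not a standard function):
data Tree a = Empty | Node a (Tree a) (Tree a)   -- repeated from the example above

-- The "visitor" is just a pair of handlers, one per constructor.
foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree empty _    Empty        = empty
foldTree empty node (Node a l r) = node a (foldTree empty node l)
                                          (foldTree empty node r)

-- New operations over the (closed) type are just new calls to the fold.
sumTree :: Num a => Tree a -> a
sumTree = foldTree 0 (\a l r -> a + l + r)

size :: Tree a -> Int
size = foldTree 0 (\_ l r -> 1 + l + r)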

What is the disadvantage of list as a universal data type representation?

Lisp programmers tend to use lists to represent all other data types.
However, I have heard that lists are not a good universal representation for data types.
What are the disadvantages of lists being used in this manner, in contrast to using records?
You mention "record". By this I take it that you're referring to fixed-element structs/objects/compound data. For instance, in HtDP syntax:
;; a packet is (make-packet destination source text) where destination is a number,
;; source is a number, and text is a string.
... and you're asking about the pros and cons of representing a packet as a list of length three, rather than as a piece of compound data (or "record").
In instances where compound data is appropriate--the values have specific roles and names, and there are a fixed number of them--compound data is generally preferable; it helps you catch errors in your programs, which is the sine qua non of programming.
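To illustrate the error-catching point in a typed setting, here is a hedged Haskell rendering of the same contrast (Packet and its fields are invented for the example):
-- Compound data with named roles: the names document intent, and the
-- type system can reject many misuses at compile time.
data Packet = Packet { destination :: Int
                     , source      :: Int
                     , text        :: String }

-- The "list of length three" style: roles are positional only, so
-- swapping destination and source goes completely unnoticed.
type PacketTuple = (Int, Int, String)

p1 :: Packet
p1 = Packet { destination = 42, source = 7, text = "hello" }

p2 :: PacketTuple
p2 = (7, 42, "hello")          -- fields silently transposed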
The disadvantage is that it isn't universal. Sometimes this is performance-related: you want constant-time lookups (array, hash table). Sometimes this is organization-related: you want to name your data locations (hash table, record... although you could use name/value pairs in the list). It requires a little diligence on the part of the author to make the code self-documenting (more diligence than with a record). Sometimes you want the type system to catch mistakes made by putting things in the wrong spot (record, typed tuples).
However, most issues can be addressed with OptimizeLater. The list is a versatile little data structure.
You're talking about what Peter Seibel addresses in Chapter 11 of Practical Common Lisp:
[Starting] discussion of Lisp's collections with lists ... often leads readers to the mistaken conclusion that lists are Lisp's only collection type. To make matters worse, because Lisp's lists are such a flexible data structure, it is possible to use them for many of the things arrays and hash tables are used for in other languages. But it's a mistake to focus too much on lists; while they're a crucial data structure for representing Lisp code as Lisp data, in many situations other data structures are more appropriate.
Once you're familiar with all the data types Common Lisp offers, you'll also see that lists can be useful for prototyping data structures that will later be replaced with something more efficient once it becomes clear how exactly the data is to be used.
Some reasons I see are:
A large hash table, for example, has faster access than the equivalent alist (see the sketch after this list)
A vector of a single datatype is more compact and, again, faster to access
Vectors are more efficiently and easily accessed by index
Objects and structures allow you to access data by name, not position
It boils down to using the right datatype for the task at hand. When it's not obvious, you have two options: guess and fix it later, or figure it out now; either of those is sometimes the right approach.
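The first point above can be sketched in Haskell, with Data.Map standing in for the hash table (the phone-book data is invented):
import qualified Data.Map as Map

-- Association-list lookup scans the whole list: O(n) per lookup.
phoneAlist :: [(String, Integer)]
phoneAlist = [("alice", 5551234), ("bob", 5555678)]

lookupAlist :: String -> Maybe Integer
lookupAlist name = lookup name phoneAlist

-- A map (here a balanced tree, O(log n)) scales to large collections.
phoneMap :: Map.Map String Integer
phoneMap = Map.fromList phoneAlist

lookupMap :: String -> Maybe Integer
lookupMap name = Map.lookup name phoneMap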

Map/Reduce: any theoretical foundation beyond "howto"?

For a while I was thinking that you just need a map to a monoid, and then reduce would do reduction according to monoid's multiplication.
First, this is not exactly how monoids work, and second, this is not exactly how map/reduce works in practice.
Namely, take the ubiquitous "count" example. If there's nothing to count, any map/reduce engine will return an empty dataset, not a neutral element. Bummer.
Besides, in a monoid, an operation is defined for two elements. We can easily extend it to finite sequences, or, due to associativity, to finite ordered sets. But there's no way to extend it to arbitrary "collections" unless we actually have a σ-algebra.
So, what's the theory? I tried to figure it out, but I could not; and I tried to Google it, but found nothing.
I think the right way to think about map-reduce is not as a computational paradigm in its own right, but rather as a control flow construct similar to a while loop. You can view while as a program constructor with two arguments, a predicate function and an arbitrary program. Similarly, the map-reduce construct has two arguments named map and reduce, each functions. So analogously to while, the useful questions to ask are about proving correctness of constructed programs relative to given preconditions and postconditions. And as usual, those questions involve (a) termination and run-time performance and (b) maintenance of invariants.
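For reference, the monoidal reading described in the question is easy to write down in Haskell (mapReduce and count are illustrative names); note that in this formulation reducing an empty collection really does return the neutral element, which is exactly where it diverges from the practical engines mentioned in the question:
import Data.Monoid (Sum(..))

-- "Map every element to a monoid, then reduce with the monoid operation."
mapReduce :: Monoid m => (a -> m) -> [a] -> m
mapReduce f = mconcat . map f

-- The ubiquitous count example: map each element to Sum 1.
count :: [a] -> Integer
count = getSum . mapReduce (const (Sum 1))

-- count []      == 0   (mempty, the neutral element)
-- count "abcde" == 5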
