What is the disadvantage of list as a universal data type representation? - functional-programming

Lisp programmers tend to use lists to represent all other data types.
However, I have heard that lists are not a good universal representation for data types.
What are the disadvantages of lists being used in this manner, in contrast to using records?

You mention "record". By this I take it that you're referring to fixed-element structs/objects/compound data. For instance, in HtDP syntax:
;; a packet is (make-packet destination source text) where destination is a number,
;; source is a number, and text is a string.
... and you're asking about the pros and cons of representing a packet as a list of length three,
rather than as a piece of compound data (or "record").
In instances where compound data is appropriate (the values have specific roles and names, and there are a fixed number of them), compound data is generally preferable; it helps you to catch errors in your programs, which is the sine qua non of programming.
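To make the contrast concrete, here is a minimal sketch in Python rather than HtDP's teaching languages. The Packet name and its three fields follow the data definition above; the sample values and the helper functions are made up for illustration.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """A record with three named fields, mirroring (make-packet destination source text)."""
    destination: int
    source: int
    text: str


# The same data as a bare list: each role is implied by position only.
packet_as_list = [42, 7, "hello"]

# The record names each field, so every access documents itself.
packet = Packet(destination=42, source=7, text="hello")


def describe(p: Packet) -> str:
    """Format a packet using field names instead of positions."""
    return f"{p.source} -> {p.destination}: {p.text}"


def record_rejects_wrong_shape() -> bool:
    """Building a record with a missing field fails at construction time."""
    try:
        Packet(42, 7)  # the text field is missing
    except TypeError:
        return True
    return False
```

A list of length two would be accepted silently and only fail later, wherever the third element is first read; the record fails at the point where the mistake is made.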

The disadvantage is that the list isn't universal. Sometimes this is performance related: you want constant-time lookups (array, hash table). Sometimes this is organization related: you want to name your data locations (hash table, record; although you could use name/value pairs in the list). With a list, it requires a little diligence on the part of the author to make the code self-documenting (more diligence than with a record). Sometimes you want the type system to catch mistakes made by putting things in the wrong spot (record, typed tuples).
However, most issues can be addressed with OptimizeLater. The list is a versatile little data structure.

You're talking about what Peter Seibel addresses in Chapter 11 of Practical Common Lisp:
[Starting] discussion of Lisp's collections
with lists . . . often leads readers to the mistaken
conclusion that lists are Lisp's only collection type. To make matters
worse, because Lisp's lists are such a flexible data structure, it is
possible to use them for many of the things arrays and hash tables are
used for in other languages. But it's a mistake to focus too much on
lists; while they're a crucial data structure for representing Lisp
code as Lisp data, in many situations other data structures are more
appropriate.
Once you're familiar with all the data types Common Lisp offers,
you'll also see that lists can be useful for prototyping data
structures that will later be replaced with something more efficient
once it becomes clear how exactly the data is to be used.
Some reasons I see are:
A large hashtable, for example, has faster access than the equivalent alist
A vector of a single datatype is more compact and, again, faster to access
Vectors are more efficiently and easily accessed by index
Objects and structures allow you to access data by name, not position
It boils down to using the right datatype for the task at hand. When it's not obvious, you have two options: guess and fix it later, or figure it out now; either of those is sometimes the right approach.
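The first reason in the list above (hash table versus association list) can be sketched as follows. This is Python rather than Common Lisp, and alist_get is a hypothetical helper written for this illustration; it returns the number of pairs it had to examine so the cost is visible.

```python
def alist_get(alist, key):
    """Linear scan over (key, value) pairs.

    Returns (value, pairs_examined); the cost grows with the length of the list.
    """
    for position, (k, v) in enumerate(alist, start=1):
        if k == key:
            return v, position
    return None, len(alist)


# The same 1000 associations stored two ways.
pairs = [(f"key{i}", i) for i in range(1000)]
table = dict(pairs)

# Looking up the last key walks the whole association list...
value_from_alist, examined = alist_get(pairs, "key999")

# ...while the hash table goes (on average) straight to the right bucket.
value_from_table = table["key999"]
```

For a handful of entries the association list is perfectly adequate, which is why it works well for prototyping.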

Related

What is the need for immutable/persistent data structures in erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access over its objects, I don't see why we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
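The defensive-copying point can be illustrated with a small sketch. It is written in Python, not Erlang, because the problem only shows up in a language that has mutable data; both function names are hypothetical.

```python
def normalize_in_place(values: list) -> list:
    """Sorts the caller's list directly: the caller's data changes as a side effect."""
    values.sort()
    return values


def normalize_pure(values: tuple) -> tuple:
    """Builds a new sorted tuple: the argument cannot be modified."""
    return tuple(sorted(values))


# With a mutable list, the caller has to know (or defensively copy)
# to be sure its data survives the call.
readings = [3, 1, 2]
normalize_in_place(readings)

# With an immutable tuple, no such knowledge is needed.
frozen = (3, 1, 2)
result = normalize_pure(frozen)
```

After the calls, readings has been reordered behind the caller's back, while frozen is guaranteed to be unchanged.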
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output, it
will likely return void or unit, and that when you call a method on
the object the object itself isn’t input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
How would this work?
factorial(1) -> 1;
factorial(X) ->
    X * factorial(X-1).
If you run factorial(4), a single process will be running the same function repeatedly. Each call has its own value of X; if X were scoped to the process rather than to the function call, recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function or block, you would have a point, but it would be a headache to keep track of where data is immutable and where it isn't.
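The scoping argument can be checked with a rough Python transcription of the same function. The seen parameter is an addition made for this sketch so that each call can record the value of x it was given; it is not part of the Erlang original.

```python
def factorial(x: int, seen: list) -> int:
    """Recursive factorial for x >= 1 that records each call's own x."""
    seen.append(x)  # the binding of x that belongs to this call only
    if x == 1:
        return 1
    inner = factorial(x - 1, seen)
    # The inner calls had their own x; this call's x is untouched.
    return x * inner


seen = []
result = factorial(4, seen)
```

Four calls are made, each with a distinct binding of x, and the outer calls still see their own value after the inner ones return.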

An array- or vector-like type with values stored on disk in Julia

I am looking for an Array-like type with the following properties:
stores elements on disk
elements can have composite type
elements are read into memory, not the whole array
it is possible to write individual elements without writing the whole array
supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat
is reasonably efficient
So far I have found the following leads:
https://docs.julialang.org/en/latest/stdlib/SharedArrays/
http://juliadb.org
https://github.com/JuliaIO/JLD.jl
The first one seems promising, but it seems the type of the elements has to be isbits (meaning a simple number, some structs but not, e.g., an Array{Float64,1}). And it's not clear if the whole array contents are loaded into memory.
If it does not exist yet, I will of course try to construct it myself.
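As a rough idea of what constructing it could involve, here is a minimal sketch in Python rather than Julia. DiskVector is a hypothetical name, and the sketch assumes every element is a fixed-size record described by a struct format string, which covers simple composite types but not variable-length fields such as an Array{Float64,1}.

```python
import os
import struct
import tempfile


class DiskVector:
    """Fixed-size records in a binary file; one record is read or written at a time."""

    def __init__(self, path: str, fmt: str):
        self._record = struct.Struct(fmt)
        # Reuse an existing file, otherwise create one; both modes allow read and write.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._file = open(path, mode)

    def __len__(self) -> int:
        self._file.seek(0, os.SEEK_END)
        return self._file.tell() // self._record.size

    def _offset(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return index * self._record.size

    def __getitem__(self, index: int) -> tuple:
        # Only the bytes of the requested record are read.
        self._file.seek(self._offset(index))
        return self._record.unpack(self._file.read(self._record.size))

    def __setitem__(self, index: int, value: tuple) -> None:
        # Only the bytes of the target record are overwritten.
        self._file.seek(self._offset(index))
        self._file.write(self._record.pack(*value))

    def push(self, value: tuple) -> None:
        self._file.seek(0, os.SEEK_END)
        self._file.write(self._record.pack(*value))

    def pop(self) -> tuple:
        last = len(self) - 1
        value = self[last]  # raises IndexError when the vector is empty
        self._file.truncate(last * self._record.size)
        return value

    def close(self) -> None:
        self._file.close()


# Each element is an (int32, float64) pair stored little-endian without padding.
path = os.path.join(tempfile.mkdtemp(), "points.bin")
vec = DiskVector(path, "<id")
vec.push((1, 1.5))
vec.push((2, 2.5))
vec[0] = (10, 0.25)
first = vec[0]
last = vec.pop()
remaining = len(vec)
vec.close()
```

Operations at the front of the array (shift!, unshift!) are left out on purpose: with this flat layout they would require rewriting every record, so a real design would need an index or an offset header to make them cheap.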
NCDatasets.jl addresses part of the requirements:
stores elements on disk: yes
elements can have composite type: no (although some support for composite type is in NetCDF4, but not yet in NCDatasets.jl). Currently you can have only Arrays of basic types and Arrays of Vectors (of basic types).
elements are read into memory, not the whole array: yes
it is possible to write individual elements without writing the whole array; supports setindex!, getindex, push!, pop!, shift!, unshift! and maybe vcat: just setindex!, getindex
is reasonably efficient: the efficiency is reasonable for me :-)
Building it yourself sounds like a very interesting project. I think it would certainly fill a gap in the current ecosystem.
Some storage technologies that might be good to have a look at are:
HDF5 (for storage, cross-platform and cross-language)
JLD2 (successor of JLD) https://github.com/simonster/JLD2.jl
rasdaman (a "database" for arrays) http://www.rasdaman.org/
possibly also BSON http://bsonspec.org/
Maybe you can also reach out to the JuliaIO group.

Dictionary vs. hashtable

Can someone explain the difference between dictionaries and hashtables? In Java, I've read that dictionaries are a superset of hashtables, but I always thought it was the other way around. Other languages seem to treat the two as the same. When should one be used over the other, and what's the difference?
The Oxford Dictionary of Computing defines a dictionary as...
Any data structure representing a set of elements that can support the insertion and deletion of elements as well as a test for membership.
As such, dictionaries are an abstract idea that can be reasonably efficiently implemented as e.g. binary trees, hash tables, tries, or even direct array indexing if the keys are numeric and not too sparse. That said, Python uses a closed-hashing hash table for its dict implementation, and C# seems to use some kind of hash table too (hence the need for a separate SortedDictionary type).
A hash table is a much more specific and concrete data structure: there are several implementation options (closed vs. open hashing being perhaps the most fundamental), but they're all characterised by O(1) amortised insertion, lookup and deletion, and there's no excuse for begin->end iteration worse than O(n + #buckets), while implementations may achieve better (e.g. GCC's C++ library has O(n) container iteration). The implementations necessarily depend on a hash function leading to an indexed probe in an array.
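The distinction between the abstract dictionary and one concrete implementation of it can be sketched in Python. Both class names and the insert/lookup/delete method names are made up for this illustration; the hash table uses separate chaining with a fixed number of buckets, so unlike a production implementation it never resizes.

```python
import bisect


class SortedListDict:
    """Dictionary backed by sorted parallel lists: binary-search lookup, ordered keys."""

    def __init__(self):
        self._keys = []
        self._values = []

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def insert(self, key, value):
        i, found = self._find(key)
        if found:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def lookup(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        return self._values[i]

    def delete(self, key):
        i, found = self._find(key)
        if not found:
            raise KeyError(key)
        del self._keys[i]
        del self._values[i]


class ChainedHashDict:
    """Dictionary backed by a hash table: the hash picks a bucket in an array."""

    def __init__(self, buckets: int = 8):
        self._buckets = [[] for _ in range(buckets)]

    def _chain(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def lookup(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)


def exercise(d):
    """Runs the same operations against any dictionary implementation."""
    d.insert("b", 2)
    d.insert("a", 1)
    d.insert("b", 20)  # overwrite an existing key
    d.delete("a")
    return d.lookup("b")
```

The exercise function cannot tell the two apart, which is the sense in which "dictionary" names the interface and "hash table" names one way of providing it.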
The way I see it, a hashtable is one way of implementing a dictionary, specifying that the key is hashfunction(x) and the value is any Object. The Java Dictionary can use any key as long as .equals(y) has been implemented for that object.
The 'answer' will also change depending on the language (C#? Java? JS?) you're using. In JS the 'dictionary' is implemented as a hashtable and there is no difference. In another language (I believe it's C#), the Dictionary must be strongly typed, with a fixed key type and a fixed value type, while the Hashtable's value can be any type, and the two are not extended from one another.

Does sort function in OCaml use immutable or mutable data structure?

Does OCaml use mutable or immutable data structure in implementation of sort?
For many sort algorithms, we need to exchange data between positions in list or array or something.
I am just wondering, if OCaml is always intended to use immutable data structures, then each exchange operation will create a new copy?
Will that impact the performance?
For many sort algorithms, we need to exchange data between positions in list or array or something.
Not necessarily.
I am just wondering, if OCaml is always intended to use immutable data structures
No, OCaml has both mutable and immutable data structures. Using immutable data structures is generally preferred though.
then each exchange operation will create a new copy?
That would depend on the data structure in question. But generally you wouldn't want to express your sorting algorithm in terms of swapping individual elements when working with an immutable data structure (and as I indicated above, you certainly don't need to).
List.sort for example works on an immutable data structure (lists) and is perfectly efficient. It uses the merge sort algorithm (in current implementations of OCaml).
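To show that sorting does not need element swaps, here is a minimal merge sort over immutable tuples. It is a Python sketch of the general technique, not a transcription of OCaml's List.sort, and the local list inside merge is only a builder that never escapes the function.

```python
def merge(left: tuple, right: tuple) -> tuple:
    """Combine two sorted tuples into a new sorted tuple without modifying either."""
    merged = []  # private builder, frozen into a tuple before returning
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return tuple(merged)


def merge_sort(items: tuple) -> tuple:
    """Return a new sorted tuple; the input is never changed."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


original = (3, 1, 2, 5, 4)
ordered = merge_sort(original)
```

Each level of recursion builds new sequences rather than copying the whole input per exchange, so the total work stays at O(n log n).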

Does OCaml have general map()/reduce() functions?

In Python map() works on any data that follows the sequence protocol. It does The Right Thing™ whether I feed it a string or a list or even a tuple.
Can't I have my cake in OCaml too? Do I really have no other choice but to look at the collection type I'm using and find a corresponding List.map or an Array.map or a Buffer.map or a String.map? Some of these don't even exist! Is what I'm asking for unusual? I must be missing something.
The closest you will get to this is the module Enum in OCaml Batteries Included (formerly of Extlib). Enum defines maps and folds over Enum.t; you just have to use a conversion to/from Enum.t for your datatype. The conversions can be fairly light-weight, because Enum.t is lazy.
What you really want is Haskell-style type classes, like Foldable and Functor (which generalizes "maps"). The Haskell libraries define instances of Foldable and Functor for lists, arrays, and trees. Another relevant technique is the "Scrap Your Boilerplate" approach to generic programming. Since OCaml doesn't support type classes or higher-kinded polymorphism, I don't think you'd be able to express patterns like these in its type system.
There are two main solutions in OCaml:
Jacques Garrigue already implemented a syntactically-light but inefficient approach for many data structures several years ago. You just wrap the collections in objects that provide a map method. Then you can do collection#map to use the map function for any kind of collection. This is more general than your requirements because it allows different kinds of data structures to be substituted at run time. However, this is not very useful in practice so the approach was never widely adopted.
A syntactically-heavier but efficient, robust and static solution is to use functors to parameterize your code over the data structure you are using. This makes it trivial to reuse your code with different data structures. See Markus Mottl's OCaml translations of Okasaki's book "Purely Functional Data Structures" for some great examples.
If you aren't looking for that kind of power and just want brevity then, of course, you can just create a module alias with a shorter name (e.g. module S = String).
The problem is that each container has a different representation and requires different code for map/reduce to iterate over it. This is why there are separate functions. Most languages provide some sort of general interface for containers (such as the sequence protocol you mentioned) so functions like map/reduce can be implemented abstractly, but this is not done for the types you mentioned.
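For comparison, this is roughly what such a general interface buys you in Python, where every container exposes the same iteration protocol. The names fold and map_into are hypothetical helpers for this sketch; the rebuild argument plays the part of the conversion back from a common enumeration to a concrete container.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def fold(f: Callable[[B, A], B], acc: B, items: Iterable[A]) -> B:
    """Left fold over anything iterable, regardless of its representation."""
    for item in items:
        acc = f(acc, item)
    return acc


def map_into(rebuild: Callable[[Iterable[B]], C],
             f: Callable[[A], B],
             items: Iterable[A]) -> C:
    """Map lazily over any iterable, then rebuild a container of the caller's choice."""
    return rebuild(f(x) for x in items)


# One definition serves strings, tuples and lists alike.
upper_string = map_into("".join, str.upper, "abc")
doubled_tuple = map_into(tuple, lambda n: n * 2, (1, 2, 3))
lengths_list = map_into(list, len, ["a", "bb", "ccc"])
total = fold(lambda a, b: a + b, 0, (1, 2, 3))
```

The price is that the container-specific part has not disappeared; it has moved into the iteration protocol and into the rebuild step.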
As long as you define a type t and val compare : t -> t -> int in your module, Map.Make will give you the map you want.
