How can a time function exist in functional programming?


I have to admit that I don't know much about functional programming. I read about it here and there, and so came to know that in functional programming, a function returns the same output for the same input, no matter how many times the function is called. It's exactly like a mathematical function, which evaluates to the same output for the same values of the input parameters involved in the function expression.
For example, consider this:
f(x,y) = x*x + y; // It is a mathematical function
No matter how many times you use f(10,4), its value will always be 104. As such, wherever you've written f(10,4), you can replace it with 104, without altering the value of the whole expression. This property is referred to as referential transparency of an expression.
As Wikipedia says:
Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times.
Can a time function (which returns the current time) exist in functional programming?
If yes, then how can it exist? Does it not violate the principle of functional programming? It particularly violates referential transparency, which is one of the properties of functional programming (if I understand it correctly).
Or if no, then how can one know the current time in functional programming?

Yes and no.
Different functional programming languages solve this differently.
In Haskell (a very pure one) all this stuff has to happen in something called the I/O Monad.
You can think of it as getting another input (and output) into your function (the world-state) or easier as a place where "impureness" like getting the changing time happens.
Other languages like F# just have some impureness built in and so you can have a function that returns different values for the same input - just like normal imperative languages.
As Jeffrey Burka mentioned in his comment:
Here is the nice introduction to the I/O Monad straight from the Haskell wiki.

Another way to explain it is this: no function can get the current time (since it keeps changing), but an action can get the current time. Let's say that getClockTime is a constant (or a nullary function, if you like) which represents the action of getting the current time. This action is the same every time no matter when it is used so it is a real constant.
Likewise, let's say print is a function which takes some time representation and prints it to the console. Since function calls cannot have side effects in a pure functional language, we instead imagine that it is a function which takes a timestamp and returns the action of printing it to the console. Again, this is a real function, because if you give it the same timestamp, it will return the same action of printing it every time.
Now, how can you print the current time to the console? Well, you have to combine the two actions. So how can we do that? We cannot just pass getClockTime to print, since print expects a timestamp, not an action. But we can imagine that there is an operator, >>=, which combines two actions, one which gets a timestamp, and one which takes one as argument and prints it. Applying this to the actions previously mentioned, the result is... tadaaa... a new action which gets the current time and prints it. And this is incidentally exactly how it is done in Haskell.
Prelude> System.Time.getClockTime >>= print
Fri Sep 2 01:13:23 Tokyo (Standard Time) 2011
So, conceptually, you can view it in this way: A pure functional program does not perform any I/O, it defines an action, which the runtime system then executes. The action is the same every time, but the result of executing it depends on the circumstances of when it is executed.
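The "action, not function" idea can be mimicked outside Haskell. Below is a minimal TypeScript sketch, not how GHC actually implements IO: an action is modelled as a wrapped thunk, and io, bind, printStamp and runIO are hypothetical names standing in for building an action, >>=, print and the runtime. To keep it self-contained, "printing" appends to an in-memory array instead of the console.
```typescript
// An action is a value that *describes* an effect; building or combining
// actions performs nothing. Only runIO executes the description.
type IO<A> = { readonly run: () => A };
const io = <A>(run: () => A): IO<A> => ({ run });
// bind plays the role of >>=: sequence an action with a function that builds the next action.
const bind = <A, B>(action: IO<A>, next: (a: A) => IO<B>): IO<B> =>
  io(() => next(action.run()).run());
// getClockTime is a constant: the same action value every time it is mentioned.
const getClockTime: IO<Date> = io(() => new Date());
// printStamp stands in for print: same timestamp in, same (unexecuted) action out.
// Hypothetical: output is collected in an array rather than written to a console.
const output: string[] = [];
const printStamp = (d: Date): IO<void> => io(() => { output.push(d.toISOString()); });
// The whole "program" is just a value; no clock has been read yet.
const program: IO<void> = bind(getClockTime, printStamp);
// The runtime's job: execute the description.
const runIO = <A>(action: IO<A>): A => action.run();
```
Here bind(getClockTime, printStamp) corresponds to getClockTime >>= print: building program is pure and can be repeated freely, and the clock is only read when runIO executes it.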
I don't know if this was any clearer than the other explanations, but it sometimes helps me to think of it this way.

In Haskell one uses a construct called a monad to handle side effects. A monad basically means that you encapsulate values in a container and have some functions for chaining operations on the values inside the container. If our container has the type:
data IO a = IO (RealWorld -> (a,RealWorld))
we can safely implement IO actions. This type means: an action of type IO a is a function that takes a token of type RealWorld and returns a new token together with a result.
The idea behind this is that each IO action mutates the outside state, represented by the magical token RealWorld. Using monads, one can chain multiple functions that mutate the real world together. The most important function of a monad is >>=, pronounced bind:
(>>=) :: IO a -> (a -> IO b) -> IO b
>>= takes one action and a function that takes the result of this action and creates a new action out of this. The return type is the new action. For instance, let's pretend there is a function now :: IO String, which returns a String representing the current time. We can chain it with the function putStrLn to print it out:
now >>= putStrLn
Or written in do-Notation, which is more familiar to an imperative programmer:
do currTime <- now
   putStrLn currTime
All this is pure, as we map the mutation and the information about the outside world to the RealWorld token. So each time you run this action you of course get a different output, but the input is not the same either: the RealWorld token is different.
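The RealWorld-token reading can be simulated directly. In this TypeScript sketch (a toy under stated assumptions, not GHC's real IO type), World is a hypothetical immutable record holding only a clock reading and the lines printed so far; now and putStrLn are pure functions of the world they receive, and bind threads the world from one action to the next.
```typescript
// Hypothetical model of the outside world: just a clock and the lines printed so far.
type World = { readonly clockMs: number; readonly stdout: readonly string[] };
// An action is a pure function from a world to a result plus the next world.
type IO<A> = (w: World) => readonly [A, World];
// bind (>>=): run the first action, then feed its result and the *new* world onward.
const bind = <A, B>(m: IO<A>, k: (a: A) => IO<B>): IO<B> => (w0) => {
  const [a, w1] = m(w0);
  return k(a)(w1);
};
// now :: IO String. Reads the world's clock; the simulated world then moves on 1 ms (assumption).
const now: IO<string> = (w) => [
  new Date(w.clockMs).toISOString(),
  { ...w, clockMs: w.clockMs + 1 },
];
// putStrLn: returns a new world whose stdout has one more line.
const putStrLn = (s: string): IO<void> => (w) => [undefined, { ...w, stdout: [...w.stdout, s] }];
// now >>= putStrLn
const program: IO<void> = bind(now, putStrLn);
```
Running program twice on the same world gives the same result; only a different world (a later clock reading) gives a different time, which is the argument made above.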

Most functional programming languages are not pure, i.e. they allow functions to depend on more than just their arguments. In those languages it is perfectly possible to have a function returning the current time. Of the languages you tagged this question with, this applies to Scala and F# (as well as most other variants of ML).
In languages like Haskell and Clean, which are pure, the situation is different. In Haskell the current time would not be available through a function, but a so-called IO action, which is Haskell's way of encapsulating side effects.
In Clean it would be a function, but the function would take a world value as its argument and return a fresh world value (in addition to the current time) as its result. The type system would make sure that each world value can be used only once (and each function which consumes a world value would produce a new one). This way the time function would have to be called with a different argument each time and thus would be allowed to return a different time each time.

"Current time" is not a function. It is a parameter. If your code depends on current time, it means your code is parameterized by time.

It can absolutely be done in a purely functional way. There are several ways to do it, but the simplest is to have the time function return not just the time but also the function you must call to get the next time measurement.
In C# you could implement it like this:
// Exposes mutable time as immutable time (poorly, to illustrate by example)
// Although the insides are mutable, the exposed surface is immutable.
using System;
public class ClockStamp {
    public static readonly ClockStamp ProgramStartTime = new ClockStamp();
    public readonly DateTime Time;
    private ClockStamp _next;
    private ClockStamp() {
        this.Time = DateTime.Now;
    }
    // Lazily creates (once) and returns the next node, which records the time at creation.
    public ClockStamp NextMeasurement() {
        if (this._next == null) this._next = new ClockStamp();
        return this._next;
    }
}
(Keep in mind that this is an example meant to be simple, not practical. In particular, the list nodes can't be garbage collected because they are rooted by ProgramStartTime.)
This 'ClockStamp' class acts like an immutable linked list, but really the nodes are generated on demand so they can contain the 'current' time. Any function that wants to measure the time should have a 'clockStamp' parameter and must also return its last time measurement in its result (so the caller doesn't see old measurements), like this:
// Immutable. A result accompanied by a clockstamp
public struct TimeStampedValue<T> {
    public readonly ClockStamp Time;
    public readonly T Value;
    public TimeStampedValue(ClockStamp time, T value) {
        this.Time = time;
        this.Value = value;
    }
}
public static class Program {
    // Times an empty loop.
    public static TimeStampedValue<TimeSpan> TimeALoop(ClockStamp lastMeasurement) {
        var start = lastMeasurement.NextMeasurement();
        for (var i = 0; i < 10000000; i++) {
        }
        var end = start.NextMeasurement();
        var duration = end.Time - start.Time;
        return new TimeStampedValue<TimeSpan>(end, duration);
    }
    public static void Main(String[] args) {
        var clock = ClockStamp.ProgramStartTime;
        var r = TimeALoop(clock);
        var duration = r.Value; // the result
        clock = r.Time; // must now use returned clock, to avoid seeing old measurements
    }
}
Of course, it's a bit inconvenient to have to pass that last measurement in and out, in and out, in and out. There are many ways to hide the boilerplate, especially at the language design level. I think Haskell uses this sort of trick and then hides the ugly parts by using monads.

I am surprised that none of the answers or comments mention coalgebras or coinduction. Usually, coinduction is mentioned when reasoning about infinite data structures, but it is also applicable to an endless stream of observations, such as a time register on a CPU. A coalgebra models hidden state; and coinduction models observing that state. (Normal induction models constructing state.)
This is a hot topic in Reactive Functional Programming. If you're interested in this sort of stuff, read this: http://digitalcommons.ohsu.edu/csetech/91/ (28 pp.)
Kieburtz, Richard B., "Reactive functional programming" (1997). CSETech. Paper 91

Yes, it's possible for a pure function to return the time, if it's given that time as a parameter. Different time argument, different time result. Then form other functions of time as well and combine them with a simple vocabulary of function(-of-time)-transforming (higher-order) functions. Since the approach is stateless, time here can be continuous (resolution-independent) rather than discrete, greatly boosting modularity. This intuition is the basis of Functional Reactive Programming (FRP).
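A minimal TypeScript sketch of that intuition, assuming time is represented as a real number of seconds. Behavior (borrowing FRP's usual term), constant, lift1 and lift2 are hypothetical names for illustration, not any particular FRP library's API.
```typescript
type Time = number; // seconds, as a real number: no fixed tick size
// A time-varying value is simply a function of time.
type Behavior<A> = (t: Time) => A;
// "Return the time, given the time": the identity function of time.
const time: Behavior<Time> = (t) => t;
// A value that does not change over time.
const constant = <A>(a: A): Behavior<A> => (_t) => a;
// Higher-order functions that turn ordinary functions into functions on behaviors.
const lift1 = <A, B>(f: (a: A) => B) => (ba: Behavior<A>): Behavior<B> => (t) => f(ba(t));
const lift2 = <A, B, C>(f: (a: A, b: B) => C) =>
  (ba: Behavior<A>, bb: Behavior<B>): Behavior<C> => (t) => f(ba(t), bb(t));
// Hypothetical example: x(t) = 3t + 1, built only from combinators.
const position: Behavior<number> = lift2((a: number, b: number) => a + b)(
  lift2((v: number, t: number) => v * t)(constant(3), time),
  constant(1),
);
// Another derived behavior: sin(t).
const wave: Behavior<number> = lift1(Math.sin)(time);
```
Because nothing is sampled until someone applies a behavior to a time, the same position can be evaluated at any resolution.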

Yes! You are correct! Now() or CurrentTime() or any method signature of that flavour does not, in one sense, exhibit referential transparency. But, by instruction to the compiler, it is parameterized by a system clock input.
Judged by its output alone, Now() might look like it does not follow referential transparency. But the actual behaviour of the system clock and the function on top of it adheres to referential transparency.

Yes, a time-getting function can exist in functional programming using a slightly modified version of functional programming known as impure functional programming (the default or main one is pure functional programming).
In the case of getting the time (or reading a file, or launching missiles), the code needs to interact with the outer world to get the job done, and this outer world is not based on the pure foundations of functional programming. To allow a pure functional programming world to interact with this impure outside world, people have introduced impure functional programming. After all, software which doesn't interact with the outside world isn't of any use other than doing some mathematical computations.
Some functional programming languages have this impurity built in, such that it is not easy to separate out which code is impure and which is pure (like F#, etc.), and some functional programming languages make sure that impure code clearly stands out compared to pure code, like Haskell.
Another interesting way to see this would be that your get-time function in functional programming would take a "world" object which has the current state of the world, like the time, the number of people living in the world, etc. Then getting the time from that world object would always be pure, i.e. if you pass in the same world state, you will always get the same time.

Your question conflates two related measures of a computer language: functional/imperative and pure/impure.
A functional language defines relationships between inputs and outputs of functions, and an imperative language describes specific operations in a specific order to perform.
A pure language does not create or depend on side effects, and an impure language uses them throughout.
One-hundred percent pure programs are basically useless. They may perform an interesting calculation, but because they cannot have side effects they have no input or output so you would never know what they calculated.
To be useful at all, a program has to be at least a smidge impure. One way to make a pure program useful is to put it inside a thin impure wrapper. Like this untested Haskell program:
-- getCurrentTime comes from the time package
import Data.Time.Clock (getCurrentTime)
-- this is a pure function, written in functional style.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
-- This is an impure wrapper around the pure function, written in imperative style
-- It depends on inputs and produces outputs.
main :: IO ()
main = do
  putStrLn "Please enter the input parameter"
  inputStr <- getLine
  putStrLn "Starting time:"
  getCurrentTime >>= print
  let inputInt = read inputStr -- this line is pure
  let result = fib inputInt -- this is also pure
  putStrLn "Result:"
  print result
  putStrLn "Ending time:"
  getCurrentTime >>= print

You're broaching a very important subject in functional programming, that is, performing I/O. The way many pure languages go about it is by using embedded domain-specific languages, i.e., a sublanguage whose task it is to encode actions, which can have results.
The Haskell runtime, for example, expects me to define an action called main that is composed of all actions that make up my program. The runtime then executes this action. Most of the time, in doing so it executes pure code. From time to time the runtime will use the computed data to perform I/O and feed data back into pure code.
You might complain that this sounds like cheating, and in a way it is: by defining actions and expecting the runtime to execute them, the programmer can do everything a normal program can do. But Haskell's strong type system creates a strong barrier between pure and "impure" parts of the program: you cannot simply add, say, two seconds to the current CPU time and print it; you have to define an action that results in the current CPU time, and pass the result on to another action that adds two seconds and prints the result. Writing too much of a program in IO is considered bad style, though, because it makes it hard to infer which effects are caused, compared to Haskell types that tell us everything we can know about what a value is.
Example: clock_t c = clock(); printf("%ld\n", (long)(c + 2 * CLOCKS_PER_SEC)); in C, vs. main = getCPUTime >>= \c -> print (c + 2*1000*1000*1000*1000) in Haskell (getCPUTime counts picoseconds, so 2*10^12 is two seconds). The operator >>= is used to compose actions, passing the result of the first to a function resulting in the second action. Since this looks quite arcane, Haskell compilers support syntactic sugar that allows us to write the latter code as follows:
import System.CPUTime (getCPUTime)
type Clock = Integer -- To make it more similar to the C code
-- An action that returns nothing, but might do something
main :: IO ()
main = do
  -- An action that returns an Integer, which we view as CPU Clock values (picoseconds)
  c <- getCPUTime :: IO Clock
  -- An action that prints data, but returns nothing
  print (c + 2*1000*1000*1000*1000) :: IO ()
The latter looks quite imperative, doesn't it?

If yes, then how can it exist? Does it not violate the principle of functional programming? It particularly violates referential transparency
It does not exist in a purely functional sense.
Or if no, then how can one know the current time in functional programming?
It may first be useful to know how the time is retrieved on a computer. Essentially, there is onboard circuitry that keeps track of the time (which is the reason a computer usually needs a small cell battery). Then there might be some internal process that sets the value of the time in a certain memory register, which essentially boils down to a value that can be retrieved by the CPU.
For Haskell, there is the concept of an 'IO action', which represents a type that can be made to carry out some IO process. So instead of referencing a time value, we reference an IO Time value. All this would be purely functional. We aren't referencing the time but something along the lines of 'read the value of the time register'.
When we actually execute the Haskell program, the IO action would actually take place.

It can be answered without introducing other concepts of FP.
Possibility 1: time as function argument
A language consists of a language core and a standard library.
Referential transparency is a property of the language core, but not of the standard library. By no means is it a property of programs written in that language.
Using OP's notation, should one have a function
f(t) = t*v0 + x0; // mathematical function that knows current time
They would ask the standard library to get the current time, say 1.23, and compute the function with that value as an argument, f(1.23) (or just 1.23*v0 + x0, referential transparency!). That way the code gets to know the current time.
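Possibility 1 sketched in TypeScript, with hypothetical names: the formula stays a pure function of t, and only a thin outer function asks the standard library (here Date.now(), which returns milliseconds) for the current time.
```typescript
// Pure: the current time is just an argument, so equal inputs give equal outputs.
const position = (t: number, v0: number, x0: number): number => t * v0 + x0;
// Impure edge: read the clock once (converted to seconds) and hand it to the pure function.
const positionNow = (v0: number, x0: number): number => position(Date.now() / 1000, v0, x0);
```
Everything that needs testing lives in position, which can be called with any time you like; positionNow is the only place where the real clock is consulted.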
Possibility 2: time as return value
Answering OP's question:
Can a time function (which returns the current time) exist in functional programming?
Yes, but that function has to have an argument, and you would have to compute it with different inputs so that it returns a different current time; otherwise it would violate the principles of FP.
f(s) = t(s)*v0 + x0; // mathematical function t(s) returns current time
This is an alternative approach to what I've described above. But then again, the question of obtaining those different inputs s in the first place still comes down to the standard library.
Possibility 3: functional reactive programming
The idea is that the function t() evaluates to the current time paired with a function t2. When one needs the current time again later, they call t2(), which will then give a function t3, and so on:
(x, t2) = t(); // it's x o'clock now
...
(x2, t3) = t2(); // now it's already x2 o'clock
...
t(); x; // both evaluate to the initial time, referential transparency!
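Possibility 3 can be sketched in TypeScript, much like the C# ClockStamp answer: makeClock is a hypothetical helper that wraps an impure time source, and each clock memoizes its first reading so that asking the same clock twice gives the same answer. The memo is hidden mutation, just as the C# class's insides are mutable.
```typescript
// Calling a clock yields a reading plus the clock to use for the next reading.
type Clock = () => readonly [number, Clock];
// Build a clock from an impure time source (e.g. Date.now). The first call samples
// the source and caches both the reading and the successor clock; later calls on the
// *same* clock return the cached pair, so t() always evaluates to the same thing.
const makeClock = (source: () => number): Clock => {
  let cached: readonly [number, Clock] | undefined;
  return () => {
    if (cached === undefined) {
      cached = [source(), makeClock(source)];
    }
    return cached;
  };
};
```
In real use one would pass Date.now as the source; a counter makes the behaviour easy to check, since only a fresh clock (t2, t3, ...) samples the source again.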
There's more to FP, but I believe it's out of the scope of the OP's question. For example, how does one ask the standard library to compute a function and act upon its return value in a purely functional way? That one is about side effects rather than referential transparency.

How can a time function exist in functional programming?
Back in 1988, Dave Harrison was facing this very question when defining an early functional language with real-time processing facilities. The solution he chose for Ruth can be found on page 50 of his thesis Functional Real-Time Programming: The Language Ruth And Its Semantics:
A unique clock is automatically supplied to each Ruth process at run-time, to provide real-time information, [...]
So how are these clocks defined? From page 61:
A clock tree is composed of a node holding a non-negative integer denoting the current time and two sub-trees containing the times of future events.
Furthermore:
As the tree is (lazily) evaluated each of the nodes is instantiated with the value of system time at the time at which the node is instantiated, thus giving programs reference to the current time.
Translating that into Haskell:
type Clock = Tree Time
type Time = Integer -- must be zero or larger
data Tree a = Node { contents :: a,
                     left     :: Tree a,
                     right    :: Tree a }
In addition to accessing the current time (with contents), each Ruth process can provide other clocks (with left and right) for use elsewhere in the program. If a process needs the current time more than once, it must use a new node on each occasion - once instantiated, a node's contents remains constant.
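A TypeScript rendering of the same idea, as an illustrative sketch rather than Ruth's actual implementation: makeClockTree is a hypothetical helper, and each node reads the time source the first time its contents, left or right are demanded, then caches the result, mimicking lazy evaluation.
```typescript
type Time = number; // must be zero or larger
interface ClockTree {
  readonly contents: Time;   // the time at which this node was instantiated
  readonly left: ClockTree;  // a clock to hand to one part of the program
  readonly right: ClockTree; // a clock to hand to another part
}
// Nodes are instantiated on demand; once instantiated, a node never changes.
const makeClockTree = (readTime: () => Time): ClockTree => {
  let cachedContents: Time | undefined;
  let cachedLeft: ClockTree | undefined;
  let cachedRight: ClockTree | undefined;
  return {
    get contents() {
      if (cachedContents === undefined) cachedContents = readTime();
      return cachedContents;
    },
    get left() {
      if (cachedLeft === undefined) cachedLeft = makeClockTree(readTime);
      return cachedLeft;
    },
    get right() {
      if (cachedRight === undefined) cachedRight = makeClockTree(readTime);
      return cachedRight;
    },
  };
};
```
A process that needs the time twice must read clock.contents and then, say, clock.left.contents; reading clock.contents again just returns the first value.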
So that's how a time function can exist in a functional language: by always being applied to a unique input value (a tree-of-times in this case) wherever it's called.

Related

Do functional language compilers optimize a "filter, then map" operation on lists into a single pass?

I mostly use TypeScript during my work day and when applying functional patterns I oftentimes see a pattern like:
const someArray = anotherArray.filter(filterFn).map(transformFn)
This code will filter through all of anotherArray's items and then go over the filtered list (which may be identical if no items are filtered) again and map things. In other words, we iterate over the array twice.
This behavior could be achieved with a "single pass" over the array with a reduce:
const someArray = anotherArray.reduce<ReturnType<typeof transformFn>[]>((acc, item) => {
  if (filterFn(item) === false) {
    return acc;
  }
  acc.push(transformFn(item));
  return acc;
}, [])
I was wondering if such optimization is something the transpiler (in the TS world) knows to do automatically and whether such optimizations are automatically done in more "functional-first" languages such as Clojure or Haskell. For example, I know that functional languages usually do optimizations with tail recursion, so I was wondering also about the "filter then map" case. Is this something that compilers actually do?

First of all, you usually shouldn't obsess about getting everything into a single pass. On small containers there is not that much difference between running a single-operation loop twice and running a dual-operation loop once. Aim to write code that's easily understandable. One for loop might be more readable than two, but a reduce is not more readable than a filter then map.
What the compiler does depends on your "container." When your loop is big enough to care about execution time, it's usually also big enough to care about memory consumption. So filtering then mapping on something like an observable works on one element at a time, all the way through the pipeline, before processing the next element. This means you only need memory for one element, even though your observable could be infinite.
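To make the one-element-at-a-time behaviour concrete in TypeScript, here is a sketch using generators (lazyFilter and lazyMap are hypothetical helpers, not built-ins). Nothing here depends on the compiler optimizing anything: the single pass comes from laziness, and the trace array only records the order in which the steps run.
```typescript
// Lazy filter: pulls one element at a time from the source and passes matches on.
function* lazyFilter<T>(xs: Iterable<T>, pred: (x: T) => boolean): Generator<T> {
  for (const x of xs) if (pred(x)) yield x;
}
// Lazy map: transforms each element as soon as it is pulled.
function* lazyMap<T, U>(xs: Iterable<T>, f: (x: T) => U): Generator<U> {
  for (const x of xs) yield f(x);
}
// Record the order of operations to show filter and map interleave per element.
const trace: string[] = [];
const result = Array.from(
  lazyMap(
    lazyFilter([1, 2, 3, 4], (n) => { trace.push(`filter ${n}`); return n % 2 === 0; }),
    (n) => { trace.push(`map ${n}`); return n * 10; },
  ),
);
```
Because the pipeline pulls elements on demand, no intermediate array is built, and the same two helpers would also work on an infinite generator, which is the observable case described above.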

Are there any languages that handle functional impurity (side effects) without modeling them as RealWorld or IO?

One thing that always bothered me in Haskell (and other functional languages, for that matter) is that the entire language is pure, but side-effects are indirectly allowed by using an object that represents the entire "real world" (the IO monad, for example).
I wonder, are there languages that handle this without modeling the entire world? For example, representing network input as a byte array that is lazily filled as the network input is read.

I've never really liked the real world analogy. I think it's popular because most people's first exposure to parametric polymorphism is containers, so their brain wants to know what an IO "contains." Really, it contains a lazily-evaluated sort of syntax tree data structure that is later interpreted to produce the side effects it describes, but that data structure isn't exposed to the user except through the much more abstract IO type.
At any rate, aside from @phipsgabler's excellent answer about what Haskell previously used, some kind of IO type is used pretty much everywhere people want pure FP nowadays. However, it's a sort of low-level, edge-of-your-program abstraction. Many abstractions are built on top of it.
One example is functional reactive programming, which has several variations, but basically sets up streams of events over time. Elm has a command/subscription model.
Also, libraries usually set up abstractions that make sense for their domain, like web services are often modeled as a function that is called with a Request object and returns an IO of a Response object. Further down the stack from that function, where side effects aren't needed, your interface is just pure types, like a function that takes a User object and returns HTML for that user's profile.
But at some point, whether you call it an IO or a Command or an Observable, it all boils down to the very powerful idea of separating the specification of what side effects you want from the actual execution of those effects. The form may differ, but the fundamental concept isn't going away anytime soon.
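A sketch, in TypeScript, of the "specification first, execution later" idea: Program is a hypothetical instruction tree (not how any particular runtime represents IO), greet is a pure value describing an interaction, and runWith is one possible interpreter that feeds canned input and collects output. A production interpreter would perform real I/O at the same points.
```typescript
// A description of effects as plain data: each node says what to do next.
type Program<A> =
  | { tag: "pure"; value: A }
  | { tag: "readLine"; next: (line: string) => Program<A> }
  | { tag: "writeLine"; text: string; next: () => Program<A> };
const pure = <A>(value: A): Program<A> => ({ tag: "pure", value });
const readLine = <A>(next: (line: string) => Program<A>): Program<A> => ({ tag: "readLine", next });
const writeLine = <A>(text: string, next: () => Program<A>): Program<A> =>
  ({ tag: "writeLine", text, next });
// Building this value performs no I/O at all.
const greet: Program<string> = writeLine("What is your name?", () =>
  readLine((name) => writeLine(`Hello, ${name}!`, () => pure(name))));
// One interpreter: canned input in, collected output out.
const runWith = <A>(prog: Program<A>, input: string[]): { result: A; output: string[] } => {
  const output: string[] = [];
  let i = 0;
  let p = prog;
  for (;;) {
    switch (p.tag) {
      case "pure":
        return { result: p.value, output };
      case "readLine":
        p = p.next(input[i++] ?? ""); // missing input is treated as an empty line
        break;
      case "writeLine":
        output.push(p.text);
        p = p.next();
        break;
    }
  }
};
```
Because greet is only data, the same value can be run by a test interpreter like this one or by one that talks to a real terminal.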

Haskell itself was specified to use something like you describe before IO (the monad) was invented, under the name dialogues. The following example is from Imperative functional programming (the seminal paper on IO) by Peyton Jones and Wadler, 1993:
type Dialogue = [Response] -> [Request]
main :: Dialogue
data Request = Putc Char | Getc
data Response = OK | OKCh Char
echo :: Dialogue
echo resps = Getc :
             if (a == eof)
             then []
             else Putc a :
                  echo (drop 2 resps)
  where
    OKCh a = resps !! 1
It is preceded by the explanation:
The I/O system specified for the Haskell language (Hudak et al. [1992]) is based on dialogues, also called lazy streams (Dwelly [1989]; O'Donnell [1985]; Thompson [1989]). In Haskell, the value of the program has type Dialogue, a synonym for a function between a list of I/O responses to a list of I/O requests.
and concluded with some difficulties:
It is easy to extract the wrong element of the responses, a synchronisation error,
The Response data type has to contain a constructor for every possible response to every request, and
even more seriously, the style is not composable.

In terms of design and when writing a library, when should I use a pointer as an argument, and when should I not?

Sorry if my question seems stupid. My background is in PHP, Ruby, Python, Lua and similar languages, and I have no understanding of pointers in real-life scenarios.
From what I've read on the Internet and what I've got as responses in a question I asked (When is a pointer idiomatic?), I have understood that:
Pointers should be used when copying large data. Instead of getting the whole object hierarchy, receive its address and access it.
Pointers have to be used when you have a function on a struct that modifies it.
So, pointers seem like a great thing: I should just always get them as function arguments because they are so lightweight, and it's okay if I somehow end up not needing to modify anything on the struct.
However, looking at that statement intuitively, I can feel that it sounds very creepy, and yet I don't know why.
So, as someone who is designing a struct and its related functions, or just functions, when should I receive a pointer? When should I receive a value, and why?
In other words, when should my NewAuthor method return &Author{ ... }, and when should it return Author{ ... }? When should my function get a pointer to an author as an argument, and when should it just get the value (a copy) of type Author?

There's tradeoffs for both pointers and values.
Generally speaking, pointers point to some other region of memory in the system, be it the stack of the function that wants to pass a pointer to a local variable, or some place on the heap.
func A() {
	i := 25
	B(&i) // A sets up stack frame to call B,
	// it copies the address of i so B can look it up later.
	// At this point, i is equal to 30
}
func B(i *int) {
	// Here, i points to A's stack frame.
	// For this to execute, I look at my variable "i",
	// see the memory address it points to, then look at that to get the value of 25.
	// That address may be on another page of memory,
	// causing me to have to look it up from main memory (which is slow).
	println(10 + (*i))
	// Since I have the address to A's local variable, I can modify it.
	*i = 30
}
Pointers require me to dereference them constantly whenever I want to see the data they point to. Sometimes you don't care. Other times it matters a lot. It really depends on the application.
If that pointer has to be dereferenced a lot (i.e., you pass in a number to use in a bunch of different calcs), then you keep paying the cost.
Compared to using values:
func A() {
	i := 25
	B(i) // A sets up the stack frame to call B, copying in the value 25
	// i is still 25, because A gave B a copy of the value, and not the address.
}
func B(i int) {
	// Here, i is simply on the stack. I don't have to do anything to use it.
	println(10 + i)
	// Since i here is a value on B's stack, modifications are not visible outside B's scope
	i = 30
}
Since there's nothing to dereference, it's basically free to use the local variable.
The downside of passing values shows up when those values are large, because copying data to the stack isn't free.
For an int it's a wash because pointers are "int" sized. For a struct, or an array, you are copying all the data.
Also, large objects on the stack can make the stack extra big. Go handles this well with stack re-allocation, but in high performance scenarios, it may be too much of an impact to performance.
There's a data safety aspect as well (can't modify something I pass by value), but I don't feel that is usually an issue in most code bases.
Basically, if your problem was already solvable by ruby, python or other language without value types, then these performance nuances don't super-matter.
In general, passing structs as pointers will usually do "the right thing" while learning the language.
For all other types, or things that you want to keep as read-only, pass values.
There are exceptions to that rule, but it's best that you learn those as needs arise rather than try to redefine your world all at once. If that makes sense.

Simply put, you can use pointers anywhere you want, but sometimes you don't want to change your data. It may stand for abstract data, and you don't want to explicitly copy the data. Just pass by value and let the compiler do its job.

Does 'foldp' violate FP's no mutable state principle?

I'm learning about Elm from Seven More Languages in Seven Weeks. The following example confuses me:
import Keyboard
main = lift asText (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
foldp is defined as:
Signal.foldp : (a -> b -> b) -> b -> Signal a -> Signal b
It appears to me that:
the initial value of the accumulator presses is only 0 on the first evaluation of main
after the first evaluation of main it seems that the initial value of presses is whatever the result of function (a -> b -> b), or (\dir presses -> presses + dir.x) in the example, was on the previous evaluation.
If this is indeed the case, then isn't this a violation of functional programming principles, since main now maintains internal state (or at least foldp does)?
How does this work when I use foldp in multiple places in my code? Does it keep multiple internal states, one for each time I use it?
The only other alternative I see is that foldp (in the example) starts counting from 0, so to say, each time it's evaluated, and somehow folds up the entire history provided by Keyboard.arrows. This seems to me to be extremely wasteful and sure to cause out-of-memory exceptions for long run times.
Am I missing something here?

How it works
Yes, foldp keeps some internal state around. Saving the entire history would be wasteful and is not done.
If you use foldp multiple times in your code, doing distinct things or having distinct input signals, then each instance will keep its own local state. Example:
import Keyboard
plus = (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
minus = (foldp (\dir presses -> presses - dir.x) 0 Keyboard.arrows)
showThem p m = flow down (map asText [p, m])
main = lift2 showThem plus minus
But if you use the resulting signal from a foldp twice, only one foldp instance will be in your compiled program; the resulting changes will just be used in two places:
import Keyboard
plus = (foldp (\dir presses -> presses + dir.x) 0 Keyboard.arrows)
showThem p m = flow down (map asText [p, m])
main = lift2 showThem plus plus
The main question
If this is indeed the case, then isn't this a violation of functional programming principles, since main now maintains internal state (or at least foldp does)?
Functional programming doesn't have some great canonical definition that everybody uses. There are many examples of functional programming languages that allow for the use of mutable state. Some of these programming languages show you in the type system that a value is mutable (you could see Haskell's State type as such; it really depends on your viewpoint, though).
But what is mutable state? What is a mutable value? It's a value inside the program that is mutable. That is, it can change. It can be different things at different times. Ah, but we know what Elm calls values that change over time! That's a Signal.
So really a Signal in Elm is a value that can change over time, and can therefore be seen as a variable, a mutable value, or mutable state. It's just that we manage this value very strictly by allowing only a few well-chosen manipulations on Signals. Such a Signal can be based on other Signals in your program, or come from a library or come from the outside world (think of inputs like Mouse.position). And who knows how the outside world came up with that signal! So allowing your own Signals to be based on the past value of Signals is actually ok.
Conclusion / TL;DR
You could see Signal as a safety wrapper around mutable state. We assume that signals that come from the outside world (as input to your program) are not predictable, but because we have this safety wrapper that only allows lift/sample/filter/foldp, the program you write is otherwise completely predictable. Side-effects are contained and managed, therefore I think it's still "functional programming".

You're confusing an implementation detail with a conceptual detail. Every functional programming language eventually gets translated down to assembly code, which is decidedly imperative. That doesn't mean you can't have purity at the language level.
Don't think of main as being repeatedly evaluated, returning different results every time. A Signal is conceptually an infinite list of values. main takes an infinite list of keyboard arrows as input and translates that into an infinite list of elements. Given the same list of arrows, it will always return the exact same list of elements, without side effects. At this level of abstraction, it is therefore a pure function.
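The "infinite list" view can be sketched with Python generators: a signal is modelled as a (possibly infinite) iterator of values, and the program is a plain function from the input stream to the output stream. The names `arrows_forever` and `main` and the cycling input are illustrative assumptions; the point is only that feeding the same input sequence always yields the same output sequence.

```python
from itertools import accumulate, cycle, islice
from typing import Iterable, Iterator


def arrows_forever() -> Iterator[int]:
    """A hypothetical infinite input stream: the x component of arrow presses."""
    return cycle([1, 1, -1, 0])


def main(arrows: Iterable[int]) -> Iterator[str]:
    """Pure stream transformer: running sum of presses, rendered as text.

    It reads no clock and no global state; its output depends only on `arrows`.
    Both accumulate() and the generator expression are lazy, so an infinite
    input is fine as long as we only take a finite prefix.
    """
    return (f"presses: {total}" for total in accumulate(arrows))


first_run = list(islice(main(arrows_forever()), 6))
second_run = list(islice(main(arrows_forever()), 6))
print(first_run == second_run)  # True
print(first_run[:3])  # ['presses: 1', 'presses: 2', 'presses: 1']
```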
Now, it so happens that we are only interested in the last element of the sequence. This allows for some optimizations in the implementation, one of which is storing the accumulated value. What's important is that the implementation is referentially transparent. From the language's point of view, you're getting the exact same answer as if you stored the entire sequence and recomputed it from scratch every time a value is added to the end. You get the same output given the same input. The only difference is storage space and execution time.
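The equivalence described above, that keeping only the accumulator gives the same answer as re-folding the whole history, can be checked with a small Python sketch. `from_scratch` and `Incremental` are illustrative names: `functools.reduce` plays the role of recomputing over the stored sequence, and the incremental version stores a single value, which is the kind of optimization described for foldp.

```python
from functools import reduce
from typing import Callable, Generic, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def from_scratch(step: Callable[[A, B], B], initial: B, history: List[A]) -> B:
    """Recompute the fold over the entire stored history (memory grows with time)."""
    return reduce(lambda acc, event: step(event, acc), history, initial)


class Incremental(Generic[A, B]):
    """Keep only the latest accumulated value (constant memory)."""

    def __init__(self, step: Callable[[A, B], B], initial: B) -> None:
        self.step = step
        self.value = initial

    def push(self, event: A) -> B:
        self.value = self.step(event, self.value)
        return self.value


def step(dx: int, presses: int) -> int:
    # Same argument order as Elm's foldp: (new input, accumulated state).
    return presses + dx


events = [1, 1, -1, 0, 1]
inc = Incremental(step, 0)
history: List[int] = []
for e in events:
    history.append(e)
    # Both strategies agree after every event; only storage and time differ.
    assert inc.push(e) == from_scratch(step, 0, history)
print(inc.value)  # 2
```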
In other words, the whole idea of functional programming is not to eliminate state tracking, but to abstract it away from the purview of the programmer. Programmers get to play in the ideal world, while the compiler and runtime slave away in the sewers of mutable state to make the ideal world possible for the rest of us.

You should note that "doesn't maintain internal state" isn't really a strong definition of FP. It's much more like an implementation constraint. The definition I prefer is "built from pure functions". Without diving deep, in plain English it means that all functions return the same output when given the same input. Unlike the previous one, this definition gives you huge reasoning power and a simple way to check whether a program follows it, while leaving some room for optimization on current hardware.
Given this reformulated restriction, functional languages are free to use mutable values as long as they are modelled with pure functions. To answer your question: Elm programs are built out of pure functions, so it's probably a functional language. Elm uses a special data structure, Signal, to model both outside-world interactions and internal state, as well as any other functional language does.
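As a generic illustration of "mutables modelled with pure functions" (not Elm code), the Python function below mutates a local list and counter internally but is still pure from the caller's point of view: the mutation never escapes, the argument is not modified, and the same input always gives the same output. `running_totals` is a made-up example name.

```python
from typing import List


def running_totals(xs: List[int]) -> List[int]:
    """Externally pure: same input -> same output, and the argument is not modified.

    Internally it uses mutation (a local accumulator and list.append),
    which no caller can observe.
    """
    totals: List[int] = []
    acc = 0
    for x in xs:
        acc += x
        totals.append(acc)
    return totals


data = [3, 1, 4]
print(running_totals(data), running_totals(data))  # [3, 4, 8] [3, 4, 8]
print(data)  # [3, 1, 4] (input unchanged)
```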

Are higher order functions on collections guaranteed to be executed sequentially?

In another question, a user suggested writing code like this:
def list = ['a', 'b', 'c', 'd']
def i = 0;
assert list.collect { i++ } == [0, 1, 2, 3]
Such code is, in other languages, considered bad practice because the body of collect changes the state of its context (here it changes the value of i). In other words, the closure has side effects.
Such higher-order functions should be free to run the closure in parallel and assemble the results into a new list. If the closure performs long, CPU-intensive operations, it may be worth executing them in separate threads. It would be easy to change collect to use an ExecutorCompletionService to achieve that, but it would break the above code.
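A Python analogue of this concern: `concurrent.futures.ThreadPoolExecutor.map` runs the function on worker threads but returns results in input order. With a side-effect-free function the parallel result equals the sequential one; with a shared counter (like `i++`), which element receives which number depends on thread scheduling, so only the set of numbers is predictable, not their assignment. `slow_upper`, `number_it` and the sleep times are made-up stand-ins for expensive work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

items: List[str] = ['a', 'b', 'c', 'd']


def delay_for(s: str) -> float:
    """Made-up workload: earlier letters take longer ('a' 30 ms, 'd' 0 ms)."""
    return 0.01 * (ord('d') - ord(s))


def slow_upper(s: str) -> str:
    """Side-effect free: the result depends only on the argument."""
    time.sleep(delay_for(s))
    return s.upper()


counter = 0
counter_lock = threading.Lock()


def number_it(s: str) -> int:
    """Side-effecting, like `{ i++ }`: the result depends on when it runs."""
    global counter
    time.sleep(delay_for(s))
    with counter_lock:  # avoids lost updates, but cannot fix the ordering problem
        n = counter
        counter += 1
    return n


sequential = [slow_upper(s) for s in items]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(slow_upper, items))  # results come back in input order
    numbered = list(pool.map(number_it, items))

print(parallel == sequential)  # True
print(numbered)  # some permutation of 0..3; which one depends on scheduling
```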
Another example of a problem: if, for some reason, collect browsed the collection in, say, reverse order, the result would be [3, 2, 1, 0]. Note that in this case the list has not been reversed; 0 really is the result of applying the closure to 'd'!
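The reverse-order scenario can be reproduced with a hypothetical `collect_reversed` helper (not a Groovy or Python API) that visits elements from last to first but still writes each result at its element's position. The list itself is not reversed; the counter value 0 really is produced for `'d'`. A side-effect-free function, by contrast, gives the same answer whatever the visiting order.

```python
from itertools import count
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def collect_reversed(xs: List[T], fn: Callable[[T], R]) -> List[R]:
    """Hypothetical map that visits elements last-to-first but keeps result positions."""
    results: list = [None] * len(xs)
    for index in range(len(xs) - 1, -1, -1):
        results[index] = fn(xs[index])
    return results


letters = ['a', 'b', 'c', 'd']
counter = count()  # shared mutable state, like `def i = 0` in the Groovy snippet
print(collect_reversed(letters, lambda _: next(counter)))  # [3, 2, 1, 0]

# A side-effect-free function is unaffected by the visiting order.
print(collect_reversed(letters, str.upper) == [s.upper() for s in letters])  # True
```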
Interestingly, these functions are documented with "Iterates through this collection" in Collection's JavaDoc, which suggests the iteration is sequential.
Does the Groovy specification explicitly define the order of execution in higher-order functions like collect or each? Is the above code broken, or is it OK?

I don't like explicit external variables being relied upon in my closures for the reasons you give above.
Indeed, the less variables I have to define, the happier I am ;-)
For possibly parallel work as well, always code with a view to wrapping it in some level of GPars loveliness should it prove too much for a single thread to handle. For this, as you say, you want as little mutability as possible and to avoid side effects completely (such as the external counter pattern above).
As for the question itself, if we take collect as an example function, and examine the source code, we can see that given an Object (Collection and Map are done in a similar way with slight differences as to how the Iterator is referenced) it iterates along InvokerHelper.asIterator(self), adding the result of each closure call to the resultant list.
InvokerHelper.asIterator (again, see its source) basically calls the iterator() method on the Object passed in.
So for Lists, etc., it will iterate over the objects in the order defined by the iterator.
It is therefore possible to compose your own class which follows the Iterable interface design (doesn't need to implement Iterable though, thanks to duck-typing), and define how the collection will be iterated.
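Python's iteration protocol works the same duck-typed way, which makes the point easy to demonstrate: any object with an `__iter__` method can be mapped over, no base class required, and the order of the results is whatever its iterator yields. `EvensThenOdds` is a made-up class for illustration.

```python
from typing import Iterator, List


class EvensThenOdds:
    """Duck-typed iterable (no base class): yields even-indexed items, then odd-indexed."""

    def __init__(self, items: List[str]) -> None:
        self.items = items

    def __iter__(self) -> Iterator[str]:
        yield from self.items[0::2]
        yield from self.items[1::2]


seq = EvensThenOdds(['a', 'b', 'c', 'd', 'e'])
# Like Groovy's collect, map() simply follows the order defined by the iterator.
print(list(map(str.upper, seq)))  # ['A', 'C', 'E', 'B', 'D']
```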
I think that, since you are asking about the Groovy specification, this answer might not be what you want, but I don't think there is such an answer. Groovy has never really had a 'complete' specification (indeed, this is a point about Groovy that some people dislike).

I think keeping the functions passed to collect or findAll side-effect free is a good idea in general, not only for keeping the complexity low but also for making the code more parallel-friendly in case parallel execution is needed in the future.
But in the case of each, there is not much point in keeping the function side-effect free, as it wouldn't do anything (in fact, the sole purpose of this method is to act as a for-each loop). Groovy's documentation has some examples of using each (and its variants, eachWithIndex and reverseEach) that require a defined execution order.
Now, from a pragmatic point of view, I think it can sometimes be OK to use functions with some side effects in methods like collect. For example, to transform a list into [index, value] pairs, transpose and a range can be used:
def list = ['a', 'b', 'c']
def enumerated = [0..<list.size(), list].transpose()
assert enumerated == [[0,'a'], [1,'b'], [2,'c']]
Or even inject:
def enumerated = list.inject([]) { acc, val -> acc << [acc.size(), val] }
But a collect and a counter does the trick too and I think the result is the most readable:
def n = 0, enumerated = list.collect{ [n++, it] }
Now, this example wouldn't make sense if Groovy provided a collect and similar methods taking an index-value-param function (see Jira issue), but it kinda shows that sometimes practicality beats purity IMO :)
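For comparison, the three Groovy approaches translate roughly into Python as below. This is a sketch: `zip` stands in for `transpose` and `functools.reduce` for `inject`. Python also ships a built-in index-value helper, `enumerate`, which makes the external counter unnecessary.

```python
from functools import reduce
from itertools import count

items = ['a', 'b', 'c']

# transpose + range
by_zip = [list(pair) for pair in zip(range(len(items)), items)]

# inject -> reduce: the accumulator's current length is the next index
by_reduce = reduce(lambda acc, val: acc + [[len(acc), val]], items, [])

# collect with an external counter (relies on sequential, in-order iteration)
n = count()
by_counter = [[next(n), v] for v in items]

# the built-in index-value helper
by_enumerate = [[i, v] for i, v in enumerate(items)]

print(by_zip == by_reduce == by_counter == by_enumerate)  # True
print(by_enumerate)  # [[0, 'a'], [1, 'b'], [2, 'c']]
```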
