Are higher order functions on collections guaranteed to be executed sequentially?

In another question, a user suggested writing code like this:
def list = ['a', 'b', 'c', 'd']
def i = 0;
assert list.collect { i++ } == [0, 1, 2, 3]
Such code is, in other languages, considered bad practice because the closure passed to collect changes the state of its context (here it changes the value of i). In other words, the closure has side effects.
Such higher order functions should be able to run the closure in parallel and assemble the results into a new list. If the processing in the closure is a long, CPU-intensive operation, it may be worth executing it in separate threads. It would be easy to change collect to use an ExecutorCompletionService to achieve that, but it would break the above code.
Another example of a problem: if, for some reason, collect traversed the collection in, say, reverse order, the result would be [3, 2, 1, 0]. Note that in this case the list has not been reversed; 0 really is the result of applying the closure to 'd'!
Interestingly, these functions are documented with "Iterates through this collection" in Collection's JavaDoc, which suggests the iteration is sequential.
Does the Groovy specification explicitly define the order of execution in higher order functions like collect or each? Is the above code broken, or is it OK?

I don't like explicit external variables being relied upon in my closures for the reasons you give above.
Indeed, the less variables I have to define, the happier I am ;-)
For the possibly parallel things as well, always code with a view to wrapping it with some level of GPars loveliness should it prove too much for a single thread to handle. For this, as you say, you want as little mutability as possible and to try to completely avoid side effects (such as the external counter pattern above).
As for the question itself, if we take collect as an example function and examine the source code, we can see that given an Object (Collection and Map are handled in a similar way, with slight differences in how the Iterator is obtained) it iterates along InvokerHelper.asIterator(self), adding the result of each closure call to the resultant list.
InvokerHelper.asIterator (again, source is here) basically calls the iterator() method on the Object passed in.
So for Lists, etc., it will iterate over the objects in the order defined by the iterator.
It is therefore possible to compose your own class that follows the Iterable interface design (it doesn't need to implement Iterable, thanks to duck typing) and define how the collection will be iterated.
I think that by asking about the Groovy specification, though, this answer might not be what you want; but I don't think there is one. Groovy has never really had a 'complete' specification (indeed, this is a point about Groovy that some people dislike).

I think keeping the functions passed to collect or findAll side-effect free is a good idea in general, not only to keep complexity low but also to make the code more parallel-friendly in case parallel execution is needed in the future.
But in the case of each there is not much point in keeping the function side-effect free, as it wouldn't do anything otherwise (in fact the sole purpose of this method is to act as a for-each loop). The Groovy documentation has some examples of using each (and its variants, eachWithIndex and reverseEach) that require the execution order to be defined.
Now, from a pragmatic point of view, I think it can sometimes be OK to use functions with some side effects in methods like collect. For example, to transform a list into [index, value] pairs, transpose and a range can be used:
def list = ['a', 'b', 'c']
def enumerated = [0..<list.size(), list].transpose()
assert enumerated == [[0,'a'], [1,'b'], [2,'c']]
Or even an inject
def enumerated = list.inject([]) { acc, val -> acc << [acc.size(), val] }
But a collect and a counter do the trick too, and I think the result is the most readable:
def n = 0, enumerated = list.collect{ [n++, it] }
Now, this example wouldn't make sense if Groovy provided a collect and similar methods with an index-value-param variant (see the Jira issue), but it kinda shows that sometimes practicality beats purity IMO :)

Related

Do functional language compilers optimize a "filter, then map" operation on lists into a single pass?

I mostly use TypeScript during my work day and when applying functional patterns I oftentimes see a pattern like:
const someArray = anotherArray.filter(filterFn).map(transformFn)
This code will filter through all of anotherArray's items and then go over the filtered list (which may be identical if no items are filtered) again and map things. In other words, we iterate over the array twice.
This behavior could be achieved with a "single pass" over the array with a reduce:
const someArray = anotherArray.reduce((acc, item) => {
  if (filterFn(item) === false) {
    return acc;
  }
  acc.push(transformFn(item));
  return acc;
}, []);
I was wondering if such an optimization is something the transpiler (in the TS world) knows to do automatically, and whether such optimizations are automatically done in more "functional-first" languages such as Clojure or Haskell. For example, I know that functional languages usually do optimizations with tail recursion, so I was wondering about the "filter then map" case as well. Is this something that compilers actually do?
First of all, you usually shouldn't obsess about getting everything into a single pass. On small containers there is not that much difference between running a single-operation loop twice and running a dual-operation loop once. Aim to write code that's easily understandable. One for loop might be more readable than two, but a reduce is not more readable than a filter then map.
What the compiler does depends on your "container." When your loop is big enough to care about execution time, it's usually also big enough to care about memory consumption. So filtering then mapping on something like an observable works on one element at a time, all the way through the pipeline, before processing the next element. This means you only need memory for one element, even though your observable could be infinite.
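To make this concrete in Haskell terms (a sketch, using Debug.Trace only to make evaluation order visible): because the list is consumed lazily, each element flows through the filter stage and then the map stage before the next element is even examined, so the intermediate list is never materialized all at once; GHC's list-fusion rewrite rules can often remove it entirely.
import Debug.Trace (trace)

-- "filter then map", written as the usual composition.
pipeline :: [Int] -> [Int]
pipeline = map    (\x -> trace ("map "    ++ show x) (x * 10))
         . filter (\x -> trace ("filter " ++ show x) (even x))

main :: IO ()
main = mapM_ print (pipeline [1 .. 6])
-- The trace output (on stderr) interleaves the two stages per element:
--   filter 1, filter 2, map 2, filter 3, filter 4, map 4, filter 5, filter 6, map 6
-- rather than all the filter calls followed by all the map calls.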

What persistent data structures does Raku/Rakudo include?

Raku provides many types that are immutable and thus cannot be modified after they are created. Until I started looking into this area recently, my understanding was that these types were not persistent data structures – that is, unlike the core types in Clojure or Haskell, my belief was that Raku's immutable types did not take advantage of structural sharing to allow for inexpensive copies. I thought that the statement my List $new = (|$old-list, 42); literally copied the values in $old-list, without the data-sharing features of persistent data structures.
That description of my understanding is in the past tense, however, due to the following code:
my Array $a = do {
    $_ = [rand xx 10_000_000];
    say "Initialized an Array in $((now - ENTER now).round: .001) seconds"; $_ }
my List $l = do {
    $_ = |(rand xx 10_000_000);
    say "Initialized the List in $((now - ENTER now).round: .001) seconds"; $_ }
do { $a.push: rand;
     say "Pushed the element to the Array in $((now - ENTER now).round: .000001) seconds" }
do { my $nl = (|$l, rand);
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
do { my @na = |$l;
     say "Copied List \$l into a new Array in $((now - ENTER now).round: .001) seconds" }
which produced this output in one run:
Initialized an Array in 5.938 seconds
Initialized the List in 5.639 seconds
Pushed the element to the Array in 0.000109 seconds
Appended an element to the List in 0.000109 seconds
Copied List $l into a new Array in 11.495 seconds
That is, creating a new List with the old values plus one more is just as fast as pushing to a mutable Array, and dramatically faster than copying the List into a new Array – exactly the performance characteristics you'd expect from a persistent List (copying to an Array is still slow because it can't take advantage of structural sharing without breaking the immutability of the List). The fast copying of $l into $nl is not due to either List being lazy; neither is.
All of the above leads me to believe that Lists in Rakudo actually are persistent data structures, with all the performance benefits that implies. That leaves me with several questions:
Am I right about Lists being persistent data structures?
Are all other immutable Types also persistent data structures? Or are any?
Is any of this part of Raku, or just an implementation choice Rakudo has made?
Are any of these performance characteristics documented/guaranteed anywhere?
I have to say, I am both extremely impressed and more than a bit baffled to discover evidence that at least some of Raku(do)'s types are persistent. It's the sort of feature that other languages list as a key selling point or that leads to the creation of libraries with 30k+ stars on GitHub. Have we really had it in Raku without even mentioning it?
I remember implementing these semantics, and I certainly don't recall thinking about them giving rise to a persistent data structure at the time - although it does seem fair to attach that label to the result!
I don't think you'll find anywhere that explicitly spells out this exact behavior; however, the most natural implementation of what the language requires leads to it quite directly. Taking the ingredients:
The infix:<,> operator is the List constructor in Raku
When a List is created, it is non-committal with regards to laziness and flattening (these arise from how we use the List, which we don't - in general - know at the point of its construction)
When we write (|$x, 1), the prefix:<|> operator constructs a Slip, which is a kind of List that should melt into its surrounding List. Thus what infix:<,> sees is a Slip and an Int.
Making the Slip melt into the result List immediately would mean making a commitment about eagerness, which List construction alone should not do. Thus the Slip and everything after it is placed into the lazily evaluated ("non-reified") portion of the List.
The last of these is what gives rise to the observed persistent-data-structure-style behavior.
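As a loose analogy only (this is not how Rakudo is implemented), Haskell's lazy lists show the same "cheap construction, deferred reification" effect: building a new value out of an old one merely records a reference to the old one until something demands the result.
old :: [Int]
old = [1 .. 10000000]

-- Constructing 'new' is O(1): it is just a thunk holding a reference to 'old',
-- not a copy of it. The traversal cost is paid only if/when 'new' is reified.
new :: [Int]
new = old ++ [42]

main :: IO ()
main = print (head new)   -- forces only the first cell of 'old'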
I expect it would be possible to have an implementation that inspects the Slip and chooses to eagerly copy things that are known not to be lazy, and still be in compliance with the specification test suite. That would change the time complexity of your example. If you want to be defensive against that, then:
do { my $nl = (|$l.lazy, rand);
     say "Appended an element to the List in $((now - ENTER now).round: .000001) seconds" }
should be sufficient to force the issue even if the implementation changed.
Other cases that immediately come to mind that are related to persistent data structures, or at least tail sharing:
The MoarVM implementation of strings, which is behind str and thus Str, implements string concatenation by creating a new string that refers to the two that are being concatenated instead of copying the data in the two strings (and does similar tricks for substr and repetition). This is strictly an optimization, not a language requirement, and in some delicate cases (the last grapheme of one string and the first grapheme of the next will form a single grapheme in the resulting string), it gives up and takes the copying path.
Outside of the core, modules like Concurrent::Stack, Concurrent::Queue, and Concurrent::Trie use tail sharing as a technique to implement relatively efficient lock-free data structures.

Destructive place-modifying operators

The CLtL2 reference clearly distinguishes between nondestructive and destructive common-lisp operations. But, within the destructive camp, it seems a little less clear in marking the difference between those which simply return the result, and those which additionally modify a place (given as argument) to contain the result. The usual convention of annexing "f" to such place modifying operations (eg, setf, incf, alexandria:deletef) is somewhat sporadic, and also applies to many place accessors (eg, aref, getf). In an ideal functional programming style (based only on returned values) such confusion is probably not an issue, but it seems like it could lead to programming errors in some practical applications that do use place modification. Since different implementations can handle the place results differently, couldn't portability be affected? It even seems difficult to test a particular implementation's approach.
To better understand the above distinction, I've divided the destructive common-lisp sequence operations into two categories corresponding to "argument returning" and "operation returning". Could someone validate or invalidate these categories for me? I'm assuming these categories could apply to other kinds of destructive operations (for lists, hash-tables, arrays, numbers, etc) too.
Argument returning: fill, replace, map-into
Operation returning: delete, delete-if, delete-if-not, delete-duplicates, nsubstitute, nsubstitute-if, nsubstitute-not-if, nreverse, sort, stable-sort, merge
But, within the destructive camp, it seems a little less clear in marking the difference between those which simply return the result.
There is no easy syntactic marker for which operations are destructive and which are not, even though there are useful conventions like the n prefix. Remember that CL is a standard inspired by different Lisps, which does not help in enforcing consistent terminology.
The usual convention of annexing "f" to such place modifying operations (eg, setf, incf, alexandria:deletef) is somewhat sporadic, and also applies to many place accessors (eg, aref, getf).
All setf expanders should end with f, but not everything that ends with f is a setf expander. For example, aref takes its name from array and reference, and isn't a macro.
... but it seems like it could lead to programming errors in some practical applications that do use place modification.
Most data is mutable (see comments); once you code in CL with that in mind, you take care not to modify data you did not create yourself. As for using a destructive operation in place of a non-destructive one inadvertently, I don't know: I guess it can happen, with sort or delete, maybe the first times you use them. In my mind delete is stronger, more destructive than simply remove, but maybe that's because I already know the difference.
Since different implementations can handle the place results differently, couldn't portability be affected?
If you want portability, you follow the specification, which does not offer much guarantee w.r.t. which destructive operations are applied. Take for example DELETE (emphasis mine):
Sequence may be destroyed and used to construct the result; however, the result might or might not be identical to sequence.
It is wrong to assume anything about how the list is being modified, or even if it is being modified. You could actually implement delete as an alias of remove in a minimal implementation. In all cases, you use the return value of your function (both delete and remove have the same signature).
Categories
I've divided the destructive common-lisp sequence operations into two categories corresponding to "argument returning" and "operation returning".
It is not clear at all what those categories are supposed to represent. Are these the definitions you have in mind?
an argument returning operation is one which returns one of its arguments as a return value, possibly modified.
an operation returning operation is one where the result is based on one of its arguments, and might be identical to that argument, but need not be.
The definition of operation returning is quite vague and encompasses both destructive and non-destructive operations. I would classify cons as such because it does not return one of its arguments; OTOH, it is a purely functional operation.
I don't really get what those categories offer in addition to destructive or non-destructive.
Setf composition gotcha
Suppose you write a function (remote host key) which gets a value from a remote key/value datastore. Suppose also that you define (setf remote) so that it updates the remote value.
You might expect (setf (first (remote host key)) value) to:
Fetch a list from host, indexed by key,
Replace its first element by value,
Push the changes back to the remote host.
However, step 3 generally does not happen: the local list is modified in place (this is the most efficient alternative, but it makes setf expansions somewhat lazy about updates). You could, however, define a new set of macros such that the whole round-trip is always performed, using DEFINE-SETF-EXPANDER.
Let me try to address your question by introducing some concepts.
I hope it helps you to consolidate your knowledge and to find your remaining answers about this subject.
The first concept is that of non-destructive versus destructive behavior.
A function that is non-destructive won't change the data passed to it.
A function that is destructive may change the data passed to it.
You can apply the (non-)destructive nature to something other than a single function. For instance, if a function stores the data passed to it somewhere, say in a object's slot, then the destructiveness depends on that object's behavior, its other operations, events, etc.
The convention for functions that immediately modify their arguments is to (usually) prefix them with n.
The convention doesn't work the other way around: there are many functions that start with n but are not destructive (e.g. not/null, nth, ninth, notany, notevery, numberp, etc.). There are also notable exceptions, such as delete, merge, sort and stable-sort. The only way to naturally grasp them is with time/experience. For instance, always refer to the HyperSpec whenever you see a function you don't know yet.
Moreover, you usually need to store the result of some destructive functions, such as delete and sort, because they may choose to skip the head of the list or to not be destructive at all. delete may actually return nil, the empty list, which is not possible to obtain from a modified cons.
The second concept is that of generalized reference.
A generalized reference is anything that can hold data, such as a variable, the car and cdr of a cons, the element locations of an array or hash table, the slots of an object, etc.
For each container data structure, you need to know the specific modifying function. However, for some generalized references, there might not be a function to modify it, such as a local variable, in which case there are still special forms to modify it.
As such, in order to modify any generalized reference, you need to know its modifying form.
Another concept closely related to generalized references is the place. A form that identifies a generalized reference is called a place. Or in other words, a place is the written way (form) that represents a generalized reference.
For each kind of place, you have a reader form and a writer form.
Some of these forms are documented, such as using the symbol of a variable to read it and setq a variable to write to it, or car/cdr to read from and rplaca/rplacd to write to a cons. Others are only documented to be accessors, such as aref to read from arrays; its writer form is not actually documented.
To get these forms, there is get-setf-expansion. From it you actually also get a set of variables and their initializing forms (to be bound as if by let*) that will be used by the reader form and/or the writer form, and a set of variables (to be bound to the new values) that will be used by the writer form.
If you've used Lisp before, you've probably used setf. setf is a macro that generates code that runs within the scope (environment) of its expansion.
Essentially, it behaves as if by using get-setf-expansion, generating a let* form for the variables and initializing forms, generating extra bindings for the writer variables with the result of the value(s) form and invoking the writer form within all this environment.
For instance, let's define a my-setf1 macro which takes only a single place and a single newvalue form:
(defmacro my-setf1 (place newvalue &environment env)
  (multiple-value-bind (vars vals store-vars writer-form reader-form)
      (get-setf-expansion place env)
    `(let* (,@(mapcar #'(lambda (var val)
                          `(,var ,val))
                      vars vals))
       ;; In case some vars are used only by reader-form
       (declare (ignorable ,@vars))
       (multiple-value-bind (,@store-vars)
           ,newvalue
         ,writer-form
         ;; Uncomment the next line to mitigate buggy writer-forms
         ;;(values ,@store-vars)
         ))))
You could then define my-setf as:
(defmacro my-setf (&rest pairs)
  `(progn
     ,@(loop
         for (place newvalue) on pairs by #'cddr
         collect `(my-setf1 ,place ,newvalue))))
There is a convention for such macros, which is to suffix with f, such as setf itself, psetf, shiftf, rotatef, incf, decf, getf and remf.
Again, the convention doesn't work the other way around: there are operators that end with f, such as aref, svref and find-if, which are functions, and if, which is a special operator for conditional execution. And yet again, there are notable exceptions, such as push, pushnew, pop, ldb, mask-field, assert and check-type.
Depending on your point-of-view, many more operators are implicitly destructive, even if not effectively tagged as such.
For instance, every defining operator (e.g. the macros defun, defpackage, defclass, defgeneric, defmethod, the function load) changes either the global environment or a temporary one, such as the compilation environment.
Others, like compile-file, compile and eval, depend on the forms they'll execute. For compile-file, it also depends on how much it isolates the compilation environment from the startup environment.
Other operators, like makunbound, fmakunbound, intern, export, shadow, use-package, rename-package, adjust-array, vector-push, vector-push-extend, vector-pop, remhash, clrhash, shared-initialize, change-class, slot-makunbound, add-method and remove-method, are (more or less) clearly intended to have side-effects.
And it's this last concept that can be the widest. Usually, a side-effect is regarded as any observable variation in one environment. As such, functions that don't change data are usually considered free of side-effects.
However, this is ill-defined. You may consider that all code execution implies side-effects, depending on what you define to be your environment or on what you can measure (e.g. consumed quotas, CPU time and real time, used memory, GC overhead, resource contention, system temperature, energy consumption, battery drain).
NOTE: None of the example lists are exhaustive.

In terms of design and when writing a library, when should I use a pointer as an argument, and when should I not?

Sorry if my question seems stupid. My background is in PHP, Ruby, Python, Lua and similar languages, and I have no understanding of pointers in real-life scenarios.
From what I've read on the Internet and what I've got as responses in a question I asked (When is a pointer idiomatic?), I have understood that:
Pointers should be used when copying large data. Instead of getting the whole object hierarchy, receive its address and access it.
Pointers have to be used when you have a function on a struct that modifies it.
So, pointers seem like a great thing: I should just always get them as function arguments because they are so lightweight, and it's okay if I somehow end up not needing to modify anything on the struct.
However, looking at that statement intuitively, I can feel that it sounds very creepy, and yet I don't know why.
So, as someone who is designing a struct and its related functions, or just functions, when should I receive a pointer? When should I receive a value, and why?
In other words, when should my NewAuthor method return &Author{ ... }, and when should it return Author{ ... }? When should my function get a pointer to an author as an argument, and when should it just get the value (a copy) of type Author?
There are tradeoffs for both pointers and values.
Generally speaking, a pointer will point to some other region of memory in the system, be it the stack of the function that wants to pass a pointer to a local variable or some place on the heap.
func A() {
    i := 25
    B(&i) // A sets up stack frame to call B,
    // it copies the address of i so B can look it up later.
    // At this point, i is equal to 30
}

func B(i *int) {
    // Here, i points to A's stack frame.
    // For this to execute, I look at my variable "i",
    // see the memory address it points to, then look at that to get the value of 25.
    // That address may be on another page of memory,
    // causing me to have to look it up from main memory (which is slow).
    println(10 + (*i))
    // Since I have the address to A's local variable, I can modify it.
    *i = 30
}
Pointers require me to dereference them constantly whenever I want to see the data they point to. Sometimes you don't care. Other times it matters a lot. It really depends on the application.
If that pointer has to be dereferenced a lot (i.e. you pass in a number to use in a bunch of different calcs), then you keep paying the cost.
Compared to using values:
func A() {
    i := 25
    B(i) // A sets up the stack frame to call B, copying in the value 25
    // i is still 25, because A gave B a copy of the value, and not the address.
}

func B(i int) {
    // Here, i is simply on the stack. I don't have to do anything to use it.
    println(10 + i)
    // Since i here is a value on B's stack, modifications are not visible outside B's scope
    i = 30
}
Since there's nothing to dereference, it's basically free to use the local variable.
The downside of passing values comes when those values are large, because copying data onto the stack isn't free.
For an int it's a wash, because pointers are int-sized. For a struct or an array, you are copying all the data.
Also, large values on the stack can make the stack extra big. Go handles this well with stack reallocation, but in high-performance scenarios it may be too much of an impact on performance.
There's a data-safety aspect as well (you can't modify something passed by value), but I don't feel that is usually an issue in most code bases.
Basically, if your problem was already solvable in Ruby, Python or another language without value types, then these performance nuances don't super-matter.
In general, passing structs as pointers will usually do "the right thing" while learning the language.
For all other types, or things that you want to keep as read-only, pass values.
There are exceptions to that rule, but it's best that you learn those as needs arise rather than try to redefine your world all at once. If that makes sense.
Simply put, you can use pointers anywhere you want. But sometimes you don't want to change your data, or it may stand for abstract data that you don't want to explicitly copy; in that case, just pass by value and let the compiler do its job.

How can a time function exist in functional programming?

I have to admit that I don't know much about functional programming. I've read about it here and there, and so came to know that in functional programming a function returns the same output for the same input, no matter how many times the function is called. It's exactly like a mathematical function, which evaluates to the same output for the same values of the input parameters that appear in the function expression.
For example, consider this:
f(x,y) = x*x + y; // It is a mathematical function
No matter how many times you use f(10,4), its value will always be 104. As such, wherever you've written f(10,4), you can replace it with 104, without altering the value of the whole expression. This property is referred to as referential transparency of an expression.
As Wikipedia says (link),
Conversely, in functional code, the output value of a function depends only on the arguments that are input to the function, so calling a function f twice with the same value for an argument x will produce the same result f(x) both times.
Can a time function (which returns the current time) exist in functional programming?
If yes, then how can it exist? Does it not violate the principle of functional programming? It particularly violates referential transparency, which is one of the properties of functional programming (if I understand it correctly).
Or if no, then how can one know the current time in functional programming?
Yes and no.
Different functional programming languages solve this differently.
In Haskell (a very pure one) all this stuff has to happen in something called the I/O Monad - see here.
You can think of it as getting another input (and output) into your function (the world state), or, more simply, as a place where "impureness" like getting the changing time happens.
Other languages like F# just have some impureness built in and so you can have a function that returns different values for the same input - just like normal imperative languages.
As Jeffrey Burka mentioned in his comment:
Here is the nice introduction to the I/O Monad straight from the Haskell wiki.
Another way to explain it is this: no function can get the current time (since it keeps changing), but an action can get the current time. Let's say that getClockTime is a constant (or a nullary function, if you like) which represents the action of getting the current time. This action is the same every time no matter when it is used so it is a real constant.
Likewise, let's say print is a function which takes some time representation and prints it to the console. Since function calls cannot have side effects in a pure functional language, we instead imagine that it is a function which takes a timestamp and returns the action of printing it to the console. Again, this is a real function, because if you give it the same timestamp, it will return the same action of printing it every time.
Now, how can you print the current time to the console? Well, you have to combine the two actions. So how can we do that? We cannot just pass getClockTime to print, since print expects a timestamp, not an action. But we can imagine that there is an operator, >>=, which combines two actions, one which gets a timestamp, and one which takes one as argument and prints it. Applying this to the actions previously mentioned, the result is... tadaaa... a new action which gets the current time and prints it. And this is incidentally exactly how it is done in Haskell.
Prelude> System.Time.getClockTime >>= print
Fri Sep 2 01:13:23 東京 (標準時) 2011
So, conceptually, you can view it in this way: A pure functional program does not perform any I/O, it defines an action, which the runtime system then executes. The action is the same every time, but the result of executing it depends on the circumstances of when it is executed.
I don't know if this was any clearer than the other explanations, but it sometimes helps me to think of it this way.
In Haskell one uses a construct called a monad to handle side effects. A monad basically means that you encapsulate values in a container and have some functions to chain functions from values to values inside the container. If our container has the type:
data IO a = IO (RealWorld -> (a,RealWorld))
we can safely implement IO actions. This type means: an action of type IO is a function that takes a token of type RealWorld and returns a new token, together with a result.
The idea behind this is that each IO action mutates the outside state, represented by the magical token RealWorld. Using monads, one can chain multiple functions that mutate the real world together. The most important function of a monad is >>=, pronounced bind:
(>>=) :: IO a -> (a -> IO b) -> IO b
>>= takes one action and a function that takes the result of this action and creates a new action out of this. The return type is the new action. For instance, let's pretend there is a function now :: IO String, which returns a String representing the current time. We can chain it with the function putStrLn to print it out:
now >>= putStrLn
Or written in do-Notation, which is more familiar to an imperative programmer:
do currTime <- now
   putStrLn currTime
All this is pure, as we map the mutation and the information about the outside world onto the RealWorld token. So each time you run this action you of course get a different output, but the input is not the same: the RealWorld token is different.
Most functional programming languages are not pure, i.e. they allow functions to depend on more than just their arguments. In those languages it is perfectly possible to have a function returning the current time. Of the languages you tagged this question with, this applies to Scala and F# (as well as most other variants of ML).
In languages like Haskell and Clean, which are pure, the situation is different. In Haskell the current time would not be available through a function, but a so-called IO action, which is Haskell's way of encapsulating side effects.
In Clean it would be a function, but the function would take a world value as its argument and return a fresh world value (in addition to the current time) as its result. The type system would make sure that each world value can be used only once (and that each function which consumes a world value produces a new one). This way the time function would have to be called with a different argument each time, and thus would be allowed to return a different time each time.
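Here is a rough sketch of that world-passing style, written in Haskell syntax for familiarity; the World type and currentTime are made-up stand-ins for illustration, and Clean's uniqueness typing (which enforces the single-use rule) is not modelled.
-- Illustrative stand-ins, not a real API.
data World = World
newtype Time = Time Integer deriving Show

-- The "time function" consumes a world and returns the time plus a fresh world.
currentTime :: World -> (Time, World)
currentTime w = (Time 0, w)   -- a real implementation would consult the machine clock

-- Every call gets a different world value, so different results stay
-- referentially transparent: same argument, same result.
twoReadings :: World -> (Time, Time, World)
twoReadings w0 =
  let (t1, w1) = currentTime w0
      (t2, w2) = currentTime w1
  in  (t1, t2, w2)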
"Current time" is not a function. It is a parameter. If your code depends on current time, it means your code is parameterized by time.
It can absolutely be done in a purely functional way. There are several ways to do it, but the simplest is to have the time function return not just the time but also the function you must call to get the next time measurement.
In C# you could implement it like this:
// Exposes mutable time as immutable time (poorly, to illustrate by example)
// Although the insides are mutable, the exposed surface is immutable.
public class ClockStamp {
    public static readonly ClockStamp ProgramStartTime = new ClockStamp();
    public readonly DateTime Time;
    private ClockStamp _next;

    private ClockStamp() {
        this.Time = DateTime.Now;
    }

    public ClockStamp NextMeasurement() {
        if (this._next == null) this._next = new ClockStamp();
        return this._next;
    }
}
(Keep in mind that this is an example meant to be simple, not practical. In particular, the list nodes can't be garbage collected because they are rooted by ProgramStartTime.)
This 'ClockStamp' class acts like an immutable linked list, but really the nodes are generated on demand so they can contain the 'current' time. Any function that wants to measure the time should have a 'clockStamp' parameter and must also return its last time measurement in its result (so the caller doesn't see old measurements), like this:
// Immutable. A result accompanied by a clockstamp
public struct TimeStampedValue<T> {
    public readonly ClockStamp Time;
    public readonly T Value;

    public TimeStampedValue(ClockStamp time, T value) {
        this.Time = time;
        this.Value = value;
    }
}

// Times an empty loop.
public static TimeStampedValue<TimeSpan> TimeALoop(ClockStamp lastMeasurement) {
    var start = lastMeasurement.NextMeasurement();
    for (var i = 0; i < 10000000; i++) {
    }
    var end = start.NextMeasurement();
    var duration = end.Time - start.Time;
    return new TimeStampedValue<TimeSpan>(end, duration);
}

public static void Main(String[] args) {
    var clock = ClockStamp.ProgramStartTime;
    var r = TimeALoop(clock);
    var duration = r.Value; // the result
    clock = r.Time; // must now use the returned clock, to avoid seeing old measurements
}
Of course, it's a bit inconvenient to have to pass that last measurement in and out, in and out, in and out. There are many ways to hide the boilerplate, especially at the language design level. I think Haskell uses this sort of trick and then hides the ugly parts by using monads.
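As a sketch of one way such hiding can look (a state monad carrying the last measurement; the counter clock below is a self-contained stand-in, not the C# class above, and uses Control.Monad.State from the mtl package):
import Control.Monad.State (State, evalState, state)

-- A stand-in clock: each "measurement" is the next tick of a counter.
type Clock = Integer

-- Reading the clock consumes the current measurement and leaves the next one
-- behind, so stale measurements are never observed, yet no caller threads them by hand.
measure :: State Clock Clock
measure = state (\t -> (t, t + 1))

timeALoop :: State Clock (Clock, Clock)
timeALoop = do
  start <- measure
  end   <- measure
  return (start, end)

main :: IO ()
main = print (evalState timeALoop 0)   -- prints (0,1)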
I am surprised that none of the answers or comments mention coalgebras or coinduction. Usually, coinduction is mentioned when reasoning about infinite data structures, but it is also applicable to an endless stream of observations, such as a time register on a CPU. A coalgebra models hidden state; and coinduction models observing that state. (Normal induction models constructing state.)
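A minimal sketch of the coinductive view (illustrative only): the hidden state is only ever observed through "current value" and "next state", and a clock is just an unfold over that hidden state.
-- A coinductive stream: there is always one more observation to make.
data Stream a = Cons a (Stream a)

-- The coalgebra structure: observe the current value, or step to the next state.
headS :: Stream a -> a
headS (Cons x _) = x

tailS :: Stream a -> Stream a
tailS (Cons _ xs) = xs

-- Build a stream from a hidden state plus an observation/step function.
unfold :: (s -> (a, s)) -> s -> Stream a
unfold step s = let (x, s') = step s in Cons x (unfold step s')

-- A toy "time register" that we can only observe, never construct directly.
ticks :: Stream Integer
ticks = unfold (\t -> (t, t + 1)) 0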
This is a hot topic in Reactive Functional Programming. If you're interested in this sort of stuff, read this: http://digitalcommons.ohsu.edu/csetech/91/ (28 pp.)
Kieburtz, Richard B., "Reactive functional programming" (1997). CSETech. Paper 91 (link)
Yes, it's possible for a pure function to return the time, if it's given that time as a parameter. Different time argument, different time result. Then form other functions of time as well and combine them with a simple vocabulary of function(-of-time)-transforming (higher-order) functions. Since the approach is stateless, time here can be continuous (resolution-independent) rather than discrete, greatly boosting modularity. This intuition is the basis of Functional Reactive Programming (FRP).
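A minimal sketch of that idea, with made-up names rather than any particular FRP library's API: a behavior is a function of continuous time, "the current time" is the identity behavior, and everything else is built by combining behaviors rather than by sampling a clock.
type Time = Double

-- A behavior is a value that varies over (continuous) time.
type Behavior a = Time -> a

time :: Behavior Time
time = id                      -- "the current time" is the identity behavior

-- Combine behaviors pointwise, without ever sampling a real clock.
liftB2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
liftB2 f ba bb = \t -> f (ba t) (bb t)

-- e.g. a position x0 + v0 * t, expressed as a behavior of time.
position :: Behavior Double
position = liftB2 (+) (const 10) (liftB2 (*) (const 2) time)

-- Sampling happens at the boundary: the runtime, not the pure code, supplies t.
sampleAt :: Time -> Behavior a -> a
sampleAt t b = b t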
Yes, you are correct: Now() or CurrentTime(), or any method signature of that flavour, does not exhibit referential transparency when judged by its output alone. But under the hood it is parameterized by a system clock input, and the actual behaviour of the system clock plus the function built on top of it does adhere to referential transparency.
Yes, a time-getting function can exist in functional programming, using a slightly modified version of functional programming known as impure functional programming (the default, or main, one being pure functional programming).
In the case of getting the time (or reading a file, or launching a missile), the code needs to interact with the outside world to get the job done, and this outside world is not based on the pure foundations of functional programming. To allow a pure functional programming world to interact with this impure outside world, people have introduced impure functional programming. After all, software which doesn't interact with the outside world isn't any use beyond doing some mathematical computations.
A few functional programming languages have this impurity feature built in, such that it is not easy to separate out which code is impure and which is pure (F#, etc.), and some functional programming languages make sure that when you do some impure stuff, that code clearly stands out compared to pure code, as in Haskell.
Another interesting way to see this is that your time-getting function in functional programming would take a "world" object, one which holds the current state of the world, like the time, the number of people living in the world, etc. Getting the time from that world object would then always be pure, i.e. if you pass in the same world state you will always get the same time.
Your question conflates two related measures of a computer language: functional/imperative and pure/impure.
A functional language defines relationships between inputs and outputs of functions, and an imperative language describes specific operations in a specific order to perform.
A pure language does not create or depend on side effects, and an impure language uses them throughout.
One-hundred percent pure programs are basically useless. They may perform an interesting calculation, but because they cannot have side effects they have no input or output so you would never know what they calculated.
To be useful at all, a program has to be at least a smidge impure. One way to make a pure program useful is to put it inside a thin impure wrapper, like this untested Haskell program:
import Data.Time.Clock (getCurrentTime)

-- this is a pure function, written in functional style.
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- This is an impure wrapper around the pure function, written in imperative style
-- It depends on inputs and produces outputs.
main = do
    putStrLn "Please enter the input parameter"
    inputStr <- getLine
    putStrLn "Starting time:"
    getCurrentTime >>= print
    let inputInt = read inputStr -- this line is pure
    let result = fib inputInt    -- this is also pure
    putStrLn "Result:"
    print result
    putStrLn "Ending time:"
    getCurrentTime >>= print
You're broaching a very important subject in functional programming, that is, performing I/O. The way many pure languages go about it is by using embedded domain-specific languages, e.g., a sublanguage whose task it is to encode actions, which can have results.
The Haskell runtime for example expects me to define an action called main that is composed of all actions that make up my program. The runtime then executes this action. Most of the time, in doing so it executes pure code. From time to time the runtime will use the computed data to perform I/O and feeds back data back into pure code.
You might complain that this sounds like cheating, and in a way it is: by defining actions and expecting the runtime to execute them, the programmer can do everything a normal program can do. But Haskell's strong type system creates a strong barrier between pure and "impure" parts of the program: you cannot simply add, say, two seconds to the current CPU time and print it; you have to define an action that results in the current CPU time, and pass the result on to another action that adds two seconds and prints the result. Writing too much of a program in this action style is considered bad style, though, because it makes it harder to infer which effects are caused, compared to Haskell types, which tell us everything we can know about what a value is.
Example: clock_t c = time(NULL); printf("%d\n", c + 2); in C, vs. main = getCPUTime >>= \c -> print (c + 2*1000*1000*1000*1000) in Haskell. The operator >>= is used to compose actions, passing the result of the first to a function that results in the second action. As this looks quite arcane, Haskell compilers support syntactic sugar that allows us to write the latter code as follows:
import System.CPUTime (getCPUTime)

type Clock = Integer -- To make it more similar to the C code

-- An action that returns nothing, but might do something
main :: IO ()
main = do
    -- An action that returns an Integer, which we view as CPU Clock values
    c <- getCPUTime :: IO Clock
    -- An action that prints data, but returns nothing
    print (c + 2*1000*1000*1000*1000) :: IO ()
The latter looks quite imperative, doesn't it?
If yes, then how can it exist? Does it not violate the principle of functional programming? It particularly violates referential transparency
It does not exist in a purely functional sense.
Or if no, then how can one know the current time in functional programming?
It may first be useful to know how a time is retrieved on a computer. Essentially there is onboard circuitry that keeps track of the time (which is the reason a computer usually needs a small cell battery). Then there might be some internal process that sets the value of time at a certain memory register. All of which essentially boils down to a value that can be retrieved by the CPU.
For Haskell, there is the concept of an 'IO action', which represents a type that can be made to carry out some IO process. So instead of referencing a Time value we reference an IO Time value. All this would be purely functional. We aren't referencing time but rather something along the lines of 'read the value of the time register'.
When we actually execute the Haskell program, the IO action would actually take place.
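In concrete terms, using getCurrentTime from the time package's Data.Time.Clock module: pure code only ever refers to the action value of type IO UTCTime; the actual register read happens when the runtime executes it.
import Data.Time.Clock (UTCTime, getCurrentTime)

-- A value describing an effect: "read the clock". Merely referring to it is pure.
readClock :: IO UTCTime
readClock = getCurrentTime

main :: IO ()
main = do
  t <- readClock   -- the actual read happens here, when the action is run
  print t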
It can be answered without introducing other concepts of FP.
Possibility 1: time as function argument
A language consists of
language core and
standard library.
Referential transparency is a property of language core, but not standard library. By no means is it a property of programs written in that language.
Using OP's notation, should one have a function
f(t) = t*v0 + x0; // mathematical function that knows current time
They would ask the standard library for the current time, say 1.23, and compute the function with that value as an argument: f(1.23) (or just 1.23*v0 + x0, referential transparency!). That way the code gets to know the current time.
Possibility 2: time as return value
Answering OP's question:
Can a time function (which returns the current time) exist in functional programming?
Yes, but that function has to have an argument, and you would have to compute it with different inputs so that it returns a different current time each time; otherwise it would violate the principles of FP.
f(s) = t(s)*v0 + x0; // mathematical function t(s) returns current time
This is an alternative approach to what I've described above. But then again, the question of obtaining those different inputs s in the first place still comes down to standard library.
Possibility 3: functional reactive programming
The idea is that the function t() evaluates to the current time paired with another function t2. When one needs the current time again later, one calls t2(), which will then give a function t3, and so on:
(x, t2) = t(); // it's x o'clock now
...
(x2, t3) = t2(); // now it's already x2 o'clock
...
t(); x; // both evaluate to the initial time, referential transparency!
There's more to FP, but I believe it's out of the scope of the OP's question. For example, how does one ask the standard library to compute a function and act upon its return value in a purely functional way: that is more about side effects than referential transparency.
How can a time function exist in functional programming?
Back in 1988, Dave Harrison was facing this very question when defining an early functional language with real-time processing facilities. The solution he chose for Ruth can be found on page 50 of his thesis Functional Real-Time Programming: The Language Ruth And Its Semantics:
A unique clock is automatically supplied to each Ruth process at run-time, to provide real-time information, [...]
So how are these clocks defined? From page 61:
A clock tree is composed of a node holding a non-negative integer denoting the current time and two sub-trees containing the times of future events.
Furthermore:
As the tree is (lazily) evaluated each of the nodes is instantiated with the value of system time at the time at which the node is instantiated, thus giving programs reference to the current time.
Translating that into Haskell:
type Clock = Tree Time
type Time  = Integer -- must be zero or larger

data Tree a = Node { contents :: a,
                     left     :: Tree a,
                     right    :: Tree a }
In addition to accessing the current time (with contents), each Ruth process can provide other clocks (with left and right) for use elsewhere in the program. If a process needs the current time more than once, it must use a new node on each occasion - once instantiated, a node's contents remains constant.
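Continuing the Haskell translation with a small usage sketch (the helper names here are mine, not Ruth's): a process observes the time at one node and hands the sub-trees to whatever will need the time later.
-- Observe the current time and obtain two fresh clocks for later use.
readClock :: Clock -> (Time, Clock, Clock)
readClock c = (contents c, left c, right c)

-- Measure before and after some work, using a distinct node for each reading.
elapsed :: Clock -> Time
elapsed c0 =
  let (start, c1, _) = readClock c0
      (end,   _,  _) = readClock c1
  in  end - start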
So that's how a time function can exist in a functional language: by always being applied to a unique input value (a tree-of-times in this case) wherever it's called.
