Postfix notation is changing precedence of parentheses in evalution - math

In a/b*(c+(d-e)) Infix notation (d-e) will be evaluated first but if we convert it into Post-fix ab/cde-+* then ab/ will be evaluated first.
why ab/ is evaluating first in post-fix instead of d-e?

Multiplication and division are left-associative, meaning they are evaluated from left to right. Since a and b are terminals (no further evaluation needs to be done), ab/ is ready to be evaluated. Once we get to the last term, c+(d-e), we need to delve deeper and only then do we evaluate de-.

When you talk about "precedence" (a concept which is designed to disambiguate infix notation hence not really applicable to postfix notation) you really seem to mean "order of operations", which is a broader notion.
One thing to realize is that the order of operations taught in elementary school (often with the pneumonic PEMDAS) isn't necessarily the order of operations that a computer will use when evaluating an expression like a/b*(c+(d-e)). Using PEMDAS, you would first calculate d-e then c+(d-e) etc., which is a different order than that implicit in ab/cde-+*. But, it is interesting to note that many programming languages will in fact evaluate a/b*(c+(d-e)) by using the order of ab/cde-+* rather than by a naive implementation of PEMDAS. As an example, if in Python you import the module dis and evaluate dis.dis("a/b*(c+(d-e))") to disassemble a/b*(c+(d-e)) into Python byte code you get:
0 LOAD_NAME 0 (a)
2 LOAD_NAME 1 (b)
4 BINARY_TRUE_DIVIDE
6 LOAD_NAME 2 (c)
8 LOAD_NAME 3 (d)
10 LOAD_NAME 4 (e)
12 BINARY_SUBTRACT
14 BINARY_ADD
16 BINARY_MULTIPLY
18 RETURN_VALUE
which is easily seen to be exactly the same order of operations as ab/cde-+*. In fact, this postfix notation can be thought of as shorthand for the stack-based computation that Python uses when it evaluates a/b*(c+(d-e))

Both evaluation orders execute the same operations with the same arguments and produce the same result, so in that sense the difference doesn't matter.
There are differences that matter, though. The grade-school arithmetic order is never used when infix expressions are evaluated in practice, because:
It requires more intermediate results to be stored. (a+b)(c+d)(e+f)*(g+h) requires 4 intermediate sums to be stored in grade-school order, but only 2 in the usual order.
It's actually more complicated to implement the grade-school order in most cases; and
When sub-expressions have side-effects, the order becomes important and the usual order is easier for programmers to reason about.

Related

Moral of the story from SICP Ex. 1.20?

In this exercise we are asked to trace Euclid's algorithm using first normal order then applicative order evaluation.
(define (gcd a b)
(if (= b 0)
a
(gcd b (remainder a b))))
I've done the manual trace, and checked it with the several solutions available on the internet. I'm curious here to consolidate the moral of the exercise.
In gcd above, note that b is re-used three times in the function body, plus this function is recursive. This being what gives rise to 18 calls to remainder for normal order, in contrast to only 4 for applicative order.
So, it seems that when a function uses an argument more than once in its body, (and perhaps recursively! as here), then not evaluating it when the function is called (i.e. applicative order), will lead to redundant recomputation of the same thing.
Note that the question is at pains to point out that the special form if does not change its behaviour when normal order is used; that is, if will always run first; if this didn't happen, there could be no termination in this example.
I'm curious regarding delayed evaluation we are seeing here.
As a plus, it might allow us to handle infinite things, like streams.
As a minus, if we have a function like here, it can cause great inefficiency.
To fix the latter it seems like there are two conceptual options. One, wrap it in some data structure that caches its result to avoid recomputation. Two, selectively force the argument to realise when you know it will otherwise cause repeated recomputation.
The thing is, neither of those options seem very nice, because both represent additional "levers" the programmer must know how to use and choose when to use.
Is all of this dealt with more thoroughly later in the book?
Is there any straightforward consolidation of these points which would be worth making clear at this point (without perhaps going into all the detail that is to come).
The short answer is yes, this is covered extensively later in the book.
It is covered in most detail in Chapter 4 where we implement eval and apply, and so get the opportunity to modify their behaviour. For example Exercise 4.31 suggests the following syntax:
(define (f a (b lazy) c (d lazy-memo))
As you can see this identifies three different behaviours.
Parameters a and c have applicative order (they are evaluated before they are passed to the f).
b is normal or lazy, it is evaluated everytime it is used (but only if it is used).
d lazy but it's value it memoized so it is evaluated at most once.
Although this syntax is possible it isn't used. I think the philosopy is that the the language has an expected behaviour (applicative order) and that normal order is only used by default when necessary (e.g., the consequent and alternative of an if statement, and in creating streams). When it is necssary (or desirable) to have a parameter with normal evaluation then we can use delay and force, and if necessary memo-proc (e.g. Exercise 3.77).
So at this point the authors are introducing the ideas of normal and applicative order so that we have some familiarity with them by the time we get into the nitty gritty later on.
In a sense this a recurring theme, applicative order is probably more intuitive, but sometimes we need normal order. Recursive functions are simpler to write, but sometimes we need the performance of iterative functions. Programs where we can use the substition model are easier to reason about, but sometimes we need environmental model because we need mutable state.

Perl 6 calculate the average of an int array at once using reduce

I'm trying to calculate the average of an integer array using the reduce function in one step. I can't do this:
say (reduce {($^a + $^b)}, <1 2 3>) / <1 2 3>.elems;
because it calculates the average in 2 separate pieces.
I need to do it like:
say reduce {($^a + $^b) / .elems}, <1 2 3>;
but it doesn't work of course.
How to do it in one step? (Using map or some other function is welcomed.)
TL;DR This answer starts with an idiomatic way to write equivalent code before discussing P6 flavored "tacit" programming and increasing brevity. I've also added "bonus" footnotes about the hyperoperation Håkon++ used in their first comment on your question.5
Perhaps not what you want, but an initial idiomatic solution
We'll start with a simple solution.1
P6 has built in routines2 that do what you're asking. Here's a way to do it using built in subs:
say { sum($_) / elems($_) }(<1 2 3>); # 2
And here it is using corresponding3 methods:
say { .sum / .elems }(<1 2 3>); # 2
What about "functional programming"?
First, let's replace .sum with an explicit reduction:
.reduce(&[+]) / .elems
When & is used at the start of an expression in P6 you know the expression refers to a Callable as a first class citizen.
A longhand way to refer to the infix + operator as a function value is &infix:<+>. The shorthand way is &[+].
As you clearly know, the reduce routine takes a binary operation as an argument and applies it to a list of values. In method form (invocant.reduce) the "invocant" is the list.
The above code calls two methods -- .reduce and .elems -- that have no explicit invocant. This is a form of "tacit" programming; methods written this way implicitly (or "tacitly") use $_ (aka "the topic" or simply "it") as their invocant.
Topicalizing (explicitly establishing what "it" is)
given binds a single value to $_ (aka "it") for a single statement or block.
(That's all given does. Many other keywords also topicalize but do something else too. For example, for binds a series of values to $_, not just one.)
Thus you could write:
say .reduce(&[+]) / .elems given <1 2 3>; # 2
Or:
$_ = <1 2 3>;
say .reduce(&[+]) / .elems; # 2
But given that your focus is FP, there's another way that you should know.
Blocks of code and "it"
First, wrap the code in a block:
{ .reduce(&[+]) / .elems }
The above is a Block, and thus a lambda. Lambdas without a signature get a default signature that accepts one optional argument.
Now we could again use given, for example:
say do { .reduce(&[+]) / .elems } given <1 2 3>; # 2
But we can also just use ordinary function call syntax:
say { .reduce(&[+]) / .elems }(<1 2 3>)
Because a postfix (...) calls the Callable on its left, and because in the above case one argument is passed in the parens to a block that expects one argument, the net result is the same as the do4 and the given in the prior line of code.
Brevity with built ins
Here's another way to write it:
<1 2 3>.&{.sum/.elems}.say; #2
This calls a block as if it were a method. Imo that's still eminently readable, especially if you know P6 basics.
Or you can start to get silly:
<1 2 3>.&{.sum/$_}.say; #2
This is still readable if you know P6. The / is a numeric (division) operator. Numeric operators coerce their operands to be numeric. In the above $_ is bound to <1 2 3> which is a list. And in Perls, a collection in numeric context is its number of elements.
Changing P6 to suit you
So far I've stuck with standard P6.
You can of course write subs or methods and name them using any Unicode letters. If you want single letter aliases for sum and elems then go ahead:
my (&s, &e) = &sum, &elems;
But you can also extend or change the language as you see fit. For example, you can create user defined operators:
#| LHS ⊛ RHS.
#| LHS is an arbitrary list of input values.
#| RHS is a list of reducer function, then functions to be reduced.
sub infix:<⊛> (#lhs, *#rhs (&reducer, *#fns where *.all ~~ Callable)) {
reduce &reducer, #fns».(#lhs)
}
say <1 2 3> ⊛ (&[/], &sum, &elems); # 2
I won't bother to explain this for now. (Feel free to ask questions in the comments.) My point is simply to highlight that you can introduce arbitrary (prefix, infix, circumfix, etc.) operators.
And if custom operators aren't enough you can change any of the rest of the syntax. cf "braid".
Footnotes
1 This is how I would normally write code to do the computation asked for in the question. #timotimo++'s comment nudged me to alter my presentation to start with that, and only then shift gears to focus on a more FPish solution.
2 In P6 all built in functions are referred to by the generic term "routine" and are instances of a sub-class of Routine -- typically a Sub or Method.
3 Not all built in sub routines have correspondingly named method routines. And vice-versa. Conversely, sometimes there are correspondingly named routines but they don't work exactly the same way (with the most common difference being whether or not the first argument to the sub is the same as the "invocant" in the method form.) In addition, you can call a subroutine as if it were a method using the syntax .&foo for a named Sub or .&{ ... } for an anonymous Block, or call a method foo in a way that looks rather like a subroutine call using the syntax foo invocant: or foo invocant: arg2, arg3 if it has arguments beyond the invocant.
4 If a block is used where it should obviously be invoked then it is. If it's not invoked then you can use an explicit do statement prefix to invoke it.
5 Håkon's first comment on your question used "hyperoperation". With just one easy to recognize and remember "metaop" (for unary operations) or a pair of them (for binary operations), hyperoperations distribute an operation to all the "leaves"6 of a data structure (for an unary) or create a new one based on pairing up the "leaves" of a pair of data structures (for binary operations). NB. Hyperoperations are done in parallel7.
6 What is a "leaf" for a hyperoperation is determined by a combination of the operation being applied (see the is nodal trait) and whether a particular element is Iterable.
7 Hyperoperation is applied in parallel, at least semantically. Hyperoperation assumes8 that the operations on the "leaves" have no mutually interfering side-effects -- that is to say, that any side effect when applying the operation to one "leaf" can safely be ignored in respect to applying the operation to any another "leaf".
8 By using a hyperoperation the developer is declaring that the assumption of no meaningful side-effects is correct. The compiler will act on the basis it is, but will not check that it is true. In the safety sense it's like a loop with a condition. The compiler will follow the dev's instructions, even if the result is an infinite loop.
Here is an example using given and the reduction meta operator:
given <1 2 3> { say ([+] $_)/$_.elems } ;

Simple example of call-by-need

I'm trying to understand the theorem behind "call-by-need." I do understand the definition, but I'm a bit confused. I would like to see a simple example which shows how call-by-need works.
After reading some previous threads, I found out that Haskell uses this kind of evaluation. Are there any other programming languages which support this feature?
I read about the call-by-name of Scala, and I do understand that call-by-name and call-by-need are similar but different by the fact that call-by-need will keep the evaluated value. But I really would love to see a real-life example (it does not have to be in Haskell), which shows call-by-need.
The function
say_hello numbers = putStrLn "Hello!"
ignores its numbers argument. Under call-by-value semantics, even though an argument is ignored, the parameter at the function call site may need to be evaluated, perhaps because of side effects that the rest of the program depends on.
In Haskell, we might call say_hello as
say_hello [1..]
where [1..] is the infinite list of naturals. Under call-by-value semantics, the CPU would run off trying to build an infinite list and never get to the say_hello at all!
Haskell merely outputs
$ runghc cbn.hs
Hello!
For less dramatic examples, the first ten natural numbers are
ghci> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]
The first ten odds are
ghci> take 10 $ filter odd [1..]
[1,3,5,7,9,11,13,15,17,19]
Under call-by-need semantics, each value — even a conceptually infinite one as in the examples above — is evaluated only to the extent required and no more.
update: A simple example, as asked for:
ff 0 = 1
ff 1 = 1
ff n = go (ff (n-1))
where
go x = x + x
Under call-by-name, each invocation of go evaluates ff (n-1) twice, each for each appearance of x in its definition (because + is strict in both arguments, i.e. demands the values of the both of them).
Under call-by-need, go's argument is evaluated at most once. Specifically, here, x's value is found out only once, and reused for the second appearance of x in the expression x + x. If it weren't needed, x wouldn't be evaluated at all, just as with call-by-name.
Under call-by-value, go's argument is always evaluated exactly once, prior to entering the function's body, even if it isn't used anywhere in the function's body.
Here's my understanding of it, in the context of Haskell.
According to Wikipedia, "call by need is a memoized variant of call by name where, if the function argument is evaluated, that value is stored for subsequent uses."
Call by name:
take 10 . filter even $ [1..]
With one consumer the produced value disappears after being produced so it might as well be call-by-name.
Call by need:
import qualified Data.List.Ordered as O
h = 1 : map (2*) h <> map (3*) h <> map (5*) h
where
(<>) = O.union
The difference is, here the h list is reused by several consumers, at different tempos, so it is essential that the produced values are remembered. In a call-by-name language there'd be much replication of computational effort here because the computational expression for h would be substituted at each of its occurrences, causing separate calculation for each. In a call-by-need--capable language like Haskell the results of computing the elements of h are shared between each reference to h.
Another example is, most any data defined by fix is only possible under call-by-need. With call-by-value the most we can have is the Y combinator.
See: Sharing vs. non-sharing fixed-point combinator and its linked entries and comments (among them, this, and its links, like Can fold be used to create infinite lists?).

Struggling with the basics

I am trying to self-learn SML. Though I can write some SML code, I have not attained the ah ha.
In:
val x = 5;
how does binding a value of 5 to the name x differ from assigning a value of 5 to the memory location/variable x in imperative programming?
How does the above expression elucidate "a style that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data"?
What do I have to throw away about imperative programming to catch on FP quickly?
Please be gentle with me.
How does binding a value of 5 to the name x differ from assigning a value of 5 to the memory location/variable x in imperative programming?
The key difference between variables in functional programming and variables in imperative programming is that variables in functional programming cannot be modified.
Then why are they called “variables”?
Variables in functional programming languages don't vary in the sense that you can modify their value. However, they do vary in the mathematical sense in that they represent some unknown constant value. For example, consider the following quadratic expression:
This is a quadratic expression in one variable. The variable x varies in the sense that you can select any value for x. However, once you select a certain value for x you cannot change that value.
Now, if you have a quadractic equation then the choice of x is no longer arbitrary. For example, consider the following quadratic equation:
The only choices for x are those which satisfy the equation (i.e. x = -2 or x = -1.5).
A mathematical function on the other hand is a relation between two sets, called the domain (input set) and the codomain (output set). For example, consider the following function:
This function relates every x belonging to the set of real numbers to its corresponding 2x^2 + 7x + 6, also belonging to the set of real numbers. Again, we are free to choose any value for x. However, once we choose some value for x we are not allowed to modify it.
The point is that they are called variables because they vary in the mathematical sense that they can assume different values. Hence, they vary with instance. However, they do not vary with time.
This immutability of variables is important in functional programming because it makes your program referentially transparent (i.e. invoking a function can be thought of as simply replacing the function call with the function body).
If variables were allowed to vary with time then you wouldn't be able to simply substitute the function call with the function body because the substituted variable could change over time. Hence, your substituted function body could become invalid over time.
How does the above expression elucidate "a style that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data"?
The expression val x = 5; could be thought of as a function application (fn x => y) 5 where y is the rest of the program. Functional programming is all about functions, and immutability in the sense that variables only vary with instance and not with time.
In particular, mutability is considered bad because it is implicit. Anything that is implicit could potentially cause unforeseeable errors in your program. Hence, the Zen of Python explicitly states that:
Explicit is better than implicit.
In fact, mutable state is the primary cause of software complexity. Hence, functional programming tries to eschew mutability as much as possible. However, that doesn't mean that we only use immutable structures. We can use mutable structures. We just need to be explicit about it.
What do I have to throw away about imperative programming to catch on FP quickly?
Nothing. Functional programming and imperative programming are two ways of thinking about computation. Computation is a very abstract idea. What exactly is computation? How do we know which problems can be computed? These were some of the questions that mathematicians struggled with in the early nineteen hundreds.
To think about computation we need to have a formal model of computation (i.e. a formal system that captures the notion of computation). There are various models of computation. However, the most famous are the lambda calculus (which gave rise to functional programming) and turing machines (which gave rise to imperative programming).
Both the lambda calculus and the turing machine are equivalent in power. They both capture the notion of computing and every problem that can be computed can be expressed as either a lambda calculus expression or an equivalent turing machine. The only difference is the way in which you express your problem.
You don't have to throw away anything that you learned about imperative programming to understand functional programming. Neither one is better than the other. They are both just different ways of expressing the same problem. However, you do need to start thinking like a functional programmer to write good functional programs.

How are functions curried?

I understand what the concept of currying is, and know how to use it. These are not my questions, rather I am curious as to how this is actually implemented at some lower level than, say, Haskell code.
For example, when (+) 2 4 is curried, is a pointer to the 2 maintained until the 4 is passed in? Does Gandalf bend space-time? What is this magic?
Short answer: yes a pointer is maintained to the 2 until the 4 is passed in.
Longer than necessary answer:
Conceptually, you're supposed to think about Haskell being defined in terms of the lambda calculus and term rewriting. Lets say you have the following definition:
f x y = x + y
This definition for f comes out in lambda calculus as something like the following, where I've explicitly put parentheses around the lambda bodies:
\x -> (\y -> (x + y))
If you're not familiar with the lambda calculus, this basically says "a function of an argument x that returns (a function of an argument y that returns (x + y))". In the lambda calculus, when we apply a function like this to some value, we can replace the application of the function by a copy of the body of the function with the value substituted for the function's parameter.
So then the expression f 1 2 is evaluated by the following sequence of rewrites:
(\x -> (\y -> (x + y))) 1 2
(\y -> (1 + y)) 2 # substituted 1 for x
(1 + 2) # substituted 2 for y
3
So you can see here that if we'd only supplied a single argument to f, we would have stopped at \y -> (1 + y). So we've got a whole term that is just a function for adding 1 to something, entirely separate from our original term, which may still be in use somewhere (for other references to f).
The key point is that if we implement functions like this, every function has only one argument but some return functions (and some return functions which return functions which return ...). Every time we apply a function we create a new term that "hard-codes" the first argument into the body of the function (including the bodies of any functions this one returns). This is how you get currying and closures.
Now, that's not how Haskell is directly implemented, obviously. Once upon a time, Haskell (or possibly one of its predecessors; I'm not exactly sure on the history) was implemented by Graph reduction. This is a technique for doing something equivalent to the term reduction I described above, that automatically brings along lazy evaluation and a fair amount of data sharing.
In graph reduction, everything is references to nodes in a graph. I won't go into too much detail, but when the evaluation engine reduces the application of a function to a value, it copies the sub-graph corresponding to the body of the function, with the necessary substitution of the argument value for the function's parameter (but shares references to graph nodes where they are unaffected by the substitution). So essentially, yes partially applying a function creates a new structure in memory that has a reference to the supplied argument (i.e. "a pointer to the 2), and your program can pass around references to that structure (and even share it and apply it multiple times), until more arguments are supplied and it can actually be reduced. However it's not like it's just remembering the function and accumulating arguments until it gets all of them; the evaluation engine actually does some of the work each time it's applied to a new argument. In fact the graph reduction engine can't even tell the difference between an application that returns a function and still needs more arguments, and one that has just got its last argument.
I can't tell you much more about the current implementation of Haskell. I believe it's a distant mutant descendant of graph reduction, with loads of clever short-cuts and go-faster stripes. But I might be wrong about that; maybe they've found a completely different execution strategy that isn't anything at all like graph reduction anymore. But I'm 90% sure it'll still end up passing around data structures that hold on to references to the partial arguments, and it probably still does something equivalent to factoring in the arguments partially, as it seems pretty essential to how lazy evaluation works. I'm also fairly sure it'll do lots of optimisations and short cuts, so if you straightforwardly call a function of 5 arguments like f 1 2 3 4 5 it won't go through all the hassle of copying the body of f 5 times with successively more "hard-coding".
Try it out with GHC:
ghc -C Test.hs
This will generate C code in Test.hc
I wrote the following function:
f = (+) 16777217
And GHC generated this:
R1.p[1] = (W_)Hp-4;
*R1.p = (W_)&stg_IND_STATIC_info;
Sp[-2] = (W_)&stg_upd_frame_info;
Sp[-1] = (W_)Hp-4;
R1.w = (W_)&integerzmgmp_GHCziInteger_smallInteger_closure;
Sp[-3] = 0x1000001U;
Sp=Sp-3;
JMP_((W_)&stg_ap_n_fast);
The thing to remember is that in Haskell, partially applying is not an unusual case. There's technically no "last argument" to any function. As you can see here, Haskell is jumping to stg_ap_n_fast which will expect an argument to be available in Sp.
The stg here stands for "Spineless Tagless G-Machine". There is a really good paper on it, by Simon Peyton-Jones. If you're curious about how the Haskell runtime is implemented, go read that first.

Resources