Functional Override: What occurs and when - overriding

So, functional override and discrete mathematics: a feature I'm not entirely sure of in critical system design. Say we have a feature where, if f() should fail, g() would override it, as denoted below...
"g() ⊕ f()"
x = 0.1;
f(x) = x^2 when x ∈ ℕ1,
g(x) = 2x - x when x ∈ ℕ0
I understand that in a situation where the input (x) isn't within the scope or domain of f(), the function g() is supposed to act as its override, and x will be passed to g() instead, giving g(x). But in the example above, you will notice that x is outside the domain of both f() and g().
So does this mean that the output is never given because x is an invalid input?
This seems unlikely to be a realistic exception to have to deal with in a critical system, as one would expect "g() ⊕ f()" to be capable of compensating for any input; but in a recent examination, this kind of question was given to me and I found it to be quite the trick question. If anyone could shed some light on this, it would be much appreciated. None of my books mention anything about handling this kind of input, and all of the examples which have been taught to me have been instances where x is within the range/domain of at least g().

So it turns out that even with g() overriding f(), if the input is outside the domain of both the first and the overriding function, the output is an error, as the input never qualifies for either test. The same happens when we try to factorise a negative number using a calculator.
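A minimal Python sketch of the situation described above. It assumes that ℕ1 means the whole numbers from 1 upward and ℕ0 the whole numbers from 0 upward, and that g() is consulted only when x falls outside the domain of f(). The names in_domain and override are hypothetical, chosen purely for illustration.

```python
def in_domain(x, minimum):
    """True when x is a whole number no smaller than minimum."""
    return float(x).is_integer() and x >= minimum


def f(x):
    return x ** 2      # assumed domain: N1 (1, 2, 3, ...)


def g(x):
    return 2 * x - x   # assumed domain: N0 (0, 1, 2, ...)


def override(x):
    """Try f first, fall back to g, and fail when neither domain fits."""
    if in_domain(x, 1):
        return f(x)
    if in_domain(x, 0):
        return g(x)
    raise ValueError(f"{x} is outside the domain of both f and g")


print(override(3))   # handled by f
print(override(0))   # handled by g
try:
    override(0.1)
except ValueError as error:
    print("undefined:", error)
```

With x = 0.1 neither test succeeds, so the combined function is simply undefined at that point, which is the behaviour the examination question seems to have been probing.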

Related

Functional "simultanity"?

At this link, functional programming is spoken of. Specifically, the author says this:
Simultaneity means that we assume a statement in lambda calculus is evaluated all at once. The trivial function:
λf(x) ::= x f(x)
defines an infinite sequence of whatever you plug in for x. The stepwise expansion looks like this:
0 - f(x)
1 - x f(x)
2 - x x f(x)
3 - x x x f(x)
The point is that we have to assume that the 'f()' and 'x' in step three million have the same meaning they did in step one.
At this point, those of you who know something about FP are muttering "referential transparency" under your collective breath. I know. I'll beat up on that in a minute. For now, just suspend your disbelief enough to admit that the constraint does exist, and the aardvark won't get hurt.
The problem with infinite expansions in a real-world computer is that.. well.. they're infinite. As in, "infinite loop" infinite. You can't evaluate every term of an infinite sequence before moving on to the next evaluation unless you're planning to take a really long coffee break while you wait for the answers.
Fortunately, theoretical logic comes to the rescue and tells us that preorder evaluation will always give us the same results as postorder evaluation.
More vocabulary.. need another function for this.. fortunately, it's a simple one:
λg(x) ::= x x
Now.. when we make the statement:
g(f(x))
Preorder evaluation says we have to expand f(x) completely before plugging it into g(). But that takes forever, which is.. inconvenient. Postorder evaluation says we can do this:
0 - g(f(x))
1 - f(x) f(x)
2 - x f(x) x f(x)
3 - x x f(x) x x f(x)
. . . could someone explain to me what is meant here? I haven't a clue what's being said. Maybe point me to a really good FP primer that would get me started.
(Warning, this answer is very long-winded. I thought it best to include general knowledge of lambda calculus because it is near impossible to find good explanations of it)
The author appears to be using the syntax λg(x) to mean a named function, rather than a traditional function in lambda calculus. The author also appears to be going on at length about how lambda calculus is not functional programming in the same way that a Turing machine isn't imperative programming. There's practicalities and ideals that exist with those abstractions that aren't present in the programming languages frequently used to represent them. But before getting into that, a primer on lambda calculus may help. In lambda calculus, all functions look like this:
λarg.body
That's it. There's a λ symbol (called "lambda", hence the name) followed by a named argument and only one named argument, then followed by a period, then followed by an expression that represents the body of the function. For instance, the identity function which takes anything and just returns it right back would look like this:
λx.x
And evaluating an expression is just a series of simple rules for swapping out functions and arguments with their body expressions. An expression has the form:
function-or-expression arg-or-expression
Reducing it usually follows the rule: "If the left thing is an expression, reduce it. Otherwise, it must be a function, so use arg-or-expression as the argument to the function, and replace this expression with the body of the function." It is very important to note that there is no requirement that the arg-or-expression be reduced before being used as an argument. That is, both of the following are equivalent and mathematically identical reductions of the expression λx.x (λy.y 0) (assuming you have some sort of definition for 0, because lambda calculus requires you to define numbers as functions):
λx.x (λy.y 0)
=> λx.x 0
=> 0
λx.x (λy.y 0)
=> λy.y 0
=> 0
In the first reduction, the argument was reduced before being used in the λx.x function. In the second, the argument was merely substituted into the λx.x function body - it wasn't reduced before being used. When this concept is used in programming, it's called "lazy evaluation" - you don't actually evaluate (reduce) an expression until you need to. What's important to note is that in lambda calculus, it does not matter whether an argument is reduced or not before substitution. The mathematics of lambda calculus prove that you'll get the same result either way as long as both terminate. This is definitely not the case in programming languages, because all sorts of things (usually relating to a change in the program's state) can make lazy evaluation behave differently from eager evaluation.
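The two reduction orders can be imitated in Python by wrapping an argument in a zero-argument function (a "thunk") so that it is only reduced on demand. This is a rough sketch rather than real lambda calculus, and the names identity_eager, identity_lazy, ignore_lazy and never_needed are made up for the illustration.

```python
def identity_eager(value):
    # The argument was already reduced by the time we get here.
    return value


def identity_lazy(thunk):
    # Substitute first, reduce only when the result is demanded.
    return thunk()


def ignore_lazy(thunk):
    # The argument is never needed, so it is never reduced.
    return 0


def never_needed():
    raise RuntimeError("this argument should never be reduced")


eager_result = identity_eager((lambda y: y)(0))         # argument reduced first
lazy_result = identity_lazy(lambda: (lambda y: y)(0))   # argument reduced on demand
skipped = ignore_lazy(never_needed)                     # argument never reduced
print(eager_result, lazy_result, skipped)
```

Both orders agree whenever both finish. The third call shows where they can differ: an argument that would fail (or loop) is harmless if nothing ever asks for it.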
Lambda calculus needs some extensions to be useful however. There's no way to name things. Suppose we allowed that though. In particular, let's create our own definition of what a function looks like in lambda calculus:
λname(arg).body
We'll say this means that the function λarg.body is bound to name, and anywhere else in any accompanying lambda expressions we can replace name with λarg.body. So we could do this:
λidentity(x).x
And now when we write identity, we'll just replace it with λx.x. This introduces a problem however. What happens if a named function refers to itself?
λevil(x).(evil x)
Now we've got a problem. According to our rule, we should be able to replace the evil in the body with what the name is bound to. But since the name is bound to λx.(evil x), as soon as we try:
λevil(x).(evil x)
=> λevil(x).(λx.(evil x) x)
=> λevil(x).(λx.(λx.(evil x) x) x)
=> ...
We get an infinite loop. We can never evaluate this expression, because we have no way of turning it from our special named lambda form to a regular lambda expression. We can't go from the language with our special extension down to regular lambda calculus because we can't satisfy the rule of "replace evil with the function expression evil is bound to". There are some tricks for dealing with this, but we'll get to that in a minute.
An important point here is that this is completely different from a regular lambda calculus program that evaluates infinitely and never finishes. For instance, consider the self application function which takes something and applies it to itself:
λx.(x x)
If we evaluate this with the identity function, we get:
λx.(x x) λx.x
=> λx.x λx.x
=> λx.x
Using named functions and naming this function self:
self identity
=> identity identity
=> identity
But what happens if we pass self to itself?
λx.(x x) λx.(x x)
=> λx.(x x) λx.(x x)
=> λx.(x x) λx.(x x)
=> ...
We get an expression that loops into repeatedly reducing self self into self self over and over again. This is a plain old infinite loop you'd find in any (Turing-complete) programming language.
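The same two experiments can be tried in Python, with the caveat that Python is eager and has a bounded call stack, so the endless reduction of self applied to itself surfaces as a RecursionError instead of running forever. The names identity and self_apply are just labels for this sketch.

```python
identity = lambda x: x
self_apply = lambda x: x(x)

# self identity reduces to identity
print(self_apply(identity) is identity)

# self self keeps reducing to itself; Python gives up when the stack is full
try:
    self_apply(self_apply)
    looped = False
except RecursionError:
    looped = True
print(looped)
```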
The difference between this and our problem with recursive definitions is that our names and definitions are not lambda calculus. They are shorthands which we can expand to lambda calculus by following some rules. But in the case of λevil(x).(evil x), we can't expand it to lambda calculus so we don't even get a lambda calculus expression to run. Our named function "fails to compile" in a sense, similar to when you send the programming language compiler into an infinite loop and your code never even starts as opposed to when the actual runtime loops. (Yes, it is entirely possible to make the compiler get caught in an infinite loop.)
There are some very clever ways to get around this problem, one of which is the infamous Y-combinator. The basic idea is to take our problematic evil function and change it so that, instead of accepting an argument and trying to be recursive, it accepts an argument and returns another function that accepts an argument, so your body expression has two arguments to work with:
λevil(f).λy.(f y)
If we evaluate evil identity, we'll get a new function that takes an argument and just calls identity with it. The following evaluation shows first the name replacement using ->, then the reduction using =>:
(evil identity) 0
-> (λf.λy.(f y) identity) 0
-> (λf.λy.(f y) λx.x) 0
=> λy.(λx.x y) 0
=> λx.x 0
=> 0
Where things get interesting is if we pass evil to itself instead of identity:
(evil evil) 0
-> (λf.λy.(f y) λf.λy.(f y)) 0
=> λy.(λf.λy.(f y) y) 0
=> λf.λy.(f y) 0
=> λy.(0 y)
We ended up with a function that's complete nonsense, but we achieved something important - we created one level of recursion. If we were to evaluate (evil (evil evil)), we would get two levels. With (evil (evil (evil evil))), three. So what we need to do is instead of passing evil to itself, we need to pass a function that somehow accomplishes this recursion for us. In particular, it should be a function with some sort of self application. What we want is the Y-combinator:
λf.(λx.(f (x x)) λx.(f (x x)))
This function is pretty tricky to wrap your head around from the definition, so it's best to just call it Y and see what happens when we try and evaluate a few things with it:
Y evil
-> λf.(λx.(f (x x)) λx.(f (x x))) evil
=> λx.(evil (x x)) λx.(evil (x x))
=> evil (λx.(evil (x x))
λx.(evil (x x)))
=> evil (evil (λx.(evil (x x))
λx.(evil (x x))))
=> evil (evil (evil (λx.(evil (x x))
λx.(evil (x x)))))
And as we can see, this goes on infinitely. What we've done is taken evil, which accepts first one function and then accepts an argument and evaluates that argument using the function, and passed it a specially modified version of the evil function which expands to provide recursion. So we can create a "recursion point" in the evil function by reducing evil (Y evil). So now, whenever we see a named function using recursion like this:
λname(x).(.... some body containing (name arg) in it somewhere)
We can transform it to:
λname-rec(f).λx.(...... body with (name arg) replaced with (f arg))
λname(x).((name-rec (Y name-rec)) x)
We turn the function into a version that first accepts a function to use as a recursion point, then we provide the function Y name-rec as the function to use as the recursion point.
The reason this works, and getting waaaaay back to the original point of the author, is because the expression name-rec (Y name-rec) does not have to fully reduce Y name-rec before starting its own reduction. I cannot stress this enough. We've already seen that reducing Y name-rec results in an infinite loop, so the recursion works if there's some sort of condition in the name-rec function that means that the next step of Y name-rec might not need to be reduced.
This breaks down in many programming languages, including functional ones, because they do not support this kind of lazy evaluation. Additionally, almost all programming languages support mutation. That is, if you define a variable x = 3, later in the same code you can make x = 5 and all the old code that referred to x when it was 3 will now see x as being 5. This means your program could have completely different results if that old code is "delayed" with lazy evaluation and only calculated later on, because by then x could be 5. In a language where things can be arbitrarily executed in any order at any time, you have to completely eliminate your program's dependency on things like order of statements and time-changing values. If you don't, your program could calculate arbitrarily different results depending on what order your code gets run in.
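Python is one of the eager languages where this breaks down, which makes it a convenient place to see the effect. In the sketch below, Y is a direct transcription of the combinator above, and it never returns because x(x) is reduced before f gets a chance to ignore it. Z is a different, well-known variant (often called the Z combinator) that wraps the self-application in an extra lambda so it is only reduced when actually called. The function fact_rec is a hypothetical example written in the name-rec style described earlier.

```python
# Direct transcription of the Y combinator: loops under eager evaluation.
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# Z combinator: the inner lambda v delays x(x) until the recursion is used.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A "name-rec" style function: takes the recursion point, returns the real function.
fact_rec = lambda recurse: lambda n: 1 if n == 0 else n * recurse(n - 1)

try:
    Y(fact_rec)
    y_terminated = True
except RecursionError:
    y_terminated = False

factorial = Z(fact_rec)
print(y_terminated, factorial(5))
```

The condition n == 0 in fact_rec is what stops the expansion: once it is true, the next level of recursion is never requested, so it is never built.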
However, writing code that has no sense of order in it whatsoever is extremely difficult. We saw how complicated lambda calculus got just trying to get our heads around trivial recursion. Therefore, most functional programming languages pick a model that systematically defines in what order things are evaluated in, and they never deviate from that model.
Racket, a dialect of Scheme, specifies that in the normal Racket language, all expressions are evaluated "eagerly" (no delaying) and all function arguments are evaluated eagerly from left to right, but the language includes special forms that let you selectively make certain expressions lazy, such as (delay ...), which produces a promise. Haskell does the opposite, with expressions defaulting to lazy evaluation and the compiler running a "strictness analyser" to determine which expressions are needed by functions that are specially declared to need eagerly evaluated arguments.
The primary point being made seems to be that it's just too impractical to design a language that completely allows all expressions to be individually lazy or eager, because the limitations this poses on what tools you can use in the language are severe. Therefore, it's important to keep in mind what tools a functional language provides you for manipulating lazy expressions and eager expressions, because they are most certainly not equivalent in all practical functional programming languages.

How to make nonsymbolic plot_vector_field in sage?

I have a function f(x,y) whose outcome is random (I take the mean of 20 random numbers depending on x and y). I see no way to modify this function to make it symbolic.
And when I run
x,y = var('x,y')
d = plot_vector_field((f(x),x), (x,0,1), (y,0,1))
it says it can't cast a symbolic expression to a real or rational number. In fact it stops when I write:
a=matrix(RR,1,N)
a[0]=x
What is the way to change this variable to real numbers in the beginning, compute f(x) and draw a vector field? Or just draw a lot of arrows with slope (f(x),x)?
I can create something sort of like yours, though with no errors. At least it doesn't do what you want.
def f(m,n):
    return m*randint(100,200)-n*randint(100,200)
var('x,y')
plot_vector_field((f(x,y),f(y,x)),(x,0,1),(y,0,1))
The reason is that Python evaluates function calls immediately: in this case, f(x,y) was 161*x - 114*y, though that will change with each invocation.
My suspicion is that your problem is similar, the immediate evaluation of the Python function once and for all. Instead, try lambda functions. They are annoying but very useful in this case.
var('x,y')
plot_vector_field((lambda x,y: f(x,y), lambda x,y: f(y,x)),(x,0,1),(y,0,1))
Wow, now I have to find an excuse to show off this picture, cool stuff. I hope your error ends up being very similar.
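The difference between "evaluated once and for all" and "evaluated per point" can be seen without Sage at all. The following plain-Python sketch uses a hypothetical sample function as a stand-in for a plotting routine (it is not the Sage API) and a call_log list to count how often f actually runs.

```python
import random

call_log = []


def f(m, n):
    """Random-valued function, standing in for the one in the question."""
    call_log.append((m, n))
    return m * random.randint(100, 200) - n * random.randint(100, 200)


def sample(field, points):
    """Hypothetical stand-in for a plotting routine: evaluate field at each point."""
    return [field(x, y) for x, y in points]


points = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]

value = f(0.5, 0.5)                            # evaluated here, exactly once
frozen = sample(lambda x, y: value, points)    # the same number at every point
calls_after_frozen = len(call_log)

fresh = sample(lambda x, y: f(x, y), points)   # f runs again for every point
calls_after_fresh = len(call_log)
print(calls_after_frozen, calls_after_fresh)
```

Passing the result of a call hands over a single fixed value; passing a lambda hands over the recipe, which the consumer can run as often as it needs.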

Understanding Lazy Evaluation in Haskell

I am trying to learn Haskell, but I am stuck on understanding lazy evaluation.
Can someone explain lazy evaluation to me in detail, along with the output of the following 2 cases (with explanation), in relation to the pseudo code given below?
Pseudo Code:
x = keyboard input (5)
y = x + 3 (=8)
echo y (8)
x = keyboard input (2)
echo y
Case 1: Static binding, lazy evaluation
Case 2: Dynamic binding, lazy evaluation.
I need to know what the last line (echo y) is going to print in the above 2 cases.
Sorry this is way too long but...
I'm afraid the answer is going to depend a lot on the meaning of the words...
First, here's that code in Haskell (which uses static binding and lazy evaluation):
readInt :: String -> Int
readInt = read

main = do
    x <- fmap readInt getLine
    let y = x + 3
    print y
    x <- fmap readInt getLine
    print y
Now here's that code in R which uses lazy evaluation and what some people call
dynamic binding:
delayedAssign('x', as.numeric(readLines(n=1)))
delayedAssign('y', x + 3)
print(y)
delayedAssign('x', as.numeric(readLines(n=1)))
print(y)
It prints 8 and 8. Not so different!
Now in C++, which uses strict evaluation and static binding:
#include <iostream>

int main() {
    int x;
    std::cin >> x;
    int y = x + 3;
    std::cout << y << "\n";
    std::cin >> x;
    std::cout << y << "\n";
}
It prints 8 and 8.
Now let me tell you what I think the point of the question actually was ;)
"lazy evaluation" can mean many different things. In Haskell it has a very
particular meaning, which is that in nested expressions:
f (g (h x))
evaluation works as if f gets evaluated before g (h x), ie evaluation
goes "outside -> in". Practically this means that if f looks like
f x = 2
ie just throws away its argument, g (h x) never gets evaluated.
But I think that that is not where the question was going with "lazy
evaluation". The reason I think this is that:
+ always evaluates its arguments! + is the same whether you're using lazy
evaluation or not.
The only computation that could actually be delayed is keyboard input --
and that's not really computation, because it causes an action to occur;
that is, it reads from the user.
Haskell people would generally not call this "lazy evaluation" -- they would call
it lazy (or deferred) execution.
So what would lazy execution mean for your question? It would mean that the
action keyboard input gets delayed... until the value x is really really
needed. It looks to me like that happens here:
echo y
because at that point you must show the user a value, and so you must know what
x is! So what would happen with lazy execution and static binding?
x = keyboard input # nothing happens
y = x + 3 # still nothing happens!
echo y (8) # y becomes 8. 8 gets printed.
x = keyboard input (2) # nothing happens
echo y # y is still 8. 8 gets printed.
Now about this word "dynamic binding". It can mean different things:
1. Variable scope and lifetime is decided at run time. This is what languages
   like R do that don't declare variables.
2. The formula for a computation (like the formula for y is x + 3) isn't
   inspected until the variable is evaluated.
My guess is that that is what "dynamic binding" means in your question. Going
over the code again with dynamic binding (sense 2) and lazy execution:
x = keyboard input # nothing happens
y = x + 3 # still nothing happens!
echo y (8) # y becomes 8. 8 gets printed.
x = keyboard input (2) # nothing happens
echo y # y is already evaluated,
# so it uses the stored value and prints 8
I know of no language that would actually print 5 (that is, 2 + 3) for the last line... but I
really think that's what the question was hoping would happen!
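The lazy-execution traces above can be mimicked in Python with a small delayed-value wrapper that runs its computation on first use and then remembers the answer. This is only a model of the pseudo code, not of how Haskell or R work internally. The class Lazy and the function run are hypothetical names, and the keyboard is replaced by a list of pending inputs.

```python
class Lazy:
    """A delayed computation that runs on first use and caches its value."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


def run(inputs):
    """Replay the pseudo code; `inputs` plays the role of the keyboard."""
    keyboard = iter(inputs)
    printed = []

    x = Lazy(lambda: next(keyboard))        # x = keyboard input: nothing happens
    first_x = x
    y = Lazy(lambda: first_x.force() + 3)   # y = x + 3: still nothing happens
    printed.append(y.force())               # echo y: reads 5, y becomes 8
    x = Lazy(lambda: next(keyboard))        # rebinding x does not touch y
    printed.append(y.force())               # echo y: cached value is reused
    return printed


print(run([5, 2]))
```

Note that in this model the second input is never even read, because nothing ever demands the new x.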
The key thing about lazy evaluation in Haskell is that it doesn't affect the output of your program at all. You can read it just as if everything were evaluated as soon as it is defined, and you'll still get the same result.
Lazy evaluation is just a strategy for figuring out the value of an expression in the program. There are many possible and they all give the same result[1]; any evaluation strategy that changes the meaning of the program wouldn't be a valid strategy!
So from a certain perspective, you don't have to understand lazy evaluation (yet) if it's giving you trouble. When you're learning Haskell, especially if it's your first functional and pure language, thinking about expressing yourself in this way is much more important. I would also rate training yourself to become comfortable with reading Haskell's (often quite dense) syntax as more important than fully "grokking" lazy evaluation. So don't worry about it too much if the concept gives you difficulty.
That said, my go at explaining it is below. I haven't used your examples, as they're not really affected by lazy evaluation, and Owen has talked more clearly than I can about dynamic binding and delayed execution wrt your example.
The most important difference between (valid) evaluation strategies is that some strategies can fail to return a result at all where another strategy might succeed. Lazy evaluation has the particular property that if any (valid) evaluation strategy can find a result, lazy evaluation will find it. In particular, programs that generate infinite data structures and then only use a finite amount of the data can terminate with lazy evaluation. In the strict evaluation you're probably used to, the program has to finish generating the infinite data structure before it can go on to use part of it, and of course it never will.
The way lazy evaluation achieves this is by only evaluating something when it's needed to figure out what to do next. When you call a function that returns a list, it "returns" straight away and gives you a placeholder for the list. That placeholder can be passed to other functions, stored in other data structures, anything. Only when the program needs to know something about the list will it be actually evaluated, and only as far as needed.
Say the program now is going to do something different if the list is empty than if it is not. Then the function call that originally returned the placeholder is evaluated a little bit further, to see if it returns an empty list or a list with a head element. Then the evaluation stops again, as the program now knows which way to go. If the rest of the list is never needed, it will never be evaluated.
But it's also not evaluated more times than needed. If the placeholder was passed into multiple functions (so it's now involved in other not-yet-evaluated function calls), or stored into several different data structures, Haskell still "knows" that they're all the same thing, and arranges for them all to "see" the effects of any further evaluation of the placeholder triggered from any of them. Eventually, if all of the list is needed somewhere, they'll all be pointing to an ordinary fully-evaluated data structure, and laziness has no further impact.
But the key thing to remember is that everything needed to produce that list is already determined and fixed when the placeholder was generated. It can't be affected by anything else that's happened in the program since. If that were not so, then Haskell would not be pure. And vice versa; impure languages can't have Haskell-style full laziness behind the scenes, because the results you would get could change dramatically depending on when in the future the results are needed. Instead, impure languages that support lazy evaluation tend to have it only for certain things explicitly declared by the programmer, with warnings in the manual saying "don't use laziness on something dependent on side effects".
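A tiny Python example of why delayed evaluation and mutable state do not mix well. Python is impure, so a computation that is postponed sees whatever the variable holds at the moment it finally runs. The function name demo is arbitrary.

```python
def demo():
    x = 3
    eager = x + 1             # computed now, while x is 3
    delayed = lambda: x + 1   # computed whenever it is finally called
    x = 5                     # mutation between definition and use
    return eager, delayed()


print(demo())
```

The eager result reflects the old value of x and the delayed one reflects the new value, so the answer depends on when the work happens. In a pure language that situation cannot arise, which is what makes laziness everywhere safe there.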
[1] I lie a little here. Keep reading below the line to see why.
Lazy Evaluation in Haskell: Leftmost-Outermost + Graph Reduction
square x = x * x
square (square 42)
(square 42) * (square 42) -> square 42 will be computed only once, thanks to graph reduction
(42 * 42) * (square 42)
1764 * (square 42) -> the shared result is reused via graph reduction
1764 * 1764
= 3111696
Leftmost-innermost (Java, C++)
square (square 42)
square (42 * 42)
square 1764
1764 * 1764
= 3111696
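The effect of sharing in the squaring example above can be counted. The sketch below evaluates the nested square outermost-first in Python, once with a memoising wrapper (a very simplified picture of graph reduction) and once without, and reports how many multiplications were performed. The names Shared and count_multiplications are invented for this illustration.

```python
class Shared:
    """A delayed value that is computed at most once (a simplified graph node)."""

    def __init__(self, compute):
        self._compute = compute
        self._ready = False
        self._value = None

    def force(self):
        if not self._ready:
            self._value = self._compute()
            self._ready = True
        return self._value


def count_multiplications(share):
    """Evaluate square (square 42) outermost-first; return (result, multiplications)."""
    counter = [0]

    def mul(a, b):
        counter[0] += 1
        return a * b

    def delay(compute):
        # With sharing, wrap in a memoising node; without, re-run on every use.
        return Shared(compute).force if share else compute

    def square(arg):
        return delay(lambda: mul(arg(), arg()))

    result = square(square(delay(lambda: 42)))()
    return result, counter[0]


print(count_multiplications(share=True))
print(count_multiplications(share=False))
```

Both runs produce the same number; only the amount of work differs, since without sharing the inner square is recomputed for each use.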

How do I efficiently find the maximum value in an array containing values of a smooth function?

I have a function that takes a floating point number and returns a floating point number. It can be assumed that if you were to graph the output of this function it would be 'n' shaped, ie. there would be a single maximum point, and no other points on the function with a zero slope. We also know that the input value that yields this maximum output will lie between two known points, perhaps 0.0 and 1.0.
I need to efficiently find the input value that yields the maximum output value to some degree of approximation, without doing an exhaustive search.
I'm looking for something similar to Newton's Method which finds the roots of a function, but since my function is opaque I can't get its derivative.
I would like to down-thumb all the other answers so far, for various reasons, but I won't.
An excellent and efficient method for minimizing (or maximizing) smooth functions when derivatives are not available is parabolic interpolation. It is common to write the algorithm so it temporarily switches to golden-section search when parabolic interpolation does not progress as fast as golden-section would; this combination is Brent's minimizer.
I wrote such an algorithm in C++. Any offers?
UPDATE: There is a C version of the Brent minimizer in GSL. The archives are here: ftp://ftp.club.cc.cmu.edu/gnu/gsl/ Note that it will be covered by some flavor of GNU "copyleft."
As I write this, the latest-and-greatest appears to be gsl-1.14.tar.gz. The minimizer is located in the file gsl-1.14/min/brent.c. It appears to have termination criteria similar to what I implemented. I have not studied how it decides to switch to golden section, but for the OP, that is probably moot.
UPDATE 2: I googled up a public domain Java version, translated from FORTRAN. I cannot vouch for its quality. http://www1.fpl.fs.fed.us/Fmin.java I notice that the hard-coded machine epsilon ("machine precision" in the comments) is 1/2 the value for a typical PC today. Change the value of eps to 2.22045e-16.
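For readers who just want to see the idea, here is a short Python sketch of golden-section search for a maximum. It is only the fallback step mentioned above, not the full Brent minimizer, and it is not taken from the GSL or Fmin sources. The function name golden_section_max and the tol parameter are assumptions made for this example.

```python
import math


def golden_section_max(f, a, b, tol=1e-8):
    """Locate the maximiser of a single-peaked function f on [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2          # about 0.618
    c = b - inv_phi * (b - a)                 # lower interior point
    d = a + inv_phi * (b - a)                 # upper interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc > fd:
            # The peak cannot lie right of d: keep [a, d], reuse c as the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The peak cannot lie left of c: keep [c, b], reuse d as the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


peak = golden_section_max(lambda x: -x * (x - 1.0), 0.0, 1.0)
print(round(peak, 6))
```

Each iteration costs one new function evaluation and shrinks the bracket by the same constant factor, which is why it makes a dependable fallback even though parabolic steps usually converge faster on smooth functions.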
Edit 2: The method described in Jive Dadson's answer is a better way to go about this. I'm leaving my answer up since it's easier to implement, if speed isn't too much of an issue.
Use a form of binary search, combined with numeric derivative approximations.
Given the interval [a, b], let x = (a + b) /2
Let epsilon be something very small.
Is (f(x + epsilon) - f(x)) positive? If yes, the function is still growing at x, so you recursively search the interval [x, b]
Otherwise, search the interval [a, x].
There might be a problem if the max lies between x and x + epsilon, but you might give this a try.
Edit: The advantage to this approach is that it exploits the known properties of the function in question. That is, I assumed by "n"-shaped, you meant, increasing-max-decreasing. Here's some Python code I wrote to test the algorithm:
def f(x):
    return -x * (x - 1.0)

def findMax(function, a, b, maxSlope):
    x = (a + b) / 2.0
    e = 0.0001
    slope = (function(x + e) - function(x)) / e
    if abs(slope) < maxSlope:
        return x
    if slope > 0:
        return findMax(function, x, b, maxSlope)
    else:
        return findMax(function, a, x, maxSlope)
Typing findMax(f, 0, 3, 0.01) should return 0.504, as desired.
For optimizing a concave function, which is the type of function you are talking about, without evaluating the derivative I would use the secant method.
Given the two initial values x[0]=0.0 and x[1]=1.0 I would proceed to compute the next approximations as:
def next_x(x, xprev):
    return x - f(x) * (x - xprev) / (f(x) - f(xprev))
and thus compute x[2], x[3], ... until the change in x becomes small enough.
Edit: As Jive explains, this solution is for root finding which is not the question posed. For optimization the proper solution is the Brent minimizer as explained in his answer.
The Levenberg-Marquardt algorithm is a Newton-like optimizer. It has a C/C++ implementation, levmar, that doesn't require you to define the derivative function. Instead it will evaluate the objective function in the current neighborhood to move to the maximum.
BTW: this website appears to have been updated since I last visited it; I hope it's even the same one I remembered. Apparently it now also supports other languages.
Given that it's only a function of a single variable and has one extremum in the interval, you don't really need Newton's method. Some sort of line search algorithm should suffice. This wikipedia article is actually not a bad starting point, if short on details. Note in particular that you could just use the method described under "direct search", starting with the end points of your interval as your two points.
I'm not sure if you'd consider that an "exhaustive search", but it should actually be pretty fast I think for this sort of function (that is, a continuous, smooth function with only one local extremum in the given interval).
You could reduce it to a simple linear fit on the delta's, finding the place where it crosses the x axis. Linear fit can be done very quickly.
Or just take 3 points (left/top/right) and fit a parabola.
It depends mostly on the nature of the underlying relation between x and y, I think.
Edit: this applies when you have an array of values, as the question's title states. When you have a function, use Newton-Raphson.
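The three-point idea can be sketched in Python as follows: pick the largest sample in the array, take its two neighbours, and return the vertex of the parabola through those three points. The function name parabolic_peak and the sample grid are assumptions for the example, and the formula is the standard vertex-of-a-parabola expression rather than anything specific to this thread.

```python
def parabolic_peak(xs, ys):
    """Estimate where the maximum lies from sampled values (xs ascending)."""
    i = max(range(len(ys)), key=ys.__getitem__)   # index of the largest sample
    if i == 0 or i == len(ys) - 1:
        return xs[i]                              # no neighbour on one side
    x1, x2, x3 = xs[i - 1], xs[i], xs[i + 1]
    y1, y2, y3 = ys[i - 1], ys[i], ys[i + 1]
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return x2                                 # points are collinear
    return x2 - 0.5 * num / den                   # vertex of the fitted parabola


xs = [0.0, 0.3, 0.6, 0.9]
ys = [-x * (x - 1.0) for x in xs]
print(round(parabolic_peak(xs, ys), 6))
```

For data that really is quadratic the vertex is recovered even though it does not coincide with any grid point; for other smooth shapes it is only an approximation that improves as the samples get closer to the peak.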

Is finding the equivalence of two functions undecidable?

Is it impossible to know if two functions are equivalent? For example, if a compiler writer wants to determine whether two functions that the developer has written perform the same operation, what methods can he use to figure that out? Or what can we do to find out that two TMs are identical? Is there a way to normalize the machines?
Edit: If the general case is undecidable, how much information do you need to have before you can correctly say that two functions are equivalent?
Given an arbitrary function, f, we define a function f' which returns 1 on input n if f halts on input n. Now, for some number x we define a function g which, on input n, returns 1 if n = x, and otherwise calls f'(n).
If functional equivalence were decidable, then deciding whether g is identical to f' decides whether f halts on input x. That would solve the Halting problem. Related to this discussion is Rice's theorem.
Conclusion: functional equivalence is undecidable.
There is some discussion going on below about the validity of this proof. So let me elaborate on what the proof does, and give some example code in Python.
The proof creates a function f' which on input n starts to compute f(n). When this computation finishes, f' returns 1. Thus, f'(n) = 1 iff f halts on input n, and f' doesn't halt on n iff f doesn't. Python:
def create_f_prime(f):
    def f_prime(n):
        f(n)
        return 1
    return f_prime
Then we create a function g which takes n as input, and compares it to some value x. If n = x, then g(n) = g(x) = 1, else g(n) = f'(n). Python:
def create_g(f_prime, x):
    def g(n):
        return 1 if n == x else f_prime(n)
    return g
Now the trick is, that for all n != x we have that g(n) = f'(n). Furthermore, we know that g(x) = 1. So, if g = f', then f'(x) = 1 and hence f(x) halts. Likewise, if g != f' then necessarily f'(x) != 1, which means that f(x) does not halt. So, deciding whether g = f' is equivalent to deciding whether f halts on input x. Using a slightly different notation for the above two functions, we can summarise all this as follows:
def halts(f, x):
    def f_prime(n): f(n); return 1
    def g(n): return 1 if n == x else f_prime(n)
    return equiv(f_prime, g) # If only equiv would actually exist...
I'll also toss in an illustration of the proof in Haskell (GHC performs some loop detection, and I'm not really sure whether the use of seq is fool proof in this case, but anyway):
-- Tells whether two functions f and g are equivalent.
equiv :: (Integer -> Integer) -> (Integer -> Integer) -> Bool
equiv f g = undefined -- If only this could be implemented :)

-- Tells whether f halts on input x
halts :: (Integer -> Integer) -> Integer -> Bool
halts f x = equiv f' g
  where
    f' n = f n `seq` 1
    g n = if n == x then 1 else f' n
Yes, it is undecidable. This is a form of the halting problem.
Note that I mean that it's undecidable for the general case. Just as you can determine halting for sufficiently simple programs, you can determine equivalency for sufficiently simple functions, and it's not inconceivable that this could be of some use for an application. But you cannot make a general method for determining equivalency of any two possible functions.
The general case is undecidable by Rice's Theorem, as others have already said (Rice's Theorem essentially says that any nontrivial semantic property of programs in a Turing-complete formalism, that is, any nontrivial property of the function a program computes, is undecidable).
There are special cases where equivalence is decidable; the best-known example is probably equivalence of finite state automata. If I remember correctly, equivalence of (nondeterministic) pushdown automata is already undecidable, by reduction from Post's Correspondence Problem.
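To make the decidable finite-automaton case concrete, here is a minimal sketch in Python. It assumes complete deterministic automata, and the tuple representation `(start, accepting, delta)` as well as the example machines are made up for illustration. The check walks the product automaton and looks for a reachable pair of states that disagree on acceptance.

```python
from collections import deque

def dfa_equivalent(dfa_a, dfa_b, alphabet):
    """Return True iff the two complete DFAs accept the same language.

    Each DFA is a hypothetical triple (start, accepting, delta), where
    delta maps (state, symbol) -> state for every state and symbol.
    """
    start_a, accepting_a, delta_a = dfa_a
    start_b, accepting_b, delta_b = dfa_b

    # Explore the product automaton breadth-first from the pair of start states.
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        a, b = queue.popleft()
        # A reachable pair that disagrees on acceptance means some word
        # is accepted by exactly one of the two automata.
        if (a in accepting_a) != (b in accepting_b):
            return False
        for symbol in alphabet:
            pair = (delta_a[(a, symbol)], delta_b[(b, symbol)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# "Even number of 1s", written with two states ...
even_ones = ("e", {"e"}, {("e", "0"): "e", ("e", "1"): "o",
                          ("o", "0"): "o", ("o", "1"): "e"})
# ... and the same language with a redundant third state.
even_ones_redundant = (0, {0, 2}, {(0, "0"): 2, (0, "1"): 1,
                                   (1, "0"): 1, (1, "1"): 0,
                                   (2, "0"): 0, (2, "1"): 1})
# A different language: "ends with 1".
ends_with_one = ("n", {"y"}, {("n", "0"): "n", ("n", "1"): "y",
                              ("y", "0"): "n", ("y", "1"): "y"})

print(dfa_equivalent(even_ones, even_ones_redundant, "01"))  # True
print(dfa_equivalent(even_ones, ends_with_one, "01"))        # False
```

The search always terminates because there are only finitely many pairs of states, which is exactly what is missing in the Turing-complete case.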
To prove that two given functions are equivalent you would require as input a proof of the equivalence in some formalism, which you can then check for correctness. The essential parts of this proof are the loop invariants, as these cannot in general be derived automatically.
In the general case it's undecidable whether two Turing machines always produce the same output for identical input. Since you can't even decide whether a TM will halt on a given input, I don't see how it should be possible to decide whether both halt AND output the same result...
It depends on what you mean by "function."
If the functions you are talking about are guaranteed to terminate -- for example, because they are written in a language in which all functions terminate -- and operate over finite domains, it's "easy" (although it might still take a very, very long time): two functions are equivalent if and only if they have the same value at every point in their shared domain.
This is called "extensional" equivalence to distinguish it from syntactic or "intensional" equivalence. Two functions are extensionally equivalent if they are intensionally equivalent, but the converse does not hold.
(All the other people above noting that it is undecidable in the general case are quite correct, of course; this is a fairly uncommon -- and usually uninteresting in practice -- special case.)
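As a small illustration of that special case, the following Python sketch compares two terminating functions point by point over a finite domain. The function names and the 0..255 domain are assumptions chosen for the example. The first two functions are written differently (so they are not intensionally equal) yet agree everywhere on the domain.

```python
def extensionally_equal(f, g, domain):
    """Compare two terminating functions at every point of a finite domain."""
    return all(f(x) == g(x) for x in domain)


def double_by_adding(n):
    return n + n

def double_by_shifting(n):
    return n << 1

def square(n):
    return n * n

byte_values = range(256)  # hypothetical finite domain

print(extensionally_equal(double_by_adding, double_by_shifting, byte_values))  # True
print(extensionally_equal(double_by_adding, square, byte_values))              # False
```

The cost grows with the size of the domain, which is why "easy" belongs in quotes: a pair of 64-bit arguments already gives 2^128 points to check.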
Note that the halting problem is decidable for linear bounded automata. Real computers are always bounded, and programs for them will always loop back to a previous configuration after sufficiently many steps. If you are using an unbounded (imaginary) computer to keep track of the configurations, you can detect that looping and take it into account.
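A rough sketch of that idea in Python, assuming a deterministic machine whose whole configuration fits in one hashable value; the `step` convention (return `None` to halt) and the two toy 8-bit machines are invented for the example. The set of visited configurations plays the role of the larger computer that keeps track of the smaller one.

```python
def halts_on_finite_machine(step, start):
    """Decide halting for a deterministic machine with finitely many configurations.

    `step` maps a configuration to its successor, or to None when the
    machine halts. Configurations must be hashable.
    """
    seen = set()
    config = start
    while config is not None:
        if config in seen:
            return False  # revisited a configuration: the machine loops forever
        seen.add(config)
        config = step(config)
    return True


# Hypothetical 8-bit machines: the configuration is a value in 0..255.
def count_down(value):
    return None if value == 0 else value - 1

def add_two_mod_256(value):
    # Halts only on reaching 1, which an even start value never reaches.
    return None if value == 1 else (value + 2) % 256

print(halts_on_finite_machine(count_down, 200))      # True
print(halts_on_finite_machine(add_two_mod_256, 0))   # False
print(halts_on_finite_machine(add_two_mod_256, 3))   # True
```

The catch is memory: the tracker may need to remember every configuration of the machine it observes, so it cannot itself be a machine of the same bounded size.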
You could check in your compiler to see if they are "exactly" identical, sure, but determining if they return identical values would be difficult and time-consuming. You would have to basically call that method and perform its routine over an infinite number of possible calls and compare the value with that from the other routine.
Even if you could do the above, you would have to account for which global values change within the function, and which objects are destroyed or changed in the function in ways that do not affect the outcome.
You can really only compare the compiled code. So compile the compiled code to refactor?
Imagine the run time on trying to compile the code with "that" compiler. You could spend a LOT of time on here answering questions saying: "busy compiling..." :)
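The brute-force comparison described in this answer can be sketched in Python as a bounded search for a counterexample. This assumes both functions terminate and have no side effects, and the three summing functions are hypothetical examples. Finding a differing input refutes equivalence; finding none within the limit proves nothing.

```python
from itertools import count, islice

def find_counterexample(f, g, inputs, limit):
    """Return the first of the first `limit` inputs where f and g differ, else None.

    Assumes f and g both terminate on every input tried.
    """
    for x in islice(inputs, limit):
        if f(x) != g(x):
            return x
    return None


def sum_formula(n):
    return n * (n + 1) // 2

def sum_loop(n):
    return sum(range(n + 1))

def sum_loop_buggy(n):
    # Agrees with sum_formula below 1000, then is off by one.
    return sum(range(n + 1)) + (1 if n >= 1000 else 0)

print(find_counterexample(sum_formula, sum_loop, count(), 500))         # None
print(find_counterexample(sum_formula, sum_loop_buggy, count(), 500))   # None (bug not reached)
print(find_counterexample(sum_formula, sum_loop_buggy, count(), 2000))  # 1000
```

The second call shows the weakness: two functions that differ only on inputs beyond the limit look identical to the search.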
I think if you allow side effects, you can show that the problem can be morphed into the Post correspondence problem so you can't, in general, show if two functions are even capable of having the same side effects.
Is it impossible to know if two functions are equivalent?
No. It is possible to know that two functions are equivalent. If you have f(x), you know f(x) is equivalent to f(x).
If the question is "is it possible to determine whether f(x) and g(x) are equivalent, where f and g can be any two functions at all?", then the answer is no.
However, if the question is "can a compiler determine that if f(x) and g(x) are equivalent that they are equivalent?", then the answer is yes if they are equivalent in both output and side effects and order of side effects. In other words, if one is a transformation of the other that preserves behavior, then a compiler of sufficient complexity should be able to detect it. It also means that the compiler can transform a function f into a more optimal and equivalent function g given a particular definition of equivalent. It gets even more fun if f includes undefined behavior, because then g can also include undefined (but different) behavior!

Resources