Surjective functions - math

As an extension question, my lecturer for my maths in computer science module asked us to find examples of when a surjective function is vital to the operation of a system; he said he can't think of any!
I've been doing some googling and have only found a single outdated paper about non-surjective rounding functions creating flaws in some cryptographic systems.

Master edit:
[btw, thank you for the accepted response.]
In reviewing my response, and those of others in this thread, I realized two things.
The first is that, looking at things at a higher level of abstraction, most (all?) of the [counter-]examples provided are a form of "discretization" function. In other words, they correspond to the ubiquitous requirement in computer systems of mapping [possibly infinitely many] entities/values to a set (possibly "infinite" too, though most often a finite one) of discrete entities/values. While not all such mappings imply or require a non-bijective surjection, many do, hence the several examples found.
The other observation is that the most compelling examples seem to be tied to stochastic (random) processes, or to the underlying primitives which support them.
Both of these things are quite telling, I think, for they mirror, if only loosely, the way the real world's complexity (read "randomness", at many levels) is exploited in various systems in humans (and animals) to produce simplified/stable/discrete maps that represent elements of this complex reality: another case where mathematics and its practically-oriented friend, Computer Science, team up to describe or to mimic fundamental realities (or... are these realities? hum... we're getting too philosophical...)
It could be a matter of understanding exactly the frame of the question:
do bijective functions count? (they are indeed a special case of a surjection)
Edit: No, bijective functions are not considered.
has it got to be a "function" in the sense of a procedural calculation, as opposed to, say, a "relation" as in databases?
Edit: yes, a procedural function of sorts... "take in a value and return another value" (IMHO this distinction is very tenuous, as any "map" is a function regardless of its inner workings, but let's entertain this "numeric calculation"-like restriction in the spirit of the question)
define "vital"...
With all these caveats in mind, the following may apply:
Elementary mathematical functions such as ABS(), ROUND(), or FLOOR() (absolute value, and rounding of a decimal/float value to the nearest integer, respectively), etc.
In the case of ABS(), for example, a program which draws shapes on the screen using various symmetry properties would count on exactly two values mapping to any given value, and on every value in a given integral range (say from 0 to 10) being hit as an ABS() value, lest the drawings start to look funny ;-)
the Soundex function (and its many derivations)
Modulo operations, even in such trivial uses as showing the status of a process every x items processed (see the sketch after this list).
Classification processes: it is both important that there be an important reduction factor (thousands of instances mapped to a handful of categories), and vital [in some cases] that all instances yield one and only one category (e.g., in real-time decision systems).
Various "simple" mathematical functions used in pseudo-random number generators.
It is vital that these be surjective, so that all values within the namespace are reachable; indeed, a specific, often uniform, distribution is expected. (Note: this could be a bit of a repeat of the "modulo" example above, although it doesn't have to use modulo arithmetic proper; other math functions can do.)
The following is a bad example, now that Martin clarified that [math-operation-like functions that] "take in a value and return another value" are what defines a "function" here, disqualifying database/table-driven "maps" and such, and that bijections are not considered either.
One-to-one relations (or one-to-many relations, for that matter): it can be so important to maintain these that we require triggers etc. to keep up referential integrity.
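To make the modulo item above concrete, here is a minimal F# sketch (the function name, the item list, and the interval are illustrative assumptions, not from the original post). A progress indicator of the "every x items" kind works only because i % interval is surjective onto {0, ..., interval - 1}, so the value that triggers the report is guaranteed to be hit:

// Report progress every `interval` items. This relies on i % interval
// being surjective onto {0 .. interval - 1}: the triggering value
// interval - 1 is reached once per `interval` iterations.
let processAll (items: 'a list) (interval: int) (work: 'a -> unit) =
    items
    |> List.iteri (fun i item ->
        work item
        if i % interval = interval - 1 then
            printfn "processed %d items" (i + 1))

// Hypothetical usage: report after every 3 of the 9 items.
processAll [ 'a' .. 'i' ] 3 (fun _ -> ())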

A very simple scheduler implemented by the function random(0, number of processes - 1) expects this function to be surjective, otherwise some processes will never run.
In practice the scheduler has some sort of internal state that it modifies. If you want to see it as a function in the mathematical sense, it takes a state and returns a new state and a process number to run, and in this context it's no longer important that it is surjective because not all possible states have to be reachable. Not a very good example, I'm afraid, but the only one I can think of.
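For what it's worth, here is a minimal F# sketch of the naive scheduler described above (the types and names are made up for illustration). System.Random.Next(0, n) can return every value in [0, n-1], i.e. it is surjective onto the process indices, and that is exactly what prevents a process from being starved:

// A toy scheduler: pick a random process each tick. If Next were not
// surjective onto 0..n-1, the unreachable indices would never run.
let rng = System.Random()

let schedule (processes: (unit -> unit)[]) ticks =
    for _ in 1 .. ticks do
        let i = rng.Next(0, processes.Length)   // covers every index
        processes.[i]()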

A hashing function should ideally be surjective.
But in general I think the question is too vague to be answered. What is a system? What is a function used inside a system?
Edit:
I think the question is not very meaningful. After all there are many cases where you need to be able to produce every desired result. Just think about the identity function and imagine where you could argue that it is used:
using a reference to a variable in programming
using a text editor (or even a hex editor) to produce a file
It would be very bad if you could not produce every bit combination with xor or not when doing bit manipulation.
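A tiny F# sketch of that last point (the mask and test values are arbitrary): for any fixed mask k, the map x -> x ^^^ k is its own inverse, hence bijective and in particular surjective, so every bit combination is reachable:

// xor with a fixed mask is an involution: applying it twice gives the
// original value back, so any target pattern b is reachable from b ^^^ k.
let k = 0b1010_1100
let toggle x = x ^^^ k

let b = 0b0110_0001
printfn "%b" (toggle (toggle b) = b)   // true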

Related

Sparecode analysis in Frama-C

Sorry if this is detailed somewhere; I tried searching the various Frama-C documentation without luck.
I'm trying to do dead code elimination in my code, but I don't understand the results of the tool. Is there any paper / documentation that explains how this plugin works? I only know that it uses the results of the Value analysis.
Admittedly, the sparecode page on Frama-C's website is a bit terse. However, this is partly due to the fact that there's not much to parameterize in this plug-in. Mainly, it is a specialized form of the slicing plug-in, where the criterion is "preserve the state at the end of the program".
More generally, slicing consists in removing instructions that do not contribute to a user-given criterion (e.g. the whole program state at a given point, the validity status of an ACSL annotation, or simply the fact that the program reaches a particular instruction).
In order to compute such a slice, slicing, and hence sparecode, indeed relies on the results of Eva, mainly to obtain an over-approximation (as always with Eva) of the dependencies between the various memory locations involved at each point in the program (you might want to have a look at Chapter 7 of the Eva manual, which deals with dependencies). Very roughly speaking, the slice will consist of the transitive closure of the dependencies for the memory locations involved in the criterion (in the presence of pointers and branches, this notion of "transitive closure" becomes a bit complicated to define formally, but the essence is there).
Now, with respect to dead code, there are two points worth noting:
As mentioned before, Eva provides an over-approximation of the behavior of the program. For slicing, this means that some statements might be kept even though they do not contribute to the slicing criterion in any concrete trace, but appear to do so in the abstract trace due to over-approximation. On the other hand, if a statement is not included, then it definitely does not contribute to the criterion.
For sparecode, not contributing to the final state of the program does not mean that the code is dead, but merely that all its side-effects are shadowed by further instructions. The simplest example of that would be the following:
int x;
int main() {
    x = 1; /* dead store: its effect is shadowed by the next line */
    x = 2; /* only this assignment contributes to the final state */
}
Here, the x=1 has no influence over the final state of the program, and only x=2 will be kept.

Global State in Functional Programming (F#)

I want to compute some functions which are dependent on some variables (specific data on which I run the code) and global variables, which are unlikely to be changed, but I want to leave them user-tunable. Just to clarify with an example, suppose I want to declare the following function:
let multiplyByGain x =
    x * gain
Where would you declare gain, given that gain is a global constant for the whole project? In a separate module with constants? That would couple the module with this code, though. Or would you use a curried version:
let multiplyByGain x gain =
    x * gain
and then specialize it for the specific values? But if you have many functions like that, you will have to inject gain into all of them (in a sort of linking module)?
In my specific problem this becomes more cumbersome because both x and gain are arrays which must have the same length (suppose I have to do an Array.zip, for example). What is the best practice, in terms of functional design, for addressing a global constant such as gain in a general way?
P.S.: I have found this old post, but it addresses only a specific problem.
There is no single correct answer to the question and the best approach will depend on a variety of other constraints and requirements that you have. Also, it depends on whether you are asking specifically about F# or whether you are asking about functional programming more generally. I think there are three main points:
Keeping it simple.
Using a module that exposes gain as a global value, with some initialization code that reads configuration, seems like a good default approach in F#. If this is changed only rarely (say, before you run the whole computation), then mutation is not going to cause you any trouble. You just need to be careful to avoid changing the values while some computation is still running. I think most F# programmers tend to be quite pragmatic about this, and this seems like the easiest thing to start with.
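A minimal sketch of that default approach (the module name and the constant 2.0 are placeholders; a real project would read the value from configuration at startup):

// One module owns the tunable value; everything else just references it.
module Config =
    // Placeholder: in practice, initialized from a config file.
    let gain = 2.0

let multiplyByGain x = x * Config.gain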
Unit testing.
If you want to unit test your multiplyByGain function with different gain values, then you'll need some way of passing different values of gain to the function from your unit tests. In this case, having it as an additional parameter and using currying is nice, because you can just call it with other values of gain from your tests.
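A sketch of the curried variant, with gain moved to the first position (a deliberate reordering, not from the question) so that production code can bake it in once by partial application while tests pass whatever value they like:

let multiplyByGain gain x = x * gain

// Production code specializes it once with the configured value...
let multiplyByDefaultGain = multiplyByGain 2.0

// ...while a unit test supplies its own gain.
let test () = multiplyByGain 3.0 4.0 = 12.0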
Functional programming.
Some functional language communities (especially Haskell and, sometimes, Scala) are way more strict about state. The purely functional way of keeping state would be to use monads (either the reader monad or some kind of free monad structure). This makes your code a lot more complicated (both conceptually and in terms of extra syntactic overhead), but it is a purely functional solution that eliminates state. In F#, this kind of approach is even more cumbersome, so it's not very common.
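For completeness, here is a minimal reader-style sketch in F# (the helper names ret, bind, and ask are illustrative, not a standard library API): a computation is just a function from the environment to a value, and bind threads the environment through, so gain is never a global:

// A reader computation takes the environment (here: the gain) as input.
type Reader<'env, 'a> = 'env -> 'a

let ret (x: 'a) : Reader<'env, 'a> = fun _ -> x
let bind (m: Reader<'env, 'a>) (f: 'a -> Reader<'env, 'b>) : Reader<'env, 'b> =
    fun env -> f (m env) env
let ask : Reader<'env, 'env> = fun env -> env

// The function only describes the computation; the gain is supplied
// once, when the whole thing is finally run.
let multiplyByGain x : Reader<float, float> =
    bind ask (fun gain -> ret (x * gain))

let result = multiplyByGain 4.0 2.0   // 8.0, run with gain = 2.0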

What is the purpose of single assignment?

I'm currently trying to master Erlang. It's the first functional programming language that I've looked into, and I noticed that in Erlang, each assignment that you do is a single assignment. And apparently, not just in Erlang, but in many other functional programming languages, assignments are done through single assignment.
I'm really confused about why they made it like that. What exactly is the purpose of single assignment? What benefits can we get from it?
Immutability (what you call single assignment) simplifies a lot of things because it takes the "time" variable out of your programs.
For example, in mathematics if you say
x = y
you can replace x with y everywhere. In imperative programming languages you can't ensure that this equality holds: there is a "time" (state) associated with each line of code. This time/state also leaves the door open to undesired side effects, which are the number-one enemy of modularity and concurrency.
For more information see this.
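A quick sketch of the substitution point in F# (chosen here over Erlang only because the thread's other examples use it; the values are arbitrary):

// Single assignment: "x = y" is a real equation, substitution is safe.
let y = 21
let x = y
let a = x + x        // 42
let b = y + y        // 42, same thing: x can be replaced by y anywhere

// With mutation, the "equation" is only a fact about one moment in time.
let mutable m = 21
let c = m + m        // 42
m <- 0
let d = m + m        // 0: substituting 21 for m is no longer valid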
Because of single assignment, side effects are minimal. In fact, it is hard to write code with race conditions or side effects in Erlang, because the compiler easily spots unused variables, terms that are created but never used, shadowed variables (especially inside funs), etc. Another advantage Erlang gains from this is referential transparency: a function in Erlang depends only on the variables passed to it and NOT on global variables, except macros (and macros cannot be changed at run-time; they are constants). Lastly, if you watched the Erlang Movie, the sophisticated error-detection mechanism built into Erlang depends heavily on the fact that in Erlang, variables are assigned once.
Having variables keep their values makes it much easier to understand and debug the code. With concurrent processes you get the same kind of problem regardless, so there is enough complication already without just any variable potentially changing its value at any time. Think of it as encapsulating side effects by only allowing them when explicit.

Map/Reduce: any theoretical foundation beyond "howto"?

For a while I was thinking that you just need a map to a monoid, and then reduce would do reduction according to monoid's multiplication.
First, this is not exactly how monoids work, and second, this is not exactly how map/reduce works in practice.
Namely, take the ubiquitous "count" example. If there's nothing to count, any map/reduce engine will return an empty dataset, not a neutral element. Bummer.
Besides, in a monoid, an operation is defined for two elements. We can easily extend it to finite sequences, or, due to associativity, to finite ordered sets. But there's no way to extend it to arbitrary "collections" unless we actually have a σ-algebra.
So, what's the theory? I tried to figure it out, but I could not; and I tried to Google it but found nothing.
I think the right way to think about map-reduce is not as a computational paradigm in its own right, but rather as a control-flow construct similar to a while loop. You can view while as a program constructor with two arguments, a predicate function and an arbitrary program. Similarly, the map-reduce construct has two arguments named map and reduce, each of which is a function. So, analogously to while, the useful questions to ask are about proving correctness of constructed programs relative to given preconditions and postconditions. And as usual, those questions involve (a) termination and run-time performance and (b) maintenance of invariants.
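A minimal F# sketch of the neutral-element point raised in the question (the values are illustrative): fold with the monoid's identity behaves sensibly on empty input, whereas reduce, which has no identity to fall back on, fails; that is exactly the mismatch the question complains about:

let counts : int list = []

// List.reduce (+) counts          // raises an exception on empty input
let total = List.fold (+) 0 counts // 0, the identity of the (+) monoid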

How to identify that code is over abstracted?

What measures can be used to identify that code is over-abstracted and very hard to understand, and what should be done to reduce over-abstraction?
"Simplicity over complexity, complexity over complicatedness"
So - there's a benefit to abstracting something only if you are "de-leveling" complicatedness to complexity. Reasons to do that can vary: better modularity, better encapsulation, etc.
Identifying over-abstraction is a chicken-and-egg problem. In order to reduce over-abstraction you need to understand the actual reason behind the code lines. That includes understanding the idea of the particular abstraction itself (in contrast to calling it over-abstracted because of a lack of understanding). And that's not enough: you need to know a better, simpler solution to prove that it's over-abstracted.
If you are looking for a tool that could do it in your place - look no more; only a mind can reliably judge that.
I will give an answer that will get a LOT of down votes!
If the code is written in an OO language... it is necessarily heavily over-abstracted. The purer the language, the worse the problem.
Abstraction should be used with great caution. If in doubt, always use concrete data structures. (You can always abstract later; that's easier than de-abstracting :)
You must be very certain you have the right abstraction in your current context, and you must be very sure that concept will stand the test of change. Abstraction has a high price in performance of both the code and the coder.
Some weak tests for over-abstraction: if the data structure is a product type (a struct in C) and the programmer has written get and set methods for each field, they have utterly failed to provide any real abstraction, disabled operators like C's increment for no purpose, and simply not understood that the struct field names are already the abstract representation of a product. Duplicating and laming up the interface is not a good idea.
A good test for the product case is whether there exist any data invariants to maintain. For example, a pair of integers representing a rational number is almost sufficient; there's little need for any abstraction because all pairs are valid except when the denominator is zero. However, for performance reasons one may choose to maintain an invariant: typically, the denominator is required to be greater than zero, and the numerator and denominator are relatively prime. To ensure the invariant, the product representation is encapsulated: the initial value is protected by a constructor, and methods are constrained to maintain the invariant.
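A sketch of that rational-number example in F# (the type and member names are mine): the raw pair is hidden behind a private constructor that establishes the invariants, which is the encapsulation the paragraph argues actually earns its keep:

type Rational private (num: int, den: int) =
    member _.Num = num
    member _.Den = den
    override _.ToString() = sprintf "%d/%d" num den
    // The only way in: enforce den > 0 and gcd(num, den) = 1.
    static member Create(n: int, d: int) =
        if d = 0 then invalidArg "d" "denominator must be nonzero"
        let rec gcd a b = if b = 0 then a else gcd b (a % b)
        let s = if d < 0 then -1 else 1          // keep the sign on top
        let g = gcd (abs n) (abs d)              // >= 1 because d <> 0
        Rational(s * n / g, s * d / g)

let half = Rational.Create(2, -4)   // normalized to -1/2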
To fix code I recommend these steps:
Document the representation invariants the abstraction is maintaining
Remove the abstraction (methods) if you can't find strong invariants
Rewrite code that uses the methods to access the data directly.
This procedure only works for low level abstraction, i.e. abstraction of small values by classes.
Over abstraction at a higher level is much harder to deal with. Ideally you'd refactor the code repeatedly, checking to see after each step it continues to work. However this will be hard, and sometimes a major rewrite is required, rather than a refinement. It's probably not worth it unless the abstraction is so far off base it is not tenable to continue to maintain it.
Download Magento and have a look at the code, read some documents on it and have a look at their ERD: http://www.magentocommerce.com/wiki/_media/doc/magento---sample_database_diagram.png?cache=cache
I'm not joking, this is over-abstraction: trying to please everyone and cover every base is a terrible idea and makes life extremely difficult for everyone.
Personally I would say that "What is the ideal level of abstraction?" is a subjective question.
I don't like code that uses a new line for every atomic operation, but I also don't like 10 nested operations within one line.
I like the use of recursive functions, but I don't appreciate recursion for the sole sake of recursion.
I like generics, but I don't like (nested) generic functions that e.g. use different code for each specific type that's expected...
It is a matter of personal opinion as well as common sense. Does this answer your question?
I completely agree with what @ArnisLapsa wrote:
"Simplicity over complexity, complexity over complicatedness"
And that
an abstraction is used to "de-level" those, from complicated to complex
(and from complex to simpler)
Also, as stated by @MartinHemmings, a good abstraction is quite subjective, because we don't all think the same way. And actually our way of thinking changes with time, so something that someone finds simple might look complex to others, and even become simpler with more experience. E.g., a monadic operation is trivial for a functional programmer, but can be seriously confusing for others. Similarly, a design with mutable objects communicating with each other can be natural for some and feel untrackable for others.
That being said, I would like to add a couple of indicators. Note that this applies to abstractions used in a code base, not "paradigm abstractions" such as everything-is-a-function or everything-is-designed-as-objects. So:
To the people it concerns, the abstraction should be conceptually simpler than the alternatives, without looking at the implementation. If you find that thinking of all possible cases is simpler than reasoning using the abstraction, then this abstraction is not suitable (for you).
Its implementation should reason only about the abstraction, not about the specific cases it will be used for. As soon as the abstraction's implementation has parts made for specific cases, that indicates an "unfit" abstraction. And increasing generalization to cope with each new case is going the wrong way (and tends to lead to the next issue).
A very common indicator of over-abstraction that I have found (and actually fell for) is abstractions that represent more than what is needed now. As much as possible, they should allow you to do exactly what is required, but nothing more. For example, say you're thinking of, or already have, a "2d point" abstraction for which you can define the many operators you need. Then you have another need that could really be a "4d point" similar to the 2d one. Don't start using an "N-dimensional point" abstraction, especially thinking that you might need it later. Maybe you'll never have anything other than 2d and 4d (because it stays as "a good idea" in the backlog forever), but instead some requirement pops up to convert 4d points into pairs of 2d points, and that's going to be hard to generalize to n dimensions. So, each abstraction can be checked to cover, and only cover, the actual needs. In my point example, the "n-dimensional" complexity is actually only used to cope with the 2d and 4d cases (and the 4d might not even be used that much).
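To illustrate with a sketch (the type and field names are made up): two concrete types plus the one conversion the requirements actually produced, instead of a speculative n-dimensional abstraction:

type Point2D = { X: float; Y: float }
type Point4D = { X4: float; Y4: float; Z4: float; W4: float }

// The requirement that actually popped up: split a 4d point into 2d pairs.
let split (p: Point4D) : Point2D * Point2D =
    { X = p.X4; Y = p.Y4 }, { X = p.Z4; Y = p.W4 }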
Finally, from a more global point of view, a code base that has many unrelated abstractions is an indicator that the dev team tends to abstract every little issue. So probably many of them are, or became, over-abstracted.

Resources