I'm confused by the Data.Map API. I'm looking for a simple way to find a range of keys of the map at log(n) cost. This is a basic operation usually known as "binary search", or sometimes "bisecting".
I see this strange takeWhileAntitone function where I need to provide an "antitone" predicate function. It's the first time I've encountered this concept.
After reading Wikipedia on the topic, this seems to simply mean that there is at most one place where the predicate changes from True to False when applied to arguments in key order. This fits the requirement for a binary search.
Since the API is documented in a strange (to me) language, I wanted to ask here:
if my understanding is correct, and
is there a reason these functions aren't called bisect, binarySearch or similar?
Is my understanding correct?
Yes. takeWhileAntitone (and the other similarly named variants in the library) is the function for doing binary search on keys. It's not named takeWhile because it does not work correctly for arbitrary predicates, so if you're reviewing code, the name serves as a reminder to check that the predicate really is antitone.
Is there a reason these functions aren't called bisect, binarySearch or similar?
This name serves to distinguish variants takeWhileAntitone, dropWhileAntitone, spanAntitone that "do binary search" but with different final results.
takeWhile is a well-known name from Haskell's standard library (in Data.List).
In FP we like to distinguish the "what" from the "how". "binary search" is an algorithm ("how"). "take while" is also literally a "how", but its meaning is arguably more naturally connected to a "what" (the longest prefix of elements satisfying a predicate). In particular, the interpretation of "take while" as "longest prefix" doesn't rely on any assumption about the predicate.
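To make this concrete, here is a minimal sketch of carving out a key range with these functions (assuming containers 0.5.8 or later, where they were introduced; the map contents are made up):

import qualified Data.Map as Map

m :: Map.Map Int String
m = Map.fromList [(10, "a"), (20, "b"), (30, "c"), (40, "d")]

-- (< 30) is antitone on the keys: True for a prefix of the key order, False afterwards.
below30 :: Map.Map Int String
below30 = Map.takeWhileAntitone (< 30) m          -- fromList [(10,"a"),(20,"b")]

-- A two-sided range, by composing the two variants (spanAntitone gives both halves at once).
from20below40 :: Map.Map Int String
from20below40 = Map.takeWhileAntitone (< 40) (Map.dropWhileAntitone (< 20) m)
                                                  -- fromList [(20,"b"),(30,"c")]

Each call locates its split point by descending the balanced tree, so the cost stays at O(log n).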
I'm in a situation where I'm modifying an existing compiler written in OCaml. I've added locations to the AST of the compiled language, but this has caused a bunch of bugs, because equality checks that previously succeeded now fail when otherwise-identical ASTs have different locations attached.
In particular, I'm seeing List.mem return false when it should return true, since it relies on equality.
I'm wondering, is there a way for me to specify that = should always return true for any two values of my location type?
It would be a ton of work to refactor the entire compiler to use a custom equality everywhere, particularly since many polymorphic functions rely on being able to use = on any type.
There's no existing OCaml mechanism to do what you want.
You can use ppx to write OCaml syntax extensions, and (as I understand it) the behavior can depend on types. So there's some chance you could get things working that way. But it wouldn't be as straightforward as what you're asking for. I suspect you would need to explicitly handle = and any standard functions (like List.mem) that use = implicitly. (Note that I have no experience with ppx.)
I found a description of PPX here: http://ocamllabs.io/doc/ppx.html
Many experienced OCaml programmers avoid the use of built-in polymorphic equality because its behavior is often surprising. So it might be worth converting to a custom comparison function after all.
What an annoying problem to have.
If you are desperate and willing to write a little C code you can change the representation of locations to Custom_tag blocks, which allow customising the behaviour of some of the polymorphic operations. It's a nasty solution, and I suggest you look hard for a better approach before resorting to this one.
One possibility is that most of the compiler does not use locations at all. If so, you might be able to get away with replacing every location in the AST with the same dummy location. That should allow equality to behave as if locations were not there at all. This is rather hacky, and may not be possible if passes later in the compiler make any use of location info.
The 'clean' solution is to define a sane equality operation for ASTs (or to derive one using ppx) and to change the code to use that. As you say, this would be a lot more work.
This question comes up in a context where Isabelle is used with formal software development in mind more than with pure maths theorization in mind, and from a standalone developer's perspective.
It seems that, at best, SML programs generated from an Isabelle theory use SML's IntInf.int, not the native integer type, which is Int.int, even if Code_Target_Int, Code_Binary_Nat or Code_Target_Nat is used. Investigating the sources of these theories seems to confirm that this is all they can do. Native platform integers may be required for multiple reasons, including efficiency and the case where the SML imperative program is to be optionally translated into an imperative-language subset (e.g. C or Ada), which is relevant when the theory relies on the Imperative_HOL theory. The codegen.pdf document which comes with the Isabelle distribution did not help with this, except in suggesting the first of the options below.
Options may be:
Do not use Isabelle's int and nat; instead create a new numeric type from scratch, then use the code_printing commands (with their type_constructor and constant clauses) to give it the native platform representation and operations (this implies including the range limitations in the theory in some way). This must be tedious, although hopefully not error-prone, thanks to the formal environment. Note that this does not seem feasible with Isabelle's own int and nat: it makes code generation fail, and nothing tells you which constants are missing from the code_printing command.
If the SML program is to be compiled directly (e.g. with MLton), tweak the SML environment with a replacement IntInf structure. This may be unsafe or not feasible, and it still requires embedding the range limitations in the theory, so the previous option may in the end be better than this one.
Edit the generated program to change IntInf into Int. Easy, but is it safe? (At least IntInf implements the same signature as Int does, so maybe it is.) As above, this requires specifying bounds in the theory in some way, which is fine.
Dive into Isabelle internals: surely unreasonable, even worse than the second option.
There exists a Word theory, but according to some readings, it seems not suited to that purpose.
Are there other known options not listed here? Are there comments on the listed options?
If there is no ready-to-cook solution (I feel there is none at the moment), what hints or leads would be best to know (e.g. links to documents, mentions of concepts)?
Update
Points #2 and #3 of the list may be OK (if they are at all) only if there is a single integer type. If the program uses more than one, they are not applicable.
Directly generating native words from Isabelle int would be unsound, because your formalisation would not take overflow into account where it exists in reality.
It looks like the AFP entry Native_Word does what you want, though:
http://afp.sourceforge.net/entries/Native_Word.shtml
I have always thought that both of these are defined as functions that take other functions as arguments. I understand the domain of each is different, but what are their defining characteristics?
Well, let me try to kind of derive their defining characteristics from their different domains ;)
First of all, in their usual context combinators are higher order functions. But as it turns out, context is an important thing to keep in mind when talking about the differences between these two terms:
Higher Order Functions
When we think of higher order functions, the first thing usually mentioned is "oh, they (also) take at least one function as an argument" (thinking of fold, etc)... as if they were something special because of that. Which - depending on context - they are.
Typical context: functional programming, Haskell, any other (usually typed) language where functions are first-class citizens (like when LINQ made C# even more awesome)
Focus: let the caller specify/customize some functionality of this function
Combinators
Combinators are somewhat special functions; primitive ones do not even mind what they are given as arguments (the argument type often does not matter at all, so passing functions as arguments is not a problem either). So can the identity combinator also be called a "higher order function"? Formally: no, it does not need a function as an argument! But hold on... in which context would you ever encounter/use combinators (like I, K, etc.) instead of just implementing the desired functionality "directly"? Answer: well, in a purely functional context!
This is not a law or anything, but I really cannot think of a situation where you would see actual combinators in a context where you suddenly pass pointers, hash tables, etc. to a combinator... again, you can do that, but in such scenarios there should really be a better way than using combinators.
So based on this "weak" law of common sense - that you will work with combinators only in a purely functional context - they inherently are higher order functions. What else would you have available to pass as arguments? ;)
Combining combinators (by application only, of course, if you take it seriously) always gives new combinators, which therefore are also higher order functions. Primitive combinators usually just represent some basic behaviour or operation (think of the S, K, I, Y combinators) that you want to apply to something without using abstractions. But of course the definition of combinators does not limit them to that purpose!
Typical context: (untyped) lambda calculus, combinatory logic (surprise)
Focus: (structurally) combine existing combinators/"building blocks" to something new (e.g. using the Y-combinator to "add recursion" to something that is not recursive, yet)
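To put the two notions side by side, here is a small sketch in Haskell (plain definitions, lowercased to be valid identifiers; nothing here comes from a particular library):

i :: a -> a
i x = x                       -- I: the identity combinator, needs no function argument

k :: a -> b -> a
k x _ = x                     -- K: ignores its second argument

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x = f x (g x)           -- S: shares the argument x between f and g

twice :: (a -> a) -> a -> a   -- a typical higher order function: it takes a function
twice f = f . f

Here k 1 2 evaluates to 1 and twice (+1) 3 to 5: k is a combinator even when applied to plain numbers, whereas twice is higher order by its very type.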
Summary
Yes, as you can see, it might be more of a contextual/philosophical thing or about what you want to express: I would never call the K combinator (definition: K = \a -> \b -> a) a "higher order function", although it is very likely that you will never see K being called with anything other than functions, therefore "making" it a higher order function.
I hope this sort of answered your question. Formally they certainly are not the same, but their defining characteristics are pretty similar; personally, I think of combinators as functions used as higher order functions in their typical context (which usually is somewhere between special and weird).
EDIT: I have adjusted my answer a little bit since, as it turned out, it was slightly "biased" by personal experience/impression. :) To get an even better idea about correctly distinguishing combinators from HOFs, read the comments below!
EDIT2: Taking a look at HaskellWiki also gives a technical definition for combinators that is pretty far away from HOFs!
As an extension question, my lecturer for my maths in computer science module asked us to find examples of when a surjective function is vital to the operation of a system; he said he can't think of any!
I've been doing some googling and have only found a single outdated paper about non-surjective rounding functions creating flaws in some cryptographic systems.
Master edit:
[btw, thank you for the accepted response.]
In reviewing my response, and those of others in this post, I realized two things.
The first is that, looking at things at a higher level of abstraction, most (all?) of the [counter-]examples provided are a form of "discretization" function. In other words, they correspond to the ubiquitous requirement in computer systems of mapping [possibly infinitely] numerous entities/values to a set (possibly "infinite" too, though most often a finite one) of discrete entities/values. While not all such mappings imply or require a non-bijective surjection, many do, hence the several examples found.
The other observation is that the most compelling examples seem to be tied to stochastic (random) processes, or to the underlying primitives which support them.
Both of these things are quite telling, I think, for they mirror, if only loosely, the way the real world's complexity (read "randomness", at many levels) is exploited in various systems in humans (and animals) to produce the simplified/stable/discrete maps that represent elements of this complex reality: another case where mathematics and its practically-oriented friend, Computer Science, team up to describe or to mimic fundamental realities (or... are these the realities? hum... we're getting too philosophical...)
It could be a matter of understanding exactly the frame of the question:
do bijective functions count (they are indeed a special case of a surjection)
Edit: No, bijective functions are not considered.
has it got to be a "function" in the sense of a procedural calculation as opposed to say a "relation" as in databases
Edit: yes, a procedural function of sorts... "take in a value and return another value" (IMHO this distinction is very tenuous as any "map" is a function, regardless of the inner working, but let's entertain this "numeric calculation like" restriction in the spirit of this question)
define "vital"...
With all these caveats in mind, the following may apply:
Elementary mathematical functions such as ABS() or even ROUND(), FLOOR() (absolute value, and rounding of a decimal/float value to the nearest int, respectively), etc.
In the case of ABS(), for example, used in the context of a program which draws shapes on the screen using various properties of, say, symmetry, one would be able to count on getting two, and exactly two, values that map to a given value, and on having every value in a given integral range (say from 0 to 10) be an ABS() value, lest the drawings start to look funny ;-)
the Soundex function (and its many derivations)
Modulo operations, even in such trivial uses as to show the status of a process, every x items processed.
Classification processes: it is both important that there be a significant reduction factor (thousands of instances mapped to a handful of categories), and vital [in some cases] that all instances yield one and only one category (e.g. in real-time decision systems).
Various "simple" mathematical functions used in pseudo-random number generators.
It is vital that they be surjective, so that all values within the range are reachable; indeed, an expectation of a specific, often uniform, distribution is implied. (Note: this could be a bit of a repeat of the "modulo" example above, although it doesn't have to use modulo arithmetic proper; other math functions can do.)
Following is a bad example, now that Martin clarified that [math operations like functions that] "take in a value and return another value" is what defines "function" hence disqualifying database/table-driven "maps" and such. And also that bijections were not considered either.
One-to-One relations (or one-to-many relations for that matter) : it can be so important to maintain these that we require triggers etc. to keep up with referential integrity
A very simple scheduler implemented by the function random(0, number of processes - 1) expects this function to be surjective, otherwise some processes will never run.
In practice the scheduler has some sort of internal state that it modifies. If you want to see it as a function in the mathematical sense, it takes a state and returns a new state and a process number to run, and in this context it's no longer important that it is surjective because not all possible states have to be reachable. Not a very good example, I'm afraid, but the only one I can think of.
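Sticking with the face-value reading of that scheduler example, here is a tiny sketch in Haskell (the function names and the sample of ticks are made up for illustration): a modulo-based pick is surjective onto the process indices, while a deliberately broken variant starves the last process.

import Data.List (nub, sort)

pick, buggyPick :: Int -> Int -> Int
pick n tick      = tick `mod` n         -- surjective onto [0 .. n-1]
buggyPick n tick = tick `mod` (n - 1)   -- never yields n-1: that process never runs

-- Does f, applied to a sample of ticks, reach every index in [0 .. n-1]?
coversAll :: (Int -> Int) -> Int -> Bool
coversAll f n = sort (nub (map f [0 .. 1000])) == [0 .. n - 1]

-- coversAll (pick 4) 4      == True
-- coversAll (buggyPick 4) 4 == False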
A hashing function should ideally be surjective.
But in general I think the question is too vague to be answered. What is a system? What is a function used inside a system?
Edit:
I think the question is not very meaningful. After all there are many cases where you need to be able to produce every desired result. Just think about the identity function and imagine where you could argue that it is used:
using a reference to a variable in programming
using a text (or even hex-editor) to produce a file
It would be very bad if you could not create every bit combination by xor or not when doing bit manipulation.
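To check that xor remark quickly, a made-up one-liner in Haskell: xor with any fixed mask is its own inverse, hence a bijection (and in particular a surjection) on fixed-width words, so every bit combination remains reachable.

import Data.Bits (xor)
import Data.List (sort)
import Data.Word (Word8)

-- Mapping every byte through (`xor` 0x5A) permutes the byte values: the sorted
-- image is exactly the full range, i.e. the map is surjective onto Word8.
allBytesReachable :: Bool
allBytesReachable = sort (map (`xor` 0x5A) [minBound .. maxBound :: Word8])
                    == [minBound .. maxBound]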
Can someone tell me what strong typing and weak typing mean, and which one is better?
That'll be the theory answers taken care of, but the practice side seems to have been neglected...
Strong typing means that you can't use one type of variable where another is expected (or have restrictions on doing so). Weak typing means you can mix different types. In PHP, for example, you can mix numbers and strings and PHP won't complain, because it is a weakly typed language.
$message = "You are visitor number ".$count;
If it were strongly typed, you'd have to convert $count from an integer to a string, usually either with a cast:
$message = "you are visitor number ".(string)$count;
...or a function:
$message = "you are visitor number ".strval($count);
As for which is better, that's subjective. Advocates of strong-typing will tell you that it will help you to avoid some bugs and/or errors and help communicate the purpose of a variable etc. They'll also tell you that advocates of weak-typing will call strong-typing "unnecessary language fluff that is rendered pointless by common sense", or something similar. As a card-carrying member of the weak-typing group, I'd have to say that they've got my number... but I have theirs too, and I can put it in a string :)
"Strong typing" and its opposite "weak typing" are rather weak in meaning, partly since the notion of what is considered to be "strong" can vary depending on whom you ask. E.g. C has been been called both "strongly typed" and "weakly typed" by different authors, it really depends on what you compare it to.
Generally a type system should be considered stronger if it can express the same constraints as another and more. Quite often two type systems are not comparable, though -- one might have features the other lacks and vice versa. Any discussion of relative strengths is then up to personal taste.
Having a stronger type system means that either the compiler or the runtime will report more errors, which is usually a good thing, although it might come at the cost of having to provide more type information manually, which might be considered effort not worthwhile. I would claim "strong typing" is generally better, but you have to look at the cost.
It's also important to realize that "strongly typed" is often incorrectly used instead of "statically typed" or even "manifest typed". "Statically typed" means that there are type checks at compile-time, "manifest typed" means that the types are declared explicitly. Manifest-typing is probably the best known way of making a type system stronger (think Java), but you can add strength by other means such as type-inference.
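As a small illustration of that last point (this example is mine, not from any of the linked material), type inference in Haskell adds checking strength without a single manifest declaration:

-- No type is written down for double, yet the compiler infers Num a => a -> a
-- and rejects a misuse at compile time.
double x = x + x

ok :: Int
ok = double 21              -- fine, evaluates to 42

-- bad = double "hello"     -- rejected at compile time: there is no Num instance for String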
I would like to reiterate that weak typing is not the same as dynamic typing.
This is a rather well written article on the subject and I would definitely recommend giving it a read if you are unsure about the differences between strong, weak, static and dynamic type systems. It details the differences much better than can be expected in a short answer, and has some very enlightening examples.
http://en.wikipedia.org/wiki/Type_system
Strong typing is the most common type model in modern programming languages. Those languages have one simple feature: knowing the types of values at run time. We can say that strongly typed languages prevent mixing operations between two or more different kinds of types. Here is an example in Java:
String foo = "Hello, world!";
Object obj = foo;
String bar = (String) obj;
Date baz = (Date) obj; // This line will throw an error
The previous example will work perfectly well until the program hits the last line of code, where a ClassCastException is thrown, because Java is a strongly typed programming language.
When we talk about weakly typed languages, Perl is one of them. The following example shows how Perl doesn't have any problem mixing two different types.
$a = 10;
$b = "a";
$c = $a . $b;
print $c; # prints 10a
I hope you find this useful,
Thanks.
This article is a great read: http://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html Cleared up a lot of things for me when researching trying to answer a similar question, hope others find it useful too.
Strong and Weak Typing:
Probably the most common way type systems are classified is "strong" or "weak." This is unfortunate, since these words have nearly no meaning at all. It is, to a limited extent, possible to compare two languages with very similar type systems, and designate one as having the stronger of those two systems. Beyond that, the words mean nothing at all.
Static and Dynamic Types:
This is very nearly the only common classification of type systems that has real meaning. As a matter of fact, its significance is frequently underestimated [...] Dynamic and static type systems are two completely different things, whose goals happen to partially overlap.
A static type system is a mechanism by which a compiler examines source code and assigns labels (called "types") to pieces of the syntax, and then uses them to infer something about the program's behavior. A dynamic type system is a mechanism by which a compiler generates code to keep track of the sort of data (coincidentally, also called its "type") used by the program. The use of the same word "type" in each of these two systems is, of course, not really entirely coincidental; yet it is best understood as having a sort of weak historical significance. Great confusion results from trying to find a world view in which "type" really means the same thing in both systems. It doesn't.
Explicit/Implicit Types:
When these terms are used, they refer to the extent to which a compiler will reason about the static types of parts of a program. All programming languages have some form of reasoning about types. Some have more than others. ML and Haskell have implicit types, in that no (or very few, depending on the language and extensions in use) type declarations are needed. Java and Ada have very explicit types, and one is constantly declaring the types of things. All of the above have (relatively, compared to C and C++, for example) strong static type systems.
Strong/weak typing in a language is related to how readily it performs implicit type conversions:
For example in Python:
s = 5 + 'a'
# raises a TypeError, since Python will not cast one type to the other implicitly
Whereas in C:
int a = 5;
a = 5 + 'c';
/* is fine, because C treats 'c' as an integer (its character code, 99) in this case, so a becomes 104 */
Thus Python is more strongly typed than C (from this perspective).
Maybe this can help you to understand strong and weak typing:
Strong typing: the types of variables are checked as soon as possible, usually at compile time. It prevents mixing operations between mismatched types.
A strong-typed programming language is one in which:
All variables (or data types) are known at compile time
There is strict enforcement of typing rules (a String can't be used where an Integer would be expected)
All violations of the typing rules result in a compile-time error
Weak typing: weak typing delays checking the types in the system as late as possible, usually until run time. With it, you can mix types without an explicit conversion.
A "weak-typed" programming language is simply one which is not strong-typed.
Which is preferred depends on what you want. For scripts and the like you will usually want weak typing, because you want to write as little code as possible. In big programs, strong typing can reduce errors at compile time.
Weak typing means that you don't specify what type a variable is, and strong typing means you give a strict type to each variable.
Each has its advantages, with weak typing (or dynamic typing, as it is often called) being more flexible and requiring less code from the programmer. Strong typing, on the other hand, requires more work from the developer, but in return it can alert you to many mistakes when compiling your code, before you run it. Dynamic typing may delay the discovery of these simple problems until the code is executed.
Depending on the task at hand, weak typing may be better than strong typing, or vice versa, but it is mostly a matter of taste. Weak typing is commonly used in scripting languages, while strong typing is used in most compiled languages.