Ignoring certain types with respect to = in OCaml - functional-programming

I'm in a situation where I'm modifying an existing compiler written in OCaml. I've added locations to the AST of the compiled language, but it has cause a bunch of bugs, because equality checks that previously succeeded now fail when identical ASTs have a different location attached.
In particular, I'm seeing List.mem return false when it should return true, since it relies on equality.
I'm wondering, is there a way for me to specify that, for any two values of my location type, that = should always return true for any two values of this type?
It would be a ton of work to refactor the entire compiler to use a custom equality everywhere, particularly since many polymorphic functions rely on being able to use = on any type.

There's no existing OCaml mechanism to do what you want.
You can use ppx to write OCaml syntax extensions, and (as I understand it) the behavior can depend on types. So there's some chance you could get things working that way. But it wouldn't be as straightforward as what you're asking for. I suspect you would need to explicitly handle = and any standard functions (like List.mem) that use = implicitly. (Note that I have no experience with ppx.)
I found a description of PPX here: http://ocamllabs.io/doc/ppx.html
Many experienced OCaml programmers avoid the use of built-in polymorphic equality because its behavior is often surprising. So it might be worth converting to a custom comparison function after all.

What an annoying problem to have.
If you are desperate and willing to write a little C code you can change the representation of locations to Custom_tag blocks, which allow customising the behaviour of some of the polymorphic operations. It's a nasty solution, and I suggest you look hard for a better approach before resorting to this one.
One possibility is that most of the compiler does not use locations at all. If so, you might be able to get away with replacing every location in the AST with the same dummy location. That should allow equality to behave as if locations were not there at all. This is rather hacky, and may not be possible if passes later in the compiler make any use of location info.
The 'clean' solution is to define a sane equality operation for ASTs (or to derive one using ppx) and to change the code to use that. As you say, this would be a lot more work.

Related

Julia functions: making mutable types immutable

Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain me the advantage of such a distinction? why arrays are passed by reference? Naively I see this as a bad aspect, since it creates side effects and ruins the possibility to write purely functional code. Where I am wrong in my reasoning? is there a way to make immutable an array, such that when it is passed to a function it is effectively passed by value?
here an example of code
#x is an in INT and so is immutable: it is passed by value
x = 10
function change_value(x)
x = 17
end
change_value(x)
println(x)
#arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr
There is a fair bit to respond to here.
First, Julia does not pass-by-reference or pass-by-value. Rather it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new
locations that can refer to values), but the values they refer to are
identical to the passed values.
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: Performance. Julia is a performance oriented language. Making a copy every time you pass an array into a function is bad for performance. Every copy operation takes time.
This has some interesting side-effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consists of many short functions. This code structure is a direct consequence of near-zero overhead to function calls. Languages like Mathematica and MatLab on the other hand tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can result in problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and based on your question, you appear to have worked out that the convention is that functions that modify their arguments have a trailing ! in the function name. Interestingly, this standard is not compulsory so yes, it is in theory possible to end up with a wild-west type scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The convention of using ! is enforced in Base Julia, and in fact I have never encountered a package that does not adhere to this convention. In summary, yes, it is possible to run into issues when pass-by-sharing, but in practice it has never been a problem, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
Another package-based options suggested by phipsgabler in the comments below is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
If you wanted to do a bit of work then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entails. For example:
struct MyImmutableArray{T,N}
x::Array{T,N}
end
Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.

Global State in Functional Programming (F#)

I want to compute some functions which are dependent on some variables (specific data on which I run the code) and global variables, which are unlikely to be changed, but I want to leave them user-tunable. Just to clarify with an example, suppose I want to declare the following function:
let multiplyByGain x =
x * gain
Where would you declare gain, being gain a global constant for the whole project. In a separate module with constants? That would couple the module with this code, though. Or would you use a curried version:
let multiblyByGain x gain =
x * gain
and then specialize for the specific values? But suppose you have many functions like that, you will have to inject gain to all of them (in a sort of linking module)?
In my specific problem this becomes more cumbersome because both x and gain are arrays which must have the same length, suppose I have to do a Array.zip, e.g.: what is the best practice in terms of functional design to address a global constant, as gain, in a general way?
P.S.: I have found this old postenter link description here, but addresses only a specific problem.
There is no single correct answer to the question and the best approach will depend on a variety of other constraints and requirements that you have. Also, it depends on whether you are asking specifically about F# or whether you are asking about functional programming more generally. I think there are three main points:
Keeping it simple.
Using a module that exposes gain as a global value, which has some initialization code to read configuration seems like a good default approach in F#. If this is changed only rarely (say, before you run the whole computation), then mutation is not going to cause you any troubles. You just need to be careful to avoid changing the values while some computation is still running. I think most F# programmers code tend to be quite pragmatic about this and this seems like the easiest thing to start with.
Unit testing.
If you want to unit ytest your multiplyByGain function with different gain as an argument, then you'll need some way of passing different values of gain to the function from your unit tests. In this case, having it as an additional parameter and using currying is nice, because you can just call it with other values of gain from your tests.
Functional programming.
Some functional language communities (especially Haskell and, sometimes, Scala) are way more strict about state. The purely functional way of keeping state would be to use monads (either the reader monad or some kind of free monad structure). This makes your code a lot more complicated (both conceptually and in terms of extra syntactic overhead), but it is a purely functional solution that eliminates state. In F#, this kind of approach is even more cumbersome, so it's not very common.

Better way to get the reflect.Type of an interface in Go

Is there an better way to get the reflect.Type of an interface in Go than reflect.TypeOf((*someInterface)(nil)).Elem()?
It works, but it makes me cringe every time I scroll past it.
Unfortunately, there is not. While it might look ugly, it is indeed expressing the minimal amount of information needed to get the reflect.Type that you require. These are usually included at the top of the file in a var() block with all such necessary types so that they are computed at program init and don't incur the TypeOf lookup penalty every time a function needs the value.
This idiom is used throughout the standard library, for instance:
html/template/content.go: errorType = reflect.TypeOf((*error)(nil)).Elem()
The reason for this verbose construction stems from the fact that reflect.TypeOf is part of a library and not a built-in, and thus must actually take a value.
In some languages, the name of a type is an identifier that can be used as an expression. This is not the case in Go. The valid expressions can be found in the spec. If the name of a type were also usable as a reflect.Type, it would introduce an ambiguity for method expressions because reflect.Type has its own methods (in fact, it's an interface). It would also couple the language spec with the standard library, which reduces the flexibility of both.

How to hand over variables to a function? With an array or variables?

When I try to refactor my functions, for new needs, I stumble from time to time about the crucial question:
Shall I add another variable with a default value? Or shall I use only one array, where I´m able to add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
Just like many questions in programming, the right answer is "it depends".
To take Javascript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expresssion)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins require several miscellaneous parameters that may be optional. By convention, these are passed as parameters via an 'options' array. As you said, this provides a nice interface as new parameters can be added without affecting the existing API. This makes the API clean as well since the user can ignore those options that are not applicable.
In general, when several parameters are involved, passing them as an array is a nice convention as many of them are certainly going to be optional. This would have helped clean up many WIN32 API's, although it is more difficult to deal with arrays in C/C++ than in Javascript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work that will need to be done to maintain existing code is important. Remember to include the amount of time it will take to communicate to other programmers that may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or less. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments - that I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogenous, an array. If not, then a reasonably lightweight object should be built for the purpose.
You should do neither. Just add the parameter and change all callers to supply the proper default value. The reason is that parameters with default values can only be at the end, and will not be able to add any more required parameters anywhere in the parameters list, without having a risk of misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers will supply it, and some will rely on defaults.
[half a year passed]
3. add a required parameter (before them)
4. change all callers to accept the required parameter
5. get a phone call, or other event which will make you forget to change one of the instances in part#2
6. now your program compiles perfectly, but is invalid.
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
Array is also not a proper solution. Array should be used as a connection of similar objects, upon which there's a uniform activity performed. As they say here, if it's worth refactoring, it's worth refactoring now.

Can someone tell me what Strong typing and weak typing means and which one is better?

Can someone tell me what Strong typing and weak typing means and which one is better?
That'll be the theory answers taken care of, but the practice side seems to have been neglected...
Strong-typing means that you can't use one type of variable where another is expected (or have restrictions to doing so). Weak-typing means you can mix different types. In PHP for example, you can mix numbers and strings and PHP won't complain because it is a weakly-typed language.
$message = "You are visitor number ".$count;
If it was strongly typed, you'd have to convert $count from an integer to a string, usually with either with casting:
$message = "you are visitor number ".(string)$count;
...or a function:
$message = "you are visitor number ".strval($count);
As for which is better, that's subjective. Advocates of strong-typing will tell you that it will help you to avoid some bugs and/or errors and help communicate the purpose of a variable etc. They'll also tell you that advocates of weak-typing will call strong-typing "unnecessary language fluff that is rendered pointless by common sense", or something similar. As a card-carrying member of the weak-typing group, I'd have to say that they've got my number... but I have theirs too, and I can put it in a string :)
"Strong typing" and its opposite "weak typing" are rather weak in meaning, partly since the notion of what is considered to be "strong" can vary depending on whom you ask. E.g. C has been been called both "strongly typed" and "weakly typed" by different authors, it really depends on what you compare it to.
Generally a type system should be considered stronger if it can express the same constraints as another and more. Quite often two type systems are not be comparable, though -- one might have features the other lacks and vice versa. Any discussion of relative strengths is then up to personal taste.
Having a stronger type system means that either the compiler or the runtime will report more errors, which is usually a good thing, although it might come at the cost of having to provide more type information manually, which might be considered effort not worthwhile. I would claim "strong typing" is generally better, but you have to look at the cost.
It's also important to realize that "strongly typed" is often incorrectly used instead of "statically typed" or even "manifest typed". "Statically typed" means that there are type checks at compile-time, "manifest typed" means that the types are declared explicitly. Manifest-typing is probably the best known way of making a type system stronger (think Java), but you can add strength by other means such as type-inference.
I would like to reiterate that weak typing is not the same as dynamic typing.
This is a rather well written article on the subject and I would definitely recommend giving it a read if you are unsure about the differences between strong, weak, static and dynamic type systems. It details the differences much better than can be expected in a short answer, and has some very enlightening examples.
http://en.wikipedia.org/wiki/Type_system
Strong typing is the most common type model in modern programming languages. Those languages have one simple feature - knowing about type values in run time. We can say that strong typed languages prevent mixing operations between two or more different kind of types. Here is an example in Java:
String foo = "Hello, world!";
Object obj = foo;
String bar = (String) obj;
Date baz = (Date) obj; // This line will throw an error
The previous example will work perfectly well until program hit the last line of code where the ClassCastException is going to be thrown because Java is strong typed programming language.
When we talk about weak typed languages, Perl is one of them. The following example shows how Perl doesn't have any problems with mixing two different types.
$a = 10;
$b = "a";
$c = $a . $b;
print $c; # returns 10a
I hope you find this useful,
Thanks.
This article is a great read: http://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html Cleared up a lot of things for me when researching trying to answer a similar question, hope others find it useful too.
Strong and Weak Typing:
Probably the most common way type systems are classified is "strong"
or "weak." This is unfortunate, since these words have nearly no
meaning at all. It is, to a limited extent, possible to compare two
languages with very similar type systems, and designate one as having
the stronger of those two systems. Beyond that, the words mean nothing
at all.
Static and Dynamic Types
This is very nearly the only common classification of type systems
that has real meaning. As a matter of fact, it's significance is
frequently under-estimated [...] Dynamic and static type systems are
two completely different things, whose goals happen to partially
overlap.
A static type system is a mechanism by which a compiler examines
source code and assigns labels (called "types") to pieces of the
syntax, and then uses them to infer something about the program's
behavior. A dynamic type system is a mechanism by which a compiler
generates code to keep track of the sort of data (coincidentally, also
called its "type") used by the program. The use of the same word
"type" in each of these two systems is, of course, not really entirely
coincidental; yet it is best understood as having a sort of weak
historical significance. Great confusion results from trying to find a
world view in which "type" really means the same thing in both
systems. It doesn't.
Explicit/Implicit Types:
When these terms are used, they refer to the extent to which a
compiler will reason about the static types of parts of a program. All
programming languages have some form of reasoning about types. Some
have more than others. ML and Haskell have implicit types, in that no
(or very few, depending on the language and extensions in use) type
declarations are needed. Java and Ada have very explicit types, and
one is constantly declaring the types of things. All of the above have
(relatively, compared to C and C++, for example) strong static type
systems.
Strong/weak typing in a language is related to how easily you can do type conversions:
For example in Python:
str = 5 + 'a'
# would throw an error since it does not want to cast one type to the other implicitly.
Where as in C language:
int a = 5;
a = 5 + 'c';
/* is fine, because C treats 'c' as an integer in this case */
Thus Python is more strongly typed than C (from this perspective).
May be this can help you to understand Strong and Weak Typing.......
Strong typing: It checks the type of variables as soon as possible, usually at compile time. It prevents mixing operations between mismatched types.
A strong-typed programming language is one in which:
All variables (or data types) are known at compile time
There is strict enforcement of typing rules (a String can't be used
where an Integer would be expected)
All exceptions to typing rules results in a compile time error
Weak Typing: While weak typing is delaying checking the types of the system as late as possible, usually to run-time. In this you can mix types without an explicit conversion.
A "weak-typed" programming language is simply one which is not strong-typed.
which is preferred depends on what you want. for scripts and good stuff you will usually want weak typing, because you want to write as much less code as possible. in big programs, strong typing can reduce errors at compile time.
Weak typing means that you don't specify what type a variable is, and strong typing means you give a strict type to each variable.
Each has its advantages, with weak typing (or dynamic typing, as it is often called), being more flexible and requiring less code from the programmer. Strong typing, on the other hand, requires more work from the developer, but in return it can alert you of many mistakes when compiling your code, before you run it. Dynamic typing may delay the discovery of these simple problems until the code is executed.
Depending on the task at hand, weak typing may be better than strong typing, or vice versa, but it is mostly a matter of taste. Weak typing is commonly used in scripting languages, while strong typing is used in most compiled languages.

Resources