I'm in a situation where I'm modifying an existing compiler written in OCaml. I've added locations to the AST of the compiled language, but it has cause a bunch of bugs, because equality checks that previously succeeded now fail when identical ASTs have a different location attached.
In particular, I'm seeing List.mem return false when it should return true, since it relies on equality.
I'm wondering, is there a way for me to specify that, for any two values of my location type, that = should always return true for any two values of this type?
It would be a ton of work to refactor the entire compiler to use a custom equality everywhere, particularly since many polymorphic functions rely on being able to use = on any type.
There's no existing OCaml mechanism to do what you want.
You can use ppx to write OCaml syntax extensions, and (as I understand it) the behavior can depend on types. So there's some chance you could get things working that way. But it wouldn't be as straightforward as what you're asking for. I suspect you would need to explicitly handle = and any standard functions (like List.mem) that use = implicitly. (Note that I have no experience with ppx.)
I found a description of PPX here: http://ocamllabs.io/doc/ppx.html
Many experienced OCaml programmers avoid the use of built-in polymorphic equality because its behavior is often surprising. So it might be worth converting to a custom comparison function after all.
What an annoying problem to have.
If you are desperate and willing to write a little C code you can change the representation of locations to Custom_tag blocks, which allow customising the behaviour of some of the polymorphic operations. It's a nasty solution, and I suggest you look hard for a better approach before resorting to this one.
One possibility is that most of the compiler does not use locations at all. If so, you might be able to get away with replacing every location in the AST with the same dummy location. That should allow equality to behave as if locations were not there at all. This is rather hacky, and may not be possible if passes later in the compiler make any use of location info.
The 'clean' solution is to define a sane equality operation for ASTs (or to derive one using ppx) and to change the code to use that. As you say, this would be a lot more work.
Can someone tell me what Strong typing and weak typing means and which one is better?
That'll be the theory answers taken care of, but the practice side seems to have been neglected...
Strong-typing means that you can't use one type of variable where another is expected (or have restrictions to doing so). Weak-typing means you can mix different types. In PHP for example, you can mix numbers and strings and PHP won't complain because it is a weakly-typed language.
$message = "You are visitor number ".$count;
If it was strongly typed, you'd have to convert $count from an integer to a string, usually with either with casting:
$message = "you are visitor number ".(string)$count;
...or a function:
$message = "you are visitor number ".strval($count);
As for which is better, that's subjective. Advocates of strong-typing will tell you that it will help you to avoid some bugs and/or errors and help communicate the purpose of a variable etc. They'll also tell you that advocates of weak-typing will call strong-typing "unnecessary language fluff that is rendered pointless by common sense", or something similar. As a card-carrying member of the weak-typing group, I'd have to say that they've got my number... but I have theirs too, and I can put it in a string :)
"Strong typing" and its opposite "weak typing" are rather weak in meaning, partly since the notion of what is considered to be "strong" can vary depending on whom you ask. E.g. C has been been called both "strongly typed" and "weakly typed" by different authors, it really depends on what you compare it to.
Generally a type system should be considered stronger if it can express the same constraints as another and more. Quite often two type systems are not be comparable, though -- one might have features the other lacks and vice versa. Any discussion of relative strengths is then up to personal taste.
Having a stronger type system means that either the compiler or the runtime will report more errors, which is usually a good thing, although it might come at the cost of having to provide more type information manually, which might be considered effort not worthwhile. I would claim "strong typing" is generally better, but you have to look at the cost.
It's also important to realize that "strongly typed" is often incorrectly used instead of "statically typed" or even "manifest typed". "Statically typed" means that there are type checks at compile-time, "manifest typed" means that the types are declared explicitly. Manifest-typing is probably the best known way of making a type system stronger (think Java), but you can add strength by other means such as type-inference.
I would like to reiterate that weak typing is not the same as dynamic typing.
This is a rather well written article on the subject and I would definitely recommend giving it a read if you are unsure about the differences between strong, weak, static and dynamic type systems. It details the differences much better than can be expected in a short answer, and has some very enlightening examples.
http://en.wikipedia.org/wiki/Type_system
Strong typing is the most common type model in modern programming languages. Those languages have one simple feature - knowing about type values in run time. We can say that strong typed languages prevent mixing operations between two or more different kind of types. Here is an example in Java:
String foo = "Hello, world!";
Object obj = foo;
String bar = (String) obj;
Date baz = (Date) obj; // This line will throw an error
The previous example will work perfectly well until program hit the last line of code where the ClassCastException is going to be thrown because Java is strong typed programming language.
When we talk about weak typed languages, Perl is one of them. The following example shows how Perl doesn't have any problems with mixing two different types.
$a = 10;
$b = "a";
$c = $a . $b;
print $c; # returns 10a
I hope you find this useful,
Thanks.
This article is a great read: http://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html Cleared up a lot of things for me when researching trying to answer a similar question, hope others find it useful too.
Strong and Weak Typing:
Probably the most common way type systems are classified is "strong"
or "weak." This is unfortunate, since these words have nearly no
meaning at all. It is, to a limited extent, possible to compare two
languages with very similar type systems, and designate one as having
the stronger of those two systems. Beyond that, the words mean nothing
at all.
Static and Dynamic Types
This is very nearly the only common classification of type systems
that has real meaning. As a matter of fact, it's significance is
frequently under-estimated [...] Dynamic and static type systems are
two completely different things, whose goals happen to partially
overlap.
A static type system is a mechanism by which a compiler examines
source code and assigns labels (called "types") to pieces of the
syntax, and then uses them to infer something about the program's
behavior. A dynamic type system is a mechanism by which a compiler
generates code to keep track of the sort of data (coincidentally, also
called its "type") used by the program. The use of the same word
"type" in each of these two systems is, of course, not really entirely
coincidental; yet it is best understood as having a sort of weak
historical significance. Great confusion results from trying to find a
world view in which "type" really means the same thing in both
systems. It doesn't.
Explicit/Implicit Types:
When these terms are used, they refer to the extent to which a
compiler will reason about the static types of parts of a program. All
programming languages have some form of reasoning about types. Some
have more than others. ML and Haskell have implicit types, in that no
(or very few, depending on the language and extensions in use) type
declarations are needed. Java and Ada have very explicit types, and
one is constantly declaring the types of things. All of the above have
(relatively, compared to C and C++, for example) strong static type
systems.
Strong/weak typing in a language is related to how easily you can do type conversions:
For example in Python:
str = 5 + 'a'
# would throw an error since it does not want to cast one type to the other implicitly.
Where as in C language:
int a = 5;
a = 5 + 'c';
/* is fine, because C treats 'c' as an integer in this case */
Thus Python is more strongly typed than C (from this perspective).
May be this can help you to understand Strong and Weak Typing.......
Strong typing: It checks the type of variables as soon as possible, usually at compile time. It prevents mixing operations between mismatched types.
A strong-typed programming language is one in which:
All variables (or data types) are known at compile time
There is strict enforcement of typing rules (a String can't be used
where an Integer would be expected)
All exceptions to typing rules results in a compile time error
Weak Typing: While weak typing is delaying checking the types of the system as late as possible, usually to run-time. In this you can mix types without an explicit conversion.
A "weak-typed" programming language is simply one which is not strong-typed.
which is preferred depends on what you want. for scripts and good stuff you will usually want weak typing, because you want to write as much less code as possible. in big programs, strong typing can reduce errors at compile time.
Weak typing means that you don't specify what type a variable is, and strong typing means you give a strict type to each variable.
Each has its advantages, with weak typing (or dynamic typing, as it is often called), being more flexible and requiring less code from the programmer. Strong typing, on the other hand, requires more work from the developer, but in return it can alert you of many mistakes when compiling your code, before you run it. Dynamic typing may delay the discovery of these simple problems until the code is executed.
Depending on the task at hand, weak typing may be better than strong typing, or vice versa, but it is mostly a matter of taste. Weak typing is commonly used in scripting languages, while strong typing is used in most compiled languages.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
Is it worth learning the convention or is it a bane to readability and maintainability?
Considering that most people that use Hungarian Notation is following the misunderstood version of it, I'd say it's pretty pointless.
If you want to use the original definition of it, it might make more sense, but other than that it is mostly syntactic sugar.
If you read the Wikipedia article on the subject, you'll find two conflicting notations, Systems Hungarian Notation and Apps Hungarian Notation.
The original, good, definition is the Apps Hungarian Notation, but most people use the Systems Hungarian Notation.
As an example of the two, consider prefixing variables with l for length, a for area and v for volume.
With such notation, the following expression makes sense:
int vBox = aBottom * lVerticalSide;
but this doesn't:
int aBottom = lSide1;
If you're mixing the prefixes, they're to be considered part of the equation, and volume = area * length is fine for a box, but copying a length value into an area variable should raise some red flags.
Unfortunately, the other notation is less useful, where people prefix the variable names with the type of the value, like this:
int iLength;
int iVolume;
int iArea;
some people use n for number, or i for integer, f for float, s for string etc.
The original prefix was meant to be used to spot problems in equations, but has somehow devolved into making the code slightly easier to read since you don't have to go look for the variable declaration. With todays smart editors where you can simply hover over any variable to find the full type, and not just an abbreviation for it, this type of hungarian notation has lost a lot of its meaning.
But, you should make up your own mind. All I can say is that I don't use either.
Edit Just to add a short notice, while I don't use Hungarian Notation, I do use a prefix, and it's the underscore. I prefix all private fields of classes with a _ and otherwise spell their names as I would a property, titlecase with the first letter uppercase.
The Hungarian Naming Convention can be useful when used correctly, unfortunately it tends to be misused more often than not.
Read Joel Spolsky's article Making Wrong Code Look Wrong for appropriate perspective and justification.
Essentially, type based Hungarian notation, where variables are prefixed with information about their type (e.g. whether an object is a string, a handle, an int, etc.) is mostly useless and generally just adds overhead with very little benefit. This, sadly, is the Hungarian notation most people are familiar with. However, the intent of Hungarian notation as envisioned is to add information on the "kind" of data the variable contains. This allows you to partition kinds of data from other kinds of data which shouldn't be allowed to be mixed together except, possibly, through some conversion process. For example, pixel based coordinates vs. coordinates in other units, or unsafe user input versus data from safe sources, etc.
Look at it this way, if you find yourself spelunking through code to find out information on a variable then you probably need to adjust your naming scheme to contain that information, this is the essence of the Hungarian convention.
Note that an alternative to Hungarian notation is to use more classes to show the intent of variable usage rather than relying on primitive types everywhere. For example, instead of having variable prefixes for unsafe user input, you can have simple string wrapper class for unsafe user input, and a separate wrapper class for safe data. This has the advantage, in strongly typed languages, of having partitioning enforced by the compiler (even in less strongly typed languages you can usually add your own tripwire code) but adds a not insignificant amount of overhead.
I still use Hungarian Notation when it comes to UI elements, where several UI elements are related to a particular object/value, e.g.,
lblFirstName for the label object, txtFirstName for the text box. I definitely can't name them both "FirstName" even if that is the concern/responsibility of both objects.
How do others approach naming UI elements?
I think hungarian notation is an interesting footnote along the 'path' to more readable code, and if done properly, is preferable to not-doing it.
In saying that though, I'd rather do away with it, and instead of this:
int vBox = aBottom * lVerticalSide;
write this:
int boxVolume = bottomArea * verticalHeight;
It's 2008. We don't have 80 character fixed width screens anymore!
Also, if you're writing variable names which are much longer than that you should be looking at refactoring into objects or functions anyway.
It is pointless (and distracting) but is in relatively heavy use at my company, at least for types like ints, strings, booleans, and doubles.
Things like sValue, iCount, dAmount or fAmount, and bFlag are everywhere.
Once upon a time there was a good reason for this convention. Now, it is a cancer.
Sorry to follow up with a question, but does prefixing interfaces with "I" qualify as hungarian notation? If that is the case, then yes, a lot of people are using it in the real world. If not, ignore this.
I see Hungarian Notation as a way to circumvent the capacity of our short term memories. According to psychologists, we can store approximately 7 plus-or-minus 2 chunks of information. The extra information added by including a prefix helps us by providing more details about the meaning of an identifier even with no other context. In other words, we can guess what a variable is for without seeing how it is used or declared. This can be avoided by applying oo techniques such as encapsulation and the single responsibility principle.
I'm unaware of whether or not this has been studied empirically. I would hypothesize that the amount of effort increases dramatically when we try to understand classes with more than nine instance variables or methods with more than 9 local variables.
When I see Hungarian discussion, I'm glad to see people thinking hard about how to make their code clearer, and how to mistakes more visible. That's exactly what we should all be doing!
But don't forget that you have some powerful tools at your disposal besides naming.
Extract Method If your methods are getting so long that your variable declarations have scrolled off the top of the screen, consider making your methods smaller. (If you have too many methods, consider a new class.)
Strong typing If you find that you are taking zip codes stored in an integer variable and assigning them to a shoe size integer variable, consider making a class for zip codes and a class for shoe size. Then your bug will be caught at compile time, instead of requiring careful inspection by a human. When I do this, I usually find a bunch of zip code- and shoe size-specific logic that I've peppered around my code, which I can then move in to my new classes. Suddenly all my code gets clearer, simpler, and protected from certain classes of bugs. Wow.
To sum up: yes, think hard about how you use names in code to express your ideas clearly, but also look to the other powerful OO tools you can call on.
Isn't scope more important than type these days, e.g.
l for local
a for argument
m for member
g for global
etc
With modern techniques of refactoring old code, search and replace of a symbol because you changed its type is tedious, the compiler will catch type changes, but often will not catch incorrect use of scope, sensible naming conventions help here.
I don't use a very strict sense of hungarian notation, but I do find myself using it sparing for some common custom objects to help identify them, and also I tend to prefix gui control objects with the type of control that they are. For example, labelFirstName, textFirstName, and buttonSubmit.
I use Hungarian Naming for UI elements like buttons, textboxes and lables. The main benefit is grouping in the Visual Studio Intellisense Popup. If I want to access my lables, I simply start typing lbl.... and Visual Studio will suggest all my lables, nicley grouped together.
However, after doing more and more Silverlight and WPF stuff, leveraging data binding, I don't even name all my controls anymore, since I don't have to reference them from code-behind (since there really isn't any codebehind anymore ;)
What's wrong is mixing standards.
What's right is making sure that everyone does the same thing.
int Box = iBottom * nVerticleSide
The original prefix was meant to be
used to spot problems in equations,
but has somehow devolved into making
the code slightly easier to read since
you don't have to go look for the
variable declaration. With todays
smart editors where you can simply
hover over any variable to find the
full type, and not just an
abbreviation for it, this type of
hungarian notation has lost a lot of
its meaning.
I'm breaking the habit a little bit but prefixing with the type can be useful in JavaScript that doesn't have strong variable typing.
When using a dynamically typed language, I occasionally use Apps Hungarian. For statically typed languages I don't. See my explanation in the other thread.
Hungarian notation is pointless in type-safe languages. e.g. A common prefix you will see in old Microsoft code is "lpsz" which means "long pointer to a zero-terminated string". Since the early 1700's we haven't used segmented architectures where short and long pointers exist, the normal string representation in C++ is always zero-terminated, and the compiler is type-safe so won't let us apply non-string operations to the string. Therefore none of this information is of any real use to a programmer - it's just more typing.
However, I use a similar idea: prefixes that clarify the usage of a variable.
The main ones are:
m = member
c = const
s = static
v = volatile
p = pointer (and pp=pointer to pointer, etc)
i = index or iterator
These can be combined, so a static member variable which is a pointer would be "mspName".
Where are these useful?
Where the usage is important, it is a good idea to constantly remind the programmer that a variable is (e.g.) a volatile or a pointer
Pointer dereferencing used to do my head in until I used the p prefix. Now it's really easy to know when you have an object (Orange) a pointer to an object (pOrange) or a pointer to a pointer to an object (ppOrange). To dereference an object, just put an asterisk in front of it for each p in its name. Case solved, no more deref bugs!
In constructors I usually find that a parameter name is identical to a member variable's name (e.g. size). I prefer to use "mSize = size;" than "size = theSize" or "this.size = size". It is also much safer: I don't accidentally use "size = 1" (setting the parameter) when I meant to say "mSize = 1" (setting the member)
In loops, my iterator variables are all meaningful names. Most programmers use "i" or "index" and then have to make up new meaningless names ("j", "index2") when they want an inner loop. I use a meaningful name with an i prefix (iHospital, iWard, iPatient) so I always know what an iterator is iterating.
In loops, you can mix several related variables by using the same base name with different prefixes: Orange orange = pOrange[iOrange]; This also means you don't make array indexing errors (pApple[i] looks ok, but write it as pApple[iOrange] and the error is immediately obvious).
Many programmers will use my system without knowing it: by add a lengthy suffix like "Index" or "Ptr" - there isn't any good reason to use a longer form than a single character IMHO, so I use "i" and "p". Less typing, more consistent, easier to read.
This is a simple system which adds meaningful and useful information to code, and eliminates the possibility of many simple but common programming mistakes.
I've been working for IBM for the past 6 months and I haven't seen it anywhere (thank god because I hate it.) I see either camelCase or c_style.
thisMethodIsPrettyCool()
this_method_is_pretty_cool()
It depends on your language and environment. As a rule I wouldn't use it, unless the development environment you're in makes it hard to find the type of the variable.
There's also two different types of Hungarian notation. See Joel's article. I can't find it (his names don't exactly make them easy to find), anyone have a link to the one I mean?
Edit: Wedge has the article I mean in his post.
Original form (The Right Hungarian Notation :) ) where prefix means type (i.e. length, quantity) of value stored by variable is OK, but not necessary in all type of applications.
The popular form (The Wrong Hungarian Notation) where prefix means type (String, int) is useless in most of modern programming languages.
Especially with meaningless names like strA. I can't understand we people use meaningless names with long prefixes which gives nothing.
I use type based (Systems HN) for components (eg editFirstName, lblStatus etc) as it makes autocomplete work better.
I sometimes use App HN for variables where the type infomation is isufficient. Ie fpX indicates a fixed pointed variable (int type, but can't be mixed and matched with an int), rawInput for user strings that haven't been validated etc
Being a PHP programmer where it's very loosely typed, I don't make a point to use it. However I will occasionally identify something as an array or as an object depending on the size of the system and the scope of the variable.