Google Closure munging efficiency - google-closure-compiler

I have a JavaScript file that I ran through the Google Closure compiler, and it munged the name "on" to "za", which is obviously not more efficient from a length standpoint. Is there some reason it is, in fact, more efficient (possibly because of the letters used), or is it just a couple of random letters?
For instance, I can imagine that "za" might be more compressible than "on" when gzipped, but there's a good chance it's just a couple of random letters.

After playing around with http://closure-compiler.appspot.com/home some, it seems that the compiler just starts renaming the variables regardless of their previous names. It starts at 'a' and makes its way through 'b', 'c', and so on. It does not appear to care about the current length of a variable name, but it will make sure the overall size is the same as or smaller than before. In some cases a variable may be refactored out of the code entirely if you use the advanced optimization setting.
By letting the compiler handle the naming of your variables, it can guarantee that none of them conflict with each other. If it tried to keep your existing variable names, it would have to track them separately and maintain a list to compare against.
Another benefit of minifying code is that it obfuscates your code base. Sure, someone can figure it out if they really want to, but making the variable names programmatic makes their meaning harder to work out.
If you really wanted to debug through combined & minified code I would recommend looking into the source maps feature of Google Chrome Canary http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/

Is there some reason it is, in fact, more efficient (possibly because of the letters used)
No.
or is it just a couple random letters?
Pretty much. AFAIK, Closure Compiler (and other JS minifiers) systematically renames variables, with the goal of using fewer characters overall, without looking at what they are already named.

Related

Goto is considered harmful, but did anyone attempt to make code using goto re-usable and maintainable?

Everyone is aware of Dijkstra's Letters to the editor: go to statement considered harmful (also available as an .html transcript and as a .pdf). I was wondering whether anyone has attempted to find a way to make code that uses gotos reusable and maintainable, and not harmful, by adding other language extensions or by developing a language which allows for gotos.
The reason I ask is that it occurs to me that code written in assembly language often used gotos and global variables to make the program work well within a limited space. The Atari 2600, for example, had 128 bytes of RAM, and the program was loaded from a ROM cartridge. In that case, it was better to use unstructured programming and make the most of the freedoms it allows within a very limited space for the program.
When you compare this with a game programmed today without the use of gotos, the game takes up much more space.
Then it occurs to me that perhaps it's possible to program with gotos if some rules or other language changes are made to support them, so that the negative effects of gotos could be reduced or eliminated. Has anyone tried to find a way to make gotos NOT considered harmful by creating a language, or some rules to follow, which allow gotos to be used without harm?
If no one has looked for a way to use gotos in a non-harmful way, then perhaps we adopted structured programming unnecessarily, based solely on this paper? Perhaps there is another solution which allows for the use of gotos without the downside.
Comparing gotos to structured programming is comparing a situation where the programmer has to remember what every label in the code actually means and does, and where it is, to a situation where the conditional branches are explicitly described.
As for the advantages of the goto statement regarding the space a program might take: I think that games today are big because of the graphics and sound resources they use, that is, showing 1,000,000 polygons. The cost of a goto compared to that is totally negligible.
Moreover, structured control statements are ultimately compiled into goto ("jmp") instructions by the compiler when it outputs assembly.
To answer the question: it might be possible to make goto less harmful by creating naming and syntax conventions. Enforcing those conventions as rules is, however, pretty much what structured programming does.
Linus Torvalds once argued that goto can make source code clearer, but goto is useful in such rare and special cases that I would not dare use it as a programmer.
This question is somewhat related to yours, since I think it covers one of the most common situations where a goto is needed.

Qt's QList: which is the canonical form of getting the number of items in a list?

Qt's QList class provides several methods for getting the number of items in the list - count, length, and size. As we all know, consistency is important, so which should be the canonical/preferred method to use of those 3?
I agree with everything @Cogwheel said, but in all honesty I would just pick one and stick with it. I think good style dictates that if "size" sounds best to you, then use "size" everywhere; don't alternate between "count", "length" and "size" haphazardly. That will lead to potential confusion or a lot of unnecessary trips to the documentation pages.
You could try to come up with some other kind of rationale, but the language itself isn't even consistent. All the STL containers (e.g. list, vector) only provide "size", the string class provides "size" and "length", etc.
Pick your favorite (or if you have multiple developers, you should all agree on a favorite) and just stick with it.
The consistency you should work towards is within your project. You're not really going to gain anything by trying to be consistent with everyone else, unless there's some way they'd be incompatible.
That being said, there are subtle semantic differences (in English, not C++) between the names of the functions. If you can make your code clearer by taking advantage of the differences, then consistency may actually work against you.
IMHO, any one of those. Even if different developers prefer different functions within your project, the function names are self-documenting, in the sense that other developers can easily understand what each function is meant to do.
I usually go with "size." Ultimately, it is a bit arbitrary, but Qt containers and STL containers both generally have a size, so it is easy to stay consistent between the two types. It's also the shortest. Whenever several solutions are basically equivalent, I always go with whichever results in the least amount of typing. It's a simple rule of thumb, so everybody on the project can usually remember it.
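For what it's worth, the three really are interchangeable on a QList; here's a minimal sketch (relying on Qt's documented API, where size(), count() and length() all return the number of items):
#include <QList>
#include <QDebug>

int main() {
    QList<int> items;
    items << 10 << 20 << 30;
    // All three accessors report the same element count.
    qDebug() << items.size();   // 3
    qDebug() << items.count();  // 3
    qDebug() << items.length(); // 3
    return 0;
}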

Abstraction or not?

The other day I stumbled onto a rather old Usenet post by Linus Torvalds. It is the infamous "You are full of bull****" post, in which he defends his choice of plain C for Git over something more modern.
In particular, this post made me think about the enormous number of abstraction layers that accumulate one over the other where I work. Mine is a Windows .Net environment. I must say that I like C# and the .Net environment; it really makes most things easy.
Now, I come from a very different background made of Unix technologies like C and a plethora of scripting languages; to me, OOP is also just one, and not always the best, programming paradigm. I often struggle (in a working kind of way, of course!) with my colleagues (one in particular), because they appear to be of the "any problem can be solved with an additional level of abstraction" church, while I'm more of the "keep it simple" school. I think there is a very different mental approach to the problems that maybe comes from exposure to different cultures.
As a very simple example, for the first project I did here I needed some configuration for an application. I wrote a 10-line class to load and parse a txt file, located in the program's root dir, containing colon-separated key/value pairs, one per row. It worked.
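Just to make the scale concrete, the loader amounted to roughly this (a hedged sketch in C++ rather than the original C#; the file name handling, trimming and error handling are assumptions):
#include <fstream>
#include <map>
#include <string>

// Reads "key: value" pairs, one per line, from a text file in the program's directory.
std::map<std::string, std::string> loadConfig(const std::string& path) {
    std::map<std::string, std::string> config;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;                // skip lines without a separator
        config[line.substr(0, colon)] = line.substr(colon + 1);  // no trimming, deliberately simple
    }
    return config;
}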
In the end, to standardize the approach to the configuration problem, we now have a library, installed on every machine running each configured program, that calls a service which, at startup, loads an XML file containing references to other XML files, one per application, that hold the configurations themselves.
Now, it is extensible and made up of fancy reusable abstractions, providers and all, but I still think that, if we really do happen to reuse part of it one day, in the time it took to build we could have written the needed code from scratch, or copied and modified the old code.
What are your thoughts about it? Can you point out some interesting reference dealing with the problem?
Thanks
Abstraction makes it easier to construct software and understand how it is put together, but it complicates fully understanding certain issues around performance and security, because the abstraction layers introduce certain kinds of complexity.
Torvalds' position is not absurd, but he is an extremist.
Simple answer: programming languages provide data structures and ways to combine them. Use these directly at first, do not abstract. If you find you have representation invariants to maintain that are at a high risk of being broken due to a large number of usage sites possibly outside your control, then consider abstraction.
To implement this, first provide functions and convert the call sites to use them without hiding the representation. Hide the data representation only when you're satisfied your functional representation is sufficient. Make sure at this time to document the invariant being protected.
An "extreme programming" version of this: do not abstract until you have test cases that break your program. If you think the invariant can be breached, write the case that breaks it first.
Here's a similar question: https://stackoverflow.com/questions/1992279/abstraction-in-todays-languages-excited-or-sad.
I agree with @Steve Emmerson - 'Coders at Work' would give you some excellent perspective on this issue.

Best way to incorporate spell checkers with a build process

I try to externalize all strings (and other constants) used in any application I write, for many reasons that are probably second nature to most Stack Overflowers, but one thing I would like to have is the ability to automate spell checking of any user-visible strings. This poses a couple of problems:
Not all strings are user-visible, and it's non-trivial to separate them and keep that separation in place (but it is possible).
Most, if not all, string externalization methods I've used involve significant text that will not pass a spell checker such as aspell/ispell (e.g. theStrName="some string." and comments).
Many spellcheckers (once again, aspell/ispell) don't handle many words out of the box (generally technical terms, proper nouns, or just 'new' terminology, like metadata).
How do you incorporate something like this into your build procedures/test suites? It is not feasible to have someone manually spell check all the strings in an application each time they are changed -- and there is no chance that they will all be spelled correctly the first time.
We do it manually, if errors aren't picked up during testing then they're picked up by the QA team, or during localization by the translators, or during localization QA. Then we lodge a bug.
Most of our developers are not native English speakers, so it's not an uncommon problem for us. The number that slip through the cracks is so small that this is a satisfactory solution for us.
Nothing over a few hundred lines is ever 100% bug-free (well... maybe the odd piece of embedded code), so just think of spelling mistakes as bugs and don't waste too much time on them.
As soon as your application matures, over 90% of the strings won't change between releases, and it would be a reasonably trivial exercise to compare two versions of your resources and figure out what's new (check them first), what's changed/updated (check next) and what hasn't changed (no need to check these).
So think of it more like I need to check ALL of these manually the first time, and I'm only going to have to check 10% of them next time. Now ask yourself if you still really need to automate spell checking.
I can think of two ways to approach this semi-automatically:
Have the compiler help you differentiate between strings used in the UI and strings used elsewhere. Overload different variants of the string datatype depending on its purpose, and overload the output methods to accept only that type; that way you can create a fake UI that just outputs the UI strings, and do the spell checking on that.
Whether this is doable of course depends on the platform and the overall architecture of the application.
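A rough illustration of that first idea in C++ (UiString and showToUser are invented names): give user-visible text its own type and let the output routine accept only that type, so a fake UI build can dump every UiString and feed it to the spell checker.
#include <iostream>
#include <string>

// Distinct type for user-visible text; internal strings stay plain std::string.
struct UiString {
    std::string text;
};

// The only output routine; it will not accept a raw std::string.
void showToUser(const UiString& s) {
    std::cout << s.text << '\n';
}

int main() {
    UiString greeting{"Welcome to the aplication"};   // typo the spell checker would flag
    showToUser(greeting);
    // showToUser(std::string("internal-id-42"));     // does not compile: wrong type
    return 0;
}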
Another approach could be to simply update the spell checker's database with all the strings that appear in the code (comments, xpaths, table names, you name it) and regard them as perfectly cromulent. This will of course reduce the precision of the spell checking.
First, regarding string externalization: GNU gettext (if used properly) creates string files that contain almost no text other than the actual content of the strings (there are some headers, but it's easy to make a spell checker ignore them).
Second, what I would do is run the spell checker in a continuous integration environment and have the errors reported externally, probably through a web interface, though email would also work. Developers can then review the errors and either fix them in the code or use some easy interface to tell the spell checker that a misspelling should be ignored (a web interface can integrate both the error view and the spell checker interface).
If you're using Java and storing your localized strings in resource bundles, then you could check the Bundle.properties files and validate the bundle strings. You could also add a special comment annotation that your processor could use to determine whether an entry should be skipped.
This method will allow you to give a hint as to the locale and provide a way of checking multiple languages within the one build process.
I can't answer how you would perform the actual spell checking itself, though I think what I've presented will guide you as to the method of performing it.
Use aspell. It's a program, it's available for unixoids and cygwin, and it can be run over lots of kinds of source code. Use it.
First point: please don't put it into your build process. I would be a vengeful coder if I (meaning my computer) had to spell check all the content on the site every time I tried to debug or build a new feature. I don't even think this kind of operation belongs in a unit test (you're testing a human interface, not a computerised one).
Second point: don't write a script. You're going to have so many false positives that people will stop reading the reports, and you will be no better off than when you started.
Third point: this is probably most easily solved by having humans do it: the QA team, copy writers, beta testers, translators, etc. All the big sites with internationalised content that I've built had the same process: we took the copy from the copy writers, sent it to the translation service/agency, put it into the persistence layer, and deployed it. Testers (QA, developers, PMs, designers, etc.) would find spelling or grammatical mistakes and lodge bug reports. There is just too much red tape and there are too many pairs of eyes for that many spelling/grammar errors to slip through.
Fourth point: there will always be spelling and grammar mistakes on your page. Even major newspaper web sites haven't gotten around this, and they have whole office buildings filled with editors.

Do people use the Hungarian Naming Conventions in the real world? [closed]

Is it worth learning the convention or is it a bane to readability and maintainability?
Considering that most people who use Hungarian notation follow the misunderstood version of it, I'd say it's pretty pointless.
If you want to use the original definition of it, it might make more sense, but other than that it is mostly syntactic sugar.
If you read the Wikipedia article on the subject, you'll find two conflicting notations, Systems Hungarian Notation and Apps Hungarian Notation.
The original, good, definition is the Apps Hungarian Notation, but most people use the Systems Hungarian Notation.
As an example of the two, consider prefixing variables with l for length, a for area and v for volume.
With such notation, the following expression makes sense:
int vBox = aBottom * lVerticalSide;
but this doesn't:
int aBottom = lSide1;
If you're mixing the prefixes, they're to be considered part of the equation, and volume = area * length is fine for a box, but copying a length value into an area variable should raise some red flags.
Unfortunately, the other notation is less useful, where people prefix the variable names with the type of the value, like this:
int iLength;
int iVolume;
int iArea;
Some people use n for number, or i for integer, f for float, s for string, etc.
The original prefix was meant to be used to spot problems in equations, but it has somehow devolved into making the code slightly easier to read, since you don't have to go look for the variable declaration. With today's smart editors, where you can simply hover over any variable to find its full type rather than just an abbreviation of it, this type of Hungarian notation has lost a lot of its meaning.
But, you should make up your own mind. All I can say is that I don't use either.
Edit: Just to add a short note, while I don't use Hungarian notation, I do use a prefix, and it's the underscore. I prefix all private fields of classes with a _ and otherwise spell their names as I would a property: title case, with the first letter uppercase.
The Hungarian Naming Convention can be useful when used correctly; unfortunately, it tends to be misused more often than not.
Read Joel Spolsky's article Making Wrong Code Look Wrong for appropriate perspective and justification.
Essentially, type based Hungarian notation, where variables are prefixed with information about their type (e.g. whether an object is a string, a handle, an int, etc.) is mostly useless and generally just adds overhead with very little benefit. This, sadly, is the Hungarian notation most people are familiar with. However, the intent of Hungarian notation as envisioned is to add information on the "kind" of data the variable contains. This allows you to partition kinds of data from other kinds of data which shouldn't be allowed to be mixed together except, possibly, through some conversion process. For example, pixel based coordinates vs. coordinates in other units, or unsafe user input versus data from safe sources, etc.
Look at it this way: if you find yourself spelunking through code to find out information about a variable, then you probably need to adjust your naming scheme to contain that information. That is the essence of the Hungarian convention.
Note that an alternative to Hungarian notation is to use more classes to show the intent of variable usage rather than relying on primitive types everywhere. For example, instead of having variable prefixes for unsafe user input, you can have simple string wrapper class for unsafe user input, and a separate wrapper class for safe data. This has the advantage, in strongly typed languages, of having partitioning enforced by the compiler (even in less strongly typed languages you can usually add your own tripwire code) but adds a not insignificant amount of overhead.
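A small sketch of that wrapper-class alternative in C++ (the class and function names are invented): the compiler, rather than a naming prefix, keeps unsafe and safe strings apart.
#include <string>

struct UnsafeString {                  // raw user input, not yet validated
    std::string value;
};

struct SafeString {                    // output of the sanitising step
    std::string value;
};

SafeString sanitise(const UnsafeString& in) {
    std::string cleaned = in.value;    // real escaping/validation would go here
    return SafeString{cleaned};
}

void writeToPage(const SafeString& s); // only accepts sanitised data

// writeToPage(UnsafeString{userInput});            // compile error: types don't mix
// writeToPage(sanitise(UnsafeString{userInput}));  // fine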
I still use Hungarian Notation when it comes to UI elements, where several UI elements are related to a particular object/value, e.g.,
lblFirstName for the label object, txtFirstName for the text box. I definitely can't name them both "FirstName" even if that is the concern/responsibility of both objects.
How do others approach naming UI elements?
I think Hungarian notation is an interesting footnote along the 'path' to more readable code and, if done properly, is preferable to not doing it.
In saying that though, I'd rather do away with it, and instead of this:
int vBox = aBottom * lVerticalSide;
write this:
int boxVolume = bottomArea * verticalHeight;
It's 2008. We don't have 80 character fixed width screens anymore!
Also, if you're writing variable names which are much longer than that you should be looking at refactoring into objects or functions anyway.
It is pointless (and distracting) but is in relatively heavy use at my company, at least for types like ints, strings, booleans, and doubles.
Things like sValue, iCount, dAmount or fAmount, and bFlag are everywhere.
Once upon a time there was a good reason for this convention. Now, it is a cancer.
Sorry to follow up with a question, but does prefixing interfaces with "I" qualify as hungarian notation? If that is the case, then yes, a lot of people are using it in the real world. If not, ignore this.
I see Hungarian notation as a way to circumvent the limited capacity of our short-term memory. According to psychologists, we can store approximately seven plus-or-minus two chunks of information. The extra information added by a prefix provides more detail about the meaning of an identifier even with no other context; in other words, we can guess what a variable is for without seeing how it is used or declared. This can be avoided by applying OO techniques such as encapsulation and the single responsibility principle.
I'm unaware of whether or not this has been studied empirically. I would hypothesize that the amount of effort increases dramatically when we try to understand classes with more than nine instance variables or methods with more than nine local variables.
When I see Hungarian notation discussed, I'm glad to see people thinking hard about how to make their code clearer and how to make mistakes more visible. That's exactly what we should all be doing!
But don't forget that you have some powerful tools at your disposal besides naming.
Extract Method: If your methods are getting so long that your variable declarations have scrolled off the top of the screen, consider making your methods smaller. (If you have too many methods, consider a new class.)
Strong typing: If you find that you are taking zip codes stored in an integer variable and assigning them to a shoe size integer variable, consider making a class for zip codes and a class for shoe sizes. Then your bug will be caught at compile time, instead of requiring careful inspection by a human. When I do this, I usually find a bunch of zip-code- and shoe-size-specific logic that I've peppered around my code, which I can then move into my new classes. Suddenly all my code gets clearer, simpler, and protected from certain classes of bugs. Wow.
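For instance, a minimal C++ sketch of that idea (ZipCode and ShoeSize are of course hypothetical names):
struct ZipCode {
    explicit ZipCode(int v) : value(v) {}
    int value;
};

struct ShoeSize {
    explicit ShoeSize(int v) : value(v) {}
    int value;
};

int main() {
    ZipCode zip(90210);
    ShoeSize shoe(43);
    // shoe = zip;               // caught at compile time instead of by careful human inspection
    shoe = ShoeSize(zip.value);  // mixing them now requires a deliberate conversion
    return 0;
}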
To sum up: yes, think hard about how you use names in code to express your ideas clearly, but also look to the other powerful OO tools you can call on.
Isn't scope more important than type these days, e.g.
l for local
a for argument
m for member
g for global
etc
With modern refactoring techniques for old code, searching and replacing a symbol because you changed its type is tedious; the compiler will catch type changes, but it often will not catch incorrect use of scope, and sensible naming conventions help there.
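A tiny sketch of what that scope-based convention looks like in practice (the names are made up):
int gRetryCount = 3;                    // g: global

class Connection {
    int mTimeoutMs = 0;                 // m: member
public:
    void open(int aTimeoutMs) {         // a: argument
        int lAttempts = 0;              // l: local
        mTimeoutMs = aTimeoutMs;
        while (lAttempts < gRetryCount) {
            ++lAttempts;                // try to connect here
        }
    }
};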
I don't use Hungarian notation in a very strict sense, but I do find myself using it sparingly for some common custom objects to help identify them, and I also tend to prefix GUI control objects with the type of control that they are. For example, labelFirstName, textFirstName, and buttonSubmit.
I use Hungarian naming for UI elements like buttons, textboxes and labels. The main benefit is grouping in the Visual Studio IntelliSense popup. If I want to access my labels, I simply start typing lbl... and Visual Studio will suggest all my labels, nicely grouped together.
However, after doing more and more Silverlight and WPF work, leveraging data binding, I don't even name all my controls anymore, since I don't have to reference them from code-behind (since there really isn't any code-behind anymore ;)
What's wrong is mixing standards.
What's right is making sure that everyone does the same thing.
int Box = iBottom * nVerticalSide
"The original prefix was meant to be used to spot problems in equations, but it has somehow devolved into making the code slightly easier to read, since you don't have to go look for the variable declaration. With today's smart editors, where you can simply hover over any variable to find its full type rather than just an abbreviation of it, this type of Hungarian notation has lost a lot of its meaning."
I'm breaking the habit a little bit, but prefixing with the type can be useful in JavaScript, which doesn't have strong variable typing.
When using a dynamically typed language, I occasionally use Apps Hungarian. For statically typed languages I don't. See my explanation in the other thread.
Hungarian notation is pointless in type-safe languages. For example, a common prefix you will see in old Microsoft code is "lpsz", which means "long pointer to a zero-terminated string". Since the early 1700s we haven't used segmented architectures where short and long pointers exist, the normal string representation in C++ is always zero-terminated, and the compiler is type-safe, so it won't let us apply non-string operations to the string. Therefore none of this information is of any real use to a programmer; it's just more typing.
However, I use a similar idea: prefixes that clarify the usage of a variable.
The main ones are:
m = member
c = const
s = static
v = volatile
p = pointer (and pp=pointer to pointer, etc)
i = index or iterator
These can be combined, so a static member variable which is a pointer would be "mspName".
Where are these useful?
Where the usage is important, it is a good idea to constantly remind the programmer that a variable is (e.g.) a volatile or a pointer
Pointer dereferencing used to do my head in until I used the p prefix. Now it's really easy to know when you have an object (Orange) a pointer to an object (pOrange) or a pointer to a pointer to an object (ppOrange). To dereference an object, just put an asterisk in front of it for each p in its name. Case solved, no more deref bugs!
In constructors I usually find that a parameter name is identical to a member variable's name (e.g. size). I prefer to use "mSize = size;" rather than "size = theSize" or "this.size = size". It is also much safer: I don't accidentally write "size = 1" (setting the parameter) when I meant to write "mSize = 1" (setting the member).
In loops, my iterator variables are all meaningful names. Most programmers use "i" or "index" and then have to make up new meaningless names ("j", "index2") when they want an inner loop. I use a meaningful name with an i prefix (iHospital, iWard, iPatient) so I always know what an iterator is iterating.
In loops, you can mix several related variables by using the same base name with different prefixes: Orange orange = pOrange[iOrange]; This also means you don't make array indexing errors (pApple[i] looks ok, but write it as pApple[iOrange] and the error is immediately obvious).
Many programmers use my system without knowing it, by adding a lengthy suffix like "Index" or "Ptr"; there isn't any good reason to use a longer form than a single character IMHO, so I use "i" and "p". Less typing, more consistent, easier to read.
This is a simple system which adds meaningful and useful information to code, and eliminates the possibility of many simple but common programming mistakes.
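Putting a few of those prefixes together in a short C++ sketch (Orange, Crate and the rest are illustrative names only):
#include <cstddef>

class Orange { /* ... */ };

class Crate {
public:
    explicit Crate(std::size_t size) : mSize(size) {}  // mSize vs. size: no shadowing accident
private:
    std::size_t mSize;                                 // m: member
    static const char* mspName;                        // m + s + p: static member pointer
};
const char* Crate::mspName = "crate";

void inspect(Orange* pOranges, std::size_t count) {
    for (std::size_t iOrange = 0; iOrange < count; ++iOrange) {   // i: meaningful iterator name
        Orange orange = pOranges[iOrange];             // same base name across prefixes
        (void)orange;                                  // use the element here
    }
}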
I've been working for IBM for the past 6 months and I haven't seen it anywhere (thank god because I hate it.) I see either camelCase or c_style.
thisMethodIsPrettyCool()
this_method_is_pretty_cool()
It depends on your language and environment. As a rule I wouldn't use it, unless the development environment you're in makes it hard to find the type of the variable.
There are also two different types of Hungarian notation. See Joel's article. I can't find it (his titles don't exactly make his articles easy to find); does anyone have a link to the one I mean?
Edit: Wedge has the article I mean in his post.
The original form (the Right Hungarian Notation :) ), where the prefix indicates the kind of value stored by the variable (i.e. length, quantity), is OK, but it is not necessary in all types of applications.
The popular form (the Wrong Hungarian Notation), where the prefix indicates the data type (String, int), is useless in most modern programming languages.
Especially with meaningless names like strA. I can't understand why people use meaningless names with long prefixes, which give you nothing.
I use type-based prefixes (Systems HN) for components (e.g. editFirstName, lblStatus, etc.) as it makes autocomplete work better.
I sometimes use Apps HN for variables where the type information is insufficient, e.g. fpX indicates a fixed-point variable (an int type, but one that can't be mixed and matched with a plain int), rawInput for user strings that haven't been validated, etc.
Being a PHP programmer, where typing is very loose, I don't make a point of using it. However, I will occasionally identify something as an array or as an object, depending on the size of the system and the scope of the variable.

Resources