Best way to incorporate spell checkers with a build process - build-process

I try to externalize all strings (and other constants) used in any application I write, for many reasons that are probably second-nature to most stack-overflowers, but one thing I would like to have is the ability to automate spell checking of any user-visible strings. This poses a couple problems:
Not all strings are user-visible, and it's non-trivial to spearate them, and keep that separation in place (but it is possible)
Most, if not all, string externalization methods I've used involve significant text that will not pass a spell checker such as aspell/ispell (eg: theStrName="some string." and comments)
Many spellcheckers (once again, aspell/ispell) don't handle many words out of the box (generally technical terms, proper nouns, or just 'new' terminology, like metadata).
How do you incorporate something like this into your build procedures/test suites? It is not feasible to have someone manually spell check all the strings in an application each time they are changed -- and there is no chance that they will all be spelled correctly the first time.

We do it manually, if errors aren't picked up during testing then they're picked up by the QA team, or during localization by the translators, or during localization QA. Then we lodge a bug.
Most of our developers are not native English speakers, so it's not an uncommon problem for us. The number that slip through the cracks is so small that this is a satisfactory solution for us.
Nothing over a few hundred lines is ever 100% bug-free (well... maybe the odd piece of embedded code), just think of spelling mistakes as bugs and don't waste too much time on it.
As soon as your application matures, over 90% of strings won't change between releases and it would be a reasonably trivial exercise to compare two versions of your resources, figure out what'ts new (check them first), what's changed/updated (check next) and what hasn't changed (no need to check these)
So think of it more like I need to check ALL of these manually the first time, and I'm only going to have to check 10% of them next time. Now ask yourself if you still really need to automate spell checking.

I can think of two ways to approach this semi-automatically:
Have the compiler help you differentiate between strings used in the UI and strings used elsewhere. Overload different variants of the string datatype depending on it's purpose, and overload the output methods to only accept that type - that way you can create a fake UI that just outputs the UI strings, and do the spell checking on that.
If this is doable of course depends on the platform and the overall architecture of the application.
Another approach could be to simply update the spell checkers database with all the strings that appear in the code - comments, xpaths, table names, you name it - and regard them as perfectly cromulent. This will of course reduce the precision of the spell checking.

First thing, regarding string externalization - GNU GetText (if used properly) creates string files that are contain almost no text other then the actual content of the strings (there are some headers but its easy to cause a spell checker to ignore them).
Second thing, what I would do is to run the spell checker in a continuous integration environment and have the errors fed externally, probably through a web interface but email will also work. Developers can then review the errors and either fix them in the code or use some easy interface to let the spell check know that a misspelling should be ignored (a web interface can integrate both the error view and the spell checker interface).

If you're using java and are storing your localized strings in resource bundles then you could check the Bundle.properties files and validate the bundle strings. You could also add a special comment annotation that your processor could use to determine if an entry should be skipped.
This method will allow you to give a hint as to the locale and provide a way of checking multiple languages within the one build process.
I can't answer how you would perform the actual spell checking itself, though I think what I've presented will guid you as for the method of performing the spell checking.

Use aspell. It's a programme, it's available for unixoids and cygwin, it can be run over lots of kinds of source code. Use it.

First point, please don't put it into you build process. I would be a vengeful coder if I (meaning my computer) had to spell check all the content on the site every time I tried to debug or build a new feature. I don't even think this kind of operation belongs as a unit test (you're testing a human interface, not a computerised one).
Second point, don't write a script. You're going to have so many false positives fall through the cracks that people will stop reading the reports and you are no better off than when you started.
Third point, this is probably most easily solved by having humans do it: QA team, copy writers, beta testers, translators, etc. All the big sites with internationalised content that I've built had the same process: we took the copy from the copy writers, sent it to the translating service/agency, put it into the persistence layer, and deployed it. Testers (QA, developers, PMs, designers, etc.) would find spelling or grammatical mistakes and lodge bug reports. There is just too much red tape and pairs of eyes for that many spelling/grammar errors to slip through.
Fourth point, there will always be spelling and grammar mistakes on your page. Even major newspaper web sites haven't gotten around this and they have whole office buildings filled with editors.

Related

What draws the line between reflective programming & non-reflective programming like simple softcoding?

Not sure if this is the right place to bring up this kind of discussion but, Im reading https://en.wikipedia.org/wiki/Reflective_programming and feel I need a bit further clarification for where the line between "reflective" & non-reflective programming really goes. Theres a series of examples of reflective v non-reflective code towards the end of the wikipedia page where all the "reflective" examples seem to access data with string identifiers - but what would actually differentiate this from say putting a bunch of objects in a collection/array of some sort and accessing them by an index - say compared to accessing them by an array of string identifiers that you can use to fetch the desired object?
In some languages you can clearly see the difference & benefit, like in Python & JS they have the eval method that lets them insert all sorts of code at runtime that can be pretty much endlessly complex and completely change the code flow of an application - and no longer limited to accessing mere special type objects. But in the examples listed on the wiki page you can also find examples where the "reflection" seems limited only to accessing specially declared objects by there name (at which point Im questioning if you can really argue that the program really can be considered to be "modifying" itself at all, at least on a high level in conceptual point of view).
Does the way that the underlying machinery producd by the compiler (or the way that the interpreter reads your code) affect whats considered to be reflective?
Is the ability of redefining the contents of existing objects or declaring new objects without a "base class"/preexisting structure created at compile time that differentiates reflective & non-reflective code? If so, how would this play with the examples at the wikipedia page that doesnt seem to showcase this ability?
Can the meaning of "reflective programming" vary slightly depending on the scenario?
Any thoughts appreciated <3

Why we need to compile the program of progress 4GL?

I would like to know Why we need to compile the program of progress 4GL? Really what is happening behind there? Why we are getting .r file after compiled the program? When we check the syntax if its correct then we will get one message box 'Syntax is correct' how its finding the errors and showing the messages.Any explanations welcome and appreciated.
Benefits of compiled r-code include:
Syntax checking
Faster execution (r-code executes faster)
Security (r-code is not "human readable" and tampering with it will likely be noticed)
Licensing (r-code runtime licenses are much less expensive)
For "how its finding the errors and showing the messages" -- at a high level it is like any compiler. It evaluates the provided source against a syntax tree and lets you know when you violate the rules. Compiler design and construction is a fairly advanced topic that probably isn't going to fit into a simple SO question -- but if you had something more specific that could stand on its own as a question someone might be able to help.
The short answer is that when you compile, you're translating your program to a language the machine understands. You're asking two different questions here, so let me give you a simple answer to the first: you don't NEED to compile if you're the only one using the program, for example. But in order to have your program optimized (since it's already at the machine language level) and guarantee no one is messing with your logic, we compile the code and usually don't allow regular users to access the source code.
The second question, how does the syntax checker work, I believe it would be better for you to Google and choose some articles to read about compilers. They're complex, but in a nutshell what they do is take what Progress expects as full, operational commands, and compare to what you do. For example, if you do a
Find first customer where customer.active = yes no-error.
Progress will check if customer is a table, if customer.active is a field in that table, if it's the logical type, since you are filtering if it is yes, and if your whole conditions can be translated to one single true or false Boolean value. It goes on to check if you specified a lock (and default to shared if you haven't, like in my example, which is a no-no, by the way), what happens if there are multiple records (since I said first, then get just the first one) and finally what happens if it fails. If you check the find statement, there are more options to customize it, and the compiler will simply compare your use of the statement to what Progress can have for it. And collect all errors if it can't. That's why sometimes compilers will give you generic messages. Since they don't know what you're trying to do, all they can do is tell you what's basically wrong with what you wrote.
Hope this helps you understand.

How should multi-part Codecept.js scenarios be organized?

What is the preferred (or just a good) pattern for a multi-part Codecept.js scenario, such as this:
Select file to upload.
Clear selection.
Select file to upload after having cleared selection.
I can do this in a single scenario and use I.say to delineate the parts, but it feels like I should be able to write these as independent scenarios and use .only on part 2, for example, and have part 1 run prior to part 2, because it depends on it.
I would also want to skip parts 2 and 3 if part 1 fails when running the whole suite.
I like thinking about behaviour in terms of capabilities. I can see that you have a couple here:
Uploading files
Correcting mistakes while uploading files.
So I would expect these to be in two scenarios:
One where you actually upload the file
One where you correct mistakes you make.
A lot of people say there should only be one "When" in scenarios, but that doesn't take into account interactions with people (including your past mistaken self) or time. In this situation, it's the whole process of correcting the file upload that provides the value. I can't see any value in the intermediate steps, so I'd leave them all in one scenario.
If there's any different behaviour associated with different contexts (eg: you've already got too many files uploaded) or outcomes (eg: your file system doesn't have room) or rules (eg: your status means you qualify for super-fast upload) then I would expect those to be new scenarios. If you start getting to the point where there are a lot of scenarios associated with file uploads and different things that happen to them, that might be a good time to separate this scenario out. Right now I can't see any reason to do that.
Re failing the first part: if you're doing BDD right, you'll be talking through not just the behaviour of your system, but the behaviour of individual bits of code too. That should help produce a good design which minimizes the chances of having bugs. Really good BDD teams produce scenarios that hardly ever catch bugs!
The scenarios act as living documentation, rather than regression tests; helping future devs understand the value of the code and get it right, rather than nailing it down to stop them getting it wrong.
So I wouldn't worry about it failing. If it's doing that a lot, you've got a different problem. Code it as if it's going to be passing most of the time, and make sure it's readable and comprehensible. As long as you can see when it fails and work it out, even if that takes a little bit of time, it'll be fine.
Having said that, I'd be surprised if Codecept doesn't have at least an option to stop on failure. Most BDD tools don't continue a scenario after a failed step; it would be an odd thing to do.
As far as I know, you're not be able to set priority for execution in codeceptjs. Better make one test. it is also will be more flexible if you will need to add or delete some part.

Abstraction or not?

The other day i stumbled onto a rather old usenet post by Linus Torwalds. It is the infamous "You are full of bull****" post when he defends his choice of using plain C for Git over something more modern.
In particular this post made me think about the enormous amount of abstraction layers that accumulate one over the other where I work. Mine is a Windows .Net environment. I must say that I like C# and the .Net environment, it really makes most things easy.
Now, I come from a very different background made of Unix technologies like C and a plethora or scripting languages; to me, also, OOP is just one, and not always the best, programming paradigm.. I often struggle (in a working kind of way, of course!) with my colleagues (one in particular), because they appear to be of the "any problem can be solved with an additional level of abstraction" church, while I'm more of the "keeping it simple" school. I think that there is a very different mental approach to the problems that maybe comes from the exposure to different cultures.
As a very simple example, for the first project I did here I needed some configuration for an application. I made a 10 rows class to load and parse a txt file to be located in the program's root dir containing colon separated key / value pairs, one per row. It worked.
In the end, to standardize the approach to the configuration problem, we now have a library to be located on every machine running each configured program that calls a service that, at startup, loads up an xml that contains the references to other xmls, one per application, that contain the configurations themselves.
Now, it is extensible and made up of fancy reusable abstractions, providers and all, but I still think that, if we one day really happen to reuse part of it, with the time taken to make it up, we can make the needed code from start or copy / past the old code and modify it.
What are your thoughts about it? Can you point out some interesting reference dealing with the problem?
Thanks
Abstraction makes it easier to construct software and understand how it is put together, but it complicates fully understanding certain issues around performance and security, because the abstraction layers introduce certain kinds of complexity.
Torvalds' position is not absurd, but he is an extremist.
Simple answer: programming languages provide data structures and ways to combine them. Use these directly at first, do not abstract. If you find you have representation invariants to maintain that are at a high risk of being broken due to a large number of usage sites possibly outside your control, then consider abstraction.
To implement this, first provide functions and convert the call sites to use them without hiding the representation. Hide the data representation only when you're satisfied your functional representation is sufficient. Make sure at this time to document the invariant being protected.
An "extreme programming" version of this: do not abstract until you have test cases that break your program. If you think the invariant can be breached, write the case that breaks it first.
Here's a similar question: https://stackoverflow.com/questions/1992279/abstraction-in-todays-languages-excited-or-sad.
I agree with #Steve Emmerson - 'Coders at Work' would give you some excellent perspective on this issue.

make your Jar not to be decompiled

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?
You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.
First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)
GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.
A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.
This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.

Resources