Related
What are the syntactical features of current languages that prove problematic for screen readers or braille readers? What symbols and constructs are annoying to hear or feel? To the blind programmers out there, if you were to design a programming language which was easier for other blind people to work with, what would it be like?
Although it is a very interesting question, it is highly a matter of personal preferences and likings, so I'll answer as if you asked me personally.
Note: My working system is Windows, so I'll focus on it. I can write cross-platform apps, of course, but I do that on a Windows machine.
Indentation, White Spaces
All indentation-related things are more or less annoying, especially if the indentation is made with multiple spaces, not tabs (yes yes, I know the whole "holy war" around it!). Python did teach me to indent code properly even in languages that don't require it, but it still hurts me to write proper Python because of these spaces. Why so? The answer is simple: my screen reader tells me the number of spaces, and the actual nesting level is four times less. Each such operation (20 spaces, aha, dividing by 4, it is fifth nesting level) brings some overhead to my mind and makes me spend my inner "CPU" resources that I could free for debugging or other fancy stuff. It is a wee little thing, you'd say, and you'd be right, but this overhead is on each and single line of my (or another person's!) code I must read or debug! This is quite a lot.
Tabs are much better: 5 tabs, fifth nesting level, nice and well. Braille display here would be also a problem because, as you probably know, a Braille display (despite the name) is a single line of text, usually 14 to 40 characters long. I.e., imagine a tiny monitor with one single line of text that you pan (i.e., scroll), and nothing besides that. If 20 chars are spaces, you stay with only 20 chars left for the code. A bit more, if you read in Grade 2 Braille, but I don't know whether it would be appropriate for coding, I mostly use speech for it, except some cases.
Yet more painful are some code styling standards where you have to align code in the line. For instance, this was the case with tests in moment.js. There the expected values and messages should match their line position so, for example, the opening quote would be in column 55 on every line (this is beautiful to see, I admit). I couldn't get my pull request accepted for more than a week until I finally understood what Iskren (thank him for his patience with me!), one of the main developers, was trying to tell me. As you can guess, this would be completely obvious for a sighted person.
Block Endings
A problem adjacent to the previous one: for me personally it is very nifty when I know that a particular code block ends. A right brace (as in C) or the word end (as in Ruby) are good, indentation level change (as in Python) is not: still some overhead on getting knowing that the nesting level has abruptly changed.
IDEs
Sorry to admit it, but there is virtually no comfortable IDE for the blind. The closest to such an IDE is Microsoft Visual Studio, especially the latest version (gods bless Jenny Lay-Flurrie!), Android Studio seems also moving towards accessibility starting with version 2. However, these are not as usable, nifty and comfortable as they are for the sighted users. So, for instance, I use text editors and command-line tools to write, compile and debug my code, as do many blind people around me.
Ballad of the snake_case, or Another Holy War
Yet another thing to blame Python about: camelCase is far more comfortable to deal with then snake_case or even PascalCase. Usually screen readers separate words written in camelCase as if they were separated with spaces, so I get no pain readingThisPartOfSentence.
When you write code, you have to turn your punctuation on, otherwise you'll miss something really tiny and "unimportant", such as a quote, a semicolon or a parenthesis. Then, if your punctuation is on, and you read my_very_cool_descriptive_variable_name, you hear the following: "my underline very underline cool underline.... underline underline underline!!!" (bad language and swears censored). I tried even to replace underlines with sounds (yes, my screen reader gives such an opportunity), but the sounds can't be so well synchronized because of the higher speech rate I use. And it is quite a nightmare when dealing with methods and properties like __proto__ (aha, there are two underscores on both sides, not one, not three - well, got it right, I think!), __repr__ and so on, and so forth. Yes, you might say, I can replace the word "underline" with something really short like "un" (this is also possible), but some overhead is still here, as with white spaces and code nesting.
PascalCase is far better but it implies a bit more concentration, also, since we need to remember putting the first capital letter (oh, I'm too fastidious now, let it be PascalCase, but not those under... oh well, you got it already). This was the reason I gave up with Rust, btw.
Searching for Functions
As I have already told you, IDEs are not good for us, so text editor is our best friend. and then, you need to search for functions and methods, as well as for classes and code blocks. In certain languages (not Python this time), there are no keyword that would start a function (see, for example, C or Java code). Searching for functions in these conditions becomes quite painful if, for example, you do know that you have a logical error somewhere in the third or fourth function in the file, but you don't exactly remember its name, or you skim through someone's code... well, you know, there are lots of reasons to do that. In this particular context, Python is good, C is not.
Multiple Repeating and Alike Characters
This is not a problem per se, but a circumstance that complicates debugging, for example, regular expressions or strongly nested operations like ((((a + ((b * c) - d) ** e) / f) + g) - h. this particular example is really synthetic, but you get what I mean: nested ternary operators (that I love, btw!), conditional blocks, and so on. And regular expressions, again.
The Ideal Language
The closest to the ideal blind-friendly language as for me is the D language. The only drawback it has is absence of the word function except for anonymous functions. PHP and Javascript are also good, but they have tons of other, blind-unrelated drawbacks, unfortunately.
Update about Go
In one of his talks Rob Pike, the main developer of the Go language said that no one likes the code style imposed by the Gofmt utility. Probably, no one — except me! I like it, I love it so much, every file in Go is so concise and good to read, I'm really excited about the language because of that. The only slightly annoying thing for a blind coder is when a function has several pairs of parentheses in its definition, like if it is actually a struct method. the <- channel operator still gives me moments to think about what I'm doing, sending or receiving, but I believe it's a matter of habit.
Update about Visual Studio Code
Believe it or not, when I was writing this answer and the first update to it, I wasn't working as a full-time developer — now I am. So many circumstances went favorably for accessibility and me in particular! Slack, that virtually every business uses, became accessible, and so became Microsoft Visual Studio Code (again, gods bless Jenny and her team!). Now I use it as my primary code editor. Yes, it's not an IDE per se, but it's sufficient for my needs. And yes, I had to rework my punctuation reading: now I have shorter and often artificial names for many punctuation signs.
So, calling from end of January 2021, I definitely recommend Visual Studio Code as the coding editor for blind people and their colleagues. What is even more amazing is that LiveShare, their pair programming service, is also accessible! Yes, it does have some quirks (for now you can't tell what line in the file your colleague is editing if you're blind), but still it's an extremely huge step forward.
This question is to learn and understand whether a particular technology exists or not. Following is the scenario.
We are going to provide 200 english words. Software can add additional 40 words, which is 20% of 200. Now, using these, the software should write dialogs, meaningful dialogs with no grammar mistake.
For this, I looked into Spintax and Article Spinning. But you know what they do, taking existing articles and rewrite it. But that is not the best way for this (is it? let me know if it is please). So, is there any technology which is capable of doing this? May be semantic theory that Google uses? Any proved AI method?
Please help.
To begin with, a word of caution: this is quite the forefront of research in natural language generation (NLG), and the state-of-the-art research publications are not nearly good enough to replace human teacher. The problem is especially complicated for students with English as a second language (ESL), because they tend to think in their native tongue before mentally translating the knowledge into English. If we disregard this fearful prelude, the normal way to go about this is as follows:
NLG comprises of three main components:
Content Planning
Sentence Planning
Surface Realization
Content Planning: This stage breaks down the high-level goal of communication into structured atomic goals. These atomic goals are small enough to be reached with a single step of communication (e.g. in a single clause).
Sentence Planning: Here, the actual lexemes (i.e. words or word-parts that bear clear semantics) are chosen to be a part of the atomic communicative goal. The lexemes are connected through predicate-argument structures. The sentence planning stage also decides upon sentence boundaries. (e.g. should the student write "I went there, but she was already gone." or "I went there to see her. She has already left." ... notice the different sentence boundaries and different lexemes, but both answers indicating the same meaning.)
Surface Realization: The semi-formed structure attained in the sentence planning step is morphed into a proper form by incorporating function words (determiners, auxiliaries, etc.) and inflections.
In your particular scenario, most of the words are already provided, so choosing the lexemes is going to be relatively simple. The predicate-argument structures connecting the lexemes needs to be learned by using a suitable probabilistic learning model (e.g. hidden Markov models). The surface realization, which ensures the final correct grammatical structure, should be a combination of grammar rules and statistical language models.
At a high-level, note that content planning is language-agnostic (but it is, quite possibly, culture-dependent), while the last two stages are language-dependent.
As a final note, I would like to add that the choice of the 40 extra words is something I have glossed over, but it is no less important than the other parts of this process. In my opinion, these extra words should be chosen based on their syntagmatic relation to the 200 given words.
For further details, the two following papers provide a good start (complete with process flow architectures, examples, etc.):
Natural Language Generation in Dialog Systems
Stochastic Language Generation for Spoken Dialogue Systems
To better understand the notion of syntagmatic relations, I had found Sahlgren's article on distributional hypothesis extremely helpful. The distributional approach in his work can also be used to learn the predicate-argument structures I mentioned earlier.
Finally, to add a few available tools: take a look at this ACL list of NLG systems. I haven't used any of them, but I've heard good things about SPUD and OpenCCG.
I am adding a feature to an application in which the students answer questions that are more descriptive in nature. I am curious to know if there's a way to make the system "smart" enough to grade these answers. Ofcourse, I can run the answers through a set of keywords to ensure that the student has atleast included the keywords in the answers, but obviously this is not smart enough.
I know there's no fool proof way of grading descriptive answers, but was wondering if there's any technologies out there that I can look into.
You could use mechanical turk which is an API for humans. Which is probably as far as you can get with AI'ing your system. Understanding and grading actual text is one of the last remaining problems where humans are way better than computers (i.e. computers suck)
One notable exception is Watson which is actually really good at Jeopardy, but it runs on a huge computing cluster and includes some serious optimizations and smarts. That's nothing you just turn on. Sorry...
The answer is not so simple. There are "automated grading systems" out there, used, I believe, for example, to grade GRE exams. For example, see this paper and this by ETS.
Is it bad to be semantic purist all
the time, at work? is it not achievable all
the time ?
when i saw code of any other
person/interviewee. I know selection
of element for a purpose is most
important thing.
what i should judge person ability
from his code; from a good written,
managed, optimized css or how he
wrote class and id names?
Or both every time.
Is it bad to be semantic purist all
the time, at work? is it not
achievable all the time?
Yes. Sometimes the tools doesn't work the way that you would like, and you just have to use something that does work. It's worth a little effort to make the code more semantically correct to make maintenance more effective, but at some point you just can't defend the development cost because the gain in maintenance that it would give is a lot less.
what i should judge person ability
from his code; from a good written,
managed, optimized css or how he wrote
class and id names?
Anyone can have bad habits or incomplete knowledge in some areas. You should rather judge a person on the ability to see the difference between good and bad code when presented with it, and for showing a certain interrest in getting it right.
It depends on what you mean by "semantic purist".
Is his code so bad that you MUST refactor it first before you can do any maintenance on it or even be able to understand what the heck it does? Then it's important.
If the code is not perfect by your standards but udnerstandable, AND can be maintained, AND the person seems receptive to your hints about ways to improve it - especially if those hints are accompanied by explanation of why the improvement makes the code X% more readable/stable/maintainable, it's not THAT important what their level of "purity" is now.
If the person is willing to spend extra week polishing their code to be semantically pure at the cost of shipping late, he's a net loss by a big margin.
To give a specific example for css, if someone will add a class to EVERY element on a page just because it "should be added", bad.
If they indicate that the class needs to be added to places X, Y and Z so that jQuery selectors perform 20% better than alternative solution, VERY GOOD.
Both actually contribute to different abilities. A great abstraction design in Classes and Properties will depict his ability to analyze things where as a well writted managed optimized css expresses that he is himself/herself an organized person. The later property is also very important.
Still I will go for the first property if the css is atleast presentable.
Are you in danger of setting up a false alternative? We can either choose class and identifier names carefully or we can write working CSS? Ideally I'd prefer someone who can do both.
What do you deduce about the likely quality of a person's work if their job application has serious misspellings? Like the misspelling the name of the company they are applying to! What do you deduce about a project proposal that mis-states the objectives of the project? A person applying for a SW security job who confuses "authentication" and "authorisation".
In the interview you are trying to determine from a very small sample of information the likely way a person will perform if they come to work for you. We need to factor in many factors such as the effect of interview pressure on some folks performance, and whether they are working in their native language. I don't see any absolute standards here, and no scope for simple either-or questions such as you pose.
Of course we need them to be able to produce code that works, that's the easy bit! Quality of code, and quality of other aspects of work also do matter.
CAPTCHAs that ask users to read distorted text are fine for sighted people, but a terrible barrier for those who are blind or have other disabilities. Audio alternatives are occasionally available but still don't help those who are both deaf and blind and can be hard to use with a screenreader (which is already reading words to you).
There exist a couple of solutions that use humans to solve the CAPTCHA on behalf of the user, such as WebVisium and Solona, but these rely on the availability of volunteer operators (for example, Solona apparently has just one volunteer so you have to hope he is awake when you want help).
It occurs to me that the volume of CAPTCHA solutions needed by blind people is very low - I'd guess less than a few hundred per day in a populous country like the UK. This means that unlike the bad folks who want to perform an action many times in a short period, a CAPTCHA assistance service for blind people could afford to devote considerable computational resource - for example, a cloud of computers in Amazon EC2 - to identifying the presented text.
My question is this: assuming you don't care about speed very much, and you have lots of computers available, are there algorithms that let you solve the text-distortion CAPTCHAs that are common today, such as those used by reCaptcha? Or are these problems really intractable even with lots of resource and time?
A few notes:
At this point, my question is just theoretical, but clearly any such service would have to carefully control access to keep spammers out. Perhaps only registered blind people would be allowed to use it.
I am aware that an old Yahoo CAPTCHA was broken a few years ago using an algorithm that runs in seconds on a single computer. I am asking whether modern CAPTCHAs can be broken, perhaps more slowly and with more resource.
I am aware that some new CAPTCHA types are appearing, which ask users to identify kittens or orient a picture. These aren't widespread yet, so I'm just asking about text-distortion for now.
Basically solving a text distortion CAPTCHA consists of three individual steps:
Find out where the interesting parts are
Segment the text into individual letters
Recognize the letters
The only problem that's left which is pretty hard for computers is the second one. The first usually isn't very hard, unless you happen to stumble upon the CAPTCHA from hell. And the third gets solved by computers with a much better success rate than by humans.
An interesting site for learning how CAPTCHAs are broken is the one by the OCR Research Team.
CAPTCHA has been created to avoid machines from detecting the words. It's meant to be read by humans only. Making it more readable for blind/deaf people adds a risk of machines being able to understand them again, thus nullifying their effect.
Spammers did find a very effective way to break the more popular CAPTCHA's though. They just hire cheap labourers to read them, in return for a few cents per working account. As a result, there's a small industry around breaking CAPTCHA's to create millions of accounts that can then be used to send more spam. Compared to the amount gained by the spammers, the costs is almost none. A similar solution could be used by blind/deaf people, who would send the CAPTCHA image to some cheap labourer in China or wherever, where they will reply with the correct words and the blind/deaf person will be able to proceed. Unfortunately, blind people only need this service only a few times while spammers need a continuous flow, thus those labourers will prefer to work for spammers instead. (The pay is better.) Still, the best solution would be to send the CAPTCHA to some friend, let them read and/or decipher it and return the answer.
The ReCAPTCHA style also reads out the words. A simple speech recognition application might be able to recognise whatever is said, although speech recognition still needs more optimizations. Still, you might want to work from that angle, getting the application to listen to the sound byte instead.
When it is possible to break CAPTCHA's, they will just think of better CAPTCHA-like methods. OCR techniques are still improving thus more work will be done to make CAPTCHA's harder. That is, until OCR has become as good as the human eye at recognizing words...
An algorithm could be created, although slow. With 26 lowercase and 26 uppercase letters and 10 digits, it should not be too difficult to come up with an algorithm. With Serif and Sans-serif fonts, the number of combinations would need to be doubled, though. Still, if you try to curve all letters in a similar way as the letter in the CAPTCHA, you should be able to detect a letter which gets covered by the CAPTCHA letter the most. And that would be the most likely candidate. Still needs you to clear lines, dirt and other artefacts from the image that the human eye has less trouble to recognise than a computer. You'd need the following steps:
Clean up the image.
Detect the locations of the letters.
For every letter
3a. Determine the curve of the letter by checking the left side.
3b. Do an overlay of every possible letter/digit to find the one that covers it the best. (That's the most likely letter.)
Once you've found the word, do a dictionary check to make sure it's a real word. (Unless the CAPTCHA doesn't use real words.)
Even though they can twist the letters in the CAPTCHA's, it should be possible to detect the twist rotation that they used simply by looking at the left side of every letter and then trying to apply the same curve to every letter. (52 combinations, plus 10 digits, if digits are also used.) Basically, you'd try to put a box around every letter, then check which letter will contain the least amount of white space. That's the most likely letter.
The main reason why this isn't often used for OCR is basically the need for speed. Step 3a/b tends to be slow, especially if you have to take font style in consideration.
Making this answer bigger but in reply to one of the comments:
There are several ways to cleanup an image. You'd need some color filtering, noise reduction and an algorithm that's able to recognise the noisy lines through an image. The DEFCON slideshow that you've pointed to shows a few simple techniques to filter away some of the noise. It shows that a basic image processing tool can already make an image a lot clearer for a machine to read. A simple blur will clean up random dots and thin lines while color filters would filter away the noisy colours. A next step would be to try to put a box around every letter in the CAPTCHA, hoping the system is able to recognise their locations. I don't know any practical algorithms for this but there should be ways to recognise them. There's software that can create vector images from bitmaps, thus there should be software that's able to calculate a box around a letter.
It is likely that this box won't have rectangular corners, thus you would have to distort all 52 letters to match the same box. Italic or bold shouldn't make much of a difference since these styles are just additional distortions. Serif or Sans-serif does make a difference, though. Serif fonts tend to have a few more spikes and ornaments. Fortunately, there are algorithms that can transform a box to any other figure with four corners.
Regular OCR applications will assume that letters are mostly straight and will just check a few hotspots to find a match. Thus, they sometimes get it wrong because of noise. To crack CAPTCHA, you would need a more sensitive match, preferably "XOR-ing" the CAPTCHA letter image with an image of one of the 52 letters, then counting the number of black and white spots to calculate the ratio. Assuming white=1 and black=0, the result of the XOR should be almost black for the best match.
I think several spammers have already found some useful algorithms to crack CAPTCHA's but for them, keeping these algorithms a secret just keeps them in business.
Another comment, more text. :-)
Segmentation would be a problem, but it's not impossible to solve. It's just extremely complex. But when you've cleaned the image, it should be possible to calculate two lines. One line that touches the bottom of every letter and a second line that touches the top. However, good CAPTCHA's won't put letters on the same lines any more, but those not-so-good ones could be cracked by just following the lines. (Guess? ReCAPTCHA puts letters between two lines!) With two lines, you know the first letter will start at the left, thus you can try overlaying all 52 possibilities there until you've found a match. When you found one, move to the right for the second one. And further until you've read all letters. With two lines to guide you, you don't need a complete box.
Letters tend to use a constant ratio between width and height. With two lines, you can calculate the height of the complete letter and thus get a good estimation of the matching width.
Still, working out the correct algorithm to calculate this all is a bit too much for my poor math skills. You'd need an expert mathematician to crack this algorithm.
My answer to your question "are these problems really intractable even with lots of resource and time?" is to point out that this is the very reason that CAPTCHAs work.
My understanding is that the purpose of a CAPTCHA is to prove that you are human rather than a spam bot. reCAPTCHAs are a novel take on this theme because they take images that represent text that cannot be resolved by OCR (optical character recognition) engines. The difference between a person and a machine in this instance is that specialized algorithm(s) has tried to interpret this image and failed while a "normal" person has the intrinsic ability to interpret the text in a consistently human way. That being said, in the future we hope that someone will come up with better OCR engines so that there needs to be less human intervention in digitizing the worlds information. We hope that someone will come up with an tractable solution to this particular problem.
From your point of view of trying to make CAPTCHAs more accessible to blind people -- who still need to prove that they're people rather than spam bots -- the community needs to become aware of this issue and find a way to identify people in a less vision centric way.
The introduction of CAPTCHA has certainly made the web less accessible to the visually impaired, and I agree with you in citing this as a significant problem that deserves more attention and concern. However, while CAPTCHA can be and has been inconsistently bypassed on popular web sites, I don't think this is a viable long-term solution for those in need. Indeed, the day that the CAPTCHA variants currently present on sites like Facebook, Google, MySpace etc. can be reliably and consistently broken is the day they will become obsolete and abandoned for either stronger variants of the same or an entirely new solution (as you implied, distinguishing cats from dogs in pictures has been a popular alternative trend).
When it comes to online accessibility, what I think those with disabilities need most right now is advocacy. The more people contact software companies, open source groups, and standards bodies and speak out about this need, the more awareness will be raised and that will (hopefully) lead to more action on behalf of the development community. Ultimately, it would be great to see sites like Google or Facebook offering alternative access methods just for their visually impaired users.
Idealism aside, I think it is productive to pursue other avenues like you mentioned with the CAPTCHA volunteer network, possibly even the development of something like OpenID for those with relevant disabilities as a universal form validation pass.
As for the technical aspect of your question, I don't think the availability of additional processing power alone will allow you to reliably and consistently break CAPTCHA. There is A LOT of money in spam, and you can be sure that shady SEO companies and Spammers alike have a great number of servers at their disposal. As Johannes Rössel mentioned, if you want to learn more about how this is done and where the technical difficulty lies, research Optical Character Recognition (OCR) and look at the wide variety of number/letter skewing that occurs on high traffic sites.
This related SO question has a number of good ideas in it, including a DEFCON talk that claims using multiple OCRs and voting breaks many simple CAPTCHAs. This suggests a candidate solution method: distribute the problem over several servers, each of which runs one or more OCR tools in parallel, collect the results, and take the most popular answer. Comments welcome.