Voiceover and Text To Speech - differences in pronunciation - accessibility

I've been playing around a little with both VoiceOver and the text-to-speech functionality on my Mac. I've noticed a few differences in the way numbers and punctuation are pronounced. For example, the sentence "the year was 1978" is read out perfectly when I highlight it and use text to speech. With VoiceOver, however, it reads "the year was one nine seven eight".
How can I tell screen readers that I want something pronounced in a certain way? Are there ARIA attributes I can add for this kind of behaviour?
It is not just dates and years but prices and punctuation as well (and probably a lot of other things!).

I don't think you can control speech output with the APIs currently available.
I assume that you are talking about an HTML page, as you referenced ARIA attributes. WAI-ARIA does not have attributes to control screen reading. I believe the W3C's Speech Synthesis Markup Language (SSML) is intended to provide better control over speech output.
If your issue is related to native Mac OS application screens, you can check out Apple's Speech Synthesis APIs.
Apple's VoiceOver, as well as Microsoft's Windows Narrator, are very basic screen readers with little or no intelligence built in. Apple's text-to-speech is a little more advanced, but it still lags behind commercial screen readers like JAWS. The good news is that users with impaired vision typically have good screen readers which can read your content appropriately.

Common speech synthesizers allow their clients to configure whether numbers will be read as words or spelled out digit by digit (both as a global default and on a per-case basis). For example, Apple's speech synthesis API exposes this setting via the NSSpeechNumberModeProperty property. VoiceOver users, however, can only set a global (or at best per-app) default in VoiceOver Utility > Verbosity > Text > "Read numbers as:".
Now, if you want to influence how screen readers pronounce numbers in web content, there is the CSS Speech Module. It defines the speak-as property, which can take values such as normal, digits, or spell-out.
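As a rough sketch, applying speak-as could look like this (the class names are hypothetical, and support is very limited, so treat it as an illustration rather than a working fix):

```css
/* Hypothetical classes; the CSS Speech Module is barely implemented anywhere. */
.year {
  speak-as: normal;   /* read using normal number pronunciation rules */
}
.serial-number {
  speak-as: digits;   /* read digit by digit, e.g. "one nine seven eight" */
}
```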
However, whether such a CSS value will be respected by a particular version of a particular screen reader combined with a particular version of a particular browser running on a particular version of a particular OS is something you have to try yourself. The accessibility APIs on OS X currently have no way to specify the pronunciation of digits in the returned strings, so if it worked for WebKit, it would have to be using some private extension to those public accessibility APIs. I just tried on OS X 10.9.3 and it does not work.
If I had to guess, you would be lucky to find any combination of screen reader, browser, and OS where this is implemented as of now. But that is just speculation.

Related

make non-native application accessible to screen readers for the visually impaired

I create applications that are divorced from any native framework. All rendering happens in OpenGL, with a context provided by GLFW, all in C, with no framework to supply compatibility. As such, standard screen readers like NVDA have no chance of picking up information (excluding OCR), and my applications are an accessibility black hole.
How can I provide an interface for screen readers to hook into? I presume this is a per-OS thing... How would that be possible on Windows, Linux, BSD, or even Android? In the *NIX world, I presume this would be desktop-environment dependent...
I'm finding a lot of information on this with a framework as a starting point, but I have a hard time finding resources on how to do it from scratch.
I'm fully aware this is far beyond the capability of a sole developer, and I know that writing programs that ignore native interfaces is a common accessibility hole which you are advised to avoid.
However, I have a tough time finding resources and jump-in points to explore this topic. Can someone point me in the right direction?
TL;DR: How do I provide screen-reader compatibility from scratch? Not in detail - but conceptually.
As you have already well identified, your app is an accessibility black hole because you are using a rendering engine.
It's basically the same for OpenGL, SDL, or <canvas> on the web, or any library that renders something without specific accessibility support.
We can talk about several possibilities:
Become an accessibility server. Under Windows, this means doing what is necessary so that your app provides accessible components on demand through the UIA / IAccessible2 interfaces.
Use a well-known GUI toolkit with accessibility support, and its provided accessibility API, to build your app.
Talk directly to screen readers via their respective APIs in order to make them say something and/or show something on a connected braille display.
Write screen-reader-specific scripts.
However, it doesn't stop there. Supporting screen readers isn't sufficient to make your app really accessible. You must also think about many other things.
1. Accessibility server, UIA, IAccessible2
This option is of course the best, because users of assistive technologies in general (not only screen readers) will feel right at home with a perfectly accessible app if you do your job correctly.
However, it's also by far the hardest, since you have to reinvent everything. You must decompose your interface into components, tell which category each component belongs to (more commonly called its role), provide callbacks to fetch values and descriptions, etc.
If you do web development, compare that to having to use ARIA everywhere because there are no defaults: no titles, no paragraphs, no input fields, no buttons, etc.
That's a huge job! But if you do it really well, your app will be truly accessible.
You may get code and ideas on how to do it by looking at open-source GUI toolkits or browsers, which all do it.
Of course, the APIs to use are different for each OS. UIA and IAccessible2 are for Windows, but macOS and several Linux desktops also have OS-specific accessibility APIs that are based on the same root principles.
Note about terminology: the accessibility server or provider is your app or the GUI toolkit you are using, while the accessibility client or consumer is the screen reader (or other assistive tool).
2. Use a GUI toolkit with good accessibility support
Fortunately, you aren't obliged to reinvent the wheel, of course!
Many people have done the job of point 1 above, and it resulted in libraries commonly called GUI toolkits.
Some of them are known to generally produce quite accessible apps, while others are known to produce totally inaccessible apps.
Qt, wxWidgets, and Java SWT are three of them with quite good accessibility support.
So you can simplify the job a lot by simply using one of them and its associated accessibility API. You will be saved from talking more or less directly to the OS with UIA/IAccessible2 and similar APIs on other platforms.
Be careful though, it isn't as easy as it seems: not all components provided by GUI toolkits are accessible under all platforms.
Some components may be accessible out of the box, some others need configuration and/or a little specific code on your side, and some are inaccessible no matter what.
Some are accessible under Windows but not under macOS, or vice versa.
For example, GTK is the first choice on Linux under GNOME for making accessible apps, but GTK under Windows gives quite poor results. Another example: wxWidgets' DataView control is known to be good under macOS, but it is emulated under Windows and is therefore much less accessible.
In case of doubt, the best thing is to test yourself under all combinations of OS and screen reader you intend to support.
Sadly, for a game, using a GUI toolkit is perhaps not a viable option, even if there exist OpenGL components capable of displaying a 3D scene.
Here comes the third possibility.
3. Talk directly to screen readers
Several screen readers provide an API to make them speak, adjust some settings, and/or show something on a braille display. If you can't, or don't want to, use a GUI toolkit, this might be a solution.
JAWS comes with an API called FSAPI, and NVDA with the NVDA controller client. Apple also allows several aspects of VoiceOver to be controlled programmatically.
There are still several disadvantages, though:
You are specifically targeting some screen readers. People using another one, or another assistive tool than a screen reader (a screen magnifier, for example), are all out of luck. Alternatively, you may multiply support across a big forest of different APIs for different products on different platforms.
Each of these screen-reader-specific APIs supports different things that may not be supported by the others. There are no standards at all here.
Thinking about WCAG and how it would be transposed to desktop apps, you are in fact bypassing most best practices, which all recommend first and above anything else to use well-known standard components, and only to customize when really necessary.
So this third possibility should ideally be used if, and only if, using a good GUI toolkit isn't possible, or if the accessibility of the GUI toolkit used isn't sufficient.
I'm the author of UniversalSpeech, a small library that tries to unify direct communication with several screen readers.
You may have a look at it if you are interested.
4. Screen reader scripting
If your app isn't accessible on its own, you may distribute screen-reader-specific scripts to users.
These scripts can be instructed to fetch information to give to the user, add additional keyboard shortcuts, and several other things.
JAWS has its own scripting language, while NVDA scripts are developed in Python. As far as I know, there are also scripting capabilities with VoiceOver under macOS.
I give you this fourth point for your information, but since you are starting from a completely inaccessible app, I wouldn't advise you to go that way.
In order for scripts to be able to do useful things, you must have a working accessible base. A script can help fix small accessibility issues, but it's nearly impossible to turn a completely inaccessible app into an accessible one with a script alone.
Additionally, you must distribute these scripts separately from your app, and users have to install them. That may be a difficulty for some people, depending on your target audience.
Beyond screen reader support
Screen reader support isn't everything.
This is beyond your question, so I won't go into detail, but you shouldn't forget about the following points if you really want to make an app that isn't only accessible but also comfortable to use for a screen reader user.
This isn't at all an exhaustive list of additional things to watch out for.
Keyboard navigation: most blind users and many visually impaired users aren't comfortable with a mouse and/or a touch screen. You must provide a full and consistent way of using your app with the keyboard alone, or, on mobile, only with the standard touch gestures supported by the screen reader. Navigation should be as simple as possible and should, as much as possible, conform to user preferences and general OS conventions (i.e. the functions of Tab, Space, Enter, etc.). This in turn implies having a good structure of components.
Gamepads, motion sensors, and other inputs: unless they are absolutely mandatory because they are your core concept, don't force their use, and always allow a keyboard fallback.
Visual appearance: as much as you can, you should use the settings/preferences defined at OS level for layout, colors, contrast, fonts, text size, dark mode, high contrast mode, etc., rather than your own.
Audio: don't output anything if the user can't reasonably expect it, make sure the volume can be changed very easily at any time, and if possible, and it isn't against your core concept, always allow audio to be paused, resumed, stopped, and muted. The same reasoning applies to other outputs, like vibration, which the user should always be able to disable.

How to handle version numbers for screen readers?

I'm using Windows Narrator, and a version number like 2.3.96 is being read as a date, "2nd March, 1996". How do I handle version numbers for screen readers? I've seen some answers suggest using a label and spelling out the dots, like aria-label="2 dot 3 dot 96". Is there a better way to do this?
There are already several similar questions on Stack Overflow and elsewhere about numbers and other things that aren't pronounced as the page author expects.
The same answer applies to version numbers interpreted as dates: you'd better do nothing and write it without anything special.
The problem is that the pronunciation of numbers depends on many layers:
The browser and the way it exposes the accessibility tree.
The OS, the screen reader, and its settings. For example, JAWS offers many options that can change how numbers and dates are interpreted and spoken.
The voice used. The same screen reader, on the same OS, with the same browser, but with different voices can indeed read the same text very differently.
You may try several things, like separating the number into a different <span>, adding or removing spaces, writing "dot" via aria-label, etc.
However, by doing so, you are very likely to improve the pronunciation for some users while degrading it for most others. The combinations of OS, browser, screen reader, and voice are too numerous to test everything.
So the best thing is to stay pragmatic and do nothing special. Keep your version number written as it is usually done everywhere.
Screen reader users are generally used to such pronunciation quirks and can, if necessary, adjust options and set up dictionaries.
Short Answer
Use <p aria-label="Version 2 point 3 point 96">Version 2.3.96</p>; this gets parsed correctly in the most popular screen readers. Make sure there are no <span>s splitting the version number from the surrounding text, and use an aria-label as overkill / a backup.
Long Answer
Normally I would advise never to interfere with the way a screen reader pronounces things, as QuentinC has suggested.
However, in this circumstance I would say that you do want to try to fix this problem, as hearing a date read where there should be a version number could be very confusing (was it the date the software was written, is it the date the company was formed? Who knows!).
This is a parsing problem in the screen reader, very different from the problems most people have where they don't like the way something is read, so please only follow the advice below for this one case.
Fix one: change your markup.
You didn't include your markup, but the explanation below is based on my testing.
This problem will actually fix itself (at least in JAWS, NVDA, and VoiceOver in my testing) by simply adding 'Version' before the version number.
My guess is you have the version number in a format similar to the following:
<p>Version <span>2.3.96</span></p> or <p>2.3.96</p> without the word 'Version'.
By removing the <span> in the first example, JAWS, NVDA, and VoiceOver read the following correctly in my testing:
<p>Version 2.3.96</p>
Once the <span> is added around the version number, the context of the numbers is lost (text within <span> elements is evaluated on its own), so a screen reader will try to parse it on its own, without any context, and decide it is a date.
Surely using aria-label will fix it though?
Also, aria-label will probably not work on static elements (<div>, <span>, etc.) that do not have a role indicating they are active elements, so the odds are it will not work anyway for you.
For that reason I would not recommend trying to fix this problem with aria-label alone; however, adding it for the screen readers that do take it into consideration is a step that would not hurt.
This is the only time I have ever recommended interfering with how a screen reader talks. Please do not take this as a way to solve other pronunciation problems, as this is a parsing problem, not a pronunciation problem.
Taking that into account, I would recommend you use the following:
<span aria-label="Version 2 point 4 point 99">Version 2.4.99</span>
with either a <span>, <p>, or other content block surrounding it.

EPUB & Kindle File Glossary and Dictionary Selection

The Puzzle
I am working on an eBook file, or series of files, which should be compatible with the maximum range of eReaders on the market. This would include, for example:
The e-ink Kindle family
The Kindle App on iOS, Android (including Kindle Fire) and anywhere else
iBooks for iOS
The Nook
The Nook App for iOS, Android, and anywhere else
Kobo, possibly (haven't looked into this much)
So we're going for maximum compatibility with different eReaders, I hope that's clear.
Now, here's the issue:
The eBook file has a unique custom glossary which should be easily accessible to the reader.
Although it is one thing to have a glossary in the back of the eBook, all of the modern eReaders have some kind of dictionary functionality that is very accessible (put your cursor on a word, long-press on a word, etc.), so the glossary needs to be just as accessible to encourage readers to use it.
Hyperlinks
One way to do this would be hyperlinking each (or the first) occurrence of a term to the glossary entry in the back, and having hyperlinks in the glossary that go back to the occurrence(s).
Hyperlinks are supported by EPUB, as well as by MOBI / AZW / KF8 / etc. for Kindle. The links can be styled so that they are unobtrusive (not underlined, dark gray or black, etc.).
This is the best solution I have been able to concoct so far.
However, having the hyperlinked words look different from the rest of the text could be distracting to the reader. And if I use this method with the hyperlinks styled to look like the rest of the text, the reader will not know to navigate (press), so they will simply use the built-in dictionary (long-press).
(Also note that the newest Kindle software (on the latest Kindle Paperwhite) shows a little "Footnote" popup window instead of navigating to the glossary. This is great, except that it says "Footnote" where it should say "Glossary"; this seems to be a Kindle software default - any hints on how to change it would be awesome.)
Modify Built-In Software
If there is any way to let the software know (iBooks / Kindle App / other software) that the book has a custom glossary, so that the default behavior is modified, this would be ideal. In other words, when you long-press the word, you don't just get the default popover (as in the Kindle software or iBooks) but you also get some way to look at the glossary definition.
Personally I know of no way that this can be done, but I'm asking in case anyone knows.
Javascript
JavaScript is theoretically supported in EPUB 3, but in reality, of the major eReading options, iBooks has support, and possibly Kobo (I haven't looked into it), but nothing else does. Certainly not the rather antiquated MOBI format, and the KF8 format officially does not support JS.
The idea behind using JS would be to create a custom popover, whenever you tap or long-press a word that has a glossary entry. The custom popover would ideally allow you to choose between the built-in dictionary and the glossary entry.
It seems like this would be feasible only in iBooks, and perhaps Kobo, to show the glossary entries. (The popover would just have the hyperlink in it, basically.) In iBooks, I'm not sure how I would activate the built-in dictionary from my own custom JS popover, because the default popover you get in iBooks is the iBooks app's own hook into the text, based on your long-press.
Anyway, this obviously doesn't have cross-platform support into the Kindle family, but I'm throwing it out there as an option.
In Summary
In summary, I'm looking for a way to allow a reader easy access to a glossary in an otherwise standard eBook file (EPUB, Kindle family) across a wide range of eReading options. The eBook will be purchased in a normal eBook store and downloaded through normal methods. An app is not a solution because of its limited distribution capabilities.
Any potential way of solving this puzzle is welcome!

How can I replace the screen reader audio with a prerecorded audio file?

I work on a multilingual website that will contain many languages that are not normally written, and I wonder if there are any ways to get this working for people using screen readers? Is it possible to give a text an attribute to make the screen reader play a prerecorded sound instead of trying to read the text by itself?
The whole menu system will be translated into the languages that are not supported by any screen readers.
The two popular screen readers are JAWS and NVDA. You can see what languages JAWS supports, 28 in total. NVDA supports 43 languages (I couldn't find a list).
I wonder if there are any ways to get this working for people using screen readers
There are a few things you could do that come to mind:
Declare the language of the page via the lang attribute (e.g. <body lang="">), so that if the screen reader happens to know how to interpret that language, it uses it.
Put links to common language translations near the top of the page so if somebody lands on a random page from a search engine hit, they can change languages quickly.
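A minimal sketch of both ideas together (the language codes and URLs here are placeholders; note that lang is most commonly declared on the root <html> element so it covers the whole page):

```html
<!-- Placeholder language codes and URLs -->
<html lang="en">
  <body>
    <nav aria-label="Language selection">
      <a href="/en/" lang="en">English</a>
      <a href="/fr/" lang="fr">Français</a>
    </nav>
    <!-- page content -->
  </body>
</html>
```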
Is it possible to give a text an attribute to make the screen reader play a prerecorded sound instead of trying to read the text by itself?
The lang attribute makes the screen reader switch to another language if it understands it. You can provide links to audio files to be listened to, though I would be a little cautious about providing your own audio player. Not all audio players are accessible; the two common issues are controls that are not labeled and focus traps.
Unlabeled controls make the assistive technology say "unlabeled" or something similar, so you cannot tell the buttons apart from each other. A focus trap affects people who use the keyboard to navigate a page (usually with the Tab key): instead of moving out of the audio player, focus goes back to the first element of the audio player again.
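One low-risk option, assuming the recordings are ordinary audio files, is the native <audio> element: its built-in controls are keyboard operable and labeled in most current browsers, which sidesteps both of the problems above. The file name below is a placeholder:

```html
<!-- Placeholder file name; native controls avoid the unlabeled-button
     and focus-trap problems that custom players often introduce. -->
<p>Listen to this paragraph read aloud:</p>
<audio controls src="recording-placeholder.mp3">
  Your browser does not support the audio element.
</audio>
```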
From Comments
How I can make the screen reader play these files instead of trying to read the text.
The only thing you can do is use ARIA to hide the content via the aria-hidden="true" attribute. You can check my answer about aria-hidden for more details. Essentially, you would do something like:
<article aria-hidden="true">
  <h1>Some really cool language</h1>
  <p>Blah blah blah</p>
  <section aria-hidden="false">
    <h2>Audio of language</h2>
    <p>Below is an audio sample of ____. Blah blah blah</p>
    <p class="offScreen"><!-- it may be a good idea to put additional
    info for people using assistive tech --></p>
    <p>audio stuff</p>
  </section>
</article>
CSS
.offScreen {
  position: absolute;
  top: 0;
  left: -999px;
}
Ryan, I've seen this question asked elsewhere about "click" languages, such as those of southwestern Africa. So far as I know, there is no written alphabet that is intrinsic to these languages. Scholars might record the languages phonetically, but more common techniques involve adding exclamation points and perhaps other basic keyboard characters to indicate the vocalizations that cannot be conveyed by European alphabets. The Kx'a family of languages is one such group.
If you look for RFC 1766 on sourceforge.net, you'll find a list of 122 languages or variants of languages that map to specific values of the lang attribute. And RFC 1766 itself shows how to add Klingon and other "experimental" languages to the mix.
So there are several issues, it seems:
If a language has not yet been mapped, how does one create a mapping of its characters and groups of characters (its graphemes) to its sounds (its phonemes)?
Assuming that's all that is required, how does one get that mapping associated with a new value for the lang attribute? (To get that new value, RFC 1766 says to create, complete, and submit a simple form. But, given that the document called RFC 1766 is 18 years old, how reliable is that information? And just where does the mapping of symbols to sounds fit into the picture?)
Ultimately, how does one get a screen reader to recognize that mapping and the corresponding value of the lang attribute?
My somewhat contrarian take: don't try to automatically replace the text with pre-recorded content; instead, focus on ensuring that the user is aware that both are available and can access whichever is most appropriate for them based on the tools at their disposal.
Some more background context might help: from your description, it sounds like this is perhaps an academic or research site that has fragments of text in these languages, with audio, but where the remainder of the site structure - headings and supporting narrative text - is in some 'well-supported' language (English, etc.)? (What is the encoding system used for this text?)
If so...
Be aware that a screenreader user does not typically read an entire page top-to-bottom in a completely linear fashion; they can browse the page using the heading structure. In a well-marked-up page, the user has the freedom to skip over the portions that they are not interested in or which are not relevant to them. Focus on providing this flexibility rather than making (well-meaning, but potentially incorrect) policy decisions on behalf of the user.
Don't assume that a screenreader user is using speech in the first place; they could be using Braille, whether due to the fact that speech output is not an option for them, or simply because Braille is their preferred form of output.
Finally, don't assume that because a screenreader user can't hear the text properly (due to text-to-speech limitations), the textual form of the content should be hidden from them entirely; they may still want the ability to cut-and-paste the characters that represent the text so that they can send them to a colleague, for example. Or, depending on the writing scheme used, a screenreader user may still be able to step through the characters and have the words spelled out to them letter by letter - many screenreaders can call out non-Latin characters by their Unicode name.
The issue here is less about JAWS and more about having a synthesizer that speaks the language and can communicate with JAWS through a driver such as SAPI 5. Developing these languages for the various synthesizer companies can be costly, especially if there is not a good business case driving it, such as GPS, ATMs, call centers, etc.
There are open source solutions such as eSpeak which you might look into as well. It is not the highest quality but could be an approach if you have access to developers willing to work on such a project.
As for the question regarding an API or method to communicate information to JAWS via prerecorded sound files of the website: this is not really going to meet the needs of the screen reader user, who would have no way to navigate the information or interact with it using links or form field elements. I really think synthesizer development is the only solution, unfortunately.

Do you develop with Accessibility in mind?

I've never really learned much about accessibility but it seems like an important topic.
When you build a website or piece of software, or when you're talking to a client about a website, where does accessibility come in? Or from your experience, if you don't have accessibility in something you've built for a client, do you get a lot of requests to include it, or does it limit you in some financial way?
What are the numbers, I guess. What's the return in your business, how many people have you talked to that need it? Do you yourself need accessibility features?
I do mainly Flex/Flash and it seems like I'll have to do a bit of work to have full accessibility.
Thanks for the help.
As a person with a disability myself, I am conscious of adding accessibility features when I write software.
Accessibility is an area of software design concerned with making software user interfaces accessible to people with physical or mental disabilities or impairments. Different people have different specific needs, and you can't be expected to cater specifically to each, but there are some broad groupings.
Visual Impairments:
This includes blindness and color blindness. To assist in this area, consider providing "good" alt text (clarified below) and hints so that screen readers can present a view of your content that makes sense aurally. Providing easy access to links to raise the text size and/or to some high-contrast stylesheet options is also a good idea.
Non-Mouse Users
There are a huge number of conditions that can prevent someone from successfully using a mouse; it took a few years for me and my brain, which is somewhat unreliable when it comes to spatial relationships, to pick up the skill. For these people keyboard access is really helpful. I don't work in the web space, so I'm not sure if there are standard keys to use, but shortcuts are communicated by screen readers and tooltips, so having any is better than none.
Hanselminutes episode #125 is quite educational. He talks with a blind user about accessibility on the web and in general.
Accessibility is omitted from a lot of design processes, either because businesses don't have an immediate need for it and therefore don't consider it at all, or because they consider it a low-priority feature. Legislation in various countries has helped a bit in this regard, but the real problem is that accessibility in general is usually an afterthought in the design process.
"Good" alt text means judicious use of alt text that accentuates the content or purpose of a page: navigation elements should have alt text describing where interacting with them will take the user; conversely, things that aren't content, like spacers, should have no alt text at all, because there is nothing worse than hearing "Foo's widgets spacer spacer spacer spacer spacer nav_Products spacer nav_support".
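A sketch of that distinction (file names are hypothetical): decorative images get empty alt text so screen readers skip them entirely, while functional images describe their purpose rather than their file name:

```html
<!-- Decorative spacer: empty alt text, so a screen reader says nothing -->
<img src="spacer.gif" alt="">

<!-- Navigation image: alt describes the destination, not the image file -->
<a href="/products/"><img src="nav_products.gif" alt="Products"></a>
```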
I think accessibility is usually completely forgotten about (either implicitly, or explicitly dismissed beforehand because of issues like cost) in most software development projects, unless companies (or individual developers, more likely) already have experience either with people with disabilities or with writing software with users' disabilities in mind.
As a developer I at least try to do keyboard shortcuts correctly in software I work on (because that's something I can easily dog-food myself, since I try to keep hands-on-keyboard as much as possible). Apart from that it depends on whether there are requirements about accessibility.
I do think this kind of thing is part of "programming taxes", i.e. things that you as a developer should always be doing, but...
I am only aware of this - at least more than the average developer, I think - because I once wrote software for a software magazine on floppy disk, or "Flagazine". This was in PowerBASIC 3.2, grown out of BASIC sources in a magazine; we made these sources available by BBS and disk, eventually growing a menu around the little applications to start them easily, etc.
One of our primary users (and later a member of the editorial staff) was blind and was appalled when we switched from text mode to an EGA mouse-driven menu, as his TSR screen reader software couldn't do anything with graphics. It turned out that his speech synthesizer simply accepted text from a COM port. It had a small (8K, I think?) buffer that would be instantly cleared on reception of (I think) an ASCII 1 character. And that was it.
So we made the graphical menu (and most other programs in the Flagazine) completely keyboard accessible at all times, and in the graphical programs we used a small library I wrote to send ASCII text to a configured COM port. It had small utility methods like ClearBuffer(). This, together with the convention of speaking the possible menu actions when the space bar was pressed, made all of this software accessible to our blind users.
I even adapted a terminal application for my HP48 calculator (adding a clear buffer/screen on ASCII 1) so I could use it to emulate a speech synthesizer. I would then test all of our software in each Flagazine by attaching my HP48 with the emulator running, turning off my computer monitor, and checking whether I could use all the software without seeing anything.
Those were the days, about 12 years ago... ;-)
I am a blind individual, so I have to develop with accessibility in mind if I want to use my own programs. I find myself focusing on accessibility based on the type of application I'm writing. When doing command-line or mainframe applications, I don't think about accessibility, since those environments are inherently accessible. With web-based applications I have to give some thought to accessibility, but not a lot. This is mainly because I write simple web applications for limited use, so I don't have to worry about making the interface appealing, just usable. The area where I spend the most time focused on accessibility is desktop applications. For example, using .NET I need to make sure accessible properties are set properly and that labels are in the proper position relative to a text box, so my screen reader can find them and associate them with the proper control.
