How do you justify a paragraph in sharpPDF? - asp.net

I have a asp.net MVC project that uses sharpPDF (http://sharppdf.sourceforge.net/) to create PDFs. There are 20+ PDFs written using sharpPDF, so switching to another PDF generator probably isn't possible at this stage (and our client is unlikely to pay for a paid version of any PDF software). The client wants the PDF that's generated to look exactly like the original Word document. The letters seem to be just a little smaller in the Word version despite using the same font and font size (Arial Narrow Bold, 10). Turns out, the Word document justified the paragraph. I can't find a justify option for sharpPDF (only left, right, center). How do I justify a paragraph in sharpPDF? Alternatively, how to I get the text to look like it's justified?
Here is a comparison of the text in Word (top) and in my PDF generated by from sharpPDF (bottom):
Here are the settings for Word compared to the settings in sharpPDF:

Looking at the API for SharpPDF it doesn't expose a sufficiently low level interface to allow you to implement the justification yourself on top of that library.
PDFSharp is another free (MIT licensed) library which does support text justification. I realise that you don't want to do a wholesale change, but if it's only a subset of the content that requires justified text one option would be to generate the sections that require justified text using PDFSharp and then use the library to combine the two documents. Having said that, the API is fairly similar and a port probably would not take long.
Another option would be to make use of SharePoint Server if your client has it, to perform the conversion for you. This is likely to produce something closer to the original word document than anything you're likely to be able to render using custom code.

Related

Accessibility with Google Slides Embed on a webpage

Got a site with powerpoint presentations that the client wants embedded, the embedding is being done via google docs embed. I have been doing some accessibility testing, albeit not particularly in-depth but even with OS X screen reader it is not having any luck reading the slides. (I am aware slides are probably terrible for accessibility anyway). I can get the text content of the slides stripped out via the Google API, but I don't know if thats the best thing, to include it on the page below/above the iframe embed with one of the CSS tricks for hiding it from normal view?
I am aware of iframe title and aria-label but those seem to imply they are only to describe the contents of the iframe, which I am doing, but I need somewhere that can contain more text.
Has anyone got any good tips for the best way to deal with such things? Thanks!!
Embedding rich 3rd-party content in web pages poses many challenges.
When we put something like this in a web page, we typically think we're adding a bit of "content", but it often amounts to embedding a complex application; and the user interface, semantics, and presentation are outside of our control. In your case it's a presentation slide deck, but it could also be a Flash/Silverlight/Java applet, a slippy map, interactive SVG infographic, a 3D-panorama, virtual tour, zoomable image, chemical molecule viewer, or who-knows-what. (Note: I'm not familiar with the Google docs embed/API specifically, so most parts of my answer will address these rich content cases in general.)
Even if the embedded rich 3rd-party content/application is accessible today, there's no guarantee it will remain so after the 3rd-party system gets an update.
So what can you do? The safest thing might be to assume it's inaccessible, and consider the best way to provide an accessible alternative. The Web Content Accessibility Guidelines (WCAG) calls this a "conforming alternate version", and it sounds like you're already thinking along these lines.
An important caveat to all of this, is that the use of "conforming alternate versions" isn't considered ideal by many accessibility specialists. It's greatly preferred to make your main content accessible as you can.
Some relevant parts of WCAG:
Understanding conformance, especially the section about "Understanding Conforming Alternate Versions".
Technique G190: Providing a link adjacent to or associated with a non-conforming object that links to a conforming alternate version.
F19: Failure of Conformance Requirement 1 due to not providing a method for the user to find the alternative conforming version of a non-conforming Web page.
It's worth mentioning the 3rd-party content in your website's accessibility statement. Statement of Partial Conformance - Third Party Content offers guidance about that.
The crucial thing about conforming alternate versions, is that it's no use at all if the user isn't made aware of it, or isn't able to reach it.
Implementation-wise, there are a variety of approaches you might take. In many ways, providing an alternative for embedded rich content is similar to providing a long description of a complex image, or a transcript of a video. Have a look at these WAI tutorials for some ideas.
Web Accessibility Tutorials: Complex Images
Making Audio and Video Media Accessible: Where to Put Transcripts
I can get the text content of the slides stripped out via the Google API
It sounds like you're trying to automate the process. That's fine, but it might not give satisfactory results. Some things you should consider:
Is the text content alone going to be enough? Presentations often have images too. Did the author provide text alternatives for these images, and are they present in the text extracted via the API? If the author hasn't provided text alternatives for images in the slide deck, the text you get from the API won't be telling the whole story.
Not all text in slides carries the same weight. Some slides serve to introduce sections, some slides have headings. Does the text obtained from the API convey these relationships?
Lists are very commonly found in presentation. Does the text obtained from the API preserve this structure?
Slides sometimes contain links. Are these included in the text obtained from the API, so the links are available to everybody using the alternative version?
Slides sometimes contain tables and charts. How will the information in these be conveyed in your alterative version? Is the information included in the text obtained from the API?
Sometimes, presentation decks also contain rich 3rd-party embedded content themselves! A slide containing a video, or an animated GIF are examples of this. If so, you can find yourself with a much more complex challenge.
... and many other meaningful info and relationships. Quotations, code samples, etc.
If any of the above points give cause for concern, you will need to consider managing your alternative version manually.
The "conforming alternative version" has to conform to WCAG; if you just offer a choice of two non-conforming version, that doesn't satisfy WCAG.
include it on the page below/above the iframe embed with one of the CSS tricks for hiding it from normal view?
No, I wouldn't recommend that. I assume you're refering to visually-hidden text, using CSS utility classes such as .visually-hidden or .sr-only. It sounds like you're only thinking about screen reader users.
You need to offer the alternative version available to everybody, not just one group of users who you think will need it.
Many groups of users may experience difficulty using the rich 3rd-party embedded content. This includes people using the keyboard only, screen readers, magnifiers, speech control, switch access, or other tools. The conforming alternative version can be navigated like a normal web page though.
The 3rd-party content likely has a different visual style, and it may not be adaptable in the same way as the page it is embedded in. That can frustrate people who make use of browser zooming, text resizing, font preferences, reader mode, Windows high-contrast themes, viewport resizing, and other user-applied presentation changes. The conforming alternate version should be as adaptable as the rest of your site.
Rather than hiding the alternative version in a visually-hidden container, here are some other ways to present it. The first two are the simplest and most reliable.
Put it on the same page, just after the original content, visible to everybody.
Put it in a collapsible disclosure element just after the embedded content. A <details> element is an easy way to achieve this. This is useful if the alternative version is quite long.
Put it elsewhere on the same page, and tell users where to find it. An internal link can help here.
Put it on a separate page, with a link next to the embedded content. I'm less keen on this approach. Only use it if you can commit to maintaining both pages.
Provide a way for users to switch between the two versions. For example some buttons, or a tabbed UI. However, you must also ensure that the switching mechanism is accessible. That might mean a full-blown ARIA tabs implementation.
I am aware of iframe title and aria-label but those seem to imply they are only to describe the contents of the iframe
Giving the iFrame a useful name is important. It's also a useful mechanism to inform screen reader users that an alternative version is available. The WAI Complex Images tutorial linked above has some similar approaches. Example: <iframe title="Google Slides Presentation of FOO BAR BAZ. Link to text version follows this frame.">. This only helps screen reader users though; you still need to make the availability of the alternative version clear to everyone.
How committed are you to using Google Docs for displaying these slides?
Any accessibility enhancements that you'll be able to implement on Google Slides won't be very good.
One way around this whole thing is to offer PDF versions of the slides that have been fully-remediated for accessibility instead of using Google Slides. That would potentially be a single solution that could be accessible to all visitors. Going this route means that you wouldn't have multiple copies of the same slides to update, which could lead to a split in content if one gets updated and the other is forgotten.
If you're really set on having the slides embedded in the page, then you could offer both formats by applying aria-hidden to the embedded iframe and then hiding the PDF links from sighted users using CSS clip, or by positioning content off-screen.

AvalonEdit - Syntax Highlighting - How do I add underline, change font size, etc?

I'm trying to build a basic, Markdown-style plain/rich text editor. (One where the text is styled inline, instead of having two panels side-by-side the way most Markdown editors do) (I'm also not going to support the full Markdown spec - no lists or tables, mainly just rich text formatting like bold, italics, underline, etc)
I have a project that consumes the AvalonEdit project (via the source code, not the Nuget package) - I got the editor all setup exactly how I want - then I started to write a syntax highlighting XSHD file when I realized that the highlighter only supports formatting like font colors, italics, bold, etc and not font size, underline and others...
How can I add additional font formatting? Will I have to write a whole new parser/highlighter/whatever? Is there an easy way to hook into and extend the existing highlighter?
I've already made a few small changes to one file in the source (TextEditor.xaml), and I'm willing to change more to make this work - though when I started I was hoping to touch the source as little as possible...
If someone could just point me in the right direction, I'd appreciate it - Thanks!
From the syntax highlighting documentation:
Among the text rendering extension points is the support for "visual
line transformers" that can change the display of a visual line after
it has been constructed by the "visual element generators". A useful
base class implementing IVisualLineTransformer for the purpose of
syntax highlighting is DocumentColorizingTransformer. Take a look at
that class' documentation to see how to write fully custom syntax
highlighters. This article only discusses the XML-driven built-in
highlighting engine.
Having read and/or scanned through that page a number of times, I couldn't fully grok this until I'd looked through the code a bit more, read some posts on the SharpDevelop forums, etc.
And if you're at the same stage I was (and can't quite wrap your head around that quote), the gist is that the editor does these two things (simplified, of course):
It generates lines of visible text (it only bothers with the lines
currently visible on the screen for performance reasons)
It then runs various transformers over said generated text, to style it in various ways
So the "XML-driven built-in highlighting engine" is only one way to find and style text - one that's meant to be a simple implementation of the more 'advanced' way, which is to build a custom text transformer, like a DocumentColorizingTransformer.
And here's some info on DocumentColorizingTransformer that you may find useful (besides the API documentation they point you to):
https://stackoverflow.com/a/23251990/859833
http://danielgrunwald.de/coding/AvalonEdit/rendering.php

What are appropriate markup languages for users with disabilities?

Suppose you're developing a web site and blind users will be a significant chunk of your target market. If the web site includes document editing functionality, what would be appropriate WYSIWYM tools? Are languages like Markdown, Textile and Wiki Formatting really accessible or are they inconvenient to blind users?
I'm a blind programmer and while I haven't used most of the languages you mention I've found that any markdown language is fairly easy to use if you have the desire to learn it. I've had no problem using either HTML or several markup languages for wiki's. Part of it will depend on how invested the users are in your site. If it's a site that will be visited infrequently or for short periods of time, it's much less likely that a user will take the time to learn the required markup whether they are blind or not. Unfortunately, I have not found an accessible JavaScript WYSIWYG editor but I find it easier to manually enter the markup so haven't looked very hard.
the first question is: how important is semantic structure? could you get away with plain text. You could do simple parsing like treating blank lines as paragraph markers, treating a series of lines which begin with * as a bulleted list, identify URLs and make them into links, etc.
As a blind developer myself, I have no problem in understanding languages like Markdown. But if it's a syntax I'm unfamiliar with, I'll only learn it if I expect to use the site very often, or care deeply about the content.
Two final thoughts come to mind: while I certainly experience some accessibility challenges using TinyMCE, you could develop something much simpler - provide less than 10 formatting options, like inserting hyperlinks, making lists, centering text, setting the style (such as heading) etc.
And lastly, when I talk to non-technical blind people, they often just write their content in Word and paste into a wiki or blog post. This sounded strange when I first heard it, but it does make sense. So an ideal solution would accept pasted in content.
In closing - it depends how important this is, and how much effort you want to expend. Maybe a Markdown editor with a live preview (like on this site), buttons for inserting simple formatting like URLs, and the ability to paste in rich text would tick all boxes :-)
On a web page, the most accessible embedded text editor for blind users is one that uses standard HTML, such as a <textarea> element, with a corresponding <label> element:
<label for="editor">Enter your text here using wiki markup:</label>
<textarea id="editor"></textarea>
If a WYSIWYM tool is built using standard accessible HTML, then blind users can easily enter text into it, with full confidence that they're entering text in the right place. Then the question becomes: Which is the better markup language? They all require memorization, but some may be more intuitive than others. One way to find out which is best would be to do some usability testing with a wide variety of target users. Also be sure to providing easy, accessible access to syntax help.
Picture yourself working in pure text 80x4 display (just open a console and resize appropriately), then use vi/emacs/ed and you'll soon realize what markup will get in the way.
Try to do as much work as possible to understand plain text, else use light markup like POD, finally things like AsciiDoc are very powerful but needs training.
I don't know about WYSIWYG/WYSIWYM tools, but I do know that complying with W3C standards (especially their HTML5 en CSS3 drafts) while writing your own editor code will help a lot.
In CSS you can specify speed and intonation of speech. In HTML you can specify alternative text (alt attribute in many elements) that screen readers are compatible with. Be sure to know when to use the abbr and the acronym elements. Use the former when you want the screen reader to read the meaning of an abbreviation and the latter when the acronym should be read as a word (e.g. ASAP, NATO and OS).
For the editor itself, I recommend creating a WYSIWYG editor that uses divs and spans. Screen readers will understand easily the structure of a document. For the current line, use a text box; for every other line that's not being edited, convert the contents immediately to valid HTML.
If you find a good tool, be sure to post it here. I'm looking for one too. :-)

What is the best way to get clean semantic XHTML from MS word documents?

Some days ago I received a rather lengthy and somewhat elaborate MS Word document, which I was asked to convert to HTML for uploading to a 3rd party’s website. My first instinct was to save the Word document as HTML and use Dreamweaver’s "Clean Up Word HTML" Command. But not only did I have to leave it running all night for Dreamweaver to finish "cleaning", but the results were far from desirable in my opinion. There were still a lot of left over inline styles, etc. that Dreamweaver just plain missed.
I approached it differently this morning and just selected the entire document in Word, copied it, and then pasted it into Dreamweaver’s Design window. Not only was it much, much faster, but the output code was much, much cleaner! I didn’t have to run the "Clean Up Word HTML" Command afterwords either.
Now I don't ever convert a Word File straight to HTML for standards reasons. Instead I cut and paste content between Word and Dreamweaver. Happily I can do the following.
If a Word heading is in the Heading 1 Style, it will become an H1 in Dreamweaver (following the Dreamweaver stylesheet). Similarly Heading 2 becomes H2, Heading 3 becomes H3 and so forth.
If the Word author wasn't that organized, you can use a shortcut like Control+1 (or Command+1) on a Mac to convert any line to an H1. Can you guess the shortcut for H2? Yes it's Control+2 or Command+2 on a Mac.
Paragraphs now cut and paste as paragraphs (with the P tag). If you don't want an HTML paragraph right then, then use Control+0 (or Command+0 on a Mac) to remove it in Dreameaver.
A new one I discovered is that some embedded images in Word may be transferred to your Dreamweaver site as "clip" images when you copy and paste from Word. So, if you have a Word file with embedded images, you may be able to extract them fairly quickly via Dreamweaver.
I also found this free tool useful http://www.textfixer.com/html/convert-word-to-html.php it works same like design view of dreamweaver, useful for people who doesn't have Dreamweaver.
but what code we will get is depends on how much properly formatted MS word document is?
WORD 2007 has also style like html?
Headings, tables, ordered and unordered lists, bold, italic , hyperlinks etc?
How to use word 2007 semantically?
To get maximum possible semantic html
on save as html option
To get maximum possible clean code to
Copy in dreamweaver design view ?
To get maximum possible clean code to
place browser based WYSIWYG HTML
Editor which comes with every CMS
Does any knows any tips, tricks, tutorial , article or advice to format MS WORD documents semantically?
Or any other best way than mine?
HTML Tidy has options for this: word-2000, bare and clean.
FCKEditor and similar try to clean up code pasted from Word.
There's (rather old now) demoroniser.
However don't expect miracles. It's unlikely that Word document will have decent structure (it theoretically could, but no Word user bothers with this). These programs can't add semantic information if it's not there.
As for semantic editing in Word – use styles. It supports headers properly (sadly not much else). You can check that in outline view.
You don't need – and shouldn't use – spaces or line breaks for indentation or space adjustment. Word has ability to explicitly control paragraphs' padding.
I've found that the OpenOffice.org html generator (Open .doc in OO and save as HTML) works better than MS's in Office.
It's still not perfect, but gives MUCH cleaner HTML that's much more sane to look at.
There is no dependable way to clean up WORD docs and make them into nice HTML. If the document has any special characters, they are often encoded as Windows charset instead of UTF-8, so they just "break" when displayed online. The list goes on. You often end up with silliness like:
<strong>hello</strong><strong>th<strong>er</strong>e</strong><i></i>
The only depandable method is to paste it into Notepad and mark it up manually. You can write a few macros to do things like insert <p></p> at paragraph breaks, but that's about it.
If there is a huge volume of material that needs to go online from Word, you may be better off using a PDF.
have you tried this? Word Cleaner
Try our Doc To HTML Converter software. It was designed specially for the purpose of producing maximum possible clear (X)HTML code, and has many customizable options. It requires MS Word to be installed on your system. It is not free, but it has trial 30-day period.

Printing barcode labels from a web page

I am working on an ASP.Net web application that must print dynamically created labels on standard Avery-style label sheets (one particular size, so only one overall layout). The labels have a variable number of lines (3-6) and may contain either lines of text or a graphic barcode image.
Our first cut, that I inherited, used monospaced fonts to reduce the formatting issues, but that did not allow enough text to the fit on the labels and the customer was dissatisfied. Basically it was formatted text.
My next version used TABLEs, DIVs, CSS, and a bit of JavaScript calculations to format the labels using proportional fonts. It still required a bit of tweaking (the user had to set their print margins correctly and turn off the print headers and footers), but it seemed to work.
However, it seems that there are some variations on how different printers render the text (WYS ain't WYG), so even though we tested on different browsers using at least two different printers (an inkjet and a laser printer), some user's labels don't line up. Slight margin variations can be adjusted by adjusting the margins on the page setup dialog, but the harder problem is that the inter-label spacing can be off by a tiny fraction of an inch, so that if the first label is pretty well centered, by the end of the page the label text and images have crawled off the top or bottom of the labels.
We are about to the point of switching to generating Word, Excel, or PDF output which is going to take quite a bit of development time and possible add extra steps in the printing process.
So, does anyone have any suggestions on how to do an HTML/CSS layout that will precisely render on different types of printers? I don't really care if the line/word breaks are a bit different, but I need to be able to predictably position the upper left corners of each label area.
Right now the labels flow down the page in a table and we have been tweaking the box model of the cells and internal DIVs to make them a uniform height. I suspect that using absolute positioning of each element may be the best answer, but that is going to be tricky as well due to the ASP.Net generation of the label elements. If I knew for sure that would work, I would rather try it than throw away everything we have to go to a different generation method.
Slight Update:
Right now I'm doing some tests with absolute positioning - setting only the top and left coordinate of a containing block element. So far there are minor variations on the offset onto the page (margins, paper alignment, etc.), but all browsers and printers tested put the elements in exactly the right spots relative to each other. I appreciate the PDF tips, but does anyone know of additional "gotchas" on using absolute positioning this way?
Update:
For the record, I rewrote the label printing portion using iTextSharp and it works perfectly - definitely the way to do this in the future...
Forget HTML and make a PDF. HTML printing is extremely variable - not just across browsers but across different versions of the same browser. PDF is a lot easier.
Even if you get it exactly right with one browser / font setup / printer / phase of the moon, it will be the most fragile thing you've ever had to maintain. No matter how long you think it will take to make a PDF (and it's not really that hard as there are some free libraries out there), HTML will ultimately take a lot more of your time. PDF readers are widely deployed and print more consistently than even Word files.
The web is not a format that is guaranteed to get consistent print results. Given the standard support for label printing with MS Word, and the relative ease of automation and generation, I would strongly recommend going that route.
I'm not aware of ANY method to get percise printing across all types of browsers, operating systems, and printers when using web content.
"precisely" and "printing" aren't two words that really work together that well. I did an OCR/OMR application a year or so ago, and even when building a PDF I saw significant differences between different print drivers and such. Because of that, my gut is to tell you that you might not have 100% success.
If CSS and layout issues don't work that well for you, you might need to resort to building the labels as images using GDI+ -- at least that way you can use GetFontMetrics() and such.
Good luck!
I had a similiar issue and the answer is you can't do it. Instead, I generated a PDF file in realtime using iTextSharp and passed that to the response.
Using SQL Server Reporting Services, I generate a PDF to send to the printer, but it can be seen as HTML on the screen using the control you can include in your web pages. There are RDLC files that are available on the internet to print to various Avery formats.
I rewrote the SharpPDFLabel code that was mentioned back in 2011 this week as I needed it to be a lot more flexible (and to work with the current iTextSharp library).
You can get it here:
https://github.com/finalcut/SharpPDFLabel
I added the ability to specify the contents of each individual label if you want (or to continue creating a sheet of identical labels too). By extending the LabelDefinition class you can specify the layout of your labels pretty easily.
I also struggled with the HTML/CSS approach due to the inconsistent printing behaviour across browsers.
I created a C# library to produce Avery Labels from ASP.NET which I hope you might find useful:
https://github.com/wheelibin/SharpPDFLabel#readme
You can add images and text to the labels, and it's easy to define more labels types.
(I use it for barcode labels, the barcode is generated as an image and then added to the label using this library.)
Cheers
Add a few options to your app that let users adjust spacing for their particular configuration. You could include this right on the label if you want, and style it away via media selectors, but you'll probably want to persist them somewhere, too.
Flash is also good method to push a printable like a label albeit a little more complex to implement and maintain. In most cases it displays much quicker than a PDF and you can embed it into the design of the page and simply add a "Print" button within the flash.
I did this several years ago when we were using HTML and PDF to generate confirmation receipts. HTML is "ok" but is at the mercy of the end users web browser so we quickly dumped that method. PDF's are good as long as they have a PDF reader, which to our surprise a lot of our customers did not. So that was dumped as well after we switched to a FLASH version using a simple flash movie that included a few dynamic text areas and a "print" button. I communicated the data between the page and flash using a few flash vars. You can also use web service.
When I need something more than just simple text I use the free community edition of the PDF Generator component from DynamicPDF.com. It works great and is very quick.
I just went through the same thing. Ended up switching and making a short little JSF app (running on Glassfish) that uses JasperReports to print directly to the lable printer. Push button, instant label at the printer, don't even have to view it on-screen if you don't want to since Jasper can directly output to printer (as well as PDF in browser).

Resources