Extra blank page when converting HTML to PDF using abcPDF - asp.net

I have an HTML report, with each print page contained in a <div class="page">. The page class is defined as:
.page {
    width: 180mm;
    height: 250mm;
    page-break-after: always;
    background-position: center top;
    background-image: url(Images/MainBanner.png);
    background-repeat: no-repeat;
    padding-top: 30mm;
}
After making a few changes to my report content, when I call abcPDF to convert the report to PDF, I'm suddenly getting a blank page inserted after every real report page. I don't want to roll back the changes I've just made to remove this problem, so I'm hoping someone may know why the extra pages are being inserted.

I have experienced the exact same problem. The empty page is due to page-break-after: always; in the CSS. It's not just ABCpdf; the printer will also spit out an extra page. So I used the following code to eliminate the last page:
MyDoc.Delete(MyDoc.Page);
This, however, led to a different kind of problem. On the development server, which has IE8, I get an extra blank page, while on production, where I have IE6, I get no extra blank page. So I emailed the support team at WebSuperGoo to show me a way to look for a blank page. The idea is to iterate through the PDF, identify all blank pages, and delete them using the logic above.
And I second Jakkwylde's opinion. The WebSuperGoo folks are extremely helpful and prompt in responding. I had another problem getting ABCpdf to work under 64-bit and had spent almost a day trying to figure it out. They provided multiple scenarios for me to try. Their support was right on the money, and I had my app up and running in minutes.

protected void RemoveBlankPages(Doc pdf)
{
    // Iterate backwards so deleting a page doesn't renumber the pages still to be checked.
    for (int i = pdf.PageCount; i > 0; i--)
    {
        pdf.PageNumber = i;
        // Get the page's text content.
        string textContent = pdf.GetText("Text");
        // Delete the page if it is blank.
        if (string.IsNullOrEmpty(textContent))
            pdf.Delete(pdf.Page);
    }
}
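For context, a sketch of calling it just before saving (the rendering calls are elided and the save path is illustrative). One caveat: a page containing only images, such as the OP's background banner, has no extractable text and would also be deleted by this check.
Doc theDoc = new Doc();
// ... render the HTML into theDoc (AddImageUrl / AddImageToChain, as shown elsewhere on this page) ...
RemoveBlankPages(theDoc);
theDoc.Save(HttpContext.Current.Server.MapPath("~/report.pdf"));
theDoc.Clear();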

I've found abcPDF to be strange and unpredictable. That said, what may be happening is that the combination of the page size and page-break-after is the culprit. Reduce your page height and/or remove the page break.
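For example (values are illustrative; the point is that height plus padding-top has to fit inside the printable area, or the overflow spills onto an extra sheet):
.page {
    width: 180mm;
    height: 240mm; /* reduced from 250mm to leave room for the 30mm padding */
    padding-top: 30mm;
    page-break-after: always;
}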

One thing worth revisiting is the validity of your HTML markup if you are using the AddImageUrl method. Cases where the rendered PDF is not as expected can result from bad markup, busted tags, etc.
For what it's worth, WebSuperGoo has excellent support and responds quickly when you encounter anomalies. Often they can advise a workaround or provide alternatives to your implementation if you send them your source code.

Kush is correct in that "the empty page is due to page-break-after: always; in the CSS. It's not just ABCpdf; the printer will also spit out an extra page."
If a div has page-break-after: always, IE will literally always start a new page, and if nothing follows, it will just print a blank one. Firefox does not.
abcpdf uses IE8's rendering engine, and as such produces a blank page. For the OP's purposes, just using an explicit height should solve the problem, and the engine will insert the page breaks for you.
I am trying to solve a similar issue, where I can't set the height explicitly because the content can sometimes run to two pages. (Each page corresponds to a person, and each person should start on a new page when printed.) I emailed abcpdf as well to see if they have a hack to detect the empty page, but I was curious whether anyone knows how to fix the underlying problem and CSS-hack IE8 so it doesn't print the final page if it's empty. I'm guessing it's not possible, but I wanted to make sure I'm not missing something obvious.
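For what it's worth, in browsers that support :last-child the break could be suppressed in pure CSS, but IE8 (the engine abcpdf uses here) does not support :last-child, which is why the server-side class swap shown further down this page is needed:
.page:last-child {
    page-break-after: auto; /* ignored by IE8, which lacks :last-child support */
}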

The AddImageUrl() method of ABCpdf binds the HTML loosely and doesn't render it tightly within the required area, which causes the new blank page.
Try using the AddImageHtml() method to convert your HTML to PDF:
Doc theDoc = new Doc();
theDoc.Page = theDoc.AddPage();
int theID = 0;
theDoc.SetInfo(0, "CheckBgImages", "1");
theDoc.SetInfo(0, "RenderDelay", "5000");
theDoc.HtmlOptions.Engine = EngineType.MSHtml;
theID = theDoc.AddImageHtml(HTML);
// Keep adding pages while the HTML content overflows the current one.
while (true)
{
    if (!theDoc.Chainable(theID))
        break;
    theDoc.Page = theDoc.AddPage();
    theID = theDoc.AddImageToChain(theID);
}
// Page numbers are 1-based in ABCpdf.
for (int i = 1; i <= theDoc.PageCount; i++)
{
    theDoc.PageNumber = i;
    theDoc.Flatten();
}
theDoc.Save(HttpContext.Current.Server.MapPath(Path));
theDoc.Clear();
In my experience, this gives accurate results.

To avoid the page break on the last page, I did something like this and it worked.
I made sure that the last page didn't have page-break-after: always. This can be done with any templating or front-end framework, such as AngularJS, but for this example I use Blade templating (any PHP will do...):
@if ($last_page)
    <div class='footer last-page'>
@else
    <div class='footer'>
@endif
and then I have this in my stylesheet:
.footer {
    page-break-after: always;
}
.last-page {
    page-break-after: avoid;
}
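For comparison, a sketch of the same idea in AngularJS (assuming the pages are rendered by an ng-repeat, whose built-in $last flag marks the final iteration):
<div ng-repeat="person in people">
    ...
    <!-- 'last-page' is only applied on the final repeat, suppressing the break -->
    <div class="footer" ng-class="{ 'last-page': $last }"></div>
</div>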

We had the same issue in the production environment only, not in the test environment. We had page-break-after used in multiple places in the HTML.
Fix for the first issue: I spotted the problem by removing the page-break-after attributes one by one, which finally gave me the DIV section where the page break was being caused by one of its elements.
I fixed the height of each element inside the DIV, and this resolved my issue without removing the page-break-after attribute.
Fix for a similar issue: if you have a custom hard-coded footer, check it by increasing/decreasing its height and margin.

I had this same thing happen with HTML to PDF, where abcpdf was adding a blank page with nothing but a footer on it before the rest of the content (as page 1). This occurred when my content included a table surrounded by a div whose height was height: auto, followed by a page-break-before: always. It only happened when the table data contained "", string.Empty, or a single line of text. If the table data contained two or more lines of text, the issue did not present.
I solved it by adding a min-height: 1in style to the div that had height: auto.
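In other words, something along these lines (the selector is illustrative):
.table-wrapper {
    height: auto;
    min-height: 1in; /* gives the div real height so abcpdf doesn't emit a stray leading page */
}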

Related

Selectively hiding redlinks in MediaWiki

I've got a template designed to transclude content into the mainspace from a page in another namespace; it's used to aggregate a large number of pages into a single table. Its basic structure is this:
Template:Paget
<div class="plainlinks">
<span style="font-weight:normal; font-size:85%;">[[{{fullurl:{{{1|}}} {{{2|}}}.{{{3|}}}}} {{{2|}}}]]</span> {{#if: {{{blank|}}} | [No text] | {{{{{1|}}} {{{2|}}}.{{{3|}}}}} }}
</div>
So when you enter {{paget|page:cod.icon. 393 I|100r|jpg}} it transcludes the content of Page:Cod.icon. 393 I 100r.jpg and also labels it with a link back to that page that opens in a new tab. Very simple.
Aggregation pages are often constructed before all of the content exists, and in that case the template produces a redlink in place of the page content. I want to change this behavior so that it simply displays nothing when no page exists.
There are three main solutions: an {{#ifexist}} function, a {{#dpl}} function, and an {{#ifeq}} function comparing the output to a redlink URL. All of these are unworkable for various reasons, but mostly because they slow page loading way down (sometimes we're transcluding thousands of one-paragraph pages).
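(For reference, the rejected {{#ifexist}} variant would wrap the transclusion roughly like this; every such check is an expensive parser function call, which is exactly what kills load time at this scale:)
{{#ifexist: {{{1|}}} {{{2|}}}.{{{3|}}}
| {{{{{1|}}} {{{2|}}}.{{{3|}}}}}
| <!-- page doesn't exist: show nothing -->
}}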
So I turned to a CSS solution, and created this rule in Mediawiki:Common.css:
.hidden-redlink > a.new,
.hidden-redlink a.new {
display: none;
visibility: hidden;
}
Then I added the class to the template, i.e. <div class="plainlinks hidden-redlink"></div>. This produced no result. I also tried wrapping just the transcluded portion in a <span class="hidden-redlink"></span>, and just adding the class to the aggregation table itself, but those also failed to produce any result. Wrapping it directly in <span style="display:none;"></span> hides the link, but obviously also hides the transcluded content.
I've rejiggered the CSS rules and class assignments every way I can think of, but have come up empty. Is there some piece of the puzzle I'm missing?
MediaWiki: 1.21.2
PHP: 5.3.10-1ubuntu3.9 (apache2handler)
MySQL: 5.5.29-0ubuntu0.12.04.2
Well, I tried doing something similar: I produced a redlink by transcluding an uncreated help page with {{help:doesn't exist}} inside a div with class="hidden-redlink", and the following CSS worked to hide the red link:
.hidden-redlink a.new {
    display: none !important;
}
To be honest, I don't quite understand why you are using such a long piece of code for your transclusions, but then again I don't recognise the namespace you're getting your code from, so I probably just don't use the software at the level of complexity you're pushing it to. Are there any problems with transcluding using {{namespace:pagename}} (obviously changing the words namespace and pagename to the namespace and page name respectively) instead of your current long piece of code, which might be throwing things out of whack?

SVG used as background-image loses embedded images?

I'm using SVGs as background images for a responsive layout that recreates a complex brochure in online format.
Everything works perfectly for vector objects; however, if I embed images in the SVG, they don't appear in the background.
The strangest thing is that if I open the SVG on its own, the images are there, so this is kind of annoying!
Does anyone know if it has something to do with the SVG configuration or something like that?
How can I solve this and still be able to use the SVG as a background-image (background-size: cover rules!)?
Oh, I should add that I've seen this phenomenon happen in Chrome on my Mac, so if it's browser-specific please say so!
The svg in question is this: http://nonstoptrip.limsomnium.com/img/fundoinfo1.svg
Unfortunately I'm not much of a jsfiddler so I couldn't create something to show you all.
Thanks in advance!
The images will appear if you load the SVG at the document level. You can remove this element later and the images won't disappear. You can set it to load into a 1px x 1px element...
function loadSVG(svgpath) {
    // Only WebKit needs this workaround.
    if (/webkit/gi.test(navigator.userAgent.toLowerCase())) {
        var obj = document.createElement("object");
        obj.setAttribute("type", "image/svg+xml");
        obj.setAttribute("data", svgpath);
        obj.setAttribute("width", "1");
        obj.setAttribute("height", "1");
        obj.setAttribute("style", "width: 0px; height: 0px; position: absolute; visibility: hidden");
        document.getElementsByTagName("html")[0].appendChild(obj);
    }
}
window.onload = function () {
    loadSVG("../img/mySVG.svg");
};
The author of this technique is Dirk Weber, here are more details: http://www.eleqtriq.com/2012/01/enhancing-css-sprites-and-background-image-with-svg/
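The background-image CSS itself stays the same; the hidden <object> merely gets WebKit to load the SVG's embedded rasters so the background paints them too (selector and paths are illustrative):
.brochure-panel {
    background-image: url(../img/mySVG.svg);
    background-size: cover;
    background-repeat: no-repeat;
}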
Webkit simply doesn't support this yet I'm afraid. https://bugs.webkit.org/show_bug.cgi?id=63548 is tracking this issue.
@Duopixel, using just "image/xml" for the type attribute also works (I've only tested it in Chrome) and doesn't cause a "Resource interpreted as Document but transferred with MIME type image/svg+xml" error (while "image/svg+xml" does). Hope this helps get rid of that annoying error you may be getting in the console!

Table widths hugging elements

I'm working on a site for one of my in-laws, who insisted on using Joomla so that he could update the content himself in the future. One of the things I developed for him is a character generator for a game that he and his brothers created. That part works fine. The issue is that they want a way to print the finished sheets without all of the menus, banners, etc. I was told the simplest way to handle that was to pass ?tmpl=component in the URL to strip everything out, which also works.
The problem I'm running into is that the CSS in the Joomla template is causing the tables to behave in a way I can't figure out how to correct. The page consists of nested tables with widths currently defined as percentages, but the explicitly defined widths seem to be ignored in favor of widths that hug the largest cell. To see what I'm talking about:
The trouble page: http://www.basementgames.com/tools/character-generator.html?s=36&tmpl=component
What the page should look like: http://www.basementgames.com/char_gen.php?s=36
This is the exact same code in both places, the first being inside Joomla and thus subject to the template's CSS. I don't know much about CSS, and I'm driving myself crazy trying to figure out what to override to make the first example look like the second. Any thoughts?
You can run this script on the page and it will remove the offending print.css file on page load:
<script>
if (window.location.href.indexOf('/character-generator.html') > 0 &&
    window.location.href.indexOf('tmpl=component') > 0) {
    (function () {
        var links = document.getElementsByTagName('link');
        for (var i = 0; i < links.length; i++) {
            if (links[i].href.indexOf('/print.css') > 0) {
                links[i].href = '';
            }
        }
    })();
}
</script>
http://jfcoder.com/test/character-generator.html?tmpl=component
Note that it only runs on the character-generator.html page with tmpl=component in the query string. It also has to run after the link elements, so it should be inserted into the body tag or at the very bottom of the head tag. Since you have MooTools available, you could also run it on MooTools' domready event.
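A sketch of the domready variant (assuming only that the template already loads MooTools):
window.addEvent('domready', function () {
    if (window.location.href.indexOf('tmpl=component') > 0) {
        var links = document.getElementsByTagName('link');
        for (var i = 0; i < links.length; i++) {
            if (links[i].href.indexOf('/print.css') > 0) {
                links[i].href = ''; // detach the offending stylesheet
            }
        }
    }
});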

How to get rid of copy & paste text styling in ajax html editor

I am using the AJAX HTML editor for a news description page. When I copy and paste content from Word or the web, it carries over the styling of that text, its paragraphs, etc., which overrides the default class style of the editor's textbox. What I want is to get rid of the inline styles, as in the sample below, while keeping the HTML itself, since I want to keep the paragraph structure:
<span id="ContentPlaceHolder1_newsDetaildesc" class="newsDetails"><span style="font-family: arial, helvetica, sans; font-size: 11px; line-height: 14px; color: #000000; "><strong>Lorem Ipsum</strong> is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.<BR /> It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</span></span></p>
#left_column .newsDetails span[style] {
    font-family: Arial !important;
    font-size: small !important;
    font-weight: normal !important;
    color: #808080 !important;
}
First, be aware that the HTML you receive by pasting from Word (or any other HTML source) is going to vary wildly depending on the source. Even different versions of Word will give you radically different input. If you design some code that works perfectly on content from the version of MS Word that you have, it may not work at all for a different version of MS Word.
Also, some sources will paste content that looks like HTML, but is actually garbage. When you paste HTML content into a rich text area in your browser, your browser has nothing to do with how that HTML is generated. Do not expect it to be valid by any stretch of your imagination. In addition, your browser will further munge the HTML as it is inserted into the DOM of your rich text area.
Because the potential inputs vary so much, and because the acceptable outputs are difficult to define, it is hard to design a proper filter for this sort of thing. Further, you cannot control how future versions of MS Word will handle their HTML content, so your code will be difficult to future-proof.
However, take heart! If all the world's problems were easy ones, it would be a pretty boring place. There are some potential solutions. It is possible to keep the good parts of the HTML and discard the bad parts.
It looks like your HTML-based RTE works like most HTML editors out there. Specifically, it has an iframe, and on the document inside the iframe, it has set designMode to "on".
You'll want to trap the paste event when it occurs in the <body> element of the document inside that iframe. I was very specific here because I have to be: don't trap it on the iframe; don't trap it on the iframe's window; don't trap it on the iframe's document. Trap it on the <body> element of the document inside the iframe. Very important.
var iframe = your.rich.text.editor.getIframe(), // or whatever
    win = iframe.contentWindow,
    doc = win.document,
    body = doc.body;
// Use your favorite library to attach events. Don't actually do this
// yourself. But if you did do it yourself, this is how it would be done.
if (win.addEventListener) {
    body.addEventListener('paste', handlePaste, false);
} else {
    body.attachEvent("onpaste", handlePaste);
}
Notice my sample code has attached a function called handlePaste. We'll get to that next. The paste event is funny: some browsers fire it before the paste, some browsers fire it afterwards. You'll want to normalize that, so that you are always dealing with the pasted content after the paste. To do this, use a timeout method.
function handlePaste() {
    window.setTimeout(filterHTML, 50);
}
So, 50 milliseconds after a paste event, the filterHTML function will be called. This is the meat of the job: you need to filter the HTML and remove any undesirable styles or elements. You have a lot to worry about here!
I have personally seen MSWord paste in these elements:
meta
link
style
o:p (A paragraph in a different namespace)
shapetype
shape
Comments, like <!-- comment -->.
font
And of course, the MsoNormal class.
The filterHTML function should remove these when appropriate. You may also wish to remove other items as you deem necessary. Here is an example filterHTML that removes the items I have listed above.
// Your favorite JavaScript library probably has these utility functions.
// Feel free to use them. I'm including them here so this example will
// be library-agnostic.
function collectionToArray(col) {
    var x, output = [];
    for (x = 0; x < col.length; x += 1) {
        output[x] = col[x];
    }
    return output;
}

// Another utility function probably covered by your favorite library.
function trimString(s) {
    return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
}

function filterHTML() {
    var iframe = your.rich.text.editor.getIframe(),
        win = iframe.contentWindow,
        doc = win.document,
        invalidClass = /(?:^| )msonormal(?:$| )/gi,
        cursor, nodes = [];
    // This is a depth-first, pre-order search of the document's body.
    // While searching, we want to remove invalid elements and comments.
    // We also want to remove invalid classNames.
    // We also want to remove font elements, but preserve their contents.
    nodes = collectionToArray(doc.body.childNodes);
    while (nodes.length) {
        cursor = nodes.shift();
        switch (cursor.nodeName.toLowerCase()) {
        // Remove these invalid elements.
        case 'meta':
        case 'link':
        case 'style':
        case 'o:p':
        case 'shapetype':
        case 'shape':
        case '#comment':
            cursor.parentNode.removeChild(cursor);
            break;
        // Remove font elements but preserve their contents.
        case 'font':
            // Make sure we scan these child nodes too!
            nodes.unshift.apply(
                nodes,
                collectionToArray(cursor.childNodes)
            );
            while (cursor.lastChild) {
                if (cursor.nextSibling) {
                    cursor.parentNode.insertBefore(
                        cursor.lastChild,
                        cursor.nextSibling
                    );
                } else {
                    cursor.parentNode.appendChild(cursor.lastChild);
                }
            }
            break;
        default:
            if (cursor.nodeType === 1) {
                // Remove all inline styles
                cursor.removeAttribute('style');
                // OR: remove a specific inline style
                cursor.style.fontFamily = '';
                // Remove invalid class names.
                invalidClass.lastIndex = 0;
                if (
                    cursor.className &&
                    invalidClass.test(cursor.className)
                ) {
                    cursor.className = trimString(
                        cursor.className.replace(invalidClass, '')
                    );
                    if (cursor.className === '') {
                        cursor.removeAttribute('class');
                    }
                }
                // Also scan child nodes of this node.
                nodes.unshift.apply(
                    nodes,
                    collectionToArray(cursor.childNodes)
                );
            }
        }
    }
}
You included some sample HTML that you wanted to filter, but you did not include a sample output that you would like to see. If you update your question to show what you want your sample to look like after filtering, I will try to adjust the filterHTML function to match. For the time being, please consider this function as a starting point for devising your own filters.
Note that this code makes no attempt to distinguish pasted content from content that existed prior to the paste. It does not need to do this; the things that it removes are considered invalid wherever they appear.
An alternative solution would be to filter these styles and contents using regular expressions against the innerHTML of the document's body. I have gone this route, and I advise against it in favor of the solution I present here. The HTML that you will receive by pasting will vary so much that regex-based parsing will quickly run into serious issues.
Edit:
I think I see now: you are trying to remove the inline style attributes themselves, right? If that is so, you can do this during the filterHTML function by including this line:
cursor.removeAttribute('style');
Or, you can target specific inline styles for removal like so:
cursor.style.fontFamily = '';
I've updated the filterHTML function to show where these lines would go.
Good luck and happy coding!
Here is a potential solution that strips the text out of the HTML. It works by first copying the pasted HTML into an element (which should probably be hidden, but is shown for comparison in my example). Next, you get the innerText of that element. Then you can put that text into your editor wherever you like. You will have to capture the paste event on the editor, run this sequence to get the text, and then insert the text wherever you like in your editor.
Here is a fiddle with an example of how to do this: Getting text from HTML
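A minimal sketch of that sequence, with the scratch element kept off-screen rather than visible:
function getPastedPlainText(html) {
    // Dump the pasted HTML into an off-screen scratch element...
    var scratch = document.createElement('div');
    scratch.style.position = 'absolute';
    scratch.style.left = '-9999px';
    scratch.innerHTML = html;
    document.body.appendChild(scratch);
    // ...then read back just the text, minus all styling.
    var text = scratch.innerText || scratch.textContent;
    document.body.removeChild(scratch);
    return text;
}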
If you are using Firefox, you can install this extension: https://addons.mozilla.org/en-US/firefox/addon/extended-copy-menu-fix-vers/. It allows you to copy the text from any website without the formatting.
Generally, when supporting HTML editing by end users, I have opted to leverage one of a number of solid client-side HTML editing controls that already have the requisite functionality built in to handle this sort of thing. There are a number of commercial versions, such as the one from Component Art, as well as some great free/open-source versions, such as CKEditor.
All the good ones have solid paste-from-Word support to strip out or fix this excessive CSS. I would either just leverage one (the easy way) or study how they do it (the hard way).
I always get this kind of problem; it's an interesting one. The way I do it is very simple: open Notepad in Windows, paste your text into Notepad, and copy it over to your AJAX text editor. That strips all the text styling.
:)
From what I understand of your question, you are using a WYSIWYG editor, and when copying and pasting text from other web pages or Word documents you get ugly HTML with inline styles, etc.
I would suggest that you not bother fixing this yourself, because dealing with this issue cross-browser is a mess. If you really want to fix it, though, I would recommend TinyMCE, which has exactly the behavior you want.
You can try it in action by visiting http://tinymce.moxiecode.com/tryit/full.php — just copy some text into the editor and then submit it all to see the generated HTML. It's clean.
TinyMCE is probably the best WYSIWYG editor you'll find, imo. So instead of building something of your own, just use it and customize it to your exact needs.
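As a sketch, the relevant knobs in the 3.x-era paste plugin looked roughly like this (option names vary between TinyMCE versions, so treat them as an assumption to verify against the docs for your version):
tinyMCE.init({
    mode: "textareas",
    plugins: "paste",
    // Both options are from the TinyMCE 3.x paste plugin.
    paste_auto_cleanup_on_paste: true, // clean Word markup as it is pasted
    paste_remove_styles: true          // drop inline style attributes
});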

CSS: Start numbering pages with 2 instead of 1

In CSS, with:
@page { @top-right { content: "Page " counter(page) " of " counter(pages); } }
I can have page numbers displayed at the top of every page when the page is printed. This works great. But now, how can I make it so the page number starts with 2 instead of 1? Can I do that by modifying the CSS rule above?
If you are using Flying Saucer (which was my case), use the following CSS:
table { -fs-table-paginate: paginate; }
It works like a charm. And Flying Saucer rocks :). Really highly recommended.
Try:
@page {
    counter-increment: page;
    counter-reset: page 1;
    @top-right {
        content: "Page " counter(page) " of " counter(pages);
    }
}
Using page 1 resets the starting point of the counter; you can use any integer to start counting from. The default is 0.
After playing with Flying Saucer a bit, I guess there's no way to do this with CSS (or it's a very complicated one), as "page"/"pages" seem to be internal CSS variables. Perhaps it gets better with CSS 3, there seems to be a calc() function, so counter(calc(page+1)) could perhaps work...
But there is another way to get the PDF starting with page 2. You can add a blank first page to the PDF by adding this line to the xhtml file:
<h1 style="page-break-before:always"></h1>
Then you can either print only pages 2-... of the PDF when using a printer or remove the first page from the PDF with some PDF editor.
Have you seen the CSS documentation about counters? See here. It seems to me that you can use counter-reset. By default, counters are set to 0. If you put "counter-reset: page 1;" in your body tag, it should force the first page to start at 2 instead of 1.
Posting this for anyone else viewing this page. You can also look at the Stack Overflow post linked below; this has worked for me.
flying saucer - page count with css
.seq-start {
    -fs-page-sequence: start;
}
I don't know if this works, but why don't you try counter(page + 1)?
