Collation with CSS print stylesheets? - css

I've written an app that, among other things, lets users export data in a print-friendly format. It does this by generating an HTML file that contains print-related CSS (e.g. the @page rule). The resulting exported file is pure HTML, CSS and JavaScript, no fancy frameworks.
We've also got a printer at work that automatically staples jobs together. So if you print 10 copies of a document that has 3 pages, it'll print 3 pages, staple those together, then repeat.
The HTML file the app exports has about 1,500 records in it, grouped by a field (e.g. Username). I'm using the page-break-before CSS property to force a page break between the grouped sections of data, but I'm wondering if there's a way to tell the printer to "end" a document there and start a new one so that each section gets stapled separately.
Basically splitting one file up into several individual "documents", while only sending one job to the printer.
I'm pretty sure there isn't, and that the solution is to print the whole document and then separate and staple the sections by hand. I'm happy to do that and will no doubt end up doing it, but now I'm curious whether there's a way to do a "soft end" to a printed document, in the same way that you can force a page break using CSS.

No, there is no way to instruct the printer to staple a job together based on page breaks specified in the HTML/CSS of the exported file.
You might be able to achieve this with a generated PDF, or rather with multiple PDFs that you send to the printer at once.
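As a rough sketch of the "multiple documents" idea, the export step could write one HTML document per group instead of one large file, so that each can be converted to PDF and printed as its own job. The record shape, function names and file naming below are assumptions for illustration; the PDF conversion and the printing itself are not shown.

```python
from collections import defaultdict
from html import escape


def split_by_group(records):
    """Group (username, detail) pairs by username, keeping first-seen order."""
    groups = defaultdict(list)
    for username, detail in records:
        groups[username].append(detail)
    return dict(groups)


def render_group(username, details):
    """Render one group as a small standalone HTML document."""
    items = "\n".join(f"<li>{escape(d)}</li>" for d in details)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(username)}</title></head>\n"
        f"<body><h1>{escape(username)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )


def export_documents(records):
    """Return {file name: html}, one entry per group.

    File names are numbered rather than derived from the username,
    so unusual characters in a username cannot produce a bad path.
    """
    groups = split_by_group(records)
    return {
        f"group-{i:04d}.html": render_group(username, details)
        for i, (username, details) in enumerate(groups.items(), start=1)
    }
```

Each resulting document would then be stapled on its own, provided the printer treats every job as a separate document.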

Related

Separate stylesheets for printing and PDFing?

Using a CSS media query for print stylesheets can be a great way to make websites more print-friendly:
p { color: grey; }
@media print {
  p { color: black; }
}
For one project, we find that creating PDF files from webpages to send to clients is very efficient (better than starting from scratch).
For PDFing purposes, we've applied a few simple CSS rules via @media print to make the webpages more friendly in that format: removing navigation, certain footer elements, etc.
(Some people may want to download and print the PDFs at a later date, and that's fine. There will also be a link on each page to access the PDF we've created.)
However, it seems that for the general public's printing needs, it's advised to create stylesheets without much formatting: remove backgrounds, increase contrast, optimize font-size, and so on.
We haven't done that yet. Can there be more than one set of print rules — one applied when PDFing, and the other when printing to a printer? Or if not, what workarounds are there?
When the end user decides to print, they should also have the option to print to PDF if they have some type of PDF software installed on their machine. Trying to determine that for them from CSS, including how their machine is configured and which vendor and version of the software they use, is probably not the best idea. I would personally rely on your print stylesheet as the main source for both avenues.
Some people will want to print these PDFs, and that's fine...
True, and if they really want to, they should already know that they have a PDF app installed on their machine. Let them choose how to print when they get to the print-preview window. You could perhaps publish a branded help guide on how to print to PDF and where they can download a free PDF app.
Regards
I haven't found a way to specify via CSS whether the browser is printing to a physical printer or to a PDF. (In addition to working in various browsers, it would also have to be compatible with multiple operating systems.)
There are two solutions I've come up with:
Have two print stylesheets, and when creating PDFs, include the PDFing one, and/or comment out the one for the public. (Or have one stylesheet, and comment out relevant lines of code.)
Have two versions of the HTML document, e.g. index.php and pdf.html, each with different print rules. In this case, you would access the latter when you want to create a PDF, and the former would be the default. The issue will be managing two sets of content, so unless you can automate it, this wouldn't be advised.
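One way to avoid managing two sets of content is to generate both variants from a single template and switch only the print stylesheet. The sketch below assumes server-side rendering and two stylesheet files; the file names and the for_pdf flag are hypothetical and do not come from the posts above.

```python
from html import escape

# Hypothetical stylesheet paths, chosen only for this example.
PUBLIC_PRINT_CSS = "print-public.css"
PDF_PRINT_CSS = "print-pdf.css"


def print_stylesheet_link(for_pdf: bool) -> str:
    """Return the <link> tag for the print stylesheet of the requested variant."""
    href = PDF_PRINT_CSS if for_pdf else PUBLIC_PRINT_CSS
    return f'<link rel="stylesheet" media="print" href="{escape(href, quote=True)}">'


def render_page(body_html: str, for_pdf: bool = False) -> str:
    """Wrap the same body content with whichever print stylesheet applies."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"{print_stylesheet_link(for_pdf)}\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>\n"
    )
```

The PDF variant could be served from a separate URL or query parameter and used only when creating the PDFs, while the default page keeps the public print rules.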

How do I output HTML form data to PDF?

I need to collect data from a visitor in an HTML form and then have them print a document with the appropriate fields pre-populated. They'll need to have a couple of signatures on the document, so it has to be printed.
The paper form already exists, so one idea was to scan it in, with nothing filled out, as an image. I would then have the HTML form data print out using CSS for positioning and using the blank scanned form as a background image.
A better option, I would think, would be to automatically generate the PDF with this data, but I'm not sure how to accomplish either.
Suggestions and ideas would be greatly appreciated! =)
I would have to respectfully disagree with Osvaldo. Using CSS to align on a printed document would take ages to do efficiently in the aspect of cross-browser integration. Plus, if Microsoft comes out with a new browser, you're going to have to constantly update for the new use in browsers.
If you know any PHP (which, if you know JavaScript and HTML, is very simple at a basic level), here's a good library you can use: FPDF.
Thankfully, PHP doesn't deprecate a whole lot of methods and the total code is less than 10 lines if you have to go in and change things around.
You can control printed documents acceptably well with CSS, so I would suggest you try that option first, because it's easier.
This is actually a great PHP library for converting HTML to PDF documents: http://code.google.com/p/dompdf/. There are many demos available on the site.
XSL-FO is what I would recommend. XSL-FO (along with XSLT and XPath) is part of the XSL family of standards and was designed to be an abstract representation of a formatted document (one that contains text, graphic elements, fonts, styles, etc.).
XSL-FO documents are valid XML documents, and there exist tools and APIs that allow you to convert an XSL-FO document to MS Word, PDF, RTF, etc. Depending on the technology you use, a quick Google search will tell you what is available.
Here are a few links to help you get started with XSL-FO:
http://en.wikipedia.org/wiki/XSL_Formatting_Objects
http://www.w3schools.com/xslfo/xslfo_intro.asp
http://www.w3.org/TR/xsl11/

Exporting web applications pages to excel/pdf

We have a requirement to export different pages of our IE-only web application to Excel/PDF documents.
The pages have graphics, grids, text, etc. They should also be printable.
I heard WebSupergoo mentioned, but have no experience with it.
I am in the research phase now and I wonder what tools/technologies/methods are out there for this task?
I would appreciate any pointers/direction.
Thanks!
We have used ABCpdf by WebSupergoo which includes the ability to retrieve a URL and convert it to a PDF (see documentation). This means all we need to do is provide a suitably formatted version of the page in plain old HTML and point ABCpdf at this URL and it will convert everything automatically for us - beats having to build the page up manually element by element.
I should add that this isn't perfect - we have had some issues relating to matters like paging (very difficult to page HTML when you need things like headers and footers on every page) but for simple uses it's up to the job.
You can get ABCpdf free if you're prepared to link to them.
To export to Excel, you can simply export an HTML table as HTML and name the file whatever.xls. Excel will automatically convert the HTML table to a spreadsheet. I've been using that trick for many, many years. If you're using something like a DataGrid, that makes it even easier: just write out the contents of the control to an HTML file (or string) and then return it as a .xls file.
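The HTML-table trick can be sketched in a few lines. This is only an illustration with made-up function names, not code from the answer; recent versions of Excel may also show a warning that the file format does not match the extension before opening such a file.

```python
from html import escape


def rows_to_html_table(header, rows):
    """Build a plain HTML table; every cell value is HTML-escaped."""
    def tr(cells, tag):
        return "<tr>" + "".join(f"<{tag}>{escape(str(c))}</{tag}>" for c in cells) + "</tr>"

    lines = ["<table>", tr(header, "th")]
    lines += [tr(row, "td") for row in rows]
    lines.append("</table>")
    return "\n".join(lines)


def write_xls(path, header, rows):
    """Save the HTML table under a .xls name so Excel will open it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(rows_to_html_table(header, rows))
```

In a web application the same string would be returned as the response body with a .xls file name instead of being written to disk.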
For PDF, I recommend iTextSharp. It's really easy to use and has worked well for me for many years. You can use either the iText (Java version) documentation or the iTextSharp documentation; the methods and classes are the same (the capitalization may differ, but you should be able to figure it out).
Links
http://itextsharp.sourceforge.net/

What's the best "file format" for saving complete web pages (images, etc.) in a single archive? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm working on a project which stores single images and text files in one place, like a time capsule. Now, most every project can be saved as one file, like DOC, PPT, and ODF. But complete web pages can't -- they're saved as a separate HTML file and data folder. I want to save a web page in a single archive, and while there are several solutions, there's no "standard". Which is the best format for HTML archives?
Microsoft has MHTML -- basically a file encoded exactly as a MIME HTML email message. It's based on an existing standard, and MHTML itself was proposed as RFC 2557. This is a great idea and it's been around forever, except it's been a "proposed standard" since 1999. Plus, implementations other than IE's are cumbersome: IE and Opera support it, while Firefox and Safari require a cumbersome extension.
Mozilla has Mozilla Archive Format -- basically a ZIP file with the markup and images, with metadata saved as RDF. It's an awesome idea -- Winamp does this for skins, and ODF and OOXML do it for their embedded images. I love this, except that 1. nobody except Mozilla uses it, and 2. the only extension supporting it hasn't been updated since Firefox 1.5.
Data URIs are becoming more popular. Instead of referencing an external location a la MHTML or MAF, you encode the file straight into the HTML markup as base64. Depending on your view, it's streamlined since the files are right where the markup is. However, support is still somewhat weak. Firefox, Opera, and Safari support it without gaffes; IE, the market leader, only started supporting it at IE8, and even then with limits.
Then of course, there's "Save complete webpage" where the HTML markup is saved as "savedpage.html" and the files in a separate "savedpage_files" folder. Afaik, everyone does this. It's well supported. But having to handle two separate elements is not simple and streamlined at all. My project needs to have them in a single archive.
Keeping in mind browser support and ease of editing the page, what do you think's the best way to save web pages in a single archive? What would be best as a "standard"? Or should I just buckle down and deal with the HTML file and separate folder? For the sake of my project, I could support that, but I'd best avoid it.
My favourite is the ZIP format. Because:
It is very well suited for the purpose
It is well documented
There are a lot of implementations available for creating or reading them
A user can easily extract single files, change them and put them back in the archive
Almost every major operating system (Windows, Mac and most Linux distributions) has a ZIP program built in
The alternatives all have some flaw:
With MHTML, you cannot easily edit.
With data URIs, I don't know how difficult the implementation would be. (With ZIP, even I could do it in PHP, 3 years ago...)
The option to store things as separate files just has far too many things that could go wrong and mess up your archive.
It is not only a question of file format. Another crucial question is what exactly you want to store. Is it:
to store the whole page as it is, with all referenced resources (images, CSS and JavaScript)?
to capture the page as it was rendered at some point in time, i.e. a static image of some rendered state of the web page's DOM?
Most current "save page as" functionality in browsers, be it to MAF, MHTML or file+dir, attempts the first way. This is an ultimately flawed approach.
Don't forget that web pages these days are more like local applications than static documents you can easily store. Potential issues:
One page is in fact several pages built dynamically by JS; user interaction is needed to get it to the desired state.
AJAX applications can communicate with a remote service, which renders the saved page unusable for offline viewing.
Links hidden in JavaScript code. Such a resource is then not part of the stored page. Even parsing the JS code may not discover them; you need to run the code.
Even the position of basic HTML elements may be computed dynamically by JS, and it is not always possible or easy to recreate it locally.
You would need some sort of JS memory dump, and a way to load it, to get the page to the state you hoped to store.
And many, many more issues...
Check out the SingleFile extension for Chrome. It stores a web page in one HTML file with images inlined using the data URIs already mentioned. I haven't tested it much, so I cannot say how well it handles "volatile" AJAX pages.
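For reference, inlining a resource as a data URI amounts to base64-encoding its bytes and prefixing the MIME type. The helper names below are made up for illustration, and this is not a description of how any particular extension works internally.

```python
import base64
import mimetypes


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read a file and inline it, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        return to_data_uri(f.read(), mime_type or "application/octet-stream")
```

The returned string can be placed directly in an attribute such as the src of an img element. Base64 makes the embedded data roughly a third larger than the original bytes.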
PDFs are supported on nearly all browsers on nearly all platforms and store content and images in a single file. They can be edited with the right tools. This is almost definitely not ideal, but it's an option to consider.
Use a zip file.
You could always make a program/script that extracts the zip file to a temp directory and loads the index.html file in your browser. You could even use an index.ini/txt file to specify the file that should be loaded when extracting.
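A minimal sketch of that idea follows: a ZIP archive containing the page files plus an index.ini that names the file to load. The section name "archive" and key "entry" are assumptions made for this example; extracting to a temp directory and opening the browser are left out.

```python
import configparser
import io
import zipfile


def build_archive(files: dict, entry_point: str) -> bytes:
    """Pack page files plus an index.ini naming the file to open first."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical manifest layout: one section, one key.
        zf.writestr("index.ini", f"[archive]\nentry = {entry_point}\n")
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_entry_point(archive: bytes) -> str:
    """Return the file name that index.ini says should be loaded."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        parser = configparser.ConfigParser()
        parser.read_string(zf.read("index.ini").decode("utf-8"))
        return parser["archive"]["entry"]
```

A small launcher script could call read_entry_point, extract the archive with ZipFile.extractall into a temporary directory, and open the named file in the browser.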
Basically, you want something like the Mozilla Archive format, but without the unnecessary rdf crap just to specify what file to load.
MHT files are good, but they usually use base64 to embed files, which will make the file size bigger than it should be (data URIs are the same way). You can add attachments as binary, but you'll have to manually do that with a hex editor or create a tool and support for it by clients might not be as good.
Of course, if you want to use what browsers generate, MHT (Opera and IE at least) might be better.
I see no excuse to use anything other than a ZIP file.
Well, if browser support and ease of editing are the biggest concerns I think you are stuck with the file+directory approach unless you are willing to provide an editor for the single file format and live with not very good support in browsers.
You can create a single file by compressing the contents. You can also create a parent directory to ease handling.
The problem is that HTML is bottom-up, not top-down. Look at your file name, which saved on my box as "What's the best "file format" for saving complete web pages (images, etc.) in a single archive? - Stack Overflow.html"
Just add a '|' and one has trouble doing copy-and-paste backups to a spare drive. In the end you end up chopping the file name in order to save it. Dozens, perhaps hundreds, of identical index.html or index.php files are cluttering my drives.
The partial solution is to write your own CMS and use scripts to map all relevant files to a flat-file database, then use fileName, size, mtime and md5 to get a unique ID for each file. Create a flat-file index permitting 100k or 1000k records. The goal is to write once and use many times. So you need a real CMS and a unique ID based on content (e.g. index8765432.html) that goes in your files_archive. Ditto for the others. Then you can non-destructively symlink from the saved original HTML to the files_archive and just recreate the file using a PHP or alternative script if need be. I don't know if it will work, as I'm at the same point you're at; maybe in a week I will know for sure.
The more useful approach is to have a top-down structure based on your business or personal wants and related tasks. So your files might be organized top-down, but external ones bottom-up to preserve the original content. My interest is in Web 3.0 services, and the closer you get to machine-to-machine interaction, the greater the need to structure the information. Maybe it's time to rethink the idea of bundling everything into a single file. If you have hundreds of main.css files, why bundle when a top-down solution might let you modify one file instead of hundreds?
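The unique-ID idea mentioned above (file name, size, mtime and md5) could look roughly like this. The function name, the separator and the 12-character truncation are assumptions for illustration, and MD5 is used here only as a fingerprint, not for security.

```python
import hashlib


def content_id(name: str, size: int, mtime: int, data: bytes) -> str:
    """Derive a short ID from file name, size, mtime and an MD5 of the content."""
    digest = hashlib.md5(data).hexdigest()
    key = f"{name}|{size}|{mtime}|{digest}"
    # Hash the combined key so the ID has a fixed, file-name-safe form.
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:12]
```

Because the mtime is part of the key, two byte-identical copies saved at different times get different IDs; hashing the content alone would deduplicate them instead.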

Printing barcode labels from a web page

I am working on an ASP.Net web application that must print dynamically created labels on standard Avery-style label sheets (one particular size, so only one overall layout). The labels have a variable number of lines (3-6) and may contain either lines of text or a graphic barcode image.
Our first cut, which I inherited, used monospaced fonts to reduce the formatting issues, but that did not allow enough text to fit on the labels and the customer was dissatisfied. Basically it was formatted text.
My next version used TABLEs, DIVs, CSS, and a bit of JavaScript calculations to format the labels using proportional fonts. It still required a bit of tweaking (the user had to set their print margins correctly and turn off the print headers and footers), but it seemed to work.
However, it seems that there are some variations in how different printers render the text (WYS ain't WYG), so even though we tested on different browsers using at least two different printers (an inkjet and a laser printer), some users' labels don't line up. Slight margin variations can be corrected by adjusting the margins in the page setup dialog, but the harder problem is that the inter-label spacing can be off by a tiny fraction of an inch, so that if the first label is pretty well centered, by the end of the page the label text and images have crawled off the top or bottom of the labels.
We are about at the point of switching to generating Word, Excel, or PDF output, which is going to take quite a bit of development time and possibly add extra steps to the printing process.
So, does anyone have any suggestions on how to do an HTML/CSS layout that will precisely render on different types of printers? I don't really care if the line/word breaks are a bit different, but I need to be able to predictably position the upper left corners of each label area.
Right now the labels flow down the page in a table and we have been tweaking the box model of the cells and internal DIVs to make them a uniform height. I suspect that using absolute positioning of each element may be the best answer, but that is going to be tricky as well due to the ASP.Net generation of the label elements. If I knew for sure that would work, I would rather try it than throw away everything we have to go to a different generation method.
Slight Update:
Right now I'm doing some tests with absolute positioning - setting only the top and left coordinate of a containing block element. So far there are minor variations on the offset onto the page (margins, paper alignment, etc.), but all browsers and printers tested put the elements in exactly the right spots relative to each other. I appreciate the PDF tips, but does anyone know of additional "gotchas" on using absolute positioning this way?
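A sketch of the absolute-positioning approach: each label's top-left corner is computed directly from the sheet origin, so a small rounding or rendering error on one label cannot accumulate down the page. The layout numbers below are illustrative assumptions, not measurements from the question, and would need to match the actual label sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetLayout:
    # All values in inches; hypothetical numbers for a 3 x 10 label sheet.
    columns: int = 3
    rows: int = 10
    left_margin: float = 0.1875
    top_margin: float = 0.5
    h_pitch: float = 2.75  # distance between left edges of adjacent columns
    v_pitch: float = 1.0   # distance between top edges of adjacent rows


def label_origin(layout: SheetLayout, index: int):
    """Return (left, top) in inches for the label at a zero-based index.

    Labels are numbered left to right, then top to bottom.
    """
    row, col = divmod(index, layout.columns)
    if index < 0 or row >= layout.rows:
        raise ValueError(f"label index {index} is outside the sheet")
    return (
        layout.left_margin + col * layout.h_pitch,
        layout.top_margin + row * layout.v_pitch,
    )


def label_style(layout: SheetLayout, index: int) -> str:
    """Inline CSS that places a label's containing block absolutely."""
    left, top = label_origin(layout, index)
    return f"position:absolute; left:{left:.4f}in; top:{top:.4f}in;"
```

The page-level offset (printer margins, paper alignment) would still vary per setup, which matches the minor variations described above.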
Update:
For the record, I rewrote the label printing portion using iTextSharp and it works perfectly - definitely the way to do this in the future...
Forget HTML and make a PDF. HTML printing is extremely variable - not just across browsers but across different versions of the same browser. PDF is a lot easier.
Even if you get it exactly right with one browser / font setup / printer / phase of the moon, it will be the most fragile thing you've ever had to maintain. No matter how long you think it will take to make a PDF (and it's not really that hard as there are some free libraries out there), HTML will ultimately take a lot more of your time. PDF readers are widely deployed and print more consistently than even Word files.
The web is not a format that is guaranteed to get consistent print results. Given the standard support for label printing with MS Word, and the relative ease of automation and generation, I would strongly recommend going that route.
I'm not aware of ANY method to get precise printing across all types of browsers, operating systems, and printers when using web content.
"precisely" and "printing" aren't two words that really work together that well. I did an OCR/OMR application a year or so ago, and even when building a PDF I saw significant differences between different print drivers and such. Because of that, my gut is to tell you that you might not have 100% success.
If CSS and layout issues don't work that well for you, you might need to resort to building the labels as images using GDI+ -- at least that way you can use GetFontMetrics() and such.
Good luck!
I had a similar issue and the answer is that you can't do it. Instead, I generated a PDF file in real time using iTextSharp and passed that to the response.
Using SQL Server Reporting Services, I generate a PDF to send to the printer, but it can be seen as HTML on the screen using the control you can include in your web pages. There are RDLC files that are available on the internet to print to various Avery formats.
This week I rewrote the SharpPDFLabel code that was mentioned back in 2011, as I needed it to be a lot more flexible (and to work with the current iTextSharp library).
You can get it here:
https://github.com/finalcut/SharpPDFLabel
I added the ability to specify the contents of each individual label if you want (or to continue creating a sheet of identical labels too). By extending the LabelDefinition class you can specify the layout of your labels pretty easily.
I also struggled with the HTML/CSS approach due to the inconsistent printing behaviour across browsers.
I created a C# library to produce Avery Labels from ASP.NET which I hope you might find useful:
https://github.com/wheelibin/SharpPDFLabel#readme
You can add images and text to the labels, and it's easy to define more labels types.
(I use it for barcode labels, the barcode is generated as an image and then added to the label using this library.)
Cheers
Add a few options to your app that let users adjust spacing for their particular configuration. You could include this right on the label if you want, and style it away via media selectors, but you'll probably want to persist them somewhere, too.
Flash is also a good method for pushing a printable item like a label, albeit a little more complex to implement and maintain. In most cases it displays much more quickly than a PDF, and you can embed it into the design of the page and simply add a "Print" button within the Flash movie.
I did this several years ago when we were using HTML and PDF to generate confirmation receipts. HTML is "ok" but is at the mercy of the end user's web browser, so we quickly dumped that method. PDFs are good as long as the user has a PDF reader, which to our surprise a lot of our customers did not. So that was dumped as well, and we switched to a Flash version using a simple Flash movie that included a few dynamic text areas and a "Print" button. I communicated the data between the page and Flash using a few FlashVars. You can also use a web service.
When I need something more than just simple text I use the free community edition of the PDF Generator component from DynamicPDF.com. It works great and is very quick.
I just went through the same thing. I ended up switching and making a short little JSF app (running on GlassFish) that uses JasperReports to print directly to the label printer. Push a button and the label appears at the printer immediately; you don't even have to view it on-screen if you don't want to, since Jasper can output directly to a printer (as well as to a PDF in the browser).

Resources