I'm working with an asp.net application that produces large PDF documents from HTML. The content is perhaps complex (detailed grid type listings, css styled, running to 40+ pages) compared to typical usage. None of the libraries we've tried are performing adequately. Typically a 40 page document is taking upwards of a minute to render on a powerful multi-core machine.
We are able to decouple the generation from the web application and also pre-generate documents in some cases. Still, the frequency with which content changes requires a faster solution.
So, does anyone have experience of a PDF generation component that can output a content heavy 40 page document in seconds rather than minutes? Or are our expectations unrealistic?
NB: I'd rather not "out" the poorly performing components here as we are seeking support from vendors to make improvements. I've reviewed previously questions posted on StackOverflow and none appear to deal with this type or size of document.
An option might be to not convert html to PDF and take another approach. We use the ActiveReports reporting tool that generates PDF, its pretty powerful when using sub-reports for multi-dataset reports, and completely integrates with visual studio.
This means that you would need to rebuild the report to produce the same data that you see on-screen. This is sometimes not such a bad thing as you can style up the report specifically for printing.
PDFs can be generated via a back-end service and/or emailed or produced on the fly to the browser.
Related
The problem is, we dont have MS Office in our university anymore, earlier we used its
preview feature in Windows File Explorer, so to view multiple assignments on the fly by
just clicking on the word file or using arrow keys navigation. This helped us the teachers, to
check many assignments without having to open each file 1 by 1. Now we switched to WPS Office,
thus the preview feature in windows explorer is gone. This made the assignments checking extremely
slow as now we have to open each file before marking it.
I am a teacher too, who can code a little, so I want to build up an application in windows forms,
that enables to preview .docx, .dox and .rtf files. I tried hard since 1.5 months so far, used
Pandoc, OpenXML Powertools, tried with trial versions of libraries like Aspire, Aspose, GemBox, Xceeds
But so far no free and reliable solution is found, some of them either spoil the formatting in unacceptable way. Some of them have trial limits. Most of them do not convert word own drawings (that the students draw inside Word document itself).
So basically I am looking for a reliable and free resource that converts the whole document,
doesn't miss anything and is free. We the teachers, still can overlook formatting a little.
What I have tried: Everything as listed above, in win forms application
Current best solution working for us: Using Syncfusion DocIO library to convert the said file
formats to PDF and then loading that PDF in a CefSharp browser control in winforms. This requires
just ~2 seconds load time of the document and gives nice zooming options. But its a trial version
and eventually it will expire after trial period.
NOTE: I cannot use any online or server based solution as it will be very slow (consider clicking on the file and then waiting for more than 10 seconds to load a converted document).
For our SaaS (LAMP) product reporting we are currently using JasperReports. We find it too cumbersome to develop reports with and the output in Word unworkable. Moreover, a couple of customers request to be able to develop simple reports themselves (to be used as mail merge). We would therefore like to develop templates right in Word. The idea is to have an application/webservice that would receive the Word template and JSON data from the LAMP application and return the filled-in report. The report has to support:
Loops inside content (repeating a document section several times while filling in array data)
Filling in tables (populating rows from array)
Filling in chart data in pre-created charts (from array)
This is the functionality we are using in JasperReports right now. Are there existing solutions to this? I've found quite a lot that can substitute simple variables, but no info about the the above three points. Will it be a lot of effort to write one from scratch? I would prefer a Windows OpenXML-based solution rather than a Linux PHPOffice-based one as I presume the former would handle the text split by spell-checker and language tags (though I'm not sure).
Windward and Docmosis are both commercial products that support the features you've listed and they are intended to be added to your application to provide reporting capabilities. Neither is are not OpenXML based. They can use Word documents as templates and perform the data merge into different output formats. Please note I work for Docmosis.
Aspose Words is another tool and it can populate a template but most of the power is through code rather than controls/directives in the template. Given your OpenXML thoughts, perhaps this is more what you are looking for.
More tools are recommended here in StackExchange.
I hope that helps.
ReportBox is a Web based reporting solution that can be used by any software application to generate documents and reports in Microsoft Word/ Excel/ PowerPoint/ HTML(DocX/Xlsx/PPTx/HTML) using OpenXML.
The process starts by building a Microsoft Word/ Excel/ PowerPoint/ HTML document as a template and uploading to ReportBox portal. Your application either sends data to ReportBox or ReportBox can pull data from your application database, which is then merged with the template to produce the finished report. Please note that I work for GreenThoughts.
Here’s the basic question…
I have a long HTML document (a contract with 100+ pages) that ultimately needs to be a PDF document with headers and footers (page numbers). What is the best tool/language for making this happen?
Here’s the back story…
I work at a satellite office for a low-tech construction company that issues contracts to subcontractors, and because I am the only one who is able to unjam the printer, I have become the defacto IT person in the company. In the past, to make a contract, someone has had to go through a MS Word document (the boiler plate contract) and type in the necessary information to produce a contract.
About a year ago, I got so frustrated with that methodology that I created a MS Access Database where a user could add information using Access forms and then a mail merge with MS Word to populate a contract. This has been a HUGE improvement plus we have been able to start tracking money a lot more easily using the other database features. The database is stored on a shared computer in the satellite office. However, this system only works IF the individual users have MS Access and MS Word installed on their individual machines and only if they are physically connected to our local network.
With the success of this system at the satellite office, I am now attempting to create a web-based version of this tool that everyone in the company can use that only relies on standard software on individual machines and can be accessible anywhere.
I have converted a computer into a server for development purposes using XAMPP, created a SQL database, created HTML forms, and am using PHP to run queries. Over the past few months, I have crash coursed my way through myriad languages including CSS, and have finally gotten everything to the point that the system will create an HTML version of the contract with everything populated. Now I just need to format it for printing (ideally to a virtual PDF printer) with headers and footers (page numbers). This should be the easiest part, right?
CSS with the #media: print tags would, on the surface, appear to be the best way to make this happen because CSS3 uses tags like “#top-left” and “content: counter(page)” to do everything that I want; however, after investing a lot of time setting everything up, it appears that only Foxfire kind of supports this and IE and Chrome absolutely do not.
Headers and footers overlap body content, and I can’t get the pagination to work at all. Apparently these are common frustrations.
In my hunting, I ran across a program called Prince that would seem to do what I want (and quite a bit more), but the price tag on that is way more than I am willing to pay.
I can’t believe that what I want to do is a new or unique thing. I suspect I am just not searching for the right keywords. Is there a better tool/technique out there for converting HTML to a printer-friendly format without spending a ton of money?
I feel your pain. But the only solution I've found that really works is to use a PDF library to write the formatted text to a PDF directly from PHP (or Python or another language, but you mentioned PHP and I've done that). I've used R&OS quite a bit:
http://pdf-php.sourceforge.net/
It may take a little while to get up to speed, but you can do pretty much anything with it, including easily create nicely formatted tables, flowing text and embedded images. The catch is that, with the exception of a few tags like <b></b> and <i></i> you don't get to use any HTML or CSS - essentially you write two output routines, one for HTML and one for PDF.
My first question here :)
I have a report generating website. When the user clicks a button the report is generated in a different sub as a html-file and is written to a txt-file. The html-file is later converted to a PDF in a different sub.
When the report is long (200 pages), I get out of memory exception when the PDF is generated. Memory seams to be allocated by the html generation, since when I convert the html to PDF in a different webform it works perfect.
I have tried to use analysis program like ANTS, but I dont have the knowledge to sort it out.
How can I release the html generation from memory?
Thanks!
/Georg
Your memory from a good component should hopefully get cleared out - however in this case since its a fairly large document it may by OK design but max the memory out. You can
1. Try to increase the memory in IIS available to your worker process
2. http://support.microsoft.com/kb/911716
3. (you didnt specify server version so this is dependent on that) http://support.microsoft.com/kb/820108
With ANTS - there are tutorials on RedGates site discussing its usage. If its a third party component there may not be much you can do except increase the available memory or contact the vendor.
A client wants to "Web-enable" a spreadsheet calculation -- the user to specify the values of certain cells, then show them the resulting values in other cells.
(They do NOT want to show the user a "spreadsheet-like" interface. This is not a UI question.)
They have a huge spreadsheet with lots of calculations over many, many sheets. But, in the end, only two things matter -- (1) you put numbers in a couple cells on one sheet, and (2) you get corresponding numbers off a couple cells in another sheet. The rest of it is a black box.
I want to present a UI to the user to enter the numbers they want, then I'd like to programatically open the Excel file, set the numbers, tell it to re-calc, and read the result out.
Is this possible/advisable? Is there a commercial component that makes this easier? Are their pitfalls I'm not considering?
(I know I can use Office Automation to do this, but I know it's not recommended to do that server-side, since it tries to run in the context of a user, etc.)
A lot of people are saying I need to recreate the formulas in code. However, this would be staggeringly complex.
It is possible, but not advisable (and officially unsupported).
You can interact with Excel through COM or the .NET Primary Interop Assemblies, but this is meant to be a client-side process.
On the server side, no display or desktop is available and any unexpected dialog boxes (for example) will make your web app hang – your app will behave flaky.
Also, attaching an Excel process to each request isn't exactly a low-resource approach.
Working out the black box and re-implementing it in a proper programming language is clearly the better (as in "more reliable and faster") option.
Related reading: KB257757: Considerations for server-side Automation of Office
You definitely don't want to be using interop on the server side, it's bad enough using it as a kludge on the client side.
I can see two options:
Figure out the spreadsheet logic. This may benefit you in the long term by making the business logic a known quantity, and in the short term you may find that there are actually bugs in the spreadsheet (I have encountered tons of monster spreadsheets used for years that turn out to have simple bugs in them - everyone just assumed the answers must be right)
Evaluate SpreadSheetGear.NET, which is basically a replacement for interop that does it all without Excel (it replicates a huge chunk of Excel's non-visual logic and IO in .NET)
Although this is certainly possible using ASP.NET, it's very inadvisable. It's un-scalable and prone to concurrency errors.
Your best bet is to analyze the spreadsheet calculations and duplicate them. Now, granted, your business is not going to like the time it takes to do this, but it will (presumably) give them a more usable system.
Alternatively, you can simply serve up the spreadsheet to users from your website, in which case you do almost nothing.
Edit: If your stakeholders really insist on using Excel server-side, I suggest you take a good hard look at Excel Services as #John Saunders suggests. It may not get you everything you want, but it'll get you quite a bit, and should solve some of the issues you'll end up with trying to do it server-side with ASP.NET.
That's not to say that it's a panacea; your mileage will certainly vary. And Sharepoint isn't exactly cheap to buy or maintain. In fact, short-term costs could easily be dwarfed by long-term costs if you go the Sharepoint route--but it might the best option to fit a requirement.
I still suggest you push back in favor of coding all of your logic in a separate .NET module. That way you can use it both server-side and client-side. Excel can easily pass calculations to a COM object, and you can very easily publish your .NET library as COM objects. In the end, you'd have a much more maintainable and usable architecture.
Neglecting the discussion whether it makes sense to manipulate an excel sheet on the server-side, one way to perform this would probably look like adopting the
Microsoft.Office.Interop.Excel.dll
Using this library, you can tell Excel to open a Spreadsheet, change and read the contents from .NET. I have used the library in a WinForm application, and I guess that it can also be used from ASP.NET.
Still, consider the concurrency problems already mentioned... However, if the sheet is accessed unfrequently, why not...
The simplest way to do this might be to:
Upload the Excel workbook to Google Docs -- this is very clean, in my experience
Use the Google Spreadsheets Data API to update the data and return the numbers.
Here's a link to get you started on this, if you want to go that direction:
http://code.google.com/apis/spreadsheets/overview.html
Let me be more adamant than others have been: do not use Excel server-side. It is intended to be used as a desktop application, meaning it is not intended to be used from random different threads, possibly multiple threads at a time. You're better off writing your own spreadsheet than trying to use Excel (or any other Office desktop product) form a server.
This is one of the reasons that Excel Services exists. A quick search on MSDN turned up this link: http://blogs.msdn.com/excel/archive/category/11361.aspx. That's a category list, so contains a list of blog posts on the subject. See also Microsoft.Office.Excel.Server.WebServices Namespace.
It sounds like you're talking that the user has the spreadsheet open on their local system, and you want a web site to manipulate that local spreadsheet?
If that's the case, you can't really do that. Even Office automation won't help, unless you want to require them to upload the sheet to the server and download a new altered version.
What you can do is create a web service to do the calculations and add some vba or vsto code to the Excel sheet to talk to that service.