Reliable and free way to convert .docx, .doc, .rtf to PDF in .NET

The problem is that we no longer have MS Office at our university. Earlier we used its
preview feature in Windows File Explorer to view multiple assignments on the fly, just by
clicking on a Word file or navigating with the arrow keys. This let us teachers check
many assignments without having to open each file one by one. Now we have switched to WPS Office,
so the preview feature in Windows Explorer is gone. This has made checking assignments extremely
slow, as we now have to open each file before marking it.
I am a teacher too, and I can code a little, so I want to build a Windows Forms application
that can preview .docx, .doc and .rtf files. I have been trying hard for 1.5 months so far: I used
Pandoc and OpenXML PowerTools, and tried trial versions of libraries like Aspire, Aspose, GemBox and Xceeds.
But so far I have found no free and reliable solution. Some of them spoil the formatting in an unacceptable way. Some of them have trial limits. Most of them do not convert Word's own drawings (the ones students draw inside the Word document itself).
So basically I am looking for a reliable and free resource that converts the whole document
without missing anything. We teachers can still overlook minor formatting differences.
What I have tried: everything listed above, in a WinForms application.
Current best solution working for us: using the Syncfusion DocIO library to convert these file
formats to PDF and then loading that PDF in a CefSharp browser control in WinForms. This takes
only ~2 seconds to load a document and gives nice zooming options. But it's a trial version
and it will eventually expire after the trial period.
NOTE: I cannot use any online or server-based solution, as it would be very slow (consider clicking on a file and then waiting more than 10 seconds for the converted document to load).

Related

formatting html for printing with page numbers

Here’s the basic question…
I have a long HTML document (a contract with 100+ pages) that ultimately needs to be a PDF document with headers and footers (page numbers). What is the best tool/language for making this happen?
Here’s the back story…
I work at a satellite office for a low-tech construction company that issues contracts to subcontractors, and because I am the only one who is able to unjam the printer, I have become the de facto IT person in the company. In the past, to make a contract, someone has had to go through an MS Word document (the boilerplate contract) and type in the necessary information to produce a contract.
About a year ago, I got so frustrated with that methodology that I created an MS Access database where a user could add information using Access forms and then run a mail merge with MS Word to populate a contract. This has been a HUGE improvement, plus we have been able to start tracking money a lot more easily using the other database features. The database is stored on a shared computer in the satellite office. However, this system only works IF the individual users have MS Access and MS Word installed on their individual machines and only if they are physically connected to our local network.
With the success of this system at the satellite office, I am now attempting to create a web-based version of this tool that everyone in the company can use that only relies on standard software on individual machines and can be accessible anywhere.
I have converted a computer into a server for development purposes using XAMPP, created a SQL database, created HTML forms, and am using PHP to run queries. Over the past few months, I have crash coursed my way through myriad languages including CSS, and have finally gotten everything to the point that the system will create an HTML version of the contract with everything populated. Now I just need to format it for printing (ideally to a virtual PDF printer) with headers and footers (page numbers). This should be the easiest part, right?
CSS with @media print rules would, on the surface, appear to be the best way to make this happen, because CSS3 paged media uses rules like “@top-left” and “content: counter(page)” to do everything that I want. However, after investing a lot of time setting everything up, it appears that only Firefox kind of supports this, and IE and Chrome absolutely do not.
Headers and footers overlap body content, and I can’t get the pagination to work at all. Apparently these are common frustrations.
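For reference, the paged-media rules being described look roughly like the stylesheet below. This is only a sketch: `wrap_for_print` and the `.no-print` class are hypothetical names, and the page margins are arbitrary. As noted above, mainstream browsers largely ignore the margin boxes (`@top-left`, `@bottom-right`), so this only pays off with a renderer that implements CSS paged media.

```python
from html import escape

# Stylesheet using CSS paged-media margin boxes for a running header and
# page numbers. The margin values are illustrative assumptions.
PAGED_CSS = """\
@page {
  margin: 25mm 20mm;
  @top-left     { content: "%(title)s"; }
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
@media print {
  .no-print { display: none; }
  h2 { page-break-after: avoid; }
}
"""


def wrap_for_print(title: str, body_html: str) -> str:
    """Wrap already-generated contract HTML in a page with the print stylesheet."""
    # Escape backslashes and double quotes so the title is a valid CSS string.
    css_title = title.replace("\\", "\\\\").replace('"', '\\"')
    css = PAGED_CSS % {"title": css_title}
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title>\n<style>\n{css}</style></head>\n"
        f"<body>\n{body_html}\n</body></html>\n"
    )
```

The body HTML is passed through untouched, so the same generated contract can be shown on screen and handed to a paged-media renderer.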
In my hunting, I ran across a program called Prince that would seem to do what I want (and quite a bit more), but the price tag on that is way more than I am willing to pay.
I can’t believe that what I want to do is a new or unique thing. I suspect I am just not searching for the right keywords. Is there a better tool/technique out there for converting HTML to a printer-friendly format without spending a ton of money?
I feel your pain. But the only solution I've found that really works is to use a PDF library to write the formatted text to a PDF directly from PHP (or Python or another language, but you mentioned PHP and I've done that). I've used R&OS quite a bit:
http://pdf-php.sourceforge.net/
It may take a little while to get up to speed, but you can do pretty much anything with it, including easily creating nicely formatted tables, flowing text and embedded images. The catch is that, with the exception of a few tags like <b></b> and <i></i>, you don't get to use any HTML or CSS - essentially you write two output routines, one for HTML and one for PDF.
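To illustrate the "two output routines" idea, here is a minimal sketch in Python rather than PHP. It does not use R&OS or any real PDF library: `render_pages` just lays the text out as fixed-size pages of lines with a "Page X of Y" footer, which is the part a PDF library would then draw. The `Clause` type and both function names are made up for the example.

```python
import textwrap
from dataclasses import dataclass
from html import escape


@dataclass
class Clause:
    heading: str
    text: str


def render_html(clauses):
    """Output routine 1: the on-screen HTML version."""
    return "\n".join(
        f"<h2>{escape(c.heading)}</h2>\n<p>{escape(c.text)}</p>" for c in clauses
    )


def render_pages(clauses, width=80, lines_per_page=50):
    """Output routine 2: the same clauses as pages of lines with page numbers.

    With a real PDF library, each line would become a draw-text call and each
    inner list a new page.
    """
    lines = []
    for c in clauses:
        lines.append(c.heading.upper())
        lines.extend(textwrap.wrap(c.text, width))
        lines.append("")  # blank line between clauses
    pages = [
        lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)
    ] or [[]]
    total = len(pages)
    return [
        page + [f"Page {n} of {total}".center(width)]
        for n, page in enumerate(pages, start=1)
    ]
```

Because the page count is known before the footers are written, "Page X of Y" comes for free, which is exactly what the browser print path could not deliver.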

Directshow advice for range of functionality or is there a better alternative (.NET)?

I've been doing some work in VB.Net with Directshow over the past 3-4 weeks. I'm creating an application to keep tags on a video and eventually want to be able to extract the tagged parts of the video to a new file. In a video that is 2 hours long I might want to extract say 50 10-15 second "clips" up to 15 times (event tagging). This will be for a free application.
I've found it brilliant (and easy) to render / seek / play clips, etc on XP-Win7 with no issues. I've "discovered" the joys of GraphEdit, creating graphs, the issues with COM in VB.NET, GMFBridge, ....etc.
Now I need some advice. Am I using the right technology? DirectShow seems to be very resistant to the idea of "open video", "seek to clip", "write clip to file", .....repeat for all clips, close file. I can sort of do this already if I visibly render the video, but I would need to do it as a background task, faster than real-time render speed.
Things that seem to be missing are:
- an example of anyone doing anything similar (export multiple clips to a single file)
- no easily available 64bit compressors (lots of 32bit stuff around)
- all the references and examples I do find are VERY old
- VB.NET is not the first "port of call" for DirectShow developers
So, the question is, should I be using something else?
If not, has anyone done anything similar before? I'm not looking for their code, I just want some guidelines, as it takes ages to figure things out in DirectShow and VB.Net just using trial & error (and Google).
I've looked at AFORGE (no sound), FFMPEG (command line toolset), Media Foundation (reluctant to throw away XP) and a variety of commercial helper libraries but not really getting any further.
Apologies for the length but I wanted readers to understand the background.
All help appreciated.
To output clips to a single file, Microsoft created the "DirectShow Editing Services". Sometimes it works, sometimes not. We use it in our software to create videos from clips, like you. With a little bit of work you can also add effects to the video.
It is also possible to use AviSynth. It's a scripting system and frameserver for DirectShow.
As far as I know, you can also create a video from multiple clips with Media Foundation, but I have never tried this.
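The question also mentions FFMPEG as a command-line toolset. As a rough sketch of how the "seek, write clip, repeat, join" loop could be driven from a background task, the Python below only builds the argument lists and the concat list file and does not run anything. The file names, the `clips` work directory and the function names are assumptions. It uses stream copy (`-c copy`), which avoids re-encoding and so runs much faster than real time, at the cost that cuts land on keyframes rather than exact timestamps.

```python
import os
from dataclasses import dataclass


@dataclass
class Clip:
    start: float     # seconds from the start of the source video
    duration: float  # clip length in seconds


def cut_command(source, clip, out_path):
    """ffmpeg arguments that copy one clip out of the source without re-encoding."""
    return ["ffmpeg", "-y", "-ss", f"{clip.start:.3f}", "-i", source,
            "-t", f"{clip.duration:.3f}", "-c", "copy", out_path]


def concat_command(list_path, out_path):
    """ffmpeg arguments that join the files named in a concat list file."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", out_path]


def plan_export(source, clips, out_path, workdir="clips"):
    """Return (commands, list_path, list_text) for exporting all clips to one file."""
    ext = os.path.splitext(source)[1] or ".mkv"
    names = [f"clip_{i:03d}{ext}" for i in range(1, len(clips) + 1)]
    commands = [cut_command(source, c, f"{workdir}/{n}")
                for c, n in zip(clips, names)]
    list_path = f"{workdir}/clips.txt"
    # Entries are bare file names: ffmpeg resolves relative entries against
    # the directory that contains the list file.
    list_text = "".join(f"file '{n}'\n" for n in names)
    commands.append(concat_command(list_path, out_path))
    return commands, list_path, list_text
```

The caller would write `list_text` to `list_path` and then run each command in order, for example with `subprocess.run(cmd, check=True)`.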

ASP.net out of memory help

My first question here :)
I have a report-generating website. When the user clicks a button, the report is generated in a separate sub as an HTML file and written to a txt file. The HTML file is later converted to a PDF in a different sub.
When the report is long (200 pages), I get an out-of-memory exception when the PDF is generated. The memory seems to be allocated by the HTML generation, since converting the HTML to PDF in a different web form works perfectly.
I have tried to use an analysis program like ANTS, but I don't have the knowledge to sort it out.
How can I release the html generation from memory?
Thanks!
/Georg
Memory used by a good component should hopefully get released. However, since this is a fairly large document, the design may be fine and the report may simply be maxing out the available memory. You can:
1. Try to increase the memory in IIS available to your worker process
2. http://support.microsoft.com/kb/911716
3. (You didn't specify the server version, so this depends on that) http://support.microsoft.com/kb/820108
With ANTS, there are tutorials on Red Gate's site discussing its usage. If it's a third-party component, there may not be much you can do except increase the available memory or contact the vendor.
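If the HTML generation step really is what holds the memory (an assumption; the question only suspects it), one general technique is to write the report to disk in small chunks instead of building one huge string first. The sketch below shows the idea in Python rather than ASP.NET, with made-up function names and a plain table layout standing in for the real report.

```python
from html import escape
from typing import Iterable, Iterator, Sequence


def html_chunks(title: str, rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield the report one small piece at a time instead of one big string."""
    yield f"<html><head><title>{escape(title)}</title></head><body><table>\n"
    for row in rows:
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        yield f"<tr>{cells}</tr>\n"
    yield "</table></body></html>\n"


def write_report(path: str, title: str, rows: Iterable[Sequence[object]]) -> int:
    """Stream the report to a file; returns the number of characters written."""
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for chunk in html_chunks(title, rows):
            written += out.write(chunk)
    return written
```

Only one row is in memory at a time, and the file is closed before the PDF conversion starts, so nothing from the generation step needs to stay alive.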

Lotus Smartsuite to "something newer"

I shall try and keep my scenario as brief as possible and to the point.
The office I’m currently working for uses Lotus Smartsuite on Windows 98 / XP, using lots of Lotus Script to tie together Lotus 123 and Lotus Word Pro documents. They also make heavy use of the Lotus Object Linking functions. I shall describe its behaviour below:
You can fill rows and columns in a 123 Spreadsheet with data galore, style it and format it any way you like and define it as a range (nothing unique here). However, you can then copy that range and paste it as a link in a Lotus Word Pro document. This link is then categorised by its range name, so expanding the range back in the 123 file causes the table in the Word Pro Document to expand. This link also carries with it all the formatting and styling of the cells in the 123 Spreadsheet. As I imagine you are now aware, this link is completely live, you can double click anywhere in the object and it opens up the 123 file for editing, and all changes go backward and forward between the two documents. Most of the data retrieved from testing equipment is stored in these 123 spreadsheets and then parts of that are linked into a final Lotus Word Pro report document sent to the customer.
Note: Just to be clear, this is NOT the same as a DDE link in Open Office, which seems to allow for copying of a non-defined range of cells to be imported into a document where all formatting is lost and editing back and forth is not straightforward. It also behaves differently from an OLE object, which seems to only import the entire spreadsheet rather than a small subsection of it.
However, in recent years, supporting this older software (Lotus) has become more difficult, especially with regard to sending customers documents (Lotus Word Pro files are generally unsupported by more modern office tools), and technical support for Lotus Smartsuite seems to be practically non-existent these days. Also, since the scripting language is no longer practised by mainstream IT technicians, ongoing development and support seems futile. Once the guys who wrote it move on to other things, we will be left with spaghetti script in a language nobody can help us with.
So, we have this goal of "modernising" our IT system by the end of the year. Linux is becoming a very viable option too (no doubt Debian or a derivative), but Open Office doesn't seem to have the linking capability mentioned above. The reason this linking is so important is that the veterans of the office are so used to working this way - storing data in the spreadsheet, linking back to it later in their Word Pro documents, etc. I think they are more than keen to keep this practice going, and we have found no equivalent of it in modern office tools (as was requested of me). I can see, as a software engineer (fluent in many languages), how this practice is not the safest or best way of using and storing data (databases spring to mind), but I was wondering if someone could give me a few other good reasons as to why this is bad practice in the workplace (I was always of the belief that you should keep your data away from your reporting and formatting, the two should never be entwined - this looks like spreadsheet hell to me) ... or why this is a good thing to keep doing!?
So, for those of you still with me, I guess what I am asking is:
1. Is this practice of storing data, formatting it in spreadsheets and importing that directly back and forth between word documents good or bad, and what can be done about it? I guess I'll need to prove my case either way for this.
2. Are there ANY modern alternatives to this linking method (regardless of whether it is good or bad practice) out there for Linux or Windows? This link MUST carry formatting as well as dynamic range sizes (DDE links don't seem to be the answer).
3. What would your solution be if you had to start from scratch? Store everything in databases and use SQL to simply ask for the data you need in your word documents? How would you do this? What software would you use?
Any help with this scenario would be more than helpful, or if you know anywhere I should go to ask for advice, that would be appreciated too.
Thank-you for reading!
My suggestion is to first take a step back. What is the benefit to the way things are done now? Is it just a habit that is tough to break? Is there any reason the documents and spreadsheets need to be maintained and linked the way they are, or is it just a requirement because 'that's how it was done before'?
If you can remove that requirement, you have a lot more options and you're building a system that's easier to understand and maintain.
Regarding question 1, I believe there's nothing wrong with storing data in spreadsheets, especially if the end-users need to create and maintain them and development staff is limited. Some questions are whether that data needs to be secured, is related between spreadsheets, is duplicated across the company, or should be shared in a better way across the company. If any of those are true then a centralized database would make more sense. Personally I'd want any valuable data safely stored in a database where it can be managed, access to it can be controlled, it can be easily backed-up, etc.
Regarding question 2, you can do the same thing in Microsoft Office. You can either link the documents, so that the data stays in the source Excel file but appears in the Word document, or you can embed the Excel spreadsheet within the Word document.
You might want to look at Microsoft Access for storing the data and generating reports. Or you could build an application using a relational database back-end and reporting front-end. The possibilities are wide-open. It really depends on where the expertise lies within the company.
If it were me I'd probably use a SQL Express back-end (it's free) and a custom ASP.NET MVC application for generating the reports, but that's just where my expertise lies.
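As a very small illustration of the "data in a database, report generated from a query" approach, here is a sketch in Python with SQLite standing in for SQL Express, purely so that it is self-contained. The `readings` table, its columns and the function names are invented for the example. The table in the report grows with the data, which is roughly what the dynamic range link provides today.

```python
import sqlite3
from html import escape


def create_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table that replaces the spreadsheet range."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "report TEXT NOT NULL, sample TEXT NOT NULL, "
        "value REAL NOT NULL, unit TEXT NOT NULL)"
    )


def add_reading(conn, report, sample, value, unit) -> None:
    """Store one measurement from the testing equipment."""
    conn.execute(
        "INSERT INTO readings (report, sample, value, unit) VALUES (?, ?, ?, ?)",
        (report, sample, value, unit),
    )


def report_table(conn, report) -> str:
    """Build the HTML table for one report; formatting lives here, not in the data."""
    rows = conn.execute(
        "SELECT sample, value, unit FROM readings WHERE report = ? ORDER BY rowid",
        (report,),
    ).fetchall()
    body = "".join(
        f"<tr><td>{escape(s)}</td><td>{v:.2f}</td><td>{escape(u)}</td></tr>"
        for s, v, u in rows
    )
    header = "<tr><th>Sample</th><th>Value</th><th>Unit</th></tr>"
    return f"<table>{header}{body}</table>"
```

Adding a reading and regenerating the report is the database equivalent of expanding the named range and watching the linked table expand.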

Large HTML documents to PDF

I'm working with an ASP.NET application that produces large PDF documents from HTML. The content is perhaps complex (detailed grid-type listings, CSS styled, running to 40+ pages) compared to typical usage. None of the libraries we've tried are performing adequately. Typically a 40-page document takes upwards of a minute to render on a powerful multi-core machine.
We are able to decouple the generation from the web application and also pre-generate documents in some cases. Still, the frequency with which content changes requires a faster solution.
So, does anyone have experience of a PDF generation component that can output a content heavy 40 page document in seconds rather than minutes? Or are our expectations unrealistic?
NB: I'd rather not "out" the poorly performing components here, as we are seeking support from vendors to make improvements. I've reviewed questions previously posted on StackOverflow and none appear to deal with this type or size of document.
An option might be to not convert HTML to PDF and take another approach. We use the ActiveReports reporting tool, which generates PDFs. It's pretty powerful when using sub-reports for multi-dataset reports, and it integrates completely with Visual Studio.
This means that you would need to rebuild the report to produce the same data that you see on-screen. This is sometimes not such a bad thing as you can style up the report specifically for printing.
PDFs can be generated via a back-end service and/or emailed or produced on the fly to the browser.
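Whichever component does the rendering, the pre-generation mentioned in the question can be made cheaper by re-rendering only when the content has actually changed. The sketch below keys rendered output on a hash of the HTML. It is an in-memory Python illustration only: `PdfCache` is a made-up name, the `render` callable stands in for whatever PDF component is used, and a real service would persist the results rather than keep them in a dictionary.

```python
import hashlib
from typing import Callable, Dict


class PdfCache:
    """Render each distinct HTML document once and reuse the result afterwards."""

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render            # the slow HTML-to-PDF step
        self._store: Dict[str, bytes] = {}
        self.renders = 0                 # how many times the slow path ran

    def get(self, html: str) -> bytes:
        # Identical content always hashes to the same key, so unchanged
        # documents never reach the renderer a second time.
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._render(html)
            self.renders += 1
        return self._store[key]
```

This does nothing for the first render of a changed document, so it complements a faster component rather than replacing one.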

Resources