I am working on a web application that takes MS Office documents (Word, Excel, PowerPoint) as input and generates PDF documents. While it's possible to create an accessible PDF using the API/library I am currently using, I am looking for an API/library that will help me scan the input document (Word, PowerPoint, Excel) for accessibility compliance.
If the input document itself lacks the semantic metadata needed for accessibility, the resulting PDF will not be accessible either.
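To give a feel for what such a scan involves, the sketch below checks one common criterion in a .docx file: pictures without alternative text. A .docx is a ZIP package, and each drawing in `word/document.xml` carries a `<wp:docPr>` element whose optional `descr` (and `title`) attributes hold the alt text. This is a minimal illustration under those assumptions, not a compliance checker: it ignores headers, footers, heading structure, table headers and document language, and the function name `images_missing_alt_text` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the DrawingML "wordprocessingDrawing" elements (wp:docPr etc.).
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def images_missing_alt_text(docx_bytes: bytes) -> list[str]:
    """Return the names of drawings in the main document part that lack alt text.

    Hypothetical helper: only word/document.xml is inspected.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    missing = []
    # Every inline or anchored picture has a <wp:docPr> with optional descr/title.
    for doc_pr in root.iter(f"{{{WP_NS}}}docPr"):
        alt = (doc_pr.get("descr") or doc_pr.get("title") or "").strip()
        if not alt:
            missing.append(doc_pr.get("name", "(unnamed)"))
    return missing
```

The same "open the package and inspect the XML" approach extends to .xlsx and .pptx, because they are also Open XML packages. The part names and elements differ, however.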
MS Word itself has a scripting interface for VBA (Windows/Mac) and AppleScript (Mac only). I'm not sure how far you can get with those, but I seem to remember that they both expose a lot about Word documents, so this is a possible pathway.
LibreOffice has a Python scripting interface; this may be another viable approach.
There are certainly command-line tools which can manipulate word files in various ways. Try this post:
Creating & Editing MS-Word documents on a linux server?
I am looking for an easy solution to convert documents from one format (DOC, HTML, XLS...) to PDF in ASP.NET.
Is iTextSharp a good choice? Can iTextSharp convert documents from one format to PDF?
What library would you suggest?
I've been using winnovative for all my PDF generation for the past few years:
http://www.winnovative-software.com/
It has a fair few good features and is simple to implement, if you don't mind paying for a license.
The default standard for this task should be Microsoft Office SharePoint Server. Another option would be to use Microsoft Office applications from ASP.NET with Automation, combined with a PDF printer (you will need a copy of Microsoft Office installed on the server). There are many PDF printers out there (CutePDF, for example), but if you can afford a commercial option I recommend Amyuni PDF Converter. There are samples of Word/Excel to PDF conversion using Office + Automation with this product.
I work as a Developer Evangelist at Aspose. You may try the Aspose.Total for .NET product suite, which allows you to convert various file formats (DOC/DOCX/PPT/PPTX/XLS/HTML etc.) into PDF format. You can also pick only the components you need. Complete samples, tutorials and support are available for these components.
Please note that these components are standard .NET assemblies and you can use them either in ASP.NET or Windows Forms applications.
Give the Muhimbi PDF Converter Services a look. It installs in your environment as a scalable and robust Windows Service and has specifically been designed for use from server based applications such as ASP.NET.
It comes with a friendly web services based interface that allows it to be used from most modern environments such as Java and .NET. It supports all common as well as some not so common file formats. Watermarking and PDF Security is included as well. If you have SharePoint in your environment then a SharePoint optimised version is available as well.
Disclaimer: I have worked on this product, so the usual disclaimers apply. Having said that, it works great.
I have been doing dynamic PDF creation via ASP.NET for some time, in the form of HTML to PDF conversion. It works well for us, but the State requires that everything we publish be accessible. For static PDFs, we simply "tag" the files manually using Adobe's accessibility tools. Of course, this does not work for dynamically created files, and the PDFs that I create dynamically fail the Acrobat Pro Accessibility test.
Does anyone have any ideas about creating PDFs dynamically in ASP.NET that are tagged and can pass the Adobe Accessibility test? I have researched many components, but none that I have found support tagging.
Thanks.
I would look seriously at iText. AFAIK, this is the definitive library for creating dynamic PDFs, for Java and .NET.
You will need the book iText in Action.
Here's a quote from iText in Action on accessibility:
"You can use iText to create a document that passes all the criteria that are listed in Section 508."
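Whichever library is used, a quick first check on generated output is to confirm that the file declares any tagging at all. A tagged PDF's document catalog contains `/MarkInfo << /Marked true >>` and a `/StructTreeRoot`. The Python sketch below is only a crude heuristic under that assumption. The name `looks_tagged` is hypothetical. If the catalog sits in a compressed object stream (PDF 1.5+), the keys do not appear as plain bytes, so a `False` can be a false negative. A `True` only means the structure tree exists, not that the file passes Acrobat's full accessibility check.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: does the raw file mention a structure tree and /Marked true?"""
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # /MarkInfo << /Marked true >> marks the document as a tagged PDF.
    marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and marked
```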
I'm trying to (HTTP) upload a binary file programmatically from within VBA. I intend to put an ASPX page on the server to accept the file and certain additional parameters.
I know there are lots of nice ways to do that (e.g. use web service instead of aspx), but my constraint is that it must run in VBA (in an excel file), and that I cannot install any additional components on the client.
So I guess I'll use WinHTTP, and I've found several examples to post form data, but not to post a binary file. I probably need to base64 the file contents?
So my questions are:
Do I need to do the encoding manually or can I make WinHTTP do that?
Is there a better utility to use than WinHTTP? (Remember I don't want to install any additional software, it must be shipped with WinXP Pro, Office 2007 or a .NET framework, e.g.)
Is there a better way to go, e.g. using ASP.NET web services?
Thx, chiccodoro
You may use base64, but typically writing binary is easier.
The hurdle you have to clear is constructing a valid multipart/form-data POST. This is completely possible using WinHTTP, but be warned that it is not trivial. I have not done it in years and am not set up to provide sample code.
You can refer to the following articles for examples of how to do this with C# HttpWebRequest. The WinHTTP API is a bit different, of course, but the salient point to take away from the articles is the structure of the POST body.
C# File Upload with form fields, cookies and headers (by yours truly)
UploadFileEx: C#'s WebClient.UploadFile with more functionality (a bit more procedural and may be easier to suss out the format)
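The body structure those articles describe can be sketched language-neutrally. The Python below builds a multipart/form-data body by hand. Each part starts with `--boundary` and its own headers, the file bytes are written raw (no base64), and the body ends with `--boundary--`. The helper name `build_multipart` and its parameters are hypothetical. The generated boundary is a random UUID on the assumption that it will not occur inside the file data.

```python
import uuid


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes, boundary: str = "") -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data POST."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Ordinary form fields: one part each, value after a blank line.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
            + value.encode() + b"\r\n"
        )
    # The file part: raw binary bytes, no base64 needed.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    # Closing delimiter: the boundary followed by two dashes.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

From VBA, the same byte layout would be assembled in a byte array and sent with WinHTTP, with the Content-Type request header set to the returned value. On the ASPX side, the file then arrives in the request's uploaded-files collection like a normal browser upload.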
Typically I provide sample code, but as I said, I do not have any stone-age tools set up right now ;-).
HTH
I'm looking for a way to export a Word document as a PDF. I would like to do this without the use of a "software printer" (such as CutePDF, etc.) and stick to reference assemblies if at all possible. I'm using the Microsoft Office Interop Assemblies to generate a Word document, which I save to a temporary directory. So it's not necessary for this solution to interact directly with Microsoft Office, unless it needs to.
Office 2007 has a built-in (or add-on) converter to PDF, so you can save Office 2007 files to PDF without much hassle.
Otherwise, you'll have to use some sort of conversion assembly (there should be commercial assemblies that perform this task), a conversion application that accepts command-line arguments, or maybe even a web-based service for Office-to-PDF conversion.
Similar questions have been asked, but nothing exactly like mine, so here goes.
We have a collection of Microsoft Word documents on an ASP.NET web server with merge fields whose values are filled in as a result of user form submissions. After the field merge, the server must convert the document to PDF and stream it down to the browser. Our first inclination was to use the Visual Studio Tools for Office API; however, we ran into this warning from Microsoft:
Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.
It looks like the field manipulation can be done using the Open XML SDK, but what's the best way to convert Word 2007 documents to PDF without opening Word? The optimal solution would be low-cost, scalable, have a low memory footprint, be easy to deploy, and have a .NET API.
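For the field-manipulation half, the idea is to edit `word/document.xml` directly instead of automating Word. The Open XML SDK is a .NET library. As a language-neutral illustration, here is a Python sketch that assumes the merge fields are stored as simple fields (`<w:fldSimple w:instr=" MERGEFIELD Name ">`) rather than as complex `fldChar`/`instrText` runs. Each matching field is replaced with a plain run holding the value, which drops the field's original run formatting. The function name `fill_simple_merge_fields` is hypothetical. ElementTree also does not preserve unused namespace declarations, so on real documents a library that round-trips the part faithfully (for example lxml, or the Open XML SDK itself) is the safer choice. The PDF conversion step is still needed afterwards.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # serialize with the conventional "w:" prefix
FLD_SIMPLE = f"{{{W}}}fldSimple"
MERGEFIELD_RE = re.compile(r'\s*MERGEFIELD\s+"?([^\s"]+)')


def fill_simple_merge_fields(document_xml: str, values: dict[str, str]) -> str:
    """Replace <w:fldSimple> MERGEFIELDs with plain runs containing the values."""
    root = ET.fromstring(document_xml)
    # Snapshot the element list first, because we mutate children below.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != FLD_SIMPLE:
                continue
            match = MERGEFIELD_RE.match(child.get(f"{{{W}}}instr", ""))
            if not match or match.group(1) not in values:
                continue
            # Build <w:r><w:t xml:space="preserve">value</w:t></w:r>.
            run = ET.Element(f"{{{W}}}r")
            text = ET.SubElement(run, f"{{{W}}}t")
            text.set("{http://www.w3.org/XML/1998/namespace}space", "preserve")
            text.text = values[match.group(1)]
            # One-for-one swap, so the remaining indices stay valid.
            parent.remove(child)
            parent.insert(index, run)
    return ET.tostring(root, encoding="unicode")
```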
It's not exactly open source, but Aspose has a couple of products that can do that:
Aspose.Pdf.Kit
Aspose.Pdf.Kit is a non-graphical PDF® document manipulation component that enables both .NET and Java developers to manage existing PDF files as well as manage form fields embedded within PDF files. Aspose.Pdf is perfect for creating new PDF files; however, developers often need to edit already existing PDF documents. Aspose.Pdf.Kit allows them to do just that. Aspose.Pdf.Kit allows developers to create powerful applications for merging data directly into PDF documents as well as for updating and managing PDF documents. Aspose.Pdf.Kit is a wonderful product and works great with the rest of our PDF products.
and Aspose.Pdf
Aspose.Pdf is a non-graphical PDF® document reporting component that enables either .NET or Java applications to create PDF documents from scratch without utilizing Adobe Acrobat®. Aspose.Pdf is very affordably priced and offers a wealth of strong features including: compression, tables, graphs, images, hyperlinks, security and custom fonts. Aspose.Pdf supports the creation of PDF files through API, XML templates and XSL-FO files. Aspose.Pdf is very easy to use and is provided with 14 fully featured demos written in both C# and Visual Basic.
Check out the API and demos. You can download a DLL for free to try it out. I've used both before and they work out great.
There's also iTextSharp, which is a C# port of iText, a Java PDF library. I've heard of people trying it with mixed results.
The question is "MS Word Documents to PDF in ASP.NET" so I am very puzzled why Aspose.Pdf and Aspose.Pdf.Kit are recommended above. You need to use Aspose.Words because that's the component that supports Microsoft Word documents to PDF conversion.
Check out Microsoft's resource on Saving Word 2007 Documents to PDF and XPS Formats using C# or VB.
ActivePdf DocConverter - http://www.activepdf.com/
But it requires Office installed on the server for good quality conversion.
Aspose.Words may be the best option for you, but it doesn't convert all visual elements perfectly.
Have a look at the Muhimbi PDF Converter Web Services. It runs on Windows as a service, but it can be accessed from any environment capable of calling web services, including Java and .NET, even on non-Windows platforms.
Although this solution requires MS Office to be installed on a server (not necessarily the same server as your application), it is very robust and provides perfect conversion fidelity. It goes to great lengths to get around the deadlock problems Microsoft refers to in their KB article.
To generate or Modify MS-Word files I recommend using the free Open XML SDK for Microsoft Office. Eric White maintains a really good Blog about it.
Disclaimer: I worked on this product. Having said that, it works great.
You should try using OpenOffice for this. It is Free and supports a whole range of file conversions. I have used it to convert DOC & DOCX files to HTML format with fantastic results.
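As a sketch of that approach, assuming LibreOffice (the OpenOffice fork), whose `soffice` binary accepts `--headless --convert-to pdf --outdir <dir>`, a server can shell out to it once per document. The helper names below are hypothetical. The output file is assumed to take the input's base name with a `.pdf` extension.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Command line for a headless LibreOffice DOC/DOCX -> PDF conversion."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(source)]


def convert_to_pdf(source: Path, out_dir: Path, timeout: float = 120) -> Path:
    """Run the conversion; raises CalledProcessError/TimeoutExpired on failure."""
    subprocess.run(build_convert_command(source, out_dir), check=True, timeout=timeout)
    # Assumption: LibreOffice names the output after the input's stem.
    return out_dir / (source.stem + ".pdf")
```

One design point to be aware of: concurrent `soffice` processes that share a user profile tend to interfere with each other. On a busy server, conversions are usually serialized or each worker is given its own profile directory via the `-env:UserInstallation=file:///...` option.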
ABCpdf is another popular component that'll let you convert Word documents to PDF under ASP.NET, however I believe it too makes use of Microsoft Office or OpenOffice.
http://www.websupergoo.com/abcpdf-office-docs.htm
The Microsoft PDF add-in for Word seems to be the best solution for now, but bear in mind that it does not convert all Word documents correctly to PDF, and in some cases you will see a huge difference between the Word document and the output PDF. Unfortunately, I couldn't find any API that would convert all Word documents correctly. The only way I found to ensure the conversion was 100% correct was to convert the documents through a printer driver. The downside is that documents are queued and converted one by one, but you can be sure the resulting PDF is exactly like the Word document. I personally preferred using UDC (Universal Document Converter) and also installed Foxit Reader (free version) on the server, then printed the documents by starting a "Process" and setting its Verb property to "print". You can also use FileSystemWatcher to signal when the conversion has completed.
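One subtlety with watching for the output: a printer driver usually creates the PDF file before it has finished writing it, so a "file created" notification can fire too early. The Python sketch below is a polling stand-in for FileSystemWatcher that waits until the file exists and its size has stopped changing between two polls. It assumes a stable size means the driver is done, which is a heuristic rather than a guarantee. The function name is hypothetical. On Windows, Python's `os.startfile(path, "print")` is the counterpart of starting a Process with the "print" verb.

```python
import os
import time


def wait_for_stable_file(path: str, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Return True once path exists with a non-zero size unchanged across two polls."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Same non-zero size as last time: assume the driver has finished writing.
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(poll)
    return False  # timed out: file never appeared or kept growing
```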