I'm looking for an open-source library that can convert Word documents (docx / doc) to PDF without Microsoft.Office.Interop. Similar questions have been asked, but either no solution was given or I didn't find one.
Any suggestions or references would be much appreciated!
You could do this with the Aspose.Words library; however, it is not open source (a paid license is required): https://blog.aspose.com/2020/01/02/convert-word-doc-docx-to-pdf-in-csharp-net-core/
On our project we needed to keep the formatting as close to the original as possible, but none of the plugins we tried came close.
We opted for I Love Pdf utilities.
Word to PDF
They have a well-documented API for several languages (including .NET), and it works great.
You can process 250 files for free every month, and if you need more, it's not that expensive.
Hope this helps
I'm building an application to view PDFs in a browser on mobile devices without needing a plugin. I tried ImageMagick and Ghostscript to convert the pages to images, but the images are far too large and the text becomes unclear. I see websites offering a service that converts PDFs into HTML and does a decent job, but I can't find an example of how this is accomplished. Any help is much appreciated. Thanks!
EDIT: I seem to have read the question backwards. In this case it might be best to parse through the PDF and then format some HTML based on what you find. I believe the javapdf option is capable of this, but I haven't used any of these so I am not sure. If worse comes to worst and you can't find software to disassemble a PDF, you might be able to write your own disassembler in Java or PHP by reading the PDF specification. Best of luck!
http://www.adobe.com/devnet/pdf/pdf_reference.html - PDF Specification (Adobe Modified Version, because they are most popular you may want to support their extensions)
-- OLD -- These websites probably write their own proprietary software to do the trick. If you are truly interested in this undertaking, I would suggest parsing the HTML to get the data and style information and using it to format some sort of PDF writer APIs. A quick Google search yields the following: -- END OLD --
http://www.cutepdf.com/Solutions/
http://ruby-pdf.rubyforge.org/pdf-writer/doc/index.html
http://asprise.com/product/javapdf/
If you want to convert PDF to HTML and plan to run the conversion on a server, you can try pdftohtml, a program packaged as part of poppler-utils. I do not know how the program accomplishes it.
While googling I came across the link below, which explains how scribd.com implements the conversion.
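For a server-side job, the poppler-utils converter (installed as `pdftohtml`) can be driven from a script. This is only a rough sketch: the function names are made up for illustration, and the choice of the `-c` and `-noframes` options is an assumption about what a mobile viewer would want, not a recommendation from the tool's authors.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, out_base):
    """Return the argv list for a single-file, layout-preserving conversion.

    Hypothetical helper: only the program name and the two flags come from
    poppler-utils; everything else is illustrative.
    """
    return [
        "pdftohtml",
        "-c",         # "complex" mode: try to keep the original page layout
        "-noframes",  # write one HTML document instead of a frameset
        str(pdf_path),
        str(out_base),
    ]


def convert(pdf_path, out_base):
    """Run pdftohtml if it is installed; fail with a clear message otherwise."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils first")
    subprocess.run(build_pdftohtml_command(pdf_path, out_base), check=True)
```

Keeping the command construction separate from the call that runs it makes it easy to log or inspect the exact command before anything is executed on the server.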
http://coding.scribd.com/2010/06/01/the-perils-of-stacking/
What are some good authoring tools for creating cross-platform help files for end-users? (Our application is using the Qt framework, if that makes any difference.)
Note: I'm not interested in internal API documentation--we're using doxygen for that.
Ideally, a solution would:
Allow us to manage all help content (text, table of contents, images, etc.) in a single location.
Output to native help formats. (CHM for Windows--or at least something we could feed directly into the HTML Help API; not sure what other platforms' "standard" help formats are.)
Decent WYSIWYG support: handle common text entry, images, cross-references, etc. easily, but we can edit the HTML when we need to.
Text-based file-format for help project (XML, etc.) so that it can be versioned in Subversion.
Any hooks that help keep it in synch with the actual code base would be great. (Perhaps somehow a help topic is associated with a code file, and can check Subversion to see if any changes have been made and flag a topic as "possibly out of date" ... am I dreaming?)
Help content can be localized.
Not opposed to commercial product, but a free option would be nice.
I'll go ahead and make this a wiki and start with a few examples. Vote 'em up or down if you have experience with them, and leave some comments. Add additional tools as well.
I just discovered Sphinx; I think I'm in love.
Better than WYSIWYG over HTML: reStructuredText
Outputs to QtHelp (among other things), so it will be easy to distribute (and integrate) in our application.
Not sure about localization yet, but we'll cross that bridge when we need to.
Was easy to set up and "just works"; looks professional.
I have used RoboHelp for years.
It is fine, but the core technology is very old now. Also the way they lock to Word versions is a total PITA (and has forced me to avoid MS Office upgrades several times).
We are moving to MadCap Flare: http://www.madcapsoftware.com/products/flare/robohelp.aspx
I think DocBook addresses all your requirements except possibly the synchronisation hooks, which I'll think about a bit further. It's essentially an XML vocabulary designed for writing documentation, and it is free and open source. It's just a format plus a set of XSL transforms that convert the DocBook source into more useful formats (HTML and thus CHM, JavaHelp, PDF via XSL-FO or TeX).
This means that you still need to choose an XML authoring tool to actually edit it, so things like WYSIWYG will depend on the features of your XML authoring software. We use Syntext Serna as it has good support for WYSIWYG and inline editing of XML #includes (no-one else seems to support the latter). You may find other XML authoring tools better suit your needs - Serna is a reasonably pricey commercial offering.
DocBook provides a lot of flexibility via profiling, which allows you to include/exclude XML elements based on their attributes. An example use case would be to have slightly different help output for OS=Windows than for OS=Linux. Localization is also supported via profiling and other mechanisms.
A fairly good introduction to DocBook can be found here.
We use DocBook for our help format, and compile it to CHM files that contain help only for the features relevant to a specific product (i.e. the Enterprise edition has features that aren't in the Standard or Demo versions). The relevant steps are:
1. Run the profiling XSL templates on the XML source (using e.g. xsltproc).
2. Run the HTML Help XSL templates on the output of step 1.
3. Compile the output HTML files using Microsoft's HTML Help Compiler (HHC).
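The three steps above could be scripted roughly as follows. This is a sketch under several assumptions: the stylesheet paths follow the usual layout of the DocBook XSL distribution (`profiling/profile.xsl`, `htmlhelp/htmlhelp.xsl`), the HTML Help stylesheet writes its project file under the default name `htmlhelp.hhp`, and `hhc` is on the PATH. The function name, the `profiled.xml` file name and the `os` profile value are made up for illustration.

```python
def docbook_chm_commands(source_xml, xsl_dir, os_profile="windows"):
    """Build the command lines for a profile -> HTML Help -> CHM pipeline.

    Hypothetical helper: it only assembles argv lists and does not run them.
    """
    profiled = "profiled.xml"  # assumed name for the intermediate file
    return [
        # Step 1: keep only the elements whose os attribute matches the profile.
        [
            "xsltproc",
            "--stringparam", "profile.os", os_profile,
            "--output", profiled,
            f"{xsl_dir}/profiling/profile.xsl",
            source_xml,
        ],
        # Step 2: generate the HTML files and the HTML Help project from step 1.
        ["xsltproc", f"{xsl_dir}/htmlhelp/htmlhelp.xsl", profiled],
        # Step 3: compile the project with the HTML Help Compiler.
        ["hhc", "htmlhelp.hhp"],
    ]


if __name__ == "__main__":
    for command in docbook_chm_commands("manual.xml", "/usr/share/xml/docbook-xsl"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run` in order; a per-product build would swap `profile.os` for whatever profiling attribute marks edition-specific content.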
Help & Manual
Robohelp
The only one I know is LaTeX, combined with one of the LaTeX-to-HTML converters and then a few adaptations to make the resulting HTML ready for the CHM archiver.
Text, HTML, CHM, PDF and PS output are no problem.
Converting to Word via RTF used to be a disaster; I don't know the current status.
There are several LaTeX-to-HTML converters, but each has its own problems.
The PDFs look absolutely great.
WYSIWYM editing (via LyX) is possible.
This archive has a bunch of CHMs made that way (notably the prog, ref and user parts; the rest (rtl, fcl, lcl) are generated by our own doxygen equivalent, fpdoc):
http://www.stack.nl/~marcov/doc-chm.zip
Note that the above CHMs are made with our own (portable) CHM compiler. Yes, no more HTML Help Workshop.
A LyX document as PDF and HTML:
pdf: http://www.stack.nl/~marcov/buildfaq.pdf
html: http://www.stack.nl/~marcov/buildfaq/
Is there a pre-existing library to extract plain text from Open XML file formats (e.g. docx, pptx, and xlsx)?
I require this to populate a lucene.net index.
I've found this example which extracts text from docx and it seems to work okay. But before building my own solution based on this I was wondering if there's something already available for the other file formats?
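For context on what such a docx extractor has to do: a .docx file is a ZIP archive whose main body lives in `word/document.xml`, with the visible text stored in `w:t` elements grouped into `w:p` paragraphs. The sketch below shows that idea using only the standard library. It is not the linked example and not a full solution: the function name is made up, and it ignores headers, footers, footnotes and comments.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

The other formats follow the same pattern with different parts: pptx keeps slide text in `a:t` elements under `ppt/slides/`, and xlsx keeps most cell strings in `xl/sharedStrings.xml`, so a similar function per format would be one way to feed an index.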
Before spending cash, it may be worth looking at the IFilter interface - these were/are designed to do exactly what you want.
http://msdn.microsoft.com/en-us/library/ms691105
http://www.codeproject.com/KB/cs/IFilter.aspx
(There are some more links at the bottom of the CodeProject page.)
MS provide IFilters for office file types.
http://www.microsoft.com/downloads/details.aspx?familyid=60c92a37-719c-4077-b5c6-cac34f4227cc&displaylang=en
I know that we use this technology to index PDFs with Lucene, but I did not write the actual code, so I'm afraid I cannot be of much more help.
If your Google-fu is strong I am sure you can dig up more examples of using IFilters to do exactly what you want.
Take a look at aspose.com; they have a good library that handles both ppt and pptx.
You can try Toxy, an open source text/data extraction framework for .NET. For now, it supports xls, xlsx, doc, docx. It will support pptx in version 1.5 very soon.
For details, you can check here
Most of the PDF libraries I've run into require drawing tables, layouts, etc. by hand. Which library can simply print a web page to PDF without requiring much coding? Any pointers will be greatly appreciated.
Free* .Net Tool:
ABC PDF
*ABCpdf is normally priced at $329. However as a special offer we'll give you a free license key - all you have to do is link back to our web site...
The best free solutions I've found are these:
http://code.google.com/p/wkhtmltopdf/
http://www.rustyparts.com/pdf.php (PHP)
The best non-free solutions are here:
http://www.html-to-pdf.net (.NET)
http://www.corda.com/java-pdf.php (Java)
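Of the tools listed, wkhtmltopdf is a command-line program that takes a page URL (or HTML file) and an output path, so "printing" a page needs very little code. A minimal sketch, assuming the `wkhtmltopdf` binary is installed and on the PATH; the function names are hypothetical and the A4 default is just an example.

```python
import subprocess


def build_wkhtmltopdf_command(page_url, pdf_path, page_size="A4"):
    """Return the argv list that renders one web page to one PDF file."""
    return ["wkhtmltopdf", "--page-size", page_size, page_url, pdf_path]


def print_page_to_pdf(page_url, pdf_path):
    """Render the page; raises CalledProcessError if wkhtmltopdf fails."""
    subprocess.run(build_wkhtmltopdf_command(page_url, pdf_path), check=True)
```

From a .NET application the same command line could be started as an external process; the arguments stay the same.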
I have a dotx file that contains placeholders, including image placeholders, to be replaced with DB values.
Should I convert the dotx file to docx first and then replace the placeholders in my .NET application?
Or is there some other way to do the same?
Please guide.
Thank you!
For my current scenario,
I will be using the Aspose library to do the doc/docx/dotx manipulation and then convert the resulting document to PDF after the mail merge. I have downloaded the free 30-day trial version, and it seems to solve all my issues.
I would suggest that anyone planning to use Aspose try the trial version first and then make the decision.