I'm building an application to view pdf's through a browser without the need of a plugin on mobile devices. I tried ImageMagick and ghostscript to covert the pages to images but they are far too large and text becomes unclear. I see website offering a service of converting pdf's into html and do a descent job but I can't find an example of how this is accomplished. Any help is much appreciated. Thanks!
EDIT: I seem to have read the question backwards. In this case it might be best to parse through the PDF and then format some HTML based on what you find. I believe the javapdf option is capable of this, but I haven't used any of these so I am not sure. If worse comes to worst and you can't find software to disassemble a PDF, you might be able to write your own disassembler in Java or PHP by reading the PDF specification. Best of luck!
http://www.adobe.com/devnet/pdf/pdf_reference.html - PDF Specification (Adobe Modified Version, because they are most popular you may want to support their extensions)
-- OLD -- These websites probably write their own proprietary software to do the trick. If you are truly interested in this undertaking, I would suggest parsing the HTML to get the data and style information and using it to format some sort of PDF writer APIs. A quick Google search yields the following: -- END OLD --
http://www.cutepdf.com/Solutions/
http://ruby-pdf.rubyforge.org/pdf-writer/doc/index.html
http://asprise.com/product/javapdf/
If you are looking at converting PDF to HTML and planning to run the conversion on a server, then you can try pdf2html. It is a program packaged as part of poppler-utils. I do not know how the program accomplishes it.
I was googling and came across the below link explaining how scridb.com implements conversion.
http://coding.scribd.com/2010/06/01/the-perils-of-stacking/
Related
I'm looking for a plugin which can convert word (docx / doc) to pdf Without Microsoft.Office.Interop and Open Source one. There are questions asked on it but no solution is provided or I didnt found any.
Any suggestion or references will be much appreciated!.
You could do this using Aspose.Words project, however this library is not an opensource (license is required and cost some money): https://blog.aspose.com/2020/01/02/convert-word-doc-docx-to-pdf-in-csharp-net-core/
On our project we needed to keep formating as close as the original. But every plugin we tried never came close to the original.
We opted for I Love Pdf utilities.
Word to PDF
They have a well documented API for some language (including .Net) and it works great.
You can process 250 files freely every month and if you need more, it's not that expensive.
Hope this helps
Is there a way to avoid working with the command-line installing and using LESS??
There are several offers for GUIs for the compiling-phase, but I did not find a way for the Installation-Phase.
I have been working in the IT-business for so many decades (more in the mainframe and midrange area and as a project-manager and programmer in the application development) and could by now avoid to go as far down to the command-line-world.
I did develop quite fine Websites using HTML5 and CSS3 and doint this I felt a desire for all that, what LESS and/or SASS are offering and the Syntax and logics dont look difficult to handle. But I fail in the first step of just installing it.
The LESS-Website offers command-lines to key in. But I am not sure, if this will be all I have to key in, but only the significant line to be embedded in a sequence of other commands very familiar to all those working at this Level.
How do I e.g. define the place to store the Installation and to refer to in the href in the link-Statement of my html-file .... ??
Thanks
Gerhard (from Vienna/Austria, living in Trier, Germany)
Less is a CSS pre-processor. if you are include less.js in you html page
You can use less directly in to your html page.
Other ways you can use less compiler
Kola this is an open source application it will help you to compile less to css
Your Topics are clear to me. I even downloaded Koala already and I have no Problem in including less.js in my html. And I have read Bass Jobsens book about the Syntax, which does not seem to raise great Problems to me.
But before working with it, I will have to download LESS -what I have done from the Less-Website to the Folder of my choice. My Problem is the next necessary step: To install this downloaded program. There is no install.exe or something like that. The book as well as the info in the less-Website tell me to key some crpytic commands into the command-line.
What are some good authoring tools for creating cross-platform help files for end-users? (Our application is using the Qt framework, if that makes any difference.)
Note: I'm not interested in internal API documentation--we're using doxygen for that.
Ideally, a solution would:
Allow us to manage all help content (text, table of contents, images, etc.) in a single location.
Output to native help formats. (CHM for Windows--or at least something we could feed directly into the HTML Help API; not sure what other platforms' "standard" help formats are.)
Decent WYSIWYG support: handle common text entry, images, cross-references, etc. easily, but we can edit the HTML when we need to.
Text-based file-format for help project (XML, etc.) so that it can be versioned in Subversion.
Any hooks that help keep it in synch with the actual code base would be great. (Perhaps somehow a help topic is associated with a code file, and can check Subversion to see if any changes have been made and flag a topic as "possibly out of date" ... am I dreaming?)
Help content can be localized.
Not opposed to commercial product, but a free option would be nice.
I'll go ahead and make this a wiki and start with a few examples. Vote 'em up or down if you have experience with them, and leave some comments. Add additional tools as well.
I just discovered Sphinx; I think I'm in love.
Better than WYSIWYG over HTML: reStructuredText
Outputs to QtHelp (among other things), so will be easily to distribute (and integrate) in our application.
Not sure about localization yet, but we'll cross that bridge when we need to.
Was easy to set up and "just works"; looks professional.
I have used robohelp for years.
It is fine, but the core technology is very old now. Also the way they lock to Word versions is a total PITA (and has forced me to avoid MS office upgrades several times).
We are moving to madcap flare http://www.madcapsoftware.com/products/flare/robohelp.aspx
I think DocBook addresses all you requirements except possibly the synchronisation hooks, which I'll think a bit further on. It's essentially a subset of XML designed for creating documentation, and is free and open source. It's just a format plus a set of XSL output transforms that convert the Docbook into more useful formats (HTML and thus CHM, JavaHelp, PDF via XML-FO or Tex).
This means that you still need to choose an XML authoring tool to actually edit it so things like WYSIWYG will depend on the features of your XML authoring software. We use Syntext Serna as it has good support for WYSIWYG and inline editing of XML #includes (no-one else seems to support the latter). You may find other XML authoring tools better suit your needs - Serna is an reasonably pricey commercial offering.
Docbook provides a lot of flexibility via profiling, which allows you to include/exclude xml elements based on their attributes. Example use cases would be to have slightly different help output for OS=Windows than OS=Linux. Localization is also supported via profiling and other mechanisms.
A fairly good introduction to Docbook can be found here.
We use Docbook for our help format, and compile it to CHM files that contain help only for the features relevant to a specific product (ie Enterprise edition has features that aren't in the Standard or Demo versions). The relevant steps are:
Run the Profiling XSL templates on the XML Source (using eg XSLTproc).
Run the HTML-Help XSL templates on the output of 1.
Compile the output HTML files using Microsoft's HTML Help Compiler (HHC).
Help & Manual
Robohelp
The only one I know is Latex, one of the latex2html converters, and then a few adaptation to make the resulting html ready for the CHM archiver.
text,html,chm,pdf, ps no problem.
Converting to Word via RTF used to be a disaster, don't know current status.
latex 2 html converters, while several, all have their own problems.
The pdfs look absolutely great.
WYSIWYM (via lyx) possible.
This archive has a bunch of CHMs that way (notably the prog,ref and user parts, the rest (rtl,fcl,lcl) are generated by our own doxygen equivalent, fpdoc)
http://www.stack.nl/~marcov/doc-chm.zip
Note that the above CHMs are made with our own (portable) CHM compiler. Yes, no more workshop.
A Lyx document as PDF and html:
pdf: http://www.stack.nl/~marcov/buildfaq.pdf
html: http://www.stack.nl/~marcov/buildfaq/
Most of the print pdf library I ran into requires drawing tables, layouts etc. Which library can simply print the web page in pdf format without requiring too much coding? Any pointers will be greatly appreciated
Free* .Net Tool:
ABC PDF
*ABCpdf is normally priced at $329. However as a special offer we'll give
you a free license key - all you have
to do is link back to our web site...
The best solution that is free that I've located is this:
http://code.google.com/p/wkhtmltopdf/
http://www.rustyparts.com/pdf.php (PHP)
The best non-free solution is here:
http://www.html-to-pdf.net (.NET)
http://www.corda.com/java-pdf.php (Java)
I'm currently looking at different solutions getting 2 dimensional mathematical formulas into webpages. I think that the wikipedia solution (generating png images from LaTeX sourcecode) is good enough until we get support for MathML in webbrowsers.
I suddenly realized that it might be possible to create a Google Charts API equivalent for mathformulas. Has this already been done? Is it even possible due to the strange characters involved in LaTeX-code?
I would like to hit an url like latex2png.org/api/?eq="E = mc^2" and get the following response:
edit:
Thanks for the answers sofar. However, I am already aware of several tools to generate png images from latex source code (both online and from my commandline), but what I was looking for was a simple way to get the image via an Http GET request. Perhaps such a service does not exist.
Update
As #hughes (and others) pointed out, the previous Google Chart API has been deprecated.
The example I wrote still works as of Sept 2015, but a new one shall be used now (documentation):
Old answer
Google Chart can do it (Documentation):
http://chart.apis.google.com/chart?cht=tx&chl=%5CLaTeX
I'm using this with Google Docs, because it doesn't support math yet.
chart.apis.google with background color changed
https://chart.apis.google.com/chart?cht=tx&chf=bg,s,FFFF00&chl=%0D%0A4x_0%5CDelta%28x%29%2B3%5CDelta%28x%29%2B2%5CDelta%28x%5E2%29%3E0%0D%0A
or chart.apis.google with background color transparent and resized
For better readability URL needs to be decoded.
https://chart.apis.google.com/chart?cht=tx&chs=428x35&chf=bg,s,FFFFFF00&chl=
4x_0\Delta(x)+3\Delta(x)+2\Delta(x^2)>0
Data structure looks like this
{
"cht":"tx",
"chs":"428x35",
"chf":"bg,s,FFFFFF00",
"chl":"n4x_0\Delta(x) 3\Delta(x) 2\Delta(x^2)>0"
}
https://chart.apis.google.com/chart?cht=tx&chs=428x35&chf=bg,s,FFFFFF00&chl=%0D%0A4x_0%5CDelta%28x%29%2B3%5CDelta%28x%29%2B2%5CDelta%28x%5E2%29%3E0%0D%0A
You could try the Online image generator for mathematical formulas for a start.
mathurl is a mathematical version of TinyURL.com. It allows you to reference LaTeXed mathematical expressions using a short url. For example, http://mathurl.com/?5v4pjw will show [LaTeX output Image] which you can then edit. More details on mathurl’s help page
I just ran across MathJax on Ajaxian [via Wayback Machine]:
MathJax seems to have a chance at being a practical solution that offers a high quality display of LaTeX and MathML math notation in HTML pages.
The output is remarkably beautiful, and it's all pure HTML and CSS, which makes it scalable and selectable. Performance is currently a bit sluggish, but this is recognized.
As everyone has said, there are many services that do this already. Here is another easy one that I've used a number of times (and you can install it locally on your server if necessary):
http://www.codecogs.com/components/equationeditor/equationeditor.php
I'd take a good look at how the MediaWiki LaTeX support does it and borrow from there.
Please check out this site for a way to create TeX documents without any software installed. You can then snippet the result image with any screen capture method and embed the resulting image into a any website.
Go to http://sharelatex.com
The software is free to use, but you need to register to create documents.