Recipes PDF batch extraction

I am working with 500 recipe PDF files that I want to display on my website. How can I batch-extract the information in them and display it on the site? Each PDF has all the information for one recipe, and for each recipe I need to display its description, image, ingredients, instructions, nutrition label, and so on. Is there any way to do this without working on each file manually?

Do these all follow the same basic template for how the information is structured? This isn't really a WordPress-specific issue. One option is to use Go to loop through and process all the files. I've played with Go, and it's very fast at parsing large amounts of data. You could experiment with this library: https://github.com/unidoc/unidoc.
There are also plenty of PHP libraries to try. Here's one example: https://www.pdfparser.org/. Its documentation is at https://www.pdfparser.org/documentation, and you can install it via Composer.
If every recipe follows the same template and you want to extract specific details from specific sections of the PDF, it should be fairly easy. If you don't mind extracting all the text from each PDF and simply displaying that on your website, any of these libraries will do. If you go the Go route, you could parse all the text of each PDF, save it to a file, then upload the files and have PHP code insert everything into custom post types or something similar.
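If the recipes do share a template, the "extract specific sections" step might look roughly like the sketch below. It assumes the plain text has already been pulled out of each PDF (with one of the libraries mentioned) into one .txt file per recipe, and that each section starts with a heading on its own line. The heading names in SECTION_HEADINGS are hypothetical; adjust them to the real layout. Images are not handled here.

```python
import json
import re
from pathlib import Path

# Hypothetical section headings: adjust them to the real recipe template.
SECTION_HEADINGS = ("Description", "Ingredients", "Instructions", "Nutrition")

# A heading must sit on its own line, optionally followed by a colon.
HEADING_RE = re.compile(
    r"^(%s):?[ \t]*$" % "|".join(map(re.escape, SECTION_HEADINGS)),
    re.MULTILINE,
)


def split_recipe(text: str) -> dict:
    """Split one recipe's extracted plain text into named sections.

    Whatever precedes the first heading is treated as the title.
    """
    matches = list(HEADING_RE.finditer(text))
    first = matches[0].start() if matches else len(text)
    recipe = {"title": text[:first].strip()}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        recipe[match.group(1).lower()] = text[match.end():end].strip()
    return recipe


def convert_folder(src: Path, dst: Path) -> int:
    """Turn every extracted *.txt file in src into a JSON file in dst."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        recipe = split_recipe(path.read_text(encoding="utf-8"))
        out = dst / (path.stem + ".json")
        out.write_text(json.dumps(recipe, indent=2), encoding="utf-8")
        count += 1
    return count
```

The resulting JSON files are one possible hand-off format for the PHP side, which could read each file and create a custom post from its fields.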

Related

How to publish R code and graphs (Markdown) on websites?

I would like to set up a homepage and publish R code mixed with text, pictures, and references. Plenty of people already do this, for example:
https://feliperego.github.io/blog/2015/10/23/Interpreting-Model-Output-In-R
https://yihui.org/knitr/demo/wordpress/
http://3.14a.ch/archives/2015/03/08/how-to-publish-with-r-markdown-in-wordpress/
RWordPress does not seem to support graphical output from the script. Rather than discuss the many errors I received and the options I tried, my simple question is: how did the examples listed above manage to put R code and graphs on their websites? (I haven't contacted the site owners yet; that will be my next step if no explanation turns up here.)

You have several options, depending on whether you want to build your website from scratch. If you do, I recommend looking at:
bookdown: very good for documentation. You can find many examples on the web, for instance the bookdown documentation itself.
blogdown: more general than bookdown, and it produces very nice websites (mine is built with blogdown, for instance). blogdown is based on Hugo, so you can choose from many themes.
In both cases, you write standard R Markdown and build your site by rendering the R Markdown files. You can build the site locally to preview it, or deploy it to the web (manually or through a continuous integration system such as GitLab Pages). The point of these packages is to reduce the burden of handling formatting and links between pages.

The short answer is a mixture of manual and automatic generation. If you look at my website (www.jamescurran.co.nz), you'll find articles like the ones you want to write.
I achieve this in a variety of ways. Sometimes I get knitr to produce the HTML, and then I hack it into shape on the website. Other times, I use a mixture of shortcodes like
[code language = "R"]
data(cars)
[/code]
and/or HTML like
<pre><code>
data(cars)
</code></pre>
I upload the images and then use the WordPress editor to insert the correct links. It's not very satisfactory, but unless your hosting platform lets you serve static content, most of the automated solutions that go straight from R Markdown won't work.
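Part of the "hack it into shape" step can be automated. The sketch below is a hypothetical helper, not the answerer's actual workflow: it takes Markdown (for example the .md file knitr writes from an R Markdown source) and wraps each fenced R chunk in the [code language="R"] shortcode shown above. It assumes the site has a syntax-highlighting plugin that understands that shortcode; images still have to be uploaded and linked by hand.

```python
import re

# Matches fenced R chunks in Markdown, e.g. ```r or ```{r label, echo=FALSE},
# up to the next line that starts with a closing ```.
R_FENCE = re.compile(
    r"^```[ \t]*\{?[rR]\b[^\n]*\n(.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def to_wp_shortcodes(markdown: str) -> str:
    """Wrap each fenced R chunk in a [code language="R"] ... [/code] shortcode."""
    return R_FENCE.sub(
        lambda m: '[code language="R"]\n' + m.group(1) + "[/code]",
        markdown,
    )
```

Fences for other languages (for example ```python) are left untouched, so the rest of the post can be pasted into the WordPress editor as-is.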

Multiple pdf files in one embed

I need your help with a problem. Currently, I have a page with a simple embed that displays a PDF file.
I got a request to add another PDF file to the same embed (or at least to do something that looks like it).
I searched for solutions and, not finding a simple one, I'm thinking about using iTextSharp to merge both files (by fetching their streams from their URLs) into a new PDF file and displaying the result in the embed.
But that seems like a lot of work for such a simple change, so I'm asking whether anyone has a better idea. From what I found on Stack Overflow and Google, it looks like I'll have to go with the merge solution, but hey, you never know.

A simpler option would be to merge the two PDF files with either a free online tool or Adobe's Combine Files option, and then add the newly combined PDF to your site. Unless I am missing something, there is no real reason or benefit to doing this in code.

Can I do data visualization with Drupal?

Basically, I want it to import data from a SQL database and display it as graphs. I also want it to be dynamic and responsive, in the sense that users should have filtering options. Any leads would be very helpful.
Please note that I am just a beginner with Drupal.

Yes, but I recommend building it on your own framework, or using a dedicated data visualization library.
If you want to do it yourself, this list may be useful:
http://www.sitepoint.com/twelve-javascript-libraries-data-visualization/

Why do you need Drupal for that?
I would write a PHP script that reads the data from SQL and generates an image from it with the GD or ImageMagick library. PHP can send an image Content-Type header and generate the image dynamically, on the fly. The same script could also read filter parameters that control what the generated image shows.
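The same idea (query, apply a filter, emit an image) is sketched below in Python, and it swaps the GD/ImageMagick raster image for an SVG string, which needs no image library. The stats(label, value) table is hypothetical, and values are assumed to be non-negative. In a web handler you would read min_value from the query string and send the result with a Content-Type of image/svg+xml.

```python
import sqlite3
from html import escape


def bar_chart_svg(conn: sqlite3.Connection, min_value: float = 0,
                  width: int = 400, bar_height: int = 20) -> str:
    """Render rows of a hypothetical stats(label, value) table as an SVG bar chart.

    min_value plays the role of a user-supplied filter parameter; it is passed
    as a bound SQL parameter, never interpolated into the query string.
    """
    rows = conn.execute(
        "SELECT label, value FROM stats WHERE value >= ? ORDER BY value DESC",
        (min_value,),
    ).fetchall()
    # Longest bar spans the full width; "or 1" avoids dividing by zero.
    longest = max((value for _, value in rows), default=0) or 1
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{bar_height * len(rows)}">'
    ]
    for i, (label, value) in enumerate(rows):
        y = i * bar_height
        bar = int(width * value / longest)
        parts.append(f'<rect x="0" y="{y}" width="{bar}" '
                     f'height="{bar_height - 2}" fill="steelblue"/>')
        parts.append(f'<text x="4" y="{y + bar_height - 6}" '
                     f'font-size="12">{escape(str(label))}: {value}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the chart is regenerated per request, each change to the filter simply produces a new image, which matches the "dynamic, with filtering options" requirement without a full visualization framework.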

Bulk upload of Microsoft Word files to WordPress pages

I have been asked to upload 200 Microsoft Word documents, many of them containing lengthy, complex math problems or scientific notation, into WordPress. Each Word file would become a separate WordPress post.
I would much prefer not to cut and paste each file into a post one by one and then save it. Does anyone know of a way to automate the process while ensuring an accurate conversion, or at least minimizing the number of issues we might find when converting from Word to WordPress? Or am I dreaming the impossible dream?
Thanks for any input you can offer.

Sounds like an interesting problem. I have an idea that might be worth exploring. There are a number of free or shareware tools that can convert Word docs to HTML.
If you can manage to convert them into decently clean markup with one of those tools, I would recommend using the HTML Import 2 WordPress plugin. It can take a batch of HTML files and create Posts / Pages out of them.
It's a two-step process, but I bet it'll work (and it will certainly be faster than copying and pasting 200 times).
Hope that helps, have fun!
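Between the Word-to-HTML conversion and the HTML Import 2 step, an optional clean-up pass can make the markup "decently clean". The sketch below is hypothetical: it assumes the documents were already saved as HTML, and the choice of which attributes to drop (class, style, lang) is an assumption. It removes Word's namespaced tags such as <o:p> and drops comments, which also drops any namespaced markup Word may emit for equations, so check the math-heavy documents by hand. It does not try to handle <head>, <style>, or <script> content.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path


class WordHtmlCleaner(HTMLParser):
    """Re-emit HTML, dropping attributes and tags that Word adds for layout."""

    # Assumption: these attributes carry only Word styling worth discarding.
    DROP_ATTRS = {"class", "style", "lang"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list = []

    def _attrs(self, attrs) -> str:
        return "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name not in self.DROP_ATTRS
        )

    def handle_starttag(self, tag, attrs):
        if ":" not in tag:  # skip Office namespace tags such as <o:p>
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:  # self-closing tags such as <br/>
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if ":" not in tag:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))

    # handle_comment is not overridden, so comments are dropped.


def clean_word_html(html: str) -> str:
    cleaner = WordHtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


def clean_folder(src: Path, dst: Path) -> None:
    """Clean every .htm/.html file in src and write the result to dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*.htm*")):
        # Word often saves HTML as windows-1252; errors="replace" avoids crashes.
        text = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.name).write_text(clean_word_html(text), encoding="utf-8")
```

The cleaned folder is then what you would point the import plugin at.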

Well, I found a solution that works for me. It's a bit manual, but it still saves a lot of time.
Here are the steps:
Connect your blog to MS Word 2007/2013.
Make sure remote publishing is enabled in WordPress.
Copy all the posts into one Word document, or merge them into a single DOC file.
Set the default posting category in WordPress and save it.
From MS Word, copy each post and publish the posts one by one.
Tips:
Create a keyboard shortcut for publishing to WordPress.
Use Ctrl+C to copy the text before publishing.

In ASP.NET what is the best way to convert a PDF file to HTML?

My users will select a PDF document on their machine and upload it to my website, where I will convert it into an HTML document for display on the site. The document will be stored in a database after conversion.
What's the best way to convert a PDF to HTML?
I have been handed a requirement where a user would create a "news" story as a PDF and then upload it to the server, where it will be converted to HTML and displayed on the website.

Most document creation software that can save documents as PDF can also save them as HTML. I'm assuming the issue is that your users will be creating rich documents (lots of embedded images), which results in multiple files, and your requirements stem from a desire to make uploading these documents as simple as possible for the user.
There are numerous conversion packages that can probably do this for you; however, when you're talking about rich content, you are talking about text plus images. Those images have to be stored and served somewhere, and whatever conversion method you use will require you to check every image source to make sure it points to a valid location on your server.
I would like to suggest an alternative approach that you can take to your team: implement one of the many blog APIs for publishing content. There are free and commercial software packages, such as Windows Live Writer and Microsoft Word, that use these APIs to publish content directly to a website. Your users can simply create their content and upload it directly to your website without first publishing it as a PDF and then uploading it. The process becomes much smoother for your users, and you get the posts in a form that doesn't require you to spend thousands of dollars developing or buying conversion code.
The two most common APIs are the MetaWeblog API and the Movable Type API. Both are very simple and easy to implement. I think this way would be a MUCH better alternative than what you're thinking about doing.
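To give a feel for how small the server side of the MetaWeblog API is, here is a sketch of its metaWeblog.newPost method, written in Python with the standard xmlrpc.server module rather than ASP.NET. The in-memory POSTS and USERS stores are hypothetical stand-ins for your database and accounts. Only newPost is implemented; publishing clients such as Word typically call other methods as well (for example to list blogs or upload images), so treat this as a starting point, and serve it over HTTPS since credentials travel in the request body.

```python
import itertools
from xmlrpc.client import Fault
from xmlrpc.server import SimpleXMLRPCDispatcher, SimpleXMLRPCServer

# Hypothetical in-memory stores; a real site would use its database and
# its existing user accounts.
POSTS: dict = {}
USERS = {"editor": "change-me"}
_next_id = itertools.count(1)


def new_post(blog_id, username, password, struct, publish):
    """metaWeblog.newPost(blogid, username, password, struct, publish) -> postid."""
    if USERS.get(username) != password:
        raise Fault(403, "Invalid username or password")
    post_id = str(next(_next_id))
    POSTS[post_id] = {
        "title": struct.get("title", ""),
        "html": struct.get("description", ""),  # MetaWeblog puts the body here
        "published": bool(publish),
    }
    return post_id


def register(dispatcher: SimpleXMLRPCDispatcher) -> None:
    """Expose new_post under its MetaWeblog method name."""
    dispatcher.register_function(new_post, "metaWeblog.newPost")


if __name__ == "__main__":
    server = SimpleXMLRPCServer(("localhost", 8000))
    register(server)
    server.serve_forever()
```

The post body arrives as HTML in the struct's description field, which is why this route sidesteps PDF-to-HTML conversion entirely.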

I don't think converting a PDF to an HTML string is necessarily the best idea, especially if you want to export it back as a PDF. PDF files often contain binary elements such as images, so you may be better off converting the file to ASCII with an encoding such as Base64. That way you will have an ASCII string you can save into a text field in the database and convert back out later. Could you expand more on the main requirement?
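A minimal sketch of the Base64 round trip, assuming a hypothetical documents table with an ordinary text column (SQLite stands in for whatever database the site actually uses):

```python
import base64
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: the PDF is kept as Base64 text in an ordinary text column.
    conn.execute("CREATE TABLE IF NOT EXISTS documents "
                 "(id INTEGER PRIMARY KEY, pdf_b64 TEXT NOT NULL)")


def save_pdf(conn: sqlite3.Connection, pdf_bytes: bytes) -> int:
    """Encode the raw PDF bytes as ASCII and store them; returns the row id."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    cur = conn.execute("INSERT INTO documents (pdf_b64) VALUES (?)", (encoded,))
    return cur.lastrowid


def load_pdf(conn: sqlite3.Connection, doc_id: int) -> bytes:
    """Decode the stored text back into the original PDF bytes."""
    (encoded,) = conn.execute(
        "SELECT pdf_b64 FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return base64.b64decode(encoded)
```

The trade-off is size: Base64 turns every 3 bytes into 4 characters, so the stored text is about a third larger than the PDF. A binary (BLOB) column avoids that overhead if the database offers one.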

My recommendation would be to not do it that way IF POSSIBLE (but we all know what managers are like) so...
I would recommend that you stay away from converting the PDF to and from HTML (unless you can find a commercial solution, it will be nigh on impossible). Instead, do as has already been mentioned: store it as a Base64-encoded string, a BLOB, or some other binary format in the database, and then display it to the user with some sort of PDF viewer plugin in the browser.

All it took was a simple Google search for "PDF to HTML": http://www.gnostice.com/pdf2manyOverview_x.asp. I'm sure there are others.
So while it's 'possible', you may want to explain to your manager that this isn't the best content management solution.

Why not use iTextSharp to read the PDF content? Then you could save both the binary PDF and its text content to the database, let users search the content, and let them download the PDF.
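The storage-and-search half of that suggestion could look like the sketch below, with Python's sqlite3 standing in for the ASP.NET database layer. Text extraction (the iTextSharp part) is assumed to have happened already, so body_text is simply passed in; the table layout is hypothetical. A LIKE scan is fine for a small collection; for many documents a full-text index (such as SQLite FTS5 or SQL Server full-text search) would be the usual upgrade.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: the original PDF as a BLOB plus its extracted text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, title TEXT, pdf BLOB NOT NULL, body_text TEXT)"
    )


def save_document(conn, title: str, pdf_bytes: bytes, body_text: str) -> int:
    """Store the binary PDF and its text side by side; returns the row id."""
    cur = conn.execute(
        "INSERT INTO documents (title, pdf, body_text) VALUES (?, ?, ?)",
        (title, pdf_bytes, body_text),
    )
    return cur.lastrowid


def search(conn, term: str) -> list:
    """Return ids of documents whose text contains term (ASCII case-insensitive)."""
    # Escape LIKE wildcards so that a search for "100%" is taken literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id FROM documents WHERE body_text LIKE ? ESCAPE '\\' ORDER BY id",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]


def load_pdf(conn, doc_id: int) -> bytes:
    """Fetch the original PDF bytes for download."""
    return conn.execute(
        "SELECT pdf FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()[0]
```

Users search against body_text, and the download link serves the untouched PDF from the pdf column, so no PDF-to-HTML conversion is needed.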

You should look into DynamicPDF. They have a converter (currently in beta) for exactly this purpose. We have used their products with great success (especially for dumping Reporting Services reports directly to PDF).
Ref: http://www.dynamicpdf.com/