I have been asked to upload 200 Microsoft Word documents - many of them containing lengthy, complex math problems or scientific notation - into WordPress. Each Word file would become a separate WordPress post.
I would clearly prefer not to cut-and-paste each file one-by-one into a post and then save it. Does anyone know of a way to automate the process while ensuring the accuracy of the translation, or at least minimizing the number of issues we might find when converting from Word to WordPress? Or am I dreaming the impossible dream?
Thanks for any input you can offer.
Sounds like an interesting problem. I have an idea that might be worth exploring. There are a number of free or shareware tools that can convert Word docs to HTML.
If you can manage to convert them into decently clean markup with one of those tools, I would recommend using the HTML Import 2 WordPress plugin. It can take a batch of HTML files and create Posts / Pages out of them.
It's a two step process, but I bet it'll work. (And certainly be faster than copy/paste 200 times).
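If you have LibreOffice (or OpenOffice) handy, its headless mode can do the first step in bulk. A minimal sketch in Python, assuming the soffice binary is on your PATH and the .docx files sit in one folder:

# Batch-convert Word documents to HTML with LibreOffice's headless mode.
import subprocess
from pathlib import Path

src = Path("word_docs")      # folder holding the 200 .docx files
out = Path("html_out")       # converted .html files land here
out.mkdir(exist_ok=True)

for doc in src.glob("*.docx"):
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "html",
         "--outdir", str(out), str(doc)],
        check=True,
    )

One caveat: equations and scientific notation don't always survive the trip to HTML cleanly, so spot-check a few of the math-heavy files before importing all 200.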
Hope that helps, have fun!
Well, I found a solution that works for me. It's a bit manual, but it still saves a lot of time.
Here are the Steps.
Connect your blog to MS Word 2007/2013.
Make sure remote publishing is enabled in WP.
Copy all the posts into one Word document, or use merge to make a single DOC.
Set the default posting category in WP and save it.
Now, from MS Word, copy each post and publish them one by one.
Tips:
Make a shortcut key for publishing to WP.
Use Ctrl+C to copy the post text before publishing.
If you'd rather script the whole process, see the sketch below.
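The same remote-publishing channel Word uses (XML-RPC) can be scripted. A sketch, assuming the third-party python-wordpress-xmlrpc package, one HTML file per post, and placeholder URL/credentials/category:

# Create one WordPress post per HTML file via XML-RPC remote publishing.
from pathlib import Path
from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

client = Client("https://example.com/xmlrpc.php", "username", "password")

for html_file in Path("html_out").glob("*.html"):
    post = WordPressPost()
    post.title = html_file.stem                 # file name becomes the title
    post.content = html_file.read_text(encoding="utf-8")
    post.terms_names = {"category": ["Default Category"]}  # placeholder
    post.post_status = "publish"
    client.call(NewPost(post))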
I have searched for a setting in the mvbasic extension for VSCode, but I may have hit a dead end. I am new to using VSCode with the Rocket mvbasic extension and am still learning, so please bear with me.
Our development has, for the most part, always been done directly on the server, using its editor, on a Unix/AIX platform with UniData. Some of our code has array assignments with CHAR(253)/CHAR(254) characters in them. See the image below showing how it's done. I didn't write this code; the original software developer did it many, many years ago, and we just aren't going to go and change it all.
[Image: how the code looks on the actual server]
The issue is that when I pull the code into VSCode to edit it, the extension changes it. I uploaded it back without paying attention, and it went into production incorrectly, which created a few bugs.
ALIST="H�V�P�R�M�D"
[Image: how the code looks in VSCode]
[Image: how the code looks after being uploaded back to the server from VSCode]
Easy to fix, no biggie, but now to my question.
Has anyone else had this issue, or can anyone point me toward a setting that would keep the characters in the correct ASCII format so this doesn't happen again by mistake?
VSCode defaults to the sane choice for character encoding in 2022: utf-8, but sometimes you have to deal with legacy stuff.
https://code.visualstudio.com/docs/editor/codebasics#_file-encoding-support
If you click on the UTF-8 in the bottom right corner you can choose "Reopen with Encoding":
After that, you can select a different encoding. I chose DOS (CP437) as a guess; the literal MV characters display as superscript two (²), and I can save to the server and confirm those characters remain as #VM after a round trip (in my terminal emulator they appear as }, which is useful).
You can edit your preferences and set "files.encoding": "cp437". One other thing that can help, since most programs don't have a standard extension (like .bas), is to set the default language mode to MV BASIC so most of what you're editing identifies as MVbasic; a quick Ctrl+K M switches to another mode if you're pasting in something else, like SQL.
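For reference, the relevant part of settings.json might look like this (a sketch; the default-language-mode setting depends on the extension's language id, so I've left it out):

{
  // Force the legacy code page so CHAR(253)/CHAR(254) survive round trips
  "files.encoding": "cp437",
  // Don't let VSCode second-guess the encoding per file
  "files.autoGuessEncoding": false
}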
Some useful links - the Rocket forums are helpful and the folks there are always super nice
https://community.rocketsoftware.com/forums/multivalue?CommunityKey=521bce2e-71d5-4d32-b560-dfa95e950eb5
The MV Extensions community is a good group and has always been helpful when I've had issues. I've made some small contributions; they're very open. I prefer this extension, but honestly I haven't done a deep comparison.
https://github.com/mvextensions
I am now working with 500 PDF recipe files, which I want to display on my website. How can I batch-extract the information from the PDFs and display it on my site? The PDFs hold all the information for the recipes. For each recipe, I need to display its description, image, ingredients, instructions, nutrition label, and so on. Is there any way to do this without working through them manually?
Do these all share the same basic template for how the information is structured? This isn't really a WordPress issue specifically. One thing you can do is use Go to loop through and process all the files. I've played with Go, and it's incredibly fast at parsing large amounts of information. Maybe you can fiddle with this library: https://github.com/unidoc/unidoc.
There are a lot of library options to try in PHP as well. Here's just one example: https://www.pdfparser.org/. There's documentation, and you can install it via Composer: https://www.pdfparser.org/documentation
If every recipe follows the same sort of template and you want to extract specific details from specific sections of the PDF, it should be easy enough. If you don't mind extracting all the text from each PDF and just displaying that on your website, that's also easy with one of the libraries. If you go the Golang route, you could parse all the text for each PDF, save it to a file, then upload the files with PHP and have the PHP code insert everything into custom post types or something.
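To illustrate the loop-and-extract idea, here's a sketch in Python (using the pdfminer.six package instead of the Go/PHP libraries above, purely for illustration):

# pip install pdfminer.six
from pathlib import Path
from pdfminer.high_level import extract_text

for pdf in Path("recipes").glob("*.pdf"):
    text = extract_text(str(pdf))      # all the text in the PDF
    pdf.with_suffix(".txt").write_text(text, encoding="utf-8")
    # If every recipe follows one template, you could now split `text`
    # into description, ingredients, instructions, etc. with regexes.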
I'm building an application to view PDFs through a browser, without the need for a plugin, on mobile devices. I tried ImageMagick and Ghostscript to convert the pages to images, but the images are far too large and the text becomes unclear. I've seen websites offering a service that converts PDFs into HTML and does a decent job, but I can't find an example of how this is accomplished. Any help is much appreciated. Thanks!
EDIT: I seem to have read the question backwards. In this case it might be best to parse through the PDF and then format some HTML based on what you find. I believe the javapdf option is capable of this, but I haven't used any of these so I am not sure. If worse comes to worst and you can't find software to disassemble a PDF, you might be able to write your own disassembler in Java or PHP by reading the PDF specification. Best of luck!
http://www.adobe.com/devnet/pdf/pdf_reference.html - PDF specification (Adobe's modified version; since they are the most popular, you may want to support their extensions)
-- OLD -- These websites probably write their own proprietary software to do the trick. If you are truly interested in this undertaking, I would suggest parsing the HTML to get the data and style information and using it to format some sort of PDF writer APIs. A quick Google search yields the following: -- END OLD --
http://www.cutepdf.com/Solutions/
http://ruby-pdf.rubyforge.org/pdf-writer/doc/index.html
http://asprise.com/product/javapdf/
If you are looking at converting PDF to HTML and planning to run the conversion on a server, you can try pdftohtml. It is a program packaged as part of poppler-utils. I do not know how the program accomplishes it.
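For example, a minimal wrapper (a sketch; -c asks for a complex layout-preserving document and -s for a single document covering all pages):

import subprocess

# Convert input.pdf to HTML; output file names are derived from "out".
subprocess.run(
    ["pdftohtml", "-c", "-s", "input.pdf", "out"],
    check=True,
)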
I was googling and came across the link below, which explains how scribd.com implements conversion.
http://coding.scribd.com/2010/06/01/the-perils-of-stacking/
How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this that works, but eventually leads to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(these are not real URLs, i was just showing examples)
Firstly, put everything for all languages under one domain - it will help your Google rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
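The same pattern, sketched in Python (the table and column names are made up): each string comes from the database once per key/language and is then served from an in-memory cache.

import sqlite3
from functools import lru_cache

conn = sqlite3.connect("site.db")

@lru_cache(maxsize=None)           # one DB hit per key/language pair
def localise(key, lang):
    row = conn.execute(
        "SELECT value FROM localisations WHERE key = ? AND language = ?",
        (key, lang),
    ).fetchone()
    return row[0] if row else "[" + key + "]"  # fall back to the key itself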
You might want to check GNU Gettext project out - at least something to start with.
Edited to add info about projects:
I've worked on several multilingual projects that used Gettext, on different stacks including C++/MFC and J2EE/JSP, and it all worked fine. However, you need to write or find your own code to display the localized data, of course.
If you are using .Net, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as .Net programmer, they are easy to use and the .Net Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches: adding a language column and creating different entries for the different languages, or creating a separate table to store the language-specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
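A sketch of the second approach (SQLite via Python; table and column names are illustrative), including the fallback you typically want when a translation is missing:

import sqlite3

conn = sqlite3.connect("articles.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS article (
    id         INTEGER PRIMARY KEY,
    created_by TEXT, created TEXT,
    updated_by TEXT, updated TEXT
);
CREATE TABLE IF NOT EXISTS article_text (
    article_id  INTEGER REFERENCES article(id),
    language_id TEXT,              -- e.g. 'en', 'fr'
    body        TEXT,
    PRIMARY KEY (article_id, language_id)
);
""")

def article_body(article_id, lang, fallback="en"):
    # Try the requested language first, then fall back to a default.
    for candidate in (lang, fallback):
        row = conn.execute(
            "SELECT body FROM article_text"
            " WHERE article_id = ? AND language_id = ?",
            (article_id, candidate),
        ).fetchone()
        if row:
            return row[0]
    return None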
If you're just worried about the article content being translated and do not need a fully integrated option, I have used Google Translate in the past, and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts the specific content page for the requested language (or the one guessed by Apache from Accept-Language).
It was fun, since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, but I made my script generate static HTML (the original one examined the User-Agent) and then uploaded it to the server. That has worked so far, and making a new language version of the site is now a breeze :)
The biggest downside of this method is that it is time-consuming to write things from scratch. So if you want, drop me a line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr, since this allows Apache to perform language negotiation if somebody asks for /content/example and their browser says it prefers French. When you do this, Apache also adds .html or whatever as a bonus.
So when a request is for example and I have files
example.fr
example.en
example.vi
Apache will automatically proceed with example.vi for a person with a Vietnamese-configured browser, or example.en for a person with an English-configured browser. Pretty useful.
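For reference, the Apache side of that negotiation is just MultiViews plus language mappings. An illustrative snippet (the languages are examples):

# httpd.conf or .htaccess for the directory holding the content
Options +MultiViews
AddLanguage en .en
AddLanguage fr .fr
AddLanguage vi .vi
LanguagePriority en fr vi
ForceLanguagePriority Fallback   # use the priority list when nothing matches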
I'm thinking of starting a wiki, probably on a low cost LAMP hosting account. I'd like the option of exporting my content later in case I want to run it on IIS/ASP.NET down the line. I know in the weblog world, there's an open standard called BlogML which will let you export your blog content to an XML based format on one site and import it into another. Is there something similar with wikis?
The correct answer is ... "it depends".
It depends on which wiki you're using or planning to use. I've used various ones over the years: MoinMoin was OK (it uses files rather than a database; Ubuntu seems to like it). MediaWiki everyone knows about, and JAMWiki is a Java clone(ish) of MediaWiki that aims to be markup-compatible with it. Both use databases, and you can generally connect whichever database you want; JAMWiki is pre-configured to use an internal HSQLDB instance.
I recently converted about 80 pages from a MoinMoin wiki into JAMWiki pages, and this was probably 90% handled by a tiny Perl script I found somewhere (I'll provide a link if I can find it again). The other 10% was unfortunately a by-hand experience (they were of the utmost importance, being recipes for the missus) ;-)
I also recently setup a Mediawiki instance for work and that took all of about 8 minutes to do. So that'd be my choice.
To answer your question, I don't believe there's such a standard as WikiML, as Till called it.
As strange as it sounds, I've investigated screen scraping a wiki for a co-worker to help him port it to another wiki engine. It turned out that screen scraping was the easier, quicker, and more efficient thing to write for moving that particular file-based wiki to another one or to a CMS.
Given the context of your question, I would bite the bullet now, pay the little extra for a Windows-hosted account, and put ScrewTurn Wiki on it. You've got the option of a file-based or SQL Server-based back end, but because one of your requirements is low cost, I'm guessing you'd use the file-based back end on a cheaper hosted account for now; you can always upscale to SQL Server later.
I haven't heard of WikiML.
I think your biggest obstacle is gonna be converting one wiki markup to another. For example, some wikis use Markdown (which is what Stack Overflow uses), others use another markup syntax (e.g. BBCode), etc. The bottom line is, assuming the contents are in a database, it's not impossible to export and parse them to make them "fit" another system. It might just be a pain in the ass.
And if the contents are not in a database, it's gonna be a royal pain in the ass. :D
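If both engines' markups are among the formats pandoc understands, the conversion itself is the easy part. A sketch in Python, assuming pandoc is installed along with the pypandoc wrapper package:

import pypandoc  # pip install pypandoc (needs a pandoc binary)

mediawiki_src = "== Heading ==\n* item one\n* item two"
# Convert MediaWiki markup to Markdown, both formats pandoc knows.
markdown = pypandoc.convert_text(mediawiki_src, "markdown", format="mediawiki")
print(markdown)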
Another solution would be to stay with the same system. I'm not sure what the reason is for changing the technology later on; it's not like a growing project requires IIS/ASP.NET all of a sudden. (It might just be the other way around.) But, for example, if you could stick with PHP for a while, you could also run that on IIS.