I'm wondering if there is a place where I can get a bilingual dictionary in open format for my program.
So far I'd like to have just few language pairs: eng-esp, eng-frn, ger-esp, let's say.
I googled and lurked without success so far.
I've done translation for an application using BabelFish and it came out ok, especially when you keep captions and messages short. To display different languages I used 'Chilkat's Charset' to render the different languages. I kept all the language translations in an excel file, good thing because customers have added to this language collection for us. The program has all the translations in text files that it uses to populate captions and messages prompts.
Would a google translate API work?: http://code.google.com/apis/language/
Related
For our SaaS (LAMP) product reporting we are currently using JasperReports. We find it too cumbersome to develop reports with and the output in Word unworkable. Moreover, a couple of customers request to be able to develop simple reports themselves (to be used as mail merge). We would therefore like to develop templates right in Word. The idea is to have an application/webservice that would receive the Word template and JSON data from the LAMP application and return the filled-in report. The report has to support:
Loops inside content (repeating a document section several times while filling in array data)
Filling in tables (populating rows from array)
Filling in chart data in pre-created charts (from array)
This is the functionality we are using in JasperReports right now. Are there existing solutions to this? I've found quite a lot that can substitute simple variables, but no info about the the above three points. Will it be a lot of effort to write one from scratch? I would prefer a Windows OpenXML-based solution rather than a Linux PHPOffice-based one as I presume the former would handle the text split by spell-checker and language tags (though I'm not sure).
Windward and Docmosis are both commercial products that support the features you've listed and they are intended to be added to your application to provide reporting capabilities. Neither is are not OpenXML based. They can use Word documents as templates and perform the data merge into different output formats. Please note I work for Docmosis.
Aspose Words is another tool and it can populate a template but most of the power is through code rather than controls/directives in the template. Given your OpenXML thoughts, perhaps this is more what you are looking for.
More tools are recommended here in StackExchange.
I hope that helps.
ReportBox is a Web based reporting solution that can be used by any software application to generate documents and reports in Microsoft Word/ Excel/ PowerPoint/ HTML(DocX/Xlsx/PPTx/HTML) using OpenXML.
The process starts by building a Microsoft Word/ Excel/ PowerPoint/ HTML document as a template and uploading to ReportBox portal. Your application either sends data to ReportBox or ReportBox can pull data from your application database, which is then merged with the template to produce the finished report. Please note that I work for GreenThoughts.
I'm building a web app that's going to support multiple languages. For the moment, the words are hard-coded in English in the HTML and javascript. I want to use objects that contains the English word as the key and the word to be displayed as the value and have this dictionary object populate the page at runtime. But my question is not about client-side issues.
What's the best way to store and maintain this dictionary on the server. So far, I thought of a database table with columns for the English word and rows for the values to be displayed. I would then load an entire row in an anonymous type that I'd serialize in json and send the client. I think it'd work but I'm wondering if that's the best way to do it so that the dictionary will be easy to maintain.
If you've had some experience dealing with internationalization then please let me know if you have some suggestions.
Thanks.
You can use Resouce files for multiple language support for asp.net web. It is THE BEST method to use for multiple language support in asp.net. http://msdn.microsoft.com/en-us/library/fw69ke6f%28v=vs.100%29.aspx
I'm creating a web application that uses I18n. As I don't want to translate very common basic strings like "forgot password?" on my own I'm asking you if there are already any resource files or word lists containing these strings. One option is to download an existing framework and extract somehow these strings but this might be a hassle?
Especially I'm looking for translation regarding user authentication and translations from English to Italian, French and German. The file format doesn't matter.
Professional translators use a tool, TMX is the generic term i think, Translation Memory Exchange, that does what you are talking about by building up standard phrase lists in other languages so when they translate they can bring these phrases in to speed up their job and reduce the repetitive tedium. So these lists exist.
There is a free plugin for MS Word that does this and may come with lists (sorry cannot remember the name although Rosetta rings a bell).
There is an FOSS TMX tool called Okapi at Sourceforge. It may come with the dictionaries but if not it is a point where you can investigate.
You could also approach a site called Proz which is a site for translators and might be able to point you in the right direction
Take care over MT like Google API as it can give some weird results but you could use it to build you list and then double check. Remember that when you check a language that you need to do it with a native speaker who can pick up on the nuances and colloquialisms.
You can use google translator api. and your custom resource bundle
What my users will do is select a PDF document on their machine, upload it to my website, where I will convert into an HTML document for display on the website. The document will be stored in a database after conversion.
What's the best way to convert a PDF to HTML?
I have been handed a requirement where a user would create a "news" story as a pdf and then would upload it to the sever, where it will be converted to HTML and displayed on the website.
Any document creation software that can save documents as PDF can save them as HTML. I'm assuming the issue is that your users will be creating rich documents (lots of embedded images), which results in multiple files, and your requirements stem from a desire to make uploading these documents as simple as possible to the user.
There are numerous conversion packages that can probably do this for you, however when you're talking about rich content, you are talking about text plus images. Those images have to be stored somewhere and served somehow, and whatever conversion method you use will require you to examine all image sources to make sure they point to valid locations on your server.
I would like to suggest an alternate way of doing this that you can take to your team: Implement one of the many blog APIs for publishing content. There are free and commercial software packages that use these APIs to publish content directly to a website, such as Windows Live Writer and Microsoft Word. Your users can simply create their content and upload it directly to your website without having to publish it as PDF first then upload it. So the process becomes much smoother for your users, and you get the posts in a form that doesn't require you spend thousands of dollars on developing or buying conversion code.
The two most common APIs are the MetaWeblog API and the Movable Type API. Both are very simple and easy to implement. I think this way would be a MUCH better alternative than what you're thinking about doing.
I don't think converting a PDF to an HTML string is necessarily the best idea, especially if you want to export it back as PDF. PDF files often contain binary elements such as images, so you may be best to convert it to ASCII via an encoding, such as Base64. That way you will have an ASCII string you can save into a text field in the DB and then convert it back out. Could you expand more on the main requirement?
My recommendation would be to not do it that way IF POSSIBLE (but we all know what managers are like) so...
I would recommend that you stay away from converting the PDF to/from HTML (because unless you can find a commercial solution it will be nigh on impossible) and instead do as has already been mentioned and store it as an encoded Base64 string, or BLOB or some other binary format in the database, and then display it to the user with some sort of PDF view plugin for the browser.
All it took was a simple google search for "PDF to HTML": http://www.gnostice.com/pdf2manyOverview_x.asp. I'm sure there are others.
So while it's 'possible', you may want to explain to your manager that this isn't the best content management solution.
Why not use the iTextSharp to read the PDF content? Then You could save both the binary PDF and the text content to the database. You could then let users search the content and download the PDF.
You should look into DynamicPDF. They have a converter (currently Beta) out for serving exactly this purpose. We have used their products with great success (especially for dumping Reporting Services reports directly to PDF).
Ref: http://www.dynamicpdf.com/
How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this which work, but eventually lead to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(these are not real URLs, i was just showing examples)
Firstly put all code for all languages under one domain - it will help your google-rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
You might want to check GNU Gettext project out - at least something to start with.
Edited to add info about projects:
I've worked on several multilingual projects using Gettext technology in different technologies, including C++/MFC and J2EE/JSP, and it worked all fine. However, you need to write/find your own code to display the localized data of course.
If you are using .Net, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as .Net programmer, they are easy to use and the .Net Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches, adding a language column and creating different entries for the different languages or creating a separate table to store the language specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
If you're just worried about the article content being translated, and do not need a fully integrated option, I have used google translation in the past and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts a specific content page from a language requested (or guessed by Apache from Accept-Language).
It was fun since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, but I made my script generate static HTML (the original one was examining User-agent) and then upload it to server. That works so far and making a new language version of the site is now a breeze :)
The biggest downside of this method is that it is time-consuming to write things from scratch. So if you want, drop me line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr since this allows Apache to perform language negotiation in case somebody asks for /content/example and has a browser tell that it likes French language. When you do this Apache also adds .html or whatever as a bonus.
So when a request is for example and I have files
example.fr
example.en
example.vi
Apache will automatically proceed with example.vi for a person with Vietnamese-configured browser or example.en for a person with German-configured browser. Pretty useful.