How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this which work, but eventually lead to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(these are not real URLs, i was just showing examples)
Firstly put all code for all languages under one domain - it will help your google-rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
You might want to check GNU Gettext project out - at least something to start with.
Edited to add info about projects:
I've worked on several multilingual projects using Gettext technology in different technologies, including C++/MFC and J2EE/JSP, and it worked all fine. However, you need to write/find your own code to display the localized data of course.
If you are using .Net, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as .Net programmer, they are easy to use and the .Net Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches, adding a language column and creating different entries for the different languages or creating a separate table to store the language specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
If you're just worried about the article content being translated, and do not need a fully integrated option, I have used google translation in the past and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts a specific content page from a language requested (or guessed by Apache from Accept-Language).
It was fun since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, but I made my script generate static HTML (the original one was examining User-agent) and then upload it to server. That works so far and making a new language version of the site is now a breeze :)
The biggest downside of this method is that it is time-consuming to write things from scratch. So if you want, drop me line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr since this allows Apache to perform language negotiation in case somebody asks for /content/example and has a browser tell that it likes French language. When you do this Apache also adds .html or whatever as a bonus.
So when a request is for example and I have files
example.fr
example.en
example.vi
Apache will automatically proceed with example.vi for a person with Vietnamese-configured browser or example.en for a person with German-configured browser. Pretty useful.
Related
I have an application in ASP.Net in AngularJS, until then that my application had the need only in Brazil, but recently the company where I work needs to make this site available to other countries in other languages like Spanish and English.
I have never worked with International applications, so there will be many doubts and difficulties.
I'll start with the difficulties:
- My application has fixed texts in the html, database and code, how can I translate all these fronts? Is there a component that translates everything into the client? (Javascript, AngularJS, etc ...).
- Development time for this change (Too much code to change).
Questions: - Decimal, Date and Time, how to work with these values in an International application? (My application has many logs and values to display and insert)
I'm researching a lot but really needed a hint of where I could go.
Thank you, I'll wait!
I would start searching for "Localization". There is something called "AspNetBoilerplate" that has implemented localization into there project. Here is their link: https://aspnetboilerplate.com/ and here is document on how to use it within their project: https://aspnetboilerplate.com/Pages/Documents/Localization. You can easily download a free project and see what they did to implement it.
They have different localization files based on language being used. The language being used is a setting in the DB based upon the tenant that logs in. All static text looks up to its respective localization file (See their .Core project under "Localization" to see all their files).
As far as DateTimes/Decimals/etc, I would have extension methods that looked up the cached localization being used and format respectively.
By no means am I recommending you use this software but rather see how they implemented it within their project. Also, know that I have only had to do this once and there may be numerous other ways to accomplish this goal.
This is more of an advise / best practice question that I'm hoping someone has come across before and can give me a steer.
I need to build a web application (the client would like webforms because that's what their developers know for when i hand it over)
Essentially when the client logs in, they will pick a language then I need to replace the text for menus, input boxes etc. The client wants to add their translations and update them at any time.
Ideas I have looked at are:
Holding the translations in resource files, building an editor in to the web application and then adding attributes on the fly to my viewmodels.
Holding the translations in sql server so i have the name, language and translation as a lookup e.g. Home | French | Maison. Then on pre-render I'll scrape the screen for any controls needing translation in the menu, labels, text areas.
Does anyone know of any good examples or had the experience of doing this themselves.
I've a similar situation, and chose to store data in SQL.
Translation mistakes happen often, and you don't want to recompile or disassemble every time.
It is possible to avoid the need to republish, but I've found it just more intuitive and straightforward to maintain SQL.
Bottom line, it depends on the amount of data you have. If it's more than just a couple of keywords, it sounds like a job for SQL to me.
Edit:
In a similar question, users recommend using resources, claiming it is the standard method.
However, if your users are going to make changes to values on regular basis (not because of mistake correction, but because data actually changes), then SQL seems best fit for the job.
I'm planning to create a site for learning technologies, such as codeproject or codeplex. Can you please suggest to me the different ways to manage huge articles?
Look at a content management system, such as SiteFinity: http://www.sitefinity.com/. There are others, some free. You can find some on codeplex.com.
HTH.
Check out DotNetNuke CMS too >> http://www.dotnetnuke.com/
And here's a very hot list available of ASP.NET CMS systems:
http://en.wikipedia.org/wiki/List_of_content_management_systems#Microsoft_ASP.NET_2
Different ways to manage articles while building the entire system yourself. Hmm, ok, let me give it a try... here's the short version.
There are several ways you can "store" your articles (content, data, whatever), and the best way to do so is to use a Database. (SQL Server, MySQL, SQLCE, SQLite, Oracle, the list goes on).
If you're not sold on the idea of a database, you can use any other type of persistent storage that you like. IE: XML, or even flat "TXT" files.
Since you're using ASP.NET you now need to either write your code behind, or some other complied code to access your stored data. You pull it out of the storage and display it on the page/view.
Last but not least, I'd like to give you a suggestion (even though it's not part of your original question). As the other answerers have stated, you should look at a pre-built CMS. If nothing else, to see how it's done (not necessarily to use it as is). My philosophy is quite simple, if you want to be productive in your development, don't bother reinventing the wheel just for the sake of it. If someone else has already build and given away exactly what you need, you should at very least give it a look and use what you can. It will save you piles of time and heartache.
Your question is not vague enough to be closed, but is vague enough that answering all of the nuances could take several thousand lines.
Multi language content application in ASP.Net can usually be overcome by using resource files. However when it comes to dynamic content that need to be multi-language. what would be the ideal system design?
Example like having a questionnaire application, each question text and answer options need to be translated.
Is storing the specific language text/content using a resource key idea good? by using the question text or the main english version of the text as a key a good idea?
Because i am trying to reduce database calls whenever my system programmatically creates the questions on the page.
Resource files should just hold the keys/reference and nothing language specific should be put in there.
All your specific translations should reside in the database.
So, whenever your System dynamically creates questions there would be ONE and ONLY ONE DB call...using the Locale identifier and the appropriate language version of the question would be loaded.
This depends quite a lot on your survey system design, you have a few options.
For example, if you are storing each individual survey in an xml file with all the questions/answers encoded there, you may be able to get away with generating a new xml file per language. The additional files would contain only translated questions/answers.
So, you may have the survey IsASPAwesome.xml, and would generate IsASPAweseome.enUS.xml, IsASPAwesome.frFR.xml and so forth. Another option if you want to use a single file is to have multiple text values for each language you want and then select the appropriate for the language. Something like this (excuse the poor xml and worse french)
<survey>
<question id=1>
<questionText lang="enUS">how great is this?</questionText>
<questionText lang="frFR">Comment est ce grand?</questionText>
<answer value="GOOD" lang="enUS" sortOrder="1">Great!</answer>
<answer value="BAD" lang="enUS" sortOrder="2">Not so great</answer>
<answer value="GOOD" lang="frFR" sortOrder="1">Grand!</answer>
<answer value="BAD" lang="frFR" sortOrder="2">Merde!</answer>
</question>
</survey>
If you were storing this in a DB, you should be able to get away with the same amount of DB calls, as you would just add an additional condition to your select query (ie SELECT answerText,Value FROM questionAnswers WHERE questionId=2 AND lang="frFR")
Using a separate survey per language may make processing the results a little harder, but it does make coding multi-lingual support a fiar bit easier. furthermore, it makes caching easier too (which is a good idea if you want to limit your DB calls, as survey's don't change that often).
I went down the "multiple XML file" approach for a previous project and it was sufficient, and also allowed translators to work simlutaneously and, in one case, suggest more appropriate ordering of questions for cultural reasons.
Hope this helps!
I'm writing an asp.net application that will need to be localized to several regions other than North America. What do I need to do to prepare for this globalization? What are your top 1 to 2 resources for learning how to write a world ready application.
A couple of things that I've learned:
Absolutely and brutally minimize the number of images you have that contain text. Doing so will make your life a billion percent easier since you won't have to get a new set of images for every friggin' language.
Be very wary of css positioning that relies on things always remaining the same size. If those things contain text, they will not remain the same size, and you will then need to go back and fix your designs.
If you use character types in your sql tables, make sure that any of those that might receive international input are unicode (nchar, nvarchar, ntext). For that matter, I would just standardize on using the unicode versions.
If you're building SQL queries dynamically, make sure that you include the N prefix before any quoted text if there's any chance that text might be unicode. If you end up putting garbage in a SQL table, check to see if that's there.
Make sure that all your web pages definitively state that they are in a unicode format. See Joel's article, mentioned above.
You're going to be using resource files a lot for this project. That's good - ASP.NET 2.0 has great support for such. You'll want to look into the App_LocalResources and App_GlobalResources folder as well as GetLocalResourceObject, GetGlobalResourceObject, and the concept of meta:resourceKey. Chapter 30 of Professional ASP.NET 2.0 has some great content regarding that. The 3.5 version of the book may well have good content there as well, but I don't own it.
Think about fonts. Many of the standard fonts you might want to use aren't unicode capable. I've always had luck with Arial Unicode MS, MS Gothic, MS Mincho. I'm not sure about how cross-platform these are, though. Also, note that not all fonts support all of the Unicode character definition. Again, test, test, test.
Start thinking now about how you're going to get translations into this system. Go talk to whoever is your translation vendor about how they want data passed back and forth for translation. Think about the fact that, through your local resource files, you will likely be repeating some commonly used strings through the system. Do you normalize those into global resource files, or do you have some sort of database layer where only one copy of each text used is generated. In our recent project, we used resource files which were generated from a database table that contained all the translations and the original, english version of the resource files.
Test. Generally speaking I will test in German, Polish, and an Asian language (Japanese, Chinese, Korean). German and Polish are wordy and nearly guaranteed to stretch text areas, Asian languages use an entirely different set of characters which tests your unicode support.
Learn about the System.Globalization namespace:
System.Globalization
Also, a good book is NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications
Would be good to refresh a bit on Unicodes if you are targeting other cultures,languages.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
This is a hard problem. I live in Canada, so multilingualism is a big issue. In all my years of doing software development, I've never seen a solution that I liked. I've seen a lot of solutions that worked, and got the job done, but they've always felt like a big kludge. I would go with #harriyott, and make sure that none of your strings are actually in code. A resource file works well for desktop applications. However in ASP.Net, I'd recommend using the database. #John Christensen also has some good pointers.
Make sure you're compiling with Code Analysis turned on, and pay attention to the Globalization warnings that it gives you. Keep data in an invariant format (CultureInfo.InvariantCulture) until you display it to the user (then use CultureInfo.CurrentCulture).
I would seriously consider reading the following code project article:
Globalization and localization demystified in ASP.NET 2.0
It covers everything from Cultures and Locales, setting the threads current culture, resource files, encodings, you name it!
And of course it's loaded with pretty pictures and examples :-). Good luck!
I would suggest:
Put all strings in either the database or resource files.
Allow extra space for translated text, as some (e.g. German) are wordier.