I'm writing an asp.net application that will need to be localized to several regions other than North America. What do I need to do to prepare for this globalization? What are your top 1 to 2 resources for learning how to write a world ready application.
A couple of things that I've learned:
Absolutely and brutally minimize the number of images you have that contain text. Doing so will make your life a billion percent easier since you won't have to get a new set of images for every friggin' language.
Be very wary of css positioning that relies on things always remaining the same size. If those things contain text, they will not remain the same size, and you will then need to go back and fix your designs.
If you use character types in your sql tables, make sure that any of those that might receive international input are unicode (nchar, nvarchar, ntext). For that matter, I would just standardize on using the unicode versions.
If you're building SQL queries dynamically, make sure that you include the N prefix before any quoted text if there's any chance that text might be unicode. If you end up putting garbage in a SQL table, check to see if that's there.
Make sure that all your web pages definitively state that they are in a unicode format. See Joel's article, mentioned above.
You're going to be using resource files a lot for this project. That's good - ASP.NET 2.0 has great support for such. You'll want to look into the App_LocalResources and App_GlobalResources folder as well as GetLocalResourceObject, GetGlobalResourceObject, and the concept of meta:resourceKey. Chapter 30 of Professional ASP.NET 2.0 has some great content regarding that. The 3.5 version of the book may well have good content there as well, but I don't own it.
Think about fonts. Many of the standard fonts you might want to use aren't unicode capable. I've always had luck with Arial Unicode MS, MS Gothic, MS Mincho. I'm not sure about how cross-platform these are, though. Also, note that not all fonts support all of the Unicode character definition. Again, test, test, test.
Start thinking now about how you're going to get translations into this system. Go talk to whoever is your translation vendor about how they want data passed back and forth for translation. Think about the fact that, through your local resource files, you will likely be repeating some commonly used strings through the system. Do you normalize those into global resource files, or do you have some sort of database layer where only one copy of each text used is generated. In our recent project, we used resource files which were generated from a database table that contained all the translations and the original, english version of the resource files.
Test. Generally speaking I will test in German, Polish, and an Asian language (Japanese, Chinese, Korean). German and Polish are wordy and nearly guaranteed to stretch text areas, Asian languages use an entirely different set of characters which tests your unicode support.
Learn about the System.Globalization namespace:
System.Globalization
Also, a good book is NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications
Would be good to refresh a bit on Unicodes if you are targeting other cultures,languages.
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
This is a hard problem. I live in Canada, so multilingualism is a big issue. In all my years of doing software development, I've never seen a solution that I liked. I've seen a lot of solutions that worked, and got the job done, but they've always felt like a big kludge. I would go with #harriyott, and make sure that none of your strings are actually in code. A resource file works well for desktop applications. However in ASP.Net, I'd recommend using the database. #John Christensen also has some good pointers.
Make sure you're compiling with Code Analysis turned on, and pay attention to the Globalization warnings that it gives you. Keep data in an invariant format (CultureInfo.InvariantCulture) until you display it to the user (then use CultureInfo.CurrentCulture).
I would seriously consider reading the following code project article:
Globalization and localization demystified in ASP.NET 2.0
It covers everything from Cultures and Locales, setting the threads current culture, resource files, encodings, you name it!
And of course it's loaded with pretty pictures and examples :-). Good luck!
I would suggest:
Put all strings in either the database or resource files.
Allow extra space for translated text, as some (e.g. German) are wordier.
Related
I'm pretty much after people opinions/best practices and nuggets of experience here.
I need to produce a new website in ASP.net C# which has the requirement of changing the language based on the user profiles.
I've done a couple of simple samples before but I'm curious on a slightly lower level. I'm after resources which I can read and review really.
What design patterns are in place for doing things like translating grids of data into different cultures.
If I'm going to store currency info, is it standard practice to store the exchange rates also?
If I'm going to down the route of a standard ASP.net web application can I use URL routing to help pick the culture to use? for instance www.mynewsite.com/en-GB/default.aspx.
Wisdom/Thoughts welcome.
Thanks for looking and thanks more for answering,
Mike
A couple of things that I've learned:
Absolutely and brutally minimize the number of images you have that contain text. Doing so will make your life a billion percent easier since you won't have to get a new set of images for every friggin' language.
Be very wary of css positioning that relies on things always remaining the same size. If those things contain text, they will not remain the same size, and you will then need to go back and fix your designs.
If you use character types in your sql tables, make sure that any of those that might receive international input are unicode (nchar, nvarchar, ntext). For that matter, I would just standardize on using the unicode versions.
If you're building SQL queries dynamically, make sure that you include the N prefix before any quoted text if there's any chance that text might be unicode. If you end up putting garbage in a SQL table, check to see if that's there.
Make sure that all your web pages definitively state that they are in a unicode format. See Joel's article, mentioned above.
You're going to be using resource files a lot for this project. That's good - ASP.NET 2.0 has great support for such. You'll want to look into the App_LocalResources and App_GlobalResources folder as well as GetLocalResourceObject, GetGlobalResourceObject, and the concept of meta:resourceKey. Chapter 30 of Professional ASP.NET 2.0 has some great content regarding that. The 3.5 version of the book may well have good content there as well, but I don't own it.
Think about fonts. Many of the standard fonts you might want to use aren't unicode capable. I've always had luck with Arial Unicode MS, MS Gothic, MS Mincho. I'm not sure about how cross-platform these are, though. Also, note that not all fonts support all of the Unicode character definition. Again, test, test, test.
Start thinking now about how you're going to get translations into this system. Go talk to whoever is your translation vendor about how they want data passed back and forth for translation. Think about the fact that, through your local resource files, you will likely be repeating some commonly used strings through the system. Do you normalize those into global resource files, or do you have some sort of database layer where only one copy of each text used is generated. In our recent project, we used resource files which were generated from a database table that contained all the translations and the original, english version of the resource files.
Test. Generally speaking I will test in German, Polish, Hebrew or Arabic, and an Asian language (Japanese, Chinese, Korean). German and Polish are wordy and nearly guaranteed to stretch text areas, Asian languages use an entirely different set of characters which tests your unicode support, and Hebrew and Arabic are both right to left languages.
Recently we started using some code names for several different types of prototype applications all following a theme. This made things a little more fun and was a great idea.
The problem is that Im not too sure how people deal with migrating a codebase from "codename" state into version 1.0 state which may have a proper name... not something that a client really shouldnt see :)
We are using Visual Studio at the moment, and I can see that you can change the assembly name, but there are references to the namespaces, etc... that would really be a large change to make.
Do people bother changing things like namespaces before the v1.0 release?
I prefer to change all references including project names, folders, namespaces, everything whenever the real name changes. It can be a bit of a pain, but it's better in the long run, especially when new developers are introduced to a project and are not familiar with the history.
Some companies continue to use code names internally even after the real name is decided and released. Even today there are some places where "Opus" shows up in reference to Microsoft Word (when digging into window handle info, not any published api or ui).
If you keep code names around, you end up with a mess and a large document to have to know what is what.
http://en.wikipedia.org/wiki/List_of_Microsoft_codenames
I've always considered development names to live in a different space than the deliverable product name. Unless the development name is profane, or you are producing libraries or APIs, I don't see necessary harm from the development name appearing in a symbol table or sumptin'. (Your customers will generate their own profanities for your code, anyway ;)
Sam's answer sort of agrees with this stance, if the development names never got outside the code pit, there wouldn't be a Wikipedia page listing them.
I'm about to begin work on translating client's website into spanish and french and looking for resources on Localization with ASP.NET. There are millions of hits in Google and almost all of them go back to 2005 and ASP.NET 2.0. Is there anaything new in regards to localization in 3.5 and VS2008? Any tips or recources with common practices would be highly appreciated!
Localization simply hasn't changed that much since ASP.NET 2.0, to be honest. The resources you're finding are no doubt recommending you put things in resx files located in App_LocalResources, which is still the way you do it. Here's some tips I've learned from doing the same things.
Absolutely and brutally minimize the number of images you have that contain text. Doing so will make your life a billion percent easier since you won't have to get a new set of images for every friggin' language.
Be very wary of css positioning that relies on things always remaining the same size. If those things contain text, they will not remain the same size, and you will then need to go back and fix your designs.
If you use character types in your sql tables, make sure that any of those that might receive international input are unicode (nchar, nvarchar, ntext). For that matter, I would just standardize on using the unicode versions.
If you're building SQL queries dynamically, make sure that you include the N prefix before any quoted text if there's any chance that text might be unicode. If you end up putting garbage in a SQL table, check to see if that's there.
Make sure that all your web pages definitively state that they are in a unicode format.
See Joel's article on Unicode - http://joelonsoftware.com/articles/Unicode.html
You're going to be using resource files a lot for this project. That's good - ASP.NET 2.0 has great support for such. You'll want to look into the App_LocalResources and App_GlobalResources folder as well as GetLocalResourceObject, GetGlobalResourceObject, and the concept of meta:resourceKey. Chapter 30 of Professional ASP.NET 2.0 has some great content regarding that. The 3.5 version of the book may well have good content there as well, but I don't own it.
Think about fonts. Many of the standard fonts you might want to use aren't unicode capable. I've always had luck with Arial Unicode MS, MS Gothic, MS Mincho. I'm not sure about how cross-platform these are, though. Also, note that not all fonts support all of the Unicode character definition. Again, test, test, test.
Start thinking now about how you're going to get translations into this system. Go talk to whoever is your translation vendor about how they want data passed back and forth for translation. Think about the fact that, through your local resource files, you will likely be repeating some commonly used strings through the system. Do you normalize those into global resource files, or do you have some sort of database layer where only one copy of each text used is generated. In our recent project, we used resource files which were generated from a database table that contained all the translations and the original, english version of the resource files.
Test. Generally speaking I will test in German, Polish, and an Asian language (Japanese, Chinese, Korean). German and Polish are wordy and nearly guaranteed to stretch text areas, Asian languages use an entirely different set of characters which tests your unicode support.
I don't think there is something really new since then (or I'm not aware of it).
You could have a look at ResourceBlender which can also be installed via the Web Platform Installer.
ResourceBlender seems to be a little big buggy so far. For example: Some ResourceStrings are named equivalent. If you change one of this equivalent Strings, the others will be changed too... The last version is from dec. 2009.
How should I store (and present) the text on a website intended for worldwide use, with several languages? The content is mostly in the form of 500+ word articles, although I will need to translate tiny snippets of text on each page too (such as "print this article" or "back to menu").
I know there are several CMS packages that handle multiple languages, but I have to integrate with our existing ASP systems too, so I am ignoring such solutions.
One concern I have is that Google should be able to find the pages, even for foreign users. I am less concerned about issues with processing dates and currencies.
I worry that, left to my own devices, I will invent a way of doing this which work, but eventually lead to disaster! I want to know what professional solutions you have actually used on real projects, not untried ideas! Thanks very much.
I looked at RESX files, but felt they were unsuitable for all but the most trivial translation solutions (I will elaborate if anyone wants to know).
Google will help me with translating the text, but not storing/presenting it.
Has anyone worked on a multi-language project that relied on their own code for presentation?
Any thoughts on serving up content in the following ways, and which is best?
http://www.website.com/text/view.asp?id=12345&lang=fr
http://www.website.com/text/12345/bonjour_mes_amis.htm
http://fr.website.com/text/12345
(these are not real URLs, i was just showing examples)
Firstly put all code for all languages under one domain - it will help your google-rank.
We have a fully multi-lingual system, with localisations stored in a database but cached with the web application.
Wherever we want a localisation to appear we use:
<%$ Resources: LanguageProvider, Path/To/Localisation %>
Then in our web.config:
<globalization resourceProviderFactoryType="FactoryClassName, AssemblyName"/>
FactoryClassName then implements ResourceProviderFactory to provide the actual dynamic functionality. Localisations are stored in the DB with a string key "Path/To/Localisation"
It is important to cache the localised values - you don't want to have lots of DB lookups on each page, and we cache thousands of localised strings with no performance issues.
Use the user's current browser localisation to choose what language to serve up.
You might want to check GNU Gettext project out - at least something to start with.
Edited to add info about projects:
I've worked on several multilingual projects using Gettext technology in different technologies, including C++/MFC and J2EE/JSP, and it worked all fine. However, you need to write/find your own code to display the localized data of course.
If you are using .Net, I would recommend going with one or more resource files (.resx). There is plenty of documentation on this on MSDN.
As with most general programming questions, it depends on your needs.
For static text, I would use RESX files. For me, as .Net programmer, they are easy to use and the .Net Framework has good support for them.
For any dynamic text, I tend to store such information in the database, especially if the site maintainer is going to be a non-developer. In the past I've used two approaches, adding a language column and creating different entries for the different languages or creating a separate table to store the language specific text.
The table for the first approach might look something like this:
Article Id | Language Id | Language Specific Article Text | Created By | Created Date
This works for situations where you can create different entries for a given article and you don't need to keep any data associated with these different entries in sync (such as an Updated timestamp).
The other approach is to have two separate tables, one for non-language specific text (id, created date, created user, updated date, etc) and another table containing the language specific text. So the tables might look something like this:
First Table: Article Id | Created By | Created Date | Updated By | Updated Date
Second Table: Article Id | Language Id | Language Specific Article Text
For me, the question comes down to updating the non-language dependent data. If you are updating that data then I would lean towards the second approach, otherwise I would go with the first approach as I view that as simpler (can't forget the KISS principle).
If you're just worried about the article content being translated, and do not need a fully integrated option, I have used google translation in the past and it works great on a smaller scale.
Wonderful question.
I solved this problem for the website I made (link in my profile) with a homemade Python 3 script that translates the general template on the fly and inserts a specific content page from a language requested (or guessed by Apache from Accept-Language).
It was fun since I got to learn Python and write my own mini-library for creating content pages. One downside was that our hosting didn't have Python 3, but I made my script generate static HTML (the original one was examining User-agent) and then upload it to server. That works so far and making a new language version of the site is now a breeze :)
The biggest downside of this method is that it is time-consuming to write things from scratch. So if you want, drop me line and I'll help you use my script :)
As for the URL format, I use site.com/content/example.fr since this allows Apache to perform language negotiation in case somebody asks for /content/example and has a browser tell that it likes French language. When you do this Apache also adds .html or whatever as a bonus.
So when a request is for example and I have files
example.fr
example.en
example.vi
Apache will automatically proceed with example.vi for a person with Vietnamese-configured browser or example.en for a person with German-configured browser. Pretty useful.
I'm thinking of starting a wiki, probably on a low cost LAMP hosting account. I'd like the option of exporting my content later in case I want to run it on IIS/ASP.NET down the line. I know in the weblog world, there's an open standard called BlogML which will let you export your blog content to an XML based format on one site and import it into another. Is there something similar with wikis?
The correct answer is ... "it depends".
It depends on which wiki you're using or planning to use. I've used various over the years MoinMoin was ok, used files rather than database, Ubuntu seem to like it. MediaWiki, everyone knows about and JAMWiki is a java clone(ish) of MediaWiki with the aim to be markup compatible with MediaWiki, both use databases and you can generally connect whichever database you want, JAMWiki is pre-configured to use an internal HSQLDB instance.
I recently converted about 80 pages from a MoinMoin wiki into JAMWiki pages and this was probably 90% handled by a tiny perl script I found somewhere (I'll provide a link if I can find it again). The other 10% was unfortunately a by-hand experience (they were of the utmost importance with them being recipies for the missus) ;-)
I also recently setup a Mediawiki instance for work and that took all of about 8 minutes to do. So that'd be my choice.
To answer your question I don't believe that there's such a standard as WikiML as Till called it.
As strange as it sounds, I've investigated screen scraping a wiki for a co-worker to help him port it to another wiki engine. It turned out that screen scraping would have been easier, quicker and more efficient to write to move this particular file based wiki to another one or a CMS.
Given the context that you wrote the question in I would bite the bullet now and pay the little extra for a windows hosted account and put Screwturn wiki on it. You're got the option of using file based or SQL Server based back end for it but because one of your requirements is low cost I'm guessing that you would use file based now for a cheaper hosted account and then you can always upscale the back end to SQL Server.
I haven't heard of WikiML.
I think your biggest obstacle is gonna be converting one wiki markup to another. For example, some wikis use markdown (which is what Stack Overflow uses), others use another markup syntax (e.g. BBCode, ...), etc.. The bottom line is - assuming the contents are databased it's not impossible to export and parse it to make it "fit" in another system. It might just be a pain in the ass.
And if the contents are not databased, it's gonna be a royal pain in the ass. :D
Another solution would be to stay with the same system. I am not sure what the reason is for changing the technology later on. It's not like a growing project requires IIS/ASP.NET all of the sudden. (It might just be the other way around.) But for example, if you could stick with PHP for a while, you could also run that on IIS.