Extract localizable strings from source code, aspx, xaml to resource files - asp.net

As part of internationalizing our application, which is based on ASP.NET, C#, Silverlight, and XBAP, I'm evaluating approaches to start with. I have to choose between GNU gettext (PO files) and Microsoft's resource (resx) based approach. At this juncture, I'm trying to understand the best way to automatically extract localizable strings from .cs, aspx, ascx, and xaml (Silverlight) files into resource files (resx), if I have to go the MS way.
I have below options in mind:
Resource Refactoring Tool: it extracts all strings (whether or not they need translating), such as page headers, and we cannot mark or exclude particular strings. Otherwise we have to manually select each string and then extract it (right-click and click Extract).
ReSharper's localization assistance: here I do not see automatic extraction; I'd have to extract manually, string by string.
I know there has to be a bit of manual intervention, but any advice would help in choosing the right direction between gettext (GNU gettext for C# or FairlyLocal) and the MS localization approach.

Both approaches have pros and cons; let's discuss.
FairlyLocal (GNU Gettext)
First, initial tweaking is required:
download library & tools and dump at some place relative to your project
modify the base page object of your site (manual intervention)
add a post-build step to your web project that will run xgettext and update your .po files
Second, string extraction is taken care of by FairlyLocal itself.
Third, translation of strings can be done in-house or outsourced, as PO files are widely known by linguists.
Fourth, rendering of some UTF-8 characters (if any) depends on web fonts: EOT (Trident), SVG (WebKit, Gecko, Presto).
Fifth, the locale needs to be maintained (like pa-IN, languageCode-countryCode).
Sixth, several converters are available for PO files.
Seventh, the default logic falls back on the default-locale (en-US) resources for the value.
One issue: the .po files that the build script generates won't be UTF-8 by default. You'll need to open them in POEdit (or similar) and explicitly change the encoding the first time you edit them if you want your translated text to show special characters correctly.
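The encoding issue can be caught early by checking the charset declared in each .po header. Below is a minimal Python sketch, not part of FairlyLocal; the function names are hypothetical, and it assumes the usual PO header line of the form "Content-Type: text/plain; charset=...".

```python
import re

# Matches the charset declared in a PO header entry, e.g.
# "Content-Type: text/plain; charset=UTF-8\n"
CHARSET_RE = re.compile(r"Content-Type:\s*text/plain;\s*charset=([\w.-]+)")


def po_charset(po_text):
    """Return the charset named in the PO header, or None if there is none."""
    match = CHARSET_RE.search(po_text)
    return match.group(1) if match else None


def needs_encoding_fix(po_text):
    """True when the header is missing or declares something other than UTF-8."""
    charset = po_charset(po_text)
    return charset is None or charset.upper() not in {"UTF-8", "UTF8"}


# Hypothetical header of a freshly generated file (placeholder charset).
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
)

if __name__ == "__main__":
    print(po_charset(sample), needs_encoding_fix(sample))
```

Running such a check in the post-build step would flag files that still need their encoding changed before translators start working on them.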
MS localization
First, extraction of strings is pretty easy using the Resource Refactoring Tool.
Second, the resgen.exe command-line tool can be used to make .resx files linguist-friendly:
resgen /compile examplestrings.xx.resx,examplestrings.xx.txt
Third, localization within .NET (not specific to ASP.NET proper or ASP.NET MVC) implements a standard fallback mechanism.
Fourth, there is no dependency on the GNU Gettext utilities.
Fifth, localization can cover everything from strings to dates, currency, etc. using CurrentUICulture and CurrentCulture.
Sixth, web fonts are recommended here too.
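To make the fallback mechanism concrete: a lookup tries the specific culture first (de-DE), then the neutral culture (de), then the default resources. The Python sketch below is a simplified illustration of that idea, not the actual .NET ResourceManager; the function names and the nested-dictionary layout are assumptions made for the example.

```python
def fallback_chain(culture):
    """Yield culture names from most to least specific, ending with ''
    (which stands for the default, culture-neutral resources)."""
    parts = culture.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield ""


def lookup(resources, key, culture):
    """Return the first value found for key while walking the fallback chain."""
    for name in fallback_chain(culture):
        table = resources.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)


# Hypothetical resource tables keyed by culture name.
resources = {
    "": {"Hello": "Hello", "Bye": "Goodbye"},
    "de": {"Hello": "Hallo"},
    "de-DE": {},
}

if __name__ == "__main__":
    print(lookup(resources, "Hello", "de-DE"))
    print(lookup(resources, "Bye", "de-DE"))
```

A string missing from the German tables is served from the default table, which is why the default resources are expected to contain every key.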
thanks.

Related

My .resx Resource File Pain Points

When using .resx resource files for localization I have the following pain points:
For each label, I need to make an entry in each different language file. It is prone to human error in terms of copying the name of the entry and it would be easier to add different language versions of the same label in one place. For example:
var lbl_Hello = new { en = "Hello", fr = "Bonjour" };
I cannot seem to search for names or values inside the resx visual editor using Visual Studio search.
Are there alternatives to overcome these?
I feel your pain. It's ridiculous that Microsoft hasn't provided a better alternative for this in so many years. Even .NET Core localization uses .resx files. My alternative is to create a table with one column for each language you want to track. Then load it into memory early in the pipeline (in .NET Core) or in the config phase (older ASP.NET), either as a static variable (a dictionary) or inserted into the local/shared cache with no expiration date.
EDIT
If the number of strings to localize is not very big and you have all of them available in all languages, you can perfectly well inline them in the static class itself, rather than keeping them in an external resource (a database or a file). But keeping these strings in an external resource has one clear advantage: you can replace them without recompiling the application.
Furthermore, sometimes you won't have all the strings translated into all the languages you want to offer. If the resources live in a database or an external file, you can provide a user interface that lets users with the required privileges modify or complete the translations.
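As a rough illustration of the table-per-language idea, here is a Python sketch that loads a table with one column per language into a dictionary and falls back to a default language when a translation is missing. The CSV layout, the "key" column name, and the function names are assumptions for the example; a database table would work the same way.

```python
import csv
import io


def load_translations(csv_text):
    """Build {key: {lang: text}} from a table with a 'key' column and one
    column per language. Empty cells are treated as missing translations."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.pop("key")
        table[key] = {lang: text for lang, text in row.items() if text}
    return table


def translate(table, key, lang, default_lang="en"):
    """Return the text for lang, falling back to the default language."""
    entry = table[key]
    return entry.get(lang) or entry[default_lang]


# Hypothetical table contents: lbl_Save has no French translation yet.
SAMPLE = "key,en,fr\nlbl_Hello,Hello,Bonjour\nlbl_Save,Save,\n"
TRANSLATIONS = load_translations(SAMPLE)

if __name__ == "__main__":
    print(translate(TRANSLATIONS, "lbl_Hello", "fr"))
    print(translate(TRANSLATIONS, "lbl_Save", "fr"))
```

Because every language version of a label sits in one row, a new label is added in a single place, and the whole table can be searched with ordinary text tools.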

Alfresco localization encoding

Trying to create custom types, aspects and properties for Alfresco, I followed the Alfresco Developer Series guide. When I reached the localization section I found out that Alfresco does not handle UTF-8 encoding in the .properties files that you create. Greek characters are not displayed correctly in Share.
Checking out other built-in .properties files (/opt/alfresco-4.0.e/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/messages) I noticed that in Japanese, for example, the characters are in this notation: \u3059\u3079\u3066\u306e...
So, the question is: do I have to convert the Greek words into the above-mentioned notation for Share to display them correctly, or is there another, more elegant, way to do it?
The \u#### form is the Java form of the Unicode escape sequence, and is used to reference Unicode characters without having to worry about the encoding of the file storing them.
This question has some information on how to create and decode them
Another way, which is what Alfresco developers tend to use, is the native2ascii tool which ships with Java itself. With that, you can initially write your strings in a UTF-8 (for example) file, then use the tool to turn them into their escaped form.
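For illustration, the escaping itself is simple enough to sketch in a few lines of Python. This is not native2ascii, just a hypothetical helper that produces the same kind of \uXXXX output: ASCII characters are kept as-is and everything else is written as UTF-16 code units, so characters outside the Basic Multilingual Plane become surrogate pairs.

```python
def to_java_escapes(text):
    """Replace every non-ASCII character with Java-style \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # UTF-16 (big-endian) gives one 2-byte unit for BMP characters
        # and two units (a surrogate pair) for everything above U+FFFF.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u{:02x}{:02x}".format(units[i], units[i + 1]))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical property line with a Greek value.
    print(to_java_escapes("greeting=Γεια"))
```

The escaped line can be pasted into a .properties file regardless of the encoding the file is saved in.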

How do I add a string to an ASP.NET global resource that only belongs to one language?

I have a global resources file for different languages:
Resource.resx
Resource.de-DE.resx
Resource.ro-RO.resx
For the most part, all the strings in Resource.resx have localized versions in other languages as well.
However, I have certain strings that should only exist in Resource.de-DE.resx but not Resource.resx. When I try to use them in my code:
GetGlobalResourceObject("Resource", "Personal Identification Number")
I get an error that says Cannot resolve resource item 'Personal Identification Number'. The string still gets localized properly when I view the page in German because it's present in Resource.de-DE.resx, but because it's not in Resource.resx, I get this error in Visual Studio, and I'd like to get rid of the error.
How do I work around this so that I don't get this error message? Should I move the locale-specific string to another resource file?
The whole resource fallback approach really assumes that all strings are present for the base language.
I imagine you have this scenario because you implemented some feature that only applies to German and you don't want to add unnecessary resources to your base language as these will increase the localization effort for languages that don't need it.
One solution would be to create a separate local resource file. And either only translate this one into German (and not other languages) or make it a base resource (without the de-DE language code but still with your German strings in it).
Another solution (if you can't create a local resource file and for some reason can only use global resources) would be to add those extra entries to your base global resources (Resource.resx) and make it obvious that you don't want these translated. For example make them all blank strings and use the Comment field to explain that these strings are for German only. Not very nice.
I just replicated your scenario and it works fine. Just create another resource file containing the locale-specific strings. Hope this helps :)

Translate QT application without text in code

Is there a way to translate a Qt app into different languages without defining the texts directly in the source? I want to separate the text from the source. This would result in some kind of resource files for ALL languages, including the default language (e.g. English).
You won't be able to leave the English (or your source language, not necessarily English) text out of the XML (.ts) files, as lupdate will put it there each time you run it. However, as long as a translation exists for the chosen language, the source text is ignored. If there is no translation, it defaults to the source text. This is useful since you're guaranteed to get some sort of text in your translation, but it's up to your test team to ensure that the translations exist. I wrote a Python script to automate the checking of the translation files, since we have 9 languages and nearly 1k strings per translation. To test for missing translations, we also used a very simple sed script to create pseudo-loc source strings, so that if any translations were missing, the pseudo-loc text would be very evident.
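The pseudo-loc idea can be sketched as follows. The original used a sed script that isn't shown here, so this Python version is only an assumption about the general technique: it swaps vowels for accented look-alikes, wraps the string in brackets, and leaves Qt-style %1 placeholders untouched. Any source string that shows up on screen in this form has no real translation.

```python
import re

# Vowels mapped to accented look-alikes so pseudo-localized text stands out.
ACCENTS = str.maketrans("aeiouAEIOU", "àéîöûÀÉÎÖÛ")
# Qt-style numbered placeholders (%1, %2, ...) must survive unchanged.
PLACEHOLDER = re.compile(r"(%\d+)")


def pseudo_localize(source):
    """Return a visibly altered version of a source string."""
    parts = PLACEHOLDER.split(source)
    converted = [
        part if PLACEHOLDER.fullmatch(part) else part.translate(ACCENTS)
        for part in parts
    ]
    return "[" + "".join(converted) + "]"


if __name__ == "__main__":
    print(pseudo_localize("Open %1 now"))
```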
Regarding the process for editing the .ts files, we farmed out the translations to individual translators, providing them with the .ts file for their language and usually about an hour's worth of hands-on instruction in using Qt Linguist. If the translator was onsite and wanted to see their translations on our device immediately, I wrote an autorun script that would place the resultant .qm file in the right place in our embedded file system and restart the application to display the new translations. If they weren't onsite, we'd run their files through the Python script mentioned above to check for a number of different problems, then simply check in the .ts file so it'd get built the next time around.
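A check of this kind might look roughly like the sketch below. It is not the script described above, only a minimal example that assumes the standard .ts layout (context, name, message, source, translation elements) and reports messages whose translation is empty or still marked type="unfinished". Entries marked obsolete or vanished are skipped.

```python
import xml.etree.ElementTree as ET


def find_untranslated(ts_xml):
    """Return (context name, source text) pairs that lack a finished translation."""
    root = ET.fromstring(ts_xml)
    missing = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            translation = message.find("translation")
            kind = translation.get("type") if translation is not None else None
            if kind in ("obsolete", "vanished"):
                continue  # no longer referenced by the source code
            # itertext() also collects text nested in <numerusform> children.
            text = "".join(translation.itertext()).strip() if translation is not None else ""
            if not text or kind == "unfinished":
                missing.append((name, message.findtext("source", default="")))
    return missing


# Hypothetical .ts content with one finished and one unfinished message.
SAMPLE_TS = """<TS version="2.1" language="de_DE">
<context>
  <name>MainWindow</name>
  <message><source>About</source><translation>Über</translation></message>
  <message><source>Quit</source><translation type="unfinished"></translation></message>
</context>
</TS>"""

if __name__ == "__main__":
    for context_name, source in find_untranslated(SAMPLE_TS):
        print(context_name, source)
```

Run across all nine language files, a report like this gives the test team a concrete list of strings to chase before the next build.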
HTH
You might be able to use the QT_TRANSLATE_NOOP family of macros to do what you want. Those simply mark some text as "need to be translated" so that lupdate picks it up, but you can refer to those translated values by variable constants in your code. So ...
const char *kHelloWorld = QT_TRANSLATE_NOOP("SomeScope", "Hello world");
Later...
return qApp->translate("SomeScope", kHelloWorld);
This is still in your source code somewhere, but it is at least one step removed. You could theoretically have the QT_TRANSLATE_NOOP stuff in an entirely different Qt project, but then you'd need some way in your real project to know what you are supposed to translate. (You still need the char* constants somewhere.)
What are you trying to accomplish exactly by not having the English text in the source? That might help people provide more appropriate answers.
EDIT
Based on your comment below about intended usage
OK, then this is precisely what the normal Qt translation process does. Running lupdate creates those XML files (they have a .ts extension). They can be opened by translators in the very-easy-to-use Qt Linguist. They are translated, then sent back to you (the programmer), where you run lrelease on them to create the binary translation files you will ship with the app. The translations are resolved at runtime, so there is no need to recompile.
If you wanted to enable a user to do this, you would:
Ship your application with an empty (untranslated) .ts file and the lrelease program.
Provide instructions on how to use Qt Linguist to translate. (They could use a text editor and modify the XML directly, but it's a lot easier with Linguist.)
Explain how to run lrelease and where to drop the binary translation files so that your application pulls them in.
On this last step, you could theoretically provide a nice wizard-like app that hides the implementation details.
What we will do is:
* Include a translation for the former default language, and use this *.ts file to auto-generate the other *.ts files. This is required because we keep the translations outside the Qt environment, as they are shared with other projects not related to Qt.
Then we only have to make sure this translation contains the correct values.
In the future we can define IDs in the code which represent text in the default translation, like translating TXT_ID_ABOUT to "About".

Best practices or resources for Localization in vanilla ASP.NET?

I'm about to begin work on translating a client's website into Spanish and French, and I'm looking for resources on localization with ASP.NET. There are millions of hits in Google and almost all of them go back to 2005 and ASP.NET 2.0. Is there anything new in regards to localization in 3.5 and VS2008? Any tips or resources with common practices would be highly appreciated!
Localization simply hasn't changed that much since ASP.NET 2.0, to be honest. The resources you're finding are no doubt recommending you put things in resx files located in App_LocalResources, which is still the way you do it. Here are some tips I've learned from doing the same thing.
Absolutely and brutally minimize the number of images you have that contain text. Doing so will make your life a billion percent easier since you won't have to get a new set of images for every friggin' language.
Be very wary of css positioning that relies on things always remaining the same size. If those things contain text, they will not remain the same size, and you will then need to go back and fix your designs.
If you use character types in your SQL tables, make sure that any of those that might receive international input are Unicode (nchar, nvarchar, ntext). For that matter, I would just standardize on using the Unicode versions.
If you're building SQL queries dynamically, make sure that you include the N prefix before any quoted text if there's any chance that text might be Unicode. If you end up with garbage in a SQL table, check whether the N prefix is missing.
Make sure that all your web pages definitively state that they are in a Unicode format.
See Joel's article on Unicode - http://joelonsoftware.com/articles/Unicode.html
You're going to be using resource files a lot for this project. That's good - ASP.NET 2.0 has great support for such. You'll want to look into the App_LocalResources and App_GlobalResources folder as well as GetLocalResourceObject, GetGlobalResourceObject, and the concept of meta:resourceKey. Chapter 30 of Professional ASP.NET 2.0 has some great content regarding that. The 3.5 version of the book may well have good content there as well, but I don't own it.
Think about fonts. Many of the standard fonts you might want to use aren't Unicode capable. I've always had luck with Arial Unicode MS, MS Gothic, and MS Mincho. I'm not sure how cross-platform these are, though. Also, note that not all fonts support the full Unicode character set. Again, test, test, test.
Start thinking now about how you're going to get translations into this system. Go talk to whoever is your translation vendor about how they want data passed back and forth for translation. Think about the fact that, through your local resource files, you will likely be repeating some commonly used strings throughout the system. Do you normalize those into global resource files, or do you have some sort of database layer where only one copy of each text is stored? In our recent project, we used resource files which were generated from a database table that contained all the translations and the original, English version of the resource files.
Test. Generally speaking, I test in German, Polish, and an Asian language (Japanese, Chinese, Korean). German and Polish are wordy and nearly guaranteed to stretch text areas; Asian languages use an entirely different set of characters, which tests your Unicode support.
I don't think there is something really new since then (or I'm not aware of it).
You could have a look at ResourceBlender which can also be installed via the Web Platform Installer.
ResourceBlender seems to be a little bit buggy so far. For example, some resource strings have identical names, and if you change one of these strings, the others are changed too. The last version is from Dec. 2009.

Resources