ASP.NET Localization with singular and plural - asp.net

Can I with ASP.NET Resources/Localization translate strings to depend on one or other (the English grammar) in a easy way, like I pass number 1 in my translate, it return "You have one car" or with 0, 2 and higher, "You have %n cars"?
Or I'm forced to have logic in my view to see if it's singular or plural?

JasonTrue wrote:
To the best of my knowledge, there isn't a language that requires something more complicated than singular/plural
Such languages do exist. In my native Polish, for example, there are three forms: for 1, for 2-4 and for zero and numbers greater than 4. Then after you reach 20, the forms for 21, 22-24 and 25+ are again different (same grammatical forms as for numerals 0-9). And yes, "you have 0 things" sounds awkward, because you're not likely to see that used in real life.
As a localization specialist, here's what I would recommend:
If possible, use forms which put the numeral at the end:
a: Number of cars: %d
This means the form of the noun "car" does not depend on the numeral, and 0 is as natural as any other number.
If the above is not acceptable, at least always make a complete sentence a translatable resource. That is, do use
b: You have 1 car.
c: You have %d cars.
But never split such units into smaller fragments such as
d: You have
e: car(s)
(Then somewhere else you have a non-localizable resource such as %s %d %s)
The difference is that while I cannot translate (c) directly, since the form of the noun will change, I can see the problem and I can change the sentence to form (a) in translation.
On the other hand, when I am faced with (d) and (e) fragments, there is no way to make sure the resulting sentence will be grammatical. Again: using fragments guarantees that in some languages the translation will be anything from grammatically awkward to completely broken.
This applies across the board, not just to numerals. For example, a fragment such as %s was deleted is also untranslatable, since the form of the verb will depend on the gender of the noun, which is unavailable here. The best I can do for Polish is the equivalent of Deleted: %s, but I can only do it as long as the %s placeholder is included in the translatable resource. If all I have is "was deleted" with no clue to the referent noun, I can only startle my dog by cursing aloud and in the end I still have to produce garbage grammar.

I've been working on a library to assist with internationalization of an application. It's called SmartFormat and is open-source on GitHub.
It contains "grammatical number" rules for many languages that determine the correct singular/plural form based on the locale. When translating a phrase that has words that depend on a quantity, you specify the multiple forms and the formatter chooses the correct form based on these rules.
For example:
var message = "There {0:is:are} {0} {0:item:items} remaining.";
var output = Smart.Format(culture, message, items.Count);
It has a very similar syntax to String.Format(...), but has tons of great features that make it easy to create natural and grammatically-correct messages.
It also deals with possible gender-specific formatting, lists, and much more.

moodforaday wrote:
This applies across the board, not just to numerals. For example, a fragment such as "%s was deleted" is also untranslatable, since the form of the verb will depend on the gender of the noun, which is unavailable here. The best I can do for Polish is the equivalent of "Deleted: %s", but I can only do it as long as the %s placeholder is included in the translatable resource. If all I have is "was deleted" with no clue to the referent noun, I can only startle my dog by cursing aloud and in the end I still have to produce garbage grammar.
The point I would like to make here, is never include a noun as a parameter. Many European languages modify the noun based on its gender, and in the case of German, whether it is the subject, object, or indirect object. Just use separate messages.
Instead of, "%s was deleted.", use a separate translation string for each type:
"The transaction was deleted."
"The user was deleted." etc.
This way, each string can be translated properly.
The solution to handle plural forms in different languages can be found here:
http://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
While you can't use the above software in a commercial application (it's covered under GPL), you can use the same concepts. Their plural form expressions fit perfectly with lambda expressions in .NET. There are a limited number of plural form expressions that cover a large number of languages. You can map a particular language to a lambda expression that calculates which plural form to use based on the language. Then, look up the appropriate plural form in a .NET resx file.

There is nothing built in, we ended up coding something like this:
Another solution can be:
use place holders like {CAR} in the resource file strings
Maintain separate resource entries for singular and plural words for "car" :
CAR_SINGULAR car
CAR_PLURAL cars
Develop a class with this logic:
class MyResource
{
private List<string> literals = new List<string>();
public MyResource() { literals.Add("{CAR}") }
public static string GetLocalizedString(string key,bool isPlural)
{
string val = rm.GetString(key);
if(isPlural)
{
val = ReplaceLiteralWithPlural(val);
}
else
{
val = ReplaceLiteralWithSingular(val);
}
}
}
string ReplaceLiteralWithPlural(string val)
{
StringBuilder text = new StringBuilder(val)
foreach(string literal in literals)
{
text = text.Replace(literal,GetPluralKey(literal));
}
}
string GetPluralKey(string literal)
{
return literal + "_PLURAL";
}

This is still not possible in .NET natively. But it is a well known problem and already solved by gettext. It have native support for many technologies, even includes C# support. But looks like there is modern reimplementation now in .NET: https://github.com/VitaliiTsilnyk/NGettext

Orchard CMS uses GNU Gettext's PO files for localization and plural forms.
Here is their code where you can grab the idea:
https://github.com/OrchardCMS/OrchardCore/tree/main/src/OrchardCore/OrchardCore.Localization.Core
https://github.com/OrchardCMS/OrchardCore/tree/main/src/OrchardCore/OrchardCore.Localization.Abstractions

The logic needn't be in your view, but it's certainly not in the resource model the DotNet framework. Yours is a simple enough case that you can probably get away with creating a simple format string for singular, and one for plural. "You have 1 car."/"You have {0} cars." You then need to write a method that discriminates between the plural and singular case.
There are also a number of languages where there's no distinction between singular and plural, but this just requires a little duplication in the resource strings. (edit to add: gender and sometimes endings change depending on number of units in many languages, but this should be fine as long as you distinguish between singular and plural sentences, rather than just words).
Microsoft mostly gave up on semantically clever localization because it's hard to generalize even something like pluralization to 30+ languages. This is why you see such boring presentation of numeric quantities in most applications that are translated to many languages, along the lines of "Cars: {0}". That sounds lame, but surprisingly, the last time I checked, usability studies didn't actually favor the verbose natural language presentation in most cases.

Related

Qt internationalization system with different positioned strings

I'm aware that there are some languages that writes the order of some characters differently than the common latin languages. E.g.: a percentage number in English would be like "100%", while in Persian it would be "/100" (the symbol comes before the number).
Question: how to consider that in the Qt internationalization system in an intelligent way?
I first thought about this code:
myLabel->setText(tr("%1%2").arg(value).arg(tr("%")));
So what would happen is that, in the Qt Linguist, the translator would change the order of the replacement fields:
%1%2 -> in Persian translation -> %2%1
I checked that in my code and I found out that while in the normal (English) translation everything was fine, when I changed to the file containing the performed translation, a bug would occur: the number to be shown was never complete having one less number that what I had written. So e.g. if I chose "99%", it would show "%9", and if I set only "9%", I would have just "%".
The problem disappeared when I put a space between %1 and %2 both in the source code as well as in the translation (%2 %1). Since ISO xxxxx says that the % should be placed with a space between it and the correspondent number, no problem for this specific situation. But what If I wanted to have both symbols without a space between each other? How should it be done?
I confirm that the problem you described exists. However I would solve this problem in the following way:
QString sPer = QString("%%1").arg(value); // %99
QString sEng = QString("%1%").arg(value); // 99%
So that
%1% -> in Persian translation -> %%1
Put the percentage inside the string to translate, something like
myLabel->setText(tr("%%1").arg(value))
even better, I think I would add a disambiguation string (Qt "old style" comment)
myLabel->setText(tr("%%1", "Show the number with a percentage").arg(value))
or maybe a new style translation comment like this
//: Show the number with a percentage
myLabel->setText(tr("%%1").arg(value))
Translator comments have issues of their own, you might be better off using a Qt disambiguation string like the preceding example.
Putting a disambiguation string (or a translator comment) will let your translator know what they are translating...
Let the translators decide where they want to put the percentage, don't try to handle it in the code as somebody else suggested, it is not scalable, you can't start handling this in the code like that, what if you need to use another character or formatting for another language?
It might even be possible that Qt might have calls to handle number formatting but I can't seem to find them at the moment and I am not fully sure how we handle them in the Qt application I work on...
If putting a % alone doesn't work, try to precede it with \, it might be necessary to escape it, I am not sure...

Is # inside razor syntax called operator?

I just bought a book on ASP.NET MVC With Razor View Engine. There is a subsection called Usage of # Operator and this subsection title makes me ... well, uncomfortable.
Is # inside the razor view engine called operator?
UPDATE
I guess my question is not so clear. I want to know if # is an operator inside the razor view engine. For example, < > = != >= => these are called as operator inside C# language. Is it same for # inside Razor view engine?
I think the reason for you discomfort is that the # token (when talking about it from a parsing perspective, though the word character should also do) is overloaded to indicate a number of different situations. Let's examine what those are:
Write a value to the output:
#this.Value
Indicate a code block transition:
#{
var foo = 1;
foo += 1;
}
Indicate a code statement transition:
<div>
#if(foo) {
// code
}
</div>
Indicate an escape to markup till the end of the line
#if(foo) {
foo = false;
#: value printed to output
}
Indicate directive statements:
#inherits MyCustomBaseType
Indicate special code blocks:
#section Foo {
<div />
}
#helper Bar(int param) {
param += 1;
}
Delimit comment blocks:
#*
This is a comment
*#
Escape the # character:
Email me at myemail##example.com
In my opinion only the first usage can be considered as an operator. The operand (i.e. everything else that follows the # up to but excluding the first markup-significant whitespace character) is passed as the parameter to the Write() method. All other usages don't have any clearly identifiable operands or require additional tokens (the * in the comment block, etc) to be indentified.
According to the official pages on Razor, you can find one example here, it does not seem that this is called an operator.
From the linked page:
You add code to a page using the # character
(my emphasis)
I also found numerous other pages on that same site, all referring it to just the "# character", so in that sense it isn't considered an operator.
<opinion>
However, if you read on Wikipedia on the topic of operators, then:
Syntactically operators usually contrast to functions. In most languages, functions may be seen as a special form of prefix operator with fixed precedence level and associativity, often with compulsory parentheses e.g. Func(a) (or (Func a) in LISP). Most languages support programmer-defined functions, but cannot really claim to support programmer-defined operators, unless they have more than prefix notation and more than a single precedence level. Semantically operators can be seen as special form of function with different calling notation and a limited number of parameters (usually 1 or 2).
(again, my emphasis)
Then I would argue that # is in fact an operator. It is a symbol with a specific meaning, and you could argue that you're "escaping out of the surrounding context to do something else", sort of like a function call.
In other words, while the word operator does not appear in the website articles I've seen so far, I would consider it to be an operator nonetheless.
</opinion>
Hmm, are you referring to the Fonts and Colors section of the Tools > Options dialog in Visual Studio? If so, no it isn't an Operator, it's "HTML Server-Side Script". The "functions", "section" and other razor-specific keywords are also this color.
Otherwise, yes it is an operator from a technical description. I'm not entirely sure what other distinction you are looking for.
No, the # character is not used as an operator. It's only the choise of that author to call it so.
For example, in the blog post introducing Razor it's not described as an operator anywhere.
There doesn't seem to be an official or de-facto term for it yet. Personally I would call it a tag, as it's used in the markup code along with other tags, and it's used as a replacement for the <% %> script tag.
I guess that it's not really incorrect to call it an operator, but it's not commonly used. As the razor tag contains C# or VB code which also can contain operators, it can get confusing.
The # character starts inline expressions, single statement blocks, and multi-statement blocks.
On Scott Gu's blog, he doesn't refer to it as an operator but there are lots of other developers who do. Personally, I don't see this as an operator but more of an identifier.
As far as I know are symbols like >, != and && called equality, relational or conditional operators. The # symbol does not seem to fit in any of those groups so I must say, no, it cannot be called an operator. On the other hand however, it does indicate that 'some operation' is performed which makes it rather likely to be called an operator.
The # symbol is not an operator, it is an indicator which indicates that a razor statement begins.
It is comparable with <?php ?> in php. This indicates to the compiler/interpreter where your code is
e.g.
#Html.ActionLink(...)
a #(...) gives you the oportunitie to do some coding int there
e.g
#(
int a = 0;
int b = 4;
int c = a + b;
)
I hope this helps you on the way

Qt: n param of QTranslator::translate() for non-singulars

Qt QTranslator::translate() documentation declares that
If n is not -1, it is used to choose an appropriate form for the translation (e.g. "%n file found" vs. "%n files found").
It seems that there is no way to translate "%n men answered %n questions" as one string (i.e. I need to perform 2 QTranslator::translate() calls), or am I wrong?
I'd advise against attempting to use multiple numerus forms in a single translatable string.
It's tricky, involving more than one call to tr().
It's complex. Some languages can have more than two numerus forms and the translation space grows in O(n^m) where n is the number of numerus forms in language and m is the number of number forms in your string to be translated string. Case in point: Arabic has six numerus forms and if you have two %ns in your string, you'll need 36 different translations.
So, better to structure your translatable strings so that max one %n is needed per string.

How should I encode dictionaries into HTTP GET query strings?

An HTTP GET query string is a ordered sequence of key/value pairs:
?spam=eggs&spam=ham&foo=bar
Is, with certain semantics, equivalent to the following dictionary:
{'spam': ['eggs', 'ham'], 'foo': bar}
This happens to work well for boolean properties of that page being requested:
?expand=1&expand=2&highlight=7&highlight=9
{'expand': [1, 2], 'highlight': [7, 9]}
If you want to stop expanding the element with id 2, just pop it out of the expand value and urlencode the query string again. However, if you've got a more modal property (with 3+ choices), you really want to represent a structure like so:
{'highlight_mode': {7: 'blue', 9: 'yellow'}}
Where the values to the corresponding id keys are part of a known enumeration. What's the best way to encode this into a query string? I'm thinking of using a sequence of two-tuples like so:
?highlight_mode=(7,blue)&highlight_mode=(9,yellow)
Edit: It would also be nice to know any names that associate with the conventions. I know that there may not be any, but it's nice to be able to talk about something specific using a name instead of examples. Thanks!
The usual way is to do it like this:
highlight_mode[7]=blue&highlight_mode[9]=yellow
AFAIR, quite a few server-side languages actually support this out of the box and will produce a nice dictionary for these values.
I've also seen people JSON-encode the nested dictionary, then further encode it with BASE64 (or something similar), then pass the whole resulting mess as a single query string parameter.
Pretty ugly.
On the other hand, if you can get away with using POST, JSON is a really good way to pass this kind of information back and forth.
In many Web frameworks it's encoded differently from what you say.
{'foo': [1], 'bar': [2, 3], 'fred': 4}
would be:
?foo[]=1&bar[]=2&bar[]=3&fred=4
The reason array answers should be different from plain answers is so the decoding layer can automatically tell the less common foo case (array which just happens to have a single element) from extremely common fred case (single element).
This notation can be extrapolated to:
?highlight_mode[7]=blue&highlight_mode[9]=yellow
when you have a hash, not just an array.
I think this is pretty much what Rails and most frameworks which copy from Rails do.
Empty arrays, empty hashes, and lack of scalar value look identical in this encoding, but there's not much you can do about it.
This [] seems to be causing just a few flamewars. Some view it as unnecessary, because the browser, transport layer, and query string encoder don't care. The only thing that cares is automatic query string decoder. I support the Rails way of using []. The alternative would be having separate methods for extracting a scalar and extracting an array from querystring, as there's no automatic way to tell when program wants [1] when it wants 4.
This piece of code works for me with Python Backend-
import json, base64
param={
"key1":"val1",
"key2":[
{"lk1":"https://www.foore.in", "lk2":"https://www.foore.in/?q=1"},
{"lk1":"https://www.foore.in", "lk2":"https://www.foore.in/?q=1"}
]
}
encoded_param=base64.urlsafe_b64encode(json.dumps(param).encode())
encoded_param_ready=str(encoded_param)[2:-1]
#eyJrZXkxIjogInZhbDEiLCAia2V5MiI6IFt7ImxrMSI6ICJodHRwczovL3d3dy5mb29yZS5pbiIsICJsazIiOiAiaHR0cHM6Ly93d3cuZm9vcmUuaW4vP3E9MSJ9LCB7ImxrMSI6ICJodHRwczovL3d3dy5mb29yZS5pbiIsICJsazIiOiAiaHR0cHM6Ly93d3cuZm9vcmUuaW4vP3E9MSJ9XX0=
#In JS
var decoded_params = atob(decodeURI(encoded_param_ready));

Server.UrlEncode vs. HttpUtility.UrlEncode

Is there a difference between Server.UrlEncode and HttpUtility.UrlEncode?
I had significant headaches with these methods before, I recommend you avoid any variant of UrlEncode, and instead use Uri.EscapeDataString - at least that one has a comprehensible behavior.
Let's see...
HttpUtility.UrlEncode(" ") == "+" //breaks ASP.NET when used in paths, non-
//standard, undocumented.
Uri.EscapeUriString("a?b=e") == "a?b=e" // makes sense, but rarely what you
// want, since you still need to
// escape special characters yourself
But my personal favorite has got to be HttpUtility.UrlPathEncode - this thing is really incomprehensible. It encodes:
" " ==> "%20"
"100% true" ==> "100%%20true" (ok, your url is broken now)
"test A.aspx#anchor B" ==> "test%20A.aspx#anchor%20B"
"test A.aspx?hmm#anchor B" ==> "test%20A.aspx?hmm#anchor B" (note the difference with the previous escape sequence!)
It also has the lovelily specific MSDN documentation "Encodes the path portion of a URL string for reliable HTTP transmission from the Web server to a client." - without actually explaining what it does. You are less likely to shoot yourself in the foot with an Uzi...
In short, stick to Uri.EscapeDataString.
HttpServerUtility.UrlEncode will use HttpUtility.UrlEncode internally. There is no specific difference. The reason for existence of Server.UrlEncode is compatibility with classic ASP.
Fast-forward almost 9 years since this was first asked, and in the world of .NET Core and .NET Standard, it seems the most common options we have for URL-encoding are WebUtility.UrlEncode (under System.Net) and Uri.EscapeDataString. Judging by the most popular answer here and elsewhere, Uri.EscapeDataString appears to be preferable. But is it? I did some analysis to understand the differences and here's what I came up with:
WebUtility.UrlEncode encodes space as +; Uri.EscapeDataString encodes it as %20.
Uri.EscapeDataString percent-encodes !, (, ), and *; WebUtility.UrlEncode does not.
WebUtility.UrlEncode percent-encodes ~; Uri.EscapeDataString does not.
Uri.EscapeDataString throws a UriFormatException on strings longer than 65,520 characters; WebUtility.UrlEncode does not. (A more common problem than you might think, particularly when dealing with URL-encoded form data.)
Uri.EscapeDataString throws a UriFormatException on the high surrogate characters; WebUtility.UrlEncode does not. (That's a UTF-16 thing, probably a lot less common.)
For URL-encoding purposes, characters fit into one of 3 categories: unreserved (legal in a URL); reserved (legal in but has special meaning, so you might want to encode it); and everything else (must always be encoded).
According to the RFC, the reserved characters are: :/?#[]#!$&'()*+,;=
And the unreserved characters are alphanumeric and -._~
The Verdict
Uri.EscapeDataString clearly defines its mission: %-encode all reserved and illegal characters. WebUtility.UrlEncode is more ambiguous in both definition and implementation. Oddly, it encodes some reserved characters but not others (why parentheses and not brackets??), and stranger still it encodes that innocently unreserved ~ character.
Therefore, I concur with the popular advice - use Uri.EscapeDataString when possible, and understand that reserved characters like / and ? will get encoded. If you need to deal with potentially large strings, particularly with URL-encoded form content, you'll need to either fall back on WebUtility.UrlEncode and accept its quirks, or otherwise work around the problem.
EDIT: I've attempted to rectify ALL of the quirks mentioned above in Flurl via the Url.Encode, Url.EncodeIllegalCharacters, and Url.Decode static methods. These are in the core package (which is tiny and doesn't include all the HTTP stuff), or feel free to rip them from the source. I welcome any comments/feedback you have on these.
Here's the code I used to discover which characters are encoded differently:
var diffs =
from i in Enumerable.Range(0, char.MaxValue + 1)
let c = (char)i
where !char.IsHighSurrogate(c)
let diff = new {
Original = c,
UrlEncode = WebUtility.UrlEncode(c.ToString()),
EscapeDataString = Uri.EscapeDataString(c.ToString()),
}
where diff.UrlEncode != diff.EscapeDataString
select diff;
foreach (var diff in diffs)
Console.WriteLine($"{diff.Original}\t{diff.UrlEncode}\t{diff.EscapeDataString}");
Keep in mind that you probably shouldn't be using either one of those methods. Microsoft's Anti-Cross Site Scripting Library includes replacements for HttpUtility.UrlEncode and HttpUtility.HtmlEncode that are both more standards-compliant, and more secure. As a bonus, you get a JavaScriptEncode method as well.
Server.UrlEncode() is there to provide backward compatibility with Classic ASP,
Server.UrlEncode(str);
Is equivalent to:
HttpUtility.UrlEncode(str, Response.ContentEncoding);
The same, Server.UrlEncode() calls HttpUtility.UrlEncode()

Resources