ASP.NET how to encode into Chinese? - asp.net

I am currently helping a Chinese colleague migrating an ASP.NET app to a different server and we have now run into a character encoding problem. What is supposed to look like Chinese looks like gibberish.
The strings presented by the ASP.NET web pages have been coded in Chinese ...
Example:
<input id="BQuery" value=" 查询 " runat="server" class="BottonLoginRed" name="Button1" type="button" />
The Web.config file is configured like so ...
<globalization requestEncoding="gb2312" responseEncoding="gb2312" />
Since the source code contains Chinese encoded characters I figured I needed to set the correct culture for the response thread by adding a "culture" setting for "Simplified Chinese" in the Web.config file, like so ...
<globalization culture="zh-Hans" requestEncoding="gb2312" responseEncoding="gb2312" />
... but that produces this error message:
"Culture 'zh-Hans' is a neutral culture. It cannot be used in formatting and parsing and therefore cannot be set as the thread's current culture."
I have tried all variants for Chinese encoding, such as "zh-Hant", "zh-CHS" or just "zh" but they all yield the same problem. Apparently, there is no way to run Chinese as the response thread culture.
What would be the correct approach to resolve this issue?
[EDIT]
Apparently, my Chinese colleague has "solved" this problem before by simply setting Chinese as the language for the server itself. This is no longer an option (we'll have other apps, for different cultures running on the same server) but it might provide a hint.
[EDIT 2]
When I removed the encoding hints from the Web.config file it works. So, why is it we need to use these hints at all these days? Is it just me or is character encoding something that's being perceived as a very messy subject by everyone? :-)

Related

classic asp chr() function issue

I was trying to implement RC4 encription in ASP but I found a strange behaviour on chr() function.
but the issue is not related to RC4 script but to something I've not been able to solve.
Not to mention all the test I've done, I could riproduce the issue in a very simple form:
I simply wrote
<%=chr(146)%>
in 2 pages, let say L2.asp and L3.asp
page L2.asp shows ' thus html ’
page L3.asp shows �
clearly both pages are on the same server (Windows Server 2012 R2) but
it seems page L3.asp does not recognize Extended ASCII Table.
I try adding <% Response.Charset="ISO-8859-1"%> on top.. and many other solution but nothing changes..
although the script is very simple (but tested also longer script with rc4 routine), if I copy the content of L2.asp in L3.asp or viceversa, the behaviour of the page remains unchanged, thus, L2.asp contiunes to show ' while L3 shows �, and changing name of the page will not change behaviour.
do have some idea what can create such strange behaviour?
Thanks a lot for any hint
It's not about Chr function. � is UTF8-BOM which is optional for UTF-8 files. First try to save ASP files in UTF-8 without BOM. You can use an advanced editor like Notepad++. Follow the steps: Open "file.asp" > Encoding > Convert to UTF-8 and then File > Save.
Response.Charset simply appends the name of the character set to the Content-Type response header and does nothing on server-side.
Instead you must specify Response.CodePage = 1252.

Do I need web.config for non-ASCII characters?

Attempting to make my first ASP.NET page. Got IIS 5.1 on XP, configured to run .NET 4. Created a new virtual directory and added an .aspx file. When I browse the file, non-ASCII characters are corrupted. For instance, an ü (U+00FC) is transformed to ü (U+00C3 U+00BC), which is the I-don't-get-this-is-UTF-8 equivalent.
I have tried various ways of availing this:
I made sure the .aspx file is indeed encoded as UTF-8.
I set the meta tag:
<meta charset="UTF-8">
I set the virtual directory to handle .aspx as text/html;charset=utf-8 under HTTP Headers > File Type in IIS.
I added ResponseEncoding="utf-8" to <%# Page ... %>.
I inserted the string in HttpUtility.HtmlEncoded(). Now the ü was transformed to ü (U+00C3 U+00BC).
Finally, I found 2 ways that worked:
Replacing non-ASCII characters with character references, such as ü This was okay in the 90's, not today.
Adding a web.config file to the virtual directory, with this content:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<system.web>
<globalization fileEncoding="utf-8"/>
</system.web>
</configuration>
Without fileEncoding setting, the ASP.NET parser will read the .aspx and corrupt every non-ASCII character without attempting to infer the file encoding. Is this just something you pros have learned to live with, or am I missing something? Is a web.config file with globalization settings the way to handle "international" characters on .aspx pages? I don't remember having similar problems with PHP, so I'm puzzled why this crops up with ASP.NET.
To use non-ASCII characters you need to have two things. Save the files using UTF-8, by choosing this encoding for the files and be sure that you have these settings on your web.config
<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />
Note that there is always a web.config on ASP.NET. There is the global one that also has these settings and lives in the asp.net directory {drive:}\WINDOWS\Microsoft.NET\Framework\{version}\CONFIG\, and then the web.config on your project. Sometimes the global one sets the encoding from the current country. In this case you need to set it back to UTF-8 in your project.
You have found all that already, I just point out the 3 settings:
Save your files with unicode.
Set the requestEncoding="utf-8"
Set the responseEncoding="utf-8"
You have three options.
Option 1 - either entity-encode all characters that don't fit into ASCII or replace them with similarly looking ASCII equivalents. This is error-prone and hard to maintain. The next time you have to incorporate a large piece of text you may forget to check the included piece and it "looks garbage" again.
Option 2 - save the .aspx as "UTF-8 with BOM". Such files are properly handled automatically - that's documented in description of fileEncoding property of system.web/globalization section of web.config. This is also hard to maintain - the next time you get the file resaved as "UTF-8" (without BOM) it "looks garbage" again and it may go unnoticed. When you add new .aspx files you'll have to check they are saved as "UTF-8 with BOM" too. This approach is error prone - for example, some file comparison tools don't show adding/removing BOM (at least with default settings).
Option 3 - ensure the file is saved as either "UTF-8" or "UTF-8 with BOM" and at the same time set fileEncoding property of system.web/globalization section of web.config to utf-8. The default value of this property is "single byte character encoding" so files with non-ASCII character saved as UTF-8 are handled improperly and result "looks garbage". This is the most maintainable approach - it's easy to see and easy to verify and don't randomly break when a file is resaved. fileEncoding is the only one of the three ???Encoding properties which defaults to "single byte character encoding" - responseEncoding and requestEncoding default to utf-8 so in most cases there's no need to change (or set) them, setting fileEncoding is usually enough.

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately - a valid unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “ (source)
code-point 0x201c is “ (source)
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.Raw also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses Default (ANSI) encoding for path characters. Your url encoded string is decoded using that and that is why you're getting a weird thing back.
To get the original id you can convert it back to bytes and get the string using utf8 encoding.
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters so I'm assuming that IIS has some sort of encoding detection mechanism which is failing in this case. As a workaround though you can prefix your id with this char and then you can easily detect if the problem occurred (if this char is missing). Not a very ideal solution but it will work. You can then write your custom model binder and a wrapper class in ASP.NET MVC to make your consumption code cleaner.
Once Upon A Time, URLs themselves were not in UTF-8. They were in the ANSI code page. This facilitates the fact that they often are used to select, well, pathnames in the server's file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.

ASP.net problem with Regional & Language Options (win2k3)

I have a TextBox in an Asp.net form. there is a simple javascript witch separates each three digits in TextBox. it works fine when you enter data in TextBox. I used coma , for separating digits and used dot . as floating point character.
as I said every thing works fine when I am entering data in TextBox. but when the post-back occurs and saved data returns to client, every .(s) has been removed (for example 2.3 saved as 23 and digits in TextBox are separated by . instead of ,.
this problem occurs just in a specific server (windows server 2003 sp1) and works fine in other windows server 2003 (SP1)! I am experiencing this problem for first time!
But I think the problem is because of specific Regional & Language Options in the server. This server is joined to a domain controller. when I change the regional and language options to this set:
Decimal Symbol -> .
Digit Grouping Symbol -> ,
nothing changes.
when I check the following item after customizing settings :
Apply All Settings to the current user account and to the default user profile -> checked
when I restart the Server, It jumps out from domain and need to be re-joined to domain controller! and of-course nothing changes again!
Do you had this problem? any solution please!
I can not post code here, because the code is too complex and I am sure problem is not because of code because it is working every where unless the specified server.
EDIT
Also setting regional and language options for network service user may help to solve the problem. any body knows how can I do this ?
Have you tried using the globalization tag in your web.config? This prevents you from running into trouble when multiple servers are configured differently (ie. different languagepacks).
<configuration>
<system.web>
<globalization
culture="en-US"
uiCulture="en-US" />
</system.web>
</configuration>
After goofing around with a similar problem for WAY to long I did the following with the help of a number of clues (also found on StackOverFlow, StackOverFlow rocks by the way...)
The first thing I did was dump out what the server was actually thinking (Page_Load):
var dtInfo = System.Globalization.DateTimeFormatInfo.CurrentInfo;
DisplayDebugInfo(String.Format(
"Culture({0}/{1}), DateFormat(SD:{2},DS:{3})",
System.Globalization.CultureInfo.CurrentCulture.Name,
System.Globalization.CultureInfo.CurrentUICulture.Name,
dtInfo.ShortDatePattern, dtInfo.DateSeparator));
Also on Windows 2003, I tried fixing the regional setting via the regular control panel but with no success.
I also tried setting the globalization settings in the web.config as mentioned in the other solution but with little effect.
It seems that once you start messing with the regional setting you can quickly get to the point where things are messed up. I decided to avoid messing with the registry and go for a code solution because then I would not have to worry when my code was released to production.
I added the following code to the base class for my page so that it would fix it everywhere. You could also place it in the Page_Load.
using System.Globalization;
using System.Threading;
// Fix the cultural settings...
CultureInfo culture = (CultureInfo)CultureInfo.CurrentCulture.Clone();
culture.DateTimeFormat.ShortDatePattern = "MM/dd/yyyy";
culture.DateTimeFormat.DateSeparator = "/";
Thread.CurrentThread.CurrentCulture = culture;
Problem solved. For me anyway.

Best way to convert a decimal value to a currency string for display in HTML

I wanting to show prices for my products in my online store.
I'm currently doing:
<span class="ourprice">
<%=GetPrice().ToString("C")%>
</span>
Where GetPrice() returns a decimal. So this currently returns a value e.g. "£12.00"
I think the correct HTML for an output of "£12.00" is "£12.00", so although this is rendering fine in most browsers, some browsers (Mozilla) show this as $12.00.
(The server is in the UK, with localisation is set appropriately in web.config).
Is the below an improvement, or is there a better way?
<span class="ourprice">
<%=GetPrice().ToString("C").Replace("£","£")%>
</span>
Try this, it'll use your locale set for the application:
<%=String.Format("{0:C}",GetPrice())%>
Use
GetPrice().ToString("C", CultureInfo.CreateSpecificCulture("en-GB"))
The £ symbol (U+00A3), and the html entities & #163; and & pound; should all render the same in a browser.
If the browser doesn't recognise £, it probably won't recognise the entity versions.
It's in ISO 8859-1 (Latin-1), so I'd be surprised if a Mozilla browser can't render it (my FF certainly can).
If you see a $ sign, it's likely you have two things:
1. The browser default language is en-us
2. Asp.net is doing automatic locale switching. The default web.config setting is something like
<globalization culture="auto:en-us" uiCulture="auto:en-US" />
As you (almost certainly) want UK-only prices, simply specify the locale in web.config:
<globalization culture="us" uiCulture="en-gb" />
(or on page level:)
<%#Page Culture="en-gb" UICulture="en-gb" ..etc... %>
Thereafter the string formats such as String.Format("{0:C}",GetPrice()) and GetPrice().ToString("C") will use the en-GB locale as asp.net will have set the currentCulture for you
(although you can specify the en-gb culture in the overloads if you're paranoid).
You could write a function which would perform the conversion from price to string. This way you have a lot of control over the output.
The problem with locale is that it's web server dependent and not web browser dependent.
If you need to explicity state the localisation you can use the CultureInfo and pass that to the string formatter.
just use the ToString("C2") property of a decimal value. Set your globalization in the web.config - keep it simple.

Resources