Do I need web.config for non-ASCII characters? - asp.net

Attempting to make my first ASP.NET page. Got IIS 5.1 on XP, configured to run .NET 4. Created a new virtual directory and added an .aspx file. When I browse the file, non-ASCII characters are corrupted. For instance, an ü (U+00FC) is transformed to Ã¼ (U+00C3 U+00BC), which is the I-don't-get-this-is-UTF-8 equivalent.
I have tried various ways of fixing this:
I made sure the .aspx file is indeed encoded as UTF-8.
I set the meta tag:
<meta charset="UTF-8">
I set the virtual directory to handle .aspx as text/html;charset=utf-8 under HTTP Headers > File Type in IIS.
I added ResponseEncoding="utf-8" to <%@ Page ... %>.
I wrapped the string in HttpUtility.HtmlEncode(). Now the ü was transformed to &#195;&#188; (U+00C3 U+00BC).
Finally, I found 2 ways that worked:
Replacing non-ASCII characters with character references, such as &#252;. This was okay in the '90s, not today.
Adding a web.config file to the virtual directory, with this content:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.web>
    <globalization fileEncoding="utf-8"/>
  </system.web>
</configuration>
Without fileEncoding setting, the ASP.NET parser will read the .aspx and corrupt every non-ASCII character without attempting to infer the file encoding. Is this just something you pros have learned to live with, or am I missing something? Is a web.config file with globalization settings the way to handle "international" characters on .aspx pages? I don't remember having similar problems with PHP, so I'm puzzled why this crops up with ASP.NET.
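For reference, the corruption described above can be reproduced outside ASP.NET. The following is a minimal Python sketch, not ASP.NET code. It assumes the parser falls back to a single-byte code page such as Windows-1252 (the actual fallback depends on the server's settings), and the function name read_as_single_byte is made up for illustration.

```python
def read_as_single_byte(utf8_bytes: bytes, codepage: str = "cp1252") -> str:
    """Decode UTF-8 bytes with a single-byte code page, the way a parser
    that ignores the real file encoding would (the code page is an assumption)."""
    return utf8_bytes.decode(codepage)


# ü (U+00FC) is stored in a UTF-8 file as the two bytes C3 BC.
stored = "ü".encode("utf-8")

# Each byte is then mapped to its own character: C3 -> Ã, BC -> ¼.
garbled = read_as_single_byte(stored)

print(stored.hex())                           # c3bc
print([f"U+{ord(c):04X}" for c in garbled])   # ['U+00C3', 'U+00BC']
```

The two code points printed are the same ones reported in the question, which is consistent with the file being UTF-8 on disk but being read as a single-byte encoding.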

To use non-ASCII characters you need two things: save the files as UTF-8 (by choosing this encoding when saving them), and make sure you have these settings in your web.config:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />
Note that there is always a web.config on ASP.NET. There is the global one that also has these settings and lives in the asp.net directory {drive:}\WINDOWS\Microsoft.NET\Framework\{version}\CONFIG\, and then the web.config on your project. Sometimes the global one sets the encoding from the current country. In this case you need to set it back to UTF-8 in your project.
You have found all that already; I am just pointing out the 3 settings:
Save your files as Unicode (UTF-8).
Set the requestEncoding="utf-8"
Set the responseEncoding="utf-8"

You have three options.
Option 1 - either entity-encode all characters that don't fit into ASCII or replace them with similarly looking ASCII equivalents. This is error-prone and hard to maintain. The next time you have to incorporate a large piece of text you may forget to check the included piece and it "looks garbage" again.
Option 2 - save the .aspx as "UTF-8 with BOM". Such files are properly handled automatically - that's documented in the description of the fileEncoding property of the system.web/globalization section of web.config. This is also hard to maintain - the next time the file gets resaved as "UTF-8" (without BOM) it "looks garbage" again, and this may go unnoticed. When you add new .aspx files you'll have to check that they are saved as "UTF-8 with BOM" too. This approach is error-prone - for example, some file comparison tools don't show an added or removed BOM (at least with default settings).
Option 3 - ensure the file is saved as either "UTF-8" or "UTF-8 with BOM" and at the same time set the fileEncoding property of the system.web/globalization section of web.config to utf-8. The default value of this property is "single byte character encoding", so files with non-ASCII characters saved as UTF-8 are handled improperly and the result "looks garbage". This is the most maintainable approach - it's easy to see and easy to verify, and it doesn't randomly break when a file is resaved. fileEncoding is the only one of the three ???Encoding properties that defaults to "single byte character encoding" - responseEncoding and requestEncoding default to utf-8, so in most cases there's no need to change (or set) them; setting fileEncoding is usually enough.
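The manual check mentioned in Option 2 ("is this file saved with a BOM?") can be automated. Below is a small Python sketch, under the assumption that you simply want to flag files that contain non-ASCII bytes but no UTF-8 BOM, i.e. the files whose correct display depends on fileEncoding being set. The helper names has_utf8_bom and depends_on_file_encoding are made up for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the content starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def depends_on_file_encoding(data: bytes) -> bool:
    """True for content with non-ASCII bytes and no BOM: the case that is
    only read correctly when the encoding is configured explicitly."""
    return not has_utf8_bom(data) and not data.isascii()


plain_ascii = b"<p>hello</p>"
utf8_no_bom = "<p>ü</p>".encode("utf-8")
utf8_with_bom = codecs.BOM_UTF8 + utf8_no_bom

for label, content in [("ascii", plain_ascii),
                       ("utf-8", utf8_no_bom),
                       ("utf-8 with BOM", utf8_with_bom)]:
    print(label, depends_on_file_encoding(content))
```

To scan a real project you would read each .aspx file in binary mode and pass its bytes to depends_on_file_encoding.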

Related

Replacing the Ajaxfileupload control in a Windows Forms application

We have a windows forms legacy asp.net site that uses the AjaxFileUpload control to manage file uploads. One of our issues is that we have different file type uploads, but these types are distinguished not by the extension but by an element right before the extension, e.g. .gh.zip vs. .gy.zip. It seems that if I add one of these, but not the other, to AllowedFileTypes, it doesn't allow either. Is it possible to piggyback some additional JS validation code to prevent an invalid file name, or would I need to replace the entire module with something else? If so, what would you recommend as the least time-consuming replacement that still offers a reasonable amount of configurability?
That control is open source - you can download the source and change it if you wish.
However, why would just specifying zip as the allowed file type not work?
If I set an allowed extension of zip,
then these are the results:
.gh.zip ok
.gy.zip ok
.pdf no
However, my markup is this:
<ajaxToolkit:AjaxFileUpload ID="AjaxFileUpload1" runat="server"
    OnClientUploadCompleteAll="MyCompleteAll" ChunkSize="16384"
    AllowedFileTypes="zip" />
So the above allows only zip files.
If I try to add, say, a PDF file to the queue above, it is rejected.
So just set the allowed file type to zip.
(Edit: do NOT include the "." in this extension.)
I am not sure why that would not work.
But as noted, you can grab the source - it is open source now.
However, I suspect some other issue is going on here.
Or maybe you need more complex file-extension parsing?
For the rare outlier cases you could allow the upload, and THEN have the post-processing code reject the file type anyway, right?
However, looking at the above, just specify file type = zip and you should be OK.
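The post-processing check suggested above (allow zip in the control, then reject unwanted names afterwards) comes down to comparing the full compound suffix rather than only the last extension. Here is a rough sketch of that logic in Python, purely for illustration; on the actual site it would live in the server-side upload handler, and the function name has_allowed_suffix and the sample file names are hypothetical.

```python
def has_allowed_suffix(filename: str, allowed_suffixes: tuple[str, ...]) -> bool:
    """Case-insensitive check of the whole compound suffix, e.g. '.gh.zip',
    instead of only the final extension ('.zip')."""
    lowered = tuple(suffix.lower() for suffix in allowed_suffixes)
    return filename.lower().endswith(lowered)


allowed = (".gh.zip",)  # accept .gh.zip uploads, reject .gy.zip

for name in ["report.gh.zip", "report.gy.zip", "REPORT.GH.ZIP", "report.pdf"]:
    print(name, has_allowed_suffix(name, allowed))
```

Keeping AllowedFileTypes="zip" in the markup and running a check like this after the upload completes separates the two concerns: the control filters by extension, and your own code filters by naming convention.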

classic asp chr() function issue

I was trying to implement RC4 encryption in ASP but I found a strange behaviour of the chr() function.
The issue is not related to the RC4 script, though, but to something I have not been able to solve.
Without going through all the tests I have done, I could reproduce the issue in a very simple form:
I simply wrote
<%=chr(146)%>
in 2 pages, let say L2.asp and L3.asp
page L2.asp shows ’ (that is, HTML &rsquo;)
page L3.asp shows �
Both pages are clearly on the same server (Windows Server 2012 R2), but
it seems page L3.asp does not recognize the extended ASCII table.
I tried adding <% Response.Charset="ISO-8859-1"%> at the top, and many other solutions, but nothing changes.
Although the script is very simple (I also tested a longer script with the RC4 routine), if I copy the content of L2.asp into L3.asp or vice versa, the behaviour of each page remains unchanged: L2.asp continues to show ’ while L3.asp shows �, and renaming the page does not change the behaviour.
Do you have any idea what can cause such strange behaviour?
Thanks a lot for any hint.
It's not about the Chr function. The � points to a UTF-8 BOM in the file; the BOM is optional for UTF-8 files. First try to save the ASP files as UTF-8 without BOM. You can use an advanced editor like Notepad++. Follow these steps: open "file.asp" > Encoding > Convert to UTF-8, and then File > Save.
Response.Charset simply appends the name of the character set to the Content-Type response header and does nothing on server-side.
Instead you must specify Response.CodePage = 1252.
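To see why the code page matters for character 146, it helps to look at the raw byte. The following Python sketch is only a byte-level illustration and makes no claim about how ASP implements Chr internally; it assumes the page ends up emitting the single byte 0x92, which the browser then interprets under one encoding or the other.

```python
raw = bytes([146])  # the single byte 0x92

# Under Windows-1252, 0x92 is the right single quotation mark (U+2019).
as_cp1252 = raw.decode("cp1252")

# Under UTF-8, a lone 0x92 is an invalid sequence, so a tolerant decoder
# substitutes the replacement character (U+FFFD).
as_utf8 = raw.decode("utf-8", errors="replace")

print(f"cp1252: U+{ord(as_cp1252):04X}")  # cp1252: U+2019
print(f"utf-8:  U+{ord(as_utf8):04X}")    # utf-8:  U+FFFD
```

The same byte therefore shows up as ’ on a page treated as Windows-1252 and as � on a page treated as UTF-8, which matches the difference between the two pages in the question.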

ASP.NET how to encode into Chinese?

I am currently helping a Chinese colleague migrate an ASP.NET app to a different server and we have now run into a character encoding problem. What is supposed to look like Chinese looks like gibberish.
The strings presented by the ASP.NET web pages have been coded in Chinese ...
Example:
<input id="BQuery" value=" 查询 " runat="server" class="BottonLoginRed" name="Button1" type="button" />
The Web.config file is configured like so ...
<globalization requestEncoding="gb2312" responseEncoding="gb2312" />
Since the source code contains Chinese encoded characters I figured I needed to set the correct culture for the response thread by adding a "culture" setting for "Simplified Chinese" in the Web.config file, like so ...
<globalization culture="zh-Hans" requestEncoding="gb2312" responseEncoding="gb2312" />
... but that produces this error message:
"Culture 'zh-Hans' is a neutral culture. It cannot be used in formatting and parsing and therefore cannot be set as the thread's current culture."
I have tried all the variants of the Chinese culture name, such as "zh-Hant", "zh-CHS" or just "zh", but they all yield the same problem. Apparently, there is no way to run Chinese as the response thread culture.
What would be the correct approach to resolve this issue?
[EDIT]
Apparently, my Chinese colleague has "solved" this problem before by simply setting Chinese as the language for the server itself. This is no longer an option (we'll have other apps, for different cultures running on the same server) but it might provide a hint.
[EDIT 2]
When I removed the encoding hints from the Web.config file it works. So, why is it we need to use these hints at all these days? Is it just me or is character encoding something that's being perceived as a very messy subject by everyone? :-)

diacritics in asp.net querystring collection

I'm trying to get a value out of the querystring that has diacritics. The diacritics are encoded in utf-8 %uxxx format, and when I check Request.QueryString["name"] they are incorrectly decoded. However, if I check Request.RawUrl, they are there and correct.
I've tried adding
<globalization fileEncoding="utf-8" requestEncoding="utf-8" responseEncoding="utf-8"/>
but still have the issue.
Is there any solution to this issue short of parsing the RawUrl myself and correctly handling the diacritics?
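One detail worth noting: %uXXXX is not UTF-8 percent-encoding. It is the non-standard form produced by JavaScript's escape(), where XXXX is a UTF-16 code unit, while standard UTF-8 percent-encoding of ü would be %C3%BC. If parsing the raw URL by hand turns out to be necessary, the decoding step could look roughly like the Python sketch below. It is shown only to illustrate the logic; the function name decode_mixed_escapes is made up, and it assumes the value mixes %uXXXX escapes with ordinary %XX escapes.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX escape (four hex digits).
_U_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def decode_mixed_escapes(value: str) -> str:
    """Decode %uXXXX escapes first, then standard %XX escapes as UTF-8.

    Limitation: characters outside the Basic Multilingual Plane arrive as
    two %uXXXX surrogate escapes and are not recombined by this sketch.
    """
    expanded = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    return unquote(expanded)


print(decode_mixed_escapes("J%u00FCrgen"))   # Jürgen
print(decode_mixed_escapes("J%C3%BCrgen"))   # Jürgen
```

Where possible, the simpler fix is on the sending side: encodeURIComponent() produces standard UTF-8 percent-encoding, so no custom decoding is needed.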

Unable to Retrieve Simplified Chinese Characters From Form

I have a page that displays content retrieved from XML with no problems:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Fields>
<NamePrompt>名字</NamePrompt>
</Fields>
</Root>
Page encoding is set to GB18030 and it displays perfectly. However, when I retrieve inputted text from HttpContext.Current.Request.Form that's been entered with double-byte characters, the retrieved string contains unreadable characters. Single-byte characters are fine, obviously.
I've tried the following to no avail:
byte[] valueBytes = Encoding.UTF8.GetBytes(HttpContext.Current.Request.Form["fullName"]);
string value = Encoding.UTF8.GetString(valueBytes);
I don't see this problem with other double-byte languages like Japanese or Korean. How can I successfully retrieve double-byte characters from a page that's GB18030 encoded?
What platform is the code running on? According to this, GB18030 was not supported at all before Win2K, and not natively until WinXP.
If that isn't the problem, we'll need more details. How do you know the characters are unreadable? Are you trying to display them somewhere other than the browser? At this point, we can only assume it's a font problem, or an encoding-conversion problem in code that you haven't shown us.
By the way, the code you did post doesn't really do anything--just a perfectly safe (and pointless) round-trip from .NET string to UTF-8 byte array and back again. But that's a dead-end anyway; if the string returned by
HttpContext.Current.Request.Form["fullName"]
...is corrupt, that's the problem you have to solve. Repairing the string after the fact (if that's even possible) is no solution.
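Both points can be shown at the byte level. The Python sketch below is an illustration only, under the assumption that the form data is sent as GB18030 but decoded as UTF-8 somewhere along the way; the actual cause in the question is not known.

```python
original = "名字"

# Encoding and decoding with the same encoding is a no-op, which is why the
# UTF-8 round trip in the question cannot change anything.
round_trip = original.encode("utf-8").decode("utf-8")

# Assumed mismatch: bytes sent as GB18030 but decoded as UTF-8.
wire_bytes = original.encode("gb18030")
misread = wire_bytes.decode("utf-8", errors="replace")

# Attempted repair after the fact: the invalid bytes were already replaced
# by U+FFFD, so the original bytes cannot be recovered.
repaired = misread.encode("utf-8").decode("gb18030", errors="replace")

print(round_trip == original)   # True
print("\ufffd" in misread)      # True
print(repaired == original)     # False
```

Once the decoder has substituted replacement characters, the information is gone, so the fix has to be made where the request is decoded, not afterwards.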
