diacritics in asp.net querystring collection - asp.net

I'm trying to get a value out of the query string that contains diacritics. The diacritics are encoded in UTF-8 %uXXXX format, and when I check Request.QueryString["name"] they are decoded incorrectly. However, if I check Request.RawUrl, they are there and correct.
I've tried adding
<globalization fileEncoding="utf-8" requestEncoding="utf-8" responseEncoding="utf-8"/>
but still have the issue.
Is there any solution to this issue short of parsing the RawUrl myself and correctly handling the diacritics?

Related

ASP.NET how to encode into Chinese?

I am currently helping a Chinese colleague migrate an ASP.NET app to a different server, and we have now run into a character encoding problem. Text that is supposed to be Chinese is displayed as gibberish.
The strings presented by the ASP.NET web pages have been coded in Chinese ...
Example:
<input id="BQuery" value=" 查询 " runat="server" class="BottonLoginRed" name="Button1" type="button" />
The Web.config file is configured like so ...
<globalization requestEncoding="gb2312" responseEncoding="gb2312" />
Since the source code contains Chinese encoded characters I figured I needed to set the correct culture for the response thread by adding a "culture" setting for "Simplified Chinese" in the Web.config file, like so ...
<globalization culture="zh-Hans" requestEncoding="gb2312" responseEncoding="gb2312" />
... but that produces this error message:
"Culture 'zh-Hans' is a neutral culture. It cannot be used in formatting and parsing and therefore cannot be set as the thread's current culture."
I have tried all variants for Chinese encoding, such as "zh-Hant", "zh-CHS" or just "zh" but they all yield the same problem. Apparently, there is no way to run Chinese as the response thread culture.
What would be the correct approach to resolve this issue?
[EDIT]
Apparently, my Chinese colleague has "solved" this problem before by simply setting Chinese as the language for the server itself. This is no longer an option (we'll have other apps, for different cultures running on the same server) but it might provide a hint.
[EDIT 2]
When I removed the encoding hints from the Web.config file it works. So, why is it we need to use these hints at all these days? Is it just me or is character encoding something that's being perceived as a very messy subject by everyone? :-)
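The error message quoted above distinguishes neutral cultures (language only, such as zh-Hans) from specific cultures (language plus region, such as zh-CN). As a minimal sketch, assuming the runtime has full culture data available, the following C# console program reports which names are neutral. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class CultureCheck
{
    // Returns true when the name denotes a neutral (language-only) culture.
    public static bool IsNeutral(string name)
    {
        return CultureInfo.GetCultureInfo(name).IsNeutralCulture;
    }

    public static void Main()
    {
        // zh-Hans is the name from the error message; zh-CN is a
        // language-plus-region name added here for comparison.
        foreach (string name in new[] { "zh-Hans", "zh-CN" })
        {
            Console.WriteLine(name + " neutral: " + IsNeutral(name));
        }
    }
}
```

A name that reports as specific is a candidate for the culture attribute of <globalization>. Whether that is the right setting for this particular app is an assumption, not something the post confirms.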

Do I need web.config for non-ASCII characters?

Attempting to make my first ASP.NET page. Got IIS 5.1 on XP, configured to run .NET 4. Created a new virtual directory and added an .aspx file. When I browse the file, non-ASCII characters are corrupted. For instance, an ü (U+00FC) is transformed to Ã¼ (U+00C3 U+00BC), which is the I-don't-get-this-is-UTF-8 equivalent.
I have tried various ways of fixing this:
I made sure the .aspx file is indeed encoded as UTF-8.
I set the meta tag:
<meta charset="UTF-8">
I set the virtual directory to handle .aspx as text/html;charset=utf-8 under HTTP Headers > File Type in IIS.
I added ResponseEncoding="utf-8" to <%@ Page ... %>.
I wrapped the string in HttpUtility.HtmlEncode(). Now the ü was transformed to &#195;&#188; (U+00C3 U+00BC).
Finally, I found 2 ways that worked:
Replacing non-ASCII characters with character references, such as &#252;. This was okay in the '90s, not today.
Adding a web.config file to the virtual directory, with this content:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<system.web>
<globalization fileEncoding="utf-8"/>
</system.web>
</configuration>
Without fileEncoding setting, the ASP.NET parser will read the .aspx and corrupt every non-ASCII character without attempting to infer the file encoding. Is this just something you pros have learned to live with, or am I missing something? Is a web.config file with globalization settings the way to handle "international" characters on .aspx pages? I don't remember having similar problems with PHP, so I'm puzzled why this crops up with ASP.NET.
To use non-ASCII characters you need two things: save the files as UTF-8, and make sure you have these settings in your web.config:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />
Note that there is always a web.config on ASP.NET. There is the global one that also has these settings and lives in the asp.net directory {drive:}\WINDOWS\Microsoft.NET\Framework\{version}\CONFIG\, and then the web.config on your project. Sometimes the global one sets the encoding from the current country. In this case you need to set it back to UTF-8 in your project.
You have found all of that already; I'll just point out the three settings:
Save your files as Unicode (UTF-8).
Set the requestEncoding="utf-8"
Set the responseEncoding="utf-8"
You have three options.
Option 1 - either entity-encode all characters that don't fit into ASCII or replace them with similar-looking ASCII equivalents. This is error-prone and hard to maintain. The next time you have to incorporate a large piece of text you may forget to check the included piece, and it "looks garbage" again.
Option 2 - save the .aspx as "UTF-8 with BOM". Such files are properly handled automatically - that's documented in description of fileEncoding property of system.web/globalization section of web.config. This is also hard to maintain - the next time you get the file resaved as "UTF-8" (without BOM) it "looks garbage" again and it may go unnoticed. When you add new .aspx files you'll have to check they are saved as "UTF-8 with BOM" too. This approach is error prone - for example, some file comparison tools don't show adding/removing BOM (at least with default settings).
Option 3 - ensure the file is saved as either "UTF-8" or "UTF-8 with BOM" and at the same time set the fileEncoding property of the system.web/globalization section of web.config to utf-8. The default value of this property is "single byte character encoding", so files with non-ASCII characters saved as UTF-8 are handled improperly and the result "looks garbage". This is the most maintainable approach: it's easy to see, easy to verify, and doesn't randomly break when a file is resaved. fileEncoding is the only one of the three ???Encoding properties that defaults to "single byte character encoding". responseEncoding and requestEncoding default to utf-8, so in most cases there's no need to change (or set) them; setting fileEncoding is usually enough.
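To make the failure mode in Options 2 and 3 concrete, here is a minimal C# sketch. It assumes ISO-8859-1 as a stand-in for whatever single-byte encoding the server would actually fall back to, and the class and method names are hypothetical. It shows what happens when UTF-8 bytes are decoded one byte at a time, and how a UTF-8 byte order mark can be detected.

```csharp
using System;
using System.Text;

public static class FileEncodingDemo
{
    // True when the buffer starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // Encodes the text as UTF-8, then decodes those bytes with a
    // single-byte encoding (ISO-8859-1, an assumption for this sketch).
    public static string MisreadAsSingleByte(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }

    public static void Main()
    {
        // U+00FC is the letter u with diaeresis; in UTF-8 it is C3 BC.
        string misread = MisreadAsSingleByte("\u00FC");
        foreach (char c in misread)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine(); // prints "U+00C3 U+00BC "

        // Build a buffer the way an editor saving "UTF-8 with BOM" would.
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("\u00FC");
        byte[] withBom = new byte[bom.Length + body.Length];
        bom.CopyTo(withBom, 0);
        body.CopyTo(withBom, bom.Length);

        Console.WriteLine(HasUtf8Bom(withBom)); // True
        Console.WriteLine(HasUtf8Bom(body));    // False
    }
}
```

A check like HasUtf8Bom could be run over the .aspx files in a project to spot files that were resaved without a BOM, which is the silent breakage Option 2 warns about.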

ASP.NET: HttpRequest.Url truncates trailing '.' characters

If the URL that arrives at an ASP.NET application contains trailing full stops ('.'), they are truncated from the Url property of HttpRequest.
For example if the URL is "http://server/folder.../", the following call:
HttpContext.Current.Request.Url.PathAndQuery;
returns "/folder/" instead of "/folder.../".
Tried this solution, but it helps only if the Uri is constructed after the suggested code executes, while HttpRequest is probably constructed before any code in ASP.NET web application is executed.
Any ideas how to preserve trailing '.' in HttpRequest.Url?
You can add the relaxedUrlToFileSystemMapping attribute to the <httpRuntime> element inside the <system.web> section of your web.config.
<httpRuntime relaxedUrlToFileSystemMapping="true" />
This will preserve the dots in the URL.
But for some reason Url.PathAndQuery won't contain the dots, while RawUrl does:
HttpContext.Current.Request.RawUrl;
Keep in mind that there are probably some security implications when enabling relaxedUrlToFileSystemMapping.

Encoding Issue ASP.net

I upload a file with ASP.NET whose name contains an "ä" or "ü". After the upload, the "ä" or "ü" has been replaced on the server with another special character. How can I solve this issue? The same problem occurs with normal text boxes, so I guess it has something to do with encoding.
Maybe you have a solution or an idea; that would be quite nice :-)
Most likely an encoding issue.
You could check:
Whether the encoding meta tag on the HTML page is correct.
Whether the pages are sending the correct encoding to the client (in the HTTP header)
Whether the pages are actually encoded in the correct encoding (via VS.NET "File" menu, menu item "Advanced Save Options").
To see the HTTP headers, use e.g. ieHttpHeaders extension for Internet Explorer.
To change the sent encoding, either use the <globalization> tag in web.config to change it for all pages, or use the @ Page directive to define the response encoding on a per-page basis.
Put the following in web.config:
<configuration>
<system.web>
<globalization
fileEncoding="utf-8"
requestEncoding="utf-8"
responseEncoding="utf-8"
/>
</system.web>
</configuration>
// Transliterate the German umlauts once so every check uses the same name.
string filepath = FileUpload1.PostedFile.FileName;
string safeName = filepath.Replace("ö", "oe").Replace("Ö", "Oe").Replace("ä", "ae").Replace("Ä", "Ae").Replace("ü", "ue").Replace("Ü", "Ue");
if (File.Exists(Server.MapPath("../App_Data/Karten/") + safeName))
{
    Label1.Text = "Datei existiert bereits"; // "File already exists"
}
else
{
    System.Diagnostics.Debug.WriteLine("Filename" + filepath);
    System.Diagnostics.Debug.WriteLine("Filename" + safeName);
    string lowerName = filepath.ToLower();
    if (lowerName.EndsWith("jpeg") || lowerName.EndsWith("jpg"))
    {
        // Note: Image.FromStream throws ArgumentException for invalid image data rather than returning null.
        System.Drawing.Image UploadedImage = System.Drawing.Image.FromStream(FileUpload1.PostedFile.InputStream);
        if (UploadedImage == null)
        {
            Label1.Text = "Kein Bild"; // "No image"
            System.IO.File.Delete(Server.MapPath("../App_Data/Karten/") + filepath);
        }
    }
}

Unable to Retrieve Simplified Chinese Characters From Form

I have a page that displays content retrieved from XML with no problems:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Fields>
<NamePrompt>名字</NamePrompt>
</Fields>
</Root>
Page encoding is set to GB18030 and it displays perfectly. However, when I retrieve inputted text from HttpContext.Current.Request.Form that's been entered with double-byte characters, the retrieved string contains unreadable characters. Single-byte characters are fine, obviously.
I've tried the following to no avail:
byte[] valueBytes = Encoding.UTF8.GetBytes(HttpContext.Current.Request.Form["fullName"]);
string value = Encoding.UTF8.GetString(valueBytes);
I don't see this problem with other double-byte languages like Japanese or Korean. How can I successfully retrieve double-byte characters from a page that's GB18030 encoded?
What platform is the code running on? According to this, GB18030 was not supported at all before Win2K, and not natively until WinXP.
If that isn't the problem, we'll need more details. How do you know the characters are unreadable? Are you trying to display them somewhere other than the browser? At this point, we can only assume it's a font problem, or an encoding-conversion problem in code that you haven't shown us.
By the way, the code you did post doesn't really do anything--just a perfectly safe (and pointless) round-trip from .NET string to UTF-8 byte array and back again. But that's a dead-end anyway; if the string returned by
HttpContext.Current.Request.Form["fullName"]
...is corrupt, that's the problem you have to solve. Repairing the string after the fact (if that's even possible) is no solution.
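The point about the round-trip can be checked with a minimal C# sketch. It simulates the corruption by decoding UTF-8 bytes as ISO-8859-1, which is an assumption chosen for illustration and not necessarily what happens on the asker's server. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // The round-trip from the question: string -> UTF-8 bytes -> string.
    public static string Utf8RoundTrip(string value)
    {
        byte[] valueBytes = Encoding.UTF8.GetBytes(value);
        return Encoding.UTF8.GetString(valueBytes);
    }

    // Simulates a request decoded with the wrong encoding: the bytes are
    // UTF-8 but are read as ISO-8859-1 (an assumption for this sketch).
    public static string DecodeWithWrongEncoding(string value)
    {
        byte[] sent = Encoding.UTF8.GetBytes(value);
        return Encoding.GetEncoding("iso-8859-1").GetString(sent);
    }

    public static void Main()
    {
        string original = "\u540D\u5B57"; // the NamePrompt text from the XML above

        // A well-formed string survives the round-trip unchanged.
        Console.WriteLine(Utf8RoundTrip(original) == original); // True

        // A string that is already corrupt also comes back unchanged,
        // so the round-trip cannot repair it.
        string corrupted = DecodeWithWrongEncoding(original);
        Console.WriteLine(Utf8RoundTrip(corrupted) == corrupted); // True
        Console.WriteLine(Utf8RoundTrip(corrupted) == original);  // False
    }
}
```

Because the round-trip returns its input unchanged in both cases, any fix has to happen where the request bytes are first decoded, not afterwards.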
