Charset works in .aspx but not in .html - asp.net

When I put this word "Bibliothèque" in a .aspx page, I see it correctly "Bibliothèque".
If I put the same word in a .html file, I see "BibliothÃ¨que".
How can this be possible? It must be an IIS issue, but I can't find the setting.
How can a .aspx file show the right word but not a .html file?

Open the web.config file in the ASP.NET project. The requestEncoding attribute of the globalization element is set to "utf-8", which means ASP.NET treats request text as UTF-8.
Check which character encoding your browser is using for the page; you can change it in the browser's character encoding setting. Your HTML file is being rendered according to the browser's character encoding, which is why you get that result.
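The garbling can be reproduced outside the browser. The following is a minimal Python sketch, not part of the original answer, assuming the .html file is stored as UTF-8 and the browser falls back to Windows-1252; the variable names are illustrative.

```python
# Assumption: the file on disk is UTF-8 and the browser guesses Windows-1252.
word = "Biblioth\u00e8que"              # "Bibliothèque"

utf8_bytes = word.encode("utf-8")       # bytes stored in the static file
misread = utf8_bytes.decode("cp1252")   # what is shown if those bytes are read as cp1252

print(utf8_bytes)
print(misread)

# The damage is reversible as long as no bytes were lost along the way.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == word)
```

The single character "è" is two bytes in UTF-8, and each byte is shown as its own Windows-1252 character, which is why one letter turns into two.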

To ensure it always works for this specific example, you can replace the non-ASCII characters with HTML entities, like this: Biblioth&egrave;que. But this is not practical in general.
Otherwise, there are various other ways to make it work:
save the file as UTF-8 with a byte order mark (BOM, which some editors call a 'signature')
add a META charset declaration to your HTML file
define which HTTP headers are sent to the client using the globalization element in the application's web.config (responseEncoding, etc.)
define which HTTP headers are sent to the client using the ASP.NET @ Page directive
The best approach is to make sure all of this is consistent across your application. UTF-8 support is now widespread, so it's a good choice of encoding.
An interesting article on the subject of encoding: The Definitive Guide to Web Character Encoding
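Two of the options above, the entity replacement and the BOM, can be sketched in Python. This is an illustration under assumptions (Python's standard codecs and html modules standing in for whatever editor or build step you actually use), not ASP.NET code.

```python
import codecs
import html

text = "Biblioth\u00e8que"  # "Bibliothèque"

# Option 1: replace non-ASCII characters with numeric character references,
# so the markup is pure ASCII and survives any encoding guess.
ascii_markup = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_markup)

# Named and numeric references decode to the same character.
print(html.unescape("Biblioth&egrave;que") == html.unescape(ascii_markup))

# Option 2: save as UTF-8 with a byte order mark ("signature").
with_bom = text.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))
```

The BOM is the three bytes EF BB BF at the start of the file; browsers and editors that see it can treat the rest as UTF-8 without guessing.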

Related

Disable URL encoding

Hello. I'm not trying to use spaces or anything like that. All my URLs are standard, with dash-separated words, but the characters are Persian, so instead of
/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C
I want to see
/آرایشی
The links are already saved in the second form, but when they are shown on the web page they are automatically encoded.
My CMS can handle requests in the second form and redirects automatically. I've tried changing the config file and some globalization settings, but no luck yet.
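For reference, the two forms in the question are the same path: the first is the percent-encoded UTF-8 bytes of the second. A small Python sketch follows (urllib.parse is used only for illustration; it says nothing about how the CMS or ASP.NET renders links).

```python
from urllib.parse import quote, unquote

encoded = "/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C"

decoded = unquote(encoded)      # percent-escapes -> UTF-8 bytes -> text
reencoded = quote(decoded)      # text -> UTF-8 bytes -> percent-escapes

print(decoded)                  # the readable Persian form of the path
print(reencoded == encoded)
```

Each Persian letter here takes two UTF-8 bytes, so six letters become twelve %XX escapes.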

Missing (replaced) polish characters after serving via IIS

I have a page in ASP.net (VB) that I'm serving via IIS.
The page is basically a translation of the UK site.
I have:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
at the top of the code, and all the characters show correctly in the code.
However, in all browsers many of the special Polish characters, such as 'Ł', are missing, replaced directly with 'L'.
Is this an IIS thing, or could it be something else?
ETA: I just noticed that the Polish text drawn from the SQL database is displayed correctly within the same page. Odd!
Further edit:
I think I have found the basic source of the issue, but not a solution:
The areas that are not showing properly are the headers and footers, which are imported into the page via Server Side Include.
It seems some sort of encoding is being lost in this import/injection.
Should the imported file have some sort of encoding header?
This sounds like a problem with encoding in your static content files. The content-type <meta> has no bearing on the actual physical encoding of the file. I have a suspicion the file is saved in Codepage 1252 instead of UTF-8.
I suggest you open your *.aspx files (where I assume you're storing the problematic Polish text) in a text editor that supports different encodings (such as Visual Studio or Notepad2, not WordPad or Windows Notepad). Force-save the file with UTF-8 encoding (in VS, go to File > Advanced Save Options and ensure "Unicode (UTF-8 with signature)" is selected). Then access your site again.
Also ensure that the Content-Type HTTP header is correctly set to UTF-8.
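A quick way to test the suspicion about a file's physical encoding is to check whether its bytes decode as UTF-8. The Python sketch below is illustrative only; the helper name looks_like_utf8 is hypothetical, and Windows-1250 is used as an example legacy code page because it contains 'Ł'.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = "\u0141\u00f3d\u017a"  # "Łódź"

as_utf8 = sample.encode("utf-8")
as_cp1250 = sample.encode("cp1250")

print(looks_like_utf8(as_utf8))
print(looks_like_utf8(as_cp1250))

# Windows-1252 has no 'Ł' at all, so any conversion to it must substitute something.
try:
    "\u0141".encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

To check a real include file, pass the result of reading it in binary mode to the helper. A pure-ASCII file also passes, so the check is only informative for files that contain the Polish characters.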

Special char added to css (â€‹) where did this come from?

I was doing a bunch of search-replace operations in Notepad++ to effectively minify my CSS (mostly removing whitespace, tabs, etc.). This ended up breaking much of my CSS.
Apparently a strange character (â€‹) was inserted all over the place. Using Notepad++ in UTF-8 without BOM, I cannot see these, but they appeared in a view-source.
I was able to remove these by doing a search replace in ANSI encoding, but my question is, what is this character, and why might it have appeared?
The string “â€‹” is the UTF-8 encoded form of ZWSP when misinterpreted as windows-1252 encoded data. (Checked this using a nice UTF-8 decoder.) This explains why you don’t see it in Notepad++ in UTF-8 mode; ZWSP (zero-width space) is an invisible character with no width.
Apparently browsers are interpreting the style sheet as windows-1252 encoded. Saving the file with a BOM might help, since browsers would then probably guess the encoding better. The real fix is to make sure (in a server-dependent manner) that the server sends an appropriate Content-Type header for the CSS file.
But if this is the only non-ASCII character in your CSS file, it does not matter in practice once you have removed the offending data.
I don’t know of any simple way to make Notepad++ insert ZWSP (you could of course use general character insertion utilities in the system), so it’s a bit of a mystery where it came from. Perhaps via copy and paste from somewhere.
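The decoding described above can be checked with a few lines of Python. This is a sketch for illustration; the helper strip_zwsp and the sample CSS are hypothetical, not taken from the question.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

utf8_bytes = ZWSP.encode("utf-8")       # three bytes on disk
misread = utf8_bytes.decode("cp1252")   # how they look when read as windows-1252

print(utf8_bytes)
print(misread)

def strip_zwsp(css: str) -> str:
    """Remove every zero-width space from a style sheet's text."""
    return css.replace(ZWSP, "")

css = "}" + ZWSP + "\n.t { color: red; }"   # invisible character after the brace
cleaned = strip_zwsp(css)
print(len(css) - len(cleaned))
```

Removing the character this way works on the decoded text, so the file has to be read as UTF-8 first; searching for the three visible characters only works if the file was opened as windows-1252 (ANSI), as in the question.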
Using the Web Developer plug-in or extension in Firefox, you can see the problem character in the CSS document.
In Visual Studio all I could see was:
}
.t
Web Developer showed an unwanted hidden character, an "a" with a circumflex on top:
}
â.t
The utf encoder link above revealed this
} (the encoded character for ampersand)
.t
and this
but I simply fixed the problem by deleting and retyping.

HtmlEncode Local resources

I have a web site that uses local resources. The main text (so not the labels, etc.) on the default page is stored in a file. This file is added to my local resources file default.aspx.fi-FI.resx and is named text-defaultPage. It's a regular text file with tags etc.
The problem, however, is that the text is Finnish; in other words, it uses a lot of characters with umlauts (ä) and other special characters.
The person the web site is for wants to edit this text himself, but he doesn't know anything about programming, HTML entities, etc.
Is there a way to have those characters encoded with, say, HtmlEncode?
In my Global.asax I check for the selected language, and the page gets reloaded with that language.
Edit
Never mind, I made the files Unicode text files.
The solution to the problem is to save the files as Unicode.

AntiXss.UrlEncode vs. AntiXss.HtmlAttributeEncode usage in link (a href)

According to an old AntiXss article on MSDN, AntiXss.UrlEncode is used to encode a link href (Untrusted-input in the following example):
Click Here!
My understanding was that UrlEncode should be used only when placing something in a URL, such as when setting document.location with JS. So why shouldn't I use HtmlAttributeEncode in the previous example to encode [Untrusted-input]? On the other hand, is there a security flaw if I use UrlEncode to encode HTML attributes as in the sample above?
Url Encode encodes URL parameters for use in anchor tags.
Html Attribute encode encodes things for use in general HTML attributes.
The two encodings differ: unsafe characters are turned into an &xxx; form under HTML attribute encoding and into a %xx form under URL encoding. Whilst getting it wrong is probably unlikely to cause a security problem, your data wouldn't be properly rendered in the browser or understood in a request.
(Indeed, URL encoding is probably going to change because of an incompatibility with older browsers, and HTML encoding will change in the next CTP drop to allow safe-listing of particular Unicode ranges.)
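The difference between the two output forms can be seen with Python's standard library. This is a rough analogy, not the AntiXss API: urllib.parse.quote stands in for URL encoding and html.escape for HTML attribute encoding, and the untrusted value and the /results?q= path are made up for illustration.

```python
import html
from urllib.parse import quote

# Hypothetical untrusted input that tries to break out of an attribute.
untrusted = '" onmouseover="alert(1)'

url_encoded = quote(untrusted, safe="")            # %XX escapes, for a URL parameter
attr_encoded = html.escape(untrusted, quote=True)  # &...; entities, for a general attribute

print(url_encoded)
print(attr_encoded)

# A value that becomes part of a URL inside href gets the URL form.
link = '<a href="/results?q=' + url_encoded + '">Click Here!</a>'
print(link)
```

Both forms neutralise the double quote here, but only the URL-encoded one is read back correctly as a query-string value by the server that receives the request.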
