Classic ASP text substitution and UTF-8 encoding - asp-classic

We have a website that uses Classic ASP.
Part of our release process substitutes values in a file, and we found a bug in it: it writes the file out as UTF-8.
This then causes our application to start spitting out garbage. Apostrophes come back as sequences of encoded characters.
If we then go and remove the BOM that marks the file as UTF-8, the text that was previously rendered as garbage is displayed correctly.
Is there something that IIS does differently when it encounters a UTF-8 file?

I was searching on the same exact issue yesterday and came across:
http://blog.inspired.no/utf-8-with-asp-71/
Important part from that page, in case it goes away...
ASP CODE:
Response.ContentType = "text/html"
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
Response.CodePage = 65001
Response.CharSet = "UTF-8"
and the following HTML META tag:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
We were using the meta tag and asp CharSet property, yet the page still didn't render correctly. After adding the other three lines to the asp file everything just worked.
Hope this helps!

UTF-8 does not need a BOM. The Unicode standard permits one but neither requires nor recommends it; putting one there is an annoying misfeature of some Microsoft software. You need to find which step of your release process is putting a UTF-8-encoded BOM in your files and fix it. You should stop that even if you are using UTF-8, which really these days is best.
But I doubt it's IIS causing the display problem. More likely the browser is guessing the charset of the final displayed page, and when it sees bytes that look like they're UTF-8 encoded it guesses the whole page is UTF-8. You should be able to stop it doing that by stating a definitive charset by using an HTTP header:
Content-Type: text/html;charset=iso-8859-1
and/or a meta element in the HTML
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />
Now (assuming ISO-8859-1 is actually the character set your data are in) it should display OK. However, if your file really does have a UTF-8-encoded BOM at the start, you'll now see that as ‘ï»¿’ in your page, which is what those bytes look like in ISO-8859-1. So you still need to get rid of that misBOM.
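For illustration only, here is a minimal Python sketch of what the answer above describes: a UTF-8 BOM (and a curly apostrophe) decoded with the wrong charset, and a helper that drops the BOM. The helper name `strip_utf8_bom` and the sample text are assumptions made for this sketch, not part of the original release process. It decodes with windows-1252 because that is what browsers actually use when a page is labelled ISO-8859-1.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents: a BOM followed by UTF-8 text with a curly apostrophe.
raw = codecs.BOM_UTF8 + "It’s here".encode("utf-8")

# Decoded with the wrong charset, the BOM and the apostrophe both turn into
# multi-character garbage: 'ï»¿Itâ€™s here'
garbled = raw.decode("cp1252")

# With the BOM removed and the right charset, the text is intact: 'It’s here'
clean = strip_utf8_bom(raw).decode("utf-8")
```

A step like `strip_utf8_bom` could run right after the substitution step writes the file, but the better fix is still to stop the tool from writing the BOM in the first place.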

If you are using an Access database, you should set the session code page before running the query:
Session.CodePage=65001
Set tabtable= Conn.Execute("SELECT * FROM table")

Related

charset utf-8 in asp.net for special turkish characters

I put <meta http-equiv="content-type" content="text/html; charset=utf-8" /> in the head, and a DOCTYPE at the top of my _Layout.cshtml, but when I view the website, the special Turkish characters still appear as references like &#231; in the source, not in the page. The webpage displays them correctly; only the source has the problem.
Do you know what else I should do to correct this?
Try to save your source file as UTF-8 with BOM.
I was having a similar issue and it seems that Razor expects files to have the BOM signature.
In Visual Studio: File > Advanced Save Options, then select "Unicode (UTF-8 with signature)".
I had exactly the same problem and tried all the solutions, but none of them worked. I checked the .cshtml file encoding with Notepad++ and it was correct (UTF-8 with BOM), and I re-saved the file with that encoding (as described above), but the problem still persisted.
Finally I found out that there was nothing wrong with my Razor files. By default, the ASP.NET Core HTML encoder leaves only the Basic Latin range unescaped and writes every other character as a numeric character reference, so the developer must widen the allowed ranges if needed.
I added the code below at the end of the ConfigureServices method of Program.cs (it allows all Unicode ranges in the default web encoders) and the problem was solved.
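// Requires: using System.Text.Unicode; (for UnicodeRanges)
services.AddWebEncoders(o =>
{
    // Allow every Unicode range so non-Latin characters are not escaped.
    o.TextEncoderSettings = new System.Text.Encodings.Web.TextEncoderSettings(UnicodeRanges.All);
});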
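To see why the page looked right while the source did not, here is a small Python sketch (not ASP.NET code) of the same idea: characters outside ASCII are written as numeric character references, and the browser turns them back into the original letters. The sample string is an assumption, and the exact reference format the framework emits may differ from Python's decimal form.

```python
import html

text = "çğış"  # sample Turkish letters (illustrative input)

# Escape everything outside ASCII as a decimal numeric character reference.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
# escaped == '&#231;&#287;&#305;&#351;'

# A browser resolves the references back to the original characters.
restored = html.unescape(escaped)
# restored == 'çğış'
```

The escaped form is harmless for rendering. Widening the encoder's allowed ranges only changes what the raw markup looks like.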

Missing (replaced) polish characters after serving via IIS

I have a page in ASP.net (VB) that I'm serving via IIS.
The page is basically a translation of the UK site.
I have:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
at the top of the code, and all the characters show OK in the code.
However, in (all) browsers many of the special Polish characters, such as 'Ł', are missing, replaced directly with 'L'.
Is this an IIS thing? Or could it be something else?
ETA: I just noticed that the Polish text drawn out of the SQL database is being displayed correctly within the same page..! Odd!
Further edit:
I have found the basic source of the issue, I think, but not a solution:
The areas that are not showing properly are headers and footers, which are imported into the page via Server Side Include.
It seems some sort of encoding is being lost in this import/injection.
Should the imported file have some sort of encoding header?
This sounds like a problem with encoding in your static content files. The content-type <meta> has no bearing on the actual physical encoding of the file. I have a suspicion the file is saved in Codepage 1252 instead of UTF-8.
I suggest you open your *.aspx files (where I assume you're storing the problematic Polish text) in a text editor that supports different encodings (such as VS or Notepad2, not WordPad or Windows Notepad). Force-save the file with UTF-8 encoding (in VS, go to File > Advanced Save Options and ensure "Unicode (UTF-8 with signature)" is selected). Then access your site again.
Also ensure that the Content-Type HTTP header is correctly set to UTF-8.
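As a rough check of the code page 1252 suspicion, this Python sketch lists which characters of a string a given encoding cannot represent. The helper name `unencodable` and the sample text are assumptions for illustration. Windows-1252 has no 'Ł', which would be consistent with it being replaced by a plain 'L' somewhere along the way.

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """Return the distinct characters of text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Łódź, Wrocław"  # illustrative Polish text

lost_in_1252 = unencodable(sample, "cp1252")  # ['Ł', 'ź', 'ł']
lost_in_utf8 = unencodable(sample, "utf-8")   # []
```

Running the same check over the text of the included header and footer files would show quickly whether they can survive a round trip through code page 1252 at all.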

Charset not working in .html but not .aspx

When I put this word "Bibliothèque" in a .aspx page, I see it correctly "Bibliothèque".
If I put the same word in a .html file, the "è" comes out garbled.
How can this be possible? It must be an IIS issue, but I can't find the setting.
How can a .aspx file show the right word but not a .html file?
Open the file named web.config in the ASP.NET project. The value of the requestEncoding attribute in the globalization element is "utf-8". This means incoming request text is treated as UTF-8-encoded.
Check which character encoding your browser is using; you can change it in the browser's character encoding setting. Your HTML is displayed according to the browser's character encoding.
To ensure it will always work, for this specific example, you can replace the non-ASCII characters with HTML entities, like this: Biblioth&egrave;que. But this is not always practical in general.
Otherwise, there are various other ways to make it work:
Save the file as UTF-8 with a byte order mark (sometimes called 'signature', or BOM, by editors).
Add a META character encoding declaration to your HTML file.
Define what HTTP headers will be sent to the client using the globalization element in the application's web.config (responseEncoding, etc.).
Define what HTTP headers will be sent to the client using the ASP.NET @ Page directive.
The best is to make sure all this is consistent in your application. UTF-8 support is now widespread, so it's a good choice as the encoding.
An interesting article on the encoding subject: The Definitive Guide to Web Character Encoding
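The mismatch behind this kind of symptom can be reproduced in a few lines of Python. This is only a sketch of the two common failure modes (file bytes in one encoding, reader assuming another); which of them applied to the original .html file is not known from the question.

```python
word = "Bibliothèque"

utf8_bytes = word.encode("utf-8")      # è becomes two bytes: C3 A8
latin1_bytes = word.encode("latin-1")  # è becomes one byte: E8

# UTF-8 bytes read as windows-1252: the two bytes show up as two characters.
as_1252 = utf8_bytes.decode("cp1252")  # 'BibliothÃ¨que'

# Latin-1 bytes read as UTF-8: the lone E8 byte is invalid and gets replaced
# by U+FFFD, the replacement character.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")  # 'Biblioth\ufffdque'
```

Either way, the fix is the consistency the answer recommends: the encoding the file is saved in and the encoding declared to the browser have to agree.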

IE9 encoding problem

In one of my asp.net-applications I found a strange behaviour produced in Internet Explorer 9 while IE8 works well.
As the default encoding I need UTF-8. That's important because I use German umlauts like "ÄäÖöÜü".
When the page is loaded for the first time, IE9 decides to use the "Western Europe" encoding. That's ISO 8859-1 as far as I know, and the umlauts change to strange letters.
On the second load IE9 uses utf-8 correctly.
In the sourcecode I tried the following things to tell IE which encoding to use:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 TRANSITIONAL//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="de">
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Why does IE9 behave so strangely on the first load?
And what else can I try to tell IE9 which encoding to use?
First, the server hosting your site may be returning the wrong encoding information in the header.
Second, there may be a mistake in the string that declares the encoding in the head of your page (a wrong character in that string).
Third, open your page in a hex viewer (WinHex, for example) and post the first row of bytes (sometimes an editor places wrong data in the first bytes; I've stumbled on that once).
If the site is online, post its URL and I'll try to find the problem.
Check the response headers from your server; they must contain something like this:
Key           Value
Content-Type  text/html; charset=utf-8
Response      HTTP/1.1 200 OK
If they don't, check your server settings or your code; there must be a place where the Content-Type header gets changed.
EDIT: OK, the encoding is right. As suggested in the comment, you should check the first bytes of your response; it seems to start with additional bytes (usually information about the encoding).
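The two checks suggested above (what charset the header declares, and whether the body starts with extra bytes) can be sketched in Python as follows. The function names are assumptions made for this example, and the inputs are hard-coded rather than fetched from a real server.

```python
import codecs
from email.message import Message


def declared_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def starts_with_utf8_bom(body: bytes) -> bool:
    """True if the response body begins with the UTF-8 BOM bytes EF BB BF."""
    return body.startswith(codecs.BOM_UTF8)


# Hypothetical header value and first bytes of a response body.
charset = declared_charset("text/html; charset=utf-8")  # 'utf-8'
has_bom = starts_with_utf8_bom(b"\xef\xbb\xbf<!DOCTYPE html>")  # True
```

If the header has no charset parameter, `declared_charset` returns `None`, which is the case where a browser falls back to guessing.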

strange characters on web page

I have a graffiti blog and I have a strange problem: it shows a page full of strange characters, like this:
(Screenshot: http://amrelgarhy.com/ScreenShots/error.jpg)
This page was showing when I opened my control panel admin page. It also shows the same when I try to edit one of my previous posts. My problem is that I don't know the reason behind it.
I am not sure how to fix this. All my posts are in English and I always use Windows Live Writer to post.
Has anyone faced a problem like this before? Can you advise me on finding the cause of this problem, and any potential solution?
Looks like it might be an encoding mismatch. Are you opening UTF-8 (or some other Unicode)-encoded files in a tool that doesn't understand UTF encodings or vice-versa?
Try placing this in your master page:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Also, check that a virtual directory has been created.
There seems to be a problem with the content MIME-types. The weirdness you are seeing happens because the server offers content as binary (I'm guessing application/octet-stream) even though it should offer them as text/html. Images should be offered as image/<extension>, for example image/png.
You can manually set MIME-type handlers for certain file types. If you are using Apache, you could easily do this in a .htaccess file like this:
AddType text/html .html
If your content is something other than HTML, the MIME type is different. If your web server doesn't do this automatically, you should probably add the handlers yourself.
All MIME-types can be found from here: http://www.iana.org/assignments/media-types/
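The extension-to-MIME-type mapping described above can be illustrated with Python's standard `mimetypes` module. This is only a sketch of the lookup a web server performs, not how any particular server is configured; the helper name `content_type_for` and the file names are assumptions.

```python
import mimetypes

# A fresh MimeTypes instance starts from Python's built-in defaults.
types = mimetypes.MimeTypes()


def content_type_for(filename: str) -> str:
    """Guess the MIME type from the file extension, falling back to binary."""
    guessed, _encoding = types.guess_type(filename)
    return guessed or "application/octet-stream"


html_type = content_type_for("index.html")  # 'text/html'
png_type = content_type_for("logo.png")     # 'image/png'
```

An extension with no mapping falls through to `application/octet-stream`, which is the "served as binary" situation the answer suspects.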
