Classic ASP convert string to windows-1252 - asp-classic

I am processing a POST request that is encoded in UTF-8. The request creates a file in a folder. When the file name contains Russian characters, the name comes out as garbage (the file contents are fine). File names with English characters are fine. In the script I see:
Set fsOBJ= Server.CreateObject("Scripting.FileSystemObject")
Set fsOBJ= fsObj.CreateTextFile(fsOBJ.BuildPath(Path, strFileName))
I believe that 'strFileName' is my problem. Windows doesn't seem to like UTF-8 file names. Any ideas on how to solve this?

VBScript strings are strictly 2-byte Unicode. Any encoding used in storage or transmission of strings is converted to Unicode before the string exists in VBScript.
My guess is that you have a form post carrying the file name, and the post is encoded as UTF-8. However, your receiving page has its CodePage set to something other than 65001 (the UTF-8 code page) at the time it decodes the form field carrying the file name. As a result, the string retrieved from the form is corrupt.
Add <%@ CODEPAGE=65001 %> to your page, include Response.CharSet = "UTF-8" at the top of the page, and save the file as UTF-8.
Now when the source form posts UTF-8 encoded form data to the page the form data will be decoded to unicode correctly.
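The effect described above can be reproduced outside ASP. The following is a minimal Python sketch, not ASP code: the file name is a hypothetical example, and Windows-1252 is only assumed to be the page's active code page.

```python
# Hypothetical Russian file name, used only for illustration.
name = "файл.txt"

# What a UTF-8 encoded form post puts on the wire.
wire = name.encode("utf-8")

# Decoding with the wrong code page (Windows-1252 is an assumption here)
# turns every 2-byte UTF-8 sequence into two unrelated characters.
wrong = wire.decode("cp1252")

# Decoding with the UTF-8 code page (65001 in ASP terms) recovers the name.
right = wire.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

The bytes are the same in both cases; only the code page used to decode them differs, which is the setting the CODEPAGE directive controls.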

Related

What is causing my browser to render an asp's &nbsp incorrectly?

I have an ASP page rendering some text from a table into HTML. Some of the text contains the non-breaking-space character (Unicode U+00A0). The browser auto-detects the character encoding as Unicode, which is good, but it isn't rendering those characters correctly. It renders them as � (the replacement character). When I change the page encoding to "Western" instead of "Unicode", the � characters disappear.
Shouldn't the non-breaking-space be a normal character for a Unicode encoded web page to render? What is happening to cause this?
I have verified that the character stored in the database is the non-breaking-space by using SQL Server's ASCII and UNICODE functions, both return 160.
Also, when I run this code snippet String.fromCharCode(160) it returns " ", so the browser does seem to understand that character is supposed to be a space. Could the ASP be messing those characters up between querying them and writing them as html?
The ASP file was saved with ANSI encoding. Switching the file's encoding to UTF-8 solved the problem. I'm guessing that even though the page said its charset was UTF-8, it really wasn't. This explains why "Western" encoding worked while "Unicode" did not.
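A small Python sketch of why a lone non-breaking-space byte shows up as the replacement character. It assumes that "ANSI" means Windows-1252 on that server, which the answer does not state.

```python
nbsp = "\u00a0"  # non-breaking space, code point 160

utf8_bytes = nbsp.encode("utf-8")    # two bytes: 0xC2 0xA0
ansi_bytes = nbsp.encode("cp1252")   # one byte: 0xA0

# A browser told the page is UTF-8 sees a stray 0xA0, which is not a valid
# UTF-8 sequence on its own, so it substitutes U+FFFD.
shown_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

# Read as "Western" (Windows-1252), the same byte is a proper NBSP.
shown_as_western = ansi_bytes.decode("cp1252")

print(utf8_bytes, ansi_bytes, repr(shown_as_utf8), repr(shown_as_western))
```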

Define Character Encoding of QWebElement's `toPlainText()`

I'm having trouble getting the hang of the character encoding while dealing with QWebKit's QWebElement and its toPlainText() function (*).
I have a QString with UTF-8 encoding holding the content of an HTML page, which was read from local disk via QFile. Now I want to parse this page using QWebKit. To do so, I defined a QWebFrame object as part of a QWebPage and passed the QString into the QWebKit environment with QWebFrame::setHtml().
QString rawReport = "some UTF8 encoded string read in previously";
QWebPage p;
QWebFrame *frame = p.mainFrame();
frame->setHtml(rawReport);
QWebElement report = frame->documentElement();
qDebug() << report.toPlainText();
But somehow qDebug() seems to get the encoding wrong: German umlauts such as äöüß come out garbled, and they are not even shown as their corresponding HTML entities.
I doubt it's qDebug's fault; I suspect the encoding inside QWebElement instead. I read somewhere that QWebFrame::setHtml() expects UTF-8 encoding, and I'm almost sure that is what I am passing in.
What am I missing? Is there a function or option to force QWebFrame/QWebElement to use a specific character encoding for both input and output?
[*] Using QWebElement::toOuterXml() or QWebElement::toInnerXml() show the same encoding problem.
Have you tried using from***() functions of QString to find how the string returned by toPlainText() is encoded?
The documentation states
"When using this method WebKit assumes that external resources such as JavaScript programs or style sheets are encoded in UTF-8 unless otherwise specified. For example, the encoding of an external script can be specified through the charset attribute of the HTML script tag. It is also possible for the encoding to be specified by web server."
I would thus try to change the charset specified in the html source (in the corresponding meta tag) that you are loading to explicitly specify that you are using UTF-8.

Classic ASP's Request.Form is dropping an 8-bit character -- is there a simple way to prevent this?

A client of mine is using a Classic ASP script to process a form from a third-party payment processor (this is the last step in a credit-card-transaction sequence that starts at the client's website, goes to the third-party site, and then returns to the client's site).
The client is in Austria and when one of the fields includes an 8-bit character (e.g., when the field value is Österreich), the Ö is simply dropped when I retrieve the value of the field in the standard way; e.g.:
fieldval = Request.Form("country")
If fieldval = "sterreich" Then
' Code here will execute
End If
The literal value that the third-party page is POSTing is %D6sterreich, which I think suggests that the POST is being encoded in UTF-8.
The POST request has the following possibly-relevant headers:
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Content-Type: application/x-www-form-urlencoded
I'm by no means a character-encoding expert and this is the first time I've really done anything with Classic ASP, so I'm kind of flummoxed.
From some Googling and searching SO, I've added the following to the page that processes the POST:
<%@ Codepage=65001 %>
<%
Response.CharSet = "UTF-8"
Response.Codepage = 65001
%>
But it doesn't make any difference -- I still lose that initial 8-bit character. Is there something really simple that I'm just not aware of?
Try adding the following to the top of the page:
<%
Response.CharSet = "utf-8"
Session.CodePage = 65001
%>
Turns out I was going the wrong direction with this. The ASP file in question was itself encoded in UTF-8, which was implicitly setting Response.CodePage to 65001 -- in other words, explicitly adding a CODEPAGE directive made no difference -- and in fact the UTF-8 encoding was the source of the problem.
When I re-encoded the file to Windows-1252, the problem disappeared. I'm pretty ignorant of character encodings in general, but I think in retrospect the %D6 in the POST should have been my clue -- if I'm starting to understand things rightly, the single byte 0xD6 is not a valid UTF-8 character. Maybe someone more familiar with these things could confirm or deny this.
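To check the guess about 0xD6, here is a short Python sketch (not ASP) that uses only the value from the question. In UTF-8, 0xD6 is the lead byte of a two-byte sequence and must be followed by a continuation byte, which the letter "s" is not.

```python
from urllib.parse import quote, unquote

posted = "%D6sterreich"  # the literal value from the question

# The raw byte behind %D6, followed by ASCII text.
raw = b"\xd6sterreich"
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoded as Windows-1252 (or ISO-8859-1) the value is intact.
as_cp1252 = unquote(posted, encoding="cp1252")

# Decoded as UTF-8, the invalid byte is replaced (a stricter decoder could drop it).
as_utf8 = unquote(posted, encoding="utf-8", errors="replace")

# What a genuinely UTF-8 encoded post would have looked like.
utf8_post = quote("Österreich", encoding="utf-8")

print(valid_utf8, as_cp1252, repr(as_utf8), utf8_post)
```

So the post was encoded in a single-byte Western encoding, and a page decoding it as UTF-8 cannot recover the Ö.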
What about using the Ascii Character 0 in the query string, encoded as (%00), can I retrieve the whole value without terminating by Ascii 0?
http://localhost/Test_Authentication.asp?token=%13%23%02%00%01%01%00%01%01%05%02%02%03%00%02%02%0A%0A%0A%0A%0A%0A048
Response.CharSet = "utf-8";
Session.CodePage=65001;
var strToken = (Request.QueryString("token").Count > 0)?Request.QueryString("token")(1):"";
@Ben Dunlap: Try this at the top of the page --
<%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>
Update
If you do a Response.Write Request.Form("country"), what does it display?
The 2 simple steps I used were:
1. Add at the top of EVERY ASP file:
Response.CharSet = "utf-8"
Response.CodePage = 65001
2. Save every ASP text file in "ANSI" encoding (NOT UTF-8!). This option is usually found in the "Save" window of advanced text editors.
If you save in UTF-8 encoding, or if you don't add the two lines specified at the top of your code, this will never work as you intended.
My issue was similar (but quite strange) and adding the following two lines on all my pages has corrected it. Thanks so much for this.
Response.CharSet = "UTF-8"
Response.Codepage = 65001
But, to explain, here is the exact issue I had. Folks were entering Spanish characters on my ASP entry page and the results were very weird. For example, "Peña" was entered. The ASP page would display this as entered, but what ended up in the database was displayed back as "Pe?a". This would have been sort of OK, except the hex actually stored in the database was 0x50653F6100. Notice the extra "00": somehow the stored value had an extra NULL at the end. So, when I later retrieved the data, the screens went a little bonkers when the "00" [null] was hit, and the displayed data essentially stopped after that point.
In any case adding the two lines seems to have fixed the issue and the "ñ" is stored in the database as it should be.

question about Character encoding in Web

Let's say I have a JSP page (I list only part of it here):
<%@ page language="java" contentType="text/html;charset=UTF-8"%>
<form>
<input type=input>
</input>
中華<!-- characters with BIG5 encoding -->
</form>
On the server side I call request.setCharacterEncoding("UTF-8");
My problem is:
If I use an IME to type Chinese characters into the input box and then submit the form, what encoding will the characters in the input box have, and why?
And if I copy the "中華" from the JSP page into the input box and submit the form, on the server side I find that the string from the input box is not UTF-8 (the encoding set with request.setCharacterEncoding) but BIG5.
So in Java/JSP, it seems the request is not really UTF-8 despite the setting.
Why? Can someone explain this?
But in ASP.NET, whatever characters I type into the input box and post, on the server side they are always UTF-8 and never seem to get corrupted.
Why? Does ASP.NET handle this automatically? Does it convert the characters in the input box to UTF-8 automatically?
I always thought that a form post just treats the characters in the form as raw bytes and does not process them: it simply wraps those bytes with headers and sends them to the server.
But if that is true, why do the characters never get corrupted in ASP.NET?
Thanks in advance!
Identify the point of failure.
中華
The characters you have chosen are (as Unicode codepoints) U+4E2D and U+83EF (in the CJK Unified Ideographs block). On the server, if you take the string you receive and output the values of the constituent characters using Integer.toHexString(mystring.charAt(i)), you should see these values. If this is not the case, there is a problem interpreting data from the client.
You are specifying a page encoding of UTF-8. Encoded as UTF-8, the above characters should take on the following byte sequence values in the rendered HTML:
U+4E2D 0xE4 0xB8 0xAD
U+83EF 0xE8 0x8F 0xAF
So, save the page in the browser as a file and open it in a hex editor - you should see the characters encoded as above.
You can also glean information about what is being sent from the client by sending the form to a servlet, dumping the raw byte input to a file, and inspecting it with a hex editor. It is also worth inspecting the HTTP headers and what character encodings the server and client say they will accept and are sending (see Firebug).
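The byte values above can be cross-checked with a short script. This is a Python sketch rather than Java, used only to compare what the two candidate encodings would put on the wire for the same two characters.

```python
from urllib.parse import quote

text = "中華"

# Code points, as Integer.toHexString(charAt(i)) would report them in Java.
code_points = [format(ord(ch), "04X") for ch in text]

# The same characters in the two encodings discussed in the question.
utf8_bytes = text.encode("utf-8")
big5_bytes = text.encode("big5")

# How each would appear in a URL-encoded form body.
utf8_form = quote(text, encoding="utf-8")
big5_form = quote(text, encoding="big5")

print(code_points)
print(utf8_bytes.hex(" "), "|", utf8_form)
print(big5_bytes.hex(" "), "|", big5_form)
```

Comparing a raw dump of the submitted form against these two byte sequences shows which encoding the browser actually used.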

Possible Encoding Issue Reading HTM File using .Net Streamreader

I have an HTML file with a ® (registered trademark) and a ™ (trademark) symbol in the text. These are just two among many other symbols. When I read the HTML file into a Literal control, it converts the symbols to something else.
The ® symbol converts to � (an open box in Firefox)
The trademark symbol converts to ™ (as expected)
If (System.IO.File.Exists(FullName)) Then
Dim StreamReader1 As New System.IO.StreamReader(FullName)
Contents.Text = StreamReader1.ReadToEnd()
StreamReader1.Close()
End If
Contents is a <asp:Literal runat="server" ID="Contents"></asp:Literal> and it's the only control in the aspx page.
From some research I think this is related to the encoding, but I don't know why it would change or how to fix it.
The html file does not contain any Content-Type settings in the head section.
If it's at all possible to shift this processing to the Render method, you could use HttpResponse.WriteFile to see if it handles these characters better than the Literal control does. If you're doing nothing with the content of this file other than assigning it to the control and then letting it render, then you should be able to do this OK.
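One possible explanation for the symptom, offered as an assumption since the question does not say how the file was saved: a StreamReader constructed from a path alone decodes as UTF-8, so if the file is actually Windows-1252, single bytes such as 0xAE (®) are not valid UTF-8 and come out as the replacement character. The Python sketch below illustrates that mismatch with hypothetical file content; it is not the .NET code path itself.

```python
# Hypothetical file content, stored as Windows-1252 bytes:
# 0xAE is the registered sign, 0x99 is the trademark sign.
raw = b"Acme\xae and Widget\x99"

# Decoded as UTF-8, the lone high bytes are invalid and get replaced.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoded with the encoding the file was (by assumption) saved in.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8))
print(as_cp1252)
```

If this is the cause, passing the file's real encoding to the StreamReader constructor overload that accepts an Encoding would be the corresponding change on the .NET side.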
