Possible Encoding Issue Reading HTM File using .Net Streamreader - asp.net

I have an HTML file with a ® (registered trademark) and ™ (trademark) symbol in the text. These are just two among many other symbols. When I read the HTML file into a Literal control, it converts the symbols to something else.
The registered trademark symbol converts to � (open box in Firefox)
The trademark symbol converts to ™ (as expected)
If (System.IO.File.Exists(FullName)) Then
    Dim StreamReader1 As New System.IO.StreamReader(FullName)
    Contents.Text = StreamReader1.ReadToEnd()
    StreamReader1.Close()
End If
Contents is a <asp:Literal runat="server" ID="Contents"></asp:Literal> and it's the only control in the aspx page.
From some research I think this is related to the encoding, but I don't know why it would change or how to fix it.
The HTML file does not contain any Content-Type settings in the head section.

If it's at all possible to shift this processing to the Render method, you could use HttpResponse.WriteFile to see if it handles these characters better than the Literal control does. If you're doing nothing with the content of this file other than assigning it to the control and then letting it render, then you should be able to do this OK.
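One likely cause, not confirmed in this thread: StreamReader decodes as UTF-8 when no encoding is passed, so a file saved in a single-byte Windows code page produces the replacement character for bytes above 0x7F. The Python sketch below only illustrates that byte-level behaviour. `decode_html` is a hypothetical helper, and Windows-1252 as the fallback is an assumption about how the file was saved.

```python
def decode_html(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else use a legacy code page."""
    try:
        # "utf-8-sig" also strips a leading BOM if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: the file was saved as Windows-1252 ("ANSI").
        return raw.decode(fallback)


# The registered trademark sign is the single byte 0xAE in Windows-1252.
ansi_bytes = "Acme®".encode("cp1252")

# Decoding those bytes as UTF-8 yields U+FFFD, the character shown as �.
naive = ansi_bytes.decode("utf-8", errors="replace")

# Falling back to the legacy code page recovers the symbol.
fixed = decode_html(ansi_bytes)
```

In .NET the corresponding step would be to pass an explicit Encoding as the second argument to the StreamReader constructor.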

Related

What is causing my browser to render an asp's &nbsp incorrectly?

I have an ASP page rendering some text from a table into HTML. Some of the text has the non-breaking-space character in it (Unicode U+00A0). The browser auto-detects the character encoding to be Unicode, which is good, but it isn't rendering the non-breaking spaces correctly. It is rendering them as � (the replacement character). When I change the page encoding to be "Western" instead of "Unicode", the � characters disappear.
Shouldn't the non-breaking-space be a normal character for a Unicode encoded web page to render? What is happening to cause this?
I have verified that the character stored in the database is the non-breaking-space by using SQL Server's ASCII and UNICODE functions, both return 160.
Also, when I run this code snippet String.fromCharCode(160) it returns " ", so the browser does seem to understand that character is supposed to be a space. Could the ASP be messing those characters up between querying them and writing them as html?
The ASP file was saved with ANSI encoding. Switching the file's encoding to UTF-8 solved the problem. I'm guessing that even though the page said its charset was UTF-8, it really wasn't. This explains why "Western" encoding worked while "Unicode" did not.
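A short Python illustration of why "Western" worked and "Unicode" did not, assuming "ANSI" here means Windows-1252: the same character is one byte in that code page but two bytes in UTF-8, and the lone byte is not valid UTF-8 on its own.

```python
NBSP = "\u00a0"  # non-breaking space, code point 160

as_utf8 = NBSP.encode("utf-8")    # two bytes in UTF-8
as_ansi = NBSP.encode("cp1252")   # one byte in Windows-1252

# What a browser sees when the single ANSI byte is labelled as UTF-8.
seen_as_unicode = as_ansi.decode("utf-8", errors="replace")

# What it sees when the same byte is read as Western (Windows-1252).
seen_as_western = as_ansi.decode("cp1252")
```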

classic asp chr() function issue

I was trying to implement RC4 encryption in ASP, but I found strange behaviour in the chr() function.
The issue is not related to the RC4 script, though, but to something I haven't been able to solve.
Leaving aside all the tests I've done, I could reproduce the issue in a very simple form:
I simply wrote
<%=chr(146)%>
in two pages, let's say L2.asp and L3.asp
page L2.asp shows ' thus html ’
page L3.asp shows �
Both pages are on the same server (Windows Server 2012 R2), but
it seems page L3.asp does not recognize the extended ASCII table.
I tried adding <% Response.Charset="ISO-8859-1"%> at the top, and many other solutions, but nothing changes.
Although the script is very simple (I also tested a longer script with the RC4 routine), if I copy the content of L2.asp into L3.asp or vice versa, the behaviour of the page remains unchanged: L2.asp continues to show ' while L3.asp shows �, and renaming the page does not change the behaviour.
Do you have any idea what could cause such strange behaviour?
Thanks a lot for any hint
It's not about the Chr function. The � points to a UTF-8 BOM, which is optional for UTF-8 files. First try saving the ASP files as UTF-8 without a BOM. You can use an advanced editor like Notepad++. Follow these steps: open "file.asp" > Encoding > Convert to UTF-8, and then File > Save.
Response.Charset simply appends the name of the character set to the Content-Type response header and does nothing on server-side.
Instead you must specify Response.CodePage = 1252.
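To check whether two seemingly identical files differ only by a BOM, a byte-level test is enough. This is a minimal Python sketch; `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names, and the sample bytes stand in for the contents of the two pages. The last line shows what byte 146 maps to under code page 1252.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True when the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Stand-ins for the raw bytes of the two pages (assumed, not taken from the thread).
page_with_bom = codecs.BOM_UTF8 + b"<%=chr(146)%>"
page_without_bom = b"<%=chr(146)%>"

# Byte 146 (0x92) is the right single quotation mark in Windows-1252.
quote = bytes([146]).decode("cp1252")
```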

Classic ASP convert string to windows-1252

I am processing a POST request which is encoded in UTF-8. This POST request is responsible for creating a file in some folder. However, when I look at the file names for Russian characters, I see garbage values for the file name ( file contents are ok). English characters for file names are ok. In the script I see :
Set fsOBJ= Server.CreateObject("Scripting.FileSystemObject")
Set fsOBJ= fsObj.CreateTextFile(fsOBJ.BuildPath(Path, strFileName))
I believe that 'strFileName' is my problem. Windows doesn't seem to like UTF-8 filenames. Any ideas on how to solve this.
VBScript strings are strictly 2-byte Unicode; any encoding used in storage or transmission of strings is converted to Unicode before the string exists in VBScript.
My guess is you have a form post carrying the file name, and the post is encoded as UTF-8. However, your receiving page has its CodePage set to something other than 65001 (the UTF-8 code page) at the time of decoding the form field carrying the file name. As a result, the string retrieved from the form is corrupt.
Add <%@ CODEPAGE=65001 %> to your page, include Response.CharSet = "UTF-8" at the top of the page, and save it as UTF-8.
Now when the source form posts UTF-8 encoded form data to the page, the form data will be decoded to Unicode correctly.
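The corruption described above can be reproduced outside ASP. In this Python sketch the file name is an invented example, and Windows-1252 is assumed as the page's mismatched code page.

```python
name = "файл.txt"  # hypothetical Russian file name

# Bytes as sent by a form that posts UTF-8.
posted = name.encode("utf-8")

# Decoding with the wrong code page turns each Cyrillic letter into two Latin characters.
garbled = posted.decode("cp1252")

# Decoding with the matching code page (65001 in ASP terms) restores the name.
correct = posted.decode("utf-8")
```

The ASCII part of the name survives either way, which matches the observation that English file names were fine.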

How can I write raw XML to a Label in ASP.NET

I am getting a block of XML back from a web service. The client wants to see this raw XML in a label on the page. When I try this:
lblXmlReturned.Text = returnedXml;
only the text gets displayed, without any of the XML tags. I need to include everything that gets returned from the web service.
This is a trimmed down sample of the XML being returned:
<Result Matches="1">
<VehicleData>
<Make>Volkswagen</Make>
<UK_History>false</UK_History>
</VehicleData>
<ABI>
<ABI_Code></ABI_Code>
<Advisory_Insurance_Group></Advisory_Insurance_Group>
</ABI>
<Risk_Indicators>
<Change_In_Colour>false</Change_In_Colour>
</Risk_Indicators>
<Valuation>
<Value xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"></Value>
</Valuation>
<NCAP>
<Pre_2009></Pre_2009>
</NCAP>
</Result>
What can I do to make this appear on the screen? I noticed that Stack Overflow does a pretty good job of putting the XML on the screen. I checked the source and it's using <pre> tags. Is this something that I have to use?
It would be easier to use an <asp:Literal /> with its Mode set to Encode than to deal with manually encoding the Label's Text
<asp:Literal runat="server" ID="Literal1" Mode="Encode" />
You need to HtmlEncode the XML first (which escapes special characters like < and > ):
string encodedXml = HttpUtility.HtmlEncode(xml);
Label1.Text = encodedXml;
Surrounding it in PRE tags would help preserve formatting, so you could do:
string encodedXml = String.Format("<pre>{0}</pre>", HttpUtility.HtmlEncode(xml));
Label1.Text = encodedXml;
As Bala R mentions, you could just use a Literal control with Mode="Encode" as this automatically HtmlEncodes any string. However, this would also encode any PRE tags you added into the string, which you wouldn't want. You could also use white-space:pre in CSS which should do the same thing as the PRE tag, I think.
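The escape-then-wrap order matters: the XML is encoded first and the PRE tags are added afterwards, so the tags themselves are not escaped. A minimal Python sketch of the same idea, using the standard library's `html.escape` in place of HttpUtility.HtmlEncode; `xml_for_display` is a hypothetical helper name.

```python
import html


def xml_for_display(xml: str) -> str:
    """Escape markup characters, then wrap the result in <pre> to keep line breaks."""
    return "<pre>{}</pre>".format(html.escape(xml))


sample = '<Result Matches="1">\n<Make>Volkswagen</Make>\n</Result>'
rendered = xml_for_display(sample)
```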

html injection question

Using FreeTextBox, I'm capturing HTML-formatted text. The purpose is to allow a website owner to update their web page content on a few pages. I have the system completed except for knowing what to do with the resultant HTML markup.
After the page editor completes their work, I can get the output from FreeTextBox, in html format, like so: <font color="#000080"><b>This is some text.</b></font>
I tried storing it as escaped markup in web.config, but that didn't work since it kept hosing the tags even after I changed them to escaped characters, like so: &lt;font color="#000080"&gt;
The reason I wanted to store this kind of string as a key in web.config is that I could successfully store a static string, set a label's value to it, and successfully render the text. But when I try to escape it, it gets reformatted in web.config by .NET somehow.
So I escaped all the characters, encoded them as Base64 and stored that. Then on page_load, I tried to decode it, but it just shows up as text, with all the html tags showing as well - it doesn't get rendered. I know a million people use this control, but I'm damned if I can figure out how to do it right.
So here's my question: how can I inject the saved HTML into an edited page so it shows up in browsers like the editor wants it to look?
Try Server.HtmlDecode to output the HTML to the screen.
As a side note, I prefer to use CKEditor for HTML-formatted input. I found it to be the best of the options (FreeTextBox, TinyMCE, anything else?), and it was completely rewritten and made faster in version 3.0!
In case anyone comes here for the answer, here's one way to do it.
I had initial problems with web.config changing some of the HTML tags upon storage, so we use B64 encoding (may not be necessary). Store the saved html markup to an AppSettings key in web.config as Base64 encoding, using this for your setting update function. Add error checking and whatever else you need it to do:
'create configuration object
Dim cfg As Configuration
cfg = WebConfigurationManager.OpenWebConfiguration("~")
'get reference to appsettings("HTMLstring")
Dim HTMLString As KeyValueConfigurationElement = _
CType(cfg.AppSettings.Settings("HTMLstring"), KeyValueConfigurationElement)
'get text entered by user and marked up with HTML tags from FTB1, then
'encode as Base64 so we can store it as XML-safe string in web.config
Dim b64String As String = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(FTB1.Text))
'save new value into web.config
If Not HTMLString Is Nothing Then
    HTMLString.Value = b64String
    cfg.Save()
End If
Next, add a Literal control to the aspx markup:
<asp:Literal id="charHTML" runat="server"/>
To add the saved HTML to the post-edited page, do the following in Page_Load:
'this string of HTML code is stored in web.config as Base64 to preserve XML-unsafe characters that come from FreeTextBox.
Dim injectedHTML As String = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String(WebConfigurationManager.AppSettings("HTMLstring")))
'the literal control will directly inject this HTML instead of encoding it
charHTML.Mode = LiteralMode.PassThrough
'set the value
charHTML.Text = injectedHTML
Hope this helps. sF
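The Base64 round trip used above can be checked in isolation. This Python sketch mirrors the two steps (UTF-8 bytes to Base64 text, and back); `to_config_value` and `from_config_value` are hypothetical names, and the markup is the example from the question.

```python
import base64


def to_config_value(markup: str) -> str:
    """Encode HTML markup as Base64 text so it contains no XML-unsafe characters."""
    return base64.b64encode(markup.encode("utf-8")).decode("ascii")


def from_config_value(value: str) -> str:
    """Decode the stored Base64 text back into the original HTML markup."""
    return base64.b64decode(value).decode("utf-8")


markup = '<font color="#000080"><b>This is some text.</b></font>'
stored = to_config_value(markup)
restored = from_config_value(stored)
```

The restored string is raw HTML again, so it has to be written to the page without further encoding, which is what LiteralMode.PassThrough does in the answer above.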
