When does asp.net HTML encode elements? - asp.net

In my database, I have a text field that contains escaped data:
"O'Neal"
I am trying to output it to my page like this:
LastName.InnerText = DB.LastName;
However, this results in this HTML on my page:
<h2 id="LastName">O&#39;Neal</h2>
What makes asp.net encode my HTML like this, and can I trust it to do this all the time?

Use InnerHtml instead of InnerText; unlike InnerText, InnerHtml does not encode the data.

The official documentation of InnerText clearly says in the remarks section:
Unlike the InnerHtml property, the InnerText property automatically encodes special characters to and from HTML entities. HTML entities allow you to display special characters, such as the < character, that a browser would ordinarily interpret as having special meaning. The < character would be interpreted as the start of a tag and is not displayed on the page. To display the < character, you would need to use the entity &lt;.
For example, if the InnerText property is set to "<b> Hello </b>", the < and > symbols are converted to &lt; and &gt;, respectively. The rendered output would be: &lt;b&gt; Hello &lt;/b&gt;. The &lt; and &gt; entities would indicate to the browser that these characters are to be displayed on the page. The browser will not detect the <b> tags and display the text in a bold font. The text displayed on the page is: <b>Hello</b>.
To prevent automatic HTML encoding and decoding, use the InnerHtml property.
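The difference described in the remarks can be sketched outside of a page. This is only an illustration: the helper names AsInnerText and AsInnerHtml are hypothetical, and WebUtility.HtmlEncode is used as a stand-in on the assumption that InnerText applies standard HTML encoding, as the quoted documentation says.

```csharp
using System;
using System.Net;

public static class InnerTextDemo
{
    // Hypothetical stand-in for assigning to InnerText:
    // the value is HTML-encoded before it reaches the page.
    public static string AsInnerText(string value) => WebUtility.HtmlEncode(value);

    // Hypothetical stand-in for assigning to InnerHtml:
    // the value is emitted unchanged.
    public static string AsInnerHtml(string value) => value;

    public static void Main()
    {
        Console.WriteLine(AsInnerText("O'Neal"));          // O&#39;Neal
        Console.WriteLine(AsInnerText("<b> Hello </b>"));  // &lt;b&gt; Hello &lt;/b&gt;
        Console.WriteLine(AsInnerHtml("<b> Hello </b>"));  // <b> Hello </b>
    }
}
```

The browser decodes O&#39;Neal back to O'Neal when it displays the page, so the encoded form is only visible in the page source.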

Related

How does Html.Raw MVC helper work?

I use Html.Raw to print raw HTML content. For example, when I send something like ViewBag.div = "<div> Hello </div>"; from the controller to the view, it is not printed as raw HTML unless I use the Html.Raw method. But if I have encoded content, such as content encoded using jQuery and inserted into the database, and I want to print it as raw HTML, the Html.Raw method does not work, and I have to call HttpUtility.HtmlDecode(EncodedContent) before I use Html.Raw. Could anyone explain why it acts this way, and what is the proper situation in which to use the Html.Raw method? In other words, why does Html.Raw not work when it receives HTML entities as a parameter instead of HTML tags?
Because encoded characters are HTML, and the Raw version of that string is the encoded one.
Html.Raw renders what it is given without doing any html encoding, so with ViewBag.div = "<div> Hello </div>";:
@Html.Raw(ViewBag.div)
Renders
<div> Hello </div>
However, when you have encoded characters in there, such as ViewBag.Something = "&gt;"; the raw version of that is &gt;. To get back to actual html you need to Html.Raw(HttpUtility.HtmlDecode(EncodedContent)); as you've said.
If Html.Raw did do the decoding then it would be confusing, and we would need something that didn't do it. ;-)
Html.Raw Method asks the Razor Engine to not encode the special chars.
The Razor Engine encodes the special chars because it considers that you want to show them in the state you sent to it. As a result, it encodes the special chars, and the browser decodes them again to show you the characters as you sent them to the Razor Engine.
But if you use the Html.Raw you are telling the Razor Engine to not encode the special chars of your content and render the content you get as-is from, for example, your database. So, if you want to show the decoded content, you have to decode it using HttpUtility.HtmlDecode and then directly render the html tags by using Html.Raw.
Example
If you have this encoded content in your database
&lt;h1&gt;dklxf;&lt;span style="font-style: italic;"&gt;kldk;dlk&lt;span style="font-weight: bold;"&gt;dxl'f;dlxd'fdlf;ldk;dlkf&lt;/span&gt;&lt;/span&gt;&lt;/h1&gt;
...and you render it without using Html.Raw, the Razor Engine will encode the special chars in that content so that it is printed in the browser as-is. If you use Html.Raw, it is written to the HTML of the page source directly, and the browser displays the tags as text:
&lt;h1&gt;dklxf;&lt;span style="font-style: italic;"&gt;kldk;dlk&lt;span style="font-weight: bold;"&gt;dxl'f;dlxd'fdlf;ldk;dlkf&lt;/span&gt;&lt;/span&gt;&lt;/h1&gt;
But if you use Html.Raw(HttpUtility.HtmlDecode(EncodedContent)) then your page will render the content as HTML tags:
<h1>dklxf;<span style="font-style: italic;">kldk;dlk<span style="font-weight: bold;">dxl'f;dlxd'fdlf;ldk;dlkf</span></span></h1>
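The three cases discussed in these answers can be reproduced with plain string operations. This is a rough sketch, not Razor itself: RazorDefault and Raw are hypothetical helpers, and WebUtility.HtmlEncode is assumed to approximate the encoding Razor applies by default.

```csharp
using System;
using System.Net;

public static class RawDemo
{
    // Hypothetical stand-in for @value in a view: encoded on output.
    public static string RazorDefault(string value) => WebUtility.HtmlEncode(value);

    // Hypothetical stand-in for @Html.Raw(value): written as-is.
    public static string Raw(string value) => value;

    public static void Main()
    {
        // Content that was entity-encoded before it was stored.
        string stored = "&lt;div&gt; Hello &lt;/div&gt;";

        // Encoded a second time; the browser shows the entities literally.
        Console.WriteLine(RazorDefault(stored)); // &amp;lt;div&amp;gt; Hello &amp;lt;/div&amp;gt;

        // Written unchanged; the browser shows the tags as text.
        Console.WriteLine(Raw(stored));          // &lt;div&gt; Hello &lt;/div&gt;

        // Decoded first, then written unchanged; the browser renders a div.
        Console.WriteLine(Raw(WebUtility.HtmlDecode(stored))); // <div> Hello </div>
    }
}
```

Decoding stored content and writing it raw means any script it contains will run, so this path is only reasonable for content that is trusted or has been sanitized.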

<br/> getting rendered as text and not page break on page

I have some text coming from an XML file. The text reads as &lt;br/&gt;, which after Html.Decode becomes <br/>, but since I am not using any server controls this gets displayed as the text <br/> rather than as a line break. Any clues?
Based upon your comment you are assigning the text <br/> to the InnerText property of a variable instance of the class HtmlGenericControl.
Your problem is the text is escaped (try viewing the source of the rendered page) so that it renders as text. Use the InnerHtml property instead to write out pre-formatted HTML in a HtmlGenericControl instance.
There are much better ways (and nicer ways) but
text.Replace("<br/>",vbCrLf) 'VB
or
text.Replace(@"<br/>","\n") //C#
Set InnerHtml of HtmlGenericControl instead of InnerText
http://msdn.microsoft.com/en-us/library/7512d0d0%28v=vs.71%29.aspx

How to trim html tags from text in asp.net grid view?

I have used asp.net ajax html editor and i saved data in database. But now i want to retrieve it and show it in grid view. But when i retrieve that, it also shows those html tags (generated by asp.net ajax editor). So, i want to trim those tags and show plain text in grid view. How do i do that?
Thanks
Go to your db and look at how it is saved. Maybe it is saved encoded. If that is not the case, you can use a simple regex to remove all those tags.
<[^<]+?>
This shows you just plain text and removes all Tags
To strip the HTML tags from text you can use the
Regex.Replace("str", "Pattern", "replacementstring"); method, which is in the
System.Text.RegularExpressions namespace,
for example:
Plain_Body = Regex.Replace(txtBody.Text, @"<[^>]*>", string.Empty);
Here I am replacing the HTML tags with String.Empty (""). You can extend the handling beyond the pattern @"<[^>]*>" if you also want to deal with entities such as non-breaking spaces (&nbsp;) and ampersands (&amp;).
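A minimal sketch of the approach in these answers, under the assumption that the stored markup is simple editor output: the tags are removed with the same pattern, and the remaining entities are decoded with WebUtility.HtmlDecode. The class and method names are hypothetical. A regex is not a full HTML parser, so unusual markup (comments, script blocks, a > inside an attribute value) may not be handled.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Removes anything that looks like a tag, then turns entities
    // such as &amp; back into the characters they stand for.
    public static string ToPlainText(string html)
    {
        string withoutTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        Console.WriteLine(ToPlainText("<p>Tom &amp; <b>Jerry</b></p>")); // Tom & Jerry
    }
}
```

If the plain text is then shown in a grid view column that does not encode its output, it would need to be HTML-encoded again before display.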

Do I need to html-encode title attributes (tooltips)?

In my markup I am using HTML title attributes which I set by the Tooltip property of various ASP.NET controls like an asp:Label. The content of those titles come from a database and I use data binding syntax, for instance:
<asp:Label ID="PersonLabel" runat="server"
Text='<%# HttpUtility.HtmlEncode(Eval("PersonShortName")) %>'
ToolTip='<%# HttpUtility.HtmlEncode(Eval("PersonFullName")) %>' />
Now, tooltips seem to be displayed as plain text on Windows and in the browsers I have tested. So the HTML-encoding is not what I really want and I am inclined to remove the encoding.
Can this be dangerous in any way if the database fields may contain script tags for example? My question is basically: Is it always guaranteed that HTML-title attributes are displayed as plain text? Are they always displayed as tooltips at all, or is it possible that some browsers (or OSs) display them in another way and allow and render HTML content in the title attributes?
Edit:
Looking at some of the answers it seems I didn't phrase my question well, so here are some additions:
If I have in the code snippet above a PersonShortName of "PM" in my database and as the PersonFullName a name with non-ASCII characters in it like Umlauts in "Peter Müller" the browser displays in the tooltip Peter M&#252;ller when I apply HttpUtility.HtmlEncode like in the code example - which is ugly.
I've also tested a simple HTML fragment like:
<span title="<script>alert('Evil script')</script>" >Hello</span>
The script in the title attribute didn't run in a browser with enabled Javascript (tested with Firefox), instead it was displayed in the tooltip as plain text. Therefore my guess was that title attributes are always rendered as plain text.
But as Felipe Alsacreations answered below there exist "rich tooltip plugins" which may render the title attribute as HTML. So in this case encoding is a good thing. But how can I know that?
Perhaps HttpUtility.HtmlEncode isn't the right solution and I have to filter only HTML tags but not encode simple special characters to make sure that the plain text is displayed correctly and to protect "rich HTML tooltips" at the same time. But it looks like a costly work - only for a simple tooltip.
Always sanitize output to the browser.
If a value like "><script>blabla</script> is inserted as a value for your fields, a user can essentially take over your entire site. It will probably make a mess when it comes to validation and correct code, but the script will still be run.
So to answer your question: No, it is not guaranteed that HTML-title attributes are displayed as plain text if the user knows what he/she is doing.
Beside security reasons:
Title attributes should always be plain text but certain JS plugins misuse them to display 'rich' tooltips (i.e. HTML code with bold text, emphasis, links and so on).
As for browsers, AFAIK they display title attributes as plain-text tooltips. Tooltips are never displayed to those who use tabbed navigation (keyboard), and screen readers give their users (blind and partially sighted people) many options, like reading the longer of the link title and its text, always reading the title, or never reading it ...
Surprisingly, still, no right answer in 5 years. The answer is: yes, you need to encode the title attribute, but not everything that is encoded in the innerText of the element.
The proper way to do it in asp.net if you do your own markup is:
string markup = string.Format("<div class='myClass' title='{0}'>{1}</div>",
System.Web.HttpUtility.HtmlAttributeEncode(myText),
System.Web.HttpUtility.HtmlEncode(myText));
The above will set both innerText and title of the div to myText, which is customary for elements that may contain long text but are constrained in width (as I believe the question implies).
The ToolTip property of a ASP.NET control will auto encode the value on output/rendering.
This means it is safe to set the tooltip to plain text as the page will sanitize the text on rendering.
Label1.ToolTip = "Some encoded text < Tag >"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
If you need to use text that is already encoded, you can set the title attribute instead. The title attribute will not be automatically encoded on rendering:
Label1.Attributes("title") = "Some encoded text &lt; Tag &gt;"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
Another point:
Who cares how the title attribute is rendered by a browser, when it is the presence of malicious strings in the source code that could present an issue?
It doesn't matter how it is displayed, the question is: how does it appear in the source code?
(As already stated, if you're pumping strings to the client, do something to sanitize those strings.)
I think there may be some confusion going on with this thread.
Firstly <asp:Label> is an ASP.NET Web Control. The Text and ToolTip attributes are "abstractions" of the inline content and 'title' attributes of an HTML tag respectively.
For these particular two properties Microsoft will perform the HTML Encoding for you automatically so if you set ToolTip="H&S<" then the <span> tag will be rendered as <span title="H&amp;S&lt;"...>. The same goes for the Text property.
NOTE: Not all properties perform automatic encoding (HTML or InnerContent properties for example)
If however you are generating HTML tags directly (Response.Write("<span...") for example) then you MUST HTML-encode the text content and tooltip attribute content if:
Those values originate from a user / external unsanitised source, or
There is a possibility that the content may contain characters that should be escaped (& < > etc.)
Usually this means:
Hardcoded content with no HTML special characters:
Response.Write("<span title='Book Reference'>The art of zen</span>"); // SAFE
Hardcoded content with HTML special characters that you manually encode:
Response.Write("<span title='Book &amp; Reference'>The art &amp; zen</span>"); // SAFE
Dynamically sourced content:
Response.Write("<span title='"+sTitle+"'>"+sText+"</span>"); // UNSAFE
Response.Write("<span title='"+HttpUtility.HtmlEncode(sTitle)+"'>" +HttpUtility.HtmlEncode(sText)+"</span>"); // SAFE
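To make the last case concrete, here is a small sketch of what the encoded version produces for hostile input. It builds the string instead of calling Response.Write so that it runs on its own; SpanBuilder is a hypothetical name, and WebUtility.HtmlEncode is used on the assumption that it matches the HttpUtility.HtmlEncode behavior described above.

```csharp
using System;
using System.Net;

public static class SpanBuilder
{
    // Encodes both the attribute value and the element content
    // before placing them into the markup.
    public static string Build(string title, string text)
    {
        return "<span title='" + WebUtility.HtmlEncode(title) + "'>"
             + WebUtility.HtmlEncode(text) + "</span>";
    }

    public static void Main()
    {
        // A title that tries to close the single-quoted attribute
        // and a text that tries to inject a script element.
        string markup = Build("' onmouseover='alert(1)", "<script>alert(1)</script>");
        Console.WriteLine(markup);
        // <span title='&#39; onmouseover=&#39;alert(1)'>&lt;script&gt;alert(1)&lt;/script&gt;</span>
    }
}
```

Because the apostrophes come out as &#39;, the injected text stays inside the title attribute instead of starting a new one.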

Encoded character is used instead of the correct one

I have a little problem and I'm hoping that you can help me solve this annoying issue.
I need to use an iFrame in an administration panel to let users use the selection service, and in the HTML I have:
<iframe scrolling="yes" runat="server" title="Par Selection" id="iFrame"
frameborder="0" enableviewstate="true" width="100%" height="490" />
in my code-behind file I have:
iFrame.Attributes.Add("src", String.Format(
"https://www.parurval.se/urval/?username={0}&password={1}",
parSettings.GetSettings(parSettings.SettingsType.PARSelection, parSettings.SectionType.Username),
parSettings.GetSettings(parSettings.SettingsType.PARSelection, parSettings.SectionType.Password)));
The output is this:
<iframe id="tcMain_tabPARSelection_iFrame" scrolling="yes" title="Par Selection"
frameborder="0" width="100%" height="490"
src="https://www.parurval.se/urval/?username=myUsername&amp;password=myPassword">
</iframe>
Please note the &amp; instead of the & sign in the src address when passing username and password
How can I prevent this?
I tried with HttpUtility.Decode( myCompleteUrl ) but got the same result :(
The worst thing is, if the src code has only the address
... src="https://www.parurval.se/urval/" ...
I'm not able to input the user/pwd, I see the form and I can enter text, but it does nothing, it only refreshes the iframe inner page, doing this in a full window, works fine.
In that administration panel I have textboxes where the user adds the username and password, so that on entering the Administration page I can jump directly to the service in the iFrame and the user does not need to enter user/pwd to log in every time. That is why I'm trying to add those values dynamically.
Any ideas?
Added:
If I put the correct URL address (with user and pwd) in the iFrame src attribute in the HTML side (not dynamically) all works fine :(
The presence of the &amp; is actually correct there. Most browsers are forgiving enough not to choke on just seeing & there, but it's technically not correct.
“&” is a special character in HTML (more specifically in SGML), so encoding it is the correct thing to do. Yes, even in link URLs.
The HTML 4.01 specification states:
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
So encoding the & as &amp; is correct behavior since the interpretation of the src attribute value (CDATA data type) is described as:
CDATA is a sequence of characters from the document character set and may include character entities. User agents should interpret attribute values as follows:
Replace character entities with characters,
Ignore line feeds,
Replace each carriage return or tab with a single space.
Otherwise src attribute values like /foo?bar&sect=123 would be ambiguous as they can be interpreted either literally as /foo?bar&sect=123 or (replacing the sect entity) as /foo?bar§=123.
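The round trip can be checked with a short sketch. The IframeUrl class is hypothetical; it assumes the username and password are URL-escaped with Uri.EscapeDataString before the address is assembled, and it uses WebUtility.HtmlEncode and WebUtility.HtmlDecode to mimic what the server writes into the attribute and what the browser reads back.

```csharp
using System;
using System.Net;

public static class IframeUrl
{
    // Builds the address, escaping each value so that characters
    // such as & or spaces inside a value cannot break the query string.
    public static string Build(string user, string password)
    {
        return "https://www.parurval.se/urval/?username=" + Uri.EscapeDataString(user)
             + "&password=" + Uri.EscapeDataString(password);
    }

    public static void Main()
    {
        string url = Build("my user", "p&ss");
        Console.WriteLine(url);
        // https://www.parurval.se/urval/?username=my%20user&password=p%26ss

        // What ends up in the src attribute of the rendered markup.
        string attributeValue = WebUtility.HtmlEncode(url);
        Console.WriteLine(attributeValue);
        // https://www.parurval.se/urval/?username=my%20user&amp;password=p%26ss

        // What the browser requests after replacing character entities.
        Console.WriteLine(WebUtility.HtmlDecode(attributeValue) == url); // True
    }
}
```

The &amp; exists only in the page source; the address the browser requests contains a plain & between the two parameters.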
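This seems like a case where you can take advantage of URL encoding to hide the &, bypassing XML encoding. & is U+0026, so you can encode it as %26: https://www.parurval.se/urval/?username={0}%26password={1}. Note, however, that a percent-encoded & is read as part of the username value rather than as a parameter separator, so the server would no longer see a separate password parameter.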
You should use
HttpUtility.HtmlEncode(String.Format("https://www.parurval.se/urval/?username={0}&password={1}",
parSettings.GetSettings(parSettings.SettingsType.PARSelection, parSettings.SectionType.Username),
parSettings.GetSettings(parSettings.SettingsType.PARSelection, parSettings.SectionType.Password)));
