Input Validation When Using a Rich Text Editor - asp.net

I have an ASP.NET MVC application and I'm using CKEditor for text entry. I have turned off input validation so the HTML created from CKEditor can be passed into the controller action. I am then showing the entered HTML on a web page.
I only have certain buttons on CKEditor enabled, but obviously someone could send whatever text they want down. I want to be able to show the HTML on the page after the user has entered it. How can I validate the input, but still be able to show the few things that are enabled in the editor?
So basically I want to sanitize everything except for a few key things like bold, italics, lists and links. This needs to be done server side.

How about AntiXSS?

See my full answer to a similar question:
I have found that replacing the angle brackets with encoded angle brackets solves most problems.

You could create a "whitelist" of sorts for the HTML tags you'd like to allow. You could start by HTML-encoding the whole thing. Then, replace a series of "allowed" sequences, such as:
"&lt;strong&gt;" and "&lt;/strong&gt;" back to "<strong>" and "</strong>"
"&lt;em&gt;" and "&lt;/em&gt;" back to "<em>" and "</em>"
"&lt;li&gt;" and "&lt;/li&gt;" back to ... etc. etc.
For things like the A tag, you could resort to a regular expression (since you'd want the href attribute to be allowed too). You would still want to be careful about XSS; someone else already recommended AntiXSS.
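Sample regex to replace the encoded A tags (after encoding, the quotes are &quot; as well):
&lt;a href=&quot;([^&]+)&quot;&gt;
Then replace it with
<a href="$1">
and turn &lt;/a&gt; back into </a>.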
Good luck!
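A minimal C# sketch of the encode-then-whitelist idea from this answer, assuming System.Net.WebUtility.HtmlEncode for the first step. The class name WhitelistSanitizer and the allowed-tag list are hypothetical. Unlike the sample regex, it only restores anchors whose href starts with http:// or https://, so a javascript: link stays encoded; it is a sketch, not a replacement for a maintained sanitizer such as AntiXSS.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating "encode everything, then whitelist".
public static class WhitelistSanitizer
{
    // Attribute-free tags that are turned back into real markup (assumed list).
    private static readonly string[] AllowedTags = { "strong", "em", "ul", "ol", "li", "p" };

    // Matches an *encoded* <a href="..."> tag. Only http/https URLs qualify,
    // and the URL may not contain '&' (so no entities), '"', '<' or '>'.
    private static readonly Regex EncodedAnchor = new Regex(
        "&lt;a href=&quot;(https?://[^&\"<>]+)&quot;&gt;",
        RegexOptions.IgnoreCase);

    public static string Sanitize(string input)
    {
        // Step 1: encode everything, so <, >, &, " and ' become entities.
        string result = WebUtility.HtmlEncode(input ?? string.Empty);

        // Step 2: restore the exact opening/closing forms of the allowed tags.
        foreach (string tag in AllowedTags)
        {
            result = result
                .Replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                .Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }

        // Step 3: restore whitelisted anchors, then any closing </a> tags.
        result = EncodedAnchor.Replace(result, "<a href=\"$1\">");
        result = result.Replace("&lt;/a&gt;", "</a>");
        return result;
    }
}
```

Tag names are matched case-sensitively and only in their exact attribute-free form, so <STRONG> or <em class="x"> stay encoded, which is the safe way to fail. Unbalanced tags can still disturb the page layout.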

Related

POST rendered text by Razor in ASP.NET MVC to server

I have a situation where I make a <div> with some Razor. This is pretty standard, so imagine something like:
<div>
<strong>Undertegnede myndige skyldner</strong>:<br /><br />
@Model.ContractText.DebtorName, @Model.ContractText.DebtorFullAddress
@foreach (var reminder in Model.DemandStructure.ReminderFees_Lines)
{
    @reminder.Label @: @reminder.Amount.ToCurrency()<br />
}
</div>
This becomes a nice piece of text.
What I want to do, is to POST this generated text, and store it on the server.
Possible solutions
Now, I could of course just generate this string on the server, but then I would lose the nice formatting of Razor.
I could use some templating language, but I don't know of one that is easy to use on the server side and solves my problem. Any suggestions?
Maybe it makes sense to wrap this in some kind of input field so it's POST-ed to server?
Does anyone have a simple and smart solution for how to POST a generated text-string to the server?
Do you mean posting the text to the server in response to a user action on the page?
If not, certainly do it server-side instead. Even if so, while you could wrap it in a form element (or, for example, put a copy of the text in a hidden element) so that it is submitted with any form submit, you probably shouldn't, for a number of reasons:
The user will be able to edit the text before it is sent to the server, and it looks like they shouldn't be able to, since they could then change the Amount value before it is stored
It will be hard to encode the newlines (they are br elements in your HTML, but these don't post as newlines in a string, so you would have to convert them or use a textarea instead)
It generates unnecessary network traffic.
What you should probably do is simply build the text in server-side code and store it there, using string formatting to create the same resulting text:
var nl = Environment.NewLine;
var theText = $"Undertegnede myndige skyldner:{nl}{nl}{Model.ContractText.DebtorName}, {Model.ContractText.DebtorFullAddress}{nl}"
    + String.Join(nl, Model.DemandStructure.ReminderFees_Lines.Select(r => $"{r.Label}: {r.Amount.ToCurrency()}")); // Select requires using System.Linq

Label html string being read by screen reader

I've got some dynamically generated HTML building a drop-down menu using the Dojo library. I need to make my code accessibility compliant, and right now the screen reader looks at the menu item and reads it as plain HTML:
menu.addChild(new MenuItem({
label: "<a onclick=window.location.href='sampleurl.com'
href="sampleurl.com">Sample Link</a> ...
Excuse the onclick, it's for a different issue, but what I'm getting is basically:
Tab down to first menu item
Screenreader: "Less than a onclick equals window dot location dot href equals sampleurl"... etc
I've tried using aria-hidden, but the screen reader just reads that as text. I'm using VoiceOver on Mac OS, but I need it to be compliant with JAWS as well. Any tips or advice? Thanks!
The label property is meant for the item's label (which can contain HTML), not for the full link tag.
See on the following page how to use the Dojo library to generate menu items:
https://dojotoolkit.org/reference-guide/1.10/dijit/Menu.html
Example:
menu.addChild(new MenuItem({
    label: "Sample Link",
    onClick: function() { window.location.href = 'sampleurl.com'; }
}));
This would be easier to debug with a working example along with something stating what screen reader / browser combo you are using. At the bare minimum, show us the HTML output of your script, considering it is writing HTML for the screen reader to parse.
That being said, I suspect the missing / inconsistent quotes. Note that you start a string with double quotes, then go into the onclick attribute with no quotes around, then single quotes around its value, and then use double quotes around the href.
Alternatively, you are writing the entire string into the page and somehow HTML encoding it.
I suggest using a linting tool to check your JS.

Do I need to html-encode title attributes (tooltips)?

In my markup I am using HTML title attributes which I set by the Tooltip property of various ASP.NET controls like an asp:Label. The content of those titles come from a database and I use data binding syntax, for instance:
<asp:Label ID="PersonLabel" runat="server"
Text='<%# HttpUtility.HtmlEncode(Eval("PersonShortName")) %>'
ToolTip='<%# HttpUtility.HtmlEncode(Eval("PersonFullName")) %>' />
Now, tooltips seem to be displayed as plain text on Windows and in the browsers I have tested. So the HTML-encoding is not what I really want and I am inclined to remove the encoding.
Can this be dangerous in any way if the database fields may contain script tags for example? My question is basically: Is it always guaranteed that HTML-title attributes are displayed as plain text? Are they always displayed as tooltips at all, or is it possible that some browsers (or OSs) display them in another way and allow and render HTML content in the title attributes?
Edit:
Looking at some of the answers it seems I didn't phrase my question well, so here are some additions:
If, in the code snippet above, I have a PersonShortName of "PM" in my database and, as the PersonFullName, a name with non-ASCII characters in it, like the umlaut in "Peter Müller", the browser displays Peter M&#252;ller in the tooltip when I apply HttpUtility.HtmlEncode as in the code example, which is ugly.
I've also tested a simple HTML fragment like:
<span title="<script>alert('Evil script')</script>" >Hello</span>
The script in the title attribute didn't run in a browser with enabled Javascript (tested with Firefox), instead it was displayed in the tooltip as plain text. Therefore my guess was that title attributes are always rendered as plain text.
But as Felipe Alsacreations answered below, there are "rich tooltip plugins" which may render the title attribute as HTML. In that case encoding is a good thing. But how can I know whether that applies?
Perhaps HttpUtility.HtmlEncode isn't the right solution and I have to filter only HTML tags but not encode simple special characters to make sure that the plain text is displayed correctly and to protect "rich HTML tooltips" at the same time. But it looks like a costly work - only for a simple tooltip.
Always sanitize output to the browser.
If a value like "><script>blabla</script> is inserted as a value for your fields, a user can essentially take over your entire site. It will probably make a mess when it comes to validation and correct code, but the script will still be run.
So to answer your question: No, it is not guaranteed that HTML-title attributes are displayed as plain text if the user knows what he/she is doing.
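To make the attack in this answer concrete, here is a sketch (hypothetical class and method names) that builds a span's title attribute by string concatenation, once with the raw value and once with System.Net.WebUtility.HtmlEncode. A payload that starts with "> closes the attribute and the tag in the raw version; after encoding, it stays inside the attribute as plain text.

```csharp
using System.Net;

// Hypothetical helpers contrasting raw and encoded attribute output.
public static class TitleDemo
{
    // UNSAFE: the value is pasted into the attribute as-is.
    public static string RawSpan(string title) =>
        "<span title=\"" + title + "\">Hello</span>";

    // SAFE: quotes, apostrophes and angle brackets are turned into entities first.
    public static string EncodedSpan(string title) =>
        "<span title=\"" + WebUtility.HtmlEncode(title) + "\">Hello</span>";
}
```

With the payload "><script>alert('x')</script>, RawSpan ends the title attribute and the span tag early, so the script element becomes real markup, while EncodedSpan keeps the whole payload inside the attribute.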
Beside security reasons:
Title attributes should always be plain text but certain JS plugins misuse them to display 'rich' tooltips (i.e. HTML code with bold text, emphasis, links and so on).
As for browsers, AFAIK titles are displayed as plain text in tooltips. They are never displayed to those who use tabbed (keyboard) navigation, and screen readers give their users (blind and partially sighted people) many options, such as reading whichever is longer of the link title and the link text, always the title, or never the title.
Surprisingly, still, no right answer in 5 years. The answer is: yes, you need to encode the title attribute, but not everything that is encoded in the innerText of the element.
The proper way to do it in asp.net if you do your own markup is:
string markup = string.Format("<div class='myClass' title='{0}'>{1}</div>",
System.Web.HttpUtility.HtmlAttributeEncode(myText),
System.Web.HttpUtility.HtmlEncode(myText));
The above will set both innerText and title of the div to myText, which is customary for elements that may contain long text but are constrained in width (as I believe the question implies).
The ToolTip property of an ASP.NET control will automatically encode the value on output/rendering.
This means it is safe to set the tooltip to plain text as the page will sanitize the text on rendering.
Label1.ToolTip = "Some encoded text < Tag >"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
If you need to use text that is already encoded, you can set the title attribute instead. The title attribute will not be automatically encoded on rendering:
Label1.Attributes("title") = "Some encoded text &lt; Tag &gt;"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
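The ugly "Peter Müller" tooltip described in the question is what double encoding looks like: the data-binding expression calls HttpUtility.HtmlEncode, and, per the answer above, the ToolTip property encodes the result again on rendering. The sketch below reproduces this with System.Net.WebUtility.HtmlEncode, assuming ASP.NET's HtmlEncode applies the same rules to ü; the class name is hypothetical.

```csharp
using System.Net;

// Hypothetical demo of single vs. double HTML encoding of a tooltip value.
public static class DoubleEncodingDemo
{
    // What the data-binding expression does: encode once (ü becomes &#252;).
    public static string EncodeOnce(string value) => WebUtility.HtmlEncode(value);

    // What happens when ToolTip encodes the already-encoded string again
    // (the '&' of &#252; becomes &amp;).
    public static string EncodeTwice(string value) =>
        WebUtility.HtmlEncode(WebUtility.HtmlEncode(value));
}
```

Encoded once, the browser decodes &#252; back to ü inside the title attribute. Encoded twice, only the outer &amp; is decoded, so the tooltip shows the literal text &#252;. With a property that already encodes, such as ToolTip, the raw value can be passed in.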
Another point:
Who cares how the title attribute is rendered by a browser, when it is the presence of malicious strings in the source code that could present an issue?
It doesn't matter how it is displayed, the question is: how does it appear in the source code?
(As already stated, if you're pumping strings to the client, do something to sanitize those strings.)
I think there may be some confusion going on with this thread.
Firstly <asp:Label> is an ASP.NET Web Control. The Text and ToolTip attributes are "abstractions" of the inline content and 'title' attributes of an HTML tag respectively.
For these particular two properties Microsoft will perform the HTML encoding for you automatically, so if you set ToolTip="H&S<" then the <span> tag will be rendered as <span title="H&amp;S&lt;"...>. The same goes for the Text property.
NOTE: Not all properties perform automatic encoding (HTML or InnerContent properties for example)
If, however, you are generating HTML tags directly (Response.Write("<span...") for example), then you MUST HTML-encode the text content and the title attribute content if:
Those values originate from a user / external unsanitised source or
If there is a possibility that the content may contain characters that should be escaped (& < > etc.)
Usually this means:
Hardcoded content with no special HTML characters:
Response.Write("<span title='Book Reference'>The art of zen</span>"); // SAFE
Hardcoded content with special HTML characters that you manually encode:
Response.Write("<span title='Book &amp; Reference'>The art &amp; zen</span>"); // SAFE
Dynamically sourced content:
Response.Write("<span title='"+sTitle+"'>"+sText+"</span>"); // UNSAFE
Response.Write("<span title='"+HttpUtility.HtmlEncode(sTitle)+"'>" +HttpUtility.HtmlEncode(sText)+"</span>"); // SAFE

asp.net MVC action link need to have the registered on the actual word

I've got this ActionLink:
<%= Html.ActionLink("Corian&reg; Worktops", "Index", "Corian")%>
The word Corian has to carry the registered symbol or the word cannot be used, but it seems to get encoded rather than displayed as the symbol. I know I could just write this as a normal href, but that kind of defeats the object if there is another solution.
Has anyone tried and successfully carried something like this out?
Thanks
It works normally:
<%= Html.ActionLink("RegisteredMark®", "Action")%>
Use the normal ® symbol, but make sure the font in the HTML displays it correctly.
I do not know why but having static text in the views gives me the chills. I would rather suggest that you use a resource provider to fill in your link text. That way you will not be bothered by the html encoding stuff.
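Html.ActionLink HTML-encodes its link text, which is why the literal ® works while the entity &reg; does not. A sketch with System.Net.WebUtility.HtmlEncode, assuming ActionLink's encoding follows the same rules; the class name is hypothetical.

```csharp
using System.Net;

// Hypothetical demo: how HTML-encoding treats a literal ® versus the &reg; entity.
public static class RegisteredMarkDemo
{
    // Literal U+00AE becomes the numeric entity &#174;, which the browser shows as ®.
    public static string EncodeLiteral() => WebUtility.HtmlEncode("Corian\u00AE Worktops");

    // The entity's '&' is itself encoded, so the browser shows the text "&reg;".
    public static string EncodeEntity() => WebUtility.HtmlEncode("Corian&reg; Worktops");
}
```

In the page, &#174; renders as ®, while &amp;reg; renders as the literal text "&reg;".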

How to deal with special characters in ASP.NET's HyperLink.NavigateUrl?

I am currently having trouble figuring out how to handle a file path that is (dynamically) passed to a HyperLink control's NavigateUrl property.
Let's say that I'm trying to refer to a file named jäynä.txt at the root of C:.
Passing "file:///C:/jäynä.txt" results in a broken link, as does HttpUtility.UrlPathEncode("file:///C:/jäynä.txt").
Replacing the ä's with %E4, which gives the string "file:///C:/j%E4yn%E4.txt", does give a working link to file:///C:/jäynä.txt, but I have not been able to find a way to make the replacement without defining it myself, with Replace("ä", "%E4"), for example.
Is there a way to handle the file path string automatically so that the HyperLink renders it correctly, without manually listing which characters to replace?
Additional Note:
There may be a way to work around this by specifying the character encoding in which the page is rendered, because debugging shows that the HyperLink at least stores the string "file:///C:/jäynä.txt" unchanged, but somehow mangles it around the time of rendering.
However, this seems to happen only when rendering NavigateUrl, because other components, as well as HyperLink's Text property, are all quite capable of rendering the character ä unchanged.
The NavigateUrl property of a Hyperlink will encode unicode chars in the url.
Instead you can set the href attribute property of the Hyperlink like this:
hyperlink1.Attributes("href") = "file:///C:/jäynä.txt"
This is due to how the browser interprets the path; people typically avoid using characters like that in page URLs.
In your case, I believe you have struck upon the best case scenario, as I am not aware of any way to change the behavior of HttpUtility and/or the NavigateUrl property. At least not without creating a custom control for it.
Don't use the HyperLink control. Use the HtmlAnchor control instead; it will solve your problem. I don't know why Microsoft designed it like this.
Thank you!
The post using the 'attributes' solved my problem. In my case it was
HyperLink6.Attributes["href"] = "http://høgstedt.danquah.dk/";
The problem of using special Danish characters in a URL seems to have troubled a lot of programmers; a search turns up several very complicated approaches. This one is SIMPLE and it SIMPLY WORKS.
So once again, thank you
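The asker's manual Replace("ä", "%E4") can be generalized, without listing characters, by percent-encoding the file name's ISO-8859-1 (Latin-1) bytes. This sketch assumes, as the question observed, that the file: link works with Latin-1 escapes such as %E4; the helper name is hypothetical, and characters outside Latin-1 cannot be represented this way (by default the encoder substitutes '?').

```csharp
using System.Text;

// Hypothetical helper: percent-encode a path segment using Latin-1 bytes (%E4 for ä).
public static class Latin1PathEncoder
{
    // ISO-8859-1 is available by default on both .NET Framework and .NET Core.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    public static string EncodeSegment(string segment)
    {
        var sb = new StringBuilder();
        foreach (byte b in Latin1.GetBytes(segment))
        {
            char c = (char)b;
            // RFC 3986 unreserved characters are copied unchanged.
            bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved)
                sb.Append(c);
            else
                sb.Append('%').Append(b.ToString("X2")); // e.g. 0xE4 -> "%E4"
        }
        return sb.ToString();
    }
}
```

For the question's example, "file:///C:/" + Latin1PathEncoder.EncodeSegment("jäynä.txt") yields "file:///C:/j%E4yn%E4.txt", the same string the asker built by hand. Only the file name is encoded, so the ':' and '/' of the prefix are left alone.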

Resources