I'm using a Kendo Grid to read JSON data.
I specify contentType: "application/json; charset=utf-8", but special characters such as 'é' and 'à' are not decoded correctly and show up as "%Ã" or something similar. How can I display them as "é", "à", etc.?
It seems that you are sending the correct encoding for the data, but what about the HTML page that includes the grid?
Try adding <meta charset="UTF-8"/> to the head section of your HTML.
I'd like to know how I can strip any hyperlink <a> tags from within some text: the whole lot, including the text, image, or whatever is being linked, up to and including the closing </a> tag.
E.g.
Click here
<img src="http://stackoverflow.com" alt = "blah">
ie. remove the whole lot.
Any ideas how to do this?
Thanks
Obligatory "don't use regex to parse html" warning: RegEx match open tags except XHTML self-contained tags
I would recommend either converting to XHTML and using XPath, or taking a look at the HtmlAgilityPack to do this. I have used both methods for parsing and modifying HTML in the past, and they are far more flexible and robust than using regex.
Here is an example that should get you started with HtmlAgilityPack:
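HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so guard before iterating.
// (Current HtmlAgilityPack versions expose the root as DocumentNode.)
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // Do stuff! For example, link.Remove() drops the anchor and everything inside it.
    }
}
doc.Save("file.htm");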
From what I understand, this should work
string linksRemoved = Regex.Replace(withLinks, @"</?(a|A).*>", "");
You can try a regular expression to replace your tags. My regex isn't the best but this should get you close.
System.Text.RegularExpressions.Regex.Replace(
    input,
    @"<a[^>]*?>.*?</a>",
    string.Empty);
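As a rough illustration of the regex approach above, here is a minimal sketch that removes each anchor together with its contents. The class and method names (AnchorStripper, RemoveAnchors) are made up for this example. It assumes anchors are not nested and that the markup is reasonably regular; for anything messier, the parser-based approach is the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorStripper
{
    // Lazy match from an opening <a ...> to the nearest </a>.
    // \b keeps tags such as <abbr> from matching; Singleline lets '.' cross newlines.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*>.*?</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the input with every anchor element removed.
    public static string RemoveAnchors(string html)
    {
        return Anchor.Replace(html, string.Empty);
    }

    public static void Main()
    {
        string input = "Before <a href=\"http://stackoverflow.com\">Click here</a> middle "
                     + "<A href=\"#\"><img src=\"x.png\" alt=\"blah\"></A> after";
        Console.WriteLine(RemoveAnchors(input));
    }
}
```

The lazy quantifier (.*?) is what keeps the match from running on to the last closing tag in the input; a greedy .* would swallow everything between the first and the last anchor.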
How can I find an image in some content? I have a method that I call from my .aspx page to remove all HTML tags, like this: Usage.DeleteHtml(Eval("content").ToString())
But I don't want to delete the img tag from the content. I need to find the first image so that I can show it on my page, like this: <img src="Usage.FindImage("content")" />
But I couldn't write a method for finding the image.
My DeleteHtml method:
public static string DeleteHtml(string text)
{
    string mystr = Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
return mystr;
}
I assume that your task is essentially retrieving the first image in the document.
If your HTML document is also a well-formed XML document, you could easily solve your task using XPath.
More on XPath in .NET here.
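An XPath query to retrieve the first image's URL will look like this (the parentheses matter: //img[1] would select every img that is the first img child of its parent):
(//img)[1]/@src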
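A minimal sketch of that XPath approach with the built-in XmlDocument class follows. The helper name FindFirstImageSrc is made up for this example, and it assumes the markup is well-formed XML without a default namespace (real XHTML with an xmlns declaration would need an XmlNamespaceManager). The query is written as (//img)[1] so that it picks the first image in document order.

```csharp
using System;
using System.Xml;

public static class FirstImageDemo
{
    // Hypothetical helper: returns the src of the first <img> in document order,
    // or null if there is no image. Throws XmlException if the input is not well-formed.
    public static string FindFirstImageSrc(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNode src = doc.SelectSingleNode("(//img)[1]/@src");
        return src == null ? null : src.Value;
    }

    public static void Main()
    {
        string xhtml = "<div><p>Intro</p>"
                     + "<p><img src=\"a.png\" alt=\"\"/></p>"
                     + "<p><img src=\"b.png\" alt=\"\"/></p></div>";
        Console.WriteLine(FindFirstImageSrc(xhtml));
    }
}
```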
Otherwise, if you really need to strip HTML, this is a duplicate of a couple of existing questions:
Using C# regular expressions to remove HTML tags
How can I strip HTML tags from a string in ASP.NET?
How to clean HTML tags using C#
Short answer: use Html Agility Pack.
I have used the ASP.NET AJAX HTML editor and saved the data in a database. Now I want to retrieve it and show it in a GridView. But when I retrieve it, it also shows the HTML tags generated by the editor. So I want to strip those tags and show plain text in the GridView. How do I do that?
Thanks
Go to your DB and look at how it is saved. Maybe it is saved encoded. If that is not the case, you can use a simple regex to remove all those tags:
<[^<]+?>
This removes all tags and leaves just the plain text.
To strip the HTML tags from text you can use the
Regex.Replace(input, pattern, replacement) method, which lives in the
System.Text.RegularExpressions namespace,
for example:
Plain_Body = Regex.Replace(txtBody.Text, @"<[^>]*>", string.Empty);
Here I am replacing the HTML tags with String.Empty (""). If you wish, you can extend the pattern @"<[^>]*>" to cover additional characters, such as non-breaking spaces (&nbsp;) and ampersands (&amp;).
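Putting the pieces above together, a minimal sketch might strip the tags first and then decode the remaining entities rather than deleting them. The names PlainText and FromHtml are made up for this example, and WebUtility.HtmlDecode is a swapped-in technique that the answers above do not mention. Like any regex-based stripper, it assumes no '>' characters appear inside attribute values.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class PlainText
{
    // Hypothetical helper: converts stored editor HTML to plain text for display.
    public static string FromHtml(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // 1. Drop anything that looks like a tag.
        string noTags = Regex.Replace(html, @"<[^>]*>", string.Empty);

        // 2. Turn entities such as &amp; back into the characters they stand for.
        string decoded = WebUtility.HtmlDecode(noTags);

        // 3. Collapse runs of whitespace left behind by removed block tags.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(FromHtml("<p>Tom &amp; Jerry</p>\n<p>Second <b>line</b></p>"));
    }
}
```

If the resulting text is written back into a page outside of a control that encodes its output, it needs to be HTML-encoded again at that point.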
I have an ASP.NET web forms site with a rather large menu. The HTML for the menu is dynamically generated via a method in the C# as a string. I.e., what is being returned is something like this:
<ul><li><a href='default.aspx?param=1&anotherparam=2'>LINK</a></li></ul>
Except it is a lot bigger, and the lists are nested up to 4 deep.
This is written to the page via a code block.
However, instead of returning a flat string from the method I would like to return it as formatted HTML, so when rendered it looks like this:
<ul>
<li>
<a href='default.aspx?param=1&anotherparam=2'>LINK</a>
</li>
</ul>
I thought about loading the html into an XmlDocument but it doesn't like the & character found in the query strings (in the href attribute values).
The main reason for doing this is so I can more easily debug the generated HTML during development.
Anyone have any ideas?
Maybe you can work with an HtmlTextWriter? It has Indenting capabilities and it may actually be a cleaner thing as you could write straight into the output stream, which should be more "in the flow" than generating a string in memory etc.
Is there a reason you want to do this? This implicitly minified HTML will perform slightly better anyway. If you do still need to render the HTML for pretty display, you will either need to incorporate indentation into the logic that generates the output HTML or build your content using ASP.NET controls and then call Render().
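For the first of those options, building the indentation into the generating logic, a minimal sketch could look like the following. The MenuItem type and the MenuRenderer class are hypothetical, since the question does not show its data model. The sketch also HTML-encodes the URL and link text with WebUtility.HtmlEncode, so the & in query strings is emitted as &amp;.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical menu node; the original post does not show its data model.
public sealed class MenuItem
{
    public string Text { get; set; }
    public string Url { get; set; }
    public List<MenuItem> Children { get; } = new List<MenuItem>();
}

public static class MenuRenderer
{
    public static string Render(IEnumerable<MenuItem> items)
    {
        var sb = new StringBuilder();
        RenderList(sb, items, 0);
        return sb.ToString();
    }

    // Writes one <ul> at the given depth; nested lists are indented two tabs deeper.
    private static void RenderList(StringBuilder sb, IEnumerable<MenuItem> items, int depth)
    {
        string pad = new string('\t', depth);
        sb.Append(pad).Append("<ul>\n");
        foreach (MenuItem item in items)
        {
            sb.Append(pad).Append("\t<li>\n");
            sb.Append(pad).Append("\t\t<a href='")
              .Append(WebUtility.HtmlEncode(item.Url))
              .Append("'>")
              .Append(WebUtility.HtmlEncode(item.Text))
              .Append("</a>\n");
            if (item.Children.Count > 0)
            {
                RenderList(sb, item.Children, depth + 2);
            }
            sb.Append(pad).Append("\t</li>\n");
        }
        sb.Append(pad).Append("</ul>\n");
    }

    public static void Main()
    {
        var root = new MenuItem { Text = "LINK", Url = "default.aspx?param=1&anotherparam=2" };
        root.Children.Add(new MenuItem { Text = "CHILD", Url = "child.aspx" });
        Console.Write(Render(new[] { root }));
    }
}
```

Passing the depth down the recursion keeps the indentation logic in one place, so menus nested four levels deep need no extra code.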
Try loading the HTML into the HTML Agility Pack. It is an HTML parser that can deal with HTML fragments (and will be fine with & in URLs).
I am not sure if it can output pretty printed (what you call "formatted") HTML, but that would be my first approach.
I like to use format strings for this sort of thing. Your HTML output would be generated with:
String.Format("<ul>{0}\t<li>{0}\t\t<a href='{1}'>{2}</a>{0}\t</li>{0}</ul>",
    System.Environment.NewLine,
    myHrefVariable,
    myLinkText);
In my markup I am using HTML title attributes, which I set through the ToolTip property of various ASP.NET controls such as an asp:Label. The content of those titles comes from a database, and I use data binding syntax, for instance:
<asp:Label ID="PersonLabel" runat="server"
Text='<%# HttpUtility.HtmlEncode(Eval("PersonShortName")) %>'
ToolTip='<%# HttpUtility.HtmlEncode(Eval("PersonFullName")) %>' />
Now, tooltips seem to be displayed as plain text on Windows and in the browsers I have tested. So the HTML-encoding is not what I really want and I am inclined to remove the encoding.
Can this be dangerous in any way if the database fields may contain script tags for example? My question is basically: Is it always guaranteed that HTML-title attributes are displayed as plain text? Are they always displayed as tooltips at all, or is it possible that some browsers (or OSs) display them in another way and allow and render HTML content in the title attributes?
Edit:
Looking at some of the answers it seems I didn't phrase my question well, so here are some additions:
If, in the code snippet above, I have a PersonShortName of "PM" in my database and a PersonFullName containing non-ASCII characters, such as the umlaut in "Peter Müller", the browser displays Peter M&#252;ller in the tooltip when I apply HttpUtility.HtmlEncode as in the code example, which is ugly.
I've also tested a simple HTML fragment like:
<span title="<script>alert('Evil script')</script>" >Hello</span>
The script in the title attribute didn't run in a browser with enabled Javascript (tested with Firefox), instead it was displayed in the tooltip as plain text. Therefore my guess was that title attributes are always rendered as plain text.
But as Felipe Alsacreations answered below there exist "rich tooltip plugins" which may render the title attribute as HTML. So in this case encoding is a good thing. But how can I know that?
Perhaps HttpUtility.HtmlEncode isn't the right solution and I have to filter only HTML tags but not encode simple special characters to make sure that the plain text is displayed correctly and to protect "rich HTML tooltips" at the same time. But it looks like a costly work - only for a simple tooltip.
Always sanitize output to the browser.
If a value like "><script>blabla</script> is inserted as a value for your fields, a user can essentially take over your entire site. It will probably make a mess when it comes to validation and correct code, but the script will still be run.
So to answer your question: No, it is not guaranteed that HTML-title attributes are displayed as plain text if the user knows what he/she is doing.
Besides security reasons:
Title attributes should always be plain text, but certain JS plugins misuse them to display 'rich' tooltips (i.e. HTML code with bold text, emphasis, links and so on).
As for browsers, AFAIK titles are displayed as plain-text tooltips. They are never displayed to those who navigate by keyboard (tabbing), and screen readers give their users (blind and partially sighted people) many options, such as reading whichever is longer of the link title and its text, always reading the title, or never reading it.
Surprisingly, still, no right answer in 5 years. The answer is: yes, you need to encode the title attribute, but not everything that is encoded in the innerText of the element.
The proper way to do it in asp.net if you do your own markup is:
string markup = string.Format("<div class='myClass' title='{0}'>{1}</div>",
System.Web.HttpUtility.HtmlAttributeEncode(myText),
System.Web.HttpUtility.HtmlEncode(myText));
The above will set both innerText and title of the div to myText, which is customary for elements that may contain long text but are constrained in width (as I believe the question implies).
The ToolTip property of an ASP.NET control will automatically encode the value on output/rendering.
This means it is safe to set the tooltip to plain text as the page will sanitize the text on rendering.
Label1.ToolTip = "Some encoded text < Tag >"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
If you need to use text that is already encoded, you can set the title attribute instead. The title attribute will not be automatically encoded on rendering:
Label1.Attributes("title") = "Some encoded text &lt; Tag &gt;"
Renders HTML output as:
<span title="Some encoded text &lt; Tag &gt;"></span>
Another point:
Who cares how the title attribute is rendered by a browser, when it is the presence of malicious strings in the source code that could present an issue?
It doesn't matter how it is displayed, the question is: how does it appear in the source code?
(As already stated, if you're pumping strings to the client, do something to sanitize those strings.)
I think there may be some confusion going on with this thread.
Firstly <asp:Label> is an ASP.NET Web Control. The Text and ToolTip attributes are "abstractions" of the inline content and 'title' attributes of an HTML tag respectively.
For these two particular properties Microsoft will perform the HTML encoding for you automatically, so if you set ToolTip="H&S<" then the <span> tag will be rendered as <span title="H&amp;S&lt;"...>. The same goes for the Text property.
NOTE: Not all properties perform automatic encoding (HTML or InnerContent properties for example)
If, however, you are generating HTML tags directly (Response.Write("<span...") for example) then you MUST HTML-encode the text content and the tooltip attribute content if:
Those values originate from a user / external unsanitised source, or
There is a possibility that the content may contain characters that should be escaped (& < > etc.)
Usually this means the following:
Hardcoded content with no HTML special characters:
Response.Write("<span title='Book Reference'>The art of zen</span>"); // SAFE
Hardcoded content with HTML special characters that you manually encode:
Response.Write("<span title='Book &amp; Reference'>The art &amp; zen</span>"); // SAFE
Dynamically sourced content:
Response.Write("<span title='"+sTitle+"'>"+sText+"</span>"); // UNSAFE
Response.Write("<span title='"+HttpUtility.HtmlEncode(sTitle)+"'>" +HttpUtility.HtmlEncode(sText)+"</span>"); // SAFE
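To make the dynamically sourced case concrete outside of a page, here is a minimal sketch that builds the same kind of span as a string. It uses System.Net.WebUtility.HtmlEncode instead of HttpUtility so that it runs without a reference to System.Web; the SpanBuilder class and BuildSpan method are made-up names for this example.

```csharp
using System;
using System.Net;

public static class SpanBuilder
{
    // Hypothetical helper: encodes both the attribute value and the inner text.
    // WebUtility.HtmlEncode escapes &, <, >, double quotes and single quotes,
    // so the value cannot break out of the quoted title attribute.
    public static string BuildSpan(string title, string text)
    {
        return "<span title=\"" + WebUtility.HtmlEncode(title) + "\">"
             + WebUtility.HtmlEncode(text) + "</span>";
    }

    public static void Main()
    {
        // A hostile title that tries to close the attribute and inject a script.
        Console.WriteLine(BuildSpan("\"><script>alert(1)</script>", "H&S<"));
    }
}
```

The injected markup ends up as inert text inside the title attribute, which is the outcome the SAFE line above relies on.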