strip any hyperlinks and text within from piece of text - asp.net

I'd like to know how I can strip any hyperlink <a> tags from within some text, including everything (text, image, whatever) that is being linked before the closing </a> tag.
E.g.
Click here
<img src="http://stackoverflow.com" alt = "blah">
i.e. remove the whole lot.
Any ideas how to do this?
Thanks

Obligatory "don't use regex to parse HTML" warning: see "RegEx match open tags except XHTML self-contained tags".
I would recommend either converting to XHTML and using XPath, or taking a look at the HtmlAgilityPack to do this. I have used both methods for parsing/modifying HTML in the past, and they are far more flexible and robust than using regex.
Here is an example that should get you started with HtmlAgilityPack:
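using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // Do stuff! e.g. link.Remove() deletes the <a> and everything inside it
    }
}
doc.Save("file.htm");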
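If the text is already well-formed XHTML (an assumption; real-world HTML often is not), the "convert to XHTML and use XPath" route can be sketched with nothing but System.Xml. The class and method names below (LinkStripper, RemoveLinks) are hypothetical; the sketch selects every a element and removes it together with its contents, which is what the question asks for.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class LinkStripper
{
    // Removes every <a> element together with everything inside it
    // (link text, <img> children, etc.).
    // Assumption: the input is well-formed XHTML with no default namespace,
    // so it can be parsed with System.Xml instead of an HTML parser.
    public static string RemoveLinks(string xhtml)
    {
        var doc = new XmlDocument();
        // Wrap the fragment in a single root element so that text with
        // several top-level nodes still parses as one XML document.
        doc.LoadXml("<root>" + xhtml + "</root>");

        // Copy the matches into a list first so the tree is not modified
        // while the XPath result is still being enumerated.
        var links = doc.SelectNodes("//a").Cast<XmlNode>().ToList();
        foreach (XmlNode link in links)
        {
            link.ParentNode.RemoveChild(link);
        }

        return doc.DocumentElement.InnerXml;
    }
}
```

For example, "<p>Before <a href="x">Click here</a> after</p>" becomes "<p>Before  after</p>". If the input is not well-formed (an unclosed <img ...> tag, or an entity such as &nbsp; that XML does not define), LoadXml throws an XmlException, which is the main reason to prefer an HTML parser such as HtmlAgilityPack.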

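From what I understand, this should work (note that the greedy .* matches up to the last > on the same line, so any text or tags between the link and that last > are removed too):
string linksRemoved = Regex.Replace(withLinks, @"</?(a|A).*>", "");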

You can try a regular expression to replace your tags. My regex isn't the best but this should get you close.
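// IgnoreCase also matches <A>...</A>; Singleline lets . span line breaks inside the link
System.Text.RegularExpressions.Regex.Replace(
input,
@"<a[^>]*?>.*?</a>",
string.Empty,
System.Text.RegularExpressions.RegexOptions.IgnoreCase | System.Text.RegularExpressions.RegexOptions.Singleline);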
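A self-contained sketch of the same regex idea, wrapped in a hypothetical AnchorRemover helper. Two small changes to the answer's pattern are assumptions on my side: \b after a so tags like <abbr> are not caught, and \s* so a closing tag written as </a > still matches. The lazy .*? stops at the first </a>, so nested or malformed links are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorRemover
{
    // Matches an opening <a ...> tag, the shortest run of anything up to the
    // next </a>, and that closing tag.
    // IgnoreCase handles <A>...</A>; Singleline lets "." cross line breaks
    // so links whose content spans several lines are removed too.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*>.*?</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string RemoveAnchors(string input)
    {
        return AnchorPattern.Replace(input, string.Empty);
    }
}
```

Compiling the Regex once in a static field avoids re-parsing the pattern on every call, which helps if this runs for each row of a list or grid.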

Related

Label html string being read by screen reader

I've got some dynamically generated HTML building a drop-down menu using the Dojo library. I need to make my code accessibility compliant, and right now the screen reader looks at the menu item and reads it as plain HTML:
menu.addChild(new MenuItem({
label: "<a onclick=window.location.href='sampleurl.com'
href="sampleurl.com">Sample Link</a> ...
Excuse the onclick, it's for a different issue, but what I'm getting is basically:
Tab down to first menu item
Screenreader: "Less than a onclick equals window dot location dot href equals sampleurl"... etc
I've tried using aria-hidden, but the screen reader just reads that as text too. I'm using VoiceOver on Mac OS, but I need it to be compliant with JAWS as well. Any tips or advice? Thanks!
label is meant for the item's label text (which can contain HTML), not for the full link markup.
The following page shows how to generate menu items with the Dojo library:
https://dojotoolkit.org/reference-guide/1.10/dijit/Menu.html
Example:
menu.addChild(new MenuItem({
label: "Sample Link",
onClick: function() { window.location.href = 'sampleurl.com'; }
}));
This would be easier to debug with a working example along with something stating what screen reader / browser combo you are using. At the bare minimum, show us the HTML output of your script, considering it is writing HTML for the screen reader to parse.
That being said, I suspect the missing / inconsistent quotes. Note that you start a string with double quotes, then go into the onclick attribute with no quotes around, then single quotes around its value, and then use double quotes around the href.
Alternatively, you may be writing the entire string into the page and HTML-encoding it somewhere along the way.
I suggest using a linting tool to check your JS.

Simple HTML Dom get href that begins with

I am using Simple HTML DOM to extract information from a remote source. I would like to get only the href links that contain a particular piece of text (not all the links on the page). I have tried
->find('a[href*="/place"]')
and
->find('a[href="/place"*]')
and
->find('a[href="/place*"]')
but these all return empty results.
The href I am trying to get must begin with the text "/place".
Any suggestions?
Thanks
To match elements that have the specified attribute and whose value starts with a certain string, use [attribute^=value]:
->find('a[href^="/place"]')
Ref: http://simplehtmldom.sourceforge.net/manual.htm#frag_find_attr
I do not know this app; however, did you try using the asterisk like so?
->find('a[href="/place*"]')

Finding first Image from a content

How can I find an image in some content? I have a method that I call from my .aspx page to remove all HTML tags, like this: Usage.DeleteHtml(Eval("content").ToString())
but I don't want to delete the img tag from the content. I want to find the first image and show it on my page, like this: <img src="Usage.FindImage("content")" />
but I couldn't write a method for finding the image.
My DeleteHtml method:
public static string DeleteHtml(string text)
{
string mystr = Regex.Replace(text, @"<(.|\n)*?>", string.Empty);
return mystr;
}
I assume that your task is essentially retrieving the first image in the document.
If your HTML document is also a well-formed XML document, you could easily solve your task using XPath.
More on XPath in .NET is available in the System.Xml.XPath documentation.
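An XPath query to retrieve the first image's URL looks like this (the parentheses select the first img in the whole document; //img[1] would select every img that is the first img child of its parent):
(//img)[1]/@src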
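A minimal sketch of a FindImage-style helper built on that idea, assuming the stored content is well-formed XHTML without a default namespace. ImageFinder and FindFirstImageSrc are hypothetical names; the query is written as (//img)[1]/@src so that it returns the first image in document order.

```csharp
using System;
using System.Xml;

public static class ImageFinder
{
    // Returns the src of the first <img> in document order, or null if none.
    // Assumption: the content is well-formed XHTML without a default namespace.
    public static string FindFirstImageSrc(string xhtml)
    {
        var doc = new XmlDocument();
        // Wrap in a root element so fragments with several top-level nodes parse.
        doc.LoadXml("<root>" + xhtml + "</root>");

        // (//img)[1] is the first img in the whole document;
        // //img[1] would match every img that is the first img child of its parent.
        XmlNode src = doc.SelectSingleNode("(//img)[1]/@src");
        return src?.Value;
    }
}
```

In the page, the result could then be bound to the src of an <img> (or an asp:Image control) instead of the literal "Usage.FindImage(...)" string in the question.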
Otherwise, if you really need to strip HTML, it's a duplicate of a couple of existing questions:
Using C# regular expressions to remove HTML tags
How can I strip HTML tags from a string in ASP.NET?
How to clean HTML tags using C#
Short answer: use Html Agility Pack.

How to trim html tags from text in asp.net grid view?

I have used the ASP.NET AJAX HTML Editor and saved its data in a database. Now I want to retrieve that data and show it in a GridView, but when I do, it also shows the HTML tags generated by the editor. So I want to strip those tags and show plain text in the GridView. How do I do that?
Thanks
Go to your DB and look at how the data is saved; maybe it is saved encoded. If that is not the case, you can use a simple regex to remove all those tags:
<[^<]+?>
This removes all tags and leaves just the plain text.
To strip the HTML tags from the text you can use the
Regex.Replace(input, pattern, replacement) method, which exists in the
System.Text.RegularExpressions namespace,
for example:
Plain_Body = Regex.Replace(txtBody.Text, @"<[^>]*>", string.Empty);
Here I am replacing the HTML tags with String.Empty (""). If you wish, you can extend the pattern @"<[^>]*>" to cover additional sequences such as non-breaking spaces (&nbsp;) and ampersands (&amp;).
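Instead of extending the pattern entity by entity, a sketch that strips the tags and then decodes all entities with System.Net.WebUtility.HtmlDecode (HtmlText and ToPlainText are hypothetical names). Decoding after stripping matters: encoded text such as &lt;b&gt; is turned into a literal "<b>" only once the tag pass is done, so it stays visible instead of being removed as a tag.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Strips anything that looks like a tag, then decodes entities such as
    // &amp; and &nbsp; so the grid shows plain text instead of escape codes.
    public static string ToPlainText(string html)
    {
        string withoutTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Note that &nbsp; decodes to a non-breaking space (U+00A0), not a regular space, so replace it afterwards if the grid needs ordinary spaces.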

Remove style tags, CSS, scripts and HTML tags from HTML to plain text

Using regular expressions, how do I remove style tags, CSS, scripts and HTML tags from HTML to plain text.
In ASP.NET C#.
I don't think a regex is really what you are looking for; however, the following regex should do it
if you run a regex replace:
<[^>]*>
To use this with Regex.Replace, do the following:
string myHtmlString = "<html><body>my test text</body></html>";
string myPlainTextString = Regex.Replace(myHtmlString, "<[^>]*>", String.Empty);
I recommend you use something like the Html Agility pack though - http://htmlagilitypack.codeplex.com/
as it has a method to make this even easier called "ConvertToPlainText":
string myHtmlString = "<html><body>my test text</body></html>";
string myPlainTextString = ConvertToPlainText(myHtmlString);
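The <[^>]*> pattern removes the tags themselves, but the text inside <script> and <style> elements (JavaScript code, CSS rules) is left behind as "plain text". A regex-only sketch that removes those elements with their contents first, then strips the remaining tags; HtmlCleaner and StripHtml are hypothetical names, and the usual caveats about parsing HTML with regex still apply.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Removes whole <script> and <style> elements, i.e. the tags AND their contents.
    // IgnoreCase handles <SCRIPT>/<STYLE>; Singleline lets "." cross line breaks.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag (the same pattern as in the answer above).
    private static readonly Regex Tag = new Regex(@"<[^>]*>");

    public static string StripHtml(string html)
    {
        string text = ScriptOrStyle.Replace(html, string.Empty);
        return Tag.Replace(text, string.Empty);
    }
}
```

The order is the point of the design: if the generic tag pattern ran first, the <script> and <style> tags would disappear but the code and CSS between them would remain in the output.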
