How to get an attribute value using Selenium and CSS

I have the following HTML code:
<a href="/search/?p=2&q=move&mt=1">2</a>
I would like to get what is contained in href, i.e., I am looking for a command which would give me the value "/search/?p=2&q=move&mt=1" for href.
Could someone please help me with the respective command and CSS locator in Selenium for the above query?
If I have something like:
2
3
Out of these two, if I wanted to get the href attribute value of the link whose text contains '2', what would my CSS locator syntax look like?

If your HTML consists solely of that one <a> tag, then this should do it:
String href = selenium.getAttribute("css=a@href");
You use the DefaultSelenium#getAttribute() method and pass in a CSS locator, an @ symbol, and the name of the attribute you want to fetch. In this case, you select the a and get its @href.
In response to your comment/edit:
The part after @ tells Selenium that that part is the name of the attribute.
You should place :contains('2') before @href because it's part of the locator, not the attribute. So, like this:
selenium.getAttribute("css=a:contains('2')@href");
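To make the locator-versus-attribute split concrete, here is a minimal Python sketch. It assumes the attribute locator is divided at its last @ sign, and split_attribute_locator is a hypothetical helper written for illustration, not part of Selenium.

```python
def split_attribute_locator(attribute_locator):
    """Split '<locator>@<attribute>' at the LAST '@' (assumed convention).

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    locator, sep, attribute = attribute_locator.rpartition("@")
    if not sep or not locator or not attribute:
        raise ValueError("expected '<locator>@<attribute>'")
    return locator, attribute


# Everything before the last '@' is the element locator,
# everything after it is the attribute name.
print(split_attribute_locator("css=a@href"))
print(split_attribute_locator("css=a:contains('2')@href"))
```

This is why :contains('2') has to come before the @: anything after it would be read as part of the attribute name.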

Changing css=a@href to href should do the trick. Let me know if this did not work.
List<WebElement> ele = driver.findElements(By.className("c"));
for (WebElement e : ele) {
    String doctorname = e.getText();
    String linkValue = e.getAttribute("href");
}

Related

click() on css Selector not working in Selenium webdriver

HTML
<input class="button" type="button" onclick="$.reload('results')" value="Search">
I don't have an id or name for this element, so I am writing:
FirefoxDriver driver = new FirefoxDriver();
driver.get("http://....");
driver.findElement(By.cssSelector("input[value=Search]")).click();
But the click() does not happen.
I also tried:
driver.findElement(By.cssSelector(".button[value=Search]")).click();
and I tried value='Search' (with single quotes).
These selectors do work in a stylesheet:
.button[value=Search] {
    padding: 10px;
}
input[value=Search] {
    padding: 10px;
}
I would inject a piece of JavaScript to be confident of resolving this issue.
First of all, locate the element using the DOM (verify it in Firebug):
public void jsClick() {
    JavascriptExecutor js = (JavascriptExecutor) driver;
    StringBuilder stringBuilder = new StringBuilder();
    stringBuilder.append("document.getElementsByTagName('button')[0].click();");
    js.executeScript(stringBuilder.toString());
}
jsClick();
For your element, it would look like:
....
stringBuilder.append("document.getElementsByTagName('input')[0].click();");
....
Please note: document.getElementsByTagName('input') returns a collection of DOM elements. By indexing it properly, e.g. document.getElementsByTagName('input')[0], document.getElementsByTagName('input')[1], document.getElementsByTagName('input')[2], etc., you will be able to locate your element.
Hope this helps you.
Regards.
Please use the code below.
driver.findElement(By.cssSelector("input[value=\"Search\"]")).click();
It works for me. Also make sure that the value is exactly "Search", because it is case sensitive.
Thanks
Are you sure that this CSS selector (input[value=Search]) matches only one element on your page?
Single quotes are missing in your code: [value=Search] should be replaced with [value='Search'].
First you have to check whether the selector you are using works at all.
If you are using Chrome or Firefox, you can follow these steps:
Go to the page where the button (to be clicked) is present.
Open the web console, type in one of the following and press Enter:
$("input[value='Search']")
or
$("input[value='Search'][type='button']")
or
$("input[value='Search'][type='button'].button")
You will get a list of the elements that this selector matches. If that list contains only one element (the button that you want to click), then the selector is valid for your use; otherwise you will have to try some other selector.
If any of the above selectors is valid, change your code accordingly:
driver.findElement(By.cssSelector("input[value='Search'][type='button'].button")).click();
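The same "does this selector match exactly one element?" check can also be sketched offline against a saved copy of the page. The following is a rough Python illustration using only the standard library; InputCounter, count_inputs, and the sample page string are assumptions made up for this example, and it only compares attribute values rather than evaluating real CSS selectors.

```python
from html.parser import HTMLParser


class InputCounter(HTMLParser):
    """Counts <input> tags whose attributes include every wanted name/value pair."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.count = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and all(attr_map.get(k) == v for k, v in self.wanted.items()):
            self.count += 1


def count_inputs(html_text, **wanted):
    """Return how many <input> elements carry all the given attribute values."""
    parser = InputCounter(wanted)
    parser.feed(html_text)
    parser.close()
    return parser.count


# Hypothetical page content, modelled on the button from the question.
page = (
    '<input class="button" type="button" onclick="$.reload(\'results\')" value="Search">'
    '<input type="text" name="q" value="">'
)

# A count of exactly 1 means the attribute combination is unambiguous.
print(count_inputs(page, value="Search", type="button"))
```

If the count is greater than one, findElement would return the first match, which may not be the button you intend to click.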

Removing all elements from HTML that have given class using Agility Pack

I'm trying to select all elements that have a given class and remove them from a HTML string.
This is what I have so far. It doesn't seem to remove anything, although the source clearly shows 4 elements with that class name.
// Filter page HTML to display required content
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// pageHTML is a string containing the html
htmlDoc.LoadHtml(pageHTML);
// ParseErrors is a collection containing any errors from the Load statement
if (!htmlDoc.ParseErrors.Any())
{
    // Remove all elements marked with pdf-ignore class
    HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//body[@class='pdf-ignore']");
    // Remove the collection from above
    foreach (var node in nodes)
    {
        node.Remove();
    }
}
EDIT: Just to clarify the document is parsing and the SelectNodes line is being hit, just not returning anything.
Here is a snippet of the html:
<input type=\"submit\" name=\"ctl00$MainContent$PrintBtn\" value=\"Print Shotlist\" onclick=\"window.print();\" id=\"MainContent_PrintBtn\" class=\"pdf-ignore\">
EDIT: in your updated question you posted part of the HTML string, an <input> element declaration, but you're trying to match a <body> element with the class pdf-ignore (according to your expression //body[@class='pdf-ignore']).
If you want to match all the elements in the document with this class, you should use:
var nodes = htmlDoc.DocumentNode.SelectNodes("//*[contains(@class,'pdf-ignore')]");
to get your nodes. This will match all the elements with the class name specified.
Your code seems to be correct except for one detail: the condition htmlDoc.ParseErrors == null. You select and remove nodes ONLY if the ParseErrors property (which is of type IEnumerable<HtmlParseError>) is null, but when no errors are found this property actually returns an empty list. So changing your code to:
if (!htmlDoc.ParseErrors.Any())
{
    // some logic here
}
should solve the issue.
Your XPath is probably not matching: have you tried "//div[class='pdf-ignore']" (no "@")?
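One caveat worth illustrating: XPath's contains() on the class attribute is a substring test, so it also matches class names that merely contain "pdf-ignore". The Python sketch below contrasts a substring check with a whole-token check; has_class is a hypothetical helper for illustration and is not part of HTML Agility Pack.

```python
def has_class(class_attr, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in (class_attr or "").split()


samples = ["pdf-ignore", "btn pdf-ignore", "pdf-ignore-footer", ""]

for value in samples:
    substring_match = "pdf-ignore" in value       # what contains(@class, ...) tests
    token_match = has_class(value, "pdf-ignore")  # stricter whole-class match
    print(repr(value), substring_match, token_match)
```

If the document has no such look-alike class names, the simpler contains() expression is sufficient.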

Can someone explain this seeming inconsistency in jQuery/Javascript?? (trailing brackets inconsistency on reads)

So, in my example below, "InputDate" is an input type=text, and "DateColumn" is a TD within a table with a class of "DateColumn".
Read the value of a discrete textbox:
var inputVal = $('#InputDate').val();
Read the value of a div within a table....
This works:
$('#theTable .DateColumn').each(function() {
    var rowDate = Date.parse($(this)[0].innerHTML);
});
This doesn't:
$('#theTable .DateColumn').each(function() {
    var rowDate = Date.parse($(this)[0].innerHTML());
});
The difference is the "()" after innerHTML. This behavior seems syntactically inconsistent between how you read a value from a textbox and how you read it from a div. I'm OK with sometimes, depending on the type of control, having to read .val vs. .innerHTML vs. .whateverElseDependingOnTheTypeOfControl, but this example leads me to believe I now must also memorize whether I need trailing brackets or not on each property/method.
So for a person like me who is relatively new to jQuery/JavaScript: I seem to have figured out this particular anomaly in this instance, but is there a convention I am missing, or does a person literally have to memorize whether each method does or does not need brackets?
innerHTML is javascript, and is a property of an element. If you'd like to stick with the jQuery version of doing things, use html():
$('#theTable .DateColumn').each(function() {
    var rowDate = Date.parse($(this).html());
});
Edit: a bit more clarification about your concerns. jQuery is pretty consistent in its syntax. Basically, most of the methods you find allow read/write access by adjusting the parameters passed to the method.
var css = $('#element').css('color'); // read the color of the element
$('#element').css('color', 'red'); // set the color to "red"
var contents = $('#element').html(); // grab the innerHTML of the element
$('#element').html('Hello World'); // set the innerHTML of this element
.innerHTML is a property of the element not a method.
Property reference Example: object.MyProperty
Method Example: object.SomeFunction();
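The property-versus-method distinction is not specific to JavaScript. As an analogy only, here is a small Python sketch; the Element class and its inner_html and html names are made up for illustration and merely mirror innerHTML and jQuery's html().

```python
class Element:
    """Toy stand-in for a DOM element (hypothetical, for illustration only)."""

    def __init__(self, markup):
        self.inner_html = markup  # plain attribute: read it WITHOUT parentheses

    def html(self):               # method: must be CALLED with parentheses
        return self.inner_html


el = Element("2012-01-31")

print(el.inner_html)  # property access
print(el.html())      # method call

# Putting () after a property tries to call its value, which fails here
# because a string is not callable.
try:
    el.inner_html()
except TypeError as exc:
    print("TypeError:", exc)
```

The rule of thumb is the same in both languages: parentheses mean "call this", so they belong only on things that are functions.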

Getting a substring of text containing HTML tags

Assume that you want the first 10 characters of the following:
"<p>this is paragraph 1</p>this is paragraph 2</p>"
The output would be:
"<p>this is"
The returned text contains an unclosed P tag. If this is rendered to a page, subsequent content will be affected by the open P tag. Ideally, the preferred output would close any unclosed HTML tags in reverse of when they were opened:
"<p>this is</p>"
I want a function that returns a substring of HTML, making sure that no tags are left unclosed.
You need to teach your code how to understand that your string is actually HTML or XML. Just treating it like a string won't allow you to work with it the way you want to. This means first transforming it to the correct format and then working with that format.
Use an XSL stylesheet
If your HTML is well-formed XML, load it into an XMLDocument and run it through an XSL stylesheet that does something like the following (note that XPath string positions start at 1):
<xsl:template match="p">
    <xsl:value-of select="substring(text(), 1, 10)" />
</xsl:template>
Use an HTML parser
If it's not well-formed XML (as in your example, where you have a sudden </p> in the middle), you'll need to use a HTML parser of some kind, such as HTML Agility Pack (see this question about C# HTML parsers).
Don't use regular expressions, since HTML is too complex to parse using regex.
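As a rough illustration of the parser-based approach, here is a minimal Python sketch built on the standard library's html.parser. It makes some assumptions that are not in the question: the limit counts visible text characters only (not tag characters), stray closing tags are dropped, and the names truncate_html, Truncator, and VOID_TAGS are invented for this example.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, assumed sufficient here).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Truncator(HTMLParser):
    """Copies markup through until `limit` text characters have been emitted."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.count = 0
        self.out = []
        self.open_tags = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # past the limit, or a stray closer with no matching opener
        while self.open_tags:  # close back to the matching opener
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[: self.limit - self.count]
        self.count += len(piece)
        self.out.append(html.escape(piece, quote=False))
        if self.count >= self.limit:
            self.done = True


def truncate_html(html_text, limit):
    """Keep the first `limit` text characters and close any tags left open."""
    parser = Truncator(limit)
    parser.feed(html_text)
    parser.close()
    closers = "".join("</%s>" % t for t in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(truncate_html("<p>this is paragraph 1</p>this is paragraph 2</p>", 7))
```

Because the parser tracks which tags are open, the closing tags are emitted in reverse order of opening, which is the behaviour the question asks for.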
You can use the following static function. For a working example, check: http://www.koodr.com/item/438c2e9c-62a8-45fc-9ca2-db1479f412e1 . You can also turn this into an extension method.
public static string HtmlSubstring(string html, int maxlength)
{
    // initialize regular expressions
    string htmltag = "</?\\w+((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>";
    string emptytags = "<(\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?></\\1>";
    // match all html start and end tags, otherwise get each character one by one
    var expression = new Regex(string.Format("({0})|(.?)", htmltag));
    MatchCollection matches = expression.Matches(html);
    int i = 0;
    StringBuilder content = new StringBuilder();
    foreach (Match match in matches)
    {
        // the match is a single text character: keep it while under the limit
        if (match.Value.Length == 1
            && i < maxlength)
        {
            content.Append(match.Value);
            i++;
        }
        // the match contains a tag
        else if (match.Value.Length > 1)
            content.Append(match.Value);
    }
    // drop tags that ended up with no content after truncation
    return Regex.Replace(content.ToString(), emptytags, string.Empty);
}
Your requirement is very unclear so most of this is guesswork. Also, you have provided no code which would help to clarify what it is you want to do.
One solution could be:
a. Find the text between the <p> and the </p> tags. You can use the following Regex for this or use a simple string search:
\<p\>(.*?)\</p\>
b. In the found text, apply a Substring() to extract the required text.
c. Put back the extracted text between the <p> and the </p> tags.
You could loop over the html string to detect the angle brackets and build up an array of tags and whether there was a matching closing tag for each one. The problem is, HTML allows for non closing tags, such as img, br, meta - so you'd need to know about those. You would also need to have rules to check the order of closing, because just matching an open with a close doesn't make valid HTML - if you open a div, then a p and then close the div and then close the p, that isn't valid.
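The bookkeeping described above (a stack of open tags, a list of non-closing tags, and a rule about closing order) can be sketched as follows. This is a minimal Python illustration using the standard library parser instead of hand-scanning for angle brackets; is_well_nested, NestingChecker, and the VOID_TAGS list are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags that are allowed to appear without a closing tag (partial list).
VOID_TAGS = {"img", "br", "hr", "meta", "input", "link"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records closers that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # closes the most recently opened tag: correct order
        else:
            self.errors.append("unexpected </%s>" % tag)


def is_well_nested(html_text):
    """True if every non-void tag is closed, in reverse order of opening."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    return not checker.errors and not checker.stack


print(is_well_nested("<div><p>text</p></div>"))
print(is_well_nested("<div><p>text</div></p>"))
print(is_well_nested("<p>line one<br>line two</p>"))
```

The second call is the div/p case mentioned above: each opener has a closer, but the order is wrong, so the check fails.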
Try this code (Python 3.x):
notags=('img','br','hr')
def substring2(html,size):
    if len(html) <= size:
        return html
    result,tag,count='','',0
    tags=[]
    intag=False  # not inside a tag until the first '<' is seen
    for c in html:
        result += c
        if c == '<':
            intag=True
        elif c=='>':
            intag=False
            tag=tag.split()[0]
            if tag[0] == '/':
                # closing tag: pop its opener unless it is a non-closing tag
                tag = tag.replace('/','')
                if tag not in notags:
                    tags.pop()
            else:
                # opening tag: remember it unless self-closed or non-closing
                if tag[-1] != '/' and tag not in notags:
                    tags.append(tag)
            tag=''
        else:
            if intag:
                tag += c
            else:
                # only characters outside tags count towards the size
                count+=1
                if count>=size: break
    # close whatever is still open, innermost first
    while len(tags)>0:
        result += '</{0}>'.format(tags.pop())
    return result
s='<div class="main">html <code>substring</code> function written by <span>imxylz</span>, using python language</div>'
print(s)
for size in (30,40,55):
    print(substring2(s,size))
Output:
<div class="main">html <code>substring</code> function written by <span>imxylz</span>, using python language</div>
<div class="main">html <code>substring</code> function writte</div>
<div class="main">html <code>substring</code> function written by <span>imxyl</span></div>
<div class="main">html <code>substring</code> function written by <span>imxylz</span>, using python</div>
more
See code at github.
Another question.

Filtering out anchor tags in a string

I need to filter out anchor tags in a string. For instance,
Check out this site: <a href="http://www.stackoverflow.com">stackoverflow</a>
I need to be able to filter out the anchor tag to this:
Check out this site: http://www.stackoverflow.com
That format may not be constant, either. There could be other attributes to the anchor tag. Also, there could be more than 1 anchor tag in the string. I'm doing the filtering in vb.net before it goes to the database.
Here's a simple regular expression that should work.
Imports System.Text.RegularExpressions
' ....
' Options are passed to the constructor; the instance Replace method has no RegexOptions overload.
Dim reg As New Regex("<a.*?href=(?:'|"")(.+?)(?:'|"").*?>.+?</a>", RegexOptions.IgnoreCase)
Dim input As String = "This is a link: <a href='http://www.stackoverflow.com'>Stackoverflow</a>"
input = reg.Replace(input, "$1")
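For comparison, the same idea can be sketched in Python with the standard re module. This is only an illustration under assumptions: the pattern is a slightly tightened variant (it requires matching quote characters and tolerates extra attributes), anchors_to_urls is a name invented here, and like any regex approach it will not cope with every form of HTML.

```python
import re

# <a ... href="URL" ...>text</a>, with either quote style and any other attributes.
ANCHOR = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*(['"])(.+?)\1[^>]*>.*?</a>""",
    re.IGNORECASE | re.DOTALL,
)


def anchors_to_urls(text):
    """Replace every anchor element in `text` with the value of its href."""
    return ANCHOR.sub(r"\2", text)


print(anchors_to_urls(
    "Check out this site: <a href='http://www.stackoverflow.com'>stackoverflow</a>"
))
```

Because sub() replaces every match, strings containing more than one anchor tag are handled in a single call.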

Resources