QDomElement::text() without child element texts?

I have XML like this:
<a>
<b>1</b>
<c>2</c>
<d>3</d>
</a>
I also have a recursive function that parses the QDomDocument wrapping it. The function iterates over QDomNodes, converts them to QDomElements, and calls text() to get the data.
Unfortunately, QDomElement::text() also works at the <a> level and returns 123, i.e. it gathers the text of all nested elements.
I would like it to return an empty string for <a>, because I would rather not check the tagName() value, as there may be plenty of tags. I would rather identify a node by whether or not it has text of its own than the other way around. Is this doable? Is there a method that returns an empty string for <a> and the text values for the <b>, <c>, and <d> levels?
P.S. QDomNode::nodeValue() returns an empty string for all elements.

It seems I was wrong, because I was not iterating over the QDomNodes that can't be converted to QDomElements. According to this answer:
This is required by the DOM specification:
The Text interface represents the textual content (termed character data in XML) of an Element or Attr. If there is no markup inside an element's content, the text is contained in a single object implementing the Text interface that is the only child of the element. If there is markup, it is parsed into a list of elements and Text nodes that form the list of children of the element.
I have no markup inside the <b>-like elements, so at the <b> level el.childNodes().size() == 1, el.firstChild().isElement() == false, and el.firstChild().nodeValue() returns a non-empty value!
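The same DOM rule gives you a helper that returns only an element's own text. The sketch below uses Python's standard xml.dom.minidom instead of Qt, only because it runs without a Qt install. own_text is a hypothetical helper name. In Qt, the analogous loop would walk el.childNodes() and keep the text nodes (QDomNode::isText(), read via toText().data()). minidom keeps whitespace-only text nodes between tags, hence the .strip() on the <a> result.

```python
from xml.dom import Node, minidom

def own_text(element) -> str:
    """Concatenate only the direct Text/CDATA children of `element`.

    Text inside nested child elements is ignored, unlike QDomElement::text(),
    which gathers the text of all descendants.
    """
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

doc = minidom.parseString("<a>\n<b>1</b>\n<c>2</c>\n<d>3</d>\n</a>")
root = doc.documentElement
children = [n for n in root.childNodes if n.nodeType == Node.ELEMENT_NODE]

print(repr(own_text(root).strip()))      # ''  (only whitespace between the tags)
print([own_text(c) for c in children])   # ['1', '2', '3']
```

With this approach, "has its own non-blank text" separates <b>, <c>, <d> from <a> without looking at tagName().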

Related

Robotframework : How can I have this kind of result => Page MUST (and not SHOULD) contain element

I'm new to Robot Framework and would like some help with my issue.
On a page, I would like to verify that a word is present in a locator. I have used the Page Should Contain Element keyword and it works partially: the elements that contain the word are found, but there is no error when the other elements (of the same type) don't contain this word.
Example (I am working on a list of sale listings):
The page contains many elements matching this locator:
//*[@class="resultats mode_liste ng-scope"]/div[@ng-repeat="annonce in resultats.data.annonces "]//h2/a/strong[@class="ng-binding"]
A "div" element contains "locator1" which contains "House"
A "div" element contains "locator1" which contains "House"
A "div" element contains "locator1" which contains "Box"
Etc...
So I have written the keyword:
Page Should Contain Element    //*[@class="resultats mode_liste ng-scope"]/div[@ng-repeat="annonce in resultats.data.annonces "]//h2/a/strong[@class="ng-binding"][contains(., "House")]
but the result is not what I expected: I want an error if a word other than "House" is found in locator1.
I would like the following result: all the locator1 elements MUST contain the word "House". If a locator1 element contains a different word, the test must fail.

According to the SeleniumLibrary documentation, you can set a limit argument, which defines how many matching elements the page should contain.
E.g. if you want to check that the element appears exactly twice on the page, set limit to 2:
Page Should Contain Element    //*[@class="resultats mode_liste ng-scope"]/div[@ng-repeat="annonce in resultats.data.annonces "]//h2/a/strong[@class="ng-binding"][contains(., "House")]    limit=2
From doc:
"The limit argument can used to define how many elements the page should contain. When limit is None (default) page can contain one or more elements. When limit is a number, page must contain same number of elements."
Having said that, if you're using an older version of the library, you may need to use a different keyword such as "Xpath Should Match X Times", which is deprecated in newer versions.
The reason the test does not fail when other words are present is that your XPath selects only the elements that contain the text "House", so the other elements are ignored completely.
If you want to ensure that the elements contain no other text, and you know what that text would be, you could use an additional keyword:
Page Should Not Contain Element    //*[@class="resultats mode_liste ng-scope"]/div[@ng-repeat="annonce in resultats.data.annonces "]//h2/a/strong[@class="ng-binding"][contains(., "Box")]
However, if you don't know what the text will be, you need a different approach: get the web elements, loop over them, and check that the text of each one matches "House".
It would look something like this:
*** Keywords ***
Check Element Contains Text 'House'
    @{elements}=    Get WebElements    //*[@class="resultats mode_liste ng-scope"]/div[@ng-repeat="annonce in resultats.data.annonces "]//h2/a/strong[@class="ng-binding"]
    FOR    ${element}    IN    @{elements}
        ${text}=    Get Text    ${element}
        Should Be Equal As Strings    ${text}    House
    END
I'm unable to test it to be absolutely sure though.
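The logic below shows why the filtered locator passes. It is a plain-Python illustration, not SeleniumLibrary code. It runs xml.etree.ElementTree over a hypothetical, simplified stand-in for the page (three strong elements), and string_value and offending_texts are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing page.
PAGE = "<list><strong>House</strong><strong>House</strong><strong>Box</strong></list>"

def string_value(el: ET.Element) -> str:
    """All descendant text, concatenated (what "." means inside contains(., ...))."""
    return "".join(el.itertext())

def offending_texts(elements, word: str):
    """Texts of the elements that do NOT contain `word`."""
    return [string_value(el) for el in elements if word not in string_value(el)]

root = ET.fromstring(PAGE)
all_strongs = root.findall(".//strong")  # like the unfiltered locator
with_house = [el for el in all_strongs if "House" in string_value(el)]  # like [contains(., "House")]

# Without a limit, "at least one match" is enough, so the filtered check passes:
print(len(with_house) >= 1)                   # True
# The stricter requirement is that the filter leaves no element out:
print(offending_texts(all_strongs, "House"))  # ['Box']
```

The loop in the keyword above implements the second, stricter check: every element is inspected, not just the ones the XPath predicate already selected.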

Why does google-chrome-devtools identify fewer elements through XPath than through a CssSelector?

I am trying to identify the elements containing the reviews on this webpage using google-chrome-devtools.
Using the following XPath:
//div[@class='text show-more__control']
the number of elements identified is 15.
Using the following CSS selector:
div.text.show-more__control
the number of elements identified is 25.
So why does google-chrome-devtools identify fewer elements through XPath than through the CssSelector?

The XPath compares the @class attribute value lexically against the exact string text show-more__control.
The CSS selector checks @class semantically: it matches any div whose class list includes both the text and the show-more__control classes, in any order and alongside any other classes.
There are 10 div elements that satisfy the CSS semantic criteria but fail the XPath lexical criteria, because their @class value is lexically
text show-more__control clickable
                        ^^^^^^^^^
The usual workaround for testing @class in XPath is to pad the value with spaces and check each class separately, so that a class name cannot accidentally match part of a longer one:
//div[ contains(concat(' ',@class,' '), ' text ')
and contains(concat(' ',@class,' '), ' show-more__control ') ]
This XPath returns 25 div elements, just like the CSS selector.
Note: Particularly tricky here is that the clickable part of the div/@class attribute value is not present in the static source, only in the dynamic properties of the div objects.
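Here are the three matching rules side by side. This is a simulation, not a browser or XPath engine. The hypothetical helpers mimic, in order, the XPath equality test, the CSS class-selector test (class tokens separated by HTML's ASCII whitespace), and the padded concat() workaround.

```python
import re

# SPACE, TAB, LF, FF, CR separate class tokens in HTML.
_HTML_WS = re.compile(r"[ \t\n\f\r]+")

def xpath_equals(class_attr: str, value: str) -> bool:
    """Mimics //div[@class='...']: a plain string comparison."""
    return class_attr == value

def css_has_classes(class_attr: str, *names: str) -> bool:
    """Mimics div.a.b: every name must be one of the whitespace-separated tokens."""
    tokens = {t for t in _HTML_WS.split(class_attr) if t}
    return all(name in tokens for name in names)

def xpath_padded_contains(class_attr: str, *names: str) -> bool:
    """Mimics contains(concat(' ', @class, ' '), ' name ') for each name."""
    padded = f" {class_attr} "
    return all(f" {name} " in padded for name in names)

attr = "text show-more__control clickable"
print(xpath_equals(attr, "text show-more__control"))              # False
print(css_has_classes(attr, "text", "show-more__control"))        # True
print(xpath_padded_contains(attr, "text", "show-more__control"))  # True
print(xpath_padded_contains("context", "text"))                   # False (no substring false positive)
print(xpath_padded_contains("text\nshow-more__control", "text"))  # False (the newline is not padded)
```

The last line shows one limit of the padding trick: it only pads with spaces, so a tab or newline between classes defeats it. A common remedy is to wrap the attribute in normalize-space(), e.g. contains(concat(' ', normalize-space(@class), ' '), ' text ').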

Why do we need to escape CSS?

Given the following examples which I picked up from here:
CSS.escape(".foo#bar") // "\.foo\#bar"
CSS.escape("()[]{}") // "\(\)\[\]\{\}"
Since .foo#bar is a valid CSS selector expression, why do we need to put \ before some characters? Suppose I want to write my own program that does the same task of escaping all the values/expressions in a CSS file: how should I proceed?
PS: I am always confused about escaping. How should I think about it when it comes to escaping some input?

You escape strings only when those strings contain special symbols that you want to be treated literally. If you are expecting a valid CSS selector as user input, you shouldn't be escaping anything.
.foo#bar is a valid CSS selector, but it means something completely different from \.foo\#bar. The former matches an element with that respective class and ID, e.g. <div class=foo id=bar> in HTML. The latter matches an element with the element name ".foo#bar", which in a hypothetical markup language could be represented as <.foo#bar> (obviously this is not legal HTML or XML syntax, but you get the picture).
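On writing your own escaper: CSS.escape() escapes a single identifier (a class name, an ID and so on), not a whole selector or CSS file. Below is a minimal Python sketch of the CSSOM "serialize an identifier" steps, which is what CSS.escape() is specified to perform. css_escape is a hypothetical name, and the sketch is not a tested drop-in replacement.

```python
def css_escape(ident: str) -> str:
    """Sketch of CSSOM 'serialize an identifier' (the algorithm behind CSS.escape)."""
    out = []
    for i, ch in enumerate(ident):
        cp = ord(ch)
        if cp == 0:
            out.append("\uFFFD")                      # NULL becomes U+FFFD
        elif 0x01 <= cp <= 0x1F or cp == 0x7F:
            out.append(f"\\{cp:x} ")                  # control chars: hex escape + space
        elif "0" <= ch <= "9" and (i == 0 or (i == 1 and ident[0] == "-")):
            out.append(f"\\{cp:x} ")                  # an identifier cannot start with a digit
        elif i == 0 and ch == "-" and len(ident) == 1:
            out.append("\\-")                         # a lone "-" is not a valid identifier
        elif (cp >= 0x80 or ch in "-_" or "0" <= ch <= "9"
              or "a" <= ch <= "z" or "A" <= ch <= "Z"):
            out.append(ch)                            # safe as-is
        else:
            out.append("\\" + ch)                     # any other ASCII char: backslash-escape
    return "".join(out)

print(css_escape(".foo#bar"))  # \.foo\#bar
print(css_escape("()[]{}"))    # \(\)\[\]\{\}
print(css_escape("1a"))        # \31 a

# Escape only the untrusted piece, then add the selector syntax you actually mean:
selector = "." + css_escape("foo#bar")  # .foo\#bar
```

So the rule of thumb is: escape data (the value you want treated literally), never the syntax you wrote on purpose.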

Can the HTML 'class' element attribute contain line breaks?

Can the 'class' attribute of HTML5 elements contain line breaks? Is it allowable in the specs and do browsers support it?
I ask because I have some code that dynamically inserts various classes into the element and this has created one very long line that is hard to manage. Normally I would build the class value using a variable but the CMS I'm using requires the template conditional tags to be positioned inline with the HTML. I can't use variables or PHP.
What I found in my research is that some HTML tag attributes need to be a single line, but I haven't been able to discover if the class attribute is one of those.
Does anyone know something about this?

Per the HTML 4 spec, the class attribute is CDATA:
User agents should interpret attribute values as follows:
- Replace character entities with characters
- Ignore line feeds
- Replace each carriage return or tab with a single space.
so you're in good shape there.
The HTML5 spec describes a class as a set of space separated tokens, where a 'space' includes newlines.
So you should be good there, too.

Can the [class] attribute of HTML5 elements contain line breaks?
Yes. The HTML5 spec says:
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.
The link proceeds to say:
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
And space characters include:
space (' ')
tab (\t)
line feed (\n)
form feed (\f)
carriage return (\r)
The space characters, for the purposes of this specification, are U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).
The newline sequences you would typically add to a UTF-8 document are made up only of characters from that list:
line feeds (\n)
carriage returns (\r)
a carriage return followed immediately by a line feed (\r\n)
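Here is a small Python sketch of splitting a class attribute on exactly those five space characters. class_tokens is a hypothetical helper, not a browser API. It avoids str.split() with no arguments, because Python treats extra characters such as U+00A0 NO-BREAK SPACE as whitespace, while HTML does not.

```python
import re
from typing import List

# SPACE, TAB, LF, FF, CR: the space characters HTML uses to separate class tokens.
_HTML_SPACE = re.compile(r"[ \t\n\f\r]+")

def class_tokens(class_attr: str) -> List[str]:
    """Split a class attribute value into tokens the way the HTML spec describes."""
    return [token for token in _HTML_SPACE.split(class_attr) if token]

# A class value spread over several lines, e.g. by inline template conditionals:
value = "menu\n    active\r\n    wide\tlarge\n"
print(class_tokens(value))             # ['menu', 'active', 'wide', 'large']
print(len(class_tokens("a\u00a0b")))   # 1 (NBSP is not a separator)
```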

What is the caret symbol ^ used for in CSS when selecting elements?

I encountered a CSS selector like this in a file:
#contactDetails ul li a, a[href^=tel] {....}

The circumflex character “^” as such has no defined meaning in CSS. The two-character operator “^=” can be used in attribute selectors. Generally, [attr^=val] refers to those elements that have the attribute attr with a value that starts with val.
Thus, a[href^=tel] refers to those a elements whose href attribute value starts with tel. It is probably meant to distinguish telephone-number links from other links, but it is not quite adequate for that, since the selector also matches e.g. ..., while the intent is presumably to match only links with tel: as the protocol part. So a[href^="tel:"] would be safer.
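The matching rule fits in a few lines of Python. attr_starts_with is a hypothetical helper, and the href values are made-up examples, not taken from the question. Per the Selectors spec, an empty val never matches, and neither does an element that lacks the attribute.

```python
from typing import Optional

def attr_starts_with(value: Optional[str], prefix: str) -> bool:
    """Mimics [attr^=prefix]; None means the attribute is absent."""
    if value is None or prefix == "":
        return False
    return value.startswith(prefix)

hrefs = ["tel:+15550100", "telemetry.html", "/contact"]
print([h for h in hrefs if attr_starts_with(h, "tel")])   # ['tel:+15550100', 'telemetry.html']
print([h for h in hrefs if attr_starts_with(h, "tel:")])  # ['tel:+15550100']
```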

a[href^="tel"]
The ^= operator selects elements that have the specified attribute with a value beginning exactly with a given string.
Here it selects all the anchor (a) elements whose href attribute value starts exactly with the string 'tel'.

The caret "^" used like that will match a tags whose href starts with "tel" ( http://csscreator.com/content/attribute-selector-starts )

It means a tags whose href attribute begins with "tel"
Example:
This is a link
will match.
