Accessing text between two QWebElement objects - qt

I am traversing a DOM using Qt's WebKit classes. Please have a look at the following pseudo HTML:
<br>111<a class="node">AAA</a>
<br>222<a class="node">BBB</a>
...
I can easily find the anchors using findAll(). However, I also need to get the text before the elements ("111" and "222"). I tried to use previousSibling(), but of course that gives me the <br> element, since the "111" and "222" texts are not elements.
I found a function to access text within an element, but how can I access the text between the <br> and the <a> elements?

It seems it is not possible. The only workaround I could find is getting the plain text of the parent node and parsing the resulting plain text.

This is the way I solved it:
QWebElement *element = ...
// find out if QWebElement has text
QDomDocument doc;
doc.setContent(element->toOuterXml());
QDomElement domelem = doc.documentElement();
for (QDomNode n = domelem.firstChild(); !n.isNull(); n = n.nextSibling())
{
    QDomText t = n.toText();
    if (!t.isNull())
    {
        // it has text !
        qDebug() << t.data();
        break;
    }
}

Related

DocXtemplater multiple tags on a single line behavior

I am replacing an old program for template-merging with docxtemplater and am trying to recreate the old program's prefix functionality.
I want the line removed if all prefixed tags ({$tag}) on that line are undefined.
The issue is that if all the tags on that line are undefined, docxtemplater still creates a blank line.
All the examples I have found online tend to reference inverted sections or raw tags, which both seem to be designed for a single tag per line as opposed to multiple tags side by side.
I have looked into using raw tags and writing a custom parser / nullGetter. However, I am still none the wiser as to how to remove the blank line.
I am using:
const options = {
    paragraphLoop: true,
    linebreaks: false,
    parser: function(tag) {
        return {
            get(scope, context) {
                console.log(tag);
                console.log(scope);
                console.log(context);
                if (tag[0] == "$") {
                    tag = tag.substr(1); // needs to then remove line break
                }
                return scope[tag];
            }
        }
    },
    nullGetter: function nullGetter(part, scopeManager) {
        if (!part.module) {
            return "";
        }
        if (part.module === "rawxml") {
            return "";
        }
        return "";
    }
};
doc = new Docxtemplater(zip, options);
The prefix in the program I am replacing acts as follows:
data:
existingtag: EXISTINGTAG
Template.docx:
1 text above
{$existingtag}{$nonexistingtag}
text below
2 text above
{$existingtag}{$existingtag}
text below
3 text above
{$nonexistingtag}{$nonexistingtag}
text below
old program produced (What I want to produce)
1 text above
EXISTINGTAG
text below
2 text above
EXISTINGTAGEXISTINGTAG
text below
3 text above
text below
my docxtemplater produces (extra line in example 3):
1 text above
EXISTINGTAG
text below
2 text above
EXISTINGTAGEXISTINGTAG
text below
3 text above

text below
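The prefix rule described above can be sketched on plain text lines. This is only an illustration of the desired behaviour, not docxtemplater code: `renderPrefixedLines` is a hypothetical helper, and it assumes the template is a plain string with one paragraph per line rather than .docx XML.

```javascript
// Hypothetical helper (not part of docxtemplater): applies the prefix rule
// to plain text lines. Prefixed tags look like {$name}.
function renderPrefixedLines(templateText, data) {
  const tagPattern = /\{\$([^}]+)\}/g;
  const rendered = [];

  for (const line of templateText.split("\n")) {
    const tagNames = Array.from(line.matchAll(tagPattern), (match) => match[1]);
    const allUndefined =
      tagNames.length > 0 && tagNames.every((name) => data[name] === undefined);

    // Rule from the question: drop the line when every prefixed tag on it is undefined.
    if (allUndefined) continue;

    // Otherwise substitute each tag, using "" for the undefined ones.
    rendered.push(
      line.replace(tagPattern, (whole, name) =>
        data[name] === undefined ? "" : String(data[name])
      )
    );
  }
  return rendered.join("\n");
}

const exampleTemplate = [
  "1 text above",
  "{$existingtag}{$nonexistingtag}",
  "text below",
  "3 text above",
  "{$nonexistingtag}{$nonexistingtag}",
  "text below",
].join("\n");

console.log(renderPrefixedLines(exampleTemplate, { existingtag: "EXISTINGTAG" }));
```

In a real .docx the "line" is a paragraph element in the document XML, which is why a tag-level parser alone cannot reproduce this.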
I'm the creator of docxtemplater, and I don't think there is a way to achieve what you want without spending a lot of time handling this case.
The problem is that tags such as:
{xxx}{yyy}
have access only to the text they are in and cannot have any effect outside of that text, so it is not possible to remove a paragraph conditionally.
There is one tag that has access to the whole paragraph: the raw XML tag, prefixed by a "@", like this:
{@raw}
It is used to add raw XML, and if the raw value is an empty string, then that paragraph will be removed.
Edit: I actually worked on a module a while back that achieves quite similar functionality. It is a paid module: https://docxtemplater.com/modules/paragraph-placeholder/

Qt - How to count and measure the lines in a QTextDocument?

In one of my projects, I created a QTextDocument that holds a text I need to draw. The text is word-wrapped, HTML-formatted text, and it should be drawn in a rectangular area whose width I know. Its content will also never exceed one paragraph.
The text document is created as follows:
// create and configure the text document to measure
QTextDocument textDoc;
textDoc.setHtml(text);
textDoc.setDocumentMargin(m_TextMargin);
textDoc.setDefaultFont(m_Font);
textDoc.setDefaultTextOption(m_TextOption);
textDoc.setTextWidth(m_Background.GetMessageWidth(size().width()));
and here is a sample text I want to draw:
Ceci est un texte <img src=\"Resources/1f601.svg\" width=\"24\" height=\"24\"> avec <img src=\"Resources/1f970.svg\" width=\"24\" height=\"24\"> une <img src=\"Resources/1f914.svg\" width=\"24\" height=\"24\"> dizaine <img src=\"Resources/1f469-1f3fe.svg\" width=\"24\" height=\"24\"> de <img src=\"Resources/1f3a8.svg\" width=\"24\" height=\"24\"> mots. Pour voir comment la vue réagit.
The images are SVG images loaded from the QML resources.
In order to perform several operations while the text is drawn, I need to know how many lines will be drawn after the word wrapping is applied, and the height of any line in the word-wrapped text.
I searched the functions provided by the text document, as well as those provided by QTextBlock, QTextFragment and QTextCursor. I tried several approaches, like iterating through the characters with a cursor and counting the lines, or counting the fragments in a block. Unfortunately none of them worked: all the functions always count 1 line, or just fail.
Here are some code samples I already tried, without success:
// FAILS - always returns 1
int lineCount = textDoc.lineCount();
// FAILS - always returns 1
int lineCount = textDoc.blockCount();
// FAILS - returns the whole text height, not the height of the particular line at the index
int lineHeight = int(textDoc.documentLayout()->blockBoundingRect(textDoc.findBlockByNumber(lineNb)).height());
// get the paragraph (there is only 1 paragraph in the item text document)
QTextBlock textBlock = textDoc.findBlockByLineNumber(lineNb);
int blockCount = 0;
for (QTextBlock::iterator it = textBlock.begin(); it != textBlock.end(); ++it)
{
    // FAILS - fragments aren't divided by line, e.g. an image will generate a fragment
    QString blockText = it.fragment().text();
    ++blockCount;
}
return blockCount;
QTextCursor cursor(&textDoc);
int lineCount = 0;
cursor.movePosition(QTextCursor::Start);
// FAILS - movePosition() always returns false
while (cursor.movePosition(QTextCursor::Down))
    ++lineCount;
I cannot figure out what I'm doing wrong, and why all my approaches fail.
So my questions are:
How can I count the lines contained in my word-wrapped document?
How can I measure the height of a line in my word-wrapped document?
Are the text document functions failing because of the HTML format? If so, what should I do to reach my objectives in such a context?
NOTE I know how to measure the whole text height. However, as each line height may be different, I cannot just divide the whole text height by the lines, so this is not an acceptable solution for me.
I finally found a way to resolve my issue. The text document cannot be used directly to measure individual lines; the layout should be used for this purpose instead, as explained in the following post:
https://forum.qt.io/topic/113275/how-to-count-and-measure-the-lines-in-a-qtextdocument
So the following code is the solution:
//---------------------------------------------------------------------------
int getLineCount(const QString& text) const
{
    // create and configure the text document to measure
    QTextDocument textDoc;
    textDoc.setHtml(text);
    textDoc.setDocumentMargin(m_TextMargin);
    textDoc.setDefaultFont(m_Font);
    textDoc.setDefaultTextOption(m_TextOption);
    textDoc.setTextWidth(m_Background.GetMessageWidth(size().width()));

    // this line is required to force the document to create the layout, which will then be used
    // to count the lines
    textDoc.documentLayout();

    // the document should contain at least one block
    if (textDoc.blockCount() < 1)
        return -1;

    int lineCount = 0;

    // iterate through document paragraphs (NOTE normally the message item should contain only 1 paragraph)
    for (QTextBlock it = textDoc.begin(); it != textDoc.end(); it = it.next())
    {
        // get the block layout
        QTextLayout* pBlockLayout = it.layout();

        // should always exist, otherwise error
        if (!pBlockLayout)
            return -1;

        // count the block lines
        lineCount += pBlockLayout->lineCount();
    }

    return lineCount;
}
//---------------------------------------------------------------------------
int measureLineHeight(const QString& text, int lineNb, int blockNb) const
{
    // create and configure the text document to measure
    QTextDocument textDoc;
    textDoc.setHtml(text);
    textDoc.setDocumentMargin(m_TextMargin);
    textDoc.setDefaultFont(m_Font);
    textDoc.setDefaultTextOption(m_TextOption);
    textDoc.setTextWidth(m_Background.GetMessageWidth(size().width()));

    // this line is required to force the document to create the layout, which will then be used
    // to measure the lines
    textDoc.documentLayout();

    // check if block number is out of bounds
    if (blockNb >= textDoc.blockCount())
        return -1;

    // get text block and its layout
    QTextBlock textBlock = textDoc.findBlockByNumber(blockNb);
    QTextLayout* pLayout = textBlock.layout();

    if (!pLayout)
        return -1;

    // check if line number is out of bounds
    if (lineNb >= pLayout->lineCount())
        return -1;

    // get the line to measure
    QTextLine textLine = pLayout->lineAt(lineNb);

    return textLine.height();
}
//---------------------------------------------------------------------------

JavaFX Text in TextFlow ignores StyleClass?

I am trying to use a JavaFX TextFlow to display some styled text. The following code does not apply any text styling.
public Node createText(String t, String cls) {
    Text ret = new Text(t);
    ret.getStyleClass().add(cls);
    return ret;
}
When I replace Text with Label it works properly, but things like \n obviously do not work anymore. How can I use the Text class with css classes?
EDIT: As requested a short example of my default.css
.defaultElementAttr {
    -fx-text-fill: #48a711;
}
-fx-text-fill is a CSS property of Label but it is not a CSS property of Text.
If you want to change the color of a Text object with CSS, use the -fx-fill property:
.defaultElementAttr {
    -fx-fill: #48a711;
}

Select element without a child

I have a page that might contain one of the following:
<span id='size'>33</span>
Or
<span id='size'>
<b>33</b>
<strike>32</strike>
</span>
I would like to grab the value '33' on both cases, is there a CSS selector I can use?
I tried to use the following (#size with no b child, or a b that is a child of #size):
document.querySelector('#size:not(>b), #size>b').innerText
But I keep getting an error: "Error: SYNTAX_ERR: DOM Exception 12"
According to the W3C spec, only simple selectors are supported inside :not(), but as far as I can tell the "greater-than sign" (U+003E, >) is considered part of the simple selector definition.
You can't do it with a regular CSS selector, but you can do it in a few lines of JS:
var element = document.querySelector('#size');
var b = element.querySelector('b');
var text = b ? b.innerText : element.childNodes[0].nodeValue;
console.log(text);
So really you want the significant text (i.e. other than whitespace, because in your second example there are probably tabs and returns between the span start tag and the b) of #size, or, if that doesn't exist, the significant text of its first element:
// Is text just whitespace?
function isWhitespace(text){
    return text.replace(/\s+/g,'').length === 0;
}
// Get the immediate text (i.e. not that of children) of element
function getImmediateText(element){
    var text = '';
    // Text and elements are all DOM nodes. We can grab the lot of immediate descendants and cycle through them.
    for(var i = 0, l = element.childNodes.length; i < l; ++i){
        var node = element.childNodes[i];
        // nodeType 3 is text
        if(node.nodeType === 3){
            text += node.nodeValue;
        }
    }
    return text;
}
function getFirstTextNode(element){
    var text = getImmediateText(element);
    // If the text is empty, and there are children, try to get the first child's text (recursively)
    if(isWhitespace(text) && element.children.length){
        return getFirstTextNode(element.children[0]);
    }
    // ...But if we've got no children at all, then we'll just return whatever we have.
    else {
        return text;
    }
}
Once CSS Selectors Level 4 and the parent selector are available, you'll be able to use a simple selector, but for now you can't do it directly.
You could iterate to find the first text node, but here's a hacky solution:
var text = document.getElementById('size').innerHTML.split(/<.*?>/)[0];
To be used only if you have some idea of the content of your #size element.
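The iteration mentioned above could look roughly like the following. It is a minimal sketch, not a drop-in answer: `firstSignificantText` is a name chosen here for illustration, and it assumes "first text node" means the first text node, in document order, that is not only whitespace. It relies solely on the standard `childNodes`, `nodeType` and `nodeValue` properties.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Depth-first search for the first text node that contains more than whitespace.
// Returns its trimmed text, or null if there is none.
function firstSignificantText(node) {
  for (const child of Array.from(node.childNodes || [])) {
    if (child.nodeType === TEXT_NODE) {
      const text = child.nodeValue.trim();
      if (text !== "") return text;
    } else {
      const nested = firstSignificantText(child);
      if (nested !== null) return nested;
    }
  }
  return null;
}
```

In a browser this would be called as `firstSignificantText(document.querySelector('#size'))`. Unlike the innerHTML split, it does not depend on how the markup is serialized.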

Last line of a paragraph contains a single word only [duplicate]

A common problem when working with typography in HTML/CSS is something we call "horunge" in Swedish ("widow" in English).
What it is:
Let's say you have a box with a width of 200px containing the text "I love typography very much". Now the text breaks and becomes:
I love typography very
much
As a designer I don't want a bastard word (a single word on its own row). If this were a document, PDF, etc., I would break the line before "very" so that it looks like this:
I love typography
very much
which looks much better.
Can I solve this with a CSS rule or with JavaScript? The rule should be to never let a single word stand alone on a row.
I know it can be solved by adding a <br /> but that's not a solution that works with dynamic widths, feed content, different translations, browser font rendering issues etc.
Update (solution)
I solved my problem with this jquery plugin: http://matthewlein.com/widowfix/
A simple jQuery / regex solution could look like the following, if you add the class "noWidows" to the tag of any element that contains text you are worried about.
Such as:
<p class="noWidows">This is a very important body of text.</p>
And then use this script:
$('.noWidows').each(function(i,d){
    $(d).html( $(d).text().replace(/\s(?=[^\s]*$)/g, "&nbsp;") )
});
This uses a regex to find the last space in the string and replace it with a non-breaking space, which means the last two words will be forced onto the same line. It's a good solution if you have room around the end of the line; otherwise it could cause the text to run outside of an element with a fixed width or, if the width is not fixed, cause the element to become larger.
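The same replacement can be sketched as a pure string function, independent of jQuery. This is an illustration under one assumption that differs from the answer: it inserts the Unicode no-break space character (U+00A0) instead of an HTML entity, so the result can be written back as plain text rather than HTML. `bindLastTwoWords` is a hypothetical name.

```javascript
// Replace the last whitespace character in the text with U+00A0 (no-break space),
// so the final two words cannot be separated by a line break.
// Text without any whitespace is returned unchanged.
function bindLastTwoWords(text) {
  return text.trim().replace(/\s(?=\S*$)/, "\u00A0");
}

console.log(bindLastTwoWords("This is a very important body of text."));
```

Writing the result back with `element.textContent = ...` (or jQuery's `.text(...)`) avoids re-parsing user content as HTML, which the `.html()` call in the answer would do.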
Just wanted to add to this page as it helped me a lot.
Strictly speaking, these (widows) should be called orphans, as widows are single words that land on the next page, not single words on a new line.
Postcodes like "N12 5GG" end up with the full postcode together on a new line, which still counts as an orphan, so a workaround is the following (I changed the class to "noWidows2" so you can use both versions):
123 Some_road, Some_town, N12 5GG
$('.noWidows2').each(function(i,d){
    var value="&nbsp;";
    $(d).html($(d).text().replace(/\s(?=[^\s]*$)/g, value).replace(/\s(?=[^\s]*$)/g, value));
});
This replaces the last two spaces, so the last three words stay together on a new line, which fixes the postcode issue.
End Result
123 Some_road,
Some_town, N12 5GG
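The postcode workaround generalizes to keeping the last n words together. The sketch below is one possible way to do that, with a hypothetical function name and using U+00A0 rather than an HTML entity. Note that chaining the regex replacement does not work with U+00A0, because JavaScript's `\s` also matches U+00A0 and the second pass would hit the same position again, so this version splits the text into words instead.

```javascript
// Join the last `wordCount` words with U+00A0 so they wrap as one unit.
// Assumption: runs of whitespace in the input may be collapsed to single spaces.
function keepLastWordsTogether(text, wordCount) {
  const words = text.trim().split(/\s+/);
  if (wordCount < 2 || words.length <= 1) return words.join(" ");

  const tailSize = Math.min(wordCount, words.length);
  const head = words.slice(0, words.length - tailSize);
  const tail = words.slice(words.length - tailSize).join("\u00A0");

  return head.length > 0 ? head.join(" ") + " " + tail : tail;
}

console.log(keepLastWordsTogether("123 Some_road, Some_town, N12 5GG", 3));
```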
I made a little script here, with the help of this function to find line height.
It's just an approach; it may or may not work, as I didn't have time to test it thoroughly.
As of now, text_element must be a jQuery object.
function avoidBastardWord( text_element )
{
    var string = text_element.text();
    var parent = text_element.parent();
    var parent_width = parent.width();
    var parent_height = parent.height();
    // determine how many lines the text is split into
    var lines = parent_height / getLineHeight(text_element.parent()[0]);
    // if the text element width is less than the parent width,
    // there may be a widow
    if ( text_element.width() < parent_width )
    {
        // find the last word of the entire text
        var last_word = text_element.text().split(' ').pop();
        // remove it from our text, creating a temporary string
        var temp_string = string.substring( 0, string.length - last_word.length - 1);
        // set the new one-word-less text string into our element
        text_element.text( temp_string );
        // check lines again with this new text with one word less
        var new_lines = parent.height() / getLineHeight(text_element.parent()[0]);
        // if there are now fewer lines, it means that word was a widow
        if ( new_lines != lines )
        {
            // separate each word
            temp_string = string.split(' ');
            // put a line break before the second word from the last
            // (the one before the widow word)
            temp_string[ temp_string.length - 2 ] = '<br>' + temp_string[ temp_string.length - 2 ] ;
            // recreate the string again
            temp_string = temp_string.join(' ');
            // our element html becomes the string
            text_element.html( temp_string );
        }
        else
        {
            // put back the original text into the element
            text_element.text( string );
        }
    }
}
Different browsers have different font settings. Try to play a little to see the differences. I tested it on IE8 and Opera, modifying the string every time and it seemed to work ok.
I would like to hear some feedback and improve because I think it may come in handy anyway.
Just play with it! :)
There are also CSS widows and orphans properties: see the about.com article.
Not sure about browser support...
EDIT: more information about WebKit implementation here: https://bugs.webkit.org/buglist.cgi?quicksearch=orphans.
Manually, you could replace the space in between with &nbsp;
I've been looking for ways to dynamically add it in. I found a few, but haven't been able to make it work myself.
$('span').each(function() {
    var w = this.textContent.split(" ");
    if (w.length > 1) {
        w[w.length - 2] += "&nbsp;" + w[w.length - 1];
        w.pop();
        this.innerHTML = (w.join(" "));
    }
});
#foo {
    width: 124px;
    border: 1px solid #ccc;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="foo">
    <span class="orphan">hello there I am a string really really long, I wonder how many lines I have</span>
</div>
