DomCrawler filterXpath for emails - symfony

In my project I am trying to use filterXPath for emails. So I get an E-Mail via IMAP and put the mail body into my DomCrawler.
$crawler = new Crawler();
$crawler->addHtmlContent($mail->textHtml); //mail html content utf8
Now to my issue: I only want the plain text of the mail body, but I still want to keep all line breaks, spaces, etc. The result should look exactly like the mail, just as plain text without HTML (still with \n\r etc.).
For that reason I tried using $crawler->filterXPath('//body/descendant-or-self::*/text()') to get every text node inside the mail.
However, my test mail contains HTML like:
<p>
<u>
<span>
<a href="mailto:mail#example.com">
<span style="color:#0563C1">mail#example.com</span>
</a>
</span>
</u>
<span>
</span>
<span>·</span>
<span>
<b>
<a href="http://www.example.com">
<span style="color:#0563C1">www.example.com</span>
</a>
</b>
<p/>
</span>
</p>
In my mail this looks like mail#example.com · www.example.com (in one single line).
With my filterXPath I get multiple nodes which result in following (multiple lines):
mail#example.com
· www.example.com
I know that the carriage return in the HTML (a \r) is probably the problem, but since I can't change the HTML in the mail, I need another solution. As mentioned before, in the mail it is only a single line.
Please keep in mind that my solution has to work for every mail. I do not know what the mail HTML looks like, and it can change every time, so I need a generic solution.
I already tried using strip_tags too - this does not change the result at all.
My current approach:
$crawler = new Crawler();
$crawler->addHtmlContent($mail->textHtml);
$text = "";
foreach ($crawler->filterXPath('//body/descendant-or-self::*/text()') as $element) {
    $part = trim($element->textContent);
    if ($part) {
        $text .= "|".$part."|\n"; //to see whitespaces etc
    }
}
echo $text;
//OUTPUT
|mail#example.com|
|·|
| |
|www.example.com|
| |

I believe something like this should work:
// DOMXPath expects a DOMDocument, so take it from the crawler's first node
$xpath = new DOMXPath($crawler->getNode(0)->ownerDocument);
$result = $xpath->query('(//span[not(descendant::*)])');
$text = "";
foreach ($result as $element) {
    $part = trim($element->textContent);
    if ($part) {
        $text .= "|".$part."|"; //to see whitespaces etc
    }
}
echo $text;
Output:
|mail#example.com||·||www.example.com|

Note that you are dealing with two different ways of treating whitespace-only text nodes. HTML has its own rules about whether they are rendered (the differences are mainly between block and inline elements, and whitespace is also normalized). XPath works on a document tree provided by a parser (or DOM API), which has its own configuration for preserving or dropping those whitespace-only text nodes. Taking this into account, one solution could be to use the string() function to get the string value of the element containing the email.
For this input:
<root>
<p>
<u>
<span>
<a href="mailto:mail#example.com">
<span style="color:#0563C1">mail#example.com</span>
</a>
</span>
</u>
<span>
</span>
<span>·</span>
<span>
<b>
<a href="http://www.example.com">
<span style="color:#0563C1">www.example.com</span>
</a>
</b>
<p/>
</span>
</p>
</root>
This XPath expression:
string(/root)
Outputs:
mail#example.com
·
www.example.com
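If the goal is the single line that a mail client shows, one option is to take the string value and then collapse every run of whitespace into one space, which is roughly what a browser does for inline content. Below is a minimal, language-neutral sketch of that idea using only Python's standard library (the question itself uses PHP). The SAMPLE constant and both helper names are made up for illustration. In PHP the collapsing step could be done with preg_replace('/\s+/', ' ', $text).

```python
import xml.etree.ElementTree as ET

# Same markup as the input shown above (hypothetical constant name).
SAMPLE = """<root>
<p>
<u>
<span>
<a href="mailto:mail#example.com">
<span style="color:#0563C1">mail#example.com</span>
</a>
</span>
</u>
<span>
</span>
<span>·</span>
<span>
<b>
<a href="http://www.example.com">
<span style="color:#0563C1">www.example.com</span>
</a>
</b>
<p/>
</span>
</p>
</root>"""


def string_value(element):
    """Concatenate all descendant text nodes, like XPath string()."""
    return "".join(element.itertext())


def collapse_whitespace(text):
    """Replace every run of whitespace (including \\r and \\n) with one space."""
    return " ".join(text.split())


root = ET.fromstring(SAMPLE)
single_line = collapse_whitespace(string_value(root))
print(single_line)  # mail#example.com · www.example.com
```

Note that this flattens everything into one line. To keep the real line breaks of a mail you would have to apply it per block element (p, div, br and so on), which this sketch does not attempt.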

Related

Stripping 'get_the_author_link' to only return the URL (not name, title, target etc)

The code returns a lot of unnecessary information, which is not needed and breaks my icon.
I tried placing the code inside an href attribute, but that broke it.
It does work alone since it then returns the name of the person, and makes the name clickable to his website. But I only want it to return the website URL. Not the name, etc.
<?php echo get_the_author_link(); ?>
when used like this:
<li><i class="fa fa-linkedin"></i></li>
it outputs the following too crowded (and broken) code:
<li><i class="fa fa-linkedin"></i></li>
All I basically want it to return is the URL, not the name etc., which breaks the output.
Thank you.
You should be able to do something like this:
<?php
$author_data = get_userdata( $author->ID );
?>
<li>
<a href="<?php echo $author_data->user_url; ?>" target="_blank">
<i class="fa fa-linkedin"></i>
</a>
</li>
function get_the_author_link_url() {
    $author_url = get_the_author_meta('url');
    return !empty($author_url) ? $author_url : false;
}
You can use this function to get the URL if it is set, or false otherwise. Make sure to check whether the result is false, or alter the function to fit your needs.
Inspired by the documentation for get_the_author_link()

WordPress removes empty span tag

I use WordPress-editor and I want to display an icon within a "span"-tag like this:
<div id="question1" class="box-around">
<div class="box-left"><span class="fa fa-search" aria-hidden="true"> </span></div>
<div class="box-right">
<h3>Some Heading</h3>
Some Text
<span id="question1-answer"> </span>
</div>
</div>
Whenever I make a change in "visual", it removes the "span"-tag and looks like this:
<div id="question1" class="box-around">
<div class="box-left"></div>
<div class="box-right">
<h3>Some Heading</h3>
Some Text
<span id="question1-answer"> </span>
</div>
</div>
Oddly enough, the span at the bottom (id="question1-answer") is kept. Am I missing something? I already tried putting a non-breaking space ("&nbsp;") inside the tag, but it is converted to a regular space (" ") after changing text in "visual". I also tried different tags.
Thanks!
Add this code to your active theme's functions.php file.
function override_mce_options($initArray) {
    $opts = '*[*]';
    $initArray['valid_elements'] = $opts;
    $initArray['extended_valid_elements'] = $opts;
    return $initArray;
}
add_filter('tiny_mce_before_init', 'override_mce_options');
A little more specific - allow empty tags if they have an id, name, class or style attribute:
function override_mce_options($initArray) {
    $opts = '*[id|name|class|style]';
    $initArray['valid_elements'] .= ',' . $opts;
    $initArray['extended_valid_elements'] .= ',' . $opts;
    return $initArray;
}
add_filter('tiny_mce_before_init', 'override_mce_options');
Maybe I'm doing something wrong, but for me it works. Still I'm sure there's a better solution - it would be nice to be able to add only one specific tag to valid elements.
With the above answers (Val) the function will allow empty tags but this still may not work due to the theme structure or any page builder plugins you may have.
For example, I am using WPBakery page builder with custom functions. For me to allow an empty span with a style (a background, for example), I added the above code to my functions.php and also placed a <br> tag within the span.
The span has a custom class .break where the styling is defined. I then set display: none on the <br> tag within the .break class, so the styling remains but the extra space is removed.
<span class="break"><br></span>
.break br {display:none;}
Now the empty span tag should display as normal.

How to avoid the unwanted <p></p> tags while getting the post in Wordpress

I'm trying to get the data from a post using the following snippet:
<div class="marquee-container row marquee-row">
<?php
$post = get_post(415);
if(isset($post->post_status) && $post->post_status == 'publish') {
echo apply_filters('the_content', $post->post_content);
}
?>
</div>
But when I inspect the element, I found some unwanted <p></p> tags which are unnecessarily introducing gaps.
<div class="marquee-container row marquee-row">
<p> </p>
<div class="some-class">Content from the post</div>
<p></p>
</div>
How can I avoid these unwanted tags?
Note: These tags are not stored in the database table; they are introduced while retrieving the post.
Thank you for your help.
Can you please check the content on the admin side in the Text tab (where you add the content for the page or post)? If you pressed the Enter key there, it will show up as a <p></p> tag in the content.
Remove blank lines from the editor if they are not necessary.

How to getText from the element

I'm using WebDriver for forum reply testing. In this scenario, I'm not able to locate and get the reply text ("I want rock!") from the following code.
The HTML code is:
<div id="user_ack_con0" class="user_ack_con mt15 clear clearfix">
<dl class="clear clearfix">
<dt>
<a href="http://www.abc/user/1161/">
</a>
</dt>
<div>
Jason
<span class="total_icon total_icon5"></span>
:I want rock!
</div>
I really don't know how to get that text from this element. Does anybody know? Thanks.
Here's a general solution:
def get_text_excluding_children(driver, element):
    return driver.execute_script("""
    return jQuery(arguments[0]).contents().filter(function() {
        return this.nodeType == Node.TEXT_NODE;
    }).text();
    """, element)
The element passed to the function can be something obtained from the find_element...() methods (i.e. it can be a WebElement object).
I'm actually using this code in a test suite.
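To make clear what that jQuery filter computes, here is a small offline sketch using Python's standard library (no WebDriver or browser involved). It parses a trimmed-down copy of the div from the question with xml.etree.ElementTree and joins only the text that sits directly inside the element, skipping text inside child elements. The helper name own_text and the placeholder text inside the span are hypothetical additions for the demonstration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the div from the question; the text inside the span
# is a placeholder added here to show that child text is skipped.
SNIPPET = """<div>
Jason
<span class="total_icon total_icon5">child text</span>
:I want rock!
</div>"""


def own_text(element):
    """Join the text nodes that are direct children of element."""
    parts = [element.text or ""]                         # text before the first child
    parts.extend(child.tail or "" for child in element)  # text after each child
    return "".join(parts)


div = ET.fromstring(SNIPPET)
direct_text = " ".join(own_text(div).split())  # collapse line breaks and spaces
reply = direct_text.split(":", 1)[1].strip()   # keep what follows the first colon

print(direct_text)  # Jason :I want rock!
print(reply)        # I want rock!
```

Splitting on the first colon is only an assumption about how the forum formats replies; adjust it if the user name itself can contain a colon.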
The text is technically inside the div element, so you should try getting it using the find method with the XPath:
//div[@id="user_ack_con0"]/dl/div
and then getting the text
You can try:
driver.findElement(By.cssSelector("#user_ack_con0 > dl > div")).getText()
Or this, using jQuery's text() method:
$("#user_ack_con0 > dl > div").text()

nth child anchor issue

I have a page which brings database entries in and displays them, and I have given every other entry/listing this style:
.hjl:nth-child(odd) { background: #F2F2F2;}
And this is my HTML/PHP
<a href="paging.php?job_id=<?php echo $rsjobinfo['job_id'];?>">
<div class = "hjl">
<div class = "hjldate">
<p>Posted on:<br /><?php echo $date = date('d M Y', strtotime($rsjobinfo['date']));?></p>
</div>
<div class = "hjljobinfo">
<h1><?php echo $rsjobinfo['job_title'];?></h1>
<h2><?php echo $rsjobinfo['company_name'];?> |</h2>
<p class = "location"><?php echo $rsjobinfo['city'];?>, <?php echo $rsjobinfo['county'];?>, <?php echo $rsjobinfo['country'];?></p>
</div>
</div>
</a>
However, when I try to wrap each entire entry in an anchor tag, every entry changes to the background style listed above, no longer recognising the 'odd'.
This is the HTML that's generated:
<a href="paging.php?job_id=253">
<div class = "hjl">
<div class = "hjldate">
<p>Posted on:<br />11 Jul 2011</p>
</div>
<div class = "hjljobinfo">
<h1>Entry One</h1>
<h2>Company |</h2>
<p class = "location">New York, NY, USA</p>
</div>
</div>
</a>
I'm not used to using nth-child so I'm not sure how to fix it (I've tried playing around with adding an 'a' to the above but it's not making any difference).
Does anyone have any pointers they can kick me towards?
Thanks
Dan
:nth-child works between siblings. That means that the counter (odd in this case) applies to elements with the same parent.
If you wrap each .hjl in an a, they no longer share the same parent, so the selector looks for odd .hjl elements inside each a element and finds just the first one (the only one that exists in that a).
So you should change your selector to work with the a tags (perhaps apply a class as well for more precision).
On another note, placing div and h1/h2 elements inside an a tag is invalid in HTML 4 and XHTML 1 (HTML5 permits it) and can cause other issues.
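The sibling counting described above can be simulated outside the browser. The following is a rough Python sketch (standard library only, not a CSS engine) that mimics :nth-child(odd) by checking each element's 1-based position among its parent's children. The markup strings and the helper names are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Four entries as direct siblings, like the listing before the anchors were added.
FLAT = "<div>" + '<div class="hjl"/>' * 4 + "</div>"
# The same four entries, each wrapped in its own anchor.
WRAPPED = "<div>" + '<a><div class="hjl"/></a>' * 4 + "</div>"


def nth_child_odd(root, predicate):
    """Return elements accepted by predicate whose 1-based sibling position is odd."""
    matched = []
    for parent in root.iter():
        for position, child in enumerate(parent, start=1):
            if position % 2 == 1 and predicate(child):
                matched.append(child)
    return matched


def has_class_hjl(element):
    """Rough stand-in for the .hjl class selector."""
    return "hjl" in element.get("class", "").split()


def is_anchor(element):
    """Rough stand-in for the a type selector."""
    return element.tag == "a"


flat_hits = nth_child_odd(ET.fromstring(FLAT), has_class_hjl)
wrapped_hits = nth_child_odd(ET.fromstring(WRAPPED), has_class_hjl)
anchor_hits = nth_child_odd(ET.fromstring(WRAPPED), is_anchor)

print(len(flat_hits), len(wrapped_hits), len(anchor_hits))  # 2 4 2
```

In CSS terms, .hjl:nth-child(odd) matches every wrapped entry because each one is the first child of its own a, while a:nth-child(odd) matches every other entry.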
You need to add ":" right in front of it, like so:
YourElement:nth-child( { number expression | odd | even } ) {
declaration block
}
Your element is something like tr, li, or anything like that. Hope that helps.
