Excluding quotes while scraping text on a forum site

I am struggling to exclude quotes when scraping a forum website. I need to scrape message texts, some of which quote an earlier message. Scraping every message together with its quotes means the same text appears several times in the scraped data, which makes analysis difficult. Can anybody help me skip quoted text while scraping?
Here is an example page: https://forum.donanimhaber.com/toyota-touch-2-ve-touch-go-2-kullanici-tecrubeleri-ve-klavuzda-yazmayanlar--88405838
Here is the HTML of a message with a quotation:
<span class="msg">
<table>
<tbody>
<tr>
<td>
<table style="width:100%;"><tbody><tr><td><blockquote class="quote"><i>quote:</i><br><br>Orijinalden alıntı: DBolanci <br> <br> Beyler albüm kapağı ve klasör listelemeyi bende yapamadım. Mp3leri tek tek düzenledim en ince ayrıntısına kadar yazdım ama göstermiyor. Nasıl yapacaz bilgisi olan? Ayrıca araçta navigasyon özelliğini nasıl kazandırabiliriz? servis yazılım için ücret istiyor :( <br> </blockquote></td></tr></tbody></table> <br> aynı soruların cevabını bende bekliyorum. yardımcı olabilecek kimse yokmu?
</td>
</tr>
</tbody>
</table>
</span>

If you look at the html structure of the message you can see that it follows this format:
<span class="msg">
<table>
<table> QUOTED TEXT </table>
TEXT CONTENT
</table>
All you have to do is select the text directly under msg > table and skip the text under msg > table > table. With an XPath selector this can be achieved with something like:
//span[@class="msg"]/table/tbody/tr/td/text()
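
As a rough illustration of the same idea outside XPath, here is a minimal Python sketch that uses only the standard library's html.parser and ignores any text found inside a blockquote element. The class name QuoteSkippingParser, the helper message_text, and the shortened sample markup are assumptions made for this example. It keys on the blockquote tag from the question's HTML rather than on the table nesting.

```python
from html.parser import HTMLParser


class QuoteSkippingParser(HTMLParser):
    """Collects text that is not inside a <blockquote> element."""

    def __init__(self):
        super().__init__()
        self.quote_depth = 0  # > 0 while we are inside a quotation
        self.chunks = []      # text fragments found outside quotations

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when no quotation is open.
        if self.quote_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def message_text(html):
    """Return the message text with quoted passages left out."""
    parser = QuoteSkippingParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical, shortened version of the message markup from the question.
sample = (
    '<span class="msg"><table><tbody><tr><td>'
    '<table><tbody><tr><td><blockquote class="quote"><i>quote:</i><br>'
    'quoted text</blockquote></td></tr></tbody></table>'
    '<br> own reply text'
    '</td></tr></tbody></table></span>'
)
print(message_text(sample))
```

If the forum nests quotations inside one another, the depth counter keeps all of them excluded until the outermost blockquote closes.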

Related

How to embed file content into body of the email using mail command?

I have a requirement where I need to send file content as the mail body. Can we do it through Unix scripting?
Thanks in advance

Create an HTML file from the data, then send that file as the content of the email.
Use an expression to build the row data like this:
v_data= ' <tr>
<td>'||company ||'</td>
<td>'|| contact_person|| '</td>
<td>'|| country ||'</td>
</tr>'
Use an aggregator to concatenate all these rows into one single row, with no group-by port.
Then use another expression transformation.
Create ports like this:
v_head ='
<head></head>
<body>
<b>pls find below data.</b>
<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>'
v_body = Aggregated_v_data
v_tail='</table></body>'
v_output = v_head||v_body ||v_tail
Then connect this output to a flat file target.
Then send this flat file using the mailx command or any mail client.
The HTML file should look like this:
<head></head>
<body>
pls find below data.
<br> </br>
<table>
<tr>
<th>Company</th>
<th>Contact</th>
<th>Country</th>
</tr>
<tr>
<td>Alfreds </td>
<td>Maria </td>
<td>Germany</td>
</tr>
</table>
</body>
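
For readers without the ETL tool used above, the same head + rows + tail concatenation can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the function names build_html and build_message, the subject line, and the example.com addresses are hypothetical, and the sketch builds the message without sending it.

```python
from email.message import EmailMessage
from html import escape


def build_html(rows):
    """Return an HTML document with one table row per (company, contact, country)."""
    head = (
        "<html><head></head><body>"
        "<b>pls find below data.</b>"
        "<table><tr><th>Company</th><th>Contact</th><th>Country</th></tr>"
    )
    # One <tr> per record, concatenated into a single string (the "aggregator" step).
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    tail = "</table></body></html>"
    return head + body + tail


def build_message(rows, sender, recipient):
    """Wrap the HTML table in an e-mail with a plain-text fallback."""
    msg = EmailMessage()
    msg["Subject"] = "Company data"  # hypothetical subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("This message contains an HTML table.")
    msg.add_alternative(build_html(rows), subtype="html")
    return msg


rows = [("Alfreds", "Maria", "Germany")]
# Placeholder addresses, not taken from the original post.
message = build_message(rows, "sender@example.com", "recipient@example.com")
print(build_html(rows))
```

Actually sending the message would need a mail server, for example through smtplib.SMTP(...).send_message(message); the host and credentials depend on your environment.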

Displaying price including add-ons in "Add-to-cart"-button in single product view

I am trying to show the price of the product in the single product view in the button. I tried this solution WooCommerce display price on add to cart button - but I have product-add-ons (with the plug-in from YITH).
The price is shown in the table above the button, so it is calculated. But how do I get it, I can't find it in the source code, to move it into the button (see picture).
I have now tried several solutions, but sadly nothing works. Can anybody give me a hint in the right direction? Sorry, I am a beginner at this, but I'm trying.

I used jQuery to read the total amount and then changed the button's text.
var myTotalPrice = $(".yith_wapo_group_final_total .price").text()
$("button.single_add_to_cart_button").text(myTotalPrice)
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="yith_wapo_group_total
yith_wapo_keep_show" data-product-price="7.2" data-product-id="63">
<table>
<tbody><tr class="ywapo_tr_product_base_price">
<td data-nosnippet="">Product price</td>
<td data-nosnippet=""><div class="yith_wapo_group_product_price_total"><span class="price amount">€ 7,20</span></div></td>
</tr>
<tr class="ywapo_tr_additional_options">
<td data-nosnippet="">Add-Ons Total:</td>
<td data-nosnippet=""><div class="yith_wapo_group_option_total"><span class="price amount">€ 2,40</span></div></td>
</tr>
<tr class="ywapo_tr_order_totals">
<td data-nosnippet="">Gesamtsumme:</td>
<td data-nosnippet=""><div class="yith_wapo_group_final_total"><span class="price amount">€ 9,60</span></div></td>
</tr>
</tbody></table>
</div>
<button type="submit" name="add-to-cart" value="63" class="single_add_to_cart_button button alt">In den Warenkorb</button>
The easiest way to insert this JS into your theme is to use the Code Snippets plugin. It includes an example JS snippet; just copy and paste the two lines of JS above into that example snippet.
Make sure you inject this snippet into the footer section of your theme.
EDIT: Please try this code to pick up the latest price value on the page when it changes.
var priceElement = $(".yith_wapo_group_final_total .price")
$(priceElement).change(function(){
var myTotalPrice = priceElement.text()
$("button.single_add_to_cart_button").text(myTotalPrice)
});

Robot Framework - Selenium2Library - Input text into field according to the text before it

Because every ID is randomly generated after each refresh, I have to use other identifiers. I basically want Robot to input text into the field next to an identifier I specify.
https://vgy.me/3YBIgl.png
I want to input text into the User: field, using the "User:" label as the locator.
It works with XPaths, but I would rather use another method that won't be so brittle.
Here is the HTML used to generate the form
<tbody id="m8DPe" class="z-rows">
<tr id="m8DPf" style="background:#FFFFFF;" class="z-row">
<td id="m8DPg-chdextr" style="background:#FFFFFF;text-align:left;" class="z-row-inner">
<div id="m8DPg-cell" class="z-row-content">
<span id="m8DPg" class="z-label">User:</span>
</div>
</td>
<td id="m8DPh-chdextr" style="background:#FFFFFF;text-align:left;" class="z-row-inner">
<div id="m8DPh-cell" class="z-row-content">
<input id="m8DPh" class="z-textbox" value="" type="text" name="j_username">
</div>
</td>
</tr>
<tr id="m8DPi" style="background:#FFFFFF;" class="z-row z-grid-odd">
<td id="m8DPj-chdextr" style="background:#FFFFFF;text-align:left;" class="z-row-inner">
<div id="m8DPj-cell" class="z-row-content">
<span id="m8DPj" class="z-label">Password:</span>
</div>
</td>
<td id="m8DPk-chdextr" style="background:#FFFFFF;text-align:left;" class="z-row-inner">
<div id="m8DPk-cell" class="z-row-content">
<input id="m8DPk" class="z-textbox" value="" type="password" name="j_password">
</div>
</td>
</tr>
I know it would be something similar to this:
Input Text //tr[contains(text(), 'Example') and ...] ${USERNAME}
But I honestly do not know. Is there somewhere I could read up on this?
I am extremely new to Robot Framework. Sorry for the noobness.

Looking at the sample, the name attribute of the input doesn't look randomly generated. If that's really so, you could use the most trivial Selenium locator strategy, by name, e.g.:
Input Text    name=j_username    ${USERNAME}
If that's not the case, this can be accomplished through an XPath:
//tr[.//span[text()="User:"]]/td//input
That reads (right-to-left for clarity, though it's evaluated left-to-right): return the input that is inside a td (a cell), which is itself a direct child of a tr (a table row) containing a span with that text (exact match here).
Thus, the locator finds the row that has a cell with "User:" in it and returns the input in that row.
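
To check the row-then-input logic outside the browser, here is a small Python sketch using xml.etree.ElementTree. ElementTree supports only a subset of XPath (no text() and no nested paths inside a predicate), so the expression below is an adapted equivalent, not the exact locator above. The markup is a trimmed, XML-well-formed version of the form with the random ids removed, which is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the form: ids dropped, <input> self-closed so it parses as XML.
FORM = """
<table>
<tbody>
<tr><td><div><span class="z-label">User:</span></div></td>
<td><div><input class="z-textbox" type="text" name="j_username" /></div></td></tr>
<tr><td><div><span class="z-label">Password:</span></div></td>
<td><div><input class="z-textbox" type="password" name="j_password" /></div></td></tr>
</tbody>
</table>
"""

root = ET.fromstring(FORM)

# [td='User:'] keeps rows that have a td whose full text content is "User:";
# //input then descends to the input element inside that row.
user_field = root.find(".//tr[td='User:']//input")
password_field = root.find(".//tr[td='Password:']//input")

print(user_field.get("name"))
print(password_field.get("name"))
```

The label text picks the row and the input is then looked up inside it, so the generated ids never appear in the locator.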

*** Settings ***
Library    SeleniumLibrary

*** Test Cases ***
Input text
    Input Text    [XPath of the field where you have to enter the text]    python interpreter

How to create zebra-stripe CSS with TAL?

How can I use Chameleon or Zope Page Templates to easily create CSS zebra striping? I want to add odd and even classes to each row in a table, but using a condition with repeat/name/odd or repeat/name/even looks rather verbose even with a conditional expression:
<table>
<tr tal:repeat="row rows"
tal:attributes="class python:repeat['row'].odd and 'odd' or 'even'">
<td tal:repeat="col row" tal:content="col">column text text</td>
</tr>
</table>
This gets especially tedious if you have multiple classes to calculate.

The Zope Page Templates implementation of the repeat variable has an under-documented extra parameter, parity, that gives you the string 'odd' or 'even', alternating between iterations:
<table>
<tr tal:repeat="row rows"
tal:attributes="class repeat/row/parity">
<td tal:repeat="col row" tal:content="col">column text text</td>
</tr>
</table>
This is also much easier to interpolate into a string expression:
tal:attributes="class string:striped ${row/class} ${repeat/row/parity}"
This works in Chameleon as well.

Remove HTML with Regex

Is it possible to use regex to remove HTML tags inside a particular block of HTML?
E.g.
<body>
<p>Hello World!</p>
<table>
<tr>
<td>
<p>My First HTML Table</p>
</td>
</tr>
</table>
I don't want to remove all P tags, only those within the table element.
The ability to both remove or retain the text inside the nested p tag would be ideal.
Thanks.

Parsing HTML with regex is widely discouraged, so you could use Html Agility Pack for this:
var html = @"
<body>
<p>Hello World!</p>
<table>
<tr>
<td>
<p>My First HTML Table</p>
</td>
</tr>
</table>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
var nodes = document.DocumentNode.SelectNodes("//table//p");
foreach (HtmlNode node in nodes)
{
node.ParentNode.ReplaceChild(
HtmlNode.CreateNode(node.InnerHtml),
node
);
}
string result = null;
using (StringWriter writer = new StringWriter())
{
document.Save(writer);
result = writer.ToString();
}
So after all these manipulations, you'll get the following result:
<body>
<p>Hello World!</p>
<table>
<tr>
<td>
My First HTML Table
</td>
</tr>
</table></body>
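
The same parser-based approach can be sketched in Python with the standard library's html.parser, for readers who are not on .NET. This is an assumption-laden illustration, not a port of Html Agility Pack: the class TableParagraphStripper and the function strip_table_paragraphs are hypothetical names, and the sketch re-emits tags, text and entities only (comments and doctype declarations are dropped).

```python
from html.parser import HTMLParser


class TableParagraphStripper(HTMLParser):
    """Re-emits HTML, omitting <p> tags (but not their text) inside <table>."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.table_depth = 0
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        if tag == "p" and self.table_depth > 0:
            return  # drop the opening <p> inside a table
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p" and self.table_depth > 0:
            return  # drop the closing </p> inside a table
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_table_paragraphs(html):
    parser = TableParagraphStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


source = (
    "<body><p>Hello World!</p>"
    "<table><tr><td><p>My First HTML Table</p></td></tr></table></body>"
)
print(strip_table_paragraphs(source))
```

To remove the paragraph text as well, handle_data could additionally track whether a dropped p element is currently open and skip the data in that case.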

I have found this link, in which what seems to be the exact same question was asked:
"I have an HTML document in .txt format containing multiple tables and other texts and I am trying to delete any HTML (anything within "<>") if it's inside a table (between <table> and </table>). For example:"
Regex to delete HTML within <table> tags

<td>[\r\n\s]*<p>([^<]*)</p>[\r\n\s]*</td>
The round brackets denote a numbered capture group which will contain your text.
However, using regular expressions in this way relies on a lot of assumptions regarding the content of the <p> tag and the construction of the HTML.
Have a read of the ubiquitous SO question regarding using regular expressions to parse (X)HTML and see @Bruno's answer for a more robust solution.
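
A minimal Python sketch of how that pattern and its capture group might be applied, with the same caveats: it only handles a p element that is the sole content of a td and contains no nested tags, and it does not check that the td sits inside a table. The variable names and the sample string are assumptions for this example.

```python
import re

# The pattern from the answer; group 1 captures the text inside <p>...</p>.
CELL_PARAGRAPH = re.compile(r"<td>[\r\n\s]*<p>([^<]*)</p>[\r\n\s]*</td>")

html = "<p>Hello World!</p><table><tr><td>\n<p>My First HTML Table</p>\n</td></tr></table>"

# Read the captured text ...
match = CELL_PARAGRAPH.search(html)
captured = match.group(1)

# ... or rewrite the cell so that only the text remains.
unwrapped = CELL_PARAGRAPH.sub(r"<td>\1</td>", html)

print(captured)
print(unwrapped)
```

Replacing with "<td></td>" instead of r"<td>\1</td>" would drop the paragraph text along with the tags.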

Possible to some extent, but not reliable!
I would rather suggest looking at an HTML parser such as Html Agility Pack.
