I have the HTML below, from which I want to extract the text "Extracted Text" inside the last tag using an XPath or CSS selector. The text "value" inside the second tag changes every time, and we have stored it in a variable. I want to write code that parses the HTML below and extracts the text.
<div>
<div>value</div>
<div class="a">
<div>
<div>Extracted Text</div>
</div>
</div>
</div>
I tried the following code:
response.xpath('//div[div="variable"]//div/div/text()')
but it didn't work. Please help.
This XPath does what you want:
'//div[text()="value"]/following-sibling::div/div/div/text()'
Tested on the command line:
xmllint --html --xpath '//div[text()="value"]/following-sibling::div/div/div/text()' test.html
Extracted Text
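Since the value lives in a variable, the expression has to be built from it; the attempt in the question passes the literal string "variable" instead. A minimal sketch in Python, assuming a hypothetical helper named build_xpath (not part of Scrapy) and a value that never contains a double quote:

```python
def build_xpath(value: str) -> str:
    """Return the XPath from the answer above with `value` filled in.

    `build_xpath` is a hypothetical helper name, not a Scrapy API.
    """
    # XPath 1.0 string literals have no escape syntax, so this simple
    # sketch rejects values that contain a double quote.
    if '"' in value:
        raise ValueError("value must not contain a double quote")
    return f'//div[text()="{value}"]/following-sibling::div/div/div/text()'


variable = "value"  # the changing text stored earlier
expression = build_xpath(variable)
print(expression)
# In a Scrapy callback this would be used as: response.xpath(expression).get()
```

Building the string in one place keeps the quoting rule next to the expression it protects.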
How can I get the text "Quotes to Scrape" from the following element by class name, using Scrapy in Python?
<div class="col-md-8">
<h1>
Quotes to Scrape
</h1>
</div>
Thanks for your time.
Here is a reasonable list of selectors for both CSS and XPath.
The element has no class, but you can get the text like this:
response.css('h1 a::text').get()
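To check the expected result without a running spider, the same extraction can be sketched with Python's standard html.parser module instead of Scrapy. This is a swapped-in technique, not the Scrapy selector itself, and the class name H1TextCollector is made up for this illustration:

```python
from html.parser import HTMLParser

SNIPPET = """<div class="col-md-8">
<h1>
Quotes to Scrape
</h1>
</div>"""


class H1TextCollector(HTMLParser):
    """Collect every piece of text that appears inside an <h1> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while the parser is inside an <h1>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


collector = H1TextCollector()
collector.feed(SNIPPET)
collector.close()
heading = "".join(collector.parts).strip()
print(heading)
```

Because the collector keeps every text node below the h1, it also picks up text nested in a link inside the heading, which is the case the h1 a::text selector targets.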
Is it possible to add R code chunks inside an HTML tag?
For example:
<div class="cards-list">
<div class="stat-card">
<div class="stat-icon right"><span class="bg-ontrack"></span></div>
<h3>Impressions</h3>
<h4 class="stat-num"><span class="txt-ontrack">{r code}</span></h4>
</div>
</div>
I need to add an R code chunk inside an HTML tag, here the <span> tag, so that the class is applied to the output of the R code chunk.
Is this possible?
I'm struggling with this simple code.
<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>
and I want to get this working:
xpath = "//*[contains(@id, 'post_message_') and not(contains(@class,'quote'))]"
but this fails. I tried another query as well, but I'm not sure what I'm doing wrong.
EDIT
I found this code works:
xpath = "//*[contains(@id,'post_message_')]//div[not(contains(@class,'quote'))]"
but it doesn't select the desired text when there's no quote subclass in the html.
The idea is to get all text from all subnodes also but not from those restricted.
Try this XPath:
//div[contains(@id,'post_message_')]/text() | //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text()
The first part, //div[contains(@id,'post_message_')]/text(), gives the text directly under the parent div, i.e. <div id="post_message_975824" class="alt3">.
The second part, //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text(), gives the text of each child node, unless that child's class attribute contains quote.
The result on your example is:
the text to get
ABR
text to get
Why not just remove all the nodes you don't want?
library(xml2)
doc <- read_xml('<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>')
# Remove the quote blocks in place, then read the text that is left
xml_remove(xml_find_all(doc, ".//div[@class='quote']"))
xml_text(doc)
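The idea of taking all text except what sits under the unwanted nodes can also be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration under assumptions: the snippet is parsed as XML, and the function name text_without_quotes is made up here. The sketch skips quote nodes rather than removing them, because ElementTree stores the text that follows an element as that element's tail, so removing the quote div would also drop "the text to get".

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>"""


def text_without_quotes(node):
    """Return the text under `node`, skipping any subtree with class "quote"."""
    parts = [node.text or ""]
    for child in node:
        # Descend only into children that do not carry the "quote" class.
        if "quote" not in child.get("class", "").split():
            parts.append(text_without_quotes(child))
        # The tail is the text after the child's closing tag; it belongs
        # to the parent's content, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SNIPPET)
result = " ".join(text_without_quotes(root).split())  # collapse whitespace
print(result)
```

The function recurses, so quote blocks nested deeper in the tree are skipped as well, and input without any quote block is returned unchanged.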
I have the following html structure:
<div class="decorator">
<div class="EC_MyICHP_Item">
<div class="text">
<h3><a target="_blank" title="" href="#"></a></h3>
text here text here text here text here text here text here
</div>
</div>
<div class="EC_MyICHP_Item">
<div class="text">
<h3><a target="_blank" title="" href="#"></a></h3>
text here text here text here text here text here text here
</div>
</div>
<div class="EC_MyICHP_Item">
<div class="text">
<h3><a target="_blank" title="" href="#"></a></h3>
text here text here text here text here text here text here
</div>
</div>
<div class="readmore"><a></a></div>
</div>
I am trying to select the LAST EC_MyICHP_Item by using last-child, but in vain (with both CSS and jQuery). Could you help me?
Thanks.
You need to use :last-child at the end of the selector.
div.EC_MyICHP_Item:last-child
http://reference.sitepoint.com/css/pseudoclass-lastchild
Example: http://jsfiddle.net/jasongennaro/zrufd/
Please note: this will not work in earlier versions of IE.
EDIT
Regarding the comment about the last div being added and interfering: you're right. It does cause :last-child to fail, at least in Chrome, where I tested it.
If your HTML structure remains the same, that is, always three div.EC_MyICHP_Item, you could do this
.EC_MyICHP_Item:nth-child(3)
Updated Example: http://jsfiddle.net/jasongennaro/zrufd/1/
EDIT #2
unfortunately the number of EC_MyICHP_Item div's varies
In that case, I would use jQuery:
$('.EC_MyICHP_Item:last')
Further updated example: http://jsfiddle.net/zrufd/2/
.EC_MyICHP_Item:last-child should work well.
It's important to realize that E:last-child means "an E element, which is a last child", not "last child of E element".
I added the ID merely for testing purposes.
But use :last.
http://jsfiddle.net/aDbcW/1/
.EC_MyICHP_Item:last-child only works if that div is the last child of its parent div. Since div.readmore is the last child of the parent, your selector won't match. .EC_MyICHP_Item:last-of-type is the usual alternative, but note that :last-of-type compares element types rather than classes, so it only helps when the trailing element is a different tag; here div.readmore is also a div.
Edit: in jQuery you can use .EC_MyICHP_Item:last.
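The difference between "last child" and "last element with the class" can be made concrete with a small Python sketch using the standard xml.etree.ElementTree module. It is only an illustration of the selection logic, not of how browsers or jQuery implement it; the markup is a shortened version of the structure in the question, and the placeholder texts first, second and third are assumptions added for the demonstration.

```python
import xml.etree.ElementTree as ET

MARKUP = """<div class="decorator">
  <div class="EC_MyICHP_Item"><div class="text">first</div></div>
  <div class="EC_MyICHP_Item"><div class="text">second</div></div>
  <div class="EC_MyICHP_Item"><div class="text">third</div></div>
  <div class="readmore"><a></a></div>
</div>"""


def has_class(element, name):
    """True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


parent = ET.fromstring(MARKUP)
children = list(parent)

# Analogue of .EC_MyICHP_Item:last-child: the last child must carry the class.
last_child = children[-1]
last_child_matches = has_class(last_child, "EC_MyICHP_Item")

# Analogue of jQuery's .EC_MyICHP_Item:last: filter by class, then take the last.
items = [child for child in children if has_class(child, "EC_MyICHP_Item")]
last_item = items[-1]

print(last_child_matches)
print(last_item.find("div").text)
```

Filtering first and indexing second is what makes the result independent of the trailing div.readmore and of how many items there are.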
How can I quickly remove every inline style attribute (style="property: value") from all tags in a long source file?
For example:
<p style="border:2px red solid">some text</p>
<span style="background:red">some text</span>
to
<p>some text</p>
<span>some text</span>
Assuming your editor supports regular expression find and replace (and if it doesn't, get a new editor):
Find with this regular expression: \s?style="[^"]*"
Replace with: nothing!
Note that this will not catch instances where your code is malformed, as shown in your example (missing double quote at the end of the first style).
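If no editor with regular expression support is at hand, the same find and replace can be sketched in Python with the standard re module. The variable names are made up for this example, and the pattern is the one given above, so it shares the same limits: it does not match style values in single quotes or attributes with a missing closing quote.

```python
import re

# Same pattern as above: an optional leading whitespace character,
# then style="..." with no double quote inside the value.
STYLE_ATTR = re.compile(r'\s?style="[^"]*"')

source = (
    '<p style="border:2px red solid">some text</p>\n'
    '<span style="background:red">some text</span>'
)

# Replace every match with nothing.
cleaned = STYLE_ATTR.sub("", source)
print(cleaned)
```

For anything beyond a quick cleanup, an HTML parser is usually a safer tool than a regular expression, since the pattern would also match style="..." text that appears outside of a tag.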