get text inside elements from class name using scrapy

How can I get the text "Quotes to Scrape" from the following element by its class name, using Scrapy in Python?
<div class="col-md-8">
<h1>
Quotes to Scrape
</h1>
</div>
Thanks for your time.

Scrapy accepts both CSS and XPath selectors, and either kind can reach the heading through its parent's class.

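The h1 element has no class, but its parent div does, so select the h1 through the div and strip the surrounding newlines:
response.css('div.col-md-8 h1::text').get().strip()
or, with XPath:
response.xpath('//div[@class="col-md-8"]/h1/text()').get().strip()
The selector response.css('h1 a::text').get() only works if the heading text is wrapped in an <a> link inside the h1; with the markup shown above it returns None.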
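To check the selector logic without a Scrapy project, here is a minimal stdlib sketch using xml.etree.ElementTree, which understands a small XPath subset including exact [@class='...'] predicates. It is a stand-in for Scrapy's own selectors, not Scrapy itself, and it assumes the snippet is well-formed XML; the wrapping <body> element and the function name heading_text are illustrative additions.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The question's markup, wrapped in a <body> so the div is not the root element.
SNIPPET = """<body>
<div class="col-md-8">
<h1>
Quotes to Scrape
</h1>
</div>
</body>"""


def heading_text(markup: str, div_class: str = "col-md-8") -> Optional[str]:
    """Return the stripped text of the <h1> inside <div class=div_class>, or None.

    Roughly the ElementTree counterpart of
    //div[@class="col-md-8"]/h1/text() followed by .strip().
    """
    root = ET.fromstring(markup)
    h1 = root.find(f".//div[@class='{div_class}']/h1")
    if h1 is None or h1.text is None:
        return None
    return h1.text.strip()


print(heading_text(SNIPPET))  # Quotes to Scrape
```

Unlike Scrapy's CSS selector div.col-md-8, which matches col-md-8 as one class among several, the [@class='...'] predicate compares the whole attribute string, so it would miss something like <div class="col-md-8 main">.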

Related

BeautifulSoup find class contains some specific words

I have searched for how to find a class whose name contains a certain word, but I haven't found anything. I want to get the information from the classes whose names contain the word footer.
<div class="footerinfo">
<span class="footerinfo__header">
</span>
</div>
<div class="footer">
<div class="w-container container-footer">
</div>
</div>
I have tried this, but it still doesn't work:
soup.find_all('div',class_='^footer^'):
and
soup.find_all('div',class_='footer*'):
Does anyone have an idea how to do this?

You can use CSS selectors, which let you select elements based on the content of particular attributes. This includes the *= ("contains") attribute selector:
for ele in soup.select('div[class*="footer"]'):
    print(ele)
or a regular expression:
import re
regex = re.compile('.*footer.*')
soup.find_all("div", {"class": regex})
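Beautiful Soup's select() evaluates the *= selector for you. To show exactly what that selector tests, here is a hedged stdlib-only sketch built on html.parser.HTMLParser instead of Beautiful Soup: it records every div whose class attribute contains the substring footer anywhere. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

FOOTER_HTML = """<div class="footerinfo">
<span class="footerinfo__header">
</span>
</div>
<div class="footer">
<div class="w-container container-footer">
</div>
</div>"""


class ClassSubstringFinder(HTMLParser):
    """Record the class attribute of every <div> whose class contains `needle`,
    mimicking the CSS attribute selector div[class*="needle"]."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle
        self.matches: list = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        cls = dict(attrs).get("class") or ""
        if self.needle in cls:
            self.matches.append(cls)


def divs_with_class_containing(markup: str, needle: str) -> list:
    finder = ClassSubstringFinder(needle)
    finder.feed(markup)
    finder.close()
    return finder.matches


print(divs_with_class_containing(FOOTER_HTML, "footer"))
# ['footerinfo', 'footer', 'w-container container-footer']
```

Because *= is a plain substring test on the whole attribute value, it also matches w-container container-footer. If you only want classes that begin with footer, the CSS prefix selector div[class^="footer"] checks the start of the attribute string instead.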

Unable to locate element by div class

I'm trying to check that focus is set on an element, locating it by its header class and matching on the header text, but I get the error "unable to locate element". I know the header title, 'My Details' in this example. How can I locate the element using this title?
<div class="attribute-group-header card__header">
<h3 class="attribute-group-title card__header-title">My Details</h3>
</div>
Element should be focused    //div[contains(.,'My Details')]

To locate the h3 in your example code, use this XPath: //h3[contains(text(),'My Details')]
To locate the div that has card__header in its class, use this XPath: //div[contains(@class,'card__header')]
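One thing worth knowing about contains(@class,'card__header'): it is a substring test on the whole class attribute, so without the div restriction it would also match the h3, whose class card__header-title contains card__header. The usual XPath 1.0 idiom for matching a whole class token is contains(concat(' ', normalize-space(@class), ' '), ' card__header '). The hedged sketch below contrasts the two rules in plain Python over xml.etree.ElementTree; it illustrates the matching logic only, not Selenium or Robot Framework, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div class="attribute-group-header card__header">
<h3 class="attribute-group-title card__header-title">My Details</h3>
</div>"""


def substring_match(root, needle):
    """Tags of elements whose class attribute contains needle anywhere,
    like XPath contains(@class, needle)."""
    return [el.tag for el in root.iter() if needle in el.get("class", "")]


def token_match(root, needle):
    """Tags of elements that carry needle as a whole class token, like
    contains(concat(' ', normalize-space(@class), ' '), ' needle ')."""
    return [el.tag for el in root.iter() if needle in el.get("class", "").split()]


root = ET.fromstring(SNIPPET)
print(substring_match(root, "card__header"))  # ['div', 'h3']
print(token_match(root, "card__header"))      # ['div']
```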

It worked with this keyword and XPath. Thank you all for guiding me to the solution:
Element should be enabled    //h3[contains(text(),'${MyLinkText}')]

Unable to get value from xpath

I have the HTML below, from which I want to extract the text "Extracted Text" inside the last tag, using XPath or a CSS selector. The text "value" inside the second tag always changes, and we have stored that value in a variable. So I want to write code that parses the HTML below and extracts the text.
<div>
<div>value</div>
<div class="a">
<div>
<div>Extracted Text</div>
</div>
</div>
</div>
I have tried with below code:
response.xpath('//div[div="variable"]//div/div/text()')
but it didn't work. Please help.

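This XPath does what you want:
'//div[text()="value"]/following-sibling::div/div/div/text()'
In Scrapy you can pass the changing value as an XPath variable instead of hard-coding it, e.g. response.xpath('//div[text()=$val]/following-sibling::div/div/div/text()', val=variable).get()
Tested on the command line: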
xmllint --html --xpath '//div[text()="value"]/following-sibling::div/div/div/text()' test.html
Extracted Text
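Scrapy evaluates the XPath above directly. To make the following-sibling step concrete, here is a hedged stdlib sketch with xml.etree.ElementTree, which has no following-sibling axis, so the function walks each parent's children by hand. It assumes well-formed markup and compares stripped text rather than the exact text() value; the name text_after_label is hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div>
<div>value</div>
<div class="a">
<div>
<div>Extracted Text</div>
</div>
</div>
</div>"""


def text_after_label(markup, label):
    """Emulate //div[text()=label]/following-sibling::div/div/div/text()
    for well-formed markup, returning the first match or None."""
    root = ET.fromstring(markup)
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "div" and (child.text or "").strip() == label:
                # following-sibling::div means any later div sibling, in document order
                for sibling in children[i + 1:]:
                    if sibling.tag != "div":
                        continue
                    node = sibling.find("div/div")
                    if node is not None and node.text:
                        return node.text.strip()
    return None


variable = "value"  # the changing label, stored in a variable as in the question
print(text_after_label(SNIPPET, variable))  # Extracted Text
```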

how to use contains and not contains on different classes in xpath

I'm struggling with this simple code.
<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>
and I want to get this working:
xpath = "//*[contains(@id, 'post_message_') and not(contains(@class,'quote'))]"
but this fails. I tried another query, but I'm not sure what I'm doing wrong.
EDIT
I found this code works:
xpath = "//*[contains(@id,'post_message_')]//div[not(contains(@class,'quote'))]"
but it doesn't select the desired text when there's no quote div in the HTML.
The idea is to get all the text from the child nodes as well, but not from the excluded ones.

Try this XPath:
//div[contains(@id,'post_message_')]/text() | //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text()
The first part of the XPath, //div[contains(@id,'post_message_')]/text(), returns the text nodes directly under the parent div, i.e. <div id="post_message_975824" class="alt3">.
The second part, //div[contains(@id,'post_message_')]/*[not(contains(@class,'quote'))]/text(), returns the text directly under each of its child elements, but only for children whose class attribute does not contain quote (children with no class attribute, like abr, are included).
The result on your example is:
the text to get
ABR
text to get
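ElementTree has no contains() or union operator, but it stores text in a way that maps neatly onto the two halves of that XPath: an element's .text is the text before its first child, and each child's .tail is the text after it, which XPath counts as a text node of the parent. This hedged stdlib sketch (hypothetical name wanted_text, well-formed input assumed) collects the same pieces in document order.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>"""


def wanted_text(markup, excluded="quote"):
    root = ET.fromstring(markup)
    pieces = []
    for div in root.iter("div"):
        if "post_message_" not in div.get("id", ""):
            continue
        pieces.append(div.text)  # parent text before the first child
        for child in div:
            # like */[not(contains(@class, excluded))]/text(): a missing class counts as ""
            if excluded not in child.get("class", ""):
                pieces.append(child.text)
            # text after a child is a text node of the parent, so always keep it
            pieces.append(child.tail)
    # drop whitespace-only nodes, as in the result listed above
    return [p.strip() for p in pieces if p and p.strip()]


print(wanted_text(SNIPPET))  # ['the text to get', 'ABR', 'text to get']
```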

Why not just remove all the nodes you don't want?
library(xml2)
doc <- read_xml('<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>')
# Delete the unwanted quote divs from the document
xml_remove(xml_find_all(doc, ".//div[@class='quote']"))
# Read the text that is left
xml_text(doc)
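The same remove-then-read idea can be sketched in Python with xml.etree.ElementTree, with one catch: ElementTree keeps the text that follows an element in that element's .tail, so a plain parent.remove(child) would drop "the text to get" along with the quote div. The hypothetical text_without function below moves the tail onto the previous sibling or the parent first; it assumes well-formed input and an exact class match, as in the R version.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div id="post_message_975824" class="alt3">
<div class="quote">
some unwanted text
</div>
the text to get <abr>ABR</abr> text to get
</div>"""


def text_without(markup: str, excluded_class: str = "quote") -> str:
    """Remove every element whose class equals excluded_class, then return
    the remaining text with whitespace collapsed."""
    root = ET.fromstring(markup)
    # Snapshot the tree first so removals don't disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("class") != excluded_class:
                continue
            # Hand the child's tail text to the previous sibling (or the parent),
            # otherwise it would disappear together with the child.
            index = list(parent).index(child)
            if index > 0:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + (child.tail or "")
            else:
                parent.text = (parent.text or "") + (child.tail or "")
            parent.remove(child)
    return " ".join("".join(root.itertext()).split())


print(text_without(SNIPPET))  # the text to get ABR text to get
```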

how to locate the below elements for css or xpath

I have the code below and want to locate the element with CSS or XPath.
<div class="arrowIconDiv" data-dojo-attach-point="arrowIconDiv">
<div class="i-arrow-double" data-dojo-attach-point="NavAreaExpander"></div>
<div class="i-arrow-double-hover" data-dojo-attach-point="NavAreaExpanderHover"></div>
</div>
I tried multiple things but could not locate and click on the element:
(by.css(".arrowIconDiv"))
(by.css('[data-dojo-attach-point="arrowIconDiv"]'))
(by.xpath('.//div[@class="i-arrow-double-hover"][.="NavAreaExpanderHover"]'))
Thanks in advance

NavAreaExpanderHover is the value of the data-dojo-attach-point attribute, not the text content of the div element. So instead of testing . in the XPath, test the attribute:
.//div[@class="i-arrow-double-hover" and @data-dojo-attach-point="NavAreaExpanderHover"]
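Python's xml.etree.ElementTree supports enough XPath to show the difference. This hedged sketch runs the original [.=...] predicate and an attribute-based version against the question's snippet (with the broken closing tag repaired and the outer div closed). ElementTree has no and operator, so the two attribute tests are written as chained predicates, which selects the same element here.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div class="arrowIconDiv" data-dojo-attach-point="arrowIconDiv">
<div class="i-arrow-double" data-dojo-attach-point="NavAreaExpander"></div>
<div class="i-arrow-double-hover" data-dojo-attach-point="NavAreaExpanderHover"></div>
</div>"""

root = ET.fromstring(SNIPPET)

# Original attempt: [.='...'] compares the element's text content, which is empty here.
by_text = root.findall(
    ".//div[@class='i-arrow-double-hover'][.='NavAreaExpanderHover']"
)

# Fix: test the attribute value instead, chaining predicates in place of "and".
by_attr = root.findall(
    ".//div[@class='i-arrow-double-hover'][@data-dojo-attach-point='NavAreaExpanderHover']"
)

print(len(by_text), len(by_attr))  # 0 1
```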
