Scrapy Only Returns Empty Arrays

So I start my scrapy shell with
scrapy shell 'https://www.amazon.com/s?k=tomatoes&ref=nb_sb_noss_1'
And I am trying to scrape the title of the products so I enter
response.xpath('//span[@class="a-size-base-plus a-color-base a-text-normal"]').getall()
and get: []
And when I tried it in CSS with
response.css("span.a-size-base-plus a-color-base a-text-normal").getall()
I still get: []
I don't understand why it isn't finding the element even though I am copying and pasting the tags and classes from the site.
I have also tried just writing a-size-base-plus for the classes in XPath and CSS but I still get nothing

You need to replace the spaces between the classes with dots if you want to match multiple classes on elements:
response.css("span.a-size-base-plus.a-color-base.a-text-normal").getall()
instead of:
response.css("span.a-size-base-plus a-color-base a-text-normal").getall()
Your selector with spaces says something like: Give me all a-text-normal elements inside a-color-base elements inside span elements with a class of a-size-base-plus. This is of course not what you want.
I've also had to set a user agent to get the correct results. See this answer on how to set the user agent using scrapy shell.
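To make the difference concrete, here is a minimal standard-library sketch of what a compound class selector such as span.a.b.c asks for: a span whose class attribute contains all of the listed class tokens. This is not how Scrapy evaluates selectors internally; the SpanClassMatcher class, the match_spans helper and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class SpanClassMatcher(HTMLParser):
    """Collect the text of <span> elements that carry ALL required classes."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.capturing = False  # True while inside a matching <span>
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated list of tokens.
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "span" and self.required <= classes:
            self.capturing = True
            self.matches.append("")

    def handle_data(self, data):
        if self.capturing:
            self.matches[-1] += data

    def handle_endtag(self, tag):
        # Simplification: nested spans are not handled in this sketch.
        if tag == "span":
            self.capturing = False


def match_spans(html, *required_classes):
    """Rough equivalent of the CSS selector span.c1.c2... on `html`."""
    parser = SpanClassMatcher(required_classes)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical markup, loosely modelled on the classes from the question.
SAMPLE = (
    '<span class="a-size-base-plus a-color-base a-text-normal">Roma Tomatoes</span>'
    '<span class="a-size-base-plus">Other</span>'
)

titles = match_spans(SAMPLE, "a-size-base-plus", "a-color-base", "a-text-normal")
print(titles)
```

Only the first span carries all three classes, so only its text is collected. Asking for a-size-base-plus alone would match both spans.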

Related

Extract number data from HTML with RobotFramework

I need to extract a number from an HTML page and convert it into a variable in my test case.
The problem is that the element has no ID of its own. Here is the HTML; I want to get the 54 (the number can change, which is why I need another way to identify it). I tried Get Text with "resultat", but that returns "54 ligne(s) trouvée(s)" and I only want "54":
<div class="tab-interpage">
<div class="resultat">
<b>54</b>
ligne(s) trouvée(s)
</div>
...

You have other options how to locate an element, see Locating elements section in Selenium Library.
This might be a situation that requires XPath. I can imagine this one works (but I don't see the whole DOM, so I can't be 100% sure):
//div[@class="resultat"]/b
combined with the keyword:
${var}=    Get Text    //div[@class="resultat"]/b
Obviously if there're more div elements with class "resultat", you might run into problems here. In this case, explore the DOM a bit more and see what are some other ways you can get to the element you need.
I think it'd be much more readable if the HTML elements had proper attributes like:
form with class attribute
unique ids usually work best
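The XPath suggested above can be checked outside the browser. The following is a rough sketch using Python's standard xml.etree.ElementTree, which supports only a limited XPath subset. The PAGE string is an assumed, well-formed version of the snippet from the question, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the HTML snippet from the question.
PAGE = """
<div class="tab-interpage">
  <div class="resultat">
    <b>54</b>
    ligne(s) trouvée(s)
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# './/' searches all descendants; the predicate matches the class attribute
# exactly, and '/b' steps down to the <b> child that holds only the number.
bold = root.find(".//div[@class='resultat']/b")
count = int(bold.text)
print(count)
```

Because the path ends at the b element, the trailing "ligne(s) trouvée(s)" text of the parent div is never included.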

Get Element Attribute in robot framework

Hi, how do I use Get Element Attribute in Robot Framework? The documentation says:
Return value of element attribute.
attribute_locator consists of element locator followed by an @ sign and attribute name, for example element_id@class.
I have xpath=${check_radio_xpath}@class. Is this the right way?
Here ${check_radio_xpath} = md-radio-11
I get this error:
${ischecked} = Selenium2Library . Get Element Attribute xpath=${check_radio_xpath}@class
Documentation:
Return value of element attribute.
TRACE Arguments: [ 'xpath=md-radio-11@class' ]
DEBUG Finished Request
FAIL ValueError: Element 'xpath=md-radio-11' not found.

I think you're pretty close. Please try to format your question better, I took a quick shot because your question is difficult to read. The result will be more and better help from the community
${RADIO_XPATH}    //*[@id="${check_radio_xpath}"]
${CLASS}=    Selenium2Library.Get Element Attribute    ${RADIO_XPATH}@class

A sample for this markup: <div><label for="foo"></label></div>
${for_value}=    Get Element Attribute    xpath=//div/label    for
Log To Console    ${for_value}
console result is:
foo

This snippet works for me:
Get Line Numbers And Verify
    ${line_number1}=    Get Element Attribute    //*[@id="file-keywords-txt-L1"]    data-line-number
    Log To Console    ${line_number1}
    ${line_number2}=    Get Element Attribute    //*[@id="file-keywords-txt-L2"]    data-line-number
    Log To Console    ${line_number2}
    Verify in order of ${line_number1} and ${line_number2} is true
What was important is that the spaces/tabs between the keywords and their arguments are correct; otherwise a keyword is not recognised as such.

Thanks a lot. I wanted to check the meta noindex content in the page source, so I used this:
${content}    Get Element Attribute    xpath=//meta[@name="robots"]@content
Should Be Equal As Strings    ${content}    noindex,follow

You can use either an XPath or a CSS selector if you have SeleniumLibrary:
${title}=    Get Element Attribute    ${xpath}    attribute=title

CSS selector to choose second item after a hyphen

The CSS selector [attr|=value] is designed to select items which are exactly "value" or which begin with "value-". This was originally intended to allow selection of all languages regardless of dialect, such as "en-au", "en-ca", "en-gb", "en-us".
What I'm looking for is a selector for an item which is exactly "value", which includes "-value-" or which ends with "-value". In my case, I am not concerned with language codes at all.
This page claims that there is a =| operator:
[data-value=|"foo"] {
/* Attribute value has this in a dash-separated list somewhere */
}
However I have been unable to get this to work. If I'm just interested in a controlled list of 2-item terms, then this will work for me:
[attr*=-value][attr$=value]
However, this would also return items like "xx-valuevalue", so the result is not perfect.
My question is: is there another way to write a CSS selector that will select all items that have a given string as one item in a hyphen-delimited list?

Here you go:
[attr*=-value-], [attr$=-value], [attr=value]
In pseudo code:
Get those containing -value-, those ending with -value, and those exactly equal to value.
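The selector list above can be mirrored in a few lines of Python to check which attribute values it accepts. This is only an illustrative sketch; matches_selector_list is a hypothetical helper, not part of any CSS library.

```python
def matches_selector_list(attr_value, item):
    """Mirror of [attr*=-item-], [attr$=-item], [attr=item]."""
    return (
        f"-{item}-" in attr_value          # [attr*=-item-]  contains -item-
        or attr_value.endswith(f"-{item}") # [attr$=-item]   ends with -item
        or attr_value == item              # [attr=item]     exactly item
    )


for candidate in ["value", "xx-value", "en-value-us", "xx-valuevalue", "value-xx"]:
    print(candidate, matches_selector_list(candidate, "value"))
```

Unlike the [attr*=-value][attr$=value] attempt from the question, "xx-valuevalue" is rejected. A value that only begins with "value-" (such as "value-xx") is not matched by these three selectors; that case is what [attr|=value] covers, as described in the question.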

using the chrome console to select out data

I'm looking to pull out all of the companies from this page (https://angel.co/finder#AL_claimed=true&AL_LocationTag=1849&render_tags=1) in plain text. I saw someone use the Chrome Developer Tools console to do this and was wondering if anyone could point me in the right direction?
TLDR; How do I use Chrome console to select and pull out some data from a URL?

Note: since jQuery is available on this page, I'll just go ahead and use it.
First of all, we need to select elements that we want, e.g. names of the companies. These are being kept on the list with ID startups_content, inside elements with class items in a field with class name. Therefore, selector for these can look like this:
$('#startups_content .items .name a')
As a result, we get a collection of HTMLElements. Since we want plain text, we extract it from these elements with:
.map(function(idx, item){ return $(item).text(); }).toArray()
This gives us an array of company names. Let's join it into a single plain-text list:
.join('\n')
Connecting all the steps above we get:
$('#startups_content .items .name a').map(function(idx, item){ return $(item).text(); }).toArray().join('\n');
which should be executed in the DevTools console.
If you need some other data, e.g. company URLs, just follow the same steps as described above doing appropriate changes.

Alternative of contains in cssSelector ? Selenium WebDriver

I am using selenium 2 (WebDriver).
I am locating a button and clicking by the script:
driver.findElement(By.cssSelector("button:contains('Run Query')"));
or
driver.findElement(By.cssSelector("css=.gwt-Button:contains('Run Query')"))
whose html is like :
<button type="button" class="gwt-Button" id="ext-gen362">Run Query</button>
As the id is dynamically generated, I can't make use of the ID.
Is there any way to use cssSelector with something like contains ? Is this possible?

You can't do this with CSS selectors, because there is no such thing as :contains() in CSS. It was a proposal that was abandoned years ago.
If you want to select by the element text, you'll have to use an XPath selector. Something like
driver.findElement(By.xpath("//button[contains(., 'Run Query')]"))
or
driver.findElement(By.xpath("//*[contains(concat(' ', @class, ' '), ' gwt-Button ') and contains(., 'Run Query')]"))

Another option is using jQuery, if it's present on the page, something like:
var webElement = ((JavascriptExecutor)driver).executeScript("return jQuery('button:contains(Run Query)')");

CSS alone will not get you what you need; you cannot filter by the text. You could either use js to get the id of the element, or loop through all the buttons in your code until you find the one with the right text. If this were in python:
[btn for btn in browser.find_elements_by_css_selector('button')
if 'Run Query' in btn.text]
You could easily generalize this and make a helper function, too.
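One possible shape for such a helper is sketched below. It only assumes that each element exposes a .text attribute, as Selenium's WebElement does. The find_by_text name and the stand-in buttons are hypothetical, so the sketch runs without a browser.

```python
from types import SimpleNamespace


def find_by_text(elements, text, exact=False):
    """Return the elements whose text contains (or, if exact, equals) `text`."""
    if exact:
        return [el for el in elements if el.text.strip() == text]
    return [el for el in elements if text in el.text]


# Stand-ins for the elements a driver would return; with Selenium you would
# pass the list of buttons found by the CSS selector 'button' instead.
buttons = [
    SimpleNamespace(text="Cancel"),
    SimpleNamespace(text=" Run Query "),
    SimpleNamespace(text="Run Query Again"),
]

print([b.text for b in find_by_text(buttons, "Run Query")])
print([b.text for b in find_by_text(buttons, "Run Query", exact=True)])
```

The exact flag separates a button labelled exactly "Run Query" from one whose label merely contains it.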

I'm in the same boat, currently using XPath selectors with "contains" to find elements with specific text content. Some are <td> and some are <td><a> deep within large tables (specific columns, but row unknown in advance). It's very slow (4 to 5 seconds just to find such a table entry with Firefox 20), so I was hoping to use CSS to be faster. Often the text will be by itself (complete) and other times it will be a filename at the end of a path I'd like to ignore. Does anyone have suggestions for the fastest XPath search pattern, given that it's a known column but unknown row, and may be a <td> or <td><a> (sometimes in the same table). Would an equality comparison be much faster than contains(), for the majority of cases where the text I'm looking for is complete (not at the end of other text)? I think there's a "starts with" lookup, but is there an "ends with" lookup? I know that using an "id" would be even faster, but unfortunately this HTML doesn't have any IDs here, and they can't be added. I'm looking to find the <tr> containing this text so I can locate another element in the same row and get its text or click on a link. It doesn't hurt to locate a small subset of the rows and check their text, but I'd like to avoid doing separate searches for <td> and <td><a> if that's possible.

You cannot use contains but use a wild card instead.
driver.findElement(By.cssSelector("button:(*'Run Query'*)"));

driver.findElement("#ext-gen362").Where(webElement => webElement.Text.Contains("Run Query"))
