Extract number data from HTML with RobotFramework

I need to extract a number from an HTML page and store it in a variable in my test case.
The problem is that this element has no ID. Here is the HTML code. I want to get the 54 (that number can change, which is why I need to identify the element another way). I tried Get Text using "resultat", but I get "54 ligne(s) trouvée(s)" ("54 line(s) found") when I only want "54":
<div class="tab-interpage">
<div class="resultat">
<b>54</b>
ligne(s) trouvée(s)
</div>
...

There are other ways to locate an element; see the Locating elements section of the SeleniumLibrary documentation.
This may be a case that calls for XPath. I'd expect this one to work, but I can't see the whole DOM, so I can't be 100% sure:
//div[@class="resultat"]/b
combined with the keyword:
${var}=    Get Text    //div[@class="resultat"]/b
If more than one div element has the class "resultat", you may run into problems here. In that case, explore the DOM a bit further and look for another way to reach the element you need.
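Get Text runs in the browser, but you can check the XPath itself offline. The minimal sketch below uses only the Python standard library. It applies the same path to a hypothetical, well-formed copy of the question's snippet. Following the warning about multiple "resultat" divs, it raises an error rather than guess when the match is not unique. PAGE and result_count are illustrative names and are not part of SeleniumLibrary.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed copy of the snippet from the question. ElementTree is an
# XML parser, so real (non-XHTML) pages may need a more forgiving HTML parser.
PAGE = """
<div class="tab-interpage">
  <div class="resultat">
    <b>54</b>
    ligne(s) trouvée(s)
  </div>
</div>
"""


def result_count(markup: str) -> int:
    """Return the integer inside <b> under div.resultat; fail unless exactly one matches."""
    root = ET.fromstring(markup.strip())
    # ElementTree implements only a subset of XPath, so //div[@class="resultat"]/b
    # is written relative to the parsed root element.
    matches = root.findall(".//div[@class='resultat']/b")
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return int(matches[0].text.strip())


print(result_count(PAGE))  # 54
```

Because the number sits in its own <b> element, no string slicing of "ligne(s) trouvée(s)" is needed.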
I think this would be much more readable if the HTML elements had proper identifying attributes, such as:
a form with a class attribute
unique IDs (these usually work best)

Related

R Selenium - Difficulty Extracting Data from Complex Table

I'm trying to web-scrape some soccer data. I'm able to loop through all of the necessary web pages, but I'm having trouble getting the data I need from each page. I think the tables that hold the data are generated by some form of JavaScript, which makes it difficult.
I'm trying to get the goal times for each team from the following website:
http://www.scoreboard.com/uk/match/arsenal-west-brom-2014-2015/AyTNt38e/#match-summary|match-statistics;0|lineups;1
but I can't seem to distinguish between goals/cards/other events that are present. Can anyone help me, or is this simply a lost cause on this website?
My code to get the time of the first event (goal/card/other) is:
library("RSelenium")
startServer()  # older RSelenium API; newer releases use rsDriver() instead
mybrowser <- remoteDriver()
mybrowser$open()
mybrowser$navigate("http://www.scoreboard.com/uk/match/arsenal-west-brom-2014-2015/AyTNt38e/#match-summary|match-statistics;0|lineups;1")
# all .time-box elements on the page; the first one is the first event of any kind
x <- mybrowser$findElements(using = 'css selector', ".time-box")
x[[1]]$getElementText()

You need to pick a specific parent element that contains all of the elements you want and nothing else. In this case, "#summary-content div.time-box" works as the CSS selector.
If you want the event type (goal, card, and so on), use the CSS selector "#summary-content div.icon-box" and then look at the other class on each DIV element: soccer-ball for a goal, y-card for a yellow card, and so on. For example,
<div class="icon-box soccer-ball">
That should be enough to get you started. You should be able to get the rest of them yourself.
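To make the "other class" step concrete, here is a minimal Python sketch using the standard library's html.parser on a hypothetical static fragment. It maps only the two class names the answer mentions (soccer-ball, y-card) and reports any other class as-is. For brevity it does not enforce the #summary-content scoping. In RSelenium you would still locate the elements with the CSS selector and read each element's class attribute.

```python
from html.parser import HTMLParser

# Class-to-event mapping taken from the answer; unknown classes are reported as-is.
EVENT_TYPES = {"soccer-ball": "goal", "y-card": "yellow card"}


class IconBoxCollector(HTMLParser):
    """Record the event type of every <div class="icon-box ..."> in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.events = []  # event names, in the order the divs appear

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "icon-box" not in classes:
            return
        # The class besides icon-box identifies the kind of event.
        extra = [c for c in classes if c != "icon-box"]
        kind = extra[0] if extra else "unknown"
        self.events.append(EVENT_TYPES.get(kind, kind))


# Hypothetical fragment; the real page is rendered by JavaScript and has more markup.
SAMPLE = '<div class="icon-box soccer-ball"></div><div class="icon-box y-card"></div>'

collector = IconBoxCollector()
collector.feed(SAMPLE)
print(collector.events)  # ['goal', 'yellow card']
```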

Ambiguous match, found 2 elements matching css, how to get to the second one?

I am running into Capybara's Ambiguous match error, and the page provides no IDs to tell the elements apart.
I am using the within function.
within('.tile.tile-animation.animation-left.animation-visible.animated') do
#some code in here
end
I've used the :match option which solved my first problem.
within('.tile.tile-animation.animation-left.animation-visible.animated', :match => :first) do
#some code in here
end
How do I get to the SECOND element matching the CSS selector '.tile.tile-animation.animation-left.animation-visible.animated'?

It depends on the HTML; a simple solution is
within(all('.tile.tile-animation.animation-left.animation-visible.animated')[1]) do
# some code in here
end
This scopes to the second matching element on the page. However, it won't auto-reload if the page changes, and it won't wait for the elements to appear. If you need it to wait for at least two elements to appear, you can do
within(all('.tile.tile-animation.animation-left.animation-visible.animated', minimum: 2)[1]) do
# some code in here
end
This waits some time for at least two elements to appear on the page, but it still won't auto-reload if the page changes. If you need auto-reload on a dynamically changing page, you must be able to write a unique selector for the element (rather than indexing into the results of #all).
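Leaving Capybara's API aside, the wait-then-index pattern in the second snippet can be sketched in a library-agnostic way. In this hypothetical Python helper, find_all stands in for whatever returns the current list of matches; it is not a Capybara API. The timeout and polling interval defaults are arbitrary.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def nth_match(find_all: Callable[[], List[T]], index: int,
              timeout: float = 2.0, interval: float = 0.05) -> T:
    """Poll find_all() until it yields more than `index` results, then return results[index].

    Roughly like all(selector, minimum: index + 1)[index]: it waits for enough matches,
    but the returned item is a snapshot and is not re-resolved if the page changes.
    """
    deadline = time.monotonic() + timeout
    while True:
        matches = find_all()
        if len(matches) > index:
            return matches[index]
        if time.monotonic() >= deadline:
            raise LookupError(
                f"expected at least {index + 1} matches, found {len(matches)}")
        time.sleep(interval)


# Simulated page where the second tile shows up on the second poll.
polls = iter([["tile 1"], ["tile 1", "tile 2"]])
print(nth_match(lambda: next(polls), 1))  # tile 2
```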

using the chrome console to select out data

I'm looking to pull out all of the companies from this page (https://angel.co/finder#AL_claimed=true&AL_LocationTag=1849&render_tags=1) in plain text. I saw someone use the Chrome Developer Tools console to do this and was wondering if anyone could point me in the right direction?
TL;DR: How do I use the Chrome console to select and pull out some data from a URL?

Note: since jQuery is available on this page, I'll just go ahead and use it.
First, we need to select the elements we want, e.g. the company names. These are kept in the list with ID startups_content, inside elements with class items, in a field with class name. A selector for them can therefore look like this:
$('#startups_content .items .name a')
This gives us a set of HTMLElements wrapped in a jQuery object. Since we want plain text, we need to extract the text from these elements:
.map(function(idx, item){ return $(item).text(); }).toArray()
This gives us an array of company names. Finally, let's turn it into a single plain-text list:
.join('\n')
Connecting all the steps above we get:
$('#startups_content .items .name a').map(function(idx, item){ return $(item).text(); }).toArray().join('\n');
which should be executed in the DevTools console.
If you need other data, e.g. company URLs, follow the same steps and adjust the selector and extraction accordingly.
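Outside the browser, the same three steps (select, map to text, join) can be sketched in Python with the standard library's html.parser. This is a hypothetical illustration with three assumptions. The sample markup is invented to match the structure described in the answer. The descendant matching is hand-rolled for this one selector rather than a general CSS engine. A JavaScript-rendered page like this one would still need its rendered HTML, which is what makes the DevTools console convenient.

```python
from html.parser import HTMLParser

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# '#startups_content .items .name a', written as ancestor tests from outermost in.
ANCESTORS = [("id", "startups_content"), ("class", "items"), ("class", "name")]


class LinkTextCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.stack = []      # (tag, id, classes) for every open element
        self.texts = []      # collected link texts, in document order
        self._chunks = None  # text pieces of the <a> being read, or None

    def _matches_ancestors(self) -> bool:
        # Descendant combinators only need the steps to appear in order among
        # the open elements, so a greedy outermost-first scan is enough.
        step = 0
        for _tag, el_id, classes in self.stack:
            kind, value = ANCESTORS[step]
            hit = (el_id == value) if kind == "id" else (value in classes)
            if hit:
                step += 1
                if step == len(ANCESTORS):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self._matches_ancestors():
            self._chunks = []
        if tag not in VOID:
            self.stack.append((tag, attrs.get("id"), (attrs.get("class") or "").split()))

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            self.texts.append("".join(self._chunks).strip())
            self._chunks = None
        # Pop back to the matching open tag (tolerates some unclosed elements).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


# Hypothetical markup with the structure the answer describes.
SAMPLE = """
<ul id="startups_content">
  <li class="items"><div class="name"><a href="#">Acme</a></div></li>
  <li class="items"><div class="name"><a href="#">Globex</a></div></li>
</ul>
<a href="#">Not a company</a>
"""

parser = LinkTextCollector()
parser.feed(SAMPLE)
print("\n".join(parser.texts))  # the equivalent of .join('\n')
```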

CSS Attribute Selector - Which one is faster?

Which one is preferred in terms of performance?
a[href*="op.ExtSite.com/p"]
a[href*="shop.ExtSite.com/page"]
a[href^="http://shop.ExtSite.com/page"]
a[href^="http://shop.ExtSite.com/page"][href$=".html"]
Update
The last selector should have been written as follows:
a[href^="http://shop.E"][href$=".html"]
Also, for this compound selector, which condition is checked first: the left one or the right one?

My guess is either this one
a[href^="http://shop.ExtSite.com/page"]
or
a[href^="http://shop.ExtSite.com/page"][href$=".html"]
because it starts matching from the beginning of the string, so any link that does not start with h is ruled out immediately.
UPDATE
If you need to check the full pattern, go with this one:
a[href^="http://shop.ExtSite.com/page.html"]
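For reference, here is what each operator in the question matches, written as plain Python string predicates. This is a sketch of the matching semantics only. It includes the Selectors rule that an empty value matches nothing. It says nothing about how fast a given browser evaluates these selectors, and the function names are illustrative.

```python
def attr_contains(value: str, s: str) -> bool:
    """[attr*="s"]: s occurs anywhere in the attribute value (empty s matches nothing)."""
    return s != "" and s in value


def attr_starts_with(value: str, s: str) -> bool:
    """[attr^="s"]: the attribute value begins with s (empty s matches nothing)."""
    return s != "" and value.startswith(s)


def attr_ends_with(value: str, s: str) -> bool:
    """[attr$="s"]: the attribute value ends with s (empty s matches nothing)."""
    return s != "" and value.endswith(s)


def matches_last_selector(href: str) -> bool:
    # a[href^="http://shop.E"][href$=".html"]: a compound selector is a logical AND,
    # so both conditions must hold. The Selectors spec defines what matches, not the
    # order in which an engine tests the two conditions.
    return attr_starts_with(href, "http://shop.E") and attr_ends_with(href, ".html")


for href in ["http://shop.ExtSite.com/page1.html",
             "https://shop.ExtSite.com/page1.html",
             "http://shop.ExtSite.com/page1.php"]:
    print(href, matches_last_selector(href))
```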

Alternative to contains in cssSelector? Selenium WebDriver

I am using Selenium 2 (WebDriver).
I am locating a button and clicking it with this script:
driver.findElement(By.cssSelector("button:contains('Run Query')"));
or
driver.findElement(By.cssSelector("css=.gwt-Button:contains('Run Query')"))
whose HTML looks like this:
<button type="button" class="gwt-Button" id="ext-gen362">Run Query</button>
As the id is dynamically generated, I can't make use of the ID.
Is there any way to use cssSelector with something like contains?

You can't do this with CSS selectors, because there is no such thing as :contains() in CSS. It was a proposal that was abandoned years ago.
If you want to select by the element text, you'll have to use an XPath selector. Something like
driver.findElement(By.xpath("//button[contains(., 'Run Query')]"))
or
driver.findElement(By.xpath("//*[contains(concat(' ', @class, ' '), ' gwt-Button ') and contains(., 'Run Query')]"))
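The concat(' ', @class, ' ') part of the second XPath is easy to misread. Here is the same whole-word class test as a small Python sketch. The function names are illustrative. In XPath, wrapping @class in normalize-space() handles the tab/newline case that the padding trick alone misses.

```python
def has_class(class_attr: str, name: str) -> bool:
    # Same idea as contains(concat(' ', @class, ' '), ' gwt-Button ') in XPath:
    # padding both strings with spaces forces a whole-token match, so "gwt-Button"
    # does not match inside "gwt-ButtonBar". Assumes classes are separated by
    # single spaces (no tabs or newlines).
    return f" {name} " in f" {class_attr} "


def has_class_tokens(class_attr: str, name: str) -> bool:
    # Whitespace-robust variant: split on any whitespace, roughly like CSS .name.
    return name in class_attr.split()


print(has_class("gwt-Button", "gwt-Button"))     # True
print(has_class("gwt-ButtonBar", "gwt-Button"))  # False
```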

Another option is to use jQuery, if it's present on the page, something like:
WebElement webElement = (WebElement) ((JavascriptExecutor) driver).executeScript("return jQuery(\"button:contains('Run Query')\")[0];");

CSS alone will not get you what you need, because you cannot filter by text. You could either use JavaScript to get the ID of the element, or loop through all the buttons in your code until you find the one with the right text. In Python, for example:
# find_elements_by_css_selector() was deprecated in Selenium 4 and later removed
from selenium.webdriver.common.by import By
[btn for btn in browser.find_elements(By.CSS_SELECTOR, 'button') if 'Run Query' in btn.text]
You could easily generalize this and make a helper function, too.
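Following that suggestion, here is one possible helper. It is deliberately library-agnostic: it accepts any objects with a .text attribute, which Selenium WebElements have. first_with_text and its exact flag are illustrative names, not Selenium API. Note that each .text read on a real WebElement is a separate WebDriver command.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Protocol


class HasText(Protocol):
    """Anything with a .text attribute, e.g. a Selenium WebElement."""
    text: str


def first_with_text(elements: Iterable[HasText], needle: str,
                    exact: bool = False) -> Optional[HasText]:
    """Return the first element whose text contains needle (or equals it if exact)."""
    for element in elements:
        text = element.text.strip()
        if (text == needle) if exact else (needle in text):
            return element
    return None


# Stand-ins for WebElements; with Selenium 4 you would pass something like
# driver.find_elements(By.CSS_SELECTOR, "button") instead.
buttons = [SimpleNamespace(text="Cancel"), SimpleNamespace(text="Run Query")]
print(first_with_text(buttons, "Run Query").text)  # Run Query
```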

I'm in the same boat. I'm currently using XPath selectors with "contains" to find elements with specific text content. Some are <td> and some are <td><a>, deep within large tables (the column is known, but the row isn't known in advance). It's very slow (4 to 5 seconds just to find such a table entry with Firefox 20), so I was hoping CSS would be faster. Often the text stands on its own (complete). Other times it is a filename at the end of a path, and I'd like to ignore the path.

Does anyone have suggestions for the fastest XPath search pattern, given a known column, an unknown row, and text that may be in a <td> or a <td><a> (sometimes in the same table)? In most cases the text I'm looking for is complete (not at the end of other text). Would an equality comparison be much faster than contains() there? I think there's a "starts with" lookup, but is there an "ends with" lookup? I know that using an "id" would be even faster, but unfortunately this HTML doesn't have any IDs here, and they can't be added.

I'm looking for the <tr> containing this text, so I can locate another element in the same row and get its text or click on a link. It doesn't hurt to locate a small subset of the rows and check their text, but I'd like to avoid doing separate searches for <td> and <td><a> if possible.

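You cannot use contains, but you can use a wildcard attribute selector instead. Note that CSS wildcards (*=, ^=, $=) match attribute values, not the element's text:
driver.findElement(By.cssSelector("button[class*='gwt-Button']"));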

// C#: needs "using System.Linq;" and "using OpenQA.Selenium;"
driver.FindElements(By.CssSelector("button.gwt-Button")).Where(webElement => webElement.Text.Contains("Run Query"))
