R- html_nodes doesnt find selector - css

I wanted to scrap some data with "rvest" package from url http://www.finanzen.ch/kurse/historisch/ABB/SWL/1.1.2001_27.10.2015
I wanted to get the table with the following selector (copied via inspect option from chrome):
#historic-price-list > div > div.content > table
But html_nodes doesn't work:
> url="http://www.finanzen.ch/kurse/historisch/ABB/SWL/1.1.2001_27.10.2015"
> css_selector="#historic-price-list > div > div.content > table"
> html(url) %>% html_nodes(css_selector)
{xml_nodeset (0)}
What I can find is:
> css_selector="#historic-price-list"
> html(url) %>% html_nodes(css_selector)
{xml_nodeset (1)}
[1] <div id="historic-price-list"/>
But it doesn't goes any further.
Maybe someone got an idea why?

Related

Can't Access Static Drop-down Inside an Iframe

I want to click on a dropdown inside an iframe and select some options, but i keep getting this error in cypress
cypress-iframe commands can only be applied to exactly one iframe at a time. Instead found 2
Please see my code below
`
it('Publish and Lock Results', function(){
setClasses.clickTools()
setClasses.clickConfiguration()
setClasses.publishAndLockResult()
cy.frameLoaded();
cy.wait(5000)
cy.iframe().find("body > div:nth-child(6) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > div:nth-child(5) > p:nth-child(2)").click()
})
```**Please see screenshot of the element**
[![enter image description here](https://i.stack.imgur.com/WpUfn.png)](https://i.stack.imgur.com/WpUfn.png)
There should be an iframe identifier that you can use (anything else for the iframe, a unique id, class or attribute, like data-test)
let iframe = cy.get(iframeIdentifier).its('0.contentDocument.body').should('not.be.empty').then(cy.wrap)
iframe.find("body > div:nth-child(6) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > table:nth-child(5) > tbody:nth-child(1) > tr:nth-child(1) > td:nth-child(2) > div:nth-child(5) > p:nth-child(2)").click()
From what the output is suggesting, you've got two iframes on that page that it's trying to send commands to, and you have to pick one
Answer adapted from source: cypress.io blog post on working with iframes

R rvest keeping italics in text when scraping

I'm looking to scrape some message from an online message board.
Currently I am using:
html_nodes(conv,'.talk-post.message') %>%
html_text(trim = TRUE)
For the message:
I'm back now and slowly getting back to speed.
This gives:
"\nI'm back now and slowly getting back to speed.\n"
Which works fine, but removes all html formatting. I would like to retain an indication of where the text has italics tags (similarly for underlining and bold).
I appreciate I could use toString.XMLNode instead, but then that keeps all html tags, not just the three required.
"{xml_nodeset (1)}\n[1] <div class=\"talk-post message\">\\n<p><i>I'm back now and slowly getting back to speed.</i><br>
Are there any more elegant solutions to this?
You can use the XML library for get all the string in the div.
> library(XML)
> txtNode <- "<div><i>Hello</i></div><div><b>World</b></div><div><b><i>!</i></b></div>"
> html <- htmlParse(txtNode)
> html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div><i>Hello</i></div>
<div><b>World</b></div>
<div><b><i>!</i></b></div>
</body></html>
>
> lNode <- getNodeSet(html, "//div")
> lNode
[[1]]
<div>
<i>Hello</i>
</div>
[[2]]
<div>
<b>World</b>
</div>
[[3]]
<div>
<b>
<i>!</i>
</b>
</div>
attr(,"class")
[1] "XMLNodeSet"
>
> lapply(lNode, function(x) toString.XMLNode(x[[1]]))
[[1]]
[1] "<i>Hello</i> "
[[2]]
[1] "<b>World</b> "
[[3]]
[1] "<b>\n <i>!</i>\n</b> "

Scraping users video in Twitter with rvest

I'm using rvest to scrape some web static elements in the web. However, I could not scrape dynamic content . For example, how to scrape audience count (44K) in the following video post?
I tried this:
library(rvest)
video_tweet = html("https://twitter.com/estrellagalicia/status/993432910584659968")
video_tweet %>%
html_nodes("#permalink-overlay #permalink-overlay-dialog div #permalink-overlay-body div div div div div div div div div div div div span div div div div span span") %>% as.character()
You need to use Rselenium and you should pick the right css for it.This should work:
library(RSelenium)
library(rvest)
rmDr <- rsDriver(browser = "chrome")
myclient <- rmDr$client
video_tweet = "https://twitter.com/estrellagalicia/status/993432910584659968"
myclient$navigate(video_tweet)
mypagesource <- myclient$getPageSource()
read_html(mypagesource[[1]]) %>%
html_nodes("#permalink-overlay-dialog > div.PermalinkOverlay-content > div > div > div.permalink.light-inline-actions.stream-uncapped.has-replies.original-permalink-page > div.permalink-inner.permalink-tweet-container > div > div.js-tweet-details-fixer.tweet-details-fixer > div.card2.js-media-container.has-autoplayable-media > div.PlayableMedia.LiveBroadcastCard-playerContainer.LiveBroadcastCard--supportsLandscapePresentation.watched.playable-media-loaded > div > div > div > div:nth-child(2) > div.rn-1oszu61.rn-1efd50x.rn-14skgim.rn-rull8r.rn-mm0ijv.rn-13yce4e.rn-fnigne.rn-ndvcnb.rn-gxnn5r.rn-1nlw0im.rn-deolkf.rn-6koalj.rn-1pxmb3b.rn-7vfszb.rn-eqz5dr.rn-1r74h94.rn-1mnahxq.rn-61z16t.rn-p1pxzi.rn-11wrixw.rn-ifefl9.rn-bcqeeo.rn-wk8lta.rn-9aemit.rn-1mdbw0j.rn-gy4na3.rn-u8s1d.rn-1lgpqti > span > div > div.rn-1oszu61.rn-1efd50x.rn-14skgim.rn-rull8r.rn-mm0ijv.rn-13yce4e.rn-fnigne.rn-ndvcnb.rn-gxnn5r.rn-deolkf.rn-6koalj.rn-1pxmb3b.rn-7vfszb.rn-eqz5dr.rn-1mnahxq.rn-61z16t.rn-p1pxzi.rn-11wrixw.rn-ifefl9.rn-bcqeeo.rn-wk8lta.rn-9aemit.rn-1mdbw0j.rn-gy4na3.rn-bnwqim.rn-1lgpqti > div > div > span > span") %>% as.character()

Convert xpath into cssSelector

I am trying to convert xpath
//*[#id='formStartQuote']//div[#class='popular-countries']//ul[1]/li[1]
into cssSelector.
I tried all possible ways say
#formStartQuote > div.popular-countries ul:nth-of-type(1) > li:nth-of-type(1)
#formStartQuote div.popular-countries ul:nth-of-type(1) > li:nth-of-type(1)
#formStartQuote > div.popular-countries > ul:nth-of-type(1) > li:nth-of-type(1)
Nothing worked for me.
Do anyone has a solution for this?

rvest - select href tag string

I am using rvest.
> pgsession %>% jump_to(urls[2]) %>% read_html() %>% html_nodes("a")
{xml_nodeset (114)}
[1] Date
[2] Kennwort ändern
[3] Benutzernamen ändern
[4] Abmelden
...
However, I would only like to get all tags that have the href tag Mitglieder/Detail in it back.
For example a result should look like that :
[1] /Mitglieder/Detail/1213412
...
I tried f.ex.: a[href~=\"Mitglieder\ as css selector, but I get nothing back as a result.
Any suggestions how to change this css selector?
I appreciate your replies!

Resources