This does not really align with Stack Overflow policy, since I am not showing what I have tried, but I have no idea how to even start on this question given my lack of technical expertise. I hope someone can post a solution or at least point me in the right direction.
I want to download all the data from this website:
http://aps.dac.gov.in/APY/Public_Report1.aspx
I need to download all the data, i.e. all seasons * all years * all states * all crops. The long (frustrating!) way is to tick every box by hand and press download.
However, I was wondering whether anyone has a programmatic way to download this data. I would prefer to do this in R because that's the language I understand, but feel free to suggest other programming languages.
Here's a solution using RSelenium to instance a browser and direct it to do your bidding.
library(RSelenium)
driver <- rsDriver()
remDr <- driver[["client"]]
remDr$navigate("http://aps.dac.gov.in/APY/Public_Report1.aspx") #navigate to your page
You basically need to tell the browser to select each button you want to mark, using SelectorGadget to find the unique ID for each, then pass them one-by-one to webElem. Then use the webElem methods to make the page do things.
webElem <- remDr$findElement(using = 'id', value = "TreeViewSeasonn0CheckBox")
webElem$highlightElement() #quick flash as a check we're in the right box
webElem$clickElement() #performs the click
#now do the same for each other box
webElem <- remDr$findElement(using = 'id', value = "TreeView1n0CheckBox")
webElem$highlightElement()
webElem$clickElement()
webElem <- remDr$findElement(using = 'id', value = "TreeView2n0CheckBox")
webElem$highlightElement()
webElem$clickElement()
webElem <- remDr$findElement(using = 'id', value = "TreeViewYearn0CheckBox")
webElem$highlightElement()
webElem$clickElement()
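The four find-and-click pairs above can also be collapsed into a loop. This is only a sketch: the IDs are the ones used above, while `click_by_id` and its `pause` argument are hypothetical names introduced here, and the one-second pause between clicks is an assumption about how long the page needs to refresh.

```r
# Checkbox IDs taken from the steps above
checkbox_ids <- c("TreeViewSeasonn0CheckBox", "TreeView1n0CheckBox",
                  "TreeView2n0CheckBox", "TreeViewYearn0CheckBox")

# Hypothetical helper: find each element by id and click it in turn
click_by_id <- function(remDr, ids, pause = 1) {
  for (id in ids) {
    webElem <- remDr$findElement(using = "id", value = id)
    webElem$clickElement()
    Sys.sleep(pause)  # assumed delay so the page can update between clicks
  }
  invisible(ids)
}

# Usage, once remDr is connected and the page is loaded:
# click_by_id(remDr, checkbox_ids)
```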
Now choose the report form you want and click the download button. Assuming it's Excel format here.
webElem <- remDr$findElement(using = 'id', value = "DdlFormat")
webElem$sendKeysToElement(list("Excel", key = "enter"))
webElem <- remDr$findElement(using = 'id', value = "Button1")
webElem$clickElement() #does the click
For what it's worth, the site timed out on trying to download all the data for me. Your results may vary.
Would appreciate some help as I am stuck here.
I am trying to write an automated script to download data from the Microsoft Power BI site of the WHO, which can be found here.
But when I try to retrieve the data, the right click function doesn't seem to work - or more likely: I am doing something wrong.
I created a container on Docker that I am accessing with Selenium in R. The script below generates a click on the first page (on the "Download data" button in the lower left-hand corner of the screen). After a long load time the next screen appears. The goal is to RIGHT-CLICK on the download button of "Vaccine uptake by target group" x "Data".
Here are two screenshots of where I need to create the first left-click and the second right-click.
I have tried multiple approaches, including first selecting the iframe, switching the frame view and then selecting an xpath pointing to the clickable area. That seemed to work.
But when I give the command to right-click, nothing happens, as verified in the VNC rendition. The contextual menu doesn't appear.
Does anyone know what went wrong?
Here's the code I entered:
library(RSelenium)
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate("https://app.powerbi.com/view?r=eyJrIjoiMWNjNzZkNjctZTNiNy00YmMzLTkxZjQtNmJiZDM2MTYxNzEwIiwidCI6ImY2MTBjMGI3LWJkMjQtNGIzOS04MTBiLTNkYzI4MGFmYjU5MCIsImMiOjh9")
Sys.sleep(30) # WHO is taking its time
#This is first button to bring us to the next page
webElem <- remDr$findElement(using = "xpath", value = "/html/body/div[1]/report-embed/div/div/div[1]/div/div/div/exploration-container/div/docking-container/div/div/div/div/exploration-host/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[22]/transform/div/div[3]/div/visual-modern")
webElem$highlightElement()
webElem$clickElement()
Sys.sleep(30) # again need some time to fully load
# This selects the iframe
webElem <- remDr$findElement(using = "xpath", value = "/html/body/div[1]/report-embed/div/div/div[1]/div/div/div/exploration-container/div/docking-container/div/div/div/div/exploration-host/div/div/exploration/div/explore-canvas/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container[19]/transform/div/div[3]/div/visual-modern/div/iframe")
remDr$switchToFrame(webElem)
# This selects the area for the second click
webElem <- remDr$findElement(using = "xpath", value="/html/body/div/div/a[1]")
remDr$mouseMoveToLocation(webElement = webElem)
# And then the right-click but none of these seem to work:
remDr$click(buttonId = 2)
remDr$click('right')
Thanks for any advice.
I am trying to scrape a page, getting the move list of a game of chess, which is located in the menu on the right, under the "moves" tab.
library(RSelenium)
url <- "https://play.xiangqi.com/game/oX00ly"
rD <- RSelenium::rsDriver(browser = "firefox", check = F)
remDr <- rD$client
remDr$navigate(url = url)
When I click the Moves tab manually in the browser, I can get the desired text via
webElem <- remDr$findElement("css selector", ".Wrapper__MovesTabWrapper-sc-13rqht3-2")
webElem$getElementText()[[1]]
which (correctly) returns
[1] "1\np3+1\nP3+1\n2\ne3+5\nH2+3\n3\nh8+7\nH8+7\n4\nh2+3\nR1+1\n5\nc8=9\nH3+2\n6\nc2+1\nE7+5\n7\nh3+4\nA6+5\n8\nh4+3\nR9=6\n9\nr1=3\nR6+6\n10\nc2+2\nH2+3\n11\nr9=8\nC2=3\n12\nr8+3\nR1=4\n13\nc2-1\nR6=8\n14\nr8+4\nH3+1\n15\ne7+9\nC3+5\n16\ne9-7\nR4+3\n17\nc2=1\nR8=9\n18\nh3-4\nR4=6\n19\nc1=2\nR9-1\n20\nr3=2\nC8+7\n21\ne5-3\nR9=8\n22\nh4-3\nR8+2\n23\nh3-2\nR8+2\n24\ne7+5\nH7+8\n25\nr8-5\nC3+1\n26\nr8+2\nH8+7\n27\np9+1\nH7+5\n28\na6+5\nH5+7\n29\nk5=6\nR6=4\n30\na5+6\nR4+3"
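Once the text is retrieved, it can be reshaped into a table with base R. The sketch below uses only the first three turns of the output above and assumes every turn has exactly three newline-separated fields (turn number, then two moves). The column names are labels chosen here for illustration, not anything the site provides.

```r
# First three turns of the string returned above
moves_text <- "1\np3+1\nP3+1\n2\ne3+5\nH2+3\n3\nh8+7\nH8+7"

# Split on newlines, then fill a 3-column matrix row by row
tokens <- strsplit(moves_text, "\n", fixed = TRUE)[[1]]
moves <- as.data.frame(matrix(tokens, ncol = 3, byrow = TRUE),
                       stringsAsFactors = FALSE)
names(moves) <- c("turn", "first_move", "second_move")  # assumed labels
moves$turn <- as.integer(moves$turn)

print(moves)
```

If a game ends on an unanswered move, the token count is not a multiple of three and `matrix()` will warn, so that case would need padding first.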
Problem
When trying to click the button through RSelenium, by using
webElem <- remDr$findElement("css selector", "#moves-tab")
webElem <-webElem$clickElement() # or webElem$click()
Nothing seems to happen, and I'm at a loss on how to proceed troubleshooting.
Question
How can I switch to the Moves tab by simulating a click (active event listener)?
Bonus pts: is this possible using the rvest package?
Sometimes being too trigger happy is a problem.
Adding
webElem <- webElem$clickElement()
Sys.sleep(2)
solved the problem.
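A fixed two-second pause works here, but a polling wait may be more robust when load times vary. The following is a sketch in base R only: `wait_for`, `timeout` and `interval` are hypothetical names, not part of RSelenium.

```r
# Hypothetical helper: call `condition` until it returns TRUE or time runs out.
# Errors raised by `condition` (e.g. element not found yet) count as "not ready".
wait_for <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(condition()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage after clicking the tab, assuming the selector from the question:
# wait_for(function() {
#   el <- remDr$findElement("css selector", ".Wrapper__MovesTabWrapper-sc-13rqht3-2")
#   nchar(el$getElementText()[[1]]) > 0
# })
```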
I have had my first go at using RSelenium today to scrape data from websites. I can navigate to the data I require via the tabs and drop-down menus (the hard bit?) but am now stuck at the point of extracting the actual data I need (the easy bit!)
My code so far is:
library(RSelenium)
checkForServer()
startServer()
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("https://www.whoscored.com/Teams/31")
webElem1 <- remDr$findElement(value = '//a[@href = "#team-squad-stats-detailed"]')
webElem1$clickElement()
webElem2 <- remDr$findElement("id", "category")
webElem2$clickElement()
webElem2$sendKeysToElement(list(key="down_arrow", key="down_arrow", key="down_arrow",
key="down_arrow", key="down_arrow", key="enter"))
webElem3 <- remDr$findElement("id", "subcategory")
webElem3$clickElement()
webElem3$sendKeysToElement(list(key="down_arrow", key="enter"))
webElem4 <- remDr$findElement("id", "statsAccumulationType")
webElem4$clickElement()
webElem4$sendKeysToElement(list(key="down_arrow", key="down_arrow", key="down_arrow",
key="enter"))
webElem5 <- remDr$findElement("id", "player-table-statistics-body")
Can someone advise the simplest way to now extract the data in this player table into csv form please? I am used to using the XML package and readHTMLTable to scrape other (static) websites but I am stuck on how to combine this with my RSelenium steps above.
Thank you
EDIT - having come back to this with fresh eyes the answer I have found is below:
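library(XML) # provides readHTMLTable
webElem5 <- remDr$findElement(using = "id", value = "statistics-table-detailed")
# take the rendered table markup and parse it like a static HTML table
webElem5txt <- webElem5$getElementAttribute("outerHTML")[[1]]
table <- readHTMLTable(webElem5txt, header=TRUE, as.data.frame=TRUE)[[1]]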
This allows me to proceed with what I need on this part of the website.
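Since the original goal was CSV output, the last step is just `write.csv` on the resulting data frame. The sketch below uses a small stand-in data frame so it runs without a browser. Its column names and values are placeholders, not real data from the site, and the file name is arbitrary.

```r
# Stand-in for the data frame returned by readHTMLTable (placeholder values)
player_table <- data.frame(Player = c("Player A", "Player B"),
                           Apps = c("10", "12"),
                           stringsAsFactors = FALSE)

# Write to a temporary location; replace with your own path
csv_path <- file.path(tempdir(), "player_stats.csv")
write.csv(player_table, csv_path, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(csv_path, stringsAsFactors = FALSE)
print(check)
```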
If I may, I would like to ask for help with another part of the same site. I navigate to the data I need as follows:
remDr$navigate("https://www.whoscored.com/Matches/959894")
webElem1 <- remDr$findElement(using = "link text", value = "Match Centre")
webElem1$clickElement()
webElem2 <- remDr$findElement(value = '//a[@href = "#chalkboard"]')
webElem2$clickElement()
The data I would like to extract is in these boxes, but as the HTML doesn't say they are built as tables I don't really know how to proceed.
I'm trying to download a spreadsheet from this website using RSelenium. The first code I made was:
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://observatorios.dieese.org.br/ws/tabela/porto-alegre/bairros/numero-de-estabelecimentos-formais-por-grande-setor-de-atividade-economica")
remDr$executeScript("return baixarArquivo(1)")
And it works! But I want to download the entire data set (i.e. all years), so I need to check the years checkboxes (Filtros -> Anos). I can do this in 2 ways:
Selecting the checkbox 'Ano(s)', which automatically selects all years
Selecting all years manually
I've tried both ways, but neither worked. The 'best' result I got was:
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://observatorios.dieese.org.br/ws/tabela/porto-alegre/bairros/numero-de-estabelecimentos-formais-por-grande-setor-de-atividade-economica")
webElem <- remDr$findElement(using = 'id', value = 'anos')
remDr$executeScript("visualizar('filtros', true)")
remDr$executeScript("visualizarAnos()")
chkbox <- remDr$findElement(using = 'xpath', "//input[@name='inputAno'][@type='checkbox']")
chkbox$clickElement()
remDr$executeScript("return submeter()")
remDr$executeScript("return baixarArquivo(1)")
But this unchecks the first year (2012). (It's my best result because it's the only one that does anything :( )
So, the question is: how can I solve this problem?
In your best-result attempt, you are trying to get all the checkboxes within anos but are calling findElement. That is why only 2012 is being clicked: findElement returns only the first element it finds that satisfies your xpath, //input[@name='inputAno'][@type='checkbox'].
You could fix your solution by using findElements like so:
sapply(
  remDr$findElements(using = 'xpath', "//input[@name='inputAno'][@type='checkbox']"),
  function(element) element$clickElement()
)
Alternatively, you could search for the select all checkbox with a css selector and click just that:
selectAll <- remDr$findElement(using = 'css selector', '#anos > #alternar')
selectAll$clickElement()
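One caveat with clicking every checkbox: a click toggles the box, so a year that is already ticked (as 2012 appears to be in the question) would be unticked. A hedged sketch that clicks only unticked boxes follows. `check_all` is a hypothetical helper, and it assumes each element exposes `isElementSelected()` and `clickElement()` as RSelenium web elements do.

```r
# Hypothetical helper: tick every checkbox that is not already ticked.
# Returns (invisibly) how many boxes were clicked.
check_all <- function(elements) {
  clicked <- 0L
  for (el in elements) {
    if (!isTRUE(el$isElementSelected()[[1]])) {
      el$clickElement()
      clicked <- clicked + 1L
    }
  }
  invisible(clicked)
}

# Usage with a live session:
# boxes <- remDr$findElements(using = "xpath",
#                             "//input[@name='inputAno'][@type='checkbox']")
# check_all(boxes)
```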
As with any problem I post on Stack Overflow, I think I have tried everything first. This is a learning experience for me on how to work with JavaScript and XML, so I'm guessing my problem lies there.
My question is: how do I get the results of clicking on the parcel number links, which are JavaScript links? I've tried getting the xpath of the link and using the $click method, which followed my intuition, but this wasn't right or is at least not working for me.
Firefox 26.0
R 3.0.2
require(relenium)
library(XML)
library(stringr)
initializing_parcel_number <- "00000000000"
firefox <- firefoxClass$new()
firefox$get("http://www.muni.org/pw/public.html")
inputElement <- firefox$findElementByXPath("/html/body/form[2]/table/tbody/tr[2]/td/table[1]/tbody/tr[3]/td[4]/input[1]")
inputElement$sendKeys(initializing_parcel_number)
inputElement$sendKeys(key = "ENTER")
##xpath to the first link. Or is it?
first_link <- "/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a"
##How I'm trying to click the thing.
linkElement <- firefox$findElementByXPath("/html/body/table/tbody/tr[2]/td/table[5]/tbody/tr[2]/td[1]/a")
linkElement$click()
You can do this using RSelenium; see http://johndharrison.github.io/RSelenium/. Disclaimer: I am the author of the RSelenium package. Basic vignettes on its operation are available: "RSelenium basics" and "RSelenium: Testing Shiny apps".
If you are unsure which element is selected, you can use the highlightElement utility method of the webElement class; see the commented-out code below.
The element click event won't work in this case. You need to simulate a click using JavaScript:
require(RSelenium)
# RSelenium::startServer() # if needed
initializing_parcel_number <- "00000000000"
remDr <- remoteDriver()
remDr$open()
remDr$navigate("http://www.muni.org/pw/public.html")
webElem <- remDr$findElement(using = "name", "PAR1")
# webElem$highlightElement() # to visually check what element is selected
webElem$sendKeysToElement(list(initializing_parcel_number, key = "enter"))
# get first link containing javascript:getParcel
webElem <- remDr$findElement(using = "css selector", '[href*="javascript:getParcel"]')
# webElem$highlightElement() # to visually check what element is selected
# send a webElement as an argument.
remDr$executeScript("arguments[0].click();", list(webElem))
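The code above clicks only the first matching link. If every parcel link on the results page is needed, one option is to collect their href values first with findElements. This is a sketch: `collect_hrefs` is a hypothetical helper, and the CSS selector is the one used above.

```r
# Hypothetical helper: return the href of every element matching a CSS selector
collect_hrefs <- function(remDr, css = '[href*="javascript:getParcel"]') {
  links <- remDr$findElements(using = "css selector", value = css)
  vapply(links,
         function(el) el$getElementAttribute("href")[[1]],
         character(1))
}

# Usage with a live session:
# hrefs <- collect_hrefs(remDr)
# length(hrefs)  # number of parcel links found on the page
```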