I am new to RSelenium. I have been trying to scrape a page that has many pager pages (1,220). R clicks on each pager page and takes information from each one with the following code:
library(XML)  # for htmlParse() and xmlAttrs()

links <- matrix(NA_character_, nrow = 1220 * 32, ncol = 1)  # one row per link, 32 links per pager page
xx <- 1
for (y in 1:1220) {
  # click pager page y (XPath attributes use "@id", not "#id")
  remDr$findElement(using = "xpath", paste0("//*[@id='paginador_pagina_", y, "']/span"))$clickElement()
  doc <- htmlParse(remDr$getPageSource()[[1]])
  for (k in 1:32) {
    v <- paste0("//*[@id='rb_contResultados']/dl[", k, "]/div/a[2]")
    link <- try(doc[v])
    link <- try(toString(xmlAttrs(link[[1]])))
    link <- sub(".*_blank, ", "", link)
    links[xx, 1] <- link
    xx <- xx + 1
  }
  Sys.sleep(runif(1, 3, 10))
}
remDr$closeWindow()
But when R is on page 166 (for example), I get the following error:
Error: Summary: UnexpectedAlertOpen
Detail: A modal dialog was open, blocking this operation
class: org.openqa.selenium.UnhandledAlertException
Why am I getting this error? How can I solve it?
I am new to web scraping and want to scrape data from https://www.forwardpathway.com/us-college-database. I used the following code to extract the data from the table, but the page just kept on loading after I clicked the next button. Can anybody point out what is wrong?
library(RSelenium)
library(tidyverse)
library(netstat)
library(xml2)
library(data.table)
library(rvest)
binman::list_versions("chromedriver")
rs_driver_object<-rsDriver(browser="chrome",
chromever="107.0.5304.62",
verbose=F,
port=free_port())
## create the client
remDr<-rs_driver_object$client
## open the browser
remDr$open()
remDr$navigate("https://www.forwardpathway.com/us-college-database")
## locate the table that stores the data
data_table<-remDr$findElement(using = "id","table_1")
## I tried three different methods to click the next button, but the problem persisted.
## next button method 1
next_button<-remDr$findElement(using = "id",'table_1_next')
next_button$clickElement()
## next button method 2
remDr$executeScript("document.getElementById('table_1_next').click()")
## next button method 3
next_button <- remDr$findElement("id", "table_1_next")
next_button$sendKeysToElement(list(key="enter"))
all_data <- list()
cond <- TRUE
while (cond) {
  data_table_html <- data_table$getPageSource()
  page <- read_html(data_table_html %>% unlist())
  df <- html_table(page) %>% .[[1]]
  all_data <- rbindlist(list(all_data, df))
  Sys.sleep(5)
  tryCatch(
    {
      next_button <- remDr$findElement("id", "table_1_next")
      next_button$sendKeysToElement(list(key = "enter"))
    },
    error = function(e) {
      print("script complete")
      cond <<- FALSE
    }
  )
  if (cond == FALSE) {
    break
  }
}
I'm using the chromote R package and testing it with a Shiny application. I'm trying to click on an icon that should duplicate a few select elements. But all I get is a tooltip when I take a screenshot, and if I open the browser it freezes the R process.
Here is my code:
#' Run shiny in background - based on shinytest source code
#' @export
shiny.bg <- function(path, loadTimeout = 10000, shinyOptions = list()) {
  tempfile_format <- tempfile("%s-", fileext = ".log")
  p <- callr::r_bg(function(path, shinyOptions) {
    do.call(shiny::runApp, c(path, shinyOptions))
  },
  args = list(
    path = normalizePath(path),
    shinyOptions = shinyOptions
  ),
  stdout = sprintf(tempfile_format, "shiny-stdout"),
  stderr = sprintf(tempfile_format, "shiny-stderr"),
  supervise = TRUE
  )
  if (!p$is_alive()) {
    rlang::abort(paste0(
      "Failed to start shiny. Error: ",
      strwrap(readLines(p$get_error_file()))
    ))
  }
  ## Try to read out the port. Try 5 times/sec, until timeout.
  max_i <- loadTimeout / 1000 * 5
  for (i in seq_len(max_i)) {
    err_lines <- readLines(p$get_error_file())
    if (!p$is_alive()) {
      rlang::abort(paste0(
        "Error starting application:\n", paste(err_lines, collapse = "\n")
      ))
    }
    if (any(grepl("Listening on http", err_lines))) break
    Sys.sleep(0.2)
  }
  if (i == max_i) {
    rlang::abort(paste0(
      "Cannot find shiny port number. Error:\n", paste(err_lines, collapse = "\n")
    ))
  }
  line <- err_lines[grepl("Listening on http", err_lines)]
  url <- sub(".*(https?://.*)", "\\1", line)
  list(
    process = p,
    url = url
  )
}
#' Run shiny application and Chromote instance
chromote.shiny <- function() {
  chr <- chromote::ChromoteSession$new()
  app <- shiny.bg('.')
  chr$Page$navigate(app$url)
  chr$Page$loadEventFired()  # block until the page has loaded
  chr$screenshot()
  list(
    chr = chr,
    app = app
  )
}
#' kill browser and R shiny process
cleanUp <- function(obj) {
  obj$chr$Browser$close()
  obj$app$process$kill()
}
#' click on the element
chromote.click <- function(chromote, selector) {
  doc <- chromote$DOM$getDocument()
  node <- chromote$DOM$querySelector(doc$root$nodeId, selector)
  box <- chromote$DOM$getBoxModel(node$nodeId)
  # the content box is a flat list of corner coordinates: x1, y1, x2, y2, ...
  left <- box$model$content[[1]]
  top <- box$model$content[[2]]
  x <- left + (box$model$width / 2)
  y <- top + (box$model$height / 2)
  chromote$Input$dispatchMouseEvent(type = "mousePressed", x = x, y = y, button = "left")
  chromote$Input$dispatchMouseEvent(type = "mouseReleased", x = x, y = y, button = "left")
}
tmp <- chromote.shiny()
chromote.click(tmp$chr, ".clone-pair")
tmp$chr$screenshot()
I have no idea how to debug this, and there is not much information on how to make a click; I found dispatchMouseEvent in an issue in the GitHub repo for chromote.
Link to the repo: https://github.com/rstudio/chromote
The reason I want to use chromote is that I want to create unit/integration tests for my application. shinytest is badly outdated: it uses PhantomJS, which was abandoned years ago (so you need to write very old JavaScript, because otherwise PhantomJS will throw an error and the test will fail), and RSelenium is no longer maintained either.
I had the same issue.
I found a library that uses chromote but provides a number of RSelenium-style functions (GetElement, Click).
install.packages("remotes")
remotes::install_github("rundel/hayalbaz")
I recently noticed that the install.pandoc function in the installr package appears to be broken.
I get the following error message:
trying URL 'https://github.com/'
Content type 'text/html; charset=utf-8' length unknown
downloaded 78 KB
github.com is not compatible with the version of Windows you're running. Check your computer's system information and then contact the software publisher.
It looks like the function is not finding the appropriate file on GitHub. I have submitted a pull request to the installr package on GitHub that corrects this error.
Here is the function that should install Pandoc correctly, as submitted in the pull request, in case you run into this error before the fix is released.
library(installr)
FixedInstall.Pandoc <- function(URL = "https://github.com/jgm/pandoc/releases",
                                use_regex = TRUE, to_restart, ...) {
  URL <- "https://github.com/jgm/pandoc/releases"
  page_with_download_url <- URL
  if (!use_regex)
    warning("use_regex is no longer supported, you can stop using it from now on...")
  page <- readLines(page_with_download_url, warn = FALSE)
  # match the Windows .msi for the current architecture
  sysArch <- Sys.getenv("R_ARCH")
  sysArch <- gsub("/ |/x", "", sysArch)
  pat <- paste0("jgm/pandoc/releases/download/[0-9.]+/pandoc-[0-9.-]+-windows",
                ".*", sysArch, ".*", ".msi")
  target_line <- grep("windows", page, value = TRUE)
  m <- regexpr(pat, target_line)
  URL <- regmatches(target_line, m)
  URL <- head(URL, 1)
  URL <- paste("https://github.com/", URL, sep = "")
  installed <- install.URL(URL, ...)
  if (!installed)
    return(invisible(FALSE))
  if (missing(to_restart)) {
    if (is.windows()) {
      you_should_restart <- "You should restart your computer\n in order for pandoc to work properly"
      winDialog(type = "ok", message = you_should_restart)
      choices <- c("Yes", "No")
      question <- "Do you want to restart your computer now?"
      the_answer <- menu(choices, graphics = TRUE, title = question)
      to_restart <- the_answer == 1L
    } else {
      to_restart <- FALSE
    }
  }
  if (to_restart)
    os.restart()
}
I'm having an issue when using rvest to scrape 466 pages from a wiki. Each page represents a metric that I need further information about. I have the following code, which loops through each link (loaded from a CSV file) and extracts the information I need from an HTML table on each page.
library(rvest)

Metrics <- read.csv("C:\\Users\\me\\Documents\\WebScraping\\LONMetrics.csv")
Metrics$Theme <- as.character(Metrics$Theme)
Metrics$Metric <- as.character(Metrics$Metric)
Metrics$URL <- as.character(Metrics$URL)
n <- nrow(Metrics)
i <- 1
while (i <= n) {
  webPage <- read_html(Metrics$URL[i])
  pageTable <- html_table(webPage)  # note: "webPage", not "webpage"
  Metrics$Definition[i] <- pageTable[[1]]$X2[1]
  Metrics$Category[i] <- pageTable[[1]]$X2[2]
  Metrics$Calculation[i] <- pageTable[[1]]$X2[3]
  Metrics$UOM[i] <- pageTable[[1]]$X2[4]
  Metrics$ExpectedTrend[i] <- pageTable[[1]]$X2[6]
  Metrics$MinTech[i] <- pageTable[[1]]$X2[7]
  i <- i + 1
}
The problem I'm having is that it stops returning data after 32 pages, giving the following error:
Error in read_connection_(x, n) :
Evaluation error: Failure when receiving data from the peer
I'm wondering what the cause may be and how to get around this apparent limitation.
Thanks.
Rob
I am aiming to build a Twitter network by constructing an igraph from Twitter relationships.
After 10 iterations, R gives me this error:
Error in twInterfaceObj$doAPICall(paste("users", "show", sep = "/"), params = params, :
Not Found (HTTP 404).
library(twitteR)

# Grab latest tweets
tweets_galway <- searchTwitter('#galway', n = 100)
# make into df
df <- do.call("rbind", lapply(tweets_galway, as.data.frame))
# extract users from galway hashtag tweets
galway_users <- df$screenName
connectiondf <- data.frame()
for (i in 1:100) {
  name <- getUser(galway_users[i])
  following <- name$getFriends()
  following <- twListToDF(following)
  connect1 <- cbind(follower = galway_users[i], following = following$screenName)
  follower <- name$getFollowers()
  follower <- twListToDF(follower)
  connect2 <- cbind(follower = follower$screenName, following = galway_users[i])
  connection <- rbind(connect1, connect2)
  connectiondf <- rbind(connectiondf, connection)
  print(i)
}
Why am I getting this error? Apologies if this is a silly question, but I am new to R.