RSelenium can't find element with given parameters - r

I'm using RSelenium to construct a data frame with information about managers. I'm having trouble selecting an element in a drop-down list.
My code is the following:
> require(RSelenium)
> remDr<-remoteDriver(browserName = "chrome")
> remDr$open()
> enlace<-'https://www.sisben.gov.co/atencion-al-ciudadano/Paginas/Directorio-administradores.aspx'
> remDr$navigate(enlace)
> remDr$findElement(using = "xpath", '//*[@id="ddlDepartamento"]/option[2]')$clickElement()
With the last line, I obtain the following error:
Selenium message:no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ddlMunicipio"]"}
(Session info: chrome=58.0.3029.110)
(Driver info: chromedriver=2.27.440174 (e97a722caafc2d3a8b807ee115bfb307f7d2cfd9),platform=Windows NT 10.0.14393 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 140 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/no_such_element.html
Build info: version: '3.2.0', revision: '8c03df6', time: '2017-03-02 09:34:51 -0800'
System info: host: 'PATY-FRAN', ip: '192.168.0.20', os.name: 'Windows 10', os.arch: 'x86', os.version: '10.0', java.version: '1.8.0_121'
Driver info: org.openqa.selenium.chrome.ChromeDriver
Capabilities [{applicationCacheEnabled=false, rotatable=false, mobileEmulationEnabled=false, networkConnectionEnabled=false, chrome={chromedriverVersion=2.27.440174 (e97a722caafc2d3a8b807ee115bfb307f7d2cfd9), userDataDir=C:\Users\victor\AppData\Local\Temp\scoped_dir6076_6551}, takesHeapSnapshot=true, pageLoadStrategy=normal, databaseEnabled=false, handlesAlerts=true, hasTouchScreen=false, version=58.0.3029.110, platform=XP, browserConnectionEnabled=false, nativeEvents=true, acceptSslCerts=true, locationContextEnabled=true, webStorageEnabled=true, browserName=chrome, takesScreenshot=true, javascriptEnabled=true, cssSelectorsEnabled=true, unexpectedAlertBehaviour=}]
Session ID: 0815f4f9dcca9d364a7c15b4a50352e7
*** Element info: {Using=xpath, value=//*[@id="ddlMunicipio"]}
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
I will be grateful for your help.

The content is in an iframe. You need to switch to the iframe first:
library(RSelenium)
rD <- rsDriver()
remDr <- rD$client
enlace<-'https://www.sisben.gov.co/atencion-al-ciudadano/Paginas/Directorio-administradores.aspx'
remDr$navigate(enlace)
# content is in iframe
frames <- remDr$findElements("css", "iframe")
# switch to first iframe
remDr$switchToFrame(frames[[1]])
selectElem <- remDr$findElement("id", "ddlDepartamento")
selectOpt <- selectElem$selectTag()
> selectOpt$text
[1] "AMAZONAS" "ANTIOQUIA" "ARAUCA" "ATLANTICO"
[5] "BOGOTÁ D.C." "BOLIVAR" "BOYACA" "CALDAS"
[9] "CAQUETA" "CASANARE" "CAUCA" "CESAR"
[13] "CHOCO" "CORDOBA" "CUNDINAMARCA" "GUAINIA"
[17] "GUAJIRA" "GUAVIARE" "HUILA" "MAGDALENA"
[21] "META" "NARIÑO" "NORTE DE SANTANDER" "PUTUMAYO"
[25] "QUINDIO" "RISARALDA" "SAN ANDRES" "SANTANDER"
[29] "SUCRE" "TOLIMA" "VALLE" "VAUPES"
[33] "VICHADA"
# click 2nd one
selectOpt$elements[[2]]$clickElement()
....
....
rm(rD)
gc()
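Selecting the option by its visible text is often more robust than a positional option[2] XPath, since the option order can change. A minimal base-R sketch (option_index is my own helper name, not part of RSelenium; the commented usage assumes the selectOpt object from above and a live session):

```r
# Find the 1-based index of an option by its visible text.
# `texts` is the vector/list of option labels (e.g. selectOpt$text above).
option_index <- function(texts, target) {
  idx <- which(unlist(texts) == target)
  if (length(idx) == 0) stop("option not found: ", target)
  idx[[1]]
}

# Hypothetical usage with a live session:
# i <- option_index(selectOpt$text, "ANTIOQUIA")
# selectOpt$elements[[i]]$clickElement()
```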

Related

Webscraping with Selenium: Page stuck loading even with Timeout

I'm having trouble getting some dynamic web pages to render before I scrape their CSS elements.
I'm trying RSelenium on Airbnb, scraping one example listing in San Francisco. On Airbnb, clicking a listing opens a new window with the details of that listing. I can't get this details page to show.
I'm hosting a Selenium server via Docker with the standalone-firefox:2.53.0 image.
The R script:
library(RSelenium)
url<- "https://www.airbnb.com/s/san-francisco/homes?adults=1"
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L
)
remDr$open()
remDr$setTimeout(type = "page load", milliseconds = 30000)
remDr$setImplicitWaitTimeout(milliseconds = 10000)
#otherwise too fast need to wait for the page to load.
remDr$navigate(url)
#remDr$navigate(paste0(urls[[1]]))
#listings <- remDr$findElements(using = "css selector",'._8s3ctt')
remDr$screenshot(display=T)
remDr$findElements(using = "css selector",'._8s3ctt')[[1]]$clickElement()
id <- remDr$getWindowHandles()
remDr$switchToWindow(id[[2]][1])
price_night <- remDr$findElements(using="css selector","._tyxjp1")
descrpt <- remDr$findElements(using="css selector","._tqmy57")
parking <- remDr$findElements(using="css selector","._6c4wvw")
No matter how many milliseconds I set in remDr$setTimeout, the details page does not show. Calling remDr$screenshot(display=TRUE) yields a screenshot (not reproduced here) of a partially loaded page, which suggests the page failed to load fully before I started finding the CSS elements I'm trying to scrape.
An excerpt of the log on the Selenium server is attached:
19:30:05.490 INFO - Executing: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]])
19:30:05.491 INFO - Creating a new session for Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]
19:30:06.810 INFO - Done: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]]
19:30:18.650 INFO - Executing: [page load wait: 30000])
19:30:18.655 INFO - Done: [page load wait: 30000]
19:30:20.084 INFO - Executing: [implicitly wait: 10000])
19:30:20.089 INFO - Done: [implicitly wait: 10000]
19:30:26.267 INFO - Executing: [delete session: 5da351c5-bd0e-4a95-a357-c049b71ed680])
19:30:26.377 INFO - Done: [delete session: 5da351c5-bd0e-4a95-a357-c049b71ed680]
19:30:58.763 INFO - Executing: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]])
19:30:58.764 INFO - Creating a new session for Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]
19:30:59.913 INFO - Done: [new session: Capabilities [{nativeEvents=true, browserName=firefox, javascriptEnabled=true, version=, platform=ANY}]]
19:31:00.922 INFO - Executing: [page load wait: 30000])
19:31:00.927 INFO - Done: [page load wait: 30000]
19:31:01.675 INFO - Executing: [implicitly wait: 10000])
19:31:01.680 INFO - Done: [implicitly wait: 10000]
19:31:03.800 INFO - Executing: [get: https://www.airbnb.com/s/san-francisco/homes?adults=1])
19:31:05.301 INFO - Done: [get: https://www.airbnb.com/s/san-francisco/homes?adults=1]
19:31:10.700 INFO - Executing: [find elements: By.cssSelector: ._8s3ctt])
19:31:10.741 INFO - Done: [find elements: By.cssSelector: ._8s3ctt]
19:31:10.806 INFO - Executing: [click: 0 [[FirefoxDriver: firefox on LINUX (1741f648-be7b-48e0-96c9-0c1d2e14a498)] -> css selector: ._8s3ctt]])
19:31:10.962 INFO - Done: [click: 0 [[FirefoxDriver: firefox on LINUX (1741f648-be7b-48e0-96c9-0c1d2e14a498)] -> css selector: ._8s3ctt]]
19:31:14.947 INFO - Executing: [get window handles])
19:31:14.950 INFO - Done: [get window handles]
19:31:15.860 INFO - Executing: [switch to window: {679b54f5-ec42-4ba6-8939-cb7b0d40a7b9}])
19:31:15.864 INFO - Done: [switch to window: {679b54f5-ec42-4ba6-8939-cb7b0d40a7b9}]
19:31:20.580 INFO - Executing: [find elements: By.cssSelector: ._tyxjp1])
19:31:30.592 INFO - Done: [find elements: By.cssSelector: ._tyxjp1]
19:31:44.099 INFO - Executing: [take screenshot])
19:31:44.145 INFO - Done: [take screenshot]
I don't see anything wrong on the server side, but I might be wrong.
Is the timeout really implemented? If not, is there another way that I can get the page to fully load before I scrape?
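One common workaround when implicit waits are not enough for dynamically rendered content is to poll explicitly until a condition holds. A minimal sketch in base R (wait_for is my own helper, not part of RSelenium; the commented usage assumes the remDr object and selector from above):

```r
# Poll a predicate until it returns TRUE or the timeout (seconds) expires.
wait_for <- function(predicate, timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(predicate()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Hypothetical usage with a live session: wait until at least one
# price element is present before scraping.
# wait_for(function() {
#   length(remDr$findElements(using = "css selector", "._tyxjp1")) > 0
# }, timeout = 30)
```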

Unable to launch Chromium edge browser using codeceptjs

I'm trying to launch the Chromium Edge browser using the codeceptjs framework. I'm new to codeceptjs. My scripts run fine with the Chrome and Firefox browsers, but I get the following error with Chromium Edge.
Error: Can't connect to WebDriver.
Error: Failed to create session.
Unable to create session from {
  "desiredCapabilities": {
    "browserName": "edge",
    "ms:edgeChromium": true,
    "platformName": "windows",
    "browserVersion": "88.0.705.81"
  },
  "capabilities": {
    "firstMatch": [
      {
        "browserName": "edge",
        "browserVersion": "88.0.705.81",
        "ms:edgeChromium": true,
        "platformName": "windows"
      }
    ]
  }
}
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:25:53'
System info: host: 'XXX', ip: 'YYY', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0',
java.version: '1.8.0_231'
Driver info: driver.version: unknown
Please make sure Selenium Server (ChromeDriver or PhantomJS) is running and accessible
Can someone please help me here?
I think the browser name should be MicrosoftEdge. At least it works that way using Selenoid.

Sending mail with blastula through a secured smtp server fails (r)

I'm trying to send an email using the R package blastula.
The email should be sent through my employer's secure SMTP server, but I am stuck with the error "No Kerberos credentials available".
A similar setup works in Python, but I would like to do it from R, as it fits my workflow better.
The R code used to send the mail is shown here.
library(blastula)
email <- prepare_test_message()
to <- "receiver_address@gmail.com"
from <- "sender_address@domain.com"
create_smtp_creds_file(
  file = "cred_file",
  user = "username",
  host = "smtps.server.com",
  port = 465,
  use_ssl = TRUE
)
#> Please enter password in TK window (Alt+Tab)
#> The SMTP credentials file (`cred_file`) has been generated
smtp_send(email, to, from,
subject = "Hello",
credentials = creds_file(file = "cred_file"),
verbose = TRUE)
#> Error in curl::curl_fetch_memory(url, handle = h): Failure when receiving data from the peer
Created on 2019-12-10 by the reprex package (v0.3.0)
The verbose output from the smtp_send command is given here:
* Rebuilt URL to: smtps://smtps.server.com:465/
* Trying xx.xx.xx.xx...
* TCP_NODELAY set
* Connected to smtps.server.com (xx.xx.xx.xx) port 465 (#1)
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* Server certificate:
* subject: XXXXXXXX
* start date: Apr 20 00:00:00 2017 GMT
* expire date: Apr 23 12:00:00 2020 GMT
* subjectAltName: host "smtps.server.com" matched cert's "smtps.server.com"
* issuer: C=NL; ST=Noord-Holland; L=Amsterdam; O=TERENA; CN=TERENA SSL CA 3
* SSL certificate verify ok.
< 220 mail.server.com Microsoft ESMTP MAIL Service ready at Tue, 10 Dec 2019 12:49:44 +0100
> EHLO henrik-HP-EliteBook-840-G5
< 250-mail.sdu.dk Hello [xx.xx.xx.xx]
< 250-SIZE 62914560
< 250-PIPELINING
< 250-DSN
< 250-ENHANCEDSTATUSCODES
< 250-STARTTLS
< 250-AUTH GSSAPI NTLM LOGIN
< 250-8BITMIME
< 250-BINARYMIME
< 250 CHUNKING
> AUTH GSSAPI
< 334 GSSAPI supported
* gss_init_sec_context() failed: No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_501).
* Closing connection 1
Error in curl::curl_fetch_memory(url, handle = h) :
Failure when receiving data from the peer
And output from sessionInfo()
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] blastula_0.3.1.9000
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 compiler_3.6.1 prettyunits_1.0.2 remotes_2.1.0 tools_3.6.1 getPass_0.2-2
[7] testthat_2.3.1 digest_0.6.23 pkgbuild_1.0.6 uuid_0.1-2 pkgload_1.0.2 jsonlite_1.6
[13] memoise_1.1.0 rlang_0.4.2 cli_2.0.0 rstudioapi_0.10 commonmark_1.7 curl_4.3
[19] yaml_2.2.0 xfun_0.11 withr_2.1.2 stringr_1.4.0 knitr_1.26 fs_1.3.1
[25] desc_1.2.0 devtools_2.2.1 rprojroot_1.3-2 glue_1.3.1 R6_2.4.1 processx_3.4.1
[31] fansi_0.4.0 sessioninfo_1.1.1 callr_3.4.0 magrittr_1.5 usethis_1.5.1 backports_1.1.5
[37] ps_1.3.0 ellipsis_0.3.0 htmltools_0.4.0 assertthat_0.2.1 stringi_1.4.3 crayon_1.3.4
This looks identical to an error I received while trying to use SendGrid as the SMTP service. The issue was not with blastula, but with the fact that I had not verified a "Single Sender Identity".
In the SendGrid UI you will find "Sender Authentication" under Settings. Verify the address that you will use as the from address in your blastula message. Once you have clicked the verification link in the email that arrives in your inbox, send again from blastula. The message should then be delivered without issues.

load .yml file in R

I am trying to use config.yml in R, but whenever I load the file using config::get I get a warning.
My config file looks like this:
default:
  dataconnection:
    driver: 'ODBC Driver 11 for SQL Server'
    server: 'server'
    uid: 'Username'
    pwd: 'password'
    port: 1433
    database: 'Data_Science'
rsconnect:
  dataconnection:
    driver: 'FreeTDS'
    server: 'server'
    uid: 'username'
    pwd: 'password'
    port: 1433
    database: 'Data_Science'
Code:
config <- config::get(file = "C:/Users/Samuel.Golomeke/Documents/Data Science/Codes/R codes/SQL_Server_shiny_connect/config")
I keep getting the following warning message:
Warning message:
In readLines(con) :
incomplete final line found on 'C:\Users\Samuel.Golomeke\Documents\Data Science\Codes\R codes\SQL_Server_shiny_connect\config.yaml'
Why is that?
Make sure the file config.yaml ends with a newline, i.e. its final line is terminated and contains no trailing whitespace (space, tab, etc.).
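The warning is harmless, but you can silence it by rewriting the file so it ends with a newline. A small base-R sketch (fix_final_newline is my own name; adjust the path to your config.yaml):

```r
# Rewrite a text file so its last line is newline-terminated.
# readLines strips line endings; writeLines re-adds "\n" after every
# line, including the final one, which is what the warning is about.
fix_final_newline <- function(path) {
  txt <- readLines(path, warn = FALSE)
  writeLines(txt, path)
  invisible(path)
}

# fix_final_newline("config.yaml")
```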

webdriver io crashes inside a jade template on rendered function

I am testing my meteor app's UI with some browser tests. I use http://webdriver.io and a selenium chrome https://hub.docker.com/r/selenium/standalone-chrome/ node.
I use the webdriver.io testrunner for tests and mocha as the test framework.
When I enter this block inside a jade template (by opening the corresponding page):
Template.boardBody.onRendered(function() {
  let imagePath = new ReactiveVar('');
  this.autorun(() => {
    imagePath.set(Meteor.settings.public.backgroundPath[1]);
    //document.getElementsByClassName('board-wrapper')[0].style.backgroundImage = "url('" + imagePath.get() + "')";
    $('.board-wrapper').css('background-image', "url('" + imagePath.get() + "')");
  });
});
The headless chrome crashes with this error:
{ Error: An unknown server-side error occurred while processing the command.
at BoardPage.open (tests/board.page.js:20:5)
at Context.<anonymous> (tests/board.test.js:22:17)
at Promise.F (node_modules/core-js/library/modules/_export.js:35:28)
at execute(<Function>) - at BoardPage.open (tests/page.js:11:13)
message: 'unknown error: session deleted because of page crash\nfrom tab crashed',
type: 'RuntimeError',
screenshot: 'Just a black page',
seleniumStack:
{ status: 13,
type: 'UnknownError',
message: 'An unknown server-side error occurred while processing the command.',
orgStatusMessage: 'unknown error: session deleted because of page crash\nfrom tab crashed\n (Session info: chrome=59.0.3071.115)\n (Driver info: chromedriver=2.30.477691 (6ee44a7247c639c0703f291d320bdf05c1531b57),platform=Linux 4.4.4-200.fc22.x86_64 x86_64) (WARNING: The server did not provide any stacktrace information)\nCommand duration or timeout: 4.13 seconds\nBuild info: version: \'3.4.0\', revision: \'unknown\', time: \'unknown\'\nSystem info: host: \'f362d8ab8951\', ip: \'172.17.0.1\', os.name: \'Linux\', os.arch: \'amd64\', os.version: \'4.4.4-200.fc22.x86_64\', java.version: \'1.8.0_131\'\nDriver info: org.openqa.selenium.chrome.ChromeDriver\nCapabilities [{applicationCacheEnabled=false, rotatable=false, mobileEmulationEnabled=false, networkConnectionEnabled=false, chrome={chromedriverVersion=2.30.477691 (6ee44a7247c639c0703f291d320bdf05c1531b57), userDataDir=/tmp/.org.chromium.Chromium.NUsUeZ}, takesHeapSnapshot=true, pageLoadStrategy=normal, databaseEnabled=false, handlesAlerts=true, hasTouchScreen=false, version=59.0.3071.115, platform=LINUX, browserConnectionEnabled=false, nativeEvents=true, acceptSslCerts=true, locationContextEnabled=true, webStorageEnabled=true, browserName=chrome, takesScreenshot=true, javascriptEnabled=true, cssSelectorsEnabled=true, unexpectedAlertBehaviour=}]\nSession ID: f1e261ec57fde3697e98945af051d236' },
shotTaken: true }
I use chai.expect for my assertion statements, and I have a feeling that the promises are somehow messing up the headless Chrome.
Does anyone know why this is happening?
