Using RSelenium with Firefox and socks5h - r

I'm using the RSelenium package to drive Firefox, and I want to route the browser's traffic through a SOCKS proxy.
In Python, this is achievable with the selenium webdriver package by setting preferences on a FirefoxProfile, e.g.
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference('network.proxy.socks', 'x.x.x.x')
profile.set_preference('network.proxy.socks_port', ****)
browser = webdriver.Firefox(profile)
However, I can't find how to set the proxy type to SOCKS, or how to set the SOCKS port, in RSelenium. I've tried setting it through the RCurl options, as follows
options(RCurlOptions = list(proxy = "socks5h://x.x.x.x:****"))
but this gives me the following error message
Error in function (type, msg, asError = TRUE) :
Can't complete SOCKS5 connection to 0.0.0.0:0. (1)
Has anyone successfully connected to Firefox using a socks proxy using R code?
I am using version 1.3.5 of RSelenium and version 28.0 of Firefox.

Not tested, but something like the following should work:
library(RSelenium)
# Build a Firefox profile that uses a manually configured SOCKS proxy
fprof <- makeFirefoxProfile(list(
  "network.proxy.socks" = "squid.home-server",
  "network.proxy.socks_port" = 3128L,
  "network.proxy.type" = 1L  # 1 = manual proxy configuration
))
remDr <- remoteDriver(extraCapabilities = fprof)
remDr$open()
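Since the question asks for socks5h specifically (the "h" means hostnames are resolved by the proxy rather than locally), a variant of the profile above might also set Firefox's network.proxy.socks_version and network.proxy.socks_remote_dns preferences. This is an untested sketch: the host and port are hypothetical placeholders, not values from the question, and the RSelenium calls only run once the connect flag (an illustrative name) is set to TRUE and the package is installed.

```r
# Firefox preferences for a SOCKS5 proxy with proxy-side DNS resolution.
# Host and port below are hypothetical placeholders.
socks_prefs <- list(
  "network.proxy.type" = 1L,               # 1 = manual proxy configuration
  "network.proxy.socks" = "127.0.0.1",
  "network.proxy.socks_port" = 1080L,
  "network.proxy.socks_version" = 5L,      # SOCKS5
  "network.proxy.socks_remote_dns" = TRUE  # resolve hostnames via the proxy
)

connect <- FALSE  # set to TRUE once a Selenium server is running
if (connect && requireNamespace("RSelenium", quietly = TRUE)) {
  fprof <- RSelenium::makeFirefoxProfile(socks_prefs)
  remDr <- RSelenium::remoteDriver(
    browserName = "firefox",
    extraCapabilities = fprof
  )
  remDr$open()
}
```

Keeping the preferences in a plain list makes it easy to inspect them before they are packed into a profile.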

Related

How to run RSelenium script in Heroku

I've been struggling to run an R script on Heroku with RSelenium. I have other scripts running on the platform, but I can't make a Selenium server start. I'm trying it using the help I found here on Stack Overflow. I have Google Chrome and ChromeDriver added to my Heroku slug. I managed to put together the following code:
eCaps <- list(chromeOptions = list(
  args = c(
    '--headless', '--disable-gpu', '--blink-settings=imagesEnabled=false',
    '--disable-dev-shm-usage', '--no-sandbox',
    '-Dwebdriver.chrome.driver=/app/.chromedriver/bin/chromedriver'
  ),
  binary = Sys.getenv("GOOGLE_CHROME_BIN")
))
remDr <- remoteDriver(
  browser = "chrome",
  extraCapabilities = eCaps
)
remDr$open()
where I have an environment variable GOOGLE_CHROME_BIN storing the path of the Chrome binary as /app/.apt/usr/bin/google-chrome.
When I run the script I'm getting the following error:
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4444: Connection refused
Calls: <Anonymous> -> queryRD -> checkError
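A "Connection refused" on localhost port 4444 means nothing was accepting connections on that port when remDr$open() was called. One way to confirm this before opening the session is a small base R check like the sketch below; port_is_open is a hypothetical helper name, not part of RSelenium.

```r
# Return TRUE if something accepts TCP connections on host:port, else FALSE.
port_is_open <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # a refused or timed-out connection raises an error
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

if (!port_is_open("localhost", 4444L)) {
  message("Nothing is listening on localhost:4444; start a Selenium server first.")
}
```

If the check returns FALSE, the problem lies with starting the server rather than with the Chrome capabilities.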

rsDriver error when executed through company's network

I am facing an issue while running the rsDriver() function to open up the chrome browser.
Code:
library("RSelenium")
library("wdman")
mybrowser <- rsDriver(browser=c("chrome"), chromever="80.0.3987.16",port = 443L)
remDr <- mybrowser$client
remDr$navigate("https://google.co.in/")
Sys.sleep(2)
When I run this code on my machine while connected to my home network the code works as expected. But when I run this code from my office network, the rsDriver(browser=c("chrome"), chromever="80.0.3987.16",port = 443L) gives me the below error and I am stuck at this point.
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
Error in open.connection(con, "rb") :
Timeout was reached: [www.googleapis.com] Operation timed out after 10000 milliseconds with 0 out of 0 bytes received
I tried connecting through the company's proxy with the below code but still no luck. I tried using the port numbers 4444,4445,4567 but still the same error.
cprof <- list(chromeOptions = list(args = list("--proxy-server= gproxy.go.company.org:8080")))
mybrowser <- rsDriver(browser=c("chrome"), chromever="80.0.3987.16", port = 443L,extraCapabilities = cprof)
It would be very helpful if someone could help me understand the issue and suggest a solution. Am I missing something in the code? Any help would be highly appreciated.
Also, let me know if any additional information is required.
To me this looks like a proxy issue. Are you able to retrieve an arbitrary website? E.g. using httr::GET("www.google.com"). If not, this would also point to a problem with the proxy.
Have you tried to configure it in .Renviron? Like so:
file.edit('~/.Renviron')
Add this line to the file and restart RStudio:
http_proxy=http://USER:PASSWORD@PROXY:PORT
Another option is to set the proxy with httr/curl:
library(httr)
set_config(use_proxy(
  url = "proxy.com",
  port = 8080,
  username = "foo",
  password = "bar"
))
I got this working by switching networks: I first connect to my local network and, once the browser opens, switch to the company's network.

RSelenium with RSDriver. Error: httr output: Failed to connect to localhost port 4445: Connection refused

I am trying to use RSelenium for web scraping. I am following the basics tutorial as explained on CRAN. The recommended approach is to install Docker (see the tutorial as well as this Stack Overflow answer). If I understand correctly, this is not an option for me, as I am running Windows 7, for which Docker seems not to be available (see the Docker forum).
Thus, I am trying option 2 using the RSDriver. I run
RSelenium::rsDriver()
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "firefox"
)
remDr$open()
and get the error
> remDr$open()
[1] "Connecting to remote server"
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
This question has been asked and answered before here, here, here and here, though these are about the same error when using Docker and their solutions did not work for me.
Is there any way to get this running with rsDriver? Is there any option for me as a Windows 7 user?
With RSelenium version 1.7.7 this is a workaround:
library(RSelenium)
remDr <- rsDriver(
port = 4445L,
browser = "firefox"
)
This command combines the server setup and the driver initiation.
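Note that rsDriver() returns a list holding both the server process and the client, so the browser is driven through the client element. The following sketch wraps this in a hypothetical helper, start_firefox_session; the usage lines are commented out because they need a working Java and Firefox setup, and the URL is only an example.

```r
# Start a Selenium server plus a Firefox client in one call.
# start_firefox_session is an illustrative helper, not an RSelenium function.
start_firefox_session <- function(port = 4445L) {
  if (!requireNamespace("RSelenium", quietly = TRUE)) {
    stop("RSelenium is not installed")
  }
  # rsDriver() returns a list with elements 'server' and 'client'
  RSelenium::rsDriver(port = port, browser = "firefox")
}

# Usage (not run):
# driver <- start_firefox_session()
# remDr <- driver$client            # the remoteDriver object
# remDr$navigate("https://www.r-project.org/")
# remDr$close()                     # close the browser session
# driver$server$stop()              # stop the Selenium server process
```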
My issue (on Mac) was solved by updating Java:
https://www.oracle.com/java/technologies/downloads/#jdk19-mac
It worked after this.

RSelenium Installation on MacBook, with Chrome

I have RSelenium (the package) installed, to do some scraping of NHL statistics from hockeyreference.com
It was working, all fine, but recently stopped working, giving this error:
[1] "Connecting to remote server"
Could not open chrome browser.
Client error message:
Undefined error in httr call. httr output: Failed to connect to
localhost port 4567: Connection refused
Check server log for further details.
$client
[1] "No sessionInfo. Client browser is mostly likely not opened."
$server
Process Handle
command : /private/var/folders/dk/kf4tf83n1lg40687w6fmq5wh0000gn/T/Rtmpiy1cOY/file1d1856ef53ae.sh
system id : 18786
state : exited
Warning message:
In rsDriver(port = 4567L, geckover = NULL, phantomver = NULL) :
Could not determine server status.
I've tried reinstalling, but could not get it working. My original install is outlined in this question (using homebrew, with the latest chromedriver installed):
Css selector issue with rvest and NHL statistics
Any help would be great. The code I'm running with RSelenium is here:
https://github.com/papelr/nhldatar/blob/master/nhldatar/R/nhldatar-phase-2.R
TL;DR, I can't get the rsDriver argument to work, and it gives the error posted above:
rsDriver(port = 4567L, geckover = NULL, phantomver = NULL)
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
If RSelenium works (opening a chrome browser), then the rest of this will run! Thanks!
I recommend the following:
install Docker for Mac,
pull the image for chrome, firefox (I recommend version 3.5.1) or phantom,
run the image in Docker: docker run...
create the remote driver:
remDr <- remoteDriver(remoteServerAddr = "your IP here", port = 4445L,
browserName = "firefox")
If you use the debug version of the image, you can watch what the browser is doing over VNC.

How to set the proxy port and URL in R for the blsAPI

My employer's firewall requires that I set the proxy port and URL when downloading data in R using the Quandl package. I'm now trying to use the blsAPI package, and the workaround I used for Quandl is not working. How do I set the proxy port and URL for the blsAPI package?
Below is the code I use for Quandl:
library(httr)
proxyURL <- "##.#.##.###"
proxyPort <- ####
set_config(use_proxy(url = proxyURL, port = proxyPort))
Below see the error I get using the blsAPI:
> response <- blsAPI('LAUCN040010000000005')
Error in function (type, msg, asError = TRUE) :
Failed to connect to api.bls.gov port 80: Timed out
Found a solution, see code below:
proxyURL <- "##.#.##.###"
proxyPort <- ####
Sys.setenv(http_proxy = paste(proxyURL, proxyPort, sep = ":"))
I can now use the blsAPI.
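The environment-variable approach can be wrapped in a small helper that builds the proxy URL with an explicit scheme. This is only a sketch: build_proxy_url is a hypothetical function, the host and port are placeholders, and setting https_proxy as well is an assumption that may or may not be needed for a given package.

```r
# Build a proxy URL of the form http://[user:password@]host:port.
# build_proxy_url is an illustrative helper, not part of any package.
build_proxy_url <- function(host, port, user = NULL, password = NULL) {
  auth <- if (!is.null(user)) paste0(user, ":", password, "@") else ""
  paste0("http://", auth, host, ":", port)
}

# Placeholder host and port; substitute the values for your network.
proxy <- build_proxy_url("proxy.example.org", 8080L)

# Packages that respect these variables pick the proxy up from the environment.
Sys.setenv(http_proxy = proxy, https_proxy = proxy)
Sys.getenv("http_proxy")
```

Because the variables are set for the running R session only, they need to be set again after a restart unless they are also stored in .Renviron.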
