Main purpose: use Selenium from R code on a private network with no internet access (Ubuntu 20.04).
Steps: install Java, install Selenium Server 4.7.2, install Chrome (a specific version), download and use ChromeDriver (the same version as Chrome), download and install the desired R package (RSelenium), and start coding.
library("RSelenium")
# rsDriver() starts a Selenium Server and returns a list with $server and $client
rd <- rsDriver()
rd$client$open()
Problem: when I call the open function, I get this error:
checking Selenium Server versions:
BEGIN: PREDOWNLOAD
Error in open.connection(con, "rb") :
Could not resolve host: www.googleapis.com
After some research I found that Selenium needs to download some driver files. Our servers are on a private network with no internet proxy at all. So, whether I use R or any other language, can I use Selenium on a private network without internet access? If so, which files should I download offline, and where should I copy them?
Thanks in advance
I think the issue here is that rsDriver creates both the server and the client. As such it includes a wrapper for the function wdman::selenium() that is meant to download and manage the drivers needed. I would look into one of the two options: 1) using rsDriver() as the package manager and let it download the drivers for you or 2) using remoteDriver() on its own (which will not install drivers) to connect to your Selenium Server instead.
In the description for rsDriver:
A list containing a server and a client. The server is the object returned by selenium() and the client is an object of class remoteDriver()
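A minimal sketch of option 2 for the offline setup in the question, assuming Selenium Server 4.7.2 and a matching chromedriver are already copied onto the machine. The helpers selenium4_args() and start_offline_session(), the paths in the usage line and the 5-second wait are hypothetical; the point is that remoteDriver() only connects to a server and downloads nothing. Depending on your server version you may also need to change remoteDriver()'s path argument (default "/wd/hub").

```r
# Hypothetical helper: arguments for "java -jar <jar> standalone --port <port>" (Selenium 4 CLI).
selenium4_args <- function(jar, port = 4444L) {
  c("-jar", jar, "standalone", "--port", as.character(port))
}

# Hypothetical helper: start the local server in the background, then connect to it.
# Nothing here contacts the internet; chromedriver must already be in driver_dir.
start_offline_session <- function(jar, driver_dir, port = 4444L) {
  # The server process inherits this R session's environment, so putting
  # driver_dir first on PATH lets Selenium 4 find chromedriver there.
  Sys.setenv(PATH = paste(driver_dir, Sys.getenv("PATH"), sep = .Platform$path.sep))
  system2("java", shQuote(selenium4_args(jar, port)), wait = FALSE)
  Sys.sleep(5)  # crude wait for the server to come up
  remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                   port = as.integer(port),
                                   browserName = "chrome")
  remDr$open()
  remDr
}

# Usage (placeholder paths):
# remDr <- start_offline_session("/opt/selenium/selenium-server-4.7.2.jar", "/opt/chromedriver")
```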
For anyone who wants to use Selenium on a private network without internet access:
As @bingbongtelecom mentioned, rsDriver() downloads the drivers it uses, such as chromedriver, PhantomJS and geckodriver. Download them on another network and copy them into your private network. After that, pass the 'check = FALSE' option so that rsDriver() skips the version check and download process.
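A minimal sketch of where to copy the files, assuming the cache layout that binman (used by wdman/rsDriver) appears to use on Linux: ~/.local/share/binman_<app>/<platform>/<version>/. The Selenium jar would similarly go under binman_seleniumserver/generic/<version>/. binman_driver_dir() is a hypothetical helper and the version number is a placeholder; on a connected machine, binman::app_dir("chromedriver") should show the real location, so compare before relying on this.

```r
# Hypothetical helper: the directory where binman/wdman are assumed to look for a
# cached driver on Linux: ~/.local/share/binman_<app>/<platform>/<version>/
binman_driver_dir <- function(app, platform, version,
                              base = file.path(Sys.getenv("HOME"), ".local", "share")) {
  file.path(base, paste0("binman_", app), platform, version)
}

# On the offline server (the version number is a placeholder; use the one you copied):
# d <- binman_driver_dir("chromedriver", "linux64", "108.0.5359.71")
# dir.create(d, recursive = TRUE, showWarnings = FALSE)
# file.copy("chromedriver", d)
# Sys.chmod(file.path(d, "chromedriver"), "755")
# geckover/phantomver = NULL asks wdman not to use those drivers (if your version accepts NULL):
# rd <- RSelenium::rsDriver(browser = "chrome", chromever = "108.0.5359.71",
#                           geckover = NULL, phantomver = NULL, check = FALSE)
```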
Regards
Related
I am trying to use the RCurl library to access an SFTP site and download files on macOS Monterey 12.4. As others have reported, SFTP is not enabled in the curl libraries that RCurl calls.
Following "SFTP Support for curl on OSX", I installed curl with OpenSSL using Homebrew. I also uninstalled the curl and RCurl packages in RStudio.
In a terminal window, running 'curl -V' shows that sftp is available.
In RStudio, running "system('which curl')" shows sftp is available.
However, when I try to retrieve a file via SFTP using the RCurl library, I receive the message:
Error in function (type, msg, asError = TRUE) :
Protocol "sftp" not supported or disabled in libcurl
I thought maybe the PATH was not set correctly. I added the following line to my .Renviron file.
PATH=/opt/homebrew/opt/curl/bin:$PATH
At this stage, it's not clear to me why sftp is not supported when I access an SFTP site through the RCurl library.
What else can I do to try to diagnose why this is occurring?
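One diagnostic that may help: what matters to RCurl is the libcurl it is linked against, which can differ from the curl executable found on PATH. A minimal sketch, with rcurl_has_sftp() as a hypothetical helper built on RCurl::curlVersion():

```r
# Hypothetical helper: does the libcurl that RCurl is linked against list SFTP?
# Returns NA if RCurl is not installed.
rcurl_has_sftp <- function() {
  if (!requireNamespace("RCurl", quietly = TRUE)) return(NA)
  info <- RCurl::curlVersion()
  "sftp" %in% info$protocols
}

# rcurl_has_sftp()
# RCurl::curlVersion()$version   # libcurl version that RCurl uses
# system("curl -V")              # compare with the command-line curl
```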
I found a solution. The post "SFTP Support for curl on OSX" has code to enable SFTP support in curl on macOS.
I ran the three lines:
PATH <- Sys.getenv("PATH")
version <- '7.86.0'
Sys.setenv(PATH = paste0("/opt/homebrew/Cellar/curl/", version, "/bin:", PATH))
and SFTP is enabled in RCurl.
An "echo $PATH" run from the terminal shows "/opt/homebrew/opt/curl/bin" in the path. I assumed this meant the OpenSSL-enabled curl was being used, but that is not the case. Apparently the PATH that R sees is different from the macOS shell PATH.
My next step is to add these lines to my .Rprofile file so that they run each time I start R.
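A minimal sketch of that .Rprofile step, with prepend_path() as a hypothetical helper that puts a directory first on PATH without adding it twice. The usage line reuses the Cellar path from the answer; /opt/homebrew/opt/curl/bin should be a version-independent symlink to the same keg, but that is an assumption worth checking.

```r
# Hypothetical helper: return PATH with `dir` first and any other copy of it removed
# (setdiff() also drops repeated entries).
prepend_path <- function(dir, path = Sys.getenv("PATH")) {
  parts <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  paste(c(dir, setdiff(parts, dir)), collapse = .Platform$path.sep)
}

# In ~/.Rprofile:
# Sys.setenv(PATH = prepend_path(paste0("/opt/homebrew/Cellar/curl/", "7.86.0", "/bin")))
```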
I am new to RSelenium and I want to use it to scrape data from a website. When trying to establish a connection, I get an error saying that my ChromeDriver is not compatible with my current browser version. Here is the code and the error I got so far:
library(RSelenium)
library(RCurl)
driver <- rsDriver()
remDr <- driver[["client"]]
remDr$open()
> remDr$open()
[1] "Connecting to remote server"
Selenium message:session not created: This version of ChromeDriver only supports Chrome version 97
Current browser version is 96.0.4664.45 with binary path C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
The error report is very long, but I think the essence of the error is described in the first sentence: my ChromeDriver and current version of Chrome are not compatible.
So I did some research, since I didn't know what a ChromeDriver is. I found out that I can download older versions of ChromeDriver. I searched for version 96.0.4664.45, since that is my browser version (it seems there is only a 32-bit version, which I thought was odd, but it's the only one for Windows, so I downloaded that one).
After downloading chromedriver.exe, I moved it to a specific folder and added that folder to my system PATH, since I read online that this is what I have to do for R to recognize it. Then I used the cmd prompt to execute the ChromeDriver, which seems to work.
Then, when I start up R and retry, I still get the same error, so it seems I am still doing something wrong. Do I need to 'override' ChromeDriver 97? And why do I even have version 97 when my browser is version 96? Any help would be greatly appreciated.
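A likely explanation for the version 97: rsDriver()'s chromever argument defaults to "latest", so it uses the newest chromedriver in its own binman cache rather than the one you put on PATH. A minimal sketch of choosing the cached driver that matches the browser's major version and passing it explicitly; pick_chromedriver() is a hypothetical helper, and the usage assumes binman::list_versions("chromedriver") lists the cached versions.

```r
# Hypothetical helper: newest available chromedriver whose major version matches
# the installed Chrome (e.g. Chrome 96.x needs chromedriver 96.x); NA if none match.
pick_chromedriver <- function(browser_version, available) {
  major <- function(v) sub("\\..*$", "", v)
  matches <- available[major(available) == major(browser_version)]
  if (length(matches) == 0) return(NA_character_)
  as.character(max(numeric_version(matches)))
}

# available <- unname(unlist(binman::list_versions("chromedriver")))
# chromever <- pick_chromedriver("96.0.4664.45", available)
# driver <- RSelenium::rsDriver(browser = "chrome", chromever = chromever)
```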
I'm an undergraduate research assistant working on a Linux server without root privileges. I'm trying to install RStudio Server, but the RStudio website only provides installation instructions for sudoers. Is it possible to install it without root access? I'm asking because I'm really not sure I could get access from the manager. Any help will be appreciated!
No, you can't install it without root access. But there are a couple of things you could do to piece together a solution. Here are two options:
Extract the server and run it directly
You have to be root to install packages, so you can't install the .deb/.rpm file yourself. However, you could extract the contents of the file to a directory inside your home directory and run RStudio Server from there, by executing the rserver program in a regular shell.
Note that this will probably require an afternoon of editing the rserver.conf file to tell it where to find the rest of the files in the installation (since it presumes they are installed in /usr/lib by default). You can get some inspiration for how to do this here: https://github.com/rstudio/rstudio/blob/master/src/cpp/conf/rserver-dev.conf
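A minimal sketch of the extraction step, assuming a Debian/Ubuntu server where dpkg is available (dpkg -x unpacks a .deb without installing it, so it does not need root). extract_rstudio_server() is a hypothetical helper, the .deb name is a placeholder, and the returned path assumes the package's files mirror the default /usr/lib install location mentioned above.

```r
# Hypothetical helper: unpack an RStudio Server .deb into a directory you own and
# return the expected path of the rserver binary inside the unpacked tree.
extract_rstudio_server <- function(deb,
                                   dest = file.path(Sys.getenv("HOME"), "rstudio-server"),
                                   run = FALSE) {
  if (run) {
    dir.create(dest, recursive = TRUE, showWarnings = FALSE)
    status <- system2("dpkg", c("-x", shQuote(deb), shQuote(dest)))
    if (status != 0) stop("dpkg -x failed with status ", status)
  }
  file.path(dest, "usr", "lib", "rstudio-server", "bin", "rserver")
}

# rserver <- extract_rstudio_server("rstudio-server-<version>-amd64.deb", run = TRUE)
# file.exists(rserver)
```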
Run the desktop version and forward the graphics
The other route is to run RStudio Desktop on the server; we make several builds of RStudio Desktop that are installer-less and can just be unpacked into your home directory. Then run an X11 server on your own computer and an X11 client on the RStudio server, so that the RStudio Desktop instance appears on your computer instead of the server.
Yes, you can run rserver without root privileges.
For RStudio 1.4 I patched the following line into src/cpp/core/LogOptions.cpp
const FilePath kDefaultLogPath = core::system::xdg::userDataDir().completePath("log");
Then you need to set the following environment variables to locations that the user can read and write, for example:
RSTUDIO_CONFIG_DIR=$HOME/.config/rstudio
RSTUDIO_CONFIG_HOME=$HOME/.config/rstudio
RSTUDIO_DATA_HOME=$HOME/.local/share/rstudio
And start rserver with the options
--server-data-dir={directory writeable for user}
--server-pid-file={file-path creatable for user}
--database-config-file={config-file}
With these adjustments it runs for me when I start it as a regular user (no root privileges) with
rserver --auth-none=1 --www-frame-origin=same --www-port={port} --www-verify-user-agent=0 --server-data-dir={my-tmp-path} --server-pid-file={my-tmp-path}/rstudio.pid --database-config-file={my-tmp-path}/db.conf
ATTENTION:
Be aware that anyone who can reach your system and the specified port from the network has access to the running RStudio in their browser, and can therefore run any command as your user on your system.
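A minimal sketch that assembles the same command from R, assuming an rserver binary already patched as described for RStudio 1.4. rserver_args() is a hypothetical helper; the data directory and port in the usage lines are placeholders. Note that --auth-none=1 is exactly what the warning above is about.

```r
# Hypothetical helper: the rserver options from this answer, pointed at one
# user-owned data directory.
rserver_args <- function(port, data_dir) {
  c("--auth-none=1",
    "--www-frame-origin=same",
    paste0("--www-port=", port),
    "--www-verify-user-agent=0",
    paste0("--server-data-dir=", data_dir),
    paste0("--server-pid-file=", file.path(data_dir, "rstudio.pid")),
    paste0("--database-config-file=", file.path(data_dir, "db.conf")))
}

# Environment variables from the answer, set for this R session and its children:
# home <- Sys.getenv("HOME")
# Sys.setenv(RSTUDIO_CONFIG_DIR  = file.path(home, ".config/rstudio"),
#            RSTUDIO_CONFIG_HOME = file.path(home, ".config/rstudio"),
#            RSTUDIO_DATA_HOME   = file.path(home, ".local/share/rstudio"))
# system2("rserver", rserver_args(8787, "/tmp/my-rstudio"), wait = FALSE)
```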
I'm trying to run R code with the RSelenium package to do some web scraping, but I'm blocked at the very first step. After loading the library, I try to run this line of code:
rmDr <- rsDriver(browser = "chrome", chromever = 'latest')
But the console returns :
Error in java_check() :
PATH to JAVA not found. Please check JAVA is installed.
Java is indeed installed on my computer, but I'm guessing the path isn't the one the package is waiting for. Does someone know where I could modify the path in the RSelenium package code so that I can run this?
Note that I'm working on a company computer, so I don't have full admin rights.
Thanks for your help!
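Since wdman's Java check appears to look for java on the PATH (via Sys.which("java")), one workaround without admin rights may be to add the JDK's bin folder to PATH for the current R session only. A minimal sketch; ensure_java_on_path() is a hypothetical helper and the JDK folder in the usage line is a placeholder.

```r
# Hypothetical helper: prepend <java_home>/bin to PATH for this R session and
# return what Sys.which("java") now finds ("" if java is still not found).
ensure_java_on_path <- function(java_home = Sys.getenv("JAVA_HOME")) {
  if (!nzchar(java_home)) stop("java_home is empty; pass the Java install directory")
  bin <- file.path(java_home, "bin")
  Sys.setenv(PATH = paste(bin, Sys.getenv("PATH"), sep = .Platform$path.sep))
  unname(Sys.which("java"))
}

# ensure_java_on_path("C:/Program Files/Java/jdk-<version>")   # placeholder path
# rmDr <- RSelenium::rsDriver(browser = "chrome", chromever = "latest")
```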
You can use the "remoteDriver()" method instead of "rsDriver()". I checked it today with the latest stable version of Selenium Server (3.141.59), and it works perfectly.
Here's the code sample:
library(RSelenium)
# remoteDriver() does not start a server; it connects to one already running on localhost:4444
driver <- remoteDriver()
driver$open()
driver$navigate("https://www.google.com/")
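remoteDriver() expects the Selenium Server to be running already. A minimal sketch of starting a 3.141.59 standalone server from R first, optionally pointing it at chromedriver through the webdriver.chrome.driver JVM property; selenium3_args() is a hypothetical helper and the jar and driver paths are placeholders.

```r
# Hypothetical helper: java arguments for a Selenium 3 standalone server.
# JVM -D options must come before -jar.
selenium3_args <- function(jar, port = 4444L, chromedriver = NULL) {
  c(if (!is.null(chromedriver)) paste0("-Dwebdriver.chrome.driver=", chromedriver),
    "-jar", jar, "-port", as.character(port))
}

# system2("java", shQuote(selenium3_args("selenium-server-standalone-3.141.59.jar",
#                                        chromedriver = "C:/tools/chromedriver.exe")),
#         wait = FALSE)
# driver <- RSelenium::remoteDriver(port = 4444L, browserName = "chrome")
# driver$open()
```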
I just got the same error, installed the latest Java Development Kit (JDK), restarted the machine, and everything worked.
The best way to use RSelenium is to go through Docker.
I used this tutorial, https://rpubs.com/johndharrison/RSelenium-Docker, not long ago, and everything went smoothly.
Besides, you need a debugger; you cannot scrape without it. That's why this tutorial is a good idea.
Let me know if anything goes wrong.
This question has been asked in "Configure proxy on Rstudio", but it was never resolved.
I am using RStudio 0.99.486 and R 3.2.2. After reading several suggestions, I have tried two ways to configure proxy settings at the office, without success:
FIRST TRY:
Type in Rstudio as first line:
Sys.setenv(http_proxy="http://user_name:password@proxy.company_domain.es:8080/")
Go to:
-Tools, -Global Options, -Packages, and unmark option:
"Use internet library/proxy for HTTP"
I also unmarked the option: "Use secure download method for HTTP".
Besides, I right-clicked the R x64 3.2.2 icon on the desktop and added the following to the "Target" field, after a space:
http_proxy=http://user_name:password@proxy.company_domain.es:8080/
It did not work as I received the message:
Warning in install.packages : cannot open: HTTP status was '407
Proxy Authentication Required'
SECOND TRY:
Create a notepad file with the name:
.Renviron
Saved it in: "C:\Users\username\Documents".
The file contains inside the following 2 lines:
http_proxy=http://proxy.company_domain.es/
http_proxy_user=user_name:password
When I try installing a package I receive:
"Warning in install.packages : unable to connect to
'cran.rstudio.com' on port 80.
Unable to access index for repository http://cran.rstudio.com/src/contrib"
After running code line: R.home() My R_HOME route is:
"C:/Program Files/R/R-32~1.2"
Thanks in advance for your advice and help.
Thank you for your question. It helped me resolve my issue. I had to unmark the option to use the Internet Explorer settings and restart.
Maybe your .Renviron is missing the proxy port; you have to write
http_proxy=http://proxy.company_domain.es:8080/
http_proxy_user=user_name:password
If you specify
http_proxy_user=ask
it should prompt you for your user name and password; then you know that the file is being read.
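Another way to confirm which file is read, without restarting R: a minimal sketch with user_renviron() as a hypothetical helper that roughly follows ?Startup (the file named by R_ENVIRON_USER if set, else .Renviron in the working directory, else in the home directory), plus base R's readRenviron() to re-read a file in the current session.

```r
# Hypothetical helper: approximate which user .Renviron R would read at startup.
user_renviron <- function() {
  env_user <- Sys.getenv("R_ENVIRON_USER")
  candidates <- if (nzchar(env_user)) env_user else c(".Renviron", path.expand("~/.Renviron"))
  hit <- candidates[file.exists(candidates)]
  if (length(hit) > 0) hit[1] else NA_character_
}

# user_renviron()                                 # which file would be read?
# readRenviron(path.expand("~/.Renviron"))        # re-read it in this session
# Sys.getenv(c("http_proxy", "http_proxy_user"))  # check what R now sees
```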
I struggled with this when I first started working behind a proxy. Here's what I believe is the solution. Disclaimer: I am working on a Windows 7 workstation.
Even though, when you read through the documentation, it may seem that .Renviron and .Rprofile belong in R.home(), that is not the case for Windows.
By default (I believe), the home directory that R uses on Windows (not R.home(), which is the R installation directory) is the Documents folder for your user. You can check that with
path.expand("~/")
which defaults to the "My Documents" directory.
Therefore, place the .Renviron file with the content you already have in "My Documents", and disable the Internet Explorer option in RStudio.
Hope it helps!
I also nearly gave up on this problem until I found this simple solution (R 3.3.1):
specify the system environment variables (in Windows Advanced System Settings, add the variables http_proxy and https_proxy and set them to http://user_name:password@proxy.company_domain.es:8080/ with your specific settings)
in the R console type
update.packages(ask='graphics',method="libcurl",checkBuilt=TRUE)
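If the password contains characters such as "@", "#" or ":", the proxy URL breaks unless the credentials are percent-encoded. A minimal sketch using base R's URLencode(); proxy_url() is a hypothetical helper, and setting the variables with Sys.setenv() only affects the current session, unlike the system-wide settings described above.

```r
# Hypothetical helper: build a proxy URL with percent-encoded credentials.
proxy_url <- function(host, port, user = NULL, password = NULL) {
  auth <- if (is.null(user)) "" else
    paste0(URLencode(user, reserved = TRUE), ":", URLencode(password, reserved = TRUE), "@")
  paste0("http://", auth, host, ":", port, "/")
}

# u <- proxy_url("proxy.company_domain.es", 8080, "user_name", "password")
# Sys.setenv(http_proxy = u, https_proxy = u)   # current session only
# update.packages(ask = "graphics", method = "libcurl", checkBuilt = TRUE)
```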