I've read a few posts on sftp with R, but was not able to address the problem that I have. There's a decent chance I'm just not searching in the right place, and if that's the case, please point me in the right direction. Here's where I'm at:
> library(RCurl)
> curlVersion()
$age
[1] 3
$version
[1] "7.43.0"
$vesion_num
[1] 469760
$host
[1] "x86_64-apple-darwin15.0"
$features
ipv6 ssl libz ntlm asynchdns spnego largefile ntlm_wb
1 4 8 16 128 256 512 32768
$ssl_version
[1] "SecureTransport"
$ssl_version_num
[1] 0
$libz_version
[1] "1.2.5"
$protocols
[1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap" "imaps" "ldap" "ldaps" "pop3" "pop3s" "rtsp" "smb" "smbs" "smtp" "smtps" "telnet" "tftp"
$ares
[1] ""
$ares_num
[1] 0
$libidn
[1] ""
Right away, I notice that sftp is not among the protocols supported by my current version of RCurl, which is my main problem. As a result, when I run the code below, I get the following error:
# Input
protocol <- "sftp"
server <- "00.000.00.00"
userpwd <- "userid:userpass"
tsfrFilename <- 'myfile.txt'
ouptFilename <- 'myfile.txt'
# Run
url <- paste0(protocol, "://", server, "/", tsfrFilename)  # "/" separates host and path
data <- getURL(url = url, userpwd = userpwd)
Error in function (type, msg, asError = TRUE) :
Protocol "sftp" not supported or disabled in libcurl
I actually have a second question as well. My understanding is that getURL grabs data from the other server and pulls it to my local machine, whereas I would like to put a file onto the server from my local machine.
To summarize: (1) can I update RCurl / libcurl in R to support sftp, and (2) how do I put files from my local machine onto the server, rather than get files from the server to my local machine?
Thanks!
I found the answer, for the most part...
http://andrewberls.com/blog/post/adding-sftp-support-to-curl - following this link addressed the problem for me.
I've successfully added sftp support to cURL; however, I am now struggling to update the RCurl package to have the same...
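For the second question (putting a file onto the server), here is a minimal sketch, assuming RCurl ends up linked against an sftp-enabled libcurl. RCurl's ftpUpload() takes a local file and a target URL and, as far as I know, hands the URL to libcurl, so an sftp:// URL should work once the protocol is available. build_sftp_url(), rcurl_supports_sftp() and upload_via_sftp() are helper names made up for this example; the server, file name and credentials are the placeholders from the question.

```r
# Hypothetical helper: build an sftp URL, making sure there is exactly one
# "/" between the host and the remote path.
build_sftp_url <- function(server, path) {
  paste0("sftp://", server, "/", sub("^/+", "", path))
}

# TRUE only if RCurl is installed and its libcurl lists sftp as a protocol.
rcurl_supports_sftp <- function() {
  requireNamespace("RCurl", quietly = TRUE) &&
    "sftp" %in% RCurl::curlVersion()$protocols
}

# Hypothetical wrapper: upload a local file, failing early with a readable
# message when sftp is not available.
upload_via_sftp <- function(local_file, server, remote_path, userpwd) {
  if (!rcurl_supports_sftp()) {
    stop("The libcurl used by RCurl does not support sftp")
  }
  RCurl::ftpUpload(what = local_file,
                   to = build_sftp_url(server, remote_path),
                   userpwd = userpwd)
}

# Usage (placeholders from the question, not run here):
# upload_via_sftp("myfile.txt", "00.000.00.00", "myfile.txt", "userid:userpass")
```

The early check is only there so that the failure reads better than the raw "Protocol not supported" error.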
Update: As of RSelenium 1.7.9 the described problems have disappeared.
I know, similar questions have been asked, but their solutions didn't work for me.
Summary:
I would like to open a Selenium-server and a client under Linux via R's package RSelenium.
But even though I tried the two ways described in the documentation (I want to avoid Docker), it doesn't work reliably.
My system:
Linux 5.19, R 4.2.1,
RSelenium 1.7.7, selenium-server-standalone-4.0.0-alpha-2,
chromedriver 104.0.5112.79-2.1, geckodriver 0.31.0 (binman),
I have tested with OpenJDK 11 and OpenJDK 18 (currently)
I. Selenium via JAVA
In the Linux-console
#localhost:~/Documents/selenium> java -jar selenium-server-standalone-4.0.0-alpha-2.jar
20:04:49.470 INFO [GridLauncherV3.parse] - Selenium server version: 4.0.0-alpha-2, revision: f148142cf8
20:04:49.526 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
20:04:49.730 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
20:04:49.793 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
In R I type:
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L, browserName = "chrome", version = "104.0.5112.79")
and get in the Linux console the output:
20:07:49.463 INFO [ActiveSessionFactory.apply] - Capabilities are: {
"browserName": "chrome",
"javascriptEnabled": true,
"nativeEvents": true,
"version": "104.0.5112.79"
}
20:07:49.465 INFO [ActiveSessionFactory.lambda$apply$11] - Matched factory org.openqa.selenium.grid.session.remote.ServicedSession$Factory (provider: org.openqa.selenium.chrome.ChromeDriverService)
Starting ChromeDriver 104.0.5112.79 (3cf3e8c8a07d104b9e1260c910efb8f383285dc5-refs/branch-heads/5112#{#1307}) on port 15987
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
20:07:50.023 INFO [ProtocolHandshake.createSession] - Detected dialect: W3C
20:07:50.044 INFO [RemoteSession$Factory.lambda$performHandshake$0] - Started new session 732d7c7ddfeaed42fc80fac54f91fcb5 (org.openqa.selenium.chrome.ChromeDriverService)
The Chrome-Browser opens and the R console gives me the kiss of death:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed initialization
That means I cannot use the R console for navigation. The other approach:
II. Selenium via RSelenium::rsDriver
rD <- RSelenium::rsDriver(browser="firefox", port = 4567L, verbose = FALSE)
mostly yields (with a browser window opening)
Could not open firefox browser.
Client error message:
Undefined error in httr call. httr output: Failed initialization
Check server log for further details.
BUT: The very same code can work! Randomly. Or after a long time having R open?!? Endless testing?!?
Suddenly I get several running server/client connections including navigation on web-pages:
$acceptInsecureCerts
[1] FALSE
$browserName
[1] "firefox"
$browserVersion
[1] "103.0.2"
$`moz:accessibilityChecks`
[1] FALSE
$`moz:buildID`
[1] "20220815180539"
$`moz:geckodriverVersion`
[1] "0.31.0"
etc.pp.
But at the latest when I reboot my PC, I get the same error message again. It can also work after deleting and reinstalling the four drivers via RSelenium in ./local/share. Or, when I try the same thing again, it simply doesn't.
I have never run into a problem with this kind of randomness. Where can it come from?
PS: When it doesn't work, the server log can contain additional lines, which I add here:
> rD$server$log()
$stderr
[26] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.jsm"
[27] "Missing chrome or resource URL: resource://gre/modules/UpdateListener.sys.mjs"
[28] "console.error: \"Error during quit-application-granted: [Exception... \\\"File error: Not found\\\" nsresult: \\\"0x80520012 (NS_ERROR_FILE_NOT_FOUND)\\\" location: \\\"JS frame :: resource:///modules/BrowserGlue.jsm :: _onQuitApplicationGranted/tasks< :: line 2006\\\" data: no]\""
[29] "1661020441351\tMarionette\tINFO\tStopped listening on port 42425"
[30] "JavaScript error: chrome://remote/content/marionette/cert.js, line 57: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]"
$stdout
character(0)
Maybe you can try the following approach, which relies on Docker:
library(RSelenium)
url <- "https://www.hubs.com/3d-printing/#/?place=New%20York&latitude=40.7144&longitude=-74.006&distanceLimit=250&distanceUnit=miles&shipsToCountry=US&shipsToState=NY"
# shell() only exists in R on Windows; system() also works on Linux
system('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
remDr$getPageSource()[[1]]
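One thing to watch with this approach: docker run -d returns immediately, while the Selenium server inside the container needs a moment to start. A minimal sketch, assuming that a too-early remDr$open() is one possible source of failed connections (I have not confirmed this is the cause of the error in the question): poll the port with base R before opening the session. wait_for_port() is a made-up helper name, and an open port only suggests, not proves, that the server is ready.

```r
# Hypothetical helper: return TRUE as soon as something accepts TCP
# connections on host:port, or FALSE once `timeout` seconds have passed.
wait_for_port <- function(host = "localhost", port = 4445L, timeout = 30) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    con <- tryCatch(
      suppressWarnings(
        socketConnection(host = host, port = port, open = "r+",
                         blocking = TRUE, timeout = 1)
      ),
      error = function(e) NULL  # connection refused or timed out
    )
    if (!is.null(con)) {
      close(con)
      return(TRUE)
    }
    Sys.sleep(0.5)  # short pause before the next attempt
  }
  FALSE
}

# Usage with the container started above (not run here):
# if (wait_for_port("localhost", 4445L)) remDr$open()
```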
Since I have to use the sftp protocol to retrieve some documents from a remote server, I reinstalled the curl library with ssl.
curl --version now correctly returns
curl 7.72.0 (x86_64-pc-linux-gnu) libcurl/7.72.0 OpenSSL/1.1.1 zlib/1.2.11 brotli/1.0.4 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) libssh/0.7.0/openssl/zlib nghttp2/1.30.0 librtmp/2.3
Release-Date: 2020-08-19
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets
and the sftp protocol is now enabled.
In R I tried to reinstall the RCurl package from source with install.packages('RCurl', type = 'source'), but the libcurlVersion() command keeps returning
[1] "7.58.0"
attr(,"ssl_version")
[1] "OpenSSL/1.1.1"
attr(,"libssh_version")
[1] ""
attr(,"protocols")
[1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap"
[9] "imaps" "ldap" "ldaps" "pop3" "pop3s" "rtmp" "rtsp" "smb"
[17] "smbs" "smtp" "smtps" "telnet" "tftp"
where, as you can see, the library version is different from the one installed on the machine and the sftp protocol is not enabled.
How can I force R to use the correct curl version?
You need to reinstall the older curl to solve the issue. You can pick the version of curl you want and build it from source, following the guide below.
Example:
wget https://curl.haxx.se/download/curl-7.58.0.zip
The guide to reinstalling curl from the zip archive:
https://yannmjl.medium.com/how-to-manually-update-curl-on-ubuntu-server-899476062ad6
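Before rebuilding anything, it may help to confirm the mismatch from inside R. The sketch below is only a diagnostic, assuming a curl binary is on the PATH and RCurl is installed; parse_curl_protocols() and report_protocol_gap() are made-up helper names. A non-empty result suggests that RCurl was linked against a different libcurl than the one behind the command-line tool.

```r
# Hypothetical helper: pull the protocol names out of `curl --version` output.
parse_curl_protocols <- function(version_lines) {
  line <- grep("^Protocols:", version_lines, value = TRUE)
  if (length(line) == 0) return(character(0))
  strsplit(trimws(sub("^Protocols:", "", line[1])), "\\s+")[[1]]
}

# Hypothetical helper: protocols the system curl has but RCurl's libcurl lacks.
report_protocol_gap <- function() {
  if (!nzchar(Sys.which("curl"))) stop("no curl binary found on the PATH")
  if (!requireNamespace("RCurl", quietly = TRUE)) stop("RCurl is not installed")
  system_protocols <- parse_curl_protocols(
    system2("curl", "--version", stdout = TRUE)
  )
  setdiff(system_protocols, RCurl::curlVersion()$protocols)
}

# Usage (not run here):
# report_protocol_gap()
```

With the two protocol lists shown in the question, the difference would be scp and sftp.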
I am trying to post to my WordPress site using the RWordPress package (https://github.com/duncantl/RWordPress). This worked until recently, but I now get the following error message.
options(WordPressLogin = c(bla = 'fasel'),
WordPressURL = 'https://www.econinfo.de/xmlrpc.php')
getRecentPostTitles()
Error in function (type, msg, asError = TRUE) :
error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Searching around, it seems that there is a conflict with the TLS version, but I don't understand on which side. The certificate from my hoster supports TLS 1.1 and TLS 1.2.
Any help would be appreciated.
I'm on Win 10 with
> RCurl::curlVersion()
$age
[1] 3
$version
[1] "7.40.0"
$vesion_num
[1] 468992
$host
[1] "x86_64-pc-win32"
$features
ssl libz ntlm asynchdns spnego largefile idn sspi
4 8 16 128 256 512 1024 2048
$ssl_version
[1] "OpenSSL/1.0.0o"
The RWordpress package has not been touched for over 7 years. You might want to explore other options.
These days, people use the curl or httr package for internet access. Unfortunately, the RCurl package has been unmaintained for years, and it only supports very old versions of SSL (apparently not TLS 1.1 and TLS 1.2).
As you are using Windows 10, you could also download IIS Crypto to easily manage and disable TLS and SSL versions.
But #Jeron is right, RWordpress is deprecated.
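To see which side the conflict is on, one can look at the SSL library reported by curlVersion(). As far as I know, OpenSSL only gained TLS 1.1 and TLS 1.2 support in version 1.0.1, so the "OpenSSL/1.0.0o" shown in the question would be too old for a server that accepts only those versions. A minimal sketch of that check follows; openssl_supports_tls12() is a made-up helper name, and it only makes sense when the SSL backend is OpenSSL.

```r
# Hypothetical helper: TRUE if the reported OpenSSL version is at least 1.0.1,
# FALSE if it is older, NA if no x.y.z version can be found in the string.
openssl_supports_tls12 <- function(ssl_version) {
  m <- regmatches(ssl_version,
                  regexpr("[0-9]+\\.[0-9]+\\.[0-9]+", ssl_version))
  if (length(m) == 0) return(NA)
  numeric_version(m) >= "1.0.1"
}

openssl_supports_tls12("OpenSSL/1.0.0o")  # the value from the question: FALSE

# With RCurl installed (not run here):
# openssl_supports_tls12(RCurl::curlVersion()$ssl_version)
```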
I've looked at https://github.com/ropensci/RSelenium/issues/94 and https://github.com/ropensci/RSelenium/issues/82 but was not able to solve my problem. It didn't help that this person was on Windows, and I am on Mac (El Capitan, version 10.11.6).
I am trying to learn data scraping with RSelenium, but some of the technical aspects of it are giving me issues early on. I have a few questions first and then will share my code:
(1) Right away, it says that startServer() is deprecated. Specifically:
startServer()
# output
Warning message:
startServer is deprecated.
Users in future can find the function in
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity.
Options include manually starting a server see
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see
vignette("RSelenium-docker", package = "RSelenium")
.
What should I use in place of startServer(), or what do I need to change on my computer? I'm confused as to what this warning message is saying.
(2) Since it's just a warning, I continue by trying to open a browser in chrome. I quickly run into another error:
remDr = remoteDriver$new(browserName = 'chrome')
remDr$open()
# output
[1] "Connecting to remote server"
$webdriver.remote.sessionid
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
$locationContextEnabled
[1] TRUE
$webStorageEnabled
[1] TRUE
$takesScreenshot
[1] TRUE
$javascriptEnabled
[1] TRUE
$message
[1] "session not created exception\nfrom unknown error: Runtime.executionContextCreated has invalid 'context': {\"auxData\":{\"frameId\":\"34144.1\",\"isDefault\":true},\"id\":1,\"name\":\"\",\"origin\":\"://\"}\n (Session info: chrome=54.0.2840.71)\n (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Mac OS X 10.11.6 x86_64)"
$hasTouchScreen
[1] TRUE
$platform
[1] "ANY"
$cssSelectorsEnabled
[1] TRUE
$id
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"
The $message line of the output mentions that the session was not created. On my desktop, what I see is that Chrome opens for a split second and then closes or crashes. I try again with Firefox, and get:
remDr = remoteDriver$new(browserName = 'firefox')
remDr$open()
# output
[1] "Connecting to remote server"
Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: java.lang.IllegalStateException
Further Details: run errorDetails method
It is frustrating to try to learn this but not even be able to get past the very first step of opening a browser. Any help is greatly appreciated!
As noted, checkForServer and startServer are deprecated, but you may still be able to use them as follows:
unlink(file.path(find.package("RSelenium"), "bin"), recursive = TRUE, force = TRUE)
RSelenium::checkForServer()
For Firefox:
In a terminal, run the following command:
brew install geckodriver
Running Selenium on the default port on a Mac is a problem, as Kerberos is often already running on the default port 4444. Run the following commands in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(extraCapabilities = list(marionette = TRUE), port=5556)
remDr$open()
......
# when finished
selServ$stop()
For chrome:
brew install chromedriver
Running Selenium on the default port on a Mac is a problem (see above). Run the following commands in the R console:
selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(browserName = "chrome",
                                 extraCapabilities = list(marionette = TRUE),
                                 port = 5556)
remDr$open()
......
# when finished
selServ$stop()
If the above doesn't help, then look at running a Docker container; see
http://rpubs.com/johndharrison/RSelenium-Docker and https://github.com/SeleniumHQ/docker-selenium . This basically involves running a Docker container using something like:
$ docker run -d -p 5556:4444 selenium/standalone-chrome:3.0.1-aluminum
Then a Selenium server and Chrome browser should be accessible on port 5556, which you can connect to by giving the appropriate arguments to remoteDriver.
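The "appropriate arguments" could look roughly like the sketch below, assuming the container was started with the -p 5556:4444 mapping shown above and that RSelenium is installed. The point is that R has to connect to the host side of the mapping (5556), not to the port inside the container (4444). host_port_from_mapping() and connect_to_container() are made-up helper names.

```r
# Hypothetical helper: extract the host port from a Docker "host:container"
# port mapping such as "5556:4444".
host_port_from_mapping <- function(mapping) {
  as.integer(strsplit(mapping, ":", fixed = TRUE)[[1]][1])
}

# Hypothetical wrapper: open a session against the containerised server.
connect_to_container <- function(mapping = "5556:4444", browser = "chrome") {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = "localhost",
    port = host_port_from_mapping(mapping),
    browserName = browser
  )
  remDr$open()
  remDr
}

# Usage (not run here; needs the container to be up):
# remDr <- connect_to_container()
# remDr$navigate("https://www.r-project.org")
# remDr$close()
```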
I am trying to install an RCurl package that has sftp support. I installed curl with sftp. On the console, when I run curl -V, I do get the list of supported protocols:
curl 7.39.0 (x86_64-unknown-linux-gnu) libcurl/7.39.0 OpenSSL/0.9.8j zlib/1.2.7 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: Largefile NTLM NTLM_WB SSL libz
However, when I install RCurl version RCurl_1.95, I don't see sftp as one of the protocols:
curlVersion()$protocols
[1] "tftp" "ftp" "telnet" "dict" "ldap" "ldaps" "http" "file"
[9] "https" "ftps"
Is there a way to force RCurl to include sftp when manually installing RCurl from source?
I do not think so. While curl/libcurl might provide a huge number of protocols -- as far as I understand the matter -- RCurl does not port all theoretically available protocols to be used in R.
You might either do that yourself or maybe kindly ask Duncan Temple Lang to add further protocols. A workaround might be to access curl from within R via system() (or shell() on Windows) like this:
system("curl example.com", intern = TRUE)
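Applied to sftp, that workaround could look roughly like this: a sketch only, assuming the command-line curl on the PATH is the sftp-enabled build. --user, --upload-file and --output are regular curl options; curl_upload_args() and curl_download_args() are made-up helper names, and the host and credentials in the usage lines are placeholders.

```r
# Hypothetical helper: command-line arguments for uploading a local file.
curl_upload_args <- function(local_file, url, userpwd) {
  c("--user", userpwd, "--upload-file", local_file, url)
}

# Hypothetical helper: command-line arguments for downloading to a local file.
curl_download_args <- function(url, dest_file, userpwd) {
  c("--user", userpwd, "--output", dest_file, url)
}

# Usage (not run here). shQuote() protects spaces and special characters,
# because system2() passes the arguments through a shell on Unix-alikes.
# system2("curl", shQuote(curl_upload_args("myfile.txt",
#                                          "sftp://host/myfile.txt",
#                                          "userid:userpass")))
# system2("curl", shQuote(curl_download_args("sftp://host/myfile.txt",
#                                            "myfile.txt",
#                                            "userid:userpass")))
```

Note that credentials passed on the command line may be visible to other users on the machine while curl is running.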