Using package RCurl ftpUpload with @ symbol in the username

I have been using the RCurl package to pull data into memory from an SFTP and am trying to upload the transformed data into a different SFTP. The problem I am having is that the username assigned on the new SFTP has an @ sign in it. When I try to run the code below (sensitive info removed):
ftpUpload(what = file,
          to = "sftp://user@school.edu:password@site.net/incoming/subfolder/data.csv")
The following error appears:
Error in function (type, msg, asError = TRUE) :
Failed to connect to school.edu port 22: Timed out
The @ sign is creating an issue where the file attempts to upload to the wrong location (school.edu as opposed to site.net). Unfortunately, I'm unable to alter the username, as I'm told the site automatically generates usernames and they will always contain an @ sign. I really don't know that much about SFTP, so any help would be appreciated, even if that means working outside of R for a solution.

Perhaps a safer way to pass the username and password is via the userpwd= parameter, which keeps the @ out of the URL entirely. For example:
ftpUpload(what = file,
          to = "sftp://site.net/incoming/subfolder/data.csv",
          userpwd = "user@school.edu:password")
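Alternatively, since curl percent-decodes the userinfo part of a URL, the @ in the username can be encoded as %40 so it is not mistaken for the userinfo/host separator. A minimal sketch of that variant, using the same placeholder credentials:
# the %40 decodes back to @ in the username, so the parser splits the URL correctly
ftpUpload(what = file,
          to = "sftp://user%40school.edu:password@site.net/incoming/subfolder/data.csv")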

Related

How can I read a json file from a remote server in R?

So I have a collection of json files located on my local machine that I am currently reading in using the command
file <- tbl_df(ndjson::stream_in("path/to/file.json"))
I have copied these files to a Linux server (using WinSCP) and I want to stream them into my R session just like I did in the above code with ndjson. When searching for ways to do this, I came across one method using RCurl that looked like this:
file <- scp(host = "hostname", "path/to/file.json", "pass", "user")
but that returned an error
Error in function (type, msg, asError = TRUE) : Authentication failure
but either way I want to avoid copying my passphrase into my R script, as others will see this script. I also came across a method suggesting this:
d <- read.table(pipe('ssh -l user host "cat path/to/file.json"'))
however this command returned the error
no lines available in input
and I believe read.table would cause me issues anyway. Does anyone know a way I could read newline-delimited json files from a remote server into an R session? Thank you in advance! Let me know if I can make my question more clear.
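A minimal sketch of one possible approach, assuming key-based SSH authentication is configured on the server (which also keeps the passphrase out of the script). Unlike ndjson::stream_in(), which only takes a file path, jsonlite::stream_in() accepts a connection, so the file can be streamed over an ssh pipe; the user, host, key path, and file path below are placeholders:
library(jsonlite)

# key-based auth (-i) avoids embedding a passphrase in the script;
# user, host, and both paths are placeholders
con <- pipe('ssh -i ~/.ssh/id_rsa user@host "cat path/to/file.json"')
df <- jsonlite::stream_in(con)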

Delete files from SFTP using R Studio

I need to delete files from an SFTP site once I have processed them in R (parsing content). However, nothing I try seems to work.
This is what I've been trying, and variations of it:
library(RCurl)
curlPerform(url="sftp://user:password#sftplocation/folder/", quote="DELE filename.pdf")
curlPerform(url="ftp://xxx.xxx.xxx.xxx/", quote="DELE file.txt", userpwd = "user:pass")
Error is
Error in function (type, msg, asError = TRUE) : Unknown SFTP command
When I run the following code, I get a lovely list of all the files (which is used to download them), so I know the connection is working just fine, and the parsing of the downloaded files works great!
curlPerform(url="sftp://user:password@sftplocation/folder/")
Thanks,
Siobhan
To delete over SFTP, use rm instead of DELE, which looks like an FTP rather than an SFTP command.
Then make sure you have the full file path. This works for me:
curlPerform(
  url = "sftp://me@host.example.com/",
  .opts = list(
    ssh.public.keyfile = pub,
    ssh.private.keyfile = pri),
  verbose = TRUE,
  quote = "rm /home/me/test/test.txt")
Note that I've put my credentials in key files so the password doesn't appear in plain text in the code.
I'm not convinced this is the best way to do it, since I can't stop it printing the contents of the URL... There might be an option...
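One possibility, assuming libcurl's behaviour of running quote commands before the transfer holds for SFTP, is the nobody option, which skips the transfer itself and so should suppress the listing:
curlPerform(
  url = "sftp://me@host.example.com/",
  .opts = list(
    ssh.public.keyfile = pub,
    ssh.private.keyfile = pri,
    nobody = TRUE),  # assumption: skips the directory listing; quote commands still run first
  quote = "rm /home/me/test/test.txt")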

Importing a csv.gz file from an FTP server with Port and Directory Credentials into R

I want to import datasets in R that come from an FTP server. I am using FileZilla to manually see the files. Currently my data is in a xxxx.csv.gz file in the FTP server and a new file gets added once a day.
My issue is that I have tried using the following link as guidance and it doesn't seem to work well in my case:
Using R to download newest files from ftp-server
When I attempt the following code an error message comes up:
library(RCurl)
url <- "ftp://yourServer"
userpwd <- "yourUser:yourPass"
filenames <- getURL(url, userpwd = userpwd,
                    ftp.use.epsv = FALSE, dirlistonly = TRUE)
Error:
Error in function (type, msg, asError = TRUE) :
Failed to connect to ftp.xxxx.com port 21: Timed out
This happened because the credentials state that I should use port 22, the secure port.
How do I modify my getURL call so that it can access port 22?
Also, after making this call, there is a directory that I need to get into in order to access the files.
For example purposes: let's say the directory is:
Directory: /xxx/xxxxx/xxxxxx
(I've also tried appending this to the original URL and the same error message comes up.)
Basically I want to get access to this directory, load individual csv.gz files into R, and then automatically pull the following day's data.
The file names are:
XXXXXX_20160205.csv.gz
(The file names are just dates and each file will correspond to the previous day)
I guess the first step is just to make a connection to the files and download them; later down the road, I'd automatically pull the previous day's csv.gz file.
Any help would be great, thanks!
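Since the credentials specify port 22, the server is speaking SFTP rather than plain FTP, which means getURL needs the sftp:// scheme. A minimal sketch under that assumption, keeping the question's placeholder server, credentials, directory, and file-name pattern:
library(RCurl)

# port 22 is SSH/SFTP, so use the sftp:// scheme and append the directory
url <- "sftp://yourServer:22/xxx/xxxxx/xxxxxx/"
userpwd <- "yourUser:yourPass"

# list the directory (ftp.use.epsv applies only to plain FTP, so it is dropped)
listing <- getURL(url, userpwd = userpwd, dirlistonly = TRUE)
filenames <- strsplit(listing, "\r?\n")[[1]]

# each file is named for the previous day, e.g. XXXXXX_20160205.csv.gz
fname <- sprintf("XXXXXX_%s.csv.gz", format(Sys.Date() - 1, "%Y%m%d"))
bin <- getBinaryURL(paste0(url, fname), userpwd = userpwd)
writeBin(bin, fname)
df <- read.csv(gzfile(fname))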

R: Specify SSL version in RCurl getURL statement

Aim
I am trying to write a statement to connect to a SharePoint Online list to retrieve the data for use in R.
Background
This is my first time using RCurl/curl/libcurl, and I've attempted to read the documentation, but it's beyond me and doesn't have any relevant examples.
Using
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")
Resulted in an error
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to example.sharepoint.com:443
A combination of Google and deciphering the cryptic libcurl documentation identified that the issue was due to the SSL version. I used curl in a shell to verify, and I got a positive response from the SharePoint server as a result (well, except for the need to add the username and password):
curl https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary -v -SSLv3
Problem
I cannot work out how to include the -SSLv3 argument in my getURL() call, even after RTFMing.
Requested solution
Amend this statement to utilise SSLv3:
a <- getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary")
For me this works:
a<-getURL("https://example.sharepoint.com/_vti_bin/listdata.svc/Glossary", ssl.verifypeer=TRUE, sslversion=4L)

How to open Excel 2007 File from Password Protected Sharepoint 2007 site in R using RODBC or RCurl?

I am interested in opening an Excel 2007 file in R 2.11.1 using RODBC. The Excel file resides in the shared documents page of a MOSS 2007 website. I currently download the .xlsx file to my hard drive and then import it into R using the following code:
library(RODBC)
con <- odbcConnectExcel2007("C:/file location/file.xlsx")
data <- sqlFetch(con, "worksheet name")
close(con)
When I type the web URL for the document into the odbcConnectExcel2007 connection, an error message pops up with:
ODBC Excel Driver Login Failed: Invalid internet Address.
followed by the following message in my R console:
ERROR: Could not SQLDriverConnect
Any insights you can provide would be greatly appreciated.
Thanks!
UPDATE: The site I am attempting to download from is password protected. I tried another method using the function getURL in the package RCurl:
x = getURL("http://website.com/file.xlsx", userpwd = "uname:pw")
The error that I receive is:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string: 'PK\003\004\024\0\006\0\b\0\0\0!\0dA»ï\001\0\0O\n\0\0\023\0Ò\001[Content_Types].xml ...'
I have no idea what this means. Any help would be appreciated. Thanks!
Two solutions worked for me.
If you do not need to automate the script that pulls the data, you can map a network drive pointing to the SharePoint folder from which you want to extract the Excel document.
If you need to automate a script to pull the Excel file every couple of minutes, I recommend sending your authentication credentials in a request that automatically saves the file to a local drive. From there you can read it into R for further data wrangling.
library("httr")
library("openxlsx")
user <- "<USERNAME>"
password <- "<PASSWORD>"
url <- "https://sharepoint.company/file_to_obtain.xlsx"
httr::GET(url,
          authenticate(user, password, type = "ntlm"),
          write_disk("C:/tempfile.xlsx", overwrite = TRUE))
df <- openxlsx::read.xlsx("C:/tempfile.xlsx")
You can obtain the correct URL to the file by clicking on the SharePoint location and removing "?Web=1" after the file ending (xlsx, xlsb, xls, ...). USERNAME and PASSWORD are usually Windows credentials. It helps to store them in a key manager, for example:
library("keyring")
keyring::key_set_with_value(service = "Windows", username = "Key", password = "<PASSWORD>")
and then authenticating via
authenticate(user, keyring::key_get("Windows", "Key"), type = "ntlm")
In some instances it may be sufficient to pass
authenticate(":", ":", type = "ntlm")
if only your Windows credentials are required and the code is running from your machine.
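As for the embedded nul error from the getURL() attempt in the update: an .xlsx file is a ZIP archive (the leading PK\003\004 in the error is the ZIP magic number), so getURL() was returning binary data as text. A minimal sketch, keeping the question's placeholder URL and credentials, fetches the raw bytes instead:
library(RCurl)

# .xlsx is a ZIP archive, so request raw bytes rather than text
bin <- getBinaryURL("http://website.com/file.xlsx", userpwd = "uname:pw")
writeBin(bin, "C:/tempfile.xlsx")
df <- openxlsx::read.xlsx("C:/tempfile.xlsx")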
