R: import CSV files from an FTP server into a data frame

I am trying to import CSV files from an FTP server into R, ideally into a data frame.
I want to import only specific files from the server, not all of them.
My issues began when I tried to import just one file:
url <- "ftp:servername.de/"
download.file(url, "testdata.csv")
I got this error message:
trying URL 'ftp://servername.de/'
Error in download.file(url, "testdata.csv") :
  cannot open URL 'ftp://servername.de/'
In addition: Warning message:
In download.file(url, "testdata.csv") :
  URL 'ftp://servername.de/': status was 'Couldn't connect to server'
Another way I tried was:
url <- "ftp://servername.de/"
userpwd <- "a:n"
filenames <- getURL(url, userpwd = userpwd
,ftp.use.epsv = FALSE, dirlistonly = TRUE
)
This gives me the directory listing, but I do not understand how to import the files themselves into an R object.
Additionally, it would be great to get a clue on how to handle gzipped files (.gz) instead of plain CSV files.

Use the curl library to extract the directory listing
library(curl)
url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
h = new_handle(dirlistonly=TRUE)
con = curl(url, "r", h)
tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
close(con)
head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML
Paste the relevant ones onto the url and use:
urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
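To answer the data-frame part of the question, the listing in tbl can be filtered before anything is downloaded, and each file can be read straight into a data frame over the connection. A minimal sketch, assuming the directory actually contains CSV files (the "\\.csv$" pattern and the somefile.csv.gz name below are illustrative placeholders):
read_ftp_csv <- function(u) {
  # pass handle = new_handle(userpwd = userpwd) here if the server needs credentials
  con <- curl(u, "r")
  on.exit(close(con))
  read.csv(con)
}
wanted <- tbl$V1[grepl("\\.csv$", tbl$V1)]       # keep only the files you need
dfs <- lapply(paste0(url, wanted), read_ftp_csv) # one data frame per file
combined <- do.call(rbind, dfs)                  # a single data frame, if the columns match
For gzipped files (.gz), gzcon() can decompress on the fly:
df <- read.csv(gzcon(curl(paste0(url, "somefile.csv.gz"), "rb")))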
Reference:
Downloading files from ftp with R

Related

Unable to import a SharePoint Excel file into an R dataset

I am trying to download an Excel file from a SharePoint site into an R dataset, but I keep getting errors. I have also searched forums for solutions to this issue; although I found many suggestions, none of them worked for me. Here is what I have tried so far.
METHOD 1:
library(readxl)
library(httr)
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf,sheet = "sheetname")
OUTPUT 1:
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
Response ['https://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0']
Date: 2020-08-06 08:43
Status: 400
Content-Type: text/html; charset=us-ascii
Size: 311 B
<ON DISK> C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c646e4c5d50.xlsx
df <- read_excel(tf,sheet = "sheetname")
Error: Evaluation error: zip file 'C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c646e4c5d50.xlsx' cannot be opened.
Please note that I added "?Web=0" at the end of the URL to force a direct download of the file.
METHOD 2:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
destfile <- "C:/Users/<username> /Downloads/<TrackerName>.xlsx"
download.file(url = url1,destfile = destfile)
df <- read_excel(destfile,sheet = "sheetname")
OUTPUT 2:
trying URL …
cannot open URL …
HTTP status was '403 FORBIDDEN'
Error in download.file(url = url1, destfile = destfile) :
  cannot open URL …
METHOD 3:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
GET(url1,authenticate("<myusername>","<mypassword>", type = "any"),write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf,sheet = "sheetname")
OUTPUT 3:
GET(url1,authenticate("<myusername>","<mypassword>", type = "any"),write_disk(tf <- tempfile(fileext = ".xls")))
Response ['https://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0']
Date: 2020-08-06 09:04
Status: 400
Content-Type: text/html; charset=us-ascii
Size: 311 B
<ON DISK> C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c6456bd6d20.xlsx
df <- read_excel(tf,sheet = "sheetname")
Error: Evaluation error: zip file 'C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c6456bd6d20.xlsx' cannot be opened.
Initially, of course, I tried reading the Excel file from SharePoint directly (Method 4 below), but that didn't work. Only then did I try the methods above, first downloading the file and then importing it into a dataset.
METHOD 4:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
df <- read.xlsx(file = url1, sheetName = "sheetname")
OUTPUT 4:
Error in loadWorkbook(file, password = password) :
Cannot find <url> …
I encountered the same issue, and I thought there was a problem with the URL. So instead of copying the URL straight from the browser's address bar, I did this:
Find your file in the SharePoint site and click its "Show actions" (three dots) button.
Click "Details", then find "Path" at the end of the details pane.
Click the "Copy" icon.
Finally, follow METHOD 1 as before:
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
readxl::read_excel(tf, sheet = "Sheet 1")
This worked for me. I hope it is useful to you.
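One observation that may save time: in Outputs 1 and 3 the status is 400, so the file written to disk is a short HTML error page rather than a workbook. Since .xlsx files are zip archives, that is exactly why read_excel reports "zip file ... cannot be opened". A small, hedged addition to METHOD 1 that fails fast at the request instead of later in read_excel:
library(httr)
library(readxl)
resp <- GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx"), overwrite = TRUE))
stop_for_status(resp)                       # stops with a clear error on 400/403
df <- read_excel(tf, sheet = "sheetname")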

R: downloading an Excel file from an HTTPS site with a redirect

I want to download a number of Excel files from a website. The website requires a username and password, but I can log in manually before running the code. After logging in, if I paste the URL into my browser (Chrome), it downloads the file for me. But when I try it in R, the closest I get is a text file that looks like the HTML code of the same website from which I need to download the file. Also note the URL structure: there is an "occ4.xlsx" in the middle, which I believe is the file.
I can also explain some of the other parameters in the URL:
country=JP (country - Japan)
jloc=state-1808 (state id - it will change with states)
...
time frame, etc.
Here is what I have tried:
Iteration 1 (inbuilt methods):
url <- "https://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
url_ns <- "http://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
destfile <- "test"
download.file(url, destfile,method="auto")
download.file(url, destfile,method="wininet")
download.file(url, destfile,method="auto", mode="wb")
download.file(url, destfile,method="wininet", mode="wb")
download.file(url_ns, destfile,method="auto")
download.file(url_ns, destfile,method="wininet")
download.file(url_ns, destfile,method="auto", mode="wb")
download.file(url_ns, destfile,method="wininet", mode="wb")
#all of the above download the webpage, not the file
Iteration 2 (using RCurl):
# install.packages("RCurl")
library(RCurl)
library(readxl)
x <- getURL(url)
y <- getURL(url, ssl.verifypeer = FALSE)
z <- getURL(url, ssl.verifypeer = FALSE, ssl.verifyhost=FALSE)
identical(x,y) #TRUE
identical(y,z) #TRUE
x
[1] "<html><body>You are being redirected.</body></html>"
# Note the text about the redirect
out <- readxl::read_xlsx(textConnection(x)) # I know it won't work
#Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting a single string value: [type=integer; extent=1].
w = substr(x,36,nchar(x)-31) #removing redirect text
identical(w,url) # FALSE
out <- readxl::read_xlsx(textConnection(w))
#Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting a single string value: [type=integer; extent=1].
download.file(w, destfile,method="auto")
#Downloads the webpage again
download.file(url_ns,destfile,method="libcurl")
#Downloads the webpage again
I also tried the downloader package, with the same results.
I can't share the username and password in this question, but if you want to try your hand at this problem, let me know in a comment or PM and I will share them with you.
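For what it's worth, the "You are being redirected" body usually means the server never saw the browser's login session and bounced the request toward a login page; download.file() and RCurl do not share the browser's cookies. One possible workaround, sketched with httr, is to copy the session cookie out of Chrome's developer tools and send it with the request. The cookie name _session_id and its value are placeholders, not something known about this particular site:
library(httr)
resp <- GET(url,
            set_cookies(`_session_id` = "<value-from-browser-devtools>"),
            write_disk(tf <- tempfile(fileext = ".xlsx"), overwrite = TRUE))
stop_for_status(resp)
http_type(resp)    # should be a spreadsheet type, not text/html, if it worked
out <- readxl::read_xlsx(tf)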

Using R to download a SAS file from an FTP server

I am attempting to download some files from an FTP server to my local machine. The following method has worked for moving .txt and .csv files off the server, but not for the .sas7bdat files that I need.
protocol <- "sftp"
server <- "ServerName"
userpwd <- "User:Pass"
tsfrFilename <- "/filepath/file.sas7bdat"
ouptFilename <- "out.sas7bdat"
# Run #
## Download Data
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getURL(url = url, userpwd=userpwd)
## Create File
fconn <- file(ouptFilename)
writeLines(data, fconn)
close(fconn)
When I run the getURL command, however, I am met with the following error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
Does anyone know an alternative way to download a sas7bdat file from an FTP server to my local machine, or a way to alter my code above so that it successfully downloads the file? Thanks!
As @MrFlick suggested, I solved this problem using getBinaryURL() instead of getURL(). Also, I had to use writeBin() instead of writeLines(), since the data now arrives as a raw vector. The result is as follows:
protocol <- "sftp"
server <- "ServerName"
userpwd <- "User:Pass"
tsfrFilename <- "/filepath/file.sas7bdat"
ouptFilename <- "out.sas7bdat"
# Run #
## Download Data
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getBinaryURL(url = url, userpwd=userpwd)
## Create File
fconn <- file(ouptFilename)
write(data, fconn)
close(fconn)
Alternatively, to load the downloaded data into an R data frame, one can use the haven library on the file just written:
library(haven)
df_data <- read_sas(outFilename)
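A hedged alternative, using the curl package seen elsewhere on this page: stream the file straight to disk in one step, avoiding the in-memory raw vector entirely. This assumes your libcurl build supports SFTP:
library(curl)
h <- new_handle(userpwd = userpwd)                # same "User:Pass" credentials
curl_download(url, outFilename, mode = "wb", handle = h)
df_data <- haven::read_sas(outFilename)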

Downloading files from ftp with R

I am trying to get files from this FTP:
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/
From there, I need only the files with the .dat extension; there are other files there that I am not interested in.
I want to avoid downloading them one at a time, so I thought of creating a vector with the names and looping over them.
How can I download only the files I want?
Thanks
EDIT:
I have tried doing the following
downloadURL <- "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat"
download.file(downloadURL, "F010439.dat") #this is a trial using one file
And after a few seconds I get the following error:
trying URL
'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat'
Error in download.file(downloadURL, "F010439.dat") :
cannot open URL 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat'
In addition: Warning message:
In download.file(downloadURL, "F010439.dat") :
InternetOpenUrl failed: 'The FTP session was terminated.'
Use the curl library to extract the directory listing
> library(curl)
> url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
> h = new_handle(dirlistonly=TRUE)
> con = curl(url, "r", h)
> tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
> close(con)
> head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML
Paste the relevant ones onto the url and use:
urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
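Since the question asks only for the .dat files, the listing can be filtered before building the URLs; a short sketch along the same lines:
dat_files <- tbl$V1[grepl("\\.dat$", tbl$V1)]    # keep only the .dat entries
urls <- paste0(url, dat_files)
for (i in seq_along(urls)) {
  curl_fetch_disk(urls[i], basename(urls[i]))    # save each file locally
}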

Download and Read Zip CSV file in R

I am trying to download and read a zipped CSV file from Kaggle within an R script. After researching other posts, including post1 and post2, I have tried:
# Read data with temp file
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile()
download.file(url, tmp, mode = "wb")
con <- unz(tmp, "store.csv.zip")
store <- read.table(con, sep = ",", header = TRUE)
unlink(tmp)
The read.table command throws an error:
Error in open.connection(file, "rt") : cannot open the connection
I have also tried:
# Download file, unzip, and read
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
download.file(url, destfile = "./SourceData/store.csv.zip", mode = "wb")
unzip("./SourceData/store.csv.zip")
Unzip throws the error:
error 1 in extracting from zip file
Bypassing the unzip command and reading directly from the zip file:
store <- read_csv("SourceData/store.csv.zip")
Throws the error:
zip file ... SourceData/store.csv.zip cannot be opened
I prefer to use the temp file, but at this point I'll use either approach if I can make it work.
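Two things seem worth checking here, offered as a sketch rather than a confirmed fix. First, unz() expects the name of the file inside the archive, not the name of the zip itself, so the connection should be unz(tmp, "store.csv"). Second, Kaggle downloads normally require an authenticated session; without one, the "zip" saved to disk is typically an HTML login page, which would also explain "error 1 in extracting from zip file". Assuming the download itself succeeds and the archive really contains store.csv:
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile(fileext = ".zip")
download.file(url, tmp, mode = "wb")
store <- read.csv(unz(tmp, "store.csv"))   # name of the file *inside* the zip
unlink(tmp)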
