I am trying to download an Excel file from a SharePoint site into an R data frame, but I keep getting errors. I have searched forums for solutions to this issue and, although I found many suggestions, none of them worked for me. Here is what I have tried so far.
METHOD 1:
library(readxl)
library(httr)
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf,sheet = "sheetname")
OUTPUT 1:
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
Response ['https://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0']
Date: 2020-08-06 08:43
Status: 400
Content-Type: text/html; charset=us-ascii
Size: 311 B
<ON DISK> C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c646e4c5d50.xlsx
df <- read_excel(tf,sheet = "sheetname")
Error: Evaluation error: zip file 'C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c646e4c5d50.xlsx' cannot be opened.
Please note that I had added "?Web=0" at the end of the URL to make the xlsx download directly.
METHOD 2:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
destfile <- "C:/Users/<username> /Downloads/<TrackerName>.xlsx"
download.file(url = url1,destfile = destfile)
df <- read_excel(destfile,sheet = "sheetname")
OUTPUT 2:
trying URL …
cannot open URL …
HTTP status was '403 FORBIDDEN'
Error in download.file(url = url1, destfile = destfile) :
cannot open URL …
METHOD 3:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
GET(url1,authenticate("<myusername>","<mypassword>", type = "any"),write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf,sheet = "sheetname")
OUTPUT 3:
GET(url1,authenticate("<myusername>","<mypassword>", type = "any"),write_disk(tf <- tempfile(fileext = ".xls")))
Response ['https://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0']
Date: 2020-08-06 09:04
Status: 400
Content-Type: text/html; charset=us-ascii
Size: 311 B
<ON DISK> C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c6456bd6d20.xlsx
df <- read_excel(tf,sheet = "sheetname")
Error: Evaluation error: zip file 'C:\Users\<username>\AppData\Local\Temp\RtmpuM3YpD\file2c6456bd6d20.xlsx' cannot be opened.
Of course, I initially tried reading the Excel file from SharePoint directly (METHOD 4 below), but that didn't work either. Only then did I try the methods above, first downloading the Excel file and then importing it into a data frame.
METHOD 4:
url1 <- 'http://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx?d=wbae96ce171e14926863e453a8bec146a?Web=0'
library(xlsx)
df <- read.xlsx(file = url1, sheetName = "sheetname")
OUTPUT 4:
Error in loadWorkbook(file, password = password) :
Cannot find <url> …
I encountered the same issue, and I suspected there was a problem with the URL. So instead of copying the URL straight from the browser's address bar, I did this:
Find your file in the SharePoint site and click the "Show actions" (3 dots) button for your file.
Click "Details", then find "Path" at the end of the details pane.
Click the "Copy" icon.
Finally, use that copied path as url1 and follow METHOD 1 as before:
GET(url1,write_disk(tf <- tempfile(fileext = ".xlsx")))
readxl::read_excel(tf, sheet = "Sheet 1")
This worked for me. Hope it can be useful to you.
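For reference, here is the whole sequence with the copied path dropped in (a minimal sketch; the URL below is only a placeholder for whatever the Details pane gives you, and the sheet name is whatever your workbook uses):
library(httr)
library(readxl)
# Direct path copied from the file's Details pane in SharePoint (placeholder)
url1 <- "https://<companyname>.sharepoint.com/sites/<sitename>/Shared%20Documents/General/TRACKERS/<FolderName>/<TrackerName>.xlsx"
# Download to a temporary file, then read the sheet
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
df <- read_excel(tf, sheet = "sheetname")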
Related
Is there a way to do error handling with the read_excel function in R when the file exists but cannot be read for some other reason (e.g., wrong format)?
Just to illustrate, my piece of code is as follows:
f <- GET(url, authenticate(":", ":", type="ntlm"), write_disk(tf <- tempfile(tmpdir = here("data/temp"), fileext = ".xlsx")))
dt <- read_excel(tf)
where url contains the http file address.
I would like to check whether read_excel returns an error so I can handle it properly and prevent the R Markdown document from stopping.
Thanks in advance!
Looks like a duplicate question. The code below is modified from the answer found here. SomeOtherFunction in the code below is where one can plug in whatever should run if there is an error.
f <- GET(url, authenticate(":", ":", type="ntlm"), write_disk(tf <- tempfile(tmpdir = here("data/temp"), fileext = ".xlsx")))
t <- try(read_excel(tf))
if("try-error" %in% class(t)) SomeOtherFunction()
I am trying to import CSV files from an FTP server into R.
It would be best to import the files into a data frame.
I want to import only specific files from the FTP server, not all of them.
My issues began when trying to import just one file:
url <- "ftp:servername.de/"
download.file(url, "testdata.csv")
I got this error message:
trying URL 'ftp://servername.de/'
Error in download.file(url, "testdata.csv") :
  cannot open URL 'ftp://servername.de/'
In addition: Warning message:
In download.file(url, "testdata.csv") :
  URL 'ftp://servername.de/': status was 'Couldn't connect to server'
Another way I tried was:
url <- "ftp://servername.de/"
userpwd <- "a:n"
filenames <- getURL(url, userpwd = userpwd
,ftp.use.epsv = FALSE, dirlistonly = TRUE
)
Here I do not understand how to actually import the files into an R object.
Additionally, it would be great to get a hint on how to handle this process with gzip-compressed data (.gz) instead of plain CSV files.
Use the curl library to extract the directory listing
library(curl)
url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
h = new_handle(dirlistonly=TRUE)
con = curl(url, "r", h)
tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
close(con)
head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML
Paste the relevant ones onto the url and use:
urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
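To pull in only specific files and end up with one data frame (per the question), you can filter the listing by name before downloading; a minimal sketch, assuming the files of interest are plain CSVs:
library(curl)
files  <- as.character(tbl$V1)
wanted <- files[grepl("\\.csv$", files)]   # keep only the CSV files
urls   <- paste0(url, wanted)
fls    <- basename(urls)
for (i in seq_along(urls)) curl_fetch_disk(urls[i], fls[i])
# Read each downloaded CSV and stack them into a single data frame
df <- do.call(rbind, lapply(fls, read.csv, stringsAsFactors = FALSE))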
Reference:
Downloading files from ftp with R
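For the .gz part of the question: once a gzip-compressed CSV has been downloaded, read.csv can read it through a gzfile() connection, which decompresses on the fly, so the workflow stays the same; a minimal sketch with a hypothetical file name:
library(curl)
gz_url <- paste0(url, "testdata.csv.gz")   # hypothetical file name on the server
gz_fl  <- basename(gz_url)
curl_fetch_disk(gz_url, gz_fl)
df <- read.csv(gzfile(gz_fl), stringsAsFactors = FALSE)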
I want to download a number of Excel files from a website. The website requires a username and password, but I can log in manually before running the code. After logging in, if I manually paste the URL into my browser (Chrome), it downloads the file for me. But when I do it in R, the closest I get is a text file that looks like the HTML code of the same website from which I need to download the file. Also note the URL structure: there is an "occ4.xlsx" in the middle, which I believe is the file.
I can also explain the other parameters in the URL:
country=JP (country - Japan)
jloc=state-1808 (state id - it will change with states)
...
...
time frame etc etc
Here is what I have tried:
Iteration 1 (inbuilt methods):
url <- "https://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
url_ns <- "http://www.wantedanalytics.com/wa/counts/occ4.xlsx?country=JP&jloc=state-1808&mapview=msa&methodology=available&t%5Bsegment%5D%5Bperiod_prior%5D=count&t%5Bsegment%5D%5Bperiod_timeframe%5D=count&t%5Bsegment%5D%5Bperiod_type%5D=&t%5Bsegment%5D%5Bqty%5D=1000&t%5Btimeframe%5D=f2013-10-17-2017-02-17&timeframe=f2013-09-28-2017-02-17"
destfile <- "test"
download.file(url, destfile,method="auto")
download.file(url, destfile,method="wininet")
download.file(url, destfile,method="auto", mode="wb")
download.file(url, destfile,method="wininet", mode="wb")
download.file(url_ns, destfile,method="auto")
download.file(url_ns, destfile,method="wininet")
download.file(url_ns, destfile,method="auto", mode="wb")
download.file(url_ns, destfile,method="wininet", mode="wb")
# all of the above download the webpage, not the file
Iteration 2 (using RCurl):
# install.packages("RCurl")
library(RCurl)
library(readxl)
x <- getURL(url)
y <- getURL(url, ssl.verifypeer = FALSE)
z <- getURL(url, ssl.verifypeer = FALSE, ssl.verifyhost=FALSE)
identical(x,y) #TRUE
identical(y,z) #TRUE
x
[1] "<html><body>You are being redirected.</body></html>"
# Note the text about the redirect
out <- readxl::read_xlsx(textConnection(x)) # I know it won't work
#Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting a single string value: [type=integer; extent=1].
w = substr(x,36,nchar(x)-31) #removing redirect text
identical(w,url) # FALSE
out <- readxl::read_xlsx(textConnection(w))
#Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
Expecting a single string value: [type=integer; extent=1].
download.file(w, destfile,method="auto")
#Downloads the webpage again
download.file(url_ns,destfile,method="libcurl")
#Downloads the webpage again
I also tried the downloader package, but with the same results!
I can't share the username and password in this question, but if you want to try your hand at this problem, let me know in a comment/PM and I will share them with you.
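Since the response is a redirect page rather than the file (see the getURL output above), the R session is most likely not carrying the logged-in browser session. One hedged sketch is to pass the browser's session cookie along with the request via httr; the cookie name and value below are placeholders you would have to copy from Chrome's developer tools, and this assumes the site authenticates via a session cookie at all:
library(httr)
# Placeholder cookie copied from the browser after logging in (hypothetical name/value)
resp <- GET(url,
            set_cookies(`_session_id` = "<cookie-value-from-browser>"),
            write_disk("occ4.xlsx", overwrite = TRUE))
status_code(resp)  # should be 200 if the session cookie was accepted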
I am trying to get files from this FTP server:
ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/
From there, I need only the files with the .dat extension, but there are other files that I am not interested in.
I want to avoid downloading them one at a time, so I thought of creating a vector with the names and looping over them.
How can I download only the files I want?
Thanks
EDIT:
I have tried doing the following
downloadURL <- "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat"
download.file(downloadURL, "F010439.dat") #this is a trial using one file
And after a few seconds I get the following error:
trying URL
'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat'
Error in download.file(downloadURL, "F010439.dat") :
cannot open URL 'ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/F010439.dat'
In addition: Warning message:
In download.file(downloadURL, "F010439.dat") :
InternetOpenUrl failed: 'The FTP session was terminated.'
Use the curl library to extract the directory listing
> library(curl)
> url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
> h = new_handle(dirlistonly=TRUE)
> con = curl(url, "r", h)
> tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
> close(con)
> head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML
Paste the relevant ones onto the url and use:
urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
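To get only the .dat files, filter the listing by extension and loop over the matches; a minimal sketch:
library(curl)
files    <- as.character(tbl$V1)
dat_urls <- paste0(url, files[grepl("\\.dat$", files)])
for (u in dat_urls) {
  curl_fetch_disk(u, basename(u))  # save each .dat file under its own name
}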
I'm trying to get a data table off of a website using the RCurl package. My code works successfully for the URL that you get to by clicking through the website:
http://statsheet.com/mcb/teams/air-force/game_stats/
Once I try to select previous years (which is what I want), my code no longer works.
Example link:
http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013
I'm guessing this has something to do with the reserved symbol(s) in the year-specific address. I've tried URLencode as well as manually encoding the address, but that hasn't worked either.
My code:
library(RCurl)
library(XML)
#Define URL
theurl <- URLencode("http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013", reserved=TRUE)
webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)
# Extract table header and contents
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results <- xpathSApply(pagetree,"//*/table[1]/tbody/tr/td", xmlValue)
content <- as.data.frame(matrix(results, ncol = 19, byrow = TRUE))
testtablehead <- c("W/L","Opponent",tablehead[c(2:18)])
names(content) <- testtablehead
The relevant error that R returns:
Error in function (type, msg, asError = TRUE) :
Could not resolve host: http%3a%2f%2fstatsheet.com%2fmcb%2fteams%2fair-force%2fgame_stats%3fseason%3d2012-2013; No data record of requested type
Does anyone have an idea what the problem is and how to fix it?
Skip the unneeded encoding and the separate download of the URL:
library(XML)
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"
pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
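From that pagetree, the rest of the original extraction code (the xpathSApply calls) should work unchanged. Alternatively, XML::readHTMLTable can pull all the tables in one call; a minimal sketch, where which list element actually holds the game stats is an assumption to verify:
library(XML)
tables  <- readHTMLTable(pagetree, stringsAsFactors = FALSE)
content <- tables[[1]]  # assumed to be the game-stats table; inspect names(tables) to confirm
head(content)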