The following code
library(readxl)
url <- "http://www.econ.yale.edu/~shiller/data/ie_data.xls"
destfile <- "ie_data.xls"
download.file(url, destfile)
ie_data <- read_xls(destfile, sheet="Data", skip = 7)
produces this error:
Error in sheets_fun(path) : Failed to open ie_data.xls
One thing that perplexes me is that if I go to the URL and download the file manually, I can then open it with read_xls. I suspect the issue is with the download.file function.
I'd like to be able to read this Excel file directly from the URL, or at least download and read it without any manual steps. I'm on a Windows x86_64 system using R 3.5.1 and readxl 1.1.0. Thanks.
I still don't know why the code above doesn't work. Using this SO post, I found that the following code does work:
library(httr)
library(readxl)
url <- "http://www.econ.yale.edu/~shiller/data/ie_data.xls"
GET(url, write_disk(tf <- tempfile(fileext = ".xls")))
ie_data <- read_excel(tf, sheet="Data", skip = 7)
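This presumably works because write_disk streams the response body to disk as raw bytes, so the binary .xls contents arrive intact.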
Because you are using Windows, you have to specify binary mode:
download.file(url, destfile, mode="wb")
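Putting that together with the question's own code, a minimal sketch of the full round trip:
library(readxl)
url <- "http://www.econ.yale.edu/~shiller/data/ie_data.xls"
destfile <- "ie_data.xls"
# mode = "wb" writes the file as binary; without it, Windows translates
# line endings and corrupts the .xls payload
download.file(url, destfile, mode = "wb")
ie_data <- read_xls(destfile, sheet = "Data", skip = 7)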
I have a secure URL which provides data in Excel format. How do I read it into RStudio?
Please mention the necessary packages and functions. I have tried read.xls(),
read_xlsx, read.URL, and some more. Nothing seems to work.
You can do it in two steps. First, download the file with something like download.file, then read it with readxl::read_excel:
download.file("https://file-examples.com/wp-content/uploads/2017/02/file_example_XLS_10.xls", destfile = "/tmp/file.xls")
readxl::read_excel("/tmp/file.xls")
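If you're on Windows, pass mode = "wb" to download.file here as well, since .xls is a binary format (see the earlier answer).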
library(readxl)
library(httr)
url <- 'https://......xls'
GET(url, write_disk(TF <- tempfile(fileext = ".xls")))
read_excel(TF)
Have you tried importing it as a .csv dataset into RStudio? Might be worth a try! :)
I want to download some files (NetCDF, although I don't think that matters) from a website
and write them to a specified data directory on my hard drive. Some code that illustrates my problem follows:
library(curl)
baseURL <- "http://gsweb1vh2.umd.edu/LUH2/LUH2_v2f/"
fileChoice <- "IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc"
destDir <- paste0(getwd(), "/data-raw/")
url <- paste0(baseURL, fileChoice)
destfile <- paste0(destDir, "test.nc")
curl_download(url, destfile) # this one works
destfile <- paste0(destDir, fileChoice)
curl_download(url, destfile) # this one fails
The error message is
Error in curl_download(url, destfile) :
Failed to open file /Users/gcn/Documents/workspace/landuse/data-raw/IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc.curltmp.
It turns out that curl_download internally appends .curltmp to destfile while downloading and then removes the suffix. I can't figure out what is preventing it from writing the file.
It turns out that the problem is that the fileChoice variable includes a new subdirectory, IMAGE_SSP1_RCP19. Once I created that directory, the process worked fine. I'm posting this because someone else might make the same mistake I did.
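A defensive sketch built from the question's variables: creating any missing parent directories up front (dir.create and dirname are base R) avoids the failure entirely.
library(curl)
baseURL <- "http://gsweb1vh2.umd.edu/LUH2/LUH2_v2f/"
fileChoice <- "IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc"
destfile <- paste0(getwd(), "/data-raw/", fileChoice)
# curl_download does not create missing directories, so make them first
dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
curl_download(paste0(baseURL, fileChoice), destfile)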
I'm using a Microsoft app (from http://portal.office.com) to translate and store tweets in an online Excel sheet, and now I want to read it with R.
The Excel sheet lives at a URL like https://myagency-my.sharepoint.com/.../tweet.xlsx
I tried:
library(readxl)
read_excel('//companySharepointSite/project/.../ExcelFilename.xlsx', sheet = 'Sheet1', skip = 1)
from this post. It gives:
Error in sheets_fun(path) : Evaluation error: zip file
I believe read_excel works only with local files. This might do the trick:
library(xlsx)
library(httr)
url <- 'https://myagency-my.sharepoint.com/.../tweet.xlsx'
GET(url, write_disk("excel.xlsx", overwrite=TRUE))
frmData <- read.xlsx("excel.xlsx", sheetIndex=1, header=TRUE)
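Once the file is on disk, readxl::read_excel("excel.xlsx") should work too; the original error came from handing read_excel a remote path, not from the file itself.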
I am getting an error from fread:
Internal error: ch>eof when detecting eol
when trying to read a csv file downloaded from an https server, using R 3.2.0. I found something related on GitHub, https://github.com/Rdatatable/data.table/blob/master/src/fread.c, but I don't know how I could use this, if at all. Thanks for any help.
Added info: the data was downloaded from here:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
then I used
download.file(fileURL, "Idaho2006.csv", method = "internal")
The problem is that download.file doesn't work over https with method = "internal" unless you're on Windows and set an option first. Since fread uses download.file when you pass it a URL rather than a local file, it fails in the same way. You have to download the file yourself and then read it from the local copy.
If you're on Linux, or already have wget or curl installed, use method = "wget" or method = "curl" instead.
If you're on Windows and have neither, and don't want to install them, call setInternet2(use = TRUE) before your download.file:
http://www.inside-r.org/r-doc/utils/setInternet2
For example:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
download.file(fileURL, tempf, method = "curl")
DT <- fread(tempf)
unlink(tempf)
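(method = "curl" shells out to the system curl binary, so curl must be installed and on your PATH for this variant.)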
Or
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
setInternet2(use = TRUE)
download.file(fileURL, tempf)
DT <- fread(tempf)
unlink(tempf)
fread() now uses the curl package for downloading files, and this seems to work just fine at the moment:
require(data.table) # v1.9.6+
fread(fileURL, showProgress = FALSE)
The easiest way to fix this problem, in my experience, is to just remove the s from https. Also remove the method argument; you don't need it. My OS is Windows, and the following code works for me:
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileURL, "Idaho2006.csv")
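Note that this only works when the server also serves the file over plain http, and the transfer is then unencrypted, so treat it as a workaround rather than a general fix.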
I run an automated script to download 3 .xls files from 3 websites every hour. When I later try to read the .xls files into R to work with them further, R produces the following error message:
"Error: IOException (Java): block[ 2 ] already removed - does your POIFS have circular or duplicate block references?"
When I manually open and re-save the .xls files, the problem disappears and everything works normally, but since the total number of files grows by 72 every day, this is not a nice workaround.
The script I use to download and save the files:
library(httr)
setwd("WORKDIRECTION")
orig_wd <- getwd()
FOLDERS <- c("NAME1","NAME2","NAME3") #representing folder names
LINKS <- c("WEBSITE_1", #the urls from which I download
"WEBSITE_2",
"WEBSITE_3")
NO <- length(FOLDERS)
for (i in 1:NO) {
  today <- as.character(Sys.Date())
  # create a per-day subfolder if it doesn't exist yet
  if (!file.exists(paste(FOLDERS[i], today, sep = "/"))) {
    dir.create(paste(FOLDERS[i], today, sep = "/"))
  }
  setwd(paste(orig_wd, FOLDERS[i], today, sep = "/"))
  # fetch the file and write the raw bytes to a timestamped .xls
  dat <- GET(LINKS[i])
  bin <- content(dat, "raw")
  now <- as.character(format(Sys.time(), "%X"))
  now <- gsub(":", ".", now)
  writeBin(bin, paste(now, ".xls", sep = ""))
  setwd(orig_wd)
}
I then read in the files with the following script:
require(gdata)
require(XLConnect)
require(xlsReadWrite)
wb = loadWorkbook("FILEPATH")
df = readWorksheet(wb, "Favourite List" , header = FALSE)
Does anybody have experience with this type of error, and knows a solution or workaround?
The problem is partly resolved by using the readxl package, available on CRAN. After installation, files can be read in with:
library(readxl)
read_excel("PathToFile")
The only problem is that the last column is omitted while reading in. If I find a solution for this, I'll update the answer.
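One thing that may help with the dropped column, assuming readxl is skipping it because it falls outside the sheet's declared extent: force a fixed rectangle with the range argument (the "A:Z" span below is a placeholder, not from the original post).
library(readxl)
# hypothetical fix: read an explicit column span so a trailing column
# is not silently dropped; adjust "A:Z" to the sheet's real extent
df <- read_excel("PathToFile", range = cell_cols("A:Z"))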