I would like to ask how I can download an .ods dataset from the web (specifically this site: https://knowledge4policy.ec.europa.eu/territorial/ardeco-online_en) directly into R. I tried the following read_ods call:
a <- read_ods(path = url("https://knowledge4policy.ec.europa.eu/sites/default/files/RNPTD.ods"), sheet = 1)
and got the error
"Error in file.exists(file) : invalid 'file' argument"
Did I make a mistake here or does read_ods load only local files?
This seems to work fine:
url1 <- "https://knowledge4policy.ec.europa.eu/sites/default/files/RNPTD.ods"
f <- tempfile(fileext = ".ods")
download.file(url1, destfile = f, mode = "wb")  # mode = "wb" keeps the binary file intact on Windows
x <- readODS::read_ods(f)
unlink(f)  # tidy up the temp file
That is, you can't read an ODS file directly from a URL (or at least, it didn't work for me), but downloading to a temp file and reading that works.
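If you need this pattern often, it is easy to wrap in a helper. read_ods_url below is just a convenience sketch, not part of readODS:

# convenience sketch: download to a temp file, read, clean up
read_ods_url <- function(url, ...) {
  f <- tempfile(fileext = ".ods")
  on.exit(unlink(f))                             # remove the temp file when done
  download.file(url, destfile = f, mode = "wb")  # binary mode matters on Windows
  readODS::read_ods(f, ...)
}

a <- read_ods_url("https://knowledge4policy.ec.europa.eu/sites/default/files/RNPTD.ods", sheet = 1)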
I am attempting to load a .Rdata file without having to download the actual file to my computer.
Here is what I have so far:
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118546/suppl/GSE118546_macaque_fovea_all_10X_Jan2018.Rdata.gz"
con <- gzcon(url(primate_URL))
load(con)
When I run the script, it returns this error:
Error in load(con) :
the input does not start with a magic number compatible with loading from a connection
Any tips on what I might be doing wrong?
Try the following. I tested it with a small file of the same type from the same source.
primate_URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE118nnn/GSE118992/suppl/GSE118992_supplementaryData.Rdata.gz"
download.file(primate_URL, "GSE118992_supplementaryData.Rdata.gz", mode = "wb")  # mode = "wb" keeps the binary file intact
loaded_objects <- load("GSE118992_supplementaryData.Rdata.gz")  # load() decompresses the .gz and restores the R objects by name
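For what it's worth, the original streaming approach can also work, depending on how the file was gzipped; the key detail is opening the connection in binary mode. This is just a sketch, untested against that exact file:

con <- gzcon(url(primate_URL, open = "rb"))  # "rb" = binary mode, which matters on Windows
loaded_names <- load(con)                    # load() returns the names of the restored objects
close(con)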
(NB: I am very much a beginner in R.)
This is the code I tried:
read_xlsx("valid/url")
For some reason I get the error message:
'path' does not exist:'valid/url'
I know the URL works, I have tested it many times. I am mystified, so any help would be much appreciated.
If I understand your issue correctly, you are passing the URL straight to the read_xlsx command. As far as I am aware, this will not work if your Excel file is online; you need to download it locally first.
I suggest the following adjustment:
library(readxl)

url <- "valid/url"
temp <- tempfile(fileext = ".xlsx")
download.file(url, temp, mode = "wb")  # mode = "wb" preserves the binary file
df1 <- read_excel(path = temp)
This downloads the Excel file to a temporary file, which you can then read into a data frame, since it is now saved locally.
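If you do this for many files, the same idea can be wrapped up; read_xlsx_url below is a hypothetical helper name, not part of readxl, and "valid/url" is still the placeholder from the question:

library(readxl)

read_xlsx_url <- function(url, ...) {
  temp <- tempfile(fileext = ".xlsx")
  on.exit(unlink(temp))                  # clean up the temp file afterwards
  download.file(url, temp, mode = "wb")
  read_excel(path = temp, ...)
}

df1 <- read_xlsx_url("valid/url")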
I am trying and failing to write a process that will download a .zip archive, extract a particular Excel file from that archive, and load that Excel file into my R workspace without ever writing any of those files (the .zip or the .xls) to my hard drive.
I have written a version of this process that works for zipped .csvs, but it doesn't work for .xls. Here's how that version goes, using one of the URLs I'm targeting in my current project and using readWorksheetFromFile() instead of read.csv() at the appropriate moment:
library(XLConnect)
waed.old.link <- "http://eventdata.parusanalytics.com/data.dir/pitf.world.19950101-20121231.xls.zip"
waed.old.file <- "pitf.world.19950101-20121231.xls"
tmp <- tempfile()
download.file(waed.old.link, tmp)
tmp2 <- tempfile()
tmp2 <- unz(tmp, waed.old.file)
WAED.old <- readWorksheetFromFile(tmp2, sheet = 1, startRow = 3, startCol = 1, endCol = 73)
unlink(tmp)
unlink(tmp2)
And here's what pops up at the readWorksheetFromFile() call, the one that tries to ingest the spreadsheet as WAED.old:
Error in path.expand(filename) : invalid 'path' argument
I also tried read_excel() at that step and got the same result:
> WAED.old <- read_excel(tmp2, skip = 2)
Error in file.exists(path) : invalid 'file' argument
I gather that this has something to do with pointing readWorksheetFromFile() at a connection rather than a file, but I'm not sure that's right, and I don't know how to fix it if it is. I searched Stack Overflow and the web for an answer but couldn't find one that was right on point. I'd really appreciate some help.
As you say, it is because unz returns a connection object for the file within the zip (but does not explicitly unzip that file), while readWorksheetFromFile expects a path to a file.
Use unzip to explicitly unzip the file.
# unzip() actually extracts the file and returns its path, which readWorksheetFromFile() accepts
tmp2 <- unzip(zipfile = tmp, files = waed.old.file, exdir = tempdir())
# readWorksheetFromFile(tmp2, ...)
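For completeness, here is the whole pipeline in one piece, reusing waed.old.link and waed.old.file from the question. This sketch swaps XLConnect for readxl (no Java dependency); skip = 2 mirrors startRow = 3, and the endCol = 73 restriction could be recreated with read_excel's range argument:

library(readxl)

tmp <- tempfile(fileext = ".zip")
download.file(waed.old.link, tmp, mode = "wb")                         # binary mode for the archive
xls <- unzip(zipfile = tmp, files = waed.old.file, exdir = tempdir())  # returns the extracted path
WAED.old <- read_excel(xls, skip = 2)
unlink(c(tmp, xls))                                                    # tidy up both temp files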
I run an automated script to download 3 .xls files from 3 websites every hour. When I later try to read the .xls files into R to work with them further, R produces the following error message:
"Error: IOException (Java): block[ 2 ] already removed - does your POIFS have circular or duplicate block references?"
When I manually open and save the .xls files, this problem no longer appears and everything works normally, but since the total number of files grows by 72 every day, this is not a nice workaround.
The script I use to download and save the files:
library(httr)

setwd("WORKDIRECTION")
orig_wd <- getwd()

FOLDERS <- c("NAME1", "NAME2", "NAME3")  # representing folder names
LINKS <- c("WEBSITE_1",                  # the urls from which I download
           "WEBSITE_2",
           "WEBSITE_3")

NO <- length(FOLDERS)

for (i in 1:NO) {
  today <- as.character(Sys.Date())
  if (!file.exists(paste(FOLDERS[i], today, sep = "/"))) {
    dir.create(paste(FOLDERS[i], today, sep = "/"))
  }
  setwd(paste(orig_wd, FOLDERS[i], today, sep = "/"))
  dat <- GET(LINKS[i])
  bin <- content(dat, "raw")
  now <- as.character(format(Sys.time(), "%X"))
  now <- gsub(":", ".", now)
  writeBin(bin, paste(now, ".xls", sep = ""))
  setwd(orig_wd)
}
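One thing worth checking in a script like this (a hedged guess, since the POIFS error usually means the bytes on disk are not a well-formed .xls): verify the HTTP status and content type before writing, so that an HTML error page never gets saved with an .xls extension. A minimal sketch of that check, inside the loop:

dat <- GET(LINKS[i])
stop_for_status(dat)                    # abort on HTTP errors instead of saving garbage
print(headers(dat)[["content-type"]])   # should be an Excel MIME type, not text/html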
I then read in the files with the following script:
require(gdata)
require(XLConnect)
require(xlsReadWrite)
wb = loadWorkbook("FILEPATH")
df = readWorksheet(wb, "Favourite List" , header = FALSE)
Does anybody have experience with this type of error and know a solution or workaround?
The problem is partly resolved by using the readxl package, available on CRAN. After installation, files can be read in with:
library(readxl)
read_excel("PathToFile")
The only problem is that the last column is omitted when reading in. If I find a solution for this, I'll update the answer.
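One thing that sometimes helps in the meantime (untested against these particular files) is forcing the extent of the sheet with read_excel's range argument, so trailing columns can't be guessed away; the column letters here are a placeholder to adjust to your data:

library(readxl)
read_excel("PathToFile", range = cell_cols("A:T"))  # "A:T" = placeholder column range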
I'm trying to adopt the Reproducible Research paradigm while meeting people who prefer looking at Excel rather than text data files halfway, by using Dropbox to host Excel files which I can then access using the xlsx package.
Rather like downloading and unpacking a zipped file, I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon
You have the wrong URL: the one you are using just goes to the landing page. I think the actual download URL is different; I managed to get it more or less working using the code below.
I don't think you actually need RCurl or the getURL() function, and I think you were leaving out some relatively important /'s in your previous formulation.
Try the following:
# dl.dropboxusercontent.com serves the raw file rather than the landing page
link <- paste("https://dl.dropboxusercontent.com/s",
              "{THE SHA-1 KEY}",
              "{THE FILE NAME}",
              sep = "/")
download.file(url = link, destfile = "your.destination.xlsx", mode = "wb")  # "wb" preserves the binary file
closeAllConnections()
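Once downloaded, the local copy can be read as usual, e.g. with the xlsx package you already load above (assuming the first sheet is the one you want):

library(xlsx)
df <- read.xlsx("your.destination.xlsx", sheetIndex = 1)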
UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also, the function below works some of the time but not others, and appears to get stuck at the GET line. So a better solution would be very welcome.
I decided to take a step back and figure out how to download a raw file from a secure (https) URL. I adapted (butchered?) the source_url function in devtools to produce the following:
download_file_url <- function(url, outfile, ..., sha1 = NULL) {  # sha1 carried over from source_url, unused here
  require(httr)  # GET() and stop_for_status() are the only external calls used
  stopifnot(is.character(url), length(url) == 1)
  filetag <- file(outfile, "wb")                     # open the destination in binary mode
  request <- GET(url)
  stop_for_status(request)                           # fail loudly on HTTP errors
  writeBin(content(request, type = "raw"), filetag)  # write the raw response body to disk
  close(filetag)
}
This seems to work for producing local versions of binary files, Excel included. Nicer, neater, smarter improvements gratefully received.
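One possible simplification (a sketch of the same idea, not a verified drop-in replacement): httr's write_disk() streams the response body straight to a file, so the manual connection handling disappears:

library(httr)

download_file_url <- function(url, outfile) {
  request <- GET(url, write_disk(outfile, overwrite = TRUE))  # stream straight to disk
  stop_for_status(request)                                    # fail loudly on HTTP errors
  invisible(outfile)
}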