r what doesn't curl_download like about a filename

I want to download some files (NetCDF although I don't think that matters) from a website
and write them to a specified data directory on my hard drive. Some code that illustrates my problem follows:
library(curl)
baseURL <- "http://gsweb1vh2.umd.edu/LUH2/LUH2_v2f/"
fileChoice <- "IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc"
destDir <- paste0(getwd(), "/data-raw/")
url <- paste0(baseURL, fileChoice)
destfile <- paste0(destDir, "test.nc")
curl_download(url, destfile) # this one works
destfile <- paste0(destDir, fileChoice)
curl_download(url, destfile) #this one fails
The error message is
Error in curl_download(url, destfile) :
Failed to open file /Users/gcn/Documents/workspace/landuse/data-raw/IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc.curltmp.
It turns out that curl_download internally appends .curltmp to destfile and then removes it. I can't figure out what is writing

It turns out that the problem is the fileChoice variable includes a new directory; IMAGE_SSP1_RCP19. Once I created the directory the process worked fine. I'm posting this because someone else might make the same mistake I did.
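
A minimal sketch of that fix, assuming the destination path is built from a directory plus a file name that itself contains a subdirectory, as in the question. The helper name ensure_parent_dir and the stand-in paths under tempdir() are hypothetical, and the download call is left commented out so the snippet runs without network access.

```r
# Hypothetical helper: make sure the directory that will hold `destfile` exists.
ensure_parent_dir <- function(destfile) {
  parent <- dirname(destfile)
  if (!dir.exists(parent)) {
    # recursive = TRUE also creates any missing intermediate directories
    dir.create(parent, recursive = TRUE)
  }
  invisible(parent)
}

# Stand-in paths for illustration; the question used getwd()/data-raw/ instead.
destDir <- file.path(tempdir(), "data-raw")
fileChoice <- "IMAGE_SSP1_RCP19/example.nc"
destfile <- file.path(destDir, fileChoice)

ensure_parent_dir(destfile)
# curl::curl_download(url, destfile)  # the temporary .curltmp file can now be opened
```

Creating the directory from dirname(destfile) rather than hard-coding it means the same call keeps working when fileChoice points into a different scenario folder.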

Related

Passed a filename that is NOT a string of characters! (RMarkdown)

I'm accessing ncdf files directly from a website [here][1] into my RMarkdown.
When I try to read the file using the nc_open functions as in the code below, I get the error 'Passed a filename that is NOT a string of characters!'
Any idea how I can solve this?
ps: I even tried uncompressing the files with the gzcon function but the result is the same when I try to read the data.
Thanks for your help!
Kami
library(httr)
library(ncdf4)
nc<-GET("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz")
cru_nc<-nc_open(nc)

OK, here is the full answer:
library(httr)
library(ncdf4)
library(R.utils)
url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz"
filename <- "/tmp/file.nc.gz"
# Download the file and store it as a temp file
download.file(url, filename, mode = "wb")
# Unzip the temp file
gunzip(filename)
# The unzipped filename drops the .gz
unzip_filename <- "/tmp/file.nc"
# You can now open the unzipped file with its **filename** rather than the object
cru_nc<-nc_open(unzip_filename)

Is this a mode="w" vs mode="wb" issue? I've had this with files before. No experience of ncdf4.
Not sure if you can pass mode="wb" to GET, but does
download.file(yourUrl, yourDestfile, mode="wb")
work / help?
Edit:
Ah. Other thing is you are storing the object as an object (nc) but nc_open wants to open a file.
I think you need to save the object locally (unless nc_open can just take the URL) and then open it? Possibly after unzipping.

read_xlsx function fails to import online dataset with valid url

(NB- I am very much a beginner in R.)
This is the code I tried:
read_xlsx("valid/url")
For some reason I get the error message:
'path' does not exist:'valid/url'
I know the URL works, I have tested it many times. I am mystified, so any help would be much appreciated.

If I understand your issue correctly, you are passing the URL straight to read_xlsx. As far as I am aware, this will not work when the Excel file is online; you need to download it locally first.
I suggest the following adjustment:
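library(readxl)
url <- "valid/url"
# Give the temporary file an .xlsx extension so the format is unambiguous
temp <- tempfile(fileext = ".xlsx")
# mode = "wb" keeps the binary file intact on Windows
download.file(url, temp, mode="wb")
df1 <- read_excel(path = temp)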
This will download the excel file into a temporary file, which you can then read into a dataframe, since it will be saved locally.

trying to use fread() on .csv file but getting internal error "ch>eof"

I am getting an error from fread:
Internal error: ch>eof when detecting eol
when trying to read a csv file downloaded from an https server, using R 3.2.0. I found something related on Github, https://github.com/Rdatatable/data.table/blob/master/src/fread.c, but don't know how I could use this, if at all. Thanks for any help.
Added info: the data was downloaded from here:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
then I used
download.file(fileURL, "Idaho2006.csv", method = "Internal")

The problem is that download.file doesn't work with https when method = "internal", unless you're on Windows and set an option. Since fread uses download.file when you pass it a URL rather than a local file, it will fail. You have to download the file manually and then open it from the local copy.
If you're on Linux, or already have wget or curl installed, use method = "wget" or method = "curl" instead.
If you're on Windows, don't have either, and don't want to install them, call setInternet2(use = TRUE) before download.file.
http://www.inside-r.org/r-doc/utils/setInternet2
For example:
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
download.file(fileURL, tempf, method = "curl")
DT <- fread(tempf)
unlink(tempf)
Or
fileURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
tempf <- tempfile()
setInternet2(use = TRUE)
download.file(fileURL, tempf)
DT <- fread(tempf)
unlink(tempf)

fread() now utilises curl package for downloading files. And this seems to work just fine atm:
require(data.table) # v1.9.6+
fread(fileURL, showProgress = FALSE)

In my experience, the easiest way to fix this problem is to remove the s from https. Also drop the method argument; you don't need it. My OS is Windows, and I have tried the following code and it works.
fileURL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileURL, "Idaho2006.csv")

How to download an .xlsx file from a dropbox (https:) location

I'm trying to adopt the Reproducible Research paradigm but meet people who like looking at Excel rather than text data files halfway, by using Dropbox to host Excel files which I can then access using the xlsx package.
Rather like downloading and unpacking a zipped file I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon

You have the wrong URL - the one you are using just goes to the landing page. I think the actual download URL is different, I managed to get it sort of working using the below.
I actually don't think you need to use RCurl or the getURL() function, and I think you were leaving out some relatively important /'s in your previous formulation.
Try the following:
link <- paste("https://dl.dropboxusercontent.com/s",
"{THE SHA-1 KEY}",
"{THE FILE NAME}",
sep="/")
download.file(url=link,destfile="your.destination.xlsx",mode="wb")
closeAllConnections()

UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also the function below works some of the time but not others, and appears to get stuck at the GET line. So, a better solution would be very welcome.
I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:
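download_file_url <- function(url, outfile, ..., sha1 = NULL) {
  # Only httr is needed here: GET(), stop_for_status() and content().
  # `...` and `sha1` are accepted but not used.
  require(httr)
  stopifnot(is.character(url), length(url) == 1)
  # Fetch first, so no empty output file is left behind if the request fails
  request <- GET(url)
  stop_for_status(request)
  # Write the raw response body to disk in binary mode
  filetag <- file(outfile, "wb")
  on.exit(close(filetag))
  writeBin(content(request, as = "raw"), filetag)
}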
This seems to work for producing local versions of binary files - Excel included. Nicer, neater, smarter improvements in this gratefully received.
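
As one possible simplification, here is a base-R sketch that avoids the extra packages. It assumes an R build whose download.file can fetch https URLs directly, which may not hold for the older setup described above. The helper name download_binary and the example.com URL are hypothetical.

```r
# Hypothetical helper: fetch a URL to a local file in binary mode.
download_binary <- function(url, outfile) {
  stopifnot(is.character(url), length(url) == 1)
  # mode = "wb" writes the bytes unchanged, which matters for Excel files on Windows
  status <- download.file(url, outfile, mode = "wb", quiet = TRUE)
  # download.file returns 0 (invisibly) on success
  if (status != 0) stop("download failed with status ", status)
  invisible(outfile)
}

# Usage (placeholder URL, not a real file):
# download_binary("https://example.com/data.xlsx", tempfile(fileext = ".xlsx"))
```

The Dropbox link still has to be the direct download URL; a landing-page URL would be saved as HTML rather than as the spreadsheet.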

Download and unzip shapefile to temporary directory

In an effort to make reproducible posts at SO, I am trying to download data files to a temporary location and load them into R from there. I am mostly using code from JD Long's answer in this SO post. The downloading and unzipping all works fine, but I am unable to load the file from the temporary directory. This is the code I am using:
library(maptools)
tmpdir <- tempdir()
url <- 'http://epp.eurostat.ec.europa.eu/cache/GISCO/geodatafiles/NUTS_2010_03M_SH.zip'
file <- basename(url)
download.file(url, file)
unzip(file, exdir = tmpdir )
## I guess the error is somewhere in the next two lines
shapeFile <- paste(tmpdir,"/Shape/data/NUTS_RG_03M_2010")
EU <- readShapeSpatial(shapeFile)
# --> Error in getinfo.shape(fn) : Error opening SHP file
I have been looking into the man pages for tempdir() without success. Setting the working directory to the temporary location didn't work either. I am probably missing something very basic here. Do you have any hints on how to get around this?

shapeFile <- paste(tmpdir,"/Shape/data/NUTS_RG_03M_2010", sep="")
By default, paste uses a space as the separator, which makes your path wrong.
Of course, the alternative, as of R 2.15.0, would be paste0:
shapeFile <- paste0(tmpdir,"/Shape/data/NUTS_RG_03M_2010")
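
To illustrate the separator issue, here is a short sketch that also shows file.path, a base R function not mentioned above, as another way to build the path. The variable names bad and good are just for illustration.

```r
tmpdir <- tempdir()

# paste() joins with a space by default, so the result contains " /Shape"
bad <- paste(tmpdir, "/Shape/data/NUTS_RG_03M_2010")

# file.path() joins its components with "/" and needs no leading slash
good <- file.path(tmpdir, "Shape", "data", "NUTS_RG_03M_2010")

# The stray space is what makes the shapefile impossible to find
has_space <- grepl(" /Shape", bad, fixed = TRUE)

# file.path() gives the same string as the paste0() version
same_as_paste0 <- identical(good, paste0(tmpdir, "/Shape/data/NUTS_RG_03M_2010"))
```

Printing bad next to good makes the extra space visible, which is usually the quickest way to spot this kind of path error.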
