Download and unzip shapefile to temporary directory - r

In an effort to make reproducible posts at SO, I am trying to download data files to a temporary location and from there to load them into R. I am mostly using code from JD Longs answer in this SO Post. The downloading and unzipping works all fine, but I am unable to load the file from the temporary directory. This is the code I am using:
library(maptools)
tmpdir <- tempdir()
url <- 'http://epp.eurostat.ec.europa.eu/cache/GISCO/geodatafiles/NUTS_2010_03M_SH.zip'
file <- basename(url)
download.file(url, file)
unzip(file, exdir = tmpdir )
## I guess the error is somewhere in the next two lines
shapeFile <- paste(tmpdir,"/Shape/data/NUTS_RG_03M_2010")
EU <- readShapeSpatial(shapeFile)
# --> Error in getinfo.shape(fn) : Error opening SHP file
I have been looking into the man files for tempdir() without success. Settinge the working directory to the temporary location didn't work either. I probably miss something very basic here. Do you have any hints how to get around this?

shapeFile <- paste(tmpdir,"/Shape/data/NUTS_RG_03M_2010", sep="")
As default, paste use a space as separator, which cause your path to be wrong.
Of course, the alternative, as of R 2.15.0, would be paste0:
shapefile <- paste0(tmpdir,"/Shape/data/NUTS_RG_03M_2010")

Related

how to convert the .hdf file to dataset?

I am using one of the files in here: http://orca.science.oregonstate.edu/1080.by.2160.monthly.hdf.vgpm.m.chl.m.sst.php:
untar(tarfile = "http://orca.science.oregonstate.edu/data/1x2/monthly/vgpm.r2018.m.chl.m.sst/hdf/vgpm.m.2010.tar", exdir = "./foo")
I get error: ar.exe: Error opening archive: Failed to open 'http://orca.science.oregonstate.edu/data/1x2/monthly/vgpm.r2018.m.chl.m.sst/hdf/vgpm.m.2010.tar'
so I manually had to download the file and untar it ( that is why cant provide a reproducible example here). Inside there are files of .hdf format:
I also was not able to read them:
library(ncdf4)
ncin <- nc_open(".\\vgpm.m.2010\\vgpm.2010001.hdf")
ncin
lon <- ncvar_get(ncin,"fakeDim0")
head(lon)
lat <- ncvar_get(ncin,"fakeDim1")
head(lat)
fillvalue <- ncatt_get(ncin,"npp","_FillValue")
Can you please help to explain why i cant utar the file and why .hdf files have no fill value?
You should be able to untar the file once you have downloaded it. Download the file first to your working directory, then untar from your working directory: untar("vgpm.m.2002.tar", exdir = "mydir"). Your issue is likely with the connection. There can be many reasons for that which are specific to your computer's settings. You'll need to troubleshoot that separately.
Once you untar the directory, the contents inside are not .hdf files. They are compressed .hdf files (thus why their file names end in .gz). You'll need to decompress:
library(R.utils)
gunzip("mydir/vgpm.2002335.hdf.gz", remove = FALSE)
Once you actually have the .hdf file, you need to open it and then read it. You are correct to use ncdf4 because it accommodates multiple .hdf file formats. Some of the older formats would need different packages or software.
To open and read it, you need two different functions, nc_open() and ncvar_get():
library(ncdf4)
dat <- nc_open("mydir/vgpm.2002335.hdf", write = TRUE)
ncvar_get(dat)
Note that these functions will NOT work if you have not completed the pre-requisite set-up explained in detail in the documentation. For example:
Both the netCDF library and the HDF5 library must already be installed on your machine for this R interface to the library to work.
I also tried to rasterize it: it also works great:
library(raster)
x <- raster("~.\\vgpm.2010001.hdf")
extent(x) <- extent(-180, 180, -90, 90)
crs(x) <- "+proj=longlat +datum=WGS84"
NAvalue(x) <- -9999
#plot(x)
f1<- as.data.frame(x, xy=TRUE)

Passed a filename that is NOT a string of characters! (RMarkdown)

I'm accessing ncdf files directly from a website [here][1] into my RMarkdown.
When I try to read the file using the nc_open functions as in the code below, I get the error 'Passed a filename that is NOT a string of characters!'
Any idea how I can solve this?
ps: I even tried uncompressing the files with the gzcon function but the result is the same when I try to read the data.
Thanks for your help!
Kami
library(httr)
library(ncdf4)
nc<-GET("https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz")
cru_nc<-nc_open(nc)
OK here is the fill answer:
library(httr)
library(ncdf4)
library(R.utils)
url <- "https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.05/cruts.2103051243.v4.05/pre/cru_ts4.05.2011.2020.pre.dat.nc.gz"
filename <- "/tmp/file.nc.gz"
# Download the file and store it as a temp file
download.file(url, filename, mode = "wb")
# Unzip the temp file
gunzip(filename)
# The unzipped filename drops the .gz
unzip_filename <- "/tmp/file.nc"
# You can now open the unzipped file with its **filename** rather than the object
cru_nc<-nc_open(unzip_filename)
Is this a mode="w" Vs mode="wb" issue. I've had this with files before. No experience of ncdf4.
Not sure if you can pass mode="wb" to get but does
file.download(yourUrl, mode="wb")
Work / help
Edit:
Ah. Other thing is you are storing the object as an object (nc) but nc_open wants to open a file.
I think you need to save the object locally (unless nc_open can just take the URL) and then open it? Possibly after unzipping.

Can't read shp file in R

I tried to open an shp file on my Mac, using this code:
library(tidyverse)
library(sf)
library(rgeos)
sf_trees_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-28/sf_trees.csv')
temp_shapefile <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017//ROADS/tl_2017_06075_roads.zip", temp_shapefile)
sf_roads <- unzip(temp_shapefile, "tl_2017_06075_roads.shp") %>%
read_sf()
But I receive this error message:
Error: Cannot open "/Users/name/Documents/Playground/Trees_SF/tl_2017_06075_roads.shp"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
I tried other shp files, and I receive the same error message:
map <- read_sf("per_admbnda_adm1_2018.shp")
Error: Cannot open "/Users/name/Documents/Playground/Trees_SF/per_admbnda_adm1_2018.shp"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
I tried copying the shx and dbf files but it doesn't fix the problem.
Any help will be greatly appreciated.
Try unzipping the whole download first, then reading the shapefile file. Although for sf you only need to point at the .shp file, all the other ones (.cpg, .prj, .shx, etc) need to be unzipped and in the same directory.
temp_shapefile <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017//ROADS/tl_2017_06075_roads.zip", temp_shapefile)
unzip(temp_shapefile)
sf_roads <- read_sf('tl_2017_06075_roads.shp')

r what doesn't curl_download like about a filename

I want to download some files (NetCDF although I don't think that matters) from a website
and write them to a specified data directory on my hard drive. Some code that illustrates my problems follows
library(curl)
baseURL <- "http://gsweb1vh2.umd.edu/LUH2/LUH2_v2f/"
fileChoice <- "IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc"
destDir <- paste0(getwd(), "/data-raw/")
url <- paste0(baseURL, fileChoice)
destfile <- paste0(destDir, "test.nc")
curl_download(url, destfile) # this one works
destfile <- paste0(destDir, fileChoice)
curl_download(url, destfile) #this one fails
The error message is
Error in curl_download(url, destfile) :
Failed to open file /Users/gcn/Documents/workspace/landuse/data-raw/IMAGE_SSP1_RCP19/multiple-states_input4MIPs_landState_ScenarioMIP_UofMD-IMAGE-ssp119-2-1-f_gn_2015-2100.nc.curltmp.
It turns out the curl_download internally adds .curltmp to destfile and then removes it. I can't figure out what is writing
It turns out that the problem is the fileChoice variable includes a new directory; IMAGE_SSP1_RCP19. Once I created the directory the process worked fine. I'm posting this because someone else might make the same mistake I did.

Using R to download zipped data file, extract, and import data

#EZGraphs on Twitter writes:
"Lots of online csvs are zipped. Is there a way to download, unzip the archive, and load the data to a data.frame using R? #Rstats"
I was also trying to do this today, but ended up just downloading the zip file manually.
I tried something like:
fileName <- "http://www.newcl.org/data/zipfiles/a1.zip"
con1 <- unz(fileName, filename="a1.dat", open = "r")
but I feel as if I'm a long way off.
Any thoughts?
Zip archives are actually more a 'filesystem' with content metadata etc. See help(unzip) for details. So to do what you sketch out above you need to
Create a temp. file name (eg tempfile())
Use download.file() to fetch the file into the temp. file
Use unz() to extract the target file from temp. file
Remove the temp file via unlink()
which in code (thanks for basic example, but this is simpler) looks like
temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
Compressed (.z) or gzipped (.gz) or bzip2ed (.bz2) files are just the file and those you can read directly from a connection. So get the data provider to use that instead :)
Just for the record, I tried translating Dirk's answer into code :-P
temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
con <- unz(temp, "a1.dat")
data <- matrix(scan(con),ncol=4,byrow=TRUE)
unlink(temp)
I used CRAN package "downloader" found at http://cran.r-project.org/web/packages/downloader/index.html . Much easier.
download(url, dest="dataset.zip", mode="wb")
unzip ("dataset.zip", exdir = "./")
For Mac (and I assume Linux)...
If the zip archive contains a single file, you can use the bash command funzip, in conjuction with fread from the data.table package:
library(data.table)
dt <- fread("curl http://www.newcl.org/data/zipfiles/a1.zip | funzip")
In cases where the archive contains multiple files, you can use tar instead to extract a specific file to stdout:
dt <- fread("curl http://www.newcl.org/data/zipfiles/a1.zip | tar -xf- --to-stdout *a1.dat")
Here is an example that works for files which cannot be read in with the read.table function. This example reads a .xls file.
url <-"https://www1.toronto.ca/City_Of_Toronto/Information_Technology/Open_Data/Data_Sets/Assets/Files/fire_stns.zip"
temp <- tempfile()
temp2 <- tempfile()
download.file(url, temp)
unzip(zipfile = temp, exdir = temp2)
data <- read_xls(file.path(temp2, "fire station x_y.xls"))
unlink(c(temp, temp2))
To do this using data.table, I found that the following works. Unfortunately, the link does not work anymore, so I used a link for another data set.
library(data.table)
temp <- tempfile()
download.file("https://www.bls.gov/tus/special.requests/atusact_0315.zip", temp)
timeUse <- fread(unzip(temp, files = "atusact_0315.dat"))
rm(temp)
I know this is possible in a single line since you can pass bash scripts to fread, but I am not sure how to download a .zip file, extract, and pass a single file from that to fread.
Using library(archive) one can also read in a particular csv file within the archive, without having to UNZIP it first; read_csv(archive_read("http://www.newcl.org/data/zipfiles/a1.zip", file = 1), col_types = cols())
which I find more convenient & is faster.
It also supports all major archive formats & is quite a bit faster than the base R untar or unz - it supports tar, ZIP, 7-zip, RAR, CAB, gzip, bzip2, compress, lzma, xz & uuencoded files.
To unzip everything one can use archive_extract("http://www.newcl.org/data/zipfiles/a1.zip", dir=XXX)
This works on all platforms & given the superior performance for me would be the preferred option.
Try this code. It works for me:
unzip(zipfile="<directory and filename>",
exdir="<directory where the content will be extracted>")
Example:
unzip(zipfile="./data/Data.zip",exdir="./data")
rio() would be very suitable for this - it uses the file extension of a file name to determine what kind of file it is, so it will work with a large variety of file types. I've also used unzip() to list the file names within the zip file, so its not necessary to specify the file name(s) manually.
library(rio)
# create a temporary directory
td <- tempdir()
# create a temporary file
tf <- tempfile(tmpdir=td, fileext=".zip")
# download file from internet into temporary location
download.file("http://download.companieshouse.gov.uk/BasicCompanyData-part1.zip", tf)
# list zip archive
file_names <- unzip(tf, list=TRUE)
# extract files from zip file
unzip(tf, exdir=td, overwrite=TRUE)
# use when zip file has only one file
data <- import(file.path(td, file_names$Name[1]))
# use when zip file has multiple files
data_multiple <- lapply(file_names$Name, function(x) import(file.path(td, x)))
# delete the files and directories
unlink(td)
I found that the following worked for me. These steps come from BTD's YouTube video, Managing Zipfile's in R:
zip.url <- "url_address.zip"
dir <- getwd()
zip.file <- "file_name.zip"
zip.combine <- as.character(paste(dir, zip.file, sep = "/"))
download.file(zip.url, destfile = zip.combine)
unzip(zip.file)

Resources