How to extract a KML file from a downloaded gzip file using R?

I'm trying to download a zipped file from the web, then extract the single kml file within. I have tried several different utils functions to unzip and extract but am not sure how to get the kml that I can begin to work with (in sf package).
zipFileName <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
temp <- tempfile()
download.file(url = zipFileName, destfile = temp)
untar(tarfile = temp, files = smokeFileName)
# Error in getOctD(x, offset, len) : invalid octal digit
untar(tarfile = zipFileName, files = smokeFileName)
# Error in gzfile(path.expand(tarfile), "rb") : cannot open the connection
# In addition: Warning message:
# In gzfile(path.expand(tarfile), "rb") :
# cannot open compressed file 'http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz', probable reason 'Invalid argument'
unz(temp, smokeFileName)
# A connection with
# description "C:\\Users\\jvargo\\AppData\\Local\\Temp\\RtmpemFaXC\\file33f82dd83714:smoke20100101.kml"
# class "unz"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "yes"

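Adapted from https://community.rstudio.com/t/download-gz-file-and-extract-kml/13783
library(R.utils) # gunzip()
library(sf) # st_layers()
gzFileURL <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz"
smokeZipName <- "smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
directory <- tempdir()
setwd(directory)
temp <- tempfile(pattern = "", fileext = ".kml.gz")
# mode = "wb" keeps the compressed file intact on Windows
download.file(url = gzFileURL, destfile = temp, mode = "wb")
# gunzip() writes the decompressed file next to temp, dropping the .gz suffix
gunzip(temp)
# anchor the pattern so that only files ending in .kml match
kmlFile <- list.files(tempdir(), pattern = "\\.kml$", full.names = TRUE)
layers <- st_layers(kmlFile)$name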
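The decompression step above can also be sketched with base R only, in case installing R.utils is not an option. This is a minimal sketch, not the answer's method: gunzip_base() is a hypothetical helper name, and it assumes the downloaded file name ends in .gz.

```r
# Decompress a .gz file with base R connections only.
# gunzip_base() is a hypothetical helper, not part of any package.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 1e6) {
  stopifnot(file.exists(src), src != dest)
  con_in <- gzfile(src, open = "rb")   # decompresses transparently on read
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    bytes <- readBin(con_in, what = "raw", n = chunk)
    if (length(bytes) == 0) break      # end of the compressed stream
    writeBin(bytes, con_out)
  }
  invisible(dest)
}

# Usage with the file from the question (not run here, needs network access):
# temp <- tempfile(fileext = ".kml.gz")
# download.file(gzFileURL, temp, mode = "wb")
# kmlFile <- gunzip_base(temp)
# sf::st_layers(kmlFile)
```

Unlike R.utils::gunzip(), this sketch leaves the original .gz file in place.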

Related

Combine All USA Block Group Census Shapefiles together

As the title suggests, I'm trying to load all the SHP files from the Census found here (https://www2.census.gov/geo/tiger/TIGER2019/BG/) and merge them into one large SHP file for the entire US, while overcoming issues with duplicate polygons.
I adapted code from a previously asked question but could not get it to work: it stops once it reaches state 06 with the following error.
Error in download.file(x, destfile = path, mode = "wb") :
cannot open URL 'ftp://ftp2.census.gov/geo/tiger/TIGER2019/BG/tl_2019_06_bg.zip'
In addition: Warning messages:
1: In download.file(x, destfile = path, mode = "wb") :
downloaded length 29680232 != reported length 50020624
Any suggestions would be much appreciated.
library(RCurl)
library(rgdal)
# get the directory listing
u <- 'ftp://ftp2.census.gov/geo/tiger/TIGER2019/BG/'
f <- paste0(u, strsplit(getURL(u, ftp.use.epsv = FALSE, ftplistonly = TRUE),
'\\s+')[[1]])
# download and extract to tempdir/shps
invisible(sapply(f, function(x) {
path <- file.path(tempdir(), basename(x))
download.file(x, destfile=path, mode = 'wb')
unzip(path, exdir=file.path(tempdir(), 'shps'))
}))
# read in all shps, and prepend shapefile name to IDs
shps <- lapply(sub('\\.zip', '', basename(f)), function(x) {
shp <- readOGR(file.path(tempdir(), 'shps'), x)
shp <- spChFIDs(shp, paste0(x, '_', sapply(slot(shp, "polygons"), slot, "ID")))
shp
})
# rbind to a single object
shp <- do.call(rbind, as.list(shps))
# write out to wd/USA.shp
writeOGR(shp, '.', 'USA', 'ESRI Shapefile')
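A "downloaded length != reported length" warning means the transfer was cut short. One possible cause, which is an assumption here since the source does not confirm it, is that a large file hit R's download timeout (the "timeout" option, 60 seconds by default). A minimal sketch of retrying a failed download follows; download_with_retry() and its fetch argument are hypothetical names introduced only for illustration.

```r
# Retry a download a few times, treating errors and warnings
# (such as a truncated transfer) as failures.
# download_with_retry() is a hypothetical helper, not part of any package.
download_with_retry <- function(url, destfile, tries = 3, wait = 2,
                                fetch = utils::download.file) {
  for (i in seq_len(tries)) {
    ok <- tryCatch({
      status <- fetch(url, destfile = destfile, mode = "wb")
      identical(as.integer(status), 0L)   # download.file() returns 0 on success
    }, error = function(e) FALSE, warning = function(w) FALSE)
    if (ok) return(invisible(destfile))
    if (file.exists(destfile)) unlink(destfile)  # drop a partial file
    if (i < tries) Sys.sleep(wait)
  }
  stop("download failed after ", tries, " attempts: ", url)
}

# Usage inside the sapply() loop from the question (not run here):
# options(timeout = 600)  # allow more time for large state files
# download_with_retry(x, destfile = path)
```

The fetch argument exists only so the download function can be swapped out, for example when trying the helper without network access.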

Unzipping file in R after download

I am trying to unzip a file after download using R. It unzips fine on Windows 10.
verbose <- T
zipdir <- file.path("downloads","zip")
datadir <- file.path("downloads","data")
if (!file.exists("downloads")) dir.create("downloads")
if (!file.exists(zipdir)) dir.create(zipdir)
if (!file.exists(datadir)) dir.create(datadir)
filename <- "On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2019_2.zip"
fileurl <- str_c("https://transtats.bts.gov/PREZIP/",filename)
if (verbose == TRUE) print(str_c("File url: ",fileurl))
zipfile <- file.path(zipdir, filename)
if (verbose == TRUE) print(str_c("File: ",zipfile))
download.file(fileurl, zipfile)
unzip(zipfile)
Error 1 for a zip file means "operation not permitted"
Warning message:
In unzip(zipfile) : error 1 in extracting from zip file
Here is the solution, with the help of r2evans:
download.file(fileurl, zipfile, mode = "wb")
unzip(zipfile, exdir=datadir, overwrite=TRUE)
Here is the complete code to copy and try:
library(stringr) # provides str_c()
verbose <- TRUE
zipdir <- file.path("downloads","zip")
datadir <- file.path("downloads","data")
if (!file.exists("downloads")) dir.create("downloads")
if (!file.exists(zipdir)) dir.create(zipdir)
if (!file.exists(datadir)) dir.create(datadir)
filename <- "On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2019_2.zip"
fileurl <- str_c("https://transtats.bts.gov/PREZIP/",filename)
if (verbose == TRUE) print(str_c("File url: ",fileurl))
zipfile <- file.path(zipdir, filename)
if (verbose == TRUE) print(str_c("File: ",zipfile))
# These are the modified lines in the code
# mode = "wb" is required to download binary files
download.file(fileurl, zipfile, mode = "wb")
# Changed the call so that it specifies the target directory
# I recommend overwrite=TRUE, otherwise it might fail. An alternative would be to check with file.exists
unzip(zipfile, exdir=datadir, overwrite=TRUE)
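The three file.exists()/dir.create() checks in the code above can be collapsed, if desired. This is a small sketch of one way to do it; ensure_dirs() is a hypothetical helper name, and it relies on dir.create() with recursive = TRUE creating missing parent directories.

```r
# Create one or more directories, including missing parents,
# without complaining if they already exist.
# ensure_dirs() is a hypothetical helper, not part of any package.
ensure_dirs <- function(...) {
  dirs <- c(...)
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dir.exists(dirs))
}

# Usage with the directories from the code above (not run here):
# ensure_dirs(zipdir, datadir)  # also creates "downloads" as their parent
```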

Unzipping and reading shape file in R without rgdal installed

I would like to unzip and read in a shape file from the web in R without relying on rgdal. I found the read.shp function of the fastshp package, which can apparently accomplish this without rgdal installed in the environment; however, I'm having trouble implementing it.
I would like a function that can unzip and then read in the shape file akin to what's found in this SO post but for the read.shp function. I tried the following but to no avail:
dlshape=function(shploc, format) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
f <- file.path(temp, f)
return(read.shp(".", format))
})
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip', 'polygon')
Error in read.shp(".", format) : unused argument (format)
I also tried the following:
dlshape=function(shploc) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
f <- file.path(temp, f)
return(read.shp("."))
})
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip')
Error in file(shp.name, "rb") : cannot open the connection
In addition: Warning messages:
1: In file(shp.name, "rb") : 'raw = FALSE' but '.' is not a regular file
2: In file(shp.name, "rb") :
I suspect it has to do with the fact that in the function read.shp() I'm feeding it the folder name and not the .shp name (for readOGR that works but not for read.shp). Any assistance is much appreciated.
You can use unzip() from utils and read_sf() from sf to unzip and then load your shapefile. Here is a working example:
# Create temp files
temp <- tempfile()
temp2 <- tempfile()
# Download the zip file and save to 'temp'
URL <- "https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip"
download.file(URL, temp)
# Unzip the contents of the temp and save unzipped content in 'temp2'
unzip(zipfile = temp, exdir = temp2)
# Read the shapefile. Alternatively make an assignment, such as f<-sf::read_sf(your_SHP_file)
sf::read_sf(temp2)
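For readers that need the path of the .shp file rather than the folder (as the question suspects is the case for read.shp()), the file can be located after unzipping. This is a minimal sketch; find_shp() is a hypothetical helper name, and whether read.shp() accepts the resulting path is an assumption taken from the question, not something confirmed here.

```r
# Return the path(s) of every .shp file under a directory.
# find_shp() is a hypothetical helper, not part of any package.
find_shp <- function(dir) {
  shp <- list.files(dir, pattern = "\\.shp$", full.names = TRUE,
                    recursive = TRUE, ignore.case = TRUE)
  if (length(shp) == 0) stop("no .shp file found under ", dir)
  shp
}

# Usage with the unzipped folder from the example above (not run here):
# shp_path <- find_shp(temp2)
# shp_path[1] can then be passed to a reader that expects the .shp file itself
```

The anchored pattern skips sidecar files such as .shp.xml, which would otherwise match a plain ".shp" pattern.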

How can I use a variable as an argument to a function, specifically unz() in R

I am writing an R function that reads CSV files from a subdirectory in a ZIP file without first unzipping it, using read.csv() and unz().
The CSV files are named with leading 0 as in 00012.csv, 00013.csv etc.
The function has the following parameters: MyZipFile, ASubDir, VNum (a vector e.g. 1:42) which forms the filename.
What I want is to use the variable PathNfilename in unz().
# Incorporate the directory in the ZIP file while constructing the filename using stringr package
PathNfilename <- paste0("/", ASubDir, "/", str_pad(Vnum, 5, pad = "0"), ".csv", sep="")
What works is:
csvdata <- read.csv(unz(description = "MyZipFile.zip", filename = "ASubDirectory/00039.csv"), header=T, quote = "")
What I need is something along these lines:
csvdata <- read.csv(unz(description = "MyZipFile.zip", filename = PathNFileName), header=T, quote = "")
The error that I get is:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file '/ASubDir/00039.csv' in zip file 'MyZipFile.zip'
I'd like to understand why I'm getting the error and how to resolve it. Is it a scoping issue?
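It is not a scoping issue. Entries inside a zip archive are stored with relative paths, so a name with a leading / cannot be found. Try building PathNfilename without the leading /: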
ASubDir <- "ASubDirectory"
Vnum <- 1:5
PathNfilename <- file.path(ASubDir,
paste0(str_pad(Vnum, 5, pad = "0"), ".csv")
)
PathNfilename
#> [1] "ASubDirectory/00001.csv" "ASubDirectory/00002.csv"
#> [3] "ASubDirectory/00003.csv" "ASubDirectory/00004.csv"
#> [5] "ASubDirectory/00005.csv"
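Because unz() opens one archive member per connection, a vector of paths like the one above has to be read one file at a time. The following is a minimal sketch under that assumption; zip_member_paths() and read_zip_csvs() are hypothetical helper names, and sprintf() stands in for str_pad() so that stringr is not required.

```r
# Build zero-padded member paths without a leading slash.
# Both helpers are hypothetical, introduced only for illustration.
zip_member_paths <- function(subdir, nums) {
  file.path(subdir, sprintf("%05d.csv", nums))
}

# Read each CSV straight from the archive, one connection per member.
read_zip_csvs <- function(zipfile, subdir, nums) {
  paths <- zip_member_paths(subdir, nums)
  out <- lapply(paths, function(p) {
    read.csv(unz(zipfile, p), header = TRUE, quote = "")
  })
  names(out) <- paths
  out
}

# Usage with the names from the question (not run here):
# csvlist <- read_zip_csvs("MyZipFile.zip", "ASubDirectory", 1:42)
```

The result is a named list of data frames; whether to combine them with rbind() depends on the files sharing the same columns.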

Download and Read Zip CSV file in R

I am trying to download and read a zipped csv file from Kaggle within an R script. After researching other posts including post1 and post2 I have tried:
# Read data with temp file
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile()
download.file(url, tmp, mode = "wb")
con <- unz(tmp, "store.csv.zip")
store <- read.table(con, sep = ",", header = TRUE)
unlink(tmp)
The read.table command throws an error:
Error in open.connection(file, "rt") : cannot open the connection
I have also tried:
# Download file, unzip, and read
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
download.file(url, destfile = "./SourceData/store.csv.zip", mode = "wb")
unzip("./SourceData/store.csv.zip")
Unzip throws the error:
error 1 in extracting from zip file
Bypassing the unzip command and reading directly from the zip file
store <- read_csv("SourceData/store.csv.zip")
Throws the error:
zip file ... SourceData/store.csv.zip cannot be opened
I prefer to use the temp file, but at this point I'll use either approach if I can make it work.
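Two things may be worth checking here. First, the second argument to unz() is the name of a file inside the archive, so "store.csv.zip" only works if the archive contains an entry with exactly that name. Second, all three errors would also occur if the downloaded file is not a zip archive at all, for example an HTML login page; that is an assumption, since the source does not confirm what was downloaded. A minimal sketch for testing it follows, with looks_like_zip() as a hypothetical helper name.

```r
# Check whether a file starts with the zip local-file-header
# signature "PK\003\004".
# looks_like_zip() is a hypothetical helper, not part of any package.
# Note: an empty archive starts with a different signature and
# would be reported as FALSE by this sketch.
looks_like_zip <- function(path) {
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Usage with the temp file from the question (not run here):
# looks_like_zip(tmp)
# unzip(tmp, list = TRUE)  # if it is a zip, list the member names for unz()
```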
