I have to download several zip files from the website http://www.kase.kz/ru/marketvaluation
This question basically originates from this topic. Having not solved the problem as of now, I tried the following approach:
for (i in 1:length(data[,2])){
URL = data[i, 2]
dir = basename(URL)
download.file(URL, dir)
unzip(dir)
TXT <- list.files(pattern = "*.TXT")
zip <- list.files(pattern = "*.zip")
file.remove(TXT, zip)
}
Now I am facing another problem - after 4th or 5th trial R is giving me:
trying URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip'
Error in download.file(URL, dir) :
cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip'
In addition: Warning message:
In download.file(URL, dir) :
cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip': HTTP status was '503 Service Temporarily Unavailable'
I don't know why this is happening. I would appreciate any suggestions/solutions.
Ahh, this was a piece of cake:
for (i in 1:length(data[,2])){
URL = data[i, 2]
dir = basename(URL)
download.file(URL, dir)
unzip(dir)
TXT <- list.files(pattern = "*.TXT")
zip <- list.files(pattern = "*.zip")
file.remove(TXT, zip)
Sys.sleep(sample(10, 1))
}
Related
I am trying to unzip a file after download using R. It unzips fine on Windows 10.
verbose <- T
zipdir <- file.path("downloads","zip")
datadir <- file.path("downloads","data")
if (!file.exists("downloads")) dir.create("downloads")
if (!file.exists(zipdir)) dir.create(zipdir)
if (!file.exists(datadir)) dir.create(datadir)
filename <- "On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2019_2.zip"
fileurl <- str_c("https://transtats.bts.gov/PREZIP/",filename)
if (verbose == TRUE) print(str_c("File url: ",fileurl))
zipfile <- file.path(zipdir, filename)
if (verbose == TRUE) print(str_c("File: ",zipfile))
download.file(fileurl, zipfile)
unzip(zipfile)
Error 1 for a zip file means "operation not permitted"
Warning message:
In unzip(zipfile) : error 1 in extracting from zip file
Here is the solution with the help of r2evans:
download.file(fileurl, zipfile, mode = wb)
unzip(zipfile, exdir=datadir, overwrite=TRUE)
Here comes the complete code to copy and try
verbose <- T
zipdir <- file.path("downloads","zip")
datadir <- file.path("downloads","data")
if (!file.exists("downloads")) dir.create("downloads")
if (!file.exists(zipdir)) dir.create(zipdir)
if (!file.exists(datadir)) dir.create(datadir)
filename <- "On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2019_2.zip"
fileurl <- str_c("https://transtats.bts.gov/PREZIP/",filename)
if (verbose == TRUE) print(str_c("File url: ",fileurl))
zipfile <- file.path(zipdir, filename)
if (verbose == TRUE) print(str_c("File: ",zipfile))
#These are the modified lines in the code
#Mode = wb is required to download binary files
download.file(fileurl, zipfile, mode = wb)
#Changed the function so that it specifies the target directory
#I recommend overwrite=TRUE otherwise it might crash. Alternative would be to check with file.exists
unzip(zipfile, exdir=datadir, overwrite=TRUE)
I would like to unzip and read in a shape file from the web in R without relying on rgdal. I found the read.shp function of the fastshp package that can apparently accomplish this without rgdal installed in the environment, however, I'm having trouble implementing.
I would like a function that can unzip and then read in the shape file akin to what's found in this SO post but for the read.shp function. I tried the following but to no avail:
dlshape=function(shploc, format) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
f <- file.path(temp, f)
return(read.shp(".", format))
})
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip', 'polygon')
Error in read.shp(".", format) : unused argument (format)
I also tried the following:
dlshape=function(shploc) {
temp=tempfile()
download.file(shploc, temp)
unzip(temp)
shp.data <- sapply(".", function(f) {
f <- file.path(temp, f)
return(read.shp("."))
})
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip')
Error in file(shp.name, "rb") : cannot open the connection
In addition: Warning messages:
1: In file(shp.name, "rb") : 'raw = FALSE' but '.' is not a regular file
2: In file(shp.name, "rb") :
Show Traceback
Rerun with Debug
Error in file(shp.name, "rb") : cannot open the connection
I suspect it has to do with the fact that in the function read.shp() I'm feeding it the folder name and not the .shp name (for readOGR that works but not for read.shp). Any assistance is much appreciated.
You can use unzip() from utils and read_sf() from sf to unzip and then load your shapefile. Here is a working example:
# Create temp files
temp <- tempfile()
temp2 <- tempfile()
# Download the zip file and save to 'temp'
URL <- "https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip"
download.file(URL, temp)
# Unzip the contents of the temp and save unzipped content in 'temp2'
unzip(zipfile = temp, exdir = temp2)
# Read the shapefile. Alternatively make an assignment, such as f<-sf::read_sf(your_SHP_file)
sf::read_sf(temp2)
I am trying to extract some info from a number of links.
I am applying the following function:
walk(filinginfohref, function(x) {
download.file(x, destfile = paste0("D:/deleteme/",x), quiet = FALSE)
})
However it returns the following error:
Error in download.file(x, destfile = paste0("D:/deleteme/", x), quiet = FALSE) :
cannot open destfile 'D:/deleteme/https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm', reason 'Invalid argument'
Which I assume is because I cannot store the link as the destination file.
I need to somehow preserve the link from where the file is being downloaded from
How can I overcome this issue?
Data
filinginfohref <- c("https://www.sec.gov/Archives/edgar/data/1750/000104746918004978/0001047469-18-004978-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746917004528/0001047469-17-004528-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746916014299/0001047469-16-014299-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746915006136/0001047469-15-006136-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746914006243/0001047469-14-006243-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746913007797/0001047469-13-007797-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746912007300/0001047469-12-007300-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746911006302/0001047469-11-006302-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746910006500/0001047469-10-006500-index.htm",
"https://www.sec.gov/Archives/edgar/data/1750/000104746909006783/0001047469-09-006783-index.htm"
)
Each link have / interpreted as folders. The path that is built does not exist.
Please replace destfile = paste0("D:/deleteme/",x) by destfile = paste0("D:/deleteme/", gsub("/", "_", x, fixed = TRUE))
This way you have the character _ avoiding troubles.
There is probably a way to keep links intacts.
As you have figure it out, windows doesn't allow you to save those name files with the special characters. Add a function to remove the common part of the file name and get rid of those "/".
library(purrr)
htmName <- function (x) {
x <- gsub("https://www.sec.gov/Archives/edgar/data/", "",x)
x <- gsub("/","_",x)
return(x)
}
walk(filinginfohref, function(x) {
download.file(x, destfile = paste0("output/", htmName(x)), quiet = FALSE)
})
I'm trying to download a zipped file from the web, then extract the single kml file within. I have tried several different utils functions to unzip and extract but am not sure how to get the kml that I can begin to work with (in sf package).
zipFileName <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
temp <- tempfile()
download.file(url = zipFileName, destfile = temp)
untar(tarfile = temp, files = smokeFileName)
# Error in getOctD(x, offset, len) : invalid octal digit
untar(tarfile = zipFileName, files = smokeFileName)
# Error in gzfile(path.expand(tarfile), "rb") : cannot open the connection
# In addition: Warning message:
# In gzfile(path.expand(tarfile), "rb") :
# cannot open compressed file 'http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz', probable reason 'Invalid argument'
unz(temp, smokeFileName)
# A connection with
# description "C:\\Users\\jvargo\\AppData\\Local\\Temp\\RtmpemFaXC\\file33f82dd83714:smoke20100101.kml"
# class "unz"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "yes"
adapted from https://community.rstudio.com/t/download-gz-file-and-extract-kml/13783
library(R.utils)
gzFileURL <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz")
smokeZipName <-"smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
directory <- tempdir()
setwd(directory)
temp <- tempfile(pattern = "", fileext = ".kml.gz")
download.file(url = gzFileURL, destfile = temp)
gunzip(temp)
kmlFile <- list.files(tempdir(), pattern = ".kml")
layers <- st_layers(kmlFile)$name
I am trying to download and read a zipped csv file from Kaggle within an R script. After researching other posts including post1 and post2 I have tried:
# Read data with temp file
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile()
download.file(url, tmp, mode = "wb")
con <- unz(tmp, "store.csv.zip")
store <- read.table(con, sep = ",", header = TRUE)
unlink(tmp)
the read.table command throws an error:
Error in open.connection(file, "rt") : cannot open the connection
I have also tried:
# Download file, unzip, and read
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
download.file(url, destfile = "./SourceData/store.csv.zip", mode = "wb")
unzip("./SourceData/store.csv.zip")
Unzip throws the error:
error 1 in extracting from zip file
Bypassing the unzip command and reading directly from the zip file
store <- read_csv("SourceData/store.csv.zip")
Throws the error:
zip file ... SourceData/store.csv.zip cannot be opened
I prefer to use the temp file, but at this point I'll use either approach if I can make it work.