I am trying to download and read a zipped CSV file from Kaggle within an R script. After researching other posts, including post1 and post2, I have tried:
# Read data with temp file
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile()
download.file(url, tmp, mode = "wb")
con <- unz(tmp, "store.csv.zip")
store <- read.table(con, sep = ",", header = TRUE)
unlink(tmp)
The read.table command throws an error:
Error in open.connection(file, "rt") : cannot open the connection
I have also tried:
# Download file, unzip, and read
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
download.file(url, destfile = "./SourceData/store.csv.zip", mode = "wb")
unzip("./SourceData/store.csv.zip")
unzip() throws the error:
error 1 in extracting from zip file
Bypassing the unzip step and reading the zip file directly with readr:
library(readr)
store <- read_csv("SourceData/store.csv.zip")
Throws the error:
zip file ... SourceData/store.csv.zip cannot be opened
I'd prefer the temp-file approach, but at this point I'll use either one if I can make it work.
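For reference, a minimal sketch of how the temp-file route would look, assuming the download itself succeeds: unz() wants the name of the file inside the archive (store.csv, not store.csv.zip). Note also that Kaggle's download links require an authenticated session, so an unauthenticated download.file() call typically saves an HTML login page that no unzip tool can open.
# A minimal sketch, assuming the request is authenticated; otherwise the
# "zip" Kaggle returns is really an HTML login page.
url <- "https://www.kaggle.com/c/rossmann-store-sales/download/store.csv.zip"
tmp <- tempfile(fileext = ".zip")
download.file(url, tmp, mode = "wb")
unzip(tmp, list = TRUE)                   # errors here indicate a bad download
store <- read.csv(unz(tmp, "store.csv"))  # name of the file *inside* the archive
unlink(tmp)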
Related
I would like to unzip and read a shapefile from the web in R without relying on rgdal. I found the read.shp function from the fastshp package, which can apparently accomplish this without rgdal installed in the environment; however, I'm having trouble implementing it.
I would like a function that unzips and then reads in the shapefile, akin to what's found in this SO post but for read.shp. I tried the following, to no avail:
dlshape = function(shploc, format) {
  temp = tempfile()
  download.file(shploc, temp)
  unzip(temp)
  shp.data <- sapply(".", function(f) {
    f <- file.path(temp, f)
    return(read.shp(".", format))
  })
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip', 'polygon')
Error in read.shp(".", format) : unused argument (format)
I also tried the following:
dlshape = function(shploc) {
  temp = tempfile()
  download.file(shploc, temp)
  unzip(temp)
  shp.data <- sapply(".", function(f) {
    f <- file.path(temp, f)
    return(read.shp("."))
  })
}
shp_object<-dlshape('https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip')
Error in file(shp.name, "rb") : cannot open the connection
In addition: Warning messages:
1: In file(shp.name, "rb") : 'raw = FALSE' but '.' is not a regular file
2: In file(shp.name, "rb") :
I suspect this is because I'm feeding read.shp() the folder name rather than the .shp file name (that works for readOGR, but not for read.shp). Any assistance is much appreciated.
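One plausible fix for the code above, before turning to an sf-based answer: extract into a known directory, locate the .shp, and pass its full path. This sketch assumes fastshp::read.shp() takes the path to a .shp file (not a folder) plus a format argument, as its documentation describes.
# Hedged sketch: hand read.shp() the extracted .shp file's full path.
library(fastshp)  # assumes fastshp is installed (it lives on rforge, not CRAN)
dlshape <- function(shploc, format = "polygon") {
  temp <- tempfile(fileext = ".zip")
  download.file(shploc, temp, mode = "wb")
  exdir <- tempfile()
  unzip(temp, exdir = exdir)
  shp <- list.files(exdir, pattern = "\\.shp$", full.names = TRUE)
  read.shp(shp, format = format)
}
shp_object <- dlshape("https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip", "polygon")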
You can use unzip() from utils and read_sf() from sf to unzip and then load your shapefile. Here is a working example:
# Create temp files
temp <- tempfile()
temp2 <- tempfile()
# Download the zip file and save to 'temp'
URL <- "https://www2.census.gov/geo/tiger/TIGER2017/COUNTY/tl_2017_us_county.zip"
download.file(URL, temp, mode = "wb")
# Unzip the contents of the temp and save unzipped content in 'temp2'
unzip(zipfile = temp, exdir = temp2)
# Read the shapefile. Alternatively make an assignment, such as: f <- sf::read_sf(your_SHP_file)
sf::read_sf(temp2)
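A possible follow-up: if the extracted directory holds more than one candidate layer, read_sf() can be pointed at the .shp file itself. The file name below is an assumption based on the zip's name.
# Read the named shapefile explicitly; the file name is assumed to
# match the zip's contents.
counties <- sf::read_sf(file.path(temp2, "tl_2017_us_county.shp"))
nrow(counties)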
I am trying to import csv files from an FTP server into R.
Ideally, each file would be imported into a data frame.
I want to import only specific files from the FTP server, not all of them.
My issues began with trying to import just one file:
url <- "ftp:servername.de/"
download.file(url, "testdata.csv")
I got this error message:
trying URL 'ftp://servername.de/'
Error in download.file(url, "testdata.csv") :
  cannot open URL 'ftp://servername.de/'
In addition: Warning message:
In download.file(url, "testdata.csv") :
  URL 'ftp://servername.de/': status was 'Couldn't connect to server'
Another way I tried was:
url <- "ftp://servername.de/"
userpwd <- "a:n"
filenames <- getURL(url, userpwd = userpwd
,ftp.use.epsv = FALSE, dirlistonly = TRUE
)
Here I do not understand how to import the files into an R object.
Additionally, it would be great to get a clue on how to handle this process with zipped data (format: .gz) instead of plain csv data.
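For the single-file attempt above, a hedged sketch first: download.file() needs a full path to a file, not just the server root, and credentials can be embedded in the URL. The host, file name, and a:n credentials below are placeholders taken from the question.
# Minimal sketch: read one known csv straight from the FTP server into
# a data frame. All names and credentials here are placeholders.
testdata <- read.csv("ftp://a:n@servername.de/testdata.csv")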
Use the curl package to retrieve the directory listing:
library(curl)
url = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2015/11/PXD000299/"
h = new_handle(dirlistonly=TRUE)
con = curl(url, "r", h)
tbl = read.table(con, stringsAsFactors=TRUE, fill=TRUE)
close(con)
head(tbl)
V1
1 12-0210_Druart_Uterus_J0N-Co_1a_ORBI856.raw.mzML
2 12-0210_Druart_Uterus_J0N-Co_2a_ORBI857.raw.mzML
3 12-0210_Druart_Uterus_J0N-Co_3a_ORBI858.raw.mzML
4 12-0210_Druart_Uterus_J10N-Co_1a_ORBI859.raw.mzML
5 12-0210_Druart_Uterus_J10N-Co_2a_ORBI860.raw.mzML
6 12-0210_Druart_Uterus_J10N-Co_3a_ORBI861.raw.mzML
Paste the relevant file names onto the URL and use:
urls <- paste0(url, tbl[1:5,1])
fls = basename(urls)
curl_fetch_disk(urls[1], fls[1])
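To then get the fetched files into data frames, and to cover the question's .gz case, a hedged sketch (it assumes the files being fetched are csv, unlike the .mzML files in this example listing):
# Read each downloaded file into a data frame; gzfile() decompresses
# .gz transparently, so no separate unzip step is needed.
dfs <- lapply(fls, function(f) {
  con <- if (grepl("\\.gz$", f)) gzfile(f) else file(f)
  read.csv(con)
})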
Reference:
Downloading files from ftp with R
In an R Shiny application, I am using zip() from the utils package and trying to download a zip using downloadHandler.
I can download a folder containing csv files and images by zipping it, but when I try to download a folder containing an Excel workbook, I get the following error:
Warning: Error in path.expand: invalid 'path' argument
[No stack trace available]
Here is my code snippet:
temporary_directory <- tempdir()
#ZIP purpose directory creation
output_directory <-
  paste("network_data_bundle", gsub("[ :]", "_", Sys.time()), sep = "_")
network_data_bundle_path <-
  paste(temporary_directory, output_directory, sep = "/")
## Create a new workbook
wb <- createWorkbook("temp_workbook")
#************* START: ADDING INSTRUCTION SHEET ************#
# adding instructions worksheet to the workbook
addWorksheet(wb, CONSTANT_MM_WORKBOOK_SHEET_INSTRUCTION, gridLines = TRUE)
# loading instructions sheet workbook
instructions_wb <- loadWorkbook(CONSTANT_MM_WORKBOOK_INSTRUCTION_FILE_NAME)
wb <- copyWorkbook(instructions_wb)
temp_mm_data_file <- paste(network_data_bundle_path, "test1.xlsx", sep = "/")
saveWorkbook(wb, temp_mm_data_file, overwrite = TRUE)
# set working directory to the temp directory for zipping
setwd(temporary_directory)
# zip all the files and sub-directories
zip(zipfile = filename,
    files = paste(output_directory, "/", sep = ""))
I have already spent more than a couple of hours on this but couldn't find anything helpful.
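One guess, not a confirmed diagnosis: saveWorkbook() writes into network_data_bundle_path, but that directory is never created, and the "invalid 'path' argument" from path.expand() also appears when the path handed to saveWorkbook() or zip() is NULL rather than a character string (check that `filename` is defined). A sketch of a guarded version:
# Hedged guard rails: create the output directory first and check that
# the target path is a plain character string before saving.
dir.create(network_data_bundle_path, recursive = TRUE, showWarnings = FALSE)
stopifnot(is.character(temp_mm_data_file), length(temp_mm_data_file) == 1)
saveWorkbook(wb, temp_mm_data_file, overwrite = TRUE)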
I'm trying to download a zipped file from the web, then extract the single kml file within. I have tried several different utils functions to unzip and extract, but I'm not sure how to get a kml file that I can begin to work with (in the sf package).
zipFileName <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
temp <- tempfile()
download.file(url = zipFileName, destfile = temp)
untar(tarfile = temp, files = smokeFileName)
# Error in getOctD(x, offset, len) : invalid octal digit
untar(tarfile = zipFileName, files = smokeFileName)
# Error in gzfile(path.expand(tarfile), "rb") : cannot open the connection
# In addition: Warning message:
# In gzfile(path.expand(tarfile), "rb") :
# cannot open compressed file 'http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz', probable reason 'Invalid argument'
unz(temp, smokeFileName)
# A connection with
# description "C:\\Users\\jvargo\\AppData\\Local\\Temp\\RtmpemFaXC\\file33f82dd83714:smoke20100101.kml"
# class "unz"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "yes"
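A hedged aside on why these attempts fail: .gz is plain gzip compression, not a zip or tar archive, so unz() and untar() are the wrong tools. A gzcon() connection can decompress the stream directly, or (as in the answer below) R.utils::gunzip() can expand it to a file on disk.
# .gz is gzip, not an archive: peek at the decompressed KML over a
# gzcon() connection, no unzip/untar needed.
head(readLines(gzcon(url(zipFileName))), 3)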
Adapted from https://community.rstudio.com/t/download-gz-file-and-extract-kml/13783:
library(R.utils)
library(sf)
gzFileURL <- "http://satepsanone.nesdis.noaa.gov/pub/volcano/FIRE/HMS_ARCHIVE/2010/KML/smoke20100101.kml.gz"
smokeZipName <- "smoke20100101.kml.gz"
smokeFileName <- "smoke20100101.kml"
directory <- tempdir()
setwd(directory)
temp <- tempfile(pattern = "", fileext = ".kml.gz")
download.file(url = gzFileURL, destfile = temp)
gunzip(temp)
kmlFile <- list.files(tempdir(), pattern = "\\.kml$")
layers <- st_layers(kmlFile)$name
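From there, the KML can be read into an sf object; taking the first layer is an assumption about the file's structure.
# Read the (assumed) smoke layer into sf.
smoke <- sf::read_sf(kmlFile, layer = layers[1])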
I have to download several zip files from the website http://www.kase.kz/ru/marketvaluation
This question basically originates from this topic. Having not yet solved the problem, I tried the following approach:
for (i in 1:length(data[, 2])) {
  URL <- data[i, 2]
  dir <- basename(URL)
  download.file(URL, dir)
  unzip(dir)
  TXT <- list.files(pattern = "*.TXT")
  zip <- list.files(pattern = "*.zip")
  file.remove(TXT, zip)
}
Now I am facing another problem: after the 4th or 5th iteration, R gives me:
trying URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip'
Error in download.file(URL, dir) :
cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip'
In addition: Warning message:
In download.file(URL, dir) :
cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip': HTTP status was '503 Service Temporarily Unavailable'
I don't know why this is happening. I would appreciate any suggestions/solutions.
Ahh, this was a piece of cake:
for (i in 1:length(data[, 2])) {
  URL <- data[i, 2]
  dir <- basename(URL)
  download.file(URL, dir)
  unzip(dir)
  TXT <- list.files(pattern = "*.TXT")
  zip <- list.files(pattern = "*.zip")
  file.remove(TXT, zip)
  Sys.sleep(sample(10, 1))
}
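A random pause is a reasonable fix for the 503; a slightly sturdier variant, sketched here, retries each download a few times before giving up (the function name and retry counts are my own, not from the original answer):
# Hedged sketch: retry a download with a pause between attempts, in case
# the server intermittently answers 503 Service Temporarily Unavailable.
download_with_retry <- function(URL, dest, tries = 3, wait = 10) {
  for (k in seq_len(tries)) {
    ok <- tryCatch({
      download.file(URL, dest)
      TRUE
    }, error = function(e) FALSE)
    if (ok) return(invisible(TRUE))
    Sys.sleep(wait)
  }
  stop("failed to download: ", URL)
}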