How to read csv from a zip file? - r

I am trying to automate fetching chat transcripts using an API provided by the vendor. On a successful request, the API response contains a link from which the chat transcripts can be downloaded as a zip containing a single CSV file with the required data.
Following the steps in the link here, I was able to download the zip successfully from the link in R and store it in the temp folder. However, I wasn't able to extract the CSV from the zip file:
temp = tempfile(pattern = "", fileext = ".zip")
download.file(download_link,temp, mode = "wb")
file_name <- as.character(unzip(temp, list = TRUE)$Name)
con <- unz(temp,file_name)
chatsData <- read.csv(con, header = T)
I received the following error on the last line:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot open zip file 'C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpqWLYGf\file4a5435b13659:2021-04-05T10:00_2021-04-06T10'
On checking the temp location, I was able to locate and unzip the file and read its contents using WinRAR. I'm clueless as to why this can't be replicated in R.
You can download a sample of the zip file that I am trying to extract the CSV from at the following link:

There is a dedicated package on CRAN for creating and extracting zip archives:
https://cran.r-project.org/package=zip
If you are sure the zip file was downloaded to a local directory, you might be able to use the unzip() function this package provides to extract the CSV.
You could download the file, unzip it, read the contained CSV and then delete both the original zip and the CSV if you want to keep your hard drive "clean".
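A minimal sketch of that approach, using base R's utils::unzip() (zip::unzip(zipfile, exdir = ...) from the package above can be swapped in). Extracting to disk avoids unz() altogether; the path in the error message stops at a colon inside the member name, which suggests (an assumption, not confirmed) that unz() is splitting its "zipfile:member" description at that colon. If the member name contains characters that are not valid in file names on your system (':' is not allowed on Windows), extraction may fail as well. `read_zipped_csv` is a hypothetical helper name.

```r
# Hypothetical helper: extract a zip that holds a single file into a
# throwaway directory, read that file as CSV, then delete the extracted copy.
read_zipped_csv <- function(zip_path) {
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  # Remove the extracted files when the function exits (even on error)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  extracted <- utils::unzip(zip_path, exdir = exdir)
  # The question states the archive contains exactly one CSV file
  if (length(extracted) != 1L) {
    stop("Expected exactly one file in the archive, found ", length(extracted))
  }
  utils::read.csv(extracted, header = TRUE)
}

# Usage (needs network access; `download_link` comes from the question):
# temp <- tempfile(fileext = ".zip")
# download.file(download_link, temp, mode = "wb")
# chatsData <- read_zipped_csv(temp)
# unlink(temp)
```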

Related

How to st_read a .shp-file from a subfolder in a .zip-file without unzipping it in R?

I am trying to read a .shp file from a subfolder in a .zip file without unzipping it.
I tried
con <- unz(description = "C:/Test/File.zip", filename = "subfolder/shape.shp")
db <- st_read(con)
but it failed with this message:
Cannot open; The file doesn't seem to exist.
Any advice?
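One possible approach, assuming the sf package and a GDAL build with its standard virtual file systems: st_read() is designed to take a data source name (a path or GDAL string) rather than an R connection like the one unz() returns, and GDAL's /vsizip/ prefix lets that path point inside a zip archive. The shapefile's sidecar files (.shx, .dbf, ...) must sit in the same folder inside the archive. A sketch; `read_shp_from_zip` is a hypothetical helper and the paths are the ones from the question.

```r
# Hypothetical helper: read a shapefile stored inside a zip via GDAL's
# /vsizip/ virtual file system, without extracting it. Requires sf.
read_shp_from_zip <- function(zip_path, member) {
  # GDAL expects forward slashes, also on Windows
  zip_path <- normalizePath(zip_path, winslash = "/", mustWork = TRUE)
  # e.g. "/vsizip/C:/Test/File.zip/subfolder/shape.shp"
  dsn <- paste0("/vsizip/", zip_path, "/", member)
  sf::st_read(dsn, quiet = TRUE)
}

# db <- read_shp_from_zip("C:/Test/File.zip", "subfolder/shape.shp")
```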

How to read in an xls file after using unzip() to place it into a temporary folder?

I receive weekly emails from a database as zip files. In the zip file is a single xls file. When I use unzip() and place the xls file into a designated shared network directory, I cannot use any xls read functions to actually access and manipulate the data (I haven't tried read.xls because of the Perl dependencies, but I'm willing to if there are no other options).
I've tried every Excel reader I can find.
read_xls(unzip(zipfile = zfile, files = "Data Extract.xls", exdir = "~\\Excel Files"))
Errors include the following where I would simply expect a dataframe output:
"libxls error: Unable to open file"
"Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot open zip file 'Data.zip:F'"
EDIT: It turns out that, despite the DB interface calling the file within the zip an .xls file, it's actually an HTML file, and readHTMLTable() from library(XML) did the trick just fine. Thank you for the questions, which led me to look at this issue from a different angle.
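Since the "xls" turned out to be HTML, a quick way to catch this is to inspect the file's first bytes before choosing a reader. Legacy .xls workbooks are OLE2 compound files that start with the bytes D0 CF 11 E0 A1 B1 1A E1, whereas an HTML export normally starts with "<" (possibly after a byte-order mark or whitespace). A sketch; `sniff_file_type` is a hypothetical helper, and the readers named in the usage comment (readxl::read_xls, XML::readHTMLTable) are only suggestions.

```r
# Hypothetical helper: guess whether a file is a real legacy .xls (OLE2)
# or an HTML page, by looking only at its leading bytes.
sniff_file_type <- function(path) {
  bytes <- readBin(path, what = "raw", n = 512L)
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(bytes) >= 8L && identical(bytes[1:8], ole2_magic)) {
    return("xls")
  }
  # Skip a UTF-8 byte-order mark, if present
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3L && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  # First byte that is not space, tab, LF or CR; HTML should start with "<"
  whitespace <- as.raw(c(0x20, 0x09, 0x0A, 0x0D))
  first <- bytes[!bytes %in% whitespace][1]
  if (!is.na(first) && first == as.raw(0x3C)) "html" else "unknown"
}

# Usage (zfile as in the question):
# f <- unzip(zfile, files = "Data Extract.xls", exdir = tempdir())
# dat <- switch(sniff_file_type(f),
#               xls  = readxl::read_xls(f),
#               html = XML::readHTMLTable(f),
#               stop("Unrecognised file type: ", f))
```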

RStudio : Problem with downloading a ZIP file from an URL and read CSV files from ZIP file

I am relatively new to R programming. I am trying to download a few zip files that contain CSV files from a URL and read them. Below are the code, the URL and the errors. From the errors, I suspect it is only downloading some text or HTML and not the ZIP file (the download is only 10 KB, as against 396 KB for the ZIP file as shown on the website). I have tried downloading a few other datasets from the same site but have the same issue. I'd appreciate any help. Please note that I can download the ZIP files directly, extract them and view the CSV files.
tempdl <- tempfile()
download.file("https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016/downloads/suicide-rates-overview-1985-to-2016.zip",tempdl, mode="wb")
unzip(tempdl, "master.csv")
data <- read.table("master.csv", sep=",")
The error I get is:
> download.file("https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016/downloads/suicide-rates-overview-1985-to-2016.zip",tempdl, mode="wb")
trying URL 'https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016/downloads/suicide-rates-overview-1985-to-2016.zip'
Content type 'text/html; charset=utf-8' length unknown
downloaded 10 KB
> unzip(tempdl, "master.csv")
Warning message:
In unzip(tempdl, "master.csv") : error 1 in extracting from zip file
> data <- read.table("master.csv", sep=",")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'master.csv': No such file or directory
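The log line Content type 'text/html; charset=utf-8' together with the 10 KB size suggests the server returned a web page (for instance a sign-in page) rather than the archive. A cheap guard is to check the downloaded file's first bytes before calling unzip(): a non-empty zip archive normally begins with the local file header signature "PK\003\004" (bytes 50 4B 03 04). A sketch; `is_zip_file` is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: TRUE if the file starts with the zip local file
# header signature 50 4B 03 04 ("PK\003\004"), FALSE otherwise.
is_zip_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 4L)
  identical(sig, as.raw(c(0x50, 0x4B, 0x03, 0x04)))
}

# Usage (`url` is a placeholder for the download link):
# tempdl <- tempfile(fileext = ".zip")
# download.file(url, tempdl, mode = "wb")
# if (!is_zip_file(tempdl)) {
#   stop("Download is not a zip archive; the server probably sent an HTML page")
# }
# unzip(tempdl, "master.csv", exdir = tempdir())
```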

To read a zipped (csv) file downloaded in a temp file

I was trying code given by Dirk to download and read a zipped csv file from a website. The code :
temp <- tempfile()
download.file("https://datacatalog.worldbank.org/dataset/Economic_Fitness_CSV.zip", temp)
I can see that the file is being downloaded into a temp file:
temp
C:\\Users\\DCC\\AppData\\Local\\Temp\\Rtmpu61Cca\\file27346fd263d0
However, I am getting an error when I run the following code
data <- read.csv(unz(temp, "Economic_Fitness_CSV.zip"))
unlink(temp)
Warning message in unzip(temp, "C:/Users/DCC/AppData/Local/Temp/Rtmpu61Cca/Economic_Fitness_CSV.csv"):
"error 1 in extracting from zip file"
Error: 'Economic_Fitness_CSV.csv' does not exist in current working directory ('D:/MADS/02 DATA422_Data Wrangling/LABs/Lab 1').
Traceback:
I don't understand this error: how is 'Economic_Fitness_CSV.csv' supposed to exist in my working directory before it has been extracted from the temp file?
Maybe I am missing a simple point, but I have not been able to resolve this on my own.
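In the code shown, unz(temp, "Economic_Fitness_CSV.zip") passes the archive's own name as the file to read inside it; unz() expects the path of the archive plus the name of a member, which unzip(list = TRUE) can provide. A sketch, assuming the archive contains at least one .csv member; `read_first_csv_member` is a hypothetical helper, and mode = "wb" is used because the R documentation notes that binary mode matters on Windows.

```r
# Hypothetical helper: list the archive's members, pick the first .csv,
# and read it through an unz() connection (no extraction to disk).
read_first_csv_member <- function(zip_path) {
  members <- utils::unzip(zip_path, list = TRUE)$Name
  csv_members <- members[grepl("\\.csv$", members, ignore.case = TRUE)]
  if (length(csv_members) == 0L) stop("No CSV file found in ", zip_path)
  # read.csv() opens the unz() connection itself and closes it afterwards
  utils::read.csv(unz(zip_path, csv_members[1]))
}

# Usage (needs network access; the URL may no longer serve a zip):
# temp <- tempfile(fileext = ".zip")
# download.file("https://datacatalog.worldbank.org/dataset/Economic_Fitness_CSV.zip",
#               temp, mode = "wb")
# data <- read_first_csv_member(temp)
# unlink(temp)
```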

R exdir does not exist error

I'm trying to download and extract a zip file using R. Whenever I do so I get the error message
Error in unzip(temp, list = TRUE) : 'exdir' does not exist
I'm using code based on the Stack Overflow question Using R to download zipped data file, extract, and import data
To give a simplified example:
# Create a temporary file
temp <- tempfile()
# Download ZIP archive into temporary file
download.file("http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip",temp)
# ZIP is downloaded successfully:
# trying URL 'http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip'
# Content type 'application/zip' length 4533970 bytes (4.3 Mb)
# opened URL
# downloaded 4.3 Mb
# Try to do something with the downloaded file
unzip(temp,list=TRUE)
# Error in unzip(temp, list = TRUE) : 'exdir' does not exist
What I've tried so far:
Accessing the temp file manually and unzipping it with 7zip: Can do this no problem, file is there and accessible.
Changing the temp directory to c:\temp. Again, the file is downloaded successfully, I can access it and unzip it with 7zip but R throws the exdir error message when it tries to access it.
R version 2.15.2
R-Studio version 0.97.306
Edit: The code works if I use unz instead of unzip, but I haven't been able to figure out why one works and the other doesn't. From the R documentation:
unz reads (only) single files within zip files...
unzip extracts files from or list a zip archive
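To make the quoted distinction concrete, here is a sketch that touches the same archive both ways: unz() hands back a connection to one member and writes nothing to disk, whereas unzip() either lists the members (list = TRUE) or extracts them under exdir. It illustrates the two functions only and does not explain the R 2.15 behaviour in the question; `show_zip_access` is a hypothetical helper.

```r
# Hypothetical helper contrasting unz() and unzip() on one archive.
show_zip_access <- function(zip_path, exdir = tempfile("unzip_")) {
  # unzip(list = TRUE): a data frame of members (Name, Length, Date)
  listing <- utils::unzip(zip_path, list = TRUE)

  # unz(): a connection to a single member, read in place
  con <- unz(zip_path, listing$Name[1])
  first_line <- readLines(con, n = 1L)
  close(con)

  # unzip() without list = TRUE: writes the members to files under exdir
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  extracted <- utils::unzip(zip_path, exdir = exdir)

  list(listing = listing, first_line = first_line, extracted = extracted)
}
```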
On a Windows setup:
I had this error when I had exdir specified as a path. For me the solution was removing the trailing / or \\ from the path name.
Here's an example; it did create the new folder if it didn't already exist:
locFile <- pathOfMyZipFile
outPath <- "Y:/Folders/MyFolder"
# OR
outPath <- "Y:\\Folders\\MyFolder"
unzip(locFile, exdir=outPath)
This can manifest another way, and the documentation doesn't make the cause clear. Your exdir cannot end in a "/"; it must be just the name of the target folder.
For example, this was failing with 'exdir' does not exist:
unzip(temp, overwrite = F, exdir = "data_raw/system-data/")
And this worked fine:
unzip(temp, overwrite = F, exdir = "data_raw/system-data")
Presumably when unzip sees the "/" at the end of the exdir path it keeps looking; whereas omitting the "/" tells unzip "you've found it, unzip here".
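Both answers above come down to the same fix: drop the trailing separator from exdir before calling unzip(). A small sketch; `strip_trailing_sep` is a hypothetical helper, and it is not meant for root paths such as "/" or "C:/", where the trailing slash is significant.

```r
# Hypothetical helper: remove any trailing "/" or "\" from a path.
# Not intended for root paths like "/" or "C:/".
strip_trailing_sep <- function(path) {
  sub("[/\\\\]+$", "", path)
}

# Usage with the failing call from the answer above:
# unzip(temp, overwrite = FALSE, exdir = strip_trailing_sep("data_raw/system-data/"))
```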
A couple of years late, but I still get this error when trying to use unzip(). It appears to be a bug, because the help page for unzip states that if exdir is specified it will be created:
exdir The directory to extract files to (the equivalent of unzip -d).
It will be created if necessary.
A workaround I've been using is to manually create the necessary directory:
dir.create("directory")
unzip("file-to-unzip.zip", exdir = "directory/")
A pain, but it seems to work, at least for me.
I am using R 3.2.1 on a Windows 7 machine.
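The dir.create() workaround above can be wrapped so that the target folder, including any missing parent folders (recursive = TRUE), is created only when it does not exist yet. A sketch; `unzip_into` is a hypothetical helper.

```r
# Hypothetical helper: make sure exdir exists, then extract into it.
unzip_into <- function(zipfile, exdir) {
  if (!dir.exists(exdir)) {
    ok <- dir.create(exdir, recursive = TRUE)
    if (!ok) stop("Could not create directory: ", exdir)
  }
  # Returns the paths of the extracted files (invisibly)
  utils::unzip(zipfile, exdir = exdir)
}

# unzip_into("file-to-unzip.zip", "directory")
```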
The way I found to address this issue takes a few steps, but it works for me:
Create a vector that contains the URL from which you are downloading the file, e.g.
file_url <- "http://your.file.com/file_name.zip"
Call download.file with that URL vector and, as the destination, the file name of the zipped file (normally the last part of the URL). The file will be saved under that name in your working directory*, e.g.
download.file(file_url, "file_name.zip")
*If you are not sure what your working directory is, you can check it with getwd(). To change it, use setwd("C:/Users/username/...").
Use unzip() to extract the file into a folder in your working directory, named via exdir, e.g.
unzip("file_name.zip", exdir = "file_name")
To check your work, you can use list.files, e.g.
list.files("file_name")
Hope this helps!
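The steps in this answer can be collected into one function. A sketch, assuming (as the answer does) that the last part of the URL is the zip's file name; `download_and_unzip` is a hypothetical helper, `dest_dir` defaults to the working directory like the original steps, and mode = "wb" is added because the R documentation notes that binary mode matters on Windows.

```r
# Hypothetical helper: download a zip, extract it into a folder named after
# the archive, and return the extracted file names (relative to that folder).
download_and_unzip <- function(file_url, dest_dir = getwd()) {
  zip_name <- basename(file_url)                            # e.g. "file_name.zip"
  zip_path <- file.path(dest_dir, zip_name)
  exdir <- file.path(dest_dir, tools::file_path_sans_ext(zip_name))  # e.g. ".../file_name"

  utils::download.file(file_url, zip_path, mode = "wb")
  utils::unzip(zip_path, exdir = exdir)
  # Equivalent of the list.files() check in the answer
  list.files(exdir, recursive = TRUE)
}

# download_and_unzip("http://your.file.com/file_name.zip")
```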
