R zip multiple files from multiple subfolders into one zip with one folder - r

I have some files in different subfolders:
subfolder1/file1.csv
subfolder2/file2.csv
When I zip these files in R with the following code:
zip::zip("test.zip", files = c("subfolder1/file1.csv", "subfolder2/file2.csv"))
it creates a zip file, but the files inside it are still in their subfolders.
How can I create a zip file without the subfolder structure, so that the files sit directly in test.zip?

Thanks to stefan's comment, I tried some of the arguments of zip::zip and found the one that does this:
zip(..., mode="cherry-pick")
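For completeness, here is a minimal end-to-end sketch of that call. It assumes the zip package is installed; the temporary folder and the sample data are made up for illustration and only mirror the layout in the question.

```r
# Build a throwaway folder layout that mirrors the question (hypothetical data).
root <- tempfile("zipdemo")
dir.create(file.path(root, "subfolder1"), recursive = TRUE)
dir.create(file.path(root, "subfolder2"))
write.csv(data.frame(x = 1:3), file.path(root, "subfolder1", "file1.csv"),
          row.names = FALSE)
write.csv(data.frame(y = 4:6), file.path(root, "subfolder2", "file2.csv"),
          row.names = FALSE)

files   <- file.path(root, c("subfolder1/file1.csv", "subfolder2/file2.csv"))
zipfile <- file.path(root, "test.zip")

# "cherry-pick" stores each selected file at the top level of the archive,
# dropping the directories it came from.
zip::zip(zipfile, files = files, mode = "cherry-pick")

# Inspect the paths stored in the archive.
flat_names <- zip::zip_list(zipfile)$filename
print(flat_names)
```

The listing should show the two file names without any subfolder prefix. With the default mode = "mirror" the relative directory structure is kept instead.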

Related

Extract a CSV inside a zip file inside another zip online in R?

I am attempting to extract a CSV that is inside of a zip file nested in another zip file posted online.
The analysis I am doing draws on files that usually have the same name but are updated regularly. Every so often the format changes. This time they decided to put multiple versions of the data in zip files embedded inside a larger zip file.
What have I done and tried?
I now have a list of many other files that I have downloaded and then loaded into objects. In all of these cases the code block looks similar to this:
temp <- tempfile()
download.file("http://fakeurl.com/data/filename.zip",temp, mode="wb")
unzip(temp, "data.csv")
db <- read.csv("data.csv", header = TRUE)
I cannot wrap my head around taking it to the next level. Because I am not downloading it 'directly' I do not know how to manipulate it.
Ideally, I want to unzip the outer file into a temp file, then unzip the nested zip file, then read the CSV into a data frame.
I thank you all for your help and will answer any questions you might have to help clarify.
Unzip the downloaded file into the current directory, then iterate through the extracted files and unzip any that is itself a zip file.
Files <- unzip(temp)
for(File in Files) if (grepl("\\.zip$", File)) unzip(File)
There are also various approaches listed here:
How do I recursively unzip nested ZIP files?
Creating zip file from folders in R
https://superuser.com/questions/1287028/how-to-explore-nested-zip-without-extract
https://github.com/ankitkaushal/nzip
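If the names of the inner zip and the CSV are known in advance, the two-step idea above can be wrapped in a small function. This is only a sketch: read_nested_csv, "inner.zip" and the scratch directory are names invented here, not something from the question.

```r
# Hypothetical helper: read one CSV from a zip that is stored inside another zip.
read_nested_csv <- function(outer_zip, inner_zip, csv_name, ...) {
  work <- tempfile("nested")
  dir.create(work)
  # Remove the scratch directory when the function exits.
  on.exit(unlink(work, recursive = TRUE), add = TRUE)

  # Step 1: pull only the inner archive out of the outer one.
  inner_path <- unzip(outer_zip, files = inner_zip, exdir = work)
  stopifnot(length(inner_path) == 1)

  # Step 2: pull only the CSV out of the inner archive.
  csv_path <- unzip(inner_path, files = csv_name, exdir = work)
  stopifnot(length(csv_path) == 1)

  # Step 3: read it before the scratch directory is removed.
  read.csv(csv_path, ...)
}

# Usage with the download step from the question ("inner.zip" is a placeholder):
# temp <- tempfile(fileext = ".zip")
# download.file("http://fakeurl.com/data/filename.zip", temp, mode = "wb")
# db <- read_nested_csv(temp, "inner.zip", "data.csv", header = TRUE)
```

Extracting into a scratch directory rather than the working directory keeps the intermediate zip from piling up between runs.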

use read_sas to read specific file under a zipped file in R

I have a zip file with a couple of SAS files inside it. Is there a way to use the read_sas function to read one specific file inside that zip file? I couldn't find anything online about that.
I checked ?read_sas and nothing is mentioned about it.
Code I used:
# zipped file name: example.zip
# files inside example.zip file are file1.sas7bdat, file2.sas7bdat and targetfilename.sas7bdat
file <- read_sas("example.zip", 'targetfilename.sas7bdat')
Outcome: read_sas only reads the first file inside the zip file.
Solved:
read_sas(unz("example.zip", "targetfilename.sas7bdat"))
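An alternative, if reading through a connection gives trouble, is to extract just the one member to a temporary folder and pass the resulting path to read_sas. The helper name extract_member below is made up for this sketch; it uses only utils::unzip, and the haven call is left as a comment because it needs a real .sas7bdat file.

```r
# Hypothetical helper: extract one named member of a zip and return its path.
extract_member <- function(zipfile, member, exdir = tempfile("unz")) {
  listing <- unzip(zipfile, list = TRUE)  # data frame with a Name column
  if (!member %in% listing$Name) {
    stop("'", member, "' not found; archive contains: ",
         paste(listing$Name, collapse = ", "))
  }
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # unzip() returns the path(s) of the extracted file(s).
  unzip(zipfile, files = member, exdir = exdir)
}

# Usage (assumes the haven package and the file names from the question):
# target <- extract_member("example.zip", "targetfilename.sas7bdat")
# df <- haven::read_sas(target)
```

Checking the listing first gives a readable error when the member name is misspelled, instead of an empty result.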

Is there a way to load csv files saved in different folders with only a partial file name in R

I am trying to load multiple csv files that are each saved in different folders within my working directory in R. However, I only know part of each file name.
For example the file in "folder1" will be named "xxx_xxx_folder1.csv", and the file in "folder2" is "xxx_xxx_folder2.csv" etc. There is only one csv in each folder.
Is there a way to load files saved in different folders with only a partial file name?
The only way I have got it to partially work so far is to have all the files in one folder.
Thanks, and sorry if any of this is unclear!
From your description you could use list.files with option recursive=TRUE to get a list of your csv files. You could then loop over the list to read your files:
fn <- list.files(PATH_TO_WORKING_DIRECTORY, "\\.csv$", recursive = TRUE, full.names = TRUE)
lapply(fn, read.csv)
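To use the part of the name that is known, the pattern can be tightened and the resulting list named after the folders. The sketch below builds a made-up layout in a temporary directory (the "abc_def_" prefix stands in for the unknown "xxx_xxx_" part) so that it runs on its own.

```r
# Hypothetical layout: <root>/folder1/abc_def_folder1.csv, <root>/folder2/...
root <- tempfile("partial")
for (f in c("folder1", "folder2")) {
  dir.create(file.path(root, f), recursive = TRUE)
  write.csv(data.frame(source = f),
            file.path(root, f, paste0("abc_def_", f, ".csv")),
            row.names = FALSE)
}

# Match on the known suffix only; the unknown prefix is left to the regex.
fn <- list.files(root, pattern = "_folder[0-9]+\\.csv$",
                 recursive = TRUE, full.names = TRUE)

# Read every match and name each data frame after the folder that holds it.
tables <- setNames(lapply(fn, read.csv), basename(dirname(fn)))
names(tables)
```

Each data frame can then be reached by folder name, for example tables$folder1.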

Import multiple csv files into R from zip folder

I know that this question has been asked exhaustively on this site, but I cannot find any question which addresses my problem.
I am trying to import multiple .csv files into R which are located in nested .zip files on my PC. The other questions seem to relate to importing a single file from a URL, which is not my issue.
I have set my working directory to the folder which contains the first .zip file, but there is another one inside of it, which then contains normal folders, and finally hundreds of .csv files which I am looking to access.
Up to now I have always manually extracted the data since I have no idea where to begin with unzipping code, but considering this folder contains around 20GB of data, I'm going to need to try something else.
Any help would be appreciated!
EDIT - CODE:
setwd("C:/docs/data/241115")
temp <- tempfile()
unzip("C:/docs/data/241115/Requested.zip",exdir=temp)
l = list.files(temp)
unzip("C:/docs/data/241115/Requested/Data Requested.zip",exdir=temp)
> error 1 in extracting from zip file
Without a minimal reproducible example it's difficult to know exactly where the problem lies. My best guess is that using a tempfile() is causing problems.
I would create a folder within your working directory to unzip the files to. You can do this from within R if you like:
# Create the folder 'temp' in your wd
dir.create("temp")
Now, assuming your zip file is in the working directory, I would unzip the first .zip into temp in one step:
unzip("Requested.zip", exdir = "temp")
Finally, unzip the final .zip:
unzip("temp/Data Requested.zip", exdir = "temp")
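If the nesting depth or the inner file names are not known up front, the same steps can be repeated automatically. The function unzip_nested below is a hypothetical sketch, not part of any package: it unzips an archive and then unzips every .zip it finds among the extracted files, each into a folder named after that archive.

```r
# Hypothetical helper: unzip `zipfile` into `exdir`, then keep unzipping any
# .zip files that appear, each into a folder named after the archive.
unzip_nested <- function(zipfile, exdir) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  extracted <- unzip(zipfile, exdir = exdir)  # paths of the extracted files
  inner <- extracted[grepl("\\.zip$", extracted, ignore.case = TRUE)]
  for (z in inner) {
    unzip_nested(z, exdir = sub("\\.zip$", "", z, ignore.case = TRUE))
  }
  invisible(exdir)
}

# Usage with the names from the question:
# unzip_nested("Requested.zip", exdir = "temp")
# csv_files <- list.files("temp", pattern = "\\.csv$",
#                         recursive = TRUE, full.names = TRUE)
# data_list <- lapply(csv_files, read.csv)
```

Note that this writes everything to disk, so with around 20GB of data the target folder needs at least that much free space.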

Reading data from zip files located in zip files with R

I'd like to use R to extract data from zip files located in zip files (i.e. perform some ZIP file inception).
An example "directory" of one of my datapoints looks like this:
C:\ZipMother.zip\ZipChild.zip\index.txt
My goal is to read in the "index.txt" from each ZipChild.zip. The issue is that I have 324 ZipMother.zip files with an average of 2000 ZipChild.zip files, therefore unzipping the ZipMother.zip files is a concern due to memory constraints (the ZipMother files are about 600 megabytes on average).
With the unzip function, I can successfully get the file paths of each ZipChild located in the ZipMother, but I cannot use it to list the files located in the ZipChild archives.
Therefore,
unzip("./ZipMother.zip",list=TRUE)
works just fine, but...
unzip("./ZipMother.zip/ZipChild.zip",list=TRUE)
gives me the following error
Error in unzip("./ZipMother.zip/ZipChild.zip", list = TRUE) :
zip file './ZipMother.zip/ZipChild.zip' cannot be opened
Is there any way to use unzip or another method to extract the data from the ZipChild files?
Once I get this to work, I plan on using the ldply function to compile the index.txt files into a dataset.
Any input is very much appreciated. Thank you!
A reproducible example (i.e. a link to a zip file with the appropriate structure) would be useful, but how about:
tmpd <- tempdir()
## extract just the child
unzip("./ZipMother.zip",
      files = "ZipChild.zip", exdir = tmpd)
ff <- file.path(tmpd, "ZipChild.zip")
index <- unzip(ff,list=TRUE)
unlink(ff)
This could obviously be packaged into a function for convenience.
It could be slow, but it means you never have to unpack more than one child at a time ...
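One possible way to package this is sketched below. The function names list_child and read_child_index are invented here. The second one reads index.txt straight out of the extracted child through unz(), so the child itself is never unpacked.

```r
# Hypothetical helper: list the contents of one child zip stored in a mother zip.
list_child <- function(mother, child, tmpd = tempdir()) {
  ff <- unzip(mother, files = child, exdir = tmpd)  # extract just this child
  on.exit(unlink(ff), add = TRUE)                   # delete it when done
  unzip(ff, list = TRUE)
}

# Hypothetical helper: read index.txt from one child without unpacking the child.
read_child_index <- function(mother, child, tmpd = tempdir()) {
  ff <- unzip(mother, files = child, exdir = tmpd)
  con <- unz(ff, "index.txt")
  # Release the connection first, then delete the extracted child.
  on.exit({ close(con); unlink(ff) }, add = TRUE)
  readLines(con)
}

# Usage with the names from the question:
# children <- unzip("./ZipMother.zip", list = TRUE)$Name
# indexes  <- lapply(children,
#                    function(ch) read_child_index("./ZipMother.zip", ch))
```

Only one child exists on disk at any time, which keeps the footprint small at the cost of reopening the mother archive for every child.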
