Check existence of file in archive (zip) - r

I'm using unz() to extract data from a file within an archive. This works well, but I have a lot of zip files and need to check whether a specific file exists in each archive. I could not get a working solution using if, exists() or else.
Does anyone have an idea how to check whether a file exists in an archive without extracting the whole archive first?
Example:
read.table(unz("D:/Data/Test.zip", "data.csv"), sep = ";")[-1,]
This works pretty well if data.csv exists but gives an error if the file is not available in the archive Test.zip.
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file 'data.csv' in zip file 'D:/Data/Test.zip'
Any comments are welcome!

You could use unzip(file, list = TRUE)$Name to get the names of the files in the zip without having to unzip it. Then you can check to see if the files you need are in the list.
## character vector of all file names in the zip
fileNames <- unzip("D:/Data/Test.zip", list = TRUE)$Name
## check if any of those are 'data.csv' (or others)
check <- basename(fileNames) %in% "data.csv"
## extract only the matching files
if (any(check)) {
  unzip("D:/Data/Test.zip", files = fileNames[check], junkpaths = TRUE)
}
You could probably put another if() statement to run unz() in cases where there is only one matched file name, since it's faster than running unzip() on a single file.
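The check and the read can be wrapped together so they run over many archives. This is only a sketch: zip_has_file() and read_if_present() are hypothetical helper names, and the exact match against $Name assumes data.csv sits at the top level of each archive.

```r
# Hypothetical helper: TRUE if `member` is listed in the archive index.
# list = TRUE reads the index only; nothing is extracted.
zip_has_file <- function(zip_path, member) {
  member %in% unzip(zip_path, list = TRUE)$Name
}

# Hypothetical helper: read the member with unz() when present, else NULL.
read_if_present <- function(zip_path, member = "data.csv") {
  if (!zip_has_file(zip_path, member)) {
    return(NULL)
  }
  read.table(unz(zip_path, member), sep = ";")[-1, ]
}

# Usage over a folder of archives (the path is taken from the question):
# zips <- list.files("D:/Data", pattern = "\\.zip$", full.names = TRUE)
# results <- lapply(zips, read_if_present)
```

Archives without the file yield NULL instead of an error, so the results can be filtered afterwards with Filter(Negate(is.null), results).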

Related

Why am I getting this error when I try to read my file in R?

# Removing all Objects from current environment.
rm(list = ls())
# Setting directory
setwd("C:/Users/Ashwin/Desktop")
# Task 2
RG <- read.csv("Assignment3")
In file(file, "rt") :
cannot open file 'Assignment3': No such file or directory
When reading the file you need to provide the correct file name, including the extension. Depending on the file system, the file name may also be case sensitive.
You are getting the error because the name does not match any file in the directory.
Solutions:
Add the extension, e.g. .csv, .DAT, .txt
Make sure the file is in the path "C:/Users/Ashwin/Desktop"
Make sure the file name is correct and the case of the characters matches, e.g. if the file on the drive is "assignment3", you need to write "assignment3" in R as well.
You can also use:
RG <- read.csv(file.choose())
This way a window will pop up where you can select your file from the folder.
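To see which name R actually expects, it can help to list the files in the working directory and compare them with the name you typed. The sketch below uses a hypothetical helper, find_candidates(), that ignores case and extension when matching.

```r
# Hypothetical helper: list files in `dir` whose name, ignoring case and
# extension, matches `stem`.
find_candidates <- function(stem, dir = ".") {
  all_files <- list.files(dir)
  stems <- tools::file_path_sans_ext(all_files)
  all_files[tolower(stems) == tolower(stem)]
}

# Usage after setwd("C:/Users/Ashwin/Desktop"):
# file.exists("Assignment3")      # FALSE if the extension is missing
# find_candidates("Assignment3")  # shows the full name(s) to pass to read.csv()
```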

How to read in file with dynamic name while avoiding hard-coding in R?

I run into issues reading in csv files with dynamic names while avoiding hard-coding the file path. I'd like short, tidy, non-hardcoded code. If I hardcode the full path (everything before the "~"), it reads in the files fine. But when I soft-code the file path (if that is the opposite of hard-coding), it gives the error below, despite showing the correct path in the error. I have two variable parts of the file name that I paste into the file name before reading it in. If I avoid paste and just type a path per individual, it also works.
#dynamic part I usually have in a loop with all the options.
part_a <- "outside" #other options here in my loop include "inside"
part_b <- "late" # other option "early" or "preterm"
#reading in the df
df <-read.csv(paste0("~/Data/FromR/clean_",part_a,part_b,"_2016.csv"),
check.names=FALSE, na.strings="null")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:/Users/myname/Documents/Data/FromR/clean_outsidelate_2016.csv': No such file or directory
If I use getwd() in the first part of the paste in place of ~, as suggested here, it works by producing the string "C:/Users/myname/Documents/MyR_Projects/Specific_R_project/" at the beginning of the paste. But how can I get it to work with the "~"? When using the ~, it stops at the "Documents" folder...
The desired outcome is to read in the file without error, perform functions, and repeat with other files. My loop works fine hardcoded, and I only wanted to make it more general or softcoded.
I just tried to read a file (testFile.txt) in my home directory from a different working directory, and it works fine with ~
myFile <- "testFile"
mymy <- ".txt"
ciao <- read.delim(paste0("~/",myFile,mymy))
In PowerShell you can use %~% (have a look at this thread), but I am not sure how to expand $HOME in R.
#-------- edit
Have a look here and here. Basically any variable defined in your .Renviron should be accessible.
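To see what "~" resolves to, path.expand("~") can be printed. On Windows, R usually resolves it to the user's Documents folder rather than to the project directory, which matches the path in the error message. The sketch below builds the path from a base directory instead; clean_path() is a hypothetical helper, and the default base assumes the Data/FromR folder lives under the working directory.

```r
# Show what "~" resolves to on this machine
home <- path.expand("~")
print(home)

# Hypothetical helper: build the file name from the two variable parts.
# The default base is an assumption about where the data folder lives.
clean_path <- function(part_a, part_b,
                       base = file.path(getwd(), "Data", "FromR")) {
  file.path(base, paste0("clean_", part_a, part_b, "_2016.csv"))
}

p <- clean_path("outside", "late")
if (file.exists(p)) {
  df <- read.csv(p, check.names = FALSE, na.strings = "null")
} else {
  message("Not found: ", p)
}
```

Passing a different base, for example clean_path("inside", "early", base = "some/other/folder"), keeps the loop free of hard-coded full paths.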

Use of wildcards with readtext()

A basic question. I have a bunch of transcripts (.docx files) I want to read into a corpus. I use readtext() to read in single files no problem.
dat <- readtext("~/ownCloud/NLP/interview_1.docx")
As soon as I put "*.docx" in my readtext statement it spits an error.
dat <- readtext("~/ownCloud/NLP/*.docx")
Error: '/var/folders/bl/61g7ngh55vs79cfhfhnstd4c0000gn/T//RtmpWD6KSx/readtext-aa71916b691c0cf3cabc73a2e04a45f7/word/document.xml' does not exist.
In addition: Warning message:
In utils::unzip(file, exdir = path) : error 1 in extracting from zip file
Why the reference to a zip file? I have only .docx files in the directory.
I was able to reproduce the same problem. The issue is that there are some hidden/temporary .docx files in that folder; if you delete them and then run the code again, it works.
To see the hidden files, go to the folder you are reading the .docx files from and use your OS's way of showing them. On my Mac I used
CMD + SHIFT + .
Once you delete them, try the code again and it should work:
library(readtext)
dat <- readtext("~/ownCloud/NLP/*.docx")
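If deleting the files is not an option, the file list can be built explicitly and filtered before reading. This is a sketch under assumptions: visible_docx() is a hypothetical helper, it assumes the stray files are dot-files or Word's "~$" lock files, and passing a character vector of paths to readtext() should be checked against the readtext documentation.

```r
# Hypothetical helper: list .docx files in `dir`, skipping Word's "~$..."
# lock files. list.files() already omits dot-files by default.
visible_docx <- function(dir) {
  files <- list.files(dir, pattern = "\\.docx$", full.names = TRUE)
  files[!grepl("^~\\$", basename(files))]
}

# Usage (assumes readtext accepts a vector of file paths):
# library(readtext)
# dat <- readtext(visible_docx("~/ownCloud/NLP"))
```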

Function reading in a folder of files returning "Error in file(con, "r"): cannot open the connection..."

I'm trying to import a folder of files into R. The following code works for one folder that contains the same type of files, but will not work for another folder. The type of data is the same (both debian files formatted in the same way, just containing different subject's data).
The following code allows me to read all the files (named subject1-subject10) in the "Data1" folder and put it into a list named Data:
files <- as.character(list.files(path="/Users/wendy/Box Sync/Data1"))
data <- list()
for (i in seq_along(files)) {
data[[i]] <- readLines(files[[i]])
}
But the following code does not work - this folder (Data2) contains subject11 - subject50:
files <- as.character(list.files(path="/Users/wendy/Box Sync/Data2"))
data <- list()
for (i in seq_along(files)) {
data[[i]] <- readLines(files[[i]])
}
This brings up the following message:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'subject11': No such file or directory
I'm confused, because both folders, containing their respective subject data are in the same file path, except for the last folder name in the path.
The second folder (Data2) differs only in the following ways:
Number of files in the folder
contains different subjects
There is more data (more variables) recorded in "Data2" (e.g. recording age, height, race in Data 2 versus only recording age and height in Data1)
If I were to put some of Data2's files into the Data1 folder and run the top code again, it will produce the same error message as when I run the second code chunk.
You should add the full.names option.
list.files(path="/Users/wendy/Box Sync/Data2", full.names = TRUE)
Without it, it only outputs the name of the files, and thus it works only if files with that exact file name are found in the current working directory.
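Put together, the loop from the question could look like the sketch below. read_folder() is a hypothetical helper name; it assumes the folder contains only plain files that readLines() can open.

```r
# Hypothetical helper: read every file in `dir` into a named list.
read_folder <- function(dir) {
  # full.names = TRUE returns paths that work from any working directory
  files <- list.files(path = dir, full.names = TRUE)
  data <- lapply(files, readLines)
  names(data) <- basename(files)
  data
}

# Usage with the folder from the question:
# data <- read_folder("/Users/wendy/Box Sync/Data2")
```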

read csv file from zipped temp file with multiple folders in R

I am trying to read a csv file that is contained in a zip file I downloaded from the web. The problem is that the zipped file has multiple nested folders. I have to do this for several different units, so I am using a loop. There is no problem with the loop: the file name is correct and the file is downloaded. However, I get an error message (I think it is because R cannot find the exact file I am asking for). The error is:
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot locate file 'XXXX.csv' in zip file 'c:\yyy\temp\bla\'
# (inside the loop over units[i] and places[j]; the loop header is not shown)
download.file(paste("http://web.com_", units[i], "_", places[j], ".zip", sep = ""),
              temp,
              cacheOK = FALSE)
data <- read.csv2(unz(temp,
                      paste("name_", units[i], "_", places[j], ".csv", sep = "")),
                  header = FALSE,
                  skip = 1)
unlink(temp)
fili <- rbind(X, data)
}
How do I make R find the file I want?
You have the right approach but (as the warning tells you) the wrong filename.
It's worth double checking that the zip file does exist before you start trying to read its contents.
if (file.exists(temp)) {
  read.csv2(unz(...))
} else {
  stop("ZIP file has not been downloaded to the place you expected.")
}
It's also a good idea to browse around inside the downloaded file (you may wish to unzip it first) to make sure that you are looking in the right place for the CSV contents.
It looks like the file you want to read is located in a directory inside the zip. In that case, the read should be changed as follows:
# "dirname" is a placeholder for the folder name inside the zip
data <- read.csv2(unz(temp,
                      paste("dirname/name_", units[i], "_", places[j], ".csv", sep = "")),
                  header = FALSE,
                  skip = 1)
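When the folder name inside the zip is not known in advance, the archive index can be searched for the CSV by its base name. This is a sketch only; find_in_zip() is a hypothetical helper, and it takes the first match if several folders contain a file with the same name.

```r
# Hypothetical helper: return the path, as stored in the archive, of the
# first member whose base name equals `member`.
find_in_zip <- function(zip_path, member) {
  members <- unzip(zip_path, list = TRUE)$Name
  hits <- members[basename(members) == member]
  if (length(hits) == 0) {
    stop("'", member, "' not found in ", zip_path)
  }
  hits[1]
}

# Usage inside the loop (temp, units, places as defined in the question):
# csv_name <- paste0("name_", units[i], "_", places[j], ".csv")
# data <- read.csv2(unz(temp, find_in_zip(temp, csv_name)),
#                   header = FALSE, skip = 1)
```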
