R: Reading a csv from within 2 zip folders - r

I am working under some unfortunate circumstances, and need to read in a csv file from within 2 zip folders. What I mean by this is that the file path looks something like this:
//path/folder1.zip/folder2.zip/wanttoread.csv
I tried mimicking the slick work of this problem found here: Extract certain files from .zip , but have had no luck so far. Specifically, when I ran something similar on my end, I got an error message reading
Error in fread(x, sep = ",", header = TRUE, stringsAsFactors = FALSE) :
embedded nul in string:
followed by a bunch of encoded nonsense.
Any ideas on how to handle this problem? Thanks in advance!

Here's an approach using tempdir():
temp<-tempdir(check = TRUE) #Create temporary directory to extract into
unzip("folder1.zip",exdir = temp) #Unzip outer archive to temp directory
unzip(file.path(temp,"folder2.zip"), #Use file.path to generate the path to the inner archive
exdir = file.path(temp,"temp2")) #Extract to a subfolder inside temp
#This covers the case when the outer archive might also have a file named wanttoread.csv
list.files(file.path(temp,"temp2")) #We can see the .csv file is now there
#[1] "wanttoread.csv"
read.csv(file.path(temp,"temp2","wanttoread.csv")) #Read it in
# Var1 Var2
#1 Hello obewanjacobi

Related

How to combine path and variable in readr read_csv (list.files for loop)?

I need to mass-import some data for my R project. Following some guide, I wrote a simple for loop which goes like that:
for (for_variable in list.files(path = "./data", pattern = ".csv$")) {
temp <- read_csv(for_variable)
# Some data wranglig
database <- rbind(database, temp)
rm(temp)
}
The problem is that my data is in the data folder in my working directory, as I've specified in list.files(path = "./data"). The problem is that I can't use read_csv(for_variable) because I get an error:
'file_name.csv' does not exist in current working directory
And if I try to specify the path in read_csv, it doesn't understand what 'for_variable' is, it tries to find literal 'for_variable' file in the data folder. So how can I combine path and variable name in read_csv? Or is there any other way of solving the problem?
I would recommend reading this post as it is helpful for importing multiple csv files.
But to help with your specific question, your error is likely caused becauseo you need to pass the full path name for the files you want to import and that can be specified by using the full.names = TRUE argument in list.files(). Passing just the file name contained in for_variable to read_csv won't work.
list.files(path = "./data", full.names = TRUE, pattern = ".csv$")

R renaming file extension

I have tried looking at File extension renaming in R and using the script without any luck. My question is very much the same.
I have a bunch of files with the a file extension that I want to change. I have used the following code but cannot get the last step to work.
I know similar questions have been asked before but I'm simply stuck and therefore reaching out anyway.
startingDir<-"/Users/anders/Documents/Juni 2019/DATA"
endDir<-"/Users/anders/Documents/Juni 2019/DATA/formatted"
#List over files in startingDir with the extension .zipwblibcurl that I want to replace
old_files<-list.files(startingDir,pattern = "\\.zipwblibcurl")
#View(old_files)
#Renaming the file extension and making a new list i R changing the file extension from .zipwblibcurl to .zip
new_files <- gsub(".zipwblibcurl", ".zip", old_files)
#View(new_files)
#Replacing the old files in the startingDir. Eventually I would like to move them to newDir. For simplicity I have just tried as in the other post without any luck:...
file.rename( old_files, new_files)
After running file.rename I get the output FALSE for every entry.
The full answer here, including comment from #StephaneLaurent: make sure that you have full.names = TRUE inside the list.files(); otherwise the path to the file will not be captured, just the file name.
Full working snippet:
old = list.files(startingDir,
pattern = "\\.zipwblibcurl",
full.names = TRUE) #
# replace the file names
new <- gsub(".zipwblibcurl", ".zip", old )
# Rename old files names to the new file names
file.rename(old, new)
Like #StéphaneLaurent said, it's most likely that R tries to look in the current working directory for the files and can't find them. You can correct this by adding
file.rename(paste(startingDir, old_files, sep = "/"), paste(newDir, new_files, sep = "/"))

Unzip failing due to long name in zipped folder

I want to be able to read and edit spatial SQlite tables that are downloaded from a server. These come compressed.
These zip files have a folder in them that contains information about the model that has been run as the name of the folder, and as such these can sometimes be quite long.
When this folder name gets too long, unziping the folder fails. I ultimately dont need to unzip the file. But i seem to get the same error when I use unz within readOGR.
I cant think of how to recreate a replicate able example but I can give an example of a path that works and one that doesnt.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the offending subfolder
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code would work in a similar fashion to this.
list_zips <- list.files(pattern = "*.zip", recursive = TRUE, include.dirs = TRUE)
unzip(zipfile = paste(getwd(),"/",list_zips[i],sep = ""),
exdir=substr(paste(getwd(),"/",list_zips[i],sep = ""),1,nchar(paste(getwd(),"/",list_zips[i],sep = ""))-4))
But I would prefer to directly be able to load the spatial file in without unzipping. Such as:
sq_path <- unzip(list_zips[i], list=TRUE)[2,1]
temp <- unz(paste(getwd(),"/",list_zips[i],sep = ""),sq_path)
vectorImport <- readOGR(dsn=temp, layer="micro_climate_grid")
Any help would be appreciated! Tim

Data table fread with zip file in other directory with spaces in the name

I am trying to read a csv in a zip file by using the command fread("unzip -cq file.zip") which works perfectly when the file is in my working directory.
But when I try the command by specifying the path of the file without changing the directory say fread("unzip -cq C:/Users/My user/file.zip") I get an error saying the following unzip: cannot find either C:/Users/My or C:/Users/My.zip
The reason why this happens is that there are spaces in my path but what would be the workaround?
The only option that I have thought is to just change to the directory where each file is located and read it from there but this is not ideal.
I use shQuote for this, like...
fread_zip = function(fp, silent=FALSE){
qfp = shQuote(fp)
patt = "unzip -cq %s"
thecall = sprintf(patt, qfp)
if (!silent) cat("The call:", thecall, sep="\n")
fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp, unzip_dir = file.path(dirname(fp), sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))), silent = FALSE, do_cleanup = TRUE){
# only tested on windows
# fp should be the path to mycsvs.zip
# unzip_dir should be used only for CSVs from inside the zip
dir.create(unzip_dir, showWarnings = FALSE)
# unzip
unzip(fp, overwrite = TRUE, exdir = unzip_dir)
# list files, read separately
# not looking recursively, since csvs should be only one level deep
fns = list.files(unzip_dir)
if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
Try to assign the location to a variable and use paste to call the zip file like below:
myVar<-"C:/Users/Myuser/"
fread(paste0("unzip -cq ",myVar,"file.zip"))

Error comes while importing files by data.table

I'm new to R studio and was not well aware of this portal T&C, so was blocked for questing for 5 days.
I have a code for importing multiple files from any directory to R.
Using this code for doing so, but the problem is this code runs sometime and sometime it gets failed with mentioned error.
I tried to found the solution of this but yet not found any solution.
library(data.table)
t = setwd("/home/dp/vishan/olp_data/19164/1/")
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files = rownames(files)[files$size > 0]
temp <- lapply(files, fread, sep=",")
Error:
Error in FUN(X[[i]], ...) :
'input' can not be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.
Thanks in advance!
try using
files <- file.info(list.files(path = t,pattern = "", full.names=TRUE))
files <- subset(files, !isdir & size > 0)
temp <- lapply(rownames(files), fread, sep=',')
since list.files also shows directories. The data.frame you create in files can be easily subset on the isdir column which indicates if this is a directory or a file.

Resources