read.csv.folder - quickly pulling pieces of data from one folder - r

I am currently trying to merge data files using map_df. I have downloaded my dataset [https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data] and placed it in my working directory. It is a folder containing many separate smaller files. I am hoping to import the dataset quickly using map_df instead of having to name every single file in code. However, when I try to pull the data from that folder:
namedata.df <- read.csv.folder(Namedata, x = TRUE, y = TRUE, header = TRUE, dec = ".", sep = ";", pattern = "csv", addSpec = NULL, back = TRUE)
I get a return of: Error in substr(folder, start = nchar(folder), stop = nchar(folder)) :
object 'Namedata' not found
Why might it be missing the folder? Is there a better way to pull in a folder of data?

Try ProjectTemplate. When you run the load.project() command, it loads all CSV and XLS files as data frames. The data frame names are the same as the file names.
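For the original goal (reading every file in a folder and row-binding the results, as map_df would), a minimal sketch. The quoted folder name "Namedata" is an assumption; note that it must be a quoted string, which is what the `object 'Namedata' not found` error was about:

```r
# Full paths of every CSV in the folder; passing a bare, unquoted Namedata
# is what triggers "object 'Namedata' not found"
files <- list.files("Namedata", pattern = "\\.csv$", full.names = TRUE)

# Base-R stand-in for purrr::map_df(files, readr::read_csv):
# read each file, then row-bind everything into one data frame
namedata.df <- do.call(rbind, lapply(files, read.csv))

# With purrr/readr installed, the equivalent one-liner would be:
# namedata.df <- purrr::map_df(files, readr::read_csv)
```

This avoids naming each file by hand: list.files() discovers them, and the bind step stacks them.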

Related

Excel Exporting Multiple Data Sets to a Single Spreadsheet File

I am trying to output multiple small data frames to an Excel file. The data frames are residuals and predicted from mgcv models run from a loop. Each is a separate small data set that I am trying to output to a separate worksheet in the same Excel spreadsheet file.
As far as I can tell, the line causing the error is:
write.xlsx(resid_pred, parfilename, sheetName = parsheetname, append = TRUE)
where resid_pred is the residuals/predicted data frame, parfilename is the file name and path, and parsheetname is the sheet name.
The error message is
Error in saveWorkbook(wb, file = file, overwrite = overwrite) : File already exists!
Which makes no sense since the file would HAVE to exist if I am appending to it. Does anyone have a clue?
Amazingly the following code will work:
write.xlsx2(resid_pred, file = parfilename, sheetName = parsheetname,
            col.names = TRUE, row.names = FALSE, append = TRUE)
The only difference is that it uses write.xlsx2 instead of write.xlsx.
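For the broader goal (many small data frames, one worksheet each), the openxlsx package can also write them in a single call from a named list, which sidesteps the append/overwrite juggling entirely. A sketch, assuming openxlsx is installed; the data frames and file name are placeholders:

```r
library(openxlsx)

# A named list: one entry becomes one worksheet, named after the element
sheets <- list(model1 = data.frame(resid = rnorm(5), pred = rnorm(5)),
               model2 = data.frame(resid = rnorm(5), pred = rnorm(5)))

# Writes every list element to its own sheet in a single .xlsx file
write.xlsx(sheets, file = "residuals.xlsx", overwrite = TRUE)
```

Inside a model loop, you would accumulate the per-model data frames into the list first, then write once at the end.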

How to combine path and variable in readr read_csv (list.files for loop)?

I need to mass-import some data for my R project. Following a guide, I wrote a simple for loop which goes like this:
for (for_variable in list.files(path = "./data", pattern = ".csv$")) {
  temp <- read_csv(for_variable)
  # some data wrangling
  database <- rbind(database, temp)
  rm(temp)
}
The problem is that my data sits in the data folder inside my working directory, as specified in list.files(path = "./data"), so I can't use read_csv(for_variable) directly; I get an error:
'file_name.csv' does not exist in current working directory
And if I try to hard-code the path in read_csv, it doesn't understand what for_variable is; it looks for a literal 'for_variable' file in the data folder. So how can I combine the path and the variable name in read_csv? Or is there another way of solving the problem?
I would recommend reading this post as it is helpful for importing multiple csv files.
But to answer your specific question: the error is likely caused because you need to pass the full path of each file you want to import, which you can get with the full.names = TRUE argument to list.files(). Passing just the bare file name contained in for_variable to read_csv won't work.
list.files(path = "./data", full.names = TRUE, pattern = ".csv$")
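Putting that together with the original loop, a sketch using base read.csv so it has no package dependencies (the ./data folder is as in the question; with readr installed, read_csv works the same way here):

```r
# full.names = TRUE makes list.files() return "./data/file.csv"-style paths,
# so the reader no longer looks only in the working directory
files <- list.files(path = "./data", pattern = "\\.csv$", full.names = TRUE)

database <- NULL
for (f in files) {
  temp <- read.csv(f)
  # ... some data wrangling ...
  database <- rbind(database, temp)
}

# Alternatively, keep bare names and join path and name explicitly:
# temp <- read.csv(file.path("./data", for_variable))
```

Either approach works; full.names = TRUE is usually the least error-prone, since the path travels with the file name.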

R: Reading a csv from within 2 zip folders

I am working under some unfortunate circumstances, and need to read in a csv file from within 2 zip folders. What I mean by this is that the file path looks something like this:
//path/folder1.zip/folder2.zip/wanttoread.csv
I tried mimicking the approach from this question: Extract certain files from .zip, but have had no luck so far. Specifically, when I ran something similar on my end, I got an error message reading:
Error in fread(x, sep = ",", header = TRUE, stringsAsFactors = FALSE) :
embedded nul in string:
followed by a bunch of encoded nonsense.
Any ideas on how to handle this problem? Thanks in advance!
Here's an approach using tempdir():
temp <- tempdir(check = TRUE)            # create a temporary directory to extract into
unzip("folder1.zip", exdir = temp)       # unzip the outer archive into it
unzip(file.path(temp, "folder2.zip"),    # file.path() builds the path to the inner archive
      exdir = file.path(temp, "temp2"))  # extract to a subfolder, in case the outer
                                         # archive also contains a wanttoread.csv
list.files(file.path(temp, "temp2"))     # the .csv file is now there
#[1] "wanttoread.csv"
read.csv(file.path(temp, "temp2", "wanttoread.csv"))  # read it in
#   Var1         Var2
#1 Hello obewanjacobi

Unzip failing due to long name in zipped folder

I want to be able to read and edit spatial SQlite tables that are downloaded from a server. These come compressed.
These zip files have a folder in them that contains information about the model that has been run as the name of the folder, and as such these can sometimes be quite long.
When this folder name gets too long, unzipping fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of how to build a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the offending subfolder
Fails:
"S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite"
4699-12129 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code would work in a similar fashion to this.
list_zips <- list.files(pattern = "*.zip", recursive = TRUE, include.dirs = TRUE)
zip_path <- paste0(getwd(), "/", list_zips[i])
unzip(zipfile = zip_path,
      exdir = substr(zip_path, 1, nchar(zip_path) - 4))  # strip the ".zip" extension
But I would prefer to directly be able to load the spatial file in without unzipping. Such as:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(paste0(getwd(), "/", list_zips[i]), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
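One way to dodge the long-path failure without changing the archive: base R's unzip() has a junkpaths argument that discards the directory names inside the zip on extraction, and a files argument that limits extraction to the one file needed. A sketch under the question's setup (list_zips, the .sqlite file, and the layer name come from the question; the grep pattern is an assumption):

```r
# List the paths inside the archive, then pick out the sqlite file
inner <- unzip(list_zips[i], list = TRUE)$Name
sq_path <- grep("\\.sqlite$", inner, value = TRUE)[1]

# Extract only that file, flattening the over-long folder prefix
unzip(list_zips[i], files = sq_path,
      junkpaths = TRUE,    # drop the long subfolder name entirely
      exdir = tempdir())   # land it somewhere with a short path

# Read the spatial table from the short extracted path
vectorImport <- readOGR(dsn = file.path(tempdir(), basename(sq_path)),
                        layer = "micro_climate_grid")
```

This still extracts to disk, but only one file, and the offending subfolder never appears in the output path.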

Does readr read_csv allow one to specify a particular file in a zip

The readr package in the tidyverse has the option to automatically unpack a zip file and convert it to a tibble. But I have a zip file that holds multiple csv files. In the line of code below, SSPdataZip has three files in it. When I run it I get a warning "Multiple files in zip ..." and the name of the one it chooses. I know the name of the one I want but can't figure out how to tell read_csv what it is. Is there an option I'm missing?
temp <- readr::read_csv(SSPdataZip, col_names = TRUE, guess_max = 2000)
I believe you can use unz to achieve this:
readr::read_csv(unz(description = SSPdataZip, filename = "FileName.csv"), col_names = TRUE, guess_max = 2000)
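If the exact file name inside the archive isn't known in advance, unzip(..., list = TRUE) returns the archive's contents without extracting anything, so the right entry can be picked first. A sketch; SSPdataZip is the path variable from the question and the grep pattern is an assumption:

```r
# Data frame of the archive's contents: columns Name, Length, Date
contents <- unzip(SSPdataZip, list = TRUE)

# Choose the entry you want, e.g. by matching part of its name
target <- grep("SSP", contents$Name, value = TRUE)[1]

# unz() opens a connection to just that file inside the zip
temp <- readr::read_csv(unz(SSPdataZip, target),
                        col_names = TRUE, guess_max = 2000)
```

This keeps the "Multiple files in zip" warning from silently picking the wrong file.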
