Looping through files using dynamic name variable in R

I have a large number of files to import which are all saved as zip files.
From reading other posts it seems I need to pass the zip file name and then the name of the file I want to open. Since I have a lot of them I thought I could loop through all the files and import them one by one.
Is there a way to pass the name dynamically or is there an easier way to do this?
Here is what I have so far:
Temp_Data <- NULL
Master_Data <- NULL
file.names <- c("f1.zip", "f2.zip", "f3.zip", "f4.zip", "f5.zip")
for (i in 1:length(file.names)) {
  zipFile <- file.names[i]
  dataFile <- sub(".zip", ".csv", zipFile)
  Temp_Data <- read.table(unz(zipFile, dataFile), sep = ",")
  Master_Data <- rbind(Master_Data, Temp_Data)
}
I get the following error:
In open.connection(file, "rt") :
I can import them manually using:
dt <- read.table(unz("D:/f1.zip", "f1.csv"), sep = ",")
I can create the string dynamically, but it feels long-winded, and it doesn't work when I wrap it with read.table(unz(...)). It seems it can't find the file name and so throws an error.
cat(paste(toString(shQuote(paste("D:/", zipFile, sep = ""))), ",",
          toString(shQuote(dataFile)), sep = ""), "\n")
But if I then print this to the console I get:
"D:/f1.zip","f1.csv"
I can then paste this into `read.table(unz(...))` and it works, so I feel like I am close.
I've tagged in data.table since this is what I almost always use so if it can be done with 'fread' that would be great.
Any help is appreciated

You can use the list.files() command here. First set your working directory to the folder where all your files are stored:
setwd("C:/Users/...")
Then collect the zip file names:
file.names <- list.files(pattern = "\\.zip$", recursive = FALSE)
Your for loop will then be:
library(readr)   # write_delim() comes from readr

Master_Data <- NULL   # as in the question
for (i in 1:length(file.names)) {
  # open the files
  zipFile <- file.names[i]
  dataFile <- sub("\\.zip$", ".csv", zipFile)
  Temp_Data <- read.table(unz(zipFile, dataFile), sep = ",")
  # your function for the opened file
  Master_Data <- rbind(Master_Data, Temp_Data)
  # finally write the file (note: this writes a tab-delimited text file under the .zip name)
  write_delim(x = Master_Data, path = file.names[[i]], delim = "\t",
              col_names = TRUE)
}
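Since the question also tags data.table, here is a minimal sketch of the same loop built around fread() and rbindlist(). It assumes the zips sit in "D:/" (the path used in the question's manual example), that each archive contains a single CSV with the same base name, and that a command-line unzip is available, because fread() cannot read from an unz() connection and has to stream the file through its cmd argument instead:
library(data.table)

zip.files <- list.files("D:/", pattern = "\\.zip$", full.names = TRUE)

Master_Data <- rbindlist(lapply(zip.files, function(zipFile) {
  # the CSV inside each archive shares the zip's base name
  dataFile <- sub("\\.zip$", ".csv", basename(zipFile))
  # stream the file out of the archive with an external unzip and read it with fread()
  fread(cmd = paste("unzip -p", shQuote(zipFile), shQuote(dataFile)))
}))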

Related

R: Importing Entire Folder of Files

I am using the R programming language (in RStudio). I am trying to import an entire folder of ".txt" files (Notepad files) into R and "consistently" name them.
I know how to do this process manually:
#find working directory:
getwd()
[1] "C:/Users/Documents"
#import files manually and name them "consistently":
df_1 <- read.table("3rd_file.txt")
df_2 <- read.table("file_1.txt")
df_3 <- read.table("second_file.txt")
Of course, this will take a long time to do if there are 100 files.
Right now, suppose these files are in a folder : "C:/Users/Documents/files_i_want"
Is there a way to import all these files at once and name them as "df_1", "df_2", "df_3", etc.?
I found another Stack Overflow post that talks about a similar problem: How to import folder which contains csv file in R Studio?
setwd("where is your folder")
#
#List file subdirectories
folders<- list.files(path = "C:/Users/Documents/files_i_want")
#
#Get all files...
files <- rep(NA,0)
for (i in c(1:length(folders))) {
  files.i <- list.files(path = noquote(paste("C:/Users/Documents/files_i_want/", folders[i], "/", sep = "")))
  n <- length(files.i)
  files.i <- paste(folders[i], files.i, sep = "/")
  files <- c(files, files.i)
}
#
#
#Read first data file (& add file name as separate column)
T1 <- read.delim(paste("C:/Users/Documents/files_i_want", files[1], sep = ""), sep = "", header=TRUE)
T1 <- cbind(T1, "FileName" = files[1])
But this produces the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
Is this because there is a problem in the naming convention?
Thanks
You can try the following:
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, read.table)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
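If you'd rather not fill the global environment, note that list2env() is optional: the named list can be used directly. A small usage sketch, assuming the reads above succeeded and at least two files were found:
# after list2env(), each data frame is an ordinary object in the workspace
head(df_1)

# or skip list2env() and index the named list instead
head(list_data[["df_2"]])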

How to save multiple files in loop with original filenames

I'm trying to import several SAS data files from a folder and then save them back into the folder as R data frames with the same original SAS dataset names. Everything works except that I can't figure out how to save each file under its original file name (i.e., I can't figure out what to put for the xxx in save(xxx, file = ...)).
The code I've tried is as follows:
path <- "path to folder with sas files"
list.files(pattern=".sas7bdat$")
list.filenames<-list.files(pattern=".sas7bdat$")
for (i in 1:length(list.filenames)) {
  assign(list.filenames[i], read_sas(list.filenames[i]))
  filename <- paste(list.filenames[i])
  save(list.filenames[i], file = paste0(path, paste(list.filenames[i], "Rdat", sep = ".")))
}
doesn't work...
for (i in 1:length(list.filenames)) {
  assign(list.filenames[i], read_sas(list.filenames[i]))
  filename <- paste(list.filenames[i])
  save(list.filenames[[i]], file = paste0(path, paste(list.filenames[i], "Rdat", sep = ".")))
}
doesn't work
for (i in 1:length(list.filenames)) {
  assign(list.filenames[i], read_sas(list.filenames[i]))
  filename <- paste(list.filenames[i])
  save(filename, file = paste0(path, paste(list.filenames[i], "Rdat", sep = ".")))
}
Any help on figuring out how to save the files with the original names from list.filenames[i]?
Use the "list" argument of save. Something like
path <- "path to folder with sas files"
list.filenames <- list.files(path, pattern="\\.sas7bdat$")
for (i in list.filenames) {
datName <- tools::file_path_sans_ext(i)
assign(datName, read_sas(i))
save(list=datName, file = paste0(path, paste(datName, "Rdat", sep = ".")))
}
would work. Also, I imagine you want pattern = ".sas7bdat$" written as pattern = "\\.sas7bdat$", since "." is a wildcard in regex.
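To see why the list argument matters: save(filename, ...) would store an object literally named filename (holding only a character string), while save(list = datName, ...) stores the dataset under its own name, so load() restores it as expected. A quick check, using a hypothetical file mydata.sas7bdat:
# hypothetical example: suppose one of the files was mydata.sas7bdat
load(file.path(path, "mydata.Rdat"))
ls()           # "mydata" is back in the workspace under its original name
head(mydata)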

Apply a script by looping through multiple files in a directory and copy the changes back to every corresponding file

I want to apply the below script to every file in the Weather directory and copy the changes back to the same csv file (Bladen.csv in this case).
Bladen <- read.csv("C:/Users//Desktop/Weather/Bladen.csv",header=T, na.strings=c("","NA"))
Bladen <- Bladen[,c(1,6,11,17,18,19)]
I would try something like this:
setwd('/address/to/the/path')
files <- dir()
for (i in files) {
  Bladen <- read.csv(i, header = TRUE, na.strings = c("", "NA"))
  Bladen <- Bladen[, c(1, 6, 11, 17, 18, 19)]
  write.csv(Bladen, i, row.names = FALSE)  # row.names = FALSE avoids adding an extra index column
}
Please tell me if it works for you.
If you are looking to update each file in your directory by keeping the same subset of columns in each file and writing the file back to the same directory, you can use lapply():
setwd(set_your_path)
filenames <- list.files()
lapply(filenames, function(i) {
  Bladen <- read.csv(i, sep = ",", header = TRUE,
                     na.strings = c("NA", "N/A", "null", "", " "))
  Bladen <- Bladen[, c(1, 6, 11, 17, 18, 19)]
  write.csv(Bladen, i, row.names = FALSE)  # write.csv() ignores 'sep', so it is dropped here
})
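If you'd rather not change the working directory, a minimal variant of the same idea passes full paths around instead (the folder path below is an assumed placeholder):
weather_dir <- "C:/Users/.../Desktop/Weather"   # adjust to your own folder
csv_paths <- list.files(weather_dir, pattern = "\\.csv$", full.names = TRUE)

invisible(lapply(csv_paths, function(p) {
  dat <- read.csv(p, header = TRUE, na.strings = c("", "NA"))
  dat <- dat[, c(1, 6, 11, 17, 18, 19)]   # keep the same six columns as in the question
  write.csv(dat, p, row.names = FALSE)    # overwrite the original file in place
}))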

Combine csv files with common file identifier

I have a list of approximately 500 csv files, each with a filename that consists of a six-digit number followed by a year (ex. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in this question:
Import and rbind multiple csv files with common name in R, but I want the appended data to be saved as new csv files in the same directory as the original files. I have also tried to implement the code below, but the csv files it produces contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for (j in 1:n) {
  curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
  NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID, "+?)\\_.+?$", sep = "")
  NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
  write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),
            file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""),
            row.names = FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for (curr_NAPS_ID in Unique_NAPS_ID) {
  NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
  output_file <- paste0(
    "C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
  )
  for (fname in NAPS_filenames) {
    line_text <- readLines(fname)
    # Write the header from the first file
    if (fname == NAPS_filenames[1]) {
      cat(line_text[1], '\n', sep = '', file = output_file)
    }
    # Append every line in the file except the header
    line_text <- line_text[-1]
    cat(line_text, file = output_file, sep = '\n', append = TRUE)
  }
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
You might consider something like this:
## take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1, 6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)

for (j in unique.numbers) {
  sub <- filenames[which(substr(filenames, 1, 6) == j)]
  data.for.output <- c()
  for (file in sub) {
    ## now do your stuff with these files, including reading them in
    data <- read.csv(paste0(path, file))
    data.for.output <- rbind(data.for.output, data)
  }
  write.csv(data.for.output, paste0(path, j, '.csv'), row.names = FALSE)
}
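For what it's worth, a more compact sketch of the same rbind approach, assuming filenames holds full paths (as in the first answer) and that the merged files should land in the question's PM25_test folder: split() groups the paths by their six-digit prefix, and each group is read, bound, and written once.
# assuming `filenames` comes from list.files(..., full.names = TRUE)
groups <- split(filenames, substr(basename(filenames), 1, 6))

for (id in names(groups)) {
  merged <- do.call(rbind, lapply(groups[[id]], read.csv))
  out <- file.path("C:/Users/smithma/Desktop/PM25_test", paste0("MERGED_", id, ".csv"))
  write.csv(merged, out, row.names = FALSE)
}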

R, Rscript, Works when variables hard coded, but not when passed as argument

I built the following R script to take a .csv generated by an automated report and split it into several .csv files.
This code works perfectly, and outputs a .csv file for each unique value of "facility" in "todays_data.csv":
disps <- read.csv("/Users/me/Downloads/todays_data.csv", header = TRUE, sep=",")
for (facility in levels(disps$Facility)) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area), ]
  fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
  write.csv(temp, fn, row.names = FALSE)
}
But this does not output anything:
args <- commandArgs(trailingOnly = TRUE)
file <- args[1]
disps <- read.csv(file, header = TRUE, sep=",")
for (facility in levels(disps$Facility)) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area), ]
  fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
  write.csv(temp, fn, row.names = FALSE)
}
The only difference between the two files is that the first hardcodes the path to the .csv file to be split, while the second one has it passed as an argument in the command line using Rscript.
The read.csv() command works with the passed file path, because I can successfully run commands like head(disps) while running the script via Rscript.
Nothing within the for-loop will execute when run via Rscript, but things before and after it will.
Does anyone have any clues as to what I've missed? Thank you.
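The question is left open here, but one plausible cause, offered only as an assumption: when the script runs under Rscript, read.csv() may return Facility as a character column rather than a factor (for example under R 4.0+, where stringsAsFactors defaults to FALSE), and levels() of a character vector is NULL, so the loop body never runs. A sketch that avoids relying on factor levels:
args <- commandArgs(trailingOnly = TRUE)
disps <- read.csv(args[1], header = TRUE, sep = ",")

# unique() works whether Facility is a factor or a plain character column
for (facility in unique(as.character(disps$Facility))) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area), ]
  fn <- paste0("/Users/me/Documents/information/", facility, "_todays_data.csv")
  write.csv(temp, fn, row.names = FALSE)
}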
