Looping over all files in a folder in R over multiple loops

I have written code that runs sampling functions on a csv file for a biome.
I have 30 csv files that I want to run these for loops over. I am struggling to apply the code I've written across all files in my folder. I'm sure this is an easy fix; I'm just having trouble with the loop inside of a loop.
temp <- read.csv("tropical_grassland_N_Am_point_summary_table_gdrive.csv")
temp <- temp[which(temp$nd > 0.1),]
index <- substr(temp[,5],30,(nchar(as.character(temp[,5]))-2))
unique_index <- unique(index)
unique_index <- sample(unique_index, 20, replace=F)
for (i in 1:length(unique_index)){
temp2 <- temp[which(index==unique_index[i]),]
theDates = strptime(temp2[,2], format="%Y-%m-%d")
}
thresh <- 0.95
for (i in 1:length(unique_index)){
temp2 <- temp[which(index==unique_index[i]),]
theDates = strptime(temp2[,2], format="%Y-%m-%d")
}
All of the csv files have _point_summary_table_gdrive.csv in common after the description of the continent and biome.

Are they all in the same folder, with no other files? If so, and if your code works, you don't need to know the file names. Just wrap your code in a function and then
filenames <- list.files(path = whatyouwant, pattern = "\\.csv$", all.files = TRUE, full.names = TRUE, recursive = TRUE)
files <- lapply(filenames,read.csv)
lapply(files,yourfunction)
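As a rough sketch of what "make a function of it" could look like: the helper name process_biome_file and the demo files below are made up for illustration, and the only thing assumed about the real files is the nd column and the shared _point_summary_table_gdrive.csv suffix from the question. The sampling loops would go where the placeholder return value is.

```r
# Hypothetical helper: wraps the per-file steps from the question.
process_biome_file <- function(path) {
  temp <- read.csv(path)
  temp <- temp[which(temp$nd > 0.1), ]
  # ... the two sampling for loops from the question would go here ...
  nrow(temp)  # placeholder result: number of rows that passed the filter
}

# Demo data in a temporary folder (assumption: every file has an 'nd' column)
folder <- tempfile("biome_demo_")
dir.create(folder)
write.csv(data.frame(nd = c(0.05, 0.2, 0.3)),
          file.path(folder, "tropical_grassland_N_Am_point_summary_table_gdrive.csv"),
          row.names = FALSE)
write.csv(data.frame(nd = c(0.5, 0.01)),
          file.path(folder, "desert_Africa_point_summary_table_gdrive.csv"),
          row.names = FALSE)

# Match on the suffix the files have in common, then apply the function to each
filenames <- list.files(folder, pattern = "_point_summary_table_gdrive\\.csv$",
                        full.names = TRUE)
results <- lapply(filenames, process_biome_file)
names(results) <- basename(filenames)
```

Naming the results by file makes it easy to tell which biome each element came from.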

Related

Apply a function to a list of csv files

I have 45 csv files in a folder called myFolder. Each csv file has 13 columns and 640 rows.
I want to read each csv, divide columns 7:12 by 10, and save the result in a new folder called 'newFolder'. Here's my approach, which
uses a simple for loop.
library(data.table)
dir.create('newFolder')
allFiles <- list.files(file.path('myFolder'), pattern = '.csv')
for(a in seq_along(allFiles)){
fileRef <- allFiles[a]
temp <- fread(file.path('myFolder', fileRef))
temp[, 7:12] <- temp[, 7:12]/10
fwrite(temp, file.path('myFolder', paste0('new_',fileRef)))
}
Is there a simpler solution, in a line or two, using data.table and an apply function to achieve this?
Your code is already pretty good but these improvements could be made:
define the input and output folders up front for modularity
use full.names = TRUE so that allFiles contains complete paths
use .csv$ as the pattern to anchor it to the end of the filename
iterate over the full names rather than an index
use basename in fwrite to extract out the base name from the path name
The code is then
library(data.table)
myFolder <- "myFolder"
newFolder <- "newFolder"
dir.create(newFolder)
allFiles <- list.files(myFolder, pattern = '.csv$', full.names = TRUE)
for(f in allFiles) {
temp <- fread(f)
temp[, 7:12] <- temp[, 7:12] / 10
fwrite(temp, file.path(newFolder, paste0('new_', basename(f))))
}
You can use purrr::walk if you want to improve readability of your code and get rid of the loop:
allFiles <- list.files(file.path('myFolder'), pattern = '.csv')
purrr::walk(allFiles, function(x){
temp <- fread(file.path('myFolder', x))
temp[, 7:12] <- temp[, 7:12]/10
fwrite(temp, file.path('myFolder', paste0('new_', x)))
})
From the reference page of purrr::walk:
walk() returns the input .x (invisibly)
I don't think it helps speed-wise, though.
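If you would rather not depend on purrr, the same idea can be sketched in base R with lapply wrapped in invisible. This is only an illustration: scale_file is a made-up helper, the folders are temporary stand-ins for myFolder and newFolder, and read.csv/write.csv are swapped in for fread/fwrite so the snippet runs without data.table.

```r
# Temporary stand-ins for 'myFolder' and 'newFolder'
myFolder <- tempfile("myFolder_")
newFolder <- tempfile("newFolder_")
dir.create(myFolder)
dir.create(newFolder)

# Demo input: one file with 13 columns (V1..V13) and 2 rows
write.csv(as.data.frame(matrix(1:26, nrow = 2)),
          file.path(myFolder, "a.csv"), row.names = FALSE)

# Hypothetical helper: scale columns 7:12 of one file and write the result
scale_file <- function(f) {
  temp <- read.csv(f)
  temp[, 7:12] <- temp[, 7:12] / 10
  write.csv(temp, file.path(newFolder, paste0("new_", basename(f))),
            row.names = FALSE)
}

allFiles <- list.files(myFolder, pattern = "\\.csv$", full.names = TRUE)
invisible(lapply(allFiles, scale_file))  # called for its side effect only
```

Like walk, this is about readability rather than speed; the loop still reads and writes one file at a time.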

How to 'read.csv' many files in a folder using R?

How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop but use assign to dynamically name the dataframes if you want
for(file in files){
assign(paste0(file, "_df"),read.csv(file))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in seq_along(files)){
file <- files[index]
# use [[ ]] so the whole data frame is stored as one list element
dfs[[index]] <- read.csv(file)
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
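To make that concrete, here is a small self-contained sketch. The two demo files and the x2 column are invented for illustration; only the list.files / lapply / sapply pattern comes from the answer above.

```r
# Demo folder standing in for "C:/A"
link <- tempfile("A_")
dir.create(link)
write.csv(data.frame(x = 1:3), file.path(link, "A1.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:5), file.path(link, "A2.csv"), row.names = FALSE)

# Read every csv into one named list of data frames
files <- list.files(link, full.names = TRUE, pattern = "\\.csv$")
list_of_frames <- lapply(files, read.csv)
names(list_of_frames) <- basename(files)

# One call summarises all frames
row_counts <- sapply(list_of_frames, nrow)

# One call transforms all frames (here: add a doubled column)
doubled <- lapply(list_of_frames, function(x) {
  x$x2 <- x$x * 2
  x
})
```

Individual frames are then available by name, e.g. list_of_frames[["A1.csv"]].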
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path, as it inserts the path separator for you. So you want to use:
read.csv(file.path(link, x), header=FALSE)
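A quick way to see the difference, using the folder and file name from the question (bad_path and good_path are just illustrative variable names):

```r
link <- "C:/A"
x <- "A1.csv"

bad_path <- paste0(link, x)      # no separator between folder and file name
good_path <- file.path(link, x)  # separator inserted between the parts

print(bad_path)   # "C:/AA1.csv"
print(good_path)  # "C:/A/A1.csv"
```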
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}

Combining .txt files in R using a loop

I'm currently trying to use R to combine dozens of .txt files into one single .txt file. Below is the code that I've been experimenting with so far. The files that I'm trying to combine have very similar names, for example: "e20171ny0001000.txt" and "e20171ct0001000.txt". As you can see, the only difference between the file names is the state abbreviation. This is why I've been trying to use a for loop to go through all the state abbreviations.
setwd("/Users/tim/Downloads/All_Geographies")
statelist = c('ak','al','ar','az','ca','co','ct','dc','de','fl','ga','hi','ia','id','il','in','ks','ky','la','ma','md','me','mi','mn','mo','ms','mt','nc','nd','ne','nh','nj','nm','nv','ny','oh','ok','or','pa','ri','sc','sd','tn','tx','ut','va','vt','wa','wi','wv','wy')
for (i in statelist){
file_names <- list.files(getwd())
file_names <- file_names[grepl(paste0("e20171", i, "0001000.txt"),file_names)]
files <- lapply(file_names, read.csv, header=F, stringsAsFactors = F)
files <- do.call(rbind,files)
}
write.table(files, file = "RandomFile.txt", sep="\t")
When I run the code, there isn't a specific error that pops up. Instead the entire code runs and nothing happens. I feel like my code is missing something that is preventing it from running correctly.
We need to create a list to store the results. In the OP's code, files is overwritten on every pass through the for loop, so only the last state's data survives. Instead, the output needs to be stored in a list. For this, we can create a list of NULL elements, 'out', and then assign the output to each element of 'out'
out <- vector('list', length(statelist))
for (i in seq_along(statelist)){
file_names <- list.files(getwd())
file_names <- file_names[grepl(paste0("e20171", statelist[i],
"0001000.txt"),file_names)]
files <- lapply(file_names, read.csv, header=FALSE, stringsAsFactors = FALSE)
out[[i]] <- do.call(rbind, files)
}
As out is a list of data.frame, we need to loop over the list and then write it back to file
newfilenames <- paste0(statelist, "_new", ".txt")
lapply(seq_along(out), function(i) write.table(out[[i]],
file = newfilenames[i], quote = FALSE, row.names = FALSE))
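If the goal is one single combined file, as in the question, a possible sketch is to match all state files with one pattern and bind them once. The demo files and their contents below are invented, and the pattern assumes every file name is e20171, a two-letter state code, then 0001000.txt.

```r
# Demo folder with two small state files (contents are made up)
folder <- tempfile("All_Geographies_")
dir.create(folder)
writeLines(c("1,ny", "2,ny"), file.path(folder, "e20171ny0001000.txt"))
writeLines("3,ct", file.path(folder, "e20171ct0001000.txt"))

# One pattern covers every two-letter state abbreviation
file_names <- list.files(folder, pattern = "^e20171[a-z]{2}0001000\\.txt$",
                         full.names = TRUE)
files <- lapply(file_names, read.csv, header = FALSE, stringsAsFactors = FALSE)

# Bind all states together and write a single tab-separated file
combined <- do.call(rbind, files)
write.table(combined, file.path(folder, "RandomFile.txt"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

This avoids the loop over statelist entirely, at the cost of also picking up any file that happens to match the pattern.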

Combine csv files with common file identifier

I have a list of approximately 500 csv files each with a filename that consists of a six-digit number followed by a year (ex. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in this question:
Import and rbind multiple csv files with common name in R but I want the appended data to be saved as new csv files in the same directory as the original files are currently saved. I have also tried to implement the below code however the csv files produced from this contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID,"+?)\\_.+?$", sep = "" )
NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""), row.names=FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for(curr_NAPS_ID in Unique_NAPS_ID){
NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
output_file <- paste0(
"C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
)
for (fname in NAPS_filenames) {
line_text <- readLines(fname)
# Write the header from the first file
if (fname == NAPS_filenames[1]) {
cat(line_text[1], '\n', sep = '', file = output_file)
}
# Append every line in the file except the header
line_text <- line_text[-1]
cat(line_text, file = output_file, sep = '\n', append = TRUE)
}
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
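If you do want to go through read.csv (say, to check that the columns line up), a possible variation is to group the file paths by ID with split and bind each group. This is only a sketch: the three demo files and their single column v are invented, and the file pattern assumes names like 123456_2015.csv.

```r
# Demo folder with three files for two IDs (contents are made up)
folder <- tempfile("PM25_test_")
dir.create(folder)
write.csv(data.frame(v = 1), file.path(folder, "123456_2015.csv"), row.names = FALSE)
write.csv(data.frame(v = 2), file.path(folder, "123456_2016.csv"), row.names = FALSE)
write.csv(data.frame(v = 3), file.path(folder, "654321_2015.csv"), row.names = FALSE)

# Only pick up the original files, not previously written MERGED_ files
filenames <- list.files(folder, pattern = "^[0-9]{6}_[0-9]{4}\\.csv$",
                        full.names = TRUE)

# split() returns a named list: one vector of paths per six-digit ID
groups <- split(filenames, substr(basename(filenames), 1, 6))

for (id in names(groups)) {
  merged <- do.call(rbind, lapply(groups[[id]], read.csv))
  write.csv(merged, file.path(folder, paste0("MERGED_", id, ".csv")),
            row.names = FALSE)
}
```

Note that rbind will stop with an error if the files for one ID do not share the same column names, which the readLines approach would silently ignore.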
You might consider something like this:
##will take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1,6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for(j in unique.numbers){
sub <- filenames[which(substr(filenames,1,6) == j)]
data.for.output <- c()
for(file in sub){
##now do your stuff with these files including read them in
data <- read.csv(paste0(path,file))
data.for.output <- rbind(data.for.output,data)
}
write.csv(data.for.output,paste0(path,j, '.csv'), row.names = F)
}

Looping through files in R and applying a function

I'm not a very experienced R user. I need to loop through a folder of csv files and apply a function to each one. Then I would like to take the value I get for each one and have R dump them into a new column called "stratindex", which will be in one new csv file.
Here's the function applied to a single file
ctd=read.csv(file.choose(), header=T)
stratindex=function(x){
x=ctd$Density..sigma.t..kg.m.3..
(x[30]-x[1])/29
}
Then I can spit out one value with
stratindex(Density..sigma.t..kg.m.3..)
I tried formatting another file loop someone made on this board. That link is here:
Looping through files in R
Here's my go at putting it together
out.file <- 'strat.csv'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE)
tmp.strat <- function(x)
x=tmp.file(Density..sigma.t..kg.m.3..)
(x[30]-x[1])/29
write(paste0(i, "," tmp.strat), out.file, append=TRUE)
}
What have I done wrong/what is a better approach?
It's easier if you read the file in the function
stratindex <- function(file){
ctd <- read.csv(file)
x <- ctd$Density..sigma.t..kg.m.3..
(x[30] - x[1]) / 29
}
Then apply the function to a vector of filenames
the.files <- list.files()
index <- sapply(the.files, stratindex)
output <- data.frame(File = the.files, StratIndex = index)
write.csv(output, "strat.csv", row.names = FALSE)
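Here is a runnable sketch of the same approach on made-up data, in case it helps to see it end to end. The two demo casts are invented (30 density values each, rising by 0.5 and by 1 per row); vapply is used instead of sapply so a file that returns something other than a single number fails loudly, and the pattern restricts the listing to csv files.

```r
stratindex <- function(file) {
  ctd <- read.csv(file)
  x <- ctd$Density..sigma.t..kg.m.3..
  (x[30] - x[1]) / 29
}

# Demo folder with two casts of 30 rows each (values are made up)
folder <- tempfile("ctd_")
dir.create(folder)
for (k in 1:2) {
  write.csv(data.frame(Density..sigma.t..kg.m.3.. = 1000 + (0:29) * 0.5 * k),
            file.path(folder, paste0("cast", k, ".csv")), row.names = FALSE)
}

# Apply the function to every csv and collect one value per file
the.files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
index <- vapply(the.files, stratindex, numeric(1))
output <- data.frame(File = basename(the.files), StratIndex = unname(index))
```

If the output file is written into the same folder as the inputs, it will be picked up by list.files on the next run, so writing it elsewhere or tightening the pattern is worth considering.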
