Looping through files in R and applying a function - r

I'm not a very experienced R user. I need to loop through a folder of csv files and apply a function to each one. Then I would like to take the value I get for each one and have R dump them into a new column called "stratindex", which will be in one new csv file.
Here's the function applied to a single file
ctd=read.csv(file.choose(), header=T)
stratindex=function(x){
x=ctd$Density..sigma.t..kg.m.3..
(x[30]-x[1])/29
}
Then I can spit out one value with
stratindex(Density..sigma.t..kg.m.3..)
I tried formatting another file loop someone made on this board. That link is here:
Looping through files in R
Here's my go at putting it together
out.file <- 'strat.csv'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE)
tmp.strat <- function(x)
x=tmp.file(Density..sigma.t..kg.m.3..)
(x[30]-x[1])/29
write(paste0(i, "," tmp.strat), out.file, append=TRUE)
}
What have I done wrong/what is a better approach?

It's easier if you read the file in the function
stratindex <- function(file){
ctd <- read.csv(file)
x <- ctd$Density..sigma.t..kg.m.3..
(x[30] - x[1]) / 29
}
Then apply the function to a vector of filenames
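the.files <- list.files(pattern = "\\.csv$")
index <- sapply(the.files, stratindex)
output <- data.frame(File = the.files, StratIndex = index)
# give write.csv a file name; write.csv(output) alone only prints to the console
write.csv(output, "strat.csv", row.names = FALSE)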
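The approach above can be sketched end to end with a guard for files that have fewer than 30 rows. The sample data, the temporary folder, and the helper name strat_index_safe are assumptions made for illustration; only the column name and the formula come from the question.

```r
# Build two small example files in a temporary folder (illustrative data only).
demo_dir <- file.path(tempdir(), "ctd_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:2) {
  demo <- data.frame(Density..sigma.t..kg.m.3.. = 20 + k * (0:29))
  write.csv(demo, file.path(demo_dir, paste0("cast", k, ".csv")), row.names = FALSE)
}

# Hypothetical helper: same formula as stratindex(), but returns NA
# when the density column is missing or has fewer than 30 values.
strat_index_safe <- function(file) {
  ctd <- read.csv(file)
  x <- ctd$Density..sigma.t..kg.m.3..
  if (length(x) < 30) return(NA_real_)
  (x[30] - x[1]) / 29
}

the_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
# vapply() checks that every file yields exactly one number.
index <- vapply(the_files, strat_index_safe, numeric(1))
output <- data.frame(File = basename(the_files), stratindex = unname(index))

# Write the result outside the input folder so a rerun does not pick it up.
write.csv(output, file.path(tempdir(), "strat.csv"), row.names = FALSE)
print(output)
```

Using vapply instead of sapply is a design choice here: it fails loudly if a file ever returns something other than a single number.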

Related

Apply a function to a list of csv files

I have 45 csv files in a folder called myFolder. Each csv file has 13 columns and 640 rows.
I want to read each csv, divide columns 7:12 by 10, and save the result in a new folder called 'newFolder'. Here's my approach, which uses a simple for loop.
library(data.table)
dir.create('newFolder')
allFiles <- list.files(file.path('myFolder'), pattern = '.csv')
for(a in seq_along(allFiles)){
fileRef <- allFiles[a]
temp <- fread(file.path('myFolder', fileRef))
temp[, 7:12] <- temp[, 7:12]/10
fwrite(temp, file.path('myFolder', paste0('new_',fileRef)))
}
Is there a simpler solution, in a line or two, using data.table and an apply function to achieve this?
Your code is already pretty good but these improvements could be made:
define the input and output folders up front for modularity
use full.names = TRUE so that allFiles contains complete paths
use "\\.csv$" as the pattern so that the dot is matched literally and the match is anchored to the end of the filename
iterate over the full names rather than an index
use basename in fwrite to extract out the base name from the path name
The code is then
library(data.table)
myFolder <- "myFolder"
newFolder <- "newFolder"
dir.create(newFolder)
allFiles <- list.files(myFolder, pattern = '\\.csv$', full.names = TRUE)
for(f in allFiles) {
temp <- fread(f)
temp[, 7:12] <- temp[, 7:12] / 10
fwrite(temp, file.path(newFolder, paste0('new_', basename(f))))
}
You can use purrr::walk if you want to improve readability of your code and get rid of the loop:
library(data.table)
allFiles <- list.files(file.path('myFolder'), pattern = '.csv')
purrr::walk(allFiles, function(x){
temp <- fread(file.path('myFolder', x))
temp[, 7:12] <- temp[, 7:12]/10
# x is the current file name; the output goes to the folder created with dir.create('newFolder')
fwrite(temp, file.path('newFolder', paste0('new_', x)))
})
From the reference page of purrr::walk:
walk() returns the input .x (invisibly)
I don't think it helps speed-wise, though.
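For comparison, here is a minimal sketch of the same task that updates the columns by reference with data.table's := operator and applies a function over the file paths with lapply. The temporary folders, the sample file, and the helper name scale_file are assumptions for illustration, not part of the original answers.

```r
library(data.table)

# Illustrative input and output folders under tempdir().
in_dir  <- file.path(tempdir(), "myFolder_demo")
out_dir <- file.path(tempdir(), "newFolder_demo")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# One sample file with 13 numeric columns (V1 to V13), every value 100.
demo <- as.data.table(matrix(100, nrow = 3, ncol = 13))
fwrite(demo, file.path(in_dir, "a.csv"))

# Hypothetical helper: read one file, divide the chosen columns by 10,
# and write the result to out_dir with a "new_" prefix.
scale_file <- function(f, cols = 7:12) {
  temp <- fread(f)
  nm <- names(temp)[cols]
  temp[, (nm) := lapply(.SD, function(v) v / 10), .SDcols = nm]
  fwrite(temp, file.path(out_dir, paste0("new_", basename(f))))
  invisible(f)
}

all_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
invisible(lapply(all_files, scale_file))

res <- fread(file.path(out_dir, "new_a.csv"))
print(res)
```

As with purrr::walk, this is mainly a readability choice; the work per file is the same as in the for loop.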

How to 'read.csv' many files in a folder using R?

How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(link), value = TRUE)
and then run the same loop but use assign to dynamically name the data frames if you want
for(file in files){
# list.files(link) returns bare file names, so prepend the folder when reading
assign(paste0(file, "_df"), read.csv(file.path(link, file)))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in seq_along(files)){
file <- files[index]
# use [[ ]] so that the whole data frame is stored as a single list element
dfs[[index]] <- read.csv(file.path(link, file))
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
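A minimal sketch of the list-based workflow described above, using two small files written to a temporary folder. The folder, the sample data, and the added source column are assumptions for illustration.

```r
# Create two illustrative csv files with the same columns.
demo_dir <- file.path(tempdir(), "A_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo_dir, "A1.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, value = c(1, 2, 3)),
          file.path(demo_dir, "A2.csv"), row.names = FALSE)

# Read every file into one named list of data frames.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
list_of_frames <- lapply(files, read.csv)
names(list_of_frames) <- basename(files)

# The same operation applied to every frame at once.
row_counts <- sapply(list_of_frames, nrow)
print(row_counts)

# Tag each frame with its file name, then stack them into one data frame.
tagged <- Map(function(df, nm) { df$source <- nm; df },
              list_of_frames, names(list_of_frames))
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
print(combined)
```

Stacking with do.call(rbind, ...) only works when the frames share the same columns, which is the situation the answer describes.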
The most immediate problem is that when pasting your file path together, you need a path separator: paste0(link, x) produces "C:/AA1.csv" rather than "C:/A/A1.csv". When composing file paths, it's best to use the function file.path, which inserts the separator between the parts for you. So you want to use:
read.csv(file.path(link, x), header=FALSE)
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}
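The path problem described in this answer can be seen without reading any files. This short sketch reuses the question's link value and one of its file names.

```r
link <- "C:/A"
x <- "A1.csv"

bad  <- paste0(link, x)      # no separator between folder and file name
good <- file.path(link, x)   # inserts "/" between the two parts

print(bad)   # "C:/AA1.csv"
print(good)  # "C:/A/A1.csv"
```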

Load CSVs by Matching names and fetch specific Cols by matching Tag Names

I have two data frames having "TagNames" and "FileNames" and I have CSV files in a directory. I need to open csv files one by one using "FileNames" then fetch columns from CSV file by matching "TagNames", append them to a "result" data frame and move to next CSV file (repeat).
Note: I also have to take care of date and time, because records coming from different files must be placed in date and time order.
TagNames and FileNames are as follows: [image: Tag Names and File Names]
The files directory and the data look like this: [image: Files Directory and Data Shape in CSV]
My R Script is this:
# Load the Data
basepath <- dirname(rstudioapi::getActiveDocumentContext()$path)
FilesDF <- read.csv("Config/Files.csv")
TagsDF <- read.csv("Config/Tags.csv")
FilesList <- list(FilesDF)
TagsList <- list(TagsDF)
extractData <- function(x) {
result <- NULL;
temp <- NULL;
for (i in 1:nrow(x)) {
new_df <- read.csv(file=x$FileNames[i,], header=TRUE, sep=",")
for(j in q:ncol(new_df))
{
temp <- rbind(temp, new_df[which(new_df[1,j])==TagsList$Tag.Names[i,]])
}
result <- rbind(result, temp)
temp <- NULL
}
return(result)
}
df_combined <- lapply(FilesList, extractData)
write.csv(df_combined, file = "UreaSVR2.csv")
In base R I would use something like:
do.call(rbind, lapply(lapply(fileList, read.csv), subset, select = TagNames))
The inner lapply() reads in all of the files in the list; the outer one subsets each data frame using the select argument, which takes a vector of column names (here TagNames, assumed to be a character vector). Finally, do.call(rbind, ...) puts the list together into a single data.frame; calling rbind directly on the list would not combine the frames.
I would probably use purrr and dplyr myself, though, and write it more like this:
library(purrr)
library(dplyr)
map(fileList, read.csv) %>%
map_df(~ select(.x, all_of(TagNames)))
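A minimal base R sketch of the read, select, and combine idea, extended with a sort by timestamp. It assumes that the tag names are column headers in each csv and that every file has a DateTime column; neither is confirmed by the question, and the sample files, tag_names, and the helper read_tags are hypothetical.

```r
# Two illustrative files; the second one holds the earliest record.
demo_dir <- file.path(tempdir(), "tags_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(DateTime = c("2020-01-01 00:00", "2020-01-01 01:00"),
                     TagA = 1:2, TagB = 3:4, Other = 5:6),
          file.path(demo_dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(DateTime = "2019-12-31 23:00", TagA = 9, TagB = 8, Other = 7),
          file.path(demo_dir, "f2.csv"), row.names = FALSE)

file_names <- file.path(demo_dir, c("f1.csv", "f2.csv"))
tag_names  <- c("TagA", "TagB")   # assumed: tags appear as column headers

# Hypothetical helper: keep the timestamp plus whichever tags the file has.
read_tags <- function(f) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  df[, c("DateTime", intersect(tag_names, names(df))), drop = FALSE]
}

# Stack the per-file results, then order all records by date and time.
result <- do.call(rbind, lapply(file_names, read_tags))
stamp  <- as.POSIXct(result$DateTime, format = "%Y-%m-%d %H:%M", tz = "UTC")
result <- result[order(stamp), ]
rownames(result) <- NULL
print(result)
```

If files carry different subsets of tags, rbind will fail because the columns differ; in that case a merge on DateTime would be the more suitable way to combine them.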

Specifying consecutive file names and assigning consecutive vectors with counter variable in for loops

I am trying to analyze 10 sets of data, for which I have to import the data, remove some values and plot histograms. I could do it individually but can naturally save a lot of time with a for loop. I know this code is not correct, but I have no idea of how to specify the name for the input files and how to name each iterated variable in R.
par(mfrow = c(10,1))
for (i in 1:10)
{
freqi <- read.delim("freqspeci.frq", sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
What I want to do is to have the counter number in every "i" in my code. Am I just approaching this the wrong way in R? I have read about the assign and paste functions, but honestly do not understand how I can apply them properly in this particular problem.
You can do it in several ways:
Use list.files() to get all files in a given directory. You can filter them with a regular expression as well.
If the names are consecutive, then you can use
for (i in 1:10)
{
filename <- sprintf("freqspec%d.frq", i)  # freqspec1.frq, freqspec2.frq, ...
freqi <- read.delim(filename, sep="\t", row.names=NULL)
freqveci <- freqi$N_CHR
freqveci <- freqveci[freqveci != 0 & freqveci != 1]
hist(freqveci)
}
You can also use paste() to create the file names.
paste("filename", 1:10, sep='_')
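Because sprintf is vectorised, the file names can be built in one call and the per-file work moved into lapply. The sketch below assumes the files are named freqspec1.frq, freqspec2.frq, and so on, and it writes three small sample files to a temporary folder; the sample values are made up for illustration.

```r
# Build the consecutive file names in one vectorised call.
demo_dir <- file.path(tempdir(), "freq_demo")
dir.create(demo_dir, showWarnings = FALSE)
file_names <- file.path(demo_dir, sprintf("freqspec%d.frq", 1:3))

# Write illustrative tab-separated files with an N_CHR column.
for (f in file_names) {
  write.table(data.frame(N_CHR = c(0, 0.25, 0.5, 1)), f,
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# Read each file and drop the values 0 and 1, keeping one vector per file.
freq_list <- lapply(file_names, function(f) {
  freq <- read.delim(f, sep = "\t")
  v <- freq$N_CHR
  v[v != 0 & v != 1]
})
names(freq_list) <- basename(file_names)

print(freq_list[["freqspec1.frq"]])
```

Keeping the vectors in a named list avoids creating numbered variables such as freqvec1 and freqvec2; a histogram per file is then a loop such as for (v in freq_list) hist(v).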
You could just save all your data files into an otherwise empty folder. Then get the filenames like:
filenames <- dir()
for (i in seq_along(filenames)){
freqi <- read.delim(filenames[i], sep="\t", row.names=NULL)
# and here whatever else you want to do with these files
}

R: Saving Output as xlsx in for loop

Using (openxlsx) package to write xlsx files.
I have a variable that is a vector of numbers
x <- 1:8
I then paste ".xlsx" to the end of each element of x to later create an xlsx file
new_x <- paste(x,".xlsx", sep = "")
I then write.xlsx using the ("openxlsx") package in a forloop to create new xlsx files
for (i in x) {
for (j in new_x) {
write.xlsx(i,j)
}}
When I open ("1.xlsx" - "8.xlsx"), all the files only have the number "8" on them. What I don't understand is why it doesn't have the number 1 for 1.xlsx - 7 for 7.xlsx, why does the 8th one overwrite everything else.
I even tried creating a new output for the dataframes as most others suggested
for (i in x) {
for (j in new_x) {
output[[i]] <- i
write.xlsx(output[[i]],j)
}}
And it still comes up with the same problem. I don't understand what is going wrong.
The problem is that you are creating each Excel file multiple times because you have nested loops. Try just using a single loop, and referring to an element of new_x.
x <- 1:8
new_x <- paste(x,".xlsx", sep = "")
for (i in seq_along(x)) {
write.xlsx(i,new_x[i])
}
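To see why the nested loops leave 8 in every file, the writes can be simulated without openxlsx. In this sketch a named list stands in for the files on disk (an assumption made purely for illustration): assigning to an element mimics write.xlsx overwriting a file of that name.

```r
x <- 1:8
new_x <- paste0(x, ".xlsx")

# Nested loops: every value of i is written to every file name j,
# so each "file" is overwritten 8 times and keeps only the last value.
nested <- list()
writes <- 0
for (i in x) {
  for (j in new_x) {
    nested[[j]] <- i
    writes <- writes + 1
  }
}

# Single loop: the k-th value goes to the k-th file name, once.
single <- list()
for (k in seq_along(x)) {
  single[[new_x[k]]] <- x[k]
}

print(writes)          # 64 writes instead of 8
print(unlist(nested))  # every entry is 8
print(unlist(single))  # 1.xlsx holds 1, ..., 8.xlsx holds 8
```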
If you want to read a number of .csv files and save them as xlsx files, the approach is similar; you still want only a single for loop, such as:
library(openxlsx)
# Define directory of where to look for csv files and where to save Excel files
csvDirectory <- "C:/Foo/Bar"
ExcelDirectory <- file.path(Sys.getenv("USERPROFILE"), "Desktop")
# Find all the csv files of interest (pattern is a regular expression, not a glob)
csvFiles <- list.files(csvDirectory, pattern = "\\.csv$")
# Go through the list of files and for each one read it into R, and then save it as Excel
for (i in seq_along(csvFiles)) {
csvFile <- read.csv(file.path(csvDirectory, csvFiles[i]))
write.xlsx(csvFile, file.path(ExcelDirectory, sub("\\.csv$", ".xlsx", csvFiles[i])))
}
