Loading files from multiple folders in r and processing

Loading files from multiple folders in r and processing - r

I have multiple csv files in different folders i.e. Main folder contains week 1 and week 2 folder. Week1 in turns contains file1.csv and week2 contains file2.csv. All files have same column name. There are 100's of such files in different directories
file1 <- data.frame(name = c("Bill","Tom"),Age = c(23,45))
file2 <- data.frame(name = c("Harry","John"),Age = c(34,56))
How can I load them and do a rbind in r and get them in a final data frame
I got some clue here: How can I read multiple files from multiple directories into R for processing?
what I did is slight modification to the function to do row bind as follows but nowhere near to what I want
# Creating a function to process each file
empty_df <- data.frame()
processFile <- function(f) {
df <- read.csv(f)
rbind(empty_df,df)
}
# Find all .csv files
files <- dir("/foo/bar/", recursive=TRUE, full.names=TRUE, pattern="\\.csv$")
# Apply the function to all files.
result <- sapply(files, processFile)
Any help is greatly appreciated!

I'd have tried to do something with a for loop on my side such as
temp = read.csv('week1/file1.csv')
for(i in 2:n){ #n being the number of weeks you have
temp= rbind(temp, read.csv(paste('week',i,'/file',i,'.csv', sep='')))
}
I hope it helped

Related

Loop function for reading csv files and store them in a list

I have a folder in which are stored approximately 10 subfolders, containing 4 .csv files each! Each subfolder corresponds to a weather station, and each file in each subfolder contain temperature data for different period
(e.g. station134_2000_2005.csv,station134_2006_2011.csv,station134_2012_2018.csv etc.) .
I wrote a loop for opening each folder, and rbind all data in one data frame but it is not very handy to do my work.
I need to create a loop so that those 4 files from each subfolder, rbined together to a dataframe, and then stored in a different "slot" in a list, or if it's easier,each station rbined csv data (namely each subfolder) to be exported from the loop as dataframe.
The code I wrote, which opens all files in all folders and create a big (rbined) data frame is:
directory <- list.files() # to have the names of each subfolder
stations <- data.frame() # to store all the rbined csv files
library(plyr)
for(i in directory){
periexomena <- list.files(i,full.names = T, pattern = "\\.csv$")
for(f in periexomena){
data_files <- read.csv(f, stringsAsFactors = F, sep = ";", dec = ",")
stations <- rbind.fill(data_files,stations)
}
Does anyone knows how can I have a list with each subfolder's rbined 4 csv files data in different slot, or how can I modify the abovementioned code in order to export in different data frame, the data from each subfolder?

Try:
slotted <- lapply(setNames(nm = directory), function(D) {
alldat <- lapply(list.files(D, pattern="\\.csv$", full.names=TRUE),
function(fn) {
message(fn)
read.csv2(fn, stringsAsFactors=FALSE)
})
# stringsAsFactors=F should be the default as of R-3.6, I believe
do.call(rbind.fill, alldat)
})

Save multiple files with multiple names with R

I have three files (T1.dnd, T2.dnd, T3.dnd) in which in all of them i have to substitude a specific row with another one with mgsub function. After doing that i have to save these files separately in a new folder (Output) with their respective names (T1.dnd, T2.dnd, T3.dnd) with a for loop. However the loop save only the last file (T3.dnd) and not all three (so T1.dnd and T2.dnd are missing).
How can i save all the three files with their names?
Any help will be greatly appreciated.
library(mgsub)
setwd("C:/Users/feder/Project/test")
treat <- as.list(c("C:/Users/feder/Project/test/T1.dnd",
"C:/Users/feder/Project/test/T2.dnd",
"C:/Users/feder/Project/test/T3.dnd"))
step <- list()
step2 <- list()
step <- lapply(treat, readLines)
step2 <- lapply(step, mgsub, c("______Leaf_fraction__________0.3100"),
c("______Leaf_fraction__________0.1500"))
files <- list.files("C:/Users/feder/Project/test")
names_files <- as.list(files)
savewd <- c("C:/Users/feder/Project/Output")
for (i in length(step2)){
writeLines(step2[[i]], paste(savewd, names_files[[i]], sep = "/"))
}

R: selectively importing data from several csv files into single data frame while also changing data from rows to individual columns

I’m looking to do the following in R.
I have 250+ csv files of chromatographic data structured similarly to the example below, but with 21 rows instead of three:
1 4.708252 BB 9.946890 7.830349 0.01982016 4.684836 4.742056
2 4.970352 BB 1.792341 1.497008 0.01896829 4.945352 5.005390
3 6.393414 BB 6.599891 5.309925 0.01950091 6.368413 6.428723
What I want to do is read a subset of the data in all 250 files into a single data frame, which is easy enough — but I also need to restructure it a fair bit.
Every row in the table above is a peak. I only want the data from the first and fourth columns (which are ‘peak number’ and ‘area under the peak’, respectively), and in the output I need to make each peak an individual column, rather than a row as above, with the peak number as the header. Finally, I want to create a new column where each row (that is, the data from each individual csv file) is given the same name as the csv file name.
So, imagine I have 3 files: ABC1.csv, ABC2.csv, and ABC3.csv. Each file looks like my example above. I want to automatically take all those files and merge them into a single data frame such as the one below.
ID 1 2 3
ABC1 9.94689 1.792341 6.599891
ABC2 9.76651 1.932332 6.600022
ABC3 8.99193 2.556471 6.718934
I hope I’ve made this clear enough. I’ve been able to manage most of the steps but haven’t been successful writing them into a single script. And I have no idea how, if there is any way, to make the file name into a variable.
Cheers

I am assuming the working directory is set to where the files are. Then you can get the list of files below.
filenames <- list.files()
Have a helper function to read a file and keep just columns 1 and 4.
readdata <- function(filename) {
df <- read.csv(filename)
vec <- df[, 4]
names(vec) <- df[, 1]
return(vec)
}
Loop over all of the files and rbind them
result <- do.call(rbind, lapply(filenames, readdata))
Name them as you like
row.names(result) <- filenames

this following code can probably be of some help, though the file name is still not working properly -
path <- "C:\\Users\\Vidyut\\"
filenames <- list.files(path = path,pattern = ".csv")
l <- data.frame(ID=character(),col1=numeric(),col2=numeric(),col3=numeric(),stringsAsFactors=FALSE)
for (i in filenames) {
#i = filenames[1]
full = paste(path,i,sep="")
m <- read.csv(full, header=F)
# extract the subset of rows required from each file
# m <- m[c(),]
n<- m[,c(1,4)]
y <- gsub('.csv','',i)
print("y=")
print(y)
d <- list(ID=as.character(y),col1=n[1,2],col2=n[2,2],col3=n[3,2])
print("d=")
print(d)
l <- rbind.data.frame(l,d)
print("l=")
print(l)
}
Mind you, this is not very pretty code - just something hacked together to get the job done (visible from the multiple print lines scattered across).

Here's a solution for you. This only works if we can assume that there are exactly 21 peaks in each file and they are in order 1:21. If that's not the case a few changes to the code should remedy this.
folder = "c:/temp/"
files <- dir(folder)
first_loop <- TRUE
for (file in files) {
# Read one file, only the first and fourth columns
temp <- read.csv(file=paste0(folder,file),
header = FALSE,
colClasses = c("integer", "NULL", "NULL", "numeric", "NULL", "NULL", "NULL", "NULL"))
# Transpose the data
temp <- data.frame(t(temp))
# Remove the peak number
temp <- temp[2,]
# Concatenate the dataframes together
temp$file <- file
if (first_loop) {
data <- temp
first_loop <- FALSE
} else {
data <- rbind(data, temp)
}
}
data

R - Dynamic reference to files for read csv

I would like to make a script that reads data from the correct folder. I have several lines in my code refering to the foldername, therefore I would like to make this dynamic. Is it possible to make the reference to a folder name dynamic? See below what I would like to do
# Clarifies the name of the folder, afterwards "Foldername" will be used as reference
FolderA <- Foldername
# Read csv to import the data from the selected location
data1 <- read.csv(file="c:/R/Foldername/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/Foldername/datafile2.csv", header=TRUE, sep=",")
I am trying to get the same result as what I would get with this code:
data1 <- read.csv(file="c:/R/FolderA/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/FolderA/datafile2.csv", header=TRUE, sep=",")
Can somebody please clarify how it would be possible to make this dynamic?

You could use paste0 for this:
FolderA <- "Foldername"
paste0("c:/R/", FolderA, "/datafile1.csv")
#[1] "c:/R/Foldername/datafile1.csv"
So in your case:
data1 <- read.csv(file=paste0("c:/R/", FolderA, "/datafile1.csv"), header=TRUE, sep=",")

A slight generalization of #LyzandeR's answer,
make_files <- function(directory, filenames) {
sprintf("C:/R/%s/%s", directory, filenames)
}
##
Files <- sprintf("file%i.csv", 1:3)
##
make_files("FolderA", Files)
#[1] "C:/R/FolderA/file1.csv" "C:/R/FolderA/file2.csv" "C:/R/FolderA/file3.csv"

you could also try the following method. The loop will create a list with output file, but if your files all have the same column names you could just rbind them together (method 2). This method will allow you to specify your folder, then use the list.files function to extract all files with extension ".csv". This way if you have many csv files in a folder you won't have to write them all out individually.
# Specify working directory or location of files:
FolderA = "c:/R/Foldername"
# identify all files with specific extension:
files = list.files(FolderA,pattern="*.csv")
Method 1 - Separate by lists
data = NULL
for(i in 1:length(files)){
data[[i]] = read.csv(files[i],header=F,stringsAsFactors=F)
}
Method 2 - single dataframe
data = NULL
for(i in 1:length(files)){
df = read.csv(files[i],header=F,stringsAsFactors=F)
data = rbind(data,df)
}

How to apply a function to every possible pairwise combination of files stored in a common directory

I have a directory containing a large number of csv files. I would like to load the data into R and apply a function to every possible pair combination of csv files in the directory, then write the output to file.
The function that I would like to apply is matchpt() from the biobase library which compares locations between two data frames.
Here is an example of what I would like to do (although I have many more files than this):
Three files in directory: A, B and C
Perform matchpt on each pairwise combination:
nn1 = matchpt(A,B)
nn2 = matchpt(A,C)
nn3 = matchpt(B,C)
Write nn1, nn2 and nn3 to csv file.
I have not been able to find any solutions for this yet and would appreciate any suggestions. I am really not sure where to go from here but I am assuming that some sort of nested for loop is required to somehow cycle sequentially through all pairwise combinations of files. Below is a beginning at something but this only compares the first file with all the others in the directory so does not work!
library("Biobase")
# create two lists of identical filenames stored in the directory:
filenames1 = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)
filenames2 = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)
for(i in 1:length(filenames2)){
# load the first data frame in list 1
df1 <- lapply(filenames1[1], read.csv, header=TRUE, stringsAsFactors=FALSE)
df1 <- data.frame(df1)
# load a second data frame from list 2
df2 <- lapply(filenames2[i], read.csv, header=TRUE, stringsAsFactors=FALSE)
df2 <- data.frame(df2)
# isolate the relevant columns from within the two data frames
dat1 <- as.matrix(df1[, c("lat", "long")])
dat2 <- as.matrix(df2[, c("lat", "long")])
# run the matchpt function on the two data frames
nn <- matchpt(dat1, dat2)
#Extract the unique id code in the two filenames (for naming the output file)
file1 = filenames1[1]
code1 = strsplit(file1,"_")[[1]][1]
file2 = filenames2[i]
code2 = strsplit(file2,"_")[[1]][1]
outname = paste(code1, code2, sep=”_”)
outfile = paste(code, "_nn.csv", sep="")
write.csv(nn, file=outname, row.names=FALSE)
}
Any suggestions on how to solve this problem would be greatly appreciated. Many thanks!

You could do something like:
out <- combn( list.files(), 2, FUN=matchpt )
write.table( do.call( rbind, out ), file='output.csv', sep=',' )
This assumes that matchpt is expecting 2 strings with the names of the files and that the result is the same structure each time so that the rbinding makes sense.
You could also write your own function to pass to combn that takes the 2 file names, runs matchpt and then appends the results to the csv file. Remember that if you pass an open filehandle to write.table then it will append to the file instead of overwriting what is there.

Try this example:
#dummy filenames
filenames <- paste0("file_",1:5,".txt")
#loop through unique combination
for(i in 1:(length(filenames)-1))
for(j in (i+1):length(filenames))
{
flush.console()
print(paste("i=",i,"j=",j,"|","file1=",filenames[i],"file2=",filenames[j]))
}

In response to my question I seem to have found a solution. The below uses a for loop to perform every pairwise combination of files in a common directory (this seems to work and gives EVERY combination of files i.e. A & B and B & A):
# create a list of filenames
filenames = list.files(path=dir, pattern="csv$", full.names=FALSE, recursive=FALSE)
# For loop to compare the files
for(i in 1:length(filenames)){
# load the first data frame in the list
df1 = lapply(filenames[i], read.csv, header=TRUE, stringsAsFactors=FALSE)
df1 = data.frame(df1)
file1 = filenames[i]
code1 = strsplit(file1,"_")[[1]][1] # extract unique id code of file (in case where the id comes before an underscore)
# isolate the columns of interest within the first data frame
d1 <- as.matrix(df1[, c("lat_UTM", "long_UTM")])
# load the comparison file
for (j in 1:length(filenames)){
# load the second data frame in the list
df2 = lapply(filenames[j], read.csv, header=TRUE, stringsAsFactors=FALSE)
df2 = data.frame(df2)
file2 = filenames[j]
code2 = strsplit(file2,"_")[[1]][1] # extract uniqe id code of file 2
# isolate the columns of interest within the second data frame
d2 <- as.matrix(df2[, c("lat_UTM", "long_UTM")])
# run the comparison function on the two data frames (in this case matchpt)
out <- matchpt(d1, d2)
# Merge the unique id code in the two filenames (for naming the output file)
outname = paste(code1, code2, sep="_")
outfile = paste(outname, "_out.csv", sep="")
# write the result to file
write.csv(out, file=outfile, row.names=FALSE)
}
}

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex