Merge multiple .txt files from multiple directories in R

Merge .txt files from different sub-directories
I have a folder filled with sub-folders named for past dates (01_14, for example); inside each date folder there are 11 files named 01.txt, 02.txt, and so on. How can I merge all the .txt files into one data frame, with one column giving the name of the folder each row came from and another column giving the name of the file it came from?
My hierarchy would look something like this:
\Data
  \01_14
    01.txt
    02.txt
    ...
    11.txt
  \02_14
    01.txt
    02.txt
    ...
    11.txt
  \03_14
    01.txt
    02.txt
    ...
    11.txt

When I need to read multiple files, I use a read.stack helper function, which is basically a wrapper around read.table that also lets you add extra columns on a per-file basis. Here's how I might use it in your scenario:
dir <- "Data"
subdir <- list.dirs(dir, recursive = FALSE)
# get dir/file names
ff <- do.call(rbind, lapply(subdir, function(x) {
  ff <- list.files(x, "\\.txt$", include.dirs = FALSE, full.names = TRUE)
  data.frame(dir = basename(x), file = basename(ff),
             fullpath = ff, stringsAsFactors = FALSE)
}))
# read into a single data.frame
read.stack(ff$fullpath, extra = list(file = ff$file, dir = ff$dir))
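Since read.stack isn't in base R, here is a minimal base-R sketch of the same idea, assuming each .txt file is a whitespace-delimited table with a header row. The directory and file names below ("01_14", "01.txt", etc.) are made up just to keep the example self-contained and runnable.

```r
# Build a small hypothetical Data/ tree in a temp directory
root <- file.path(tempdir(), "Data")
for (d in c("01_14", "02_14")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  for (f in c("01.txt", "02.txt")) {
    write.table(data.frame(x = 1:2, y = 3:4),
                file.path(root, d, f), row.names = FALSE)
  }
}

paths <- list.files(root, "\\.txt$", recursive = TRUE, full.names = TRUE)
# Read each file, then tag it with its folder and file name before stacking
stacked <- do.call(rbind, lapply(paths, function(p) {
  df <- read.table(p, header = TRUE)
  df$dir  <- basename(dirname(p))
  df$file <- basename(p)
  df
}))
head(stacked)
```

The per-file `dir` and `file` columns play the role of read.stack's `extra` argument.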

Try this:
fileNames <- list.files("Data", recursive = TRUE, full.names = TRUE)
fileContents <- lapply(fileNames, function(fileName)
  paste(readLines(fileName, warn = FALSE), collapse = "\n"))
meta <- regmatches(fileNames, regexec(".*Data/(.*)/(.*)$", fileNames))
merged <- mapply(c, fileContents, lapply(meta, "[", -1), SIMPLIFY = FALSE)
as.data.frame(t(do.call(cbind, merged)))

Related


How do I read files in a systematic manner in R

I have a folder with 1000 .txt files, with names like file1.txt, file2.txt, ..., file1000.txt.
I want to extract a variable that is present in all the files. The problem is that R reads the files in lexicographic order: file1, file10, file11, ..., file1000, then file2, and so on. How can I make the program read the files in numeric order (i.e. 1, 2, 3, ..., 1000), so that it becomes easy to match the extracted variable with the file number? I am using this piece of code:
list_of_files <- list.files(path = ".", recursive = TRUE,
                            pattern = "\\.txt$",
                            full.names = TRUE)
# Read all the files and create a FileName column to store filenames
DT <- rbindlist(sapply(list_of_files, fread, simplify = FALSE),
                use.names = FALSE, idcol = "FileName")
If you want the files in numeric order, sort them with mixedsort/mixedorder from the gtools package before reading. Also, instead of sapply, use lapply:
library(gtools)
list_of_files <- list_of_files[mixedorder(basename(list_of_files))]
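To see the ordering problem in isolation: for the simple "file&lt;N&gt;.txt" pattern you can also get a numeric sort in base R by extracting the number, which is a sketch of what gtools::mixedorder does for arbitrary mixed strings. The file names here are hypothetical.

```r
files <- paste0("file", c(1, 2, 10, 100), ".txt")

# Lexicographic order puts file10 and file100 before file2
sort(files)

# Extract the numeric part and order by it instead
nums <- as.integer(sub("^file(\\d+)\\.txt$", "\\1", files))
files[order(nums)]
# "file1.txt" "file2.txt" "file10.txt" "file100.txt"
```

gtools::mixedorder is the more general tool, since it handles numbers embedded anywhere in the string.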

Read several txt file from different directories in R

I have several txt files in different directories. I want to read each file separately in R so that I can apply some analysis to each one later.
The directories are the same except for the last folder, as follows:
c:/Desktop/ATA/1/files.txt
c:/Desktop/ATA/2/files.txt
c:/Desktop/ATA/3/files.txt
...
The files in all directories have the same name, and the last folder is numbered from 1 up to the last one.
Create all the file names to read using sprintf or something similar, then use read.table or whatever you normally use to read the text files:
lapply(sprintf("c:/Desktop/ATA/%d/files.txt", 1:10), function(x)
  read.table(x, header = TRUE))
Replace 10 with the number of folders you have.
Maybe you can try:
list_file <- list.files(path = "c:/Desktop/ATA", recursive = TRUE,
                        pattern = "\\.txt$", full.names = TRUE)
(Note the escaped dot and the `$` anchor; `pattern = ".txt"` would also match names where the dot is any character, or where ".txt" appears mid-name.) This will return the list of text files contained in your folder. Then you can loop over them and apply some function to each:
for (i in seq_along(list_file)) {
  data <- read.table(list_file[i], header = TRUE, sep = "\t")
  # ... function to apply
}
Thanks, guys. I mixed your code and modified it a little bit:
common_path <- "c:/Desktop/ATA/"
primary_dirs <- length(list.files(common_path)) # no. of folders in path
list_file <- sprintf("c:/Desktop/ATA/%d/files.txt", 1:primary_dirs)
data_list <- list()
for (i in seq_along(list_file)) {
  # store each table in a list; a bare `data <- ...` would be
  # overwritten on every iteration
  data_list[[i]] <- read.table(list_file[i], header = TRUE, sep = "\t")
}
This way the folders are processed in the order 1, 2, 3, ..., not 1, 10, 11, 2, 3.
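A self-contained version of the same approach, using a temporary directory in place of c:/Desktop/ATA (the folder layout and the `v` column are made up for the demo):

```r
# Build ATA/1, ATA/2, ATA/3, each with a tab-delimited files.txt
base <- file.path(tempdir(), "ATA")
for (i in 1:3) {
  dir.create(file.path(base, i), recursive = TRUE, showWarnings = FALSE)
  write.table(data.frame(v = i), file.path(base, i, "files.txt"),
              row.names = FALSE, sep = "\t")
}

# sprintf generates the paths already in numeric order: 1, 2, 3, ...
paths <- sprintf("%s/%d/files.txt", base, 1:3)
tables <- lapply(paths, read.table, header = TRUE, sep = "\t")
length(tables)
```

Using lapply (or a list filled inside the loop) keeps every table; each element of `tables` corresponds to one numbered folder.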

Appending a list in a loop (R)

I want to use a loop to read in multiple csv files and append their values to a list in R.
path = "~/path/to/csv/"
file.names <- dir(path, pattern =".csv")
mylist = c()
for (i in 1:length(file.names)) {
  datatmp <- read.csv(file.names[i], header = TRUE, sep = ";", stringsAsFactors = FALSE)
  listtmp = datatmp[, 6]
  finallist <- append(mylist, listtmp)
}
finallist
For each csv file, the desired column has a different length.
In the end, I want to get the full appended list with all values in that certain column from all csv files.
I am fairly new to R, so I am not sure what I'm missing...
There are three errors in your approach.
First, file.names <- dir(path, pattern = ".csv") extracts just the file names, without the path, so when you try to import them, read.csv() can't find the files.
Building the path
You can build the right path with paste0():
path <- "~/path/to/csv/"
file.names <- paste0(path, dir(path, pattern = ".csv"))
Or with file.path(), which adds the slashes automatically:
path <- "~/path/to/csv"
file.names <- file.path(path, dir(path, pattern = ".csv"))
Another way to create the paths, for me more efficient, is the one suggested in the answer commented by Tung:
file.names <- list.files(path = "~/path/to/csv", recursive = TRUE,
                         pattern = "\\.csv$", full.names = TRUE)
This is better because, in addition to doing everything in one step, you can use it in a directory containing files of various formats: the pattern above matches only the .csv files in the folder.
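For the path-building step, the two approaches produce the same string; the difference is only who supplies the separator (the paths here are hypothetical):

```r
p1 <- paste0("~/path/to/csv/", "a.csv")    # you supply the trailing slash
p2 <- file.path("~/path/to/csv", "a.csv")  # file.path() inserts the separator
identical(p1, p2)  # TRUE
```

file.path() is the safer habit, since forgetting the trailing slash with paste0() silently produces paths like "~/path/to/csva.csv".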
Importing, selecting and creating the list
The second error is in mylist <- c(). You want a list, but this creates a vector. The correct form is:
mylist <- list()
And the last error is inside the loop. Instead of appending into a different object, assign back to the same object you created before the loop:
for (i in 1:length(file.names)) {
  datatmp <- read.csv(file.names[i], sep = ";", stringsAsFactors = FALSE)
  listtmp <- datatmp[, 6]
  mylist <- append(mylist, list(listtmp))
}
mylist
Another approach, easier and cleaner, is looping with lapply(). Just this:
mylist <- lapply(file.names, function(x) {
  df <- read.csv(x, sep = ";", stringsAsFactors = FALSE)
  df[, 6]
})
Hope it helps!

r looping through folders and searching for file and then concatenate data

I have a base folder containing many folders. I want to go into each folder, find a file named table_amzn.csv (if it exists), read all of those files into R, and put them into a single data frame one after another. I have verified that all the files have the same columns. I know how to read CSVs into R, but how can I loop over all the folders within a base folder and concatenate the data?
This can also be done straightforwardly in base R:
## change `dir` to whatever your 'base folder' actually is
dir <- '~/base_folder'
ff <- list.files(dir, pattern = "table_amzn\\.csv$", recursive = TRUE, full.names = TRUE)
out <- do.call(rbind, lapply(ff, read.csv))
In the event that your columns are the same but, for whatever reason (a typo, etc.), have different column names, you could modify the above like this:
out <- do.call(rbind, lapply(ff, read.csv, header = FALSE, skip = 1))
names(out) <- c('stub1', 'stub2') # whatever they should be
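To see why the header = FALSE, skip = 1 trick is needed: rbind() on data frames refuses to stack frames whose column names differ, even when the columns line up by position. A self-contained sketch with made-up files and column names:

```r
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(price = 1:2, qty = 3:4), f1, row.names = FALSE)
write.csv(data.frame(Price = 5:6, Qty = 7:8), f2, row.names = FALSE)  # names differ

# Skip each file's header row and read positionally, then rename once
out <- do.call(rbind, lapply(c(f1, f2), read.csv, header = FALSE, skip = 1))
names(out) <- c("price", "qty")
out
```

Reading with the headers intact and then calling rbind() directly would stop with "names do not match previous names".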
Here is an implementation that was recently added to the rio package:
devtools::install_github("leeper/rio")
library(rio)
files <- list.files(pattern = "table_amzn.csv", recursive = TRUE, full.names = TRUE)
df <- import_list(files, rbind = TRUE)
This will load all the files into a single data.frame object. Alternatively, if you call it with rbind = FALSE, a list of data frames is returned.