How to 'read.csv' many files in a folder using R? - r

How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?

Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop but use assign to dynamically name the dataframes if you want
for(file in files){
assign(paste0(file, "_df"),read.csv(file))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in 1:length(files)){
file <- files[index]
dfs[index] <- read.csv(file)
}

Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})

The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path as it will attempt to determine what the path separator is for operating system the code is running on. So you want to use:
read.csv(files.path(link , x), header=FALSE)
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}

Related

Combining .txt files in R using a loop

I'm currently trying to use R to combine dozens of .txt files into one single .txt file. Attached below is the code that I've been experimenting with so far. The files that I'm trying to combine have very similar names, for example: "e20171ny0001000.txt" and "e20171ct0001000.txt". As you can see, the only difference in the file names are the different state abbreviations. This is why I've been trying to use a for loop, in order to try to go through all the state abbreviations.
setwd("/Users/tim/Downloads/All_Geographies")
statelist = c('ak','al','ar','az','ca','co','ct','dc','de','fl','ga','hi','ia','id','il','in','ks','ky','la','ma','md','me','mi','mn','mo','ms','mt','nc','nd','ne','nh','nj','nm','nv','ny','oh','ok','or','pa','ri','sc','sd','tn','tx','ut','va','vt','wa','wi','wv','wy')
for (i in statelist){
file_names <- list.files(getwd())
file_names <- file_names[grepl(paste0("e20171", i, "0001000.txt"),file_names)]
files <- lapply(file_names, read.csv, header=F, stringsAsFactors = F)
files <- do.call(rbind,files)
}
write.table(files, file = "RandomFile.txt", sep="\t")
When I run the code, there isn't a specific error that pops up. Instead the entire code runs and nothing happens. I feel like my code is missing something that is preventing it from running correctly.
We need to create a list to update. In the OP's code,files is a list of data.frame that gets updated in the for loop. Instead, the output needss to be stored in a list. For this, we can create a list of NULL 'out' and then assign the output to each element of 'out'
out <- vector('list', length(statelist))
for (i in seq_along(statelist)){
file_names <- list.files(getwd())
file_names <- file_names[grepl(paste0("e20171", statelist[i],
"0001000.txt"),file_names)]
files <- lapply(file_names, read.csv, header=FALSE, stringsAsFactors = FALSE)
out[[i]] <- do.call(rbind, files)
}
As out is a list of data.frame, we need to loop over the list and then write it back to file
newfilenames <- paste0(statelist, "_new", ".txt")
lapply(seq_along(out), function(i) write.table(out[[i]],
file = newfilenames[i], quote = FALSE, row.names = FALSE))

Appending a list in a loop (R)

I want to use a loop to read in multiple csv files and append a list in R.
path = "~/path/to/csv/"
file.names <- dir(path, pattern =".csv")
mylist=c()
for(i in 1:length(file.names)){
datatmp <- read.csv(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
listtmp = datatmp[ ,6]
finallist <- append(mylist, listtmp)
}
finallist
For each csv file, the desired column has a different length.
In the end, I want to get the full appended list with all values in that certain column from all csv files.
I am fairly new to R, so I am not sure what I'm missing...
There are four errors in your approach.
First, file.names <- dir(path, pattern =".csv") will extract just file names, without path. So, when you try to import then, read.csv() doesn't find.
Building the path
You can build the right path including paste0():
path = "~/path/to/csv/"
file.names <- paste0(path, dir(path, pattern =".csv"))
Or file.path(), which add slashes automaticaly.
path = "~/path/to/csv"
file.names <- file.path(path, dir(path, pattern =".csv"))
And another way to create the path, for me more efficient, is that suggested in the answer commented by Tung.
file.names <- list.files(path = "~/path/to/csv", recursive = TRUE,
pattern = "\\.csv$", full.names = TRUE)
This is better because in addition to being all in one step, you can use within a directory containing multiple files of various formats. The code above will match all .csv files in the folder.
Importing, selecting and creating the list
The second error is in mylist <- c(). You want a list, but this creates a vector. So, the correct is:
mylist <- list()
And the last error is inside the loop. Instead of create other list when appending, use the same object created before the loop:
for(i in 1:length(file.names)){
datatmp <- read.csv(file.names[i], sep=";", stringsAsFactors=FALSE)
listtmp = datatmp[, 6]
mylist <- append(mylist, list(listtmp))
}
mylist
Another approach, easier and cleaner, is looping with lapply(). Just this:
mylist <- lapply(file.names, function(x) {
df <- read.csv(x, sep = ";", stringsAsFactors = FALSE)
df[, 6]
})
Hope it helps!

r looping through folders and searching for file and then concatenate data

I have a base folder and it has many folders in it. I want to go to each folder, find a file that has name table_amzn.csv (if exists) and then read all of those files in R and put all files in a single dataframe one after other. I have verified that all files have same columns. I know how to read CSVs into R. But how could i loop over all the folders within a base folder and concatenate data
This also can be straightforward in base R:
## change `dir` to whatever your 'base folder' actually is
dir <- '~/base_folder'
ff <- list.files(dir, pattern = "table_amzn.csv", recursive = TRUE, full.names = TRUE)
out <- do.call(rbind, lapply(ff, read.csv))
In the event that your columns are the same but for whatever reason (typo, etc) have different column names, you could modify the above like:
out <- do.call(rbind, lapply(ff, read.csv, header = FALSE, skip = 1))
names(out) <- c('stub1', 'stub2') # whatever they should be
Here is an implementation that was recently added to the package rio:
files <- list.files(pattern = "table_amzn.csv", recursive = TRUE, full.names = TRUE)
devtools::install_github("leeper/rio")
library(rio)
df <- import_list(files, rbind = TRUE)
This will load all the objects in files to a single data.frame object. Alternatively, if you call with rbind = FALSE then a list of data.frames is returned

Read in multiple txt files and create a list of it to access each file by accessing the list element in R

Being relatively new to R programming I am struggling with a huge data set of 16 text files (, seperated) saved in one dierctory. All the files have same number of columns and the naming convention, for example file_year_2000, file_year_2001 etc. I want to create a list in R where i can access each file individually by accessing the list elementts. By searching through the web i found some code and tried the following but as a result i get one huge list (16,2 MB) where the output is just strange. I would like to have 16 elements in the list each represting one file read from the directory. I tried the following code but it does not work as i want:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for( i in length(file.names)){
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
year = gsub('[^0-9]', '', file)
df_list[[year]] = file
}
Any suggestions?
Thanks in advance.
Just to give more details
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for(i in seq(length(file.names))){
year = gsub('[^0-9]', '', file.names[i])
df_list[[year]] = read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
}
Maybe it would be worth joining the data frames into one big data frame with an additional column being the year?
I assume that instead of "access each file individually" you mean you want to access individually data in each file.
Try something like this (untested):
path = "~/.../.../.../Data_1999-2015"
file.names <- dir(path, pattern =".txt")
df_list = vector("list", length(file.names))
# create a list of data frames with correct length
names(df_list) <- rep("", length(df_list))
# give it empty names to begin with
for( i in seq(along=length(file.names))) {
# now i = 1,2,...,16
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
df_list[[i]] = file
# save the data
year = gsub('[^0-9]', '', file.names[i])
names(df_list)[i] <- year
}
Now you can use either df_list[[1]] or df_list[["2000"]] for year 2000 data.
I am uncertain if you are reading yout csv files in the right directory. If not, use
file <- read.csv(paste0(path, file.names[i], sep="/"),header=TRUE, sep=",", stringsAsFactors=FALSE)
when reading the file.

Renaming dataframes in R after read in via loop

I'm trying to read in a bunch of .tsv files from a folder ('Test'), and I got this far:
files <- list.files("Test", pattern="*.tsv", full.names=TRUE)
for (i in 1:length(files)) assign(files[i], read.delim(files[i], na.strings = c(
"FAILED", "ERROR"))
)
This works, however the data frames it creates includes the file path and extension (in this case Test/ and .tsv resulting in a data frame named Test/100_1.tsv). I've been playing around with the code for a few hours trying to get it to name each data frame by the necessary information only (e.g., 100_1). If anyone has any suggestions, this novice would greatly appreciate the help.
I would suggest loading these as a list, rather than into the global namespace with assign.
Something like
files <- list.files("Test", pattern="*.tsv", full.names=TRUE)
names(files) <- files
all.data <- lapply(files, function(fle) {
read.delim(fle, na.strings=c("FAILED", "ERROR"))
})
will yield a list keyed by filename, which will be much easier to deal with.
I would make a label object that contains the names you would like to apply to your dataframes. The label object I made assumes you always have the same size names. Should do what you are looking for though.
files <- list.files("Test", pattern="*.tsv", full.names=TRUE)
dflabels <- substr(files,5,nchar(files)-4)
for (i in 1:length(files)) assign(dflabels[i], read.delim(files[i], na.strings = c(
"FAILED", "ERROR"))
)
If you're just trying to get 100_1.tsv all you need to do is set full.names = FALSE and preset your working directory rather than call it in list.files.
E.g.
setwd("C:/Your/Working/Directory/Test")
files <- list.files(pattern="*.tsv")
for (i in 1:length(files)) assign(files[i], read.delim(files[i], na.strings = c(
"FAILED", "ERROR"))
)
full.names defaults to FALSE.

Resources