Importing and binding multiple specific CSV files into R

I would like to import specific CSV files into R and bind them together. The files are named "number.CSV" (e.g. 3437.CSV) and sit in a folder alongside other CSV files that I do not want to import.
How can I select only the ones that interest me?
I have a list of all the CSV files that I need; the following column shows some of them.
CODE
49002
47001
64002
84008
46003
45001
55008
79005
84014
84009
45003
45005
51001
55012
67005
19004
7003
55023
55003
76004
21013
I have 364 CSV files to read and bind.
N.B. I can't select all the "***.csv" files from my folder because it also contains other files that I do not need.
Thanks

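You could iterate over the list of CSV files of interest, read in each one, and bind the results into a common data frame:
path <- "path/to/folder"
ROOT <- c("49002", "47001", "21013")
# build the full file names, including the extension
files <- file.path(path, paste0(ROOT, ".CSV"))
# read one file and return its contents as a data frame
readFile <- function(x) {
  read.csv(x)
}
# read every file, then bind all the data frames by row
all_files_df <- do.call(rbind, lapply(files, readFile))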

Just make file names out of your numeric codes:
filenames = paste(code, 'csv', sep = '.')
# [1] "49002.csv" "47001.csv" "64002.csv" …
You might need to specify the full path to the files as well:
directory = '/example/path'
filenames = file.path(directory, filenames)
# [1] "/example/path/49002.csv" "/example/path/47001.csv" "/example/path/64002.csv" …
And now you can simply read them into R in one go:
data = lapply(filenames, read.csv)
Or, if your CSV files don’t have column headers (this is the case, in particular, when the file’s lines have different numbers of items!)
data = lapply(filenames, read.csv, header = FALSE)
This will give you a list of data.frames. If you want to bind them all into one table, use
data = do.call(rbind, data)
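A minimal sketch of the whole approach, assuming the codes are held in a vector called `code` and that some of the listed files might be missing from the folder. The folder and its contents are made up under `tempdir()` so that the example runs on its own; `file.exists` keeps only the paths that are really there before reading and binding.

```r
# Throwaway folder with two wanted files and one unwanted file (demo data only)
directory <- file.path(tempdir(), "csv_demo")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(directory, "49002.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(directory, "47001.csv"), row.names = FALSE)
write.csv(data.frame(id = 9, value = 99),
          file.path(directory, "other.csv"), row.names = FALSE)

# Codes of interest; 64002 is deliberately absent from the demo folder
code <- c(49002, 47001, 64002)
filenames <- file.path(directory, paste0(code, ".csv"))

# Keep only the files that exist, then read and bind them
present <- file.exists(filenames)
data_list <- lapply(filenames[present], read.csv)
combined <- do.call(rbind, data_list)

# Codes whose files were not found, for a quick sanity check
missing_codes <- code[!present]
```

With 364 codes, inspecting `missing_codes` first is a cheap way to catch typos or extension mismatches (for example `.CSV` versus `.csv` on a case-sensitive file system) before binding.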

I don't know whether you can do that directly from the .CSV files. What you can do is read in all your data and then combine the columns you need with the cbind command.
For example:
data1 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data2 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data3 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
And then:
df <- cbind(data1$Col1, data2$col3...)
Here Col1, col3 and so on are the names of the columns that you want.

Related

Loop function for reading csv files and store them in a list

I have a folder containing approximately 10 subfolders, each holding 4 .csv files. Each subfolder corresponds to a weather station, and each file in a subfolder contains temperature data for a different period
(e.g. station134_2000_2005.csv, station134_2006_2011.csv, station134_2012_2018.csv etc.).
I wrote a loop that opens each folder and rbinds all the data into one data frame, but that is not very handy for my work.
I need a loop that rbinds the 4 files from each subfolder into a data frame and stores it in a separate "slot" of a list or, if that is easier, exports each station's rbinded CSV data (that is, each subfolder) from the loop as its own data frame.
The code I wrote, which opens all files in all folders and creates one big (rbinded) data frame, is:
directory <- list.files() # to have the names of each subfolder
stations <- data.frame() # to store all the rbinded csv files
library(plyr)
for(i in directory){
  periexomena <- list.files(i, full.names = TRUE, pattern = "\\.csv$")
  for(f in periexomena){
    data_files <- read.csv(f, stringsAsFactors = FALSE, sep = ";", dec = ",")
    stations <- rbind.fill(data_files, stations)
  }
}
Does anyone know how I can get a list with each subfolder's 4 rbinded CSV files in a separate slot, or how I can modify the code above to export the data from each subfolder as a separate data frame?
Try:
slotted <- lapply(setNames(nm = directory), function(D) {
  alldat <- lapply(list.files(D, pattern = "\\.csv$", full.names = TRUE),
                   function(fn) {
                     message(fn)
                     read.csv2(fn, stringsAsFactors = FALSE)
                   })
  # stringsAsFactors = FALSE is the default as of R 4.0.0
  # rbind.fill comes from plyr, which the question's code already loads
  do.call(rbind.fill, alldat)
})
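The same idea as a self-contained sketch using only base R. The station folders and files here are made up under `tempdir()` for the demo, and plain `rbind` stands in for `rbind.fill` on the assumption that every file of a station has the same columns.

```r
# Build two made-up station folders, each holding two semicolon-separated files
root <- file.path(tempdir(), "stations_demo")
for (st in c("station1", "station2")) {
  dir.create(file.path(root, st), recursive = TRUE, showWarnings = FALSE)
  for (period in c("2000_2005", "2006_2011")) {
    write.csv2(data.frame(station = st, temp = c(1.5, 2.5)),
               file.path(root, st, paste0(st, "_", period, ".csv")),
               row.names = FALSE)
  }
}

# One list slot per subfolder, each holding that station's files bound together
station_dirs <- list.dirs(root, recursive = FALSE)
slotted <- lapply(setNames(station_dirs, basename(station_dirs)), function(d) {
  files <- list.files(d, pattern = "\\.csv$", full.names = TRUE)
  do.call(rbind, lapply(files, read.csv2, stringsAsFactors = FALSE))
})
```

Each station is then available by name, for example `slotted[["station1"]]`, and `names(slotted)` lists the stations that were read.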

Creating a data frame in R with the content of multiple text files

I'm new to R programming and wondering how I can take the contents of 1,172 text files and create a data frame with the contents of each text file in individual rows in the data frame.
So I want to go from having 1,172 text documents to having a data frame with 1,172 rows and 1 column, with each row having the contents of each individual text file. So the fifth row of the data frame would include the text from the fifth text document in the list I feed into R.
Thanks,
Tyler
# get all files with extension "txt" in the current directory
file.list <- list.files(path = ".", pattern = "\\.txt$", full.names = TRUE)
# this creates a vector where each element contains one file
all.files <- sapply(file.list, FUN = function(x)readChar(x, file.info(x)$size))
# create a dataframe
df <- data.frame( files= all.files, stringsAsFactors=FALSE)
The last 2 steps could be united into one to avoid creating an extra vector:
df <- data.frame( files= sapply(file.list,
FUN = function(x)readChar(x, file.info(x)$size)),
stringsAsFactors=FALSE)
I just tested this and it worked fine for me.
# set the working directory (where files are saved)
setwd("C:/your_path_here/")
file_names = list.files(getwd())
file_names = file_names[grepl("\\.txt$", file_names, ignore.case = TRUE)]
# print file_names vector
file_names
files = lapply(file_names, read.csv, header=F, stringsAsFactors = F)
files = do.call(rbind,files)
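A small self-contained sketch of the one-row-per-file layout, assuming plain text files that fit in memory. It creates three demo files under `tempdir()`, reads each with `readLines`, and rejoins the lines so that every file ends up as a single string. The extra `file` column is an addition for traceability and can be dropped if exactly one column is wanted.

```r
# Create three small text files for the demo
txt_dir <- file.path(tempdir(), "txt_demo")
dir.create(txt_dir, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(c(paste("document", i), "second line"),
             file.path(txt_dir, sprintf("doc%d.txt", i)))
}

file.list <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)

# vapply guarantees exactly one character string per file;
# the lines of each file are rejoined with "\n"
contents <- vapply(file.list,
                   function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
                   character(1))

# One row per file: the file name and its full text
df <- data.frame(file = basename(file.list), text = unname(contents),
                 stringsAsFactors = FALSE)
```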

R - Dynamic reference to files for read csv

I would like to write a script that reads data from the correct folder. Several lines in my code refer to the folder name, so I would like to make it dynamic. Is it possible to make the reference to a folder name dynamic? See below for what I would like to do:
# Clarifies the name of the folder, afterwards "Foldername" will be used as reference
FolderA <- Foldername
# Read csv to import the data from the selected location
data1 <- read.csv(file="c:/R/Foldername/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/Foldername/datafile2.csv", header=TRUE, sep=",")
I am trying to get the same result as what I would get with this code:
data1 <- read.csv(file="c:/R/FolderA/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/FolderA/datafile2.csv", header=TRUE, sep=",")
Can somebody please clarify how it would be possible to make this dynamic?
You could use paste0 for this:
FolderA <- "Foldername"
paste0("c:/R/", FolderA, "/datafile1.csv")
#[1] "c:/R/Foldername/datafile1.csv"
So in your case:
data1 <- read.csv(file=paste0("c:/R/", FolderA, "/datafile1.csv"), header=TRUE, sep=",")
A slight generalization of @LyzandeR's answer:
make_files <- function(directory, filenames) {
sprintf("C:/R/%s/%s", directory, filenames)
}
##
Files <- sprintf("file%i.csv", 1:3)
##
make_files("FolderA", Files)
#[1] "C:/R/FolderA/file1.csv" "C:/R/FolderA/file2.csv" "C:/R/FolderA/file3.csv"
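`file.path` is another base R option for this; it inserts the separators itself, so there are no slashes to get wrong. A small sketch, in which the helper name `read_from` is hypothetical and a temporary folder stands in for `c:/R`:

```r
# Hypothetical helper: read a named CSV file from a folder below a base directory
read_from <- function(base, folder, filename, ...) {
  read.csv(file.path(base, folder, filename), ...)
}

# Demo data in a temporary location standing in for c:/R/FolderA
base <- tempdir()
FolderA <- "FolderA"
dir.create(file.path(base, FolderA), showWarnings = FALSE)
write.csv(data.frame(x = 1:3),
          file.path(base, FolderA, "datafile1.csv"), row.names = FALSE)

# Changing FolderA in one place redirects every read that uses it
data1 <- read_from(base, FolderA, "datafile1.csv", header = TRUE)
```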
You could also try the following method. The loop in method 1 creates a list with one element per file; if your files all have the same column names, you could instead rbind them together (method 2). You specify your folder and then use the list.files function to find all files with the extension ".csv", so with many CSV files in a folder you won't have to write them all out individually.
# Specify working directory or location of files:
FolderA = "c:/R/Foldername"
# identify all files with specific extension (full.names = TRUE keeps the folder in each path):
files = list.files(FolderA, pattern = "\\.csv$", full.names = TRUE)
Method 1 - Separate by lists
data = list()
for(i in seq_along(files)){
  data[[i]] = read.csv(files[i], header = FALSE, stringsAsFactors = FALSE)
}
Method 2 - single dataframe
data = NULL
for(i in seq_along(files)){
  df = read.csv(files[i], header = FALSE, stringsAsFactors = FALSE)
  data = rbind(data, df)
}

Read in multiple txt files and create a list of it to access each file by accessing the list element in R

Being relatively new to R programming, I am struggling with a huge data set of 16 text files (comma-separated) saved in one directory. All the files have the same number of columns and follow the same naming convention, for example file_year_2000, file_year_2001 etc. I want to create a list in R where I can access each file individually through the list elements. Searching the web, I found some code and tried the following, but the result is one huge list (16.2 MB) whose contents are just strange. I would like to have 16 elements in the list, each representing one file read from the directory. I tried the following code, but it does not work as I want:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for( i in length(file.names)){
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
year = gsub('[^0-9]', '', file)
df_list[[year]] = file
}
Any suggestions?
Thanks in advance.
Just to give more details
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for(i in seq(length(file.names))){
year = gsub('[^0-9]', '', file.names[i])
df_list[[year]] = read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
}
Maybe it would be worth joining the data frames into one big data frame with an additional column being the year?
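A rough sketch of that idea, assuming all files share the same columns. The demo files are created under `tempdir()` and follow the `file_year_YYYY` naming convention from the question; the year is taken from the file name and stored as a column before the data frames are bound.

```r
# Demo files following the file_year_YYYY naming convention
path <- file.path(tempdir(), "year_demo")
dir.create(path, showWarnings = FALSE)
for (y in 2000:2002) {
  write.csv(data.frame(a = 1:2, b = c("x", "y")),
            file.path(path, sprintf("file_year_%d.txt", y)), row.names = FALSE)
}

# full.names = TRUE keeps the directory in each path, so no setwd() is needed
file.names <- list.files(path, pattern = "\\.txt$", full.names = TRUE)

# Read each file and tag its rows with the year taken from the file name
df_list <- lapply(file.names, function(f) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  df$year <- as.integer(gsub("[^0-9]", "", basename(f)))
  df
})

# One big data frame with a year column
all_years <- do.call(rbind, df_list)
```

A single year can then be pulled out with `all_years[all_years$year == 2000, ]`.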
I assume that instead of "access each file individually" you mean you want to access individually data in each file.
Try something like this (untested):
path = "~/.../.../.../Data_1999-2015"
file.names <- dir(path, pattern =".txt")
df_list = vector("list", length(file.names))
# create a list of data frames with correct length
names(df_list) <- rep("", length(df_list))
# give it empty names to begin with
for (i in seq_along(file.names)) {
# now i = 1,2,...,16
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
df_list[[i]] = file
# save the data
year = gsub('[^0-9]', '', file.names[i])
names(df_list)[i] <- year
}
Now you can use either df_list[[1]] or df_list[["2000"]] for year 2000 data.
I am uncertain whether you are reading your csv files from the right directory. If not, use
file <- read.csv(file.path(path, file.names[i]), header=TRUE, sep=",", stringsAsFactors=FALSE)
when reading the file.

Import all txt files in folder, concatenate into data frame, use file names as variable in R?

I have a folder with 142 tab-delimited text files. Each file has 19 variables, and then a number of rows beneath (usually no more than 30 rows, but it varies).
I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops; I got both sections of code from previous posts here at Stack Overflow but can't figure out how to combine their functions.
1. Turn the filename into a variable when reading the files into R, so that each row has the identifying file name.
2. Concatenate all files (with filename variable and no header) into one dataframe with dimensions Yx19, where Y = however many resulting rows there are.
I am able to create a list of the 142 dataframes using this code:
myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt")
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles
for(i in myFiles)
data[[i]]$Source = i
do.call(rbind, data)
I am able to create the dataframe I want with 19 variables, but the filename is not present:
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
DF <- rbind(DF, dat)
}
How do I add the file name (without .txt if possible) as a variable to the loop?
Add this line to the loop:
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
The full loop then becomes:
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
DF <- rbind(DF, dat)
}
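Shouldn't the row.names from the do.call be in the format names(list)[n].i, where i runs over 1:number_of_rows of data frame n? If so, you can just make a column from the row.names:
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles # the list names become the row-name prefixes
combined.data <- do.call(rbind, data)
# drop the trailing ".i" row index so that only the file name remains
combined.data$file_origin <- sub("\\.[0-9]+$", "", row.names(combined.data))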
You can use basename to get the last path element (the file name), for example:
(files = file.path("~","Documents","ForR",c("file1.txt", "file2.txt")))
[1] "~/Documents/ForR/file1.txt" "~/Documents/ForR/file2.txt"
(basename(files))
[1] "file1.txt" "file2.txt"
Then sub to remove the extension ".txt":
sub('.txt','',basename(files),fixed=TRUE)
[1] "file1" "file2"
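Putting the pieces together, a minimal sketch that reads every file, adds the file name without its extension as a column, and binds the result. The demo folder and the helper name `read_one` are made up for illustration; `tools::file_path_sans_ext` is used here in place of `sub` to strip the extension.

```r
# Demo: two tab-delimited files without headers
in_dir <- file.path(tempdir(), "forR_demo")
dir.create(in_dir, showWarnings = FALSE)
for (nm in c("file1", "file2")) {
  write.table(data.frame(a = 1:2, b = c("u", "v")),
              file.path(in_dir, paste0(nm, ".txt")),
              sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)
}

files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file and add its name, minus directory and extension, as a column
read_one <- function(f) {
  dat <- read.table(f, sep = "\t", header = FALSE, colClasses = "character")
  dat$file <- tools::file_path_sans_ext(basename(f))
  dat
}

# Bind all files into a single data frame
DF <- do.call(rbind, lapply(files, read_one))
```

Because the file name is stored in its own column, the row names of `DF` no longer need to carry it.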
