Picking data from all CSV files placed at a location in R

I have several CSV files at a common location. Each file has the same column names but different data, and the file names themselves differ and keep changing. Can I write code in R to read the data from all the CSV files and put it into a single data frame, without having to explicitly specify the file names? Thanks.

Have a look at list.files for listing all files at a location, read.csv for loading one file into R, and rbind to put them into a single data.frame.
The code could look like this (untested):
setwd(location)
fnames <- list.files(pattern = "\\.csv$")  # only pick up the .csv files
csv <- lapply(fnames, read.csv)
result <- do.call(rbind, csv)

filelist <- list.files(pattern = "\\.csv$") # lists only the .csv files in the folder
allcsv.files <- list()
count <- 1
for (file in filelist) {
  dat <- read.csv(file)
  allcsv.files[[count]] <- dat
  count <- count + 1
}
allfiles <- do.call(rbind.data.frame, allcsv.files)

setwd("common location")
f <- list.files(pattern = "\\.csv$")
d <- data.frame()
for (i in seq_along(f)) {
  file <- read.csv(f[i], stringsAsFactors = FALSE)
  d <- rbind(d, file)
}
colnames(d) <- c("col1", "col2")  # rename the columns if desired
write.csv(d, "combined.csv", row.names = FALSE)
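If you also want to record which file each row came from (this goes beyond what the question asked, so treat it as an optional sketch), tag the rows while reading:

```r
# Read every .csv in the working directory and tag each row with its source file
fnames <- list.files(pattern = "\\.csv$")
csvs <- lapply(fnames, function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  d$source_file <- f   # remember which file each row came from
  d
})
result <- do.call(rbind, csvs)
```

This keeps the combined data frame traceable back to the individual files without extra bookkeeping.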

Related

Loop function for reading csv files and storing them in a list

I have a folder containing approximately 10 subfolders, each with 4 .csv files. Each subfolder corresponds to a weather station, and each file in a subfolder contains temperature data for a different period
(e.g. station134_2000_2005.csv, station134_2006_2011.csv, station134_2012_2018.csv, etc.).
I wrote a loop that opens each folder and rbinds all the data into one data frame, but that is not very handy for my work.
I need a loop so that the 4 files from each subfolder are rbind-ed together into a data frame and then stored in a different "slot" of a list; or, if it is easier, the combined data for each station (i.e. each subfolder) exported from the loop as its own data frame.
The code I wrote, which opens all files in all folders and creates one big (rbind-ed) data frame, is:
library(plyr)
directory <- list.files() # the names of the subfolders
stations <- data.frame()  # to store all the rbind-ed csv files
for (i in directory) {
  periexomena <- list.files(i, full.names = TRUE, pattern = "\\.csv$")
  for (f in periexomena) {
    data_files <- read.csv(f, stringsAsFactors = FALSE, sep = ";", dec = ",")
    stations <- rbind.fill(data_files, stations)
  }
}
Does anyone know how I can get a list with each subfolder's rbind-ed 4 csv files in a different slot, or how to modify the above code to export the data from each subfolder as a separate data frame?
Try:
slotted <- lapply(setNames(nm = directory), function(D) {
  alldat <- lapply(list.files(D, pattern = "\\.csv$", full.names = TRUE),
                   function(fn) {
                     message(fn)
                     read.csv2(fn, stringsAsFactors = FALSE)
                   })
  # stringsAsFactors = FALSE is the default as of R 4.0.0
  do.call(rbind.fill, alldat)
})
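If plyr is not around and the four files per subfolder really do share identical columns (as the question says), plain rbind in base R does the same job. A sketch under that assumption:

```r
directory <- list.files()  # the subfolder names, as above
slotted <- lapply(setNames(nm = directory), function(D) {
  files <- list.files(D, pattern = "\\.csv$", full.names = TRUE)
  # read.csv2 handles the sep = ";" / dec = "," files from the question
  do.call(rbind, lapply(files, read.csv2, stringsAsFactors = FALSE))
})
# slotted[["station134"]] etc. -- one combined data frame per subfolder
```

Note that list.files() with no pattern also picks up any loose files next to the subfolders; if there are any, filter directory down to real directories first (e.g. with dir.exists).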

Importing and binding multiple and specific csv files into R

I would like to import and bind together, into one R data frame, specific CSV files named "number.CSV" (e.g. 3437.CSV), which sit in a folder alongside other CSV files that I do not want to import.
How can I select only the ones that interest me?
I have a list of all the CSV files I need; the following column shows some of them.
CODE
49002
47001
64002
84008
46003
45001
55008
79005
84014
84009
45003
45005
51001
55012
67005
19004
7003
55023
55003
76004
21013
I have got 364 csv files to read and bind.
n.b. I can't select all the "***.csv" files from my folder because I have got other files that I do not need.
Thanks
You could iterate over the list of CSV files of interest, read each one in, and bind the rows into a common data frame:
path <- "path/to/folder/"
ROOT <- c("49002", "47001", "21013")        # ...and the rest of your codes
files <- paste0(path, ROOT, ".csv")
# Read each file and stack the rows into one data frame
all_files_df <- do.call(rbind, lapply(files, read.csv))
Just make file names out of your numeric codes:
filenames = paste(code, 'csv', sep = '.')  # `code` holds the numeric codes from your list
# [1] "49002.csv" "47001.csv" "64002.csv" …
You might need to specify the full path to the files as well:
directory = '/example/path'
filenames = file.path(directory, filenames)
# [1] "/example/path/49002.csv" "/example/path/47001.csv" "/example/path/64002.csv" …
And now you can simply read them into R in one go:
data = lapply(filenames, read.csv)
Or, if your CSV files don’t have column headers (this is the case, in particular, when the file’s lines have different numbers of items!)
data = lapply(filenames, read.csv, header = FALSE)
This will give you a list of data.frames. If you want to bind them all into one table, use
data = do.call(rbind, data)
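One practical wrinkle with 364 codes: if a code has no matching file, read.csv will error and abort the whole run. Filtering with file.exists first sidesteps that (a sketch, assuming the codes sit in a character vector named code and directory is set as above):

```r
# Keep only the codes whose .csv file actually exists in the folder
filenames <- file.path(directory, paste0(code, ".csv"))
present <- file.exists(filenames)
if (any(!present)) warning("skipping missing files: ",
                           paste(filenames[!present], collapse = ", "))
data <- do.call(rbind, lapply(filenames[present], read.csv))
```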
I don't know whether you can do that from the .CSV files directly. What you can do is read in all your data and then use the command cbind.
For example:
data1 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data2 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
data3 <- read.table("~/YOUR/DATA", quote="\"", comment.char="")
And then:
df <- cbind(data1$Col1, data2$col3, ...)
Where Col1 and col3 are the names of the columns that you want.

R script to read / execute many .csv files from the command line and write all the results to a .csv file

I have many .csv files in a folder. I want to compute a binning result from each .csv file automatically, via an R script run from the command line, and write the results from all the files into a single result.csv file. For example, given file01.csv, file02.csv, file03.csv, file04.csv, and file05.csv, the script should read file01.csv and write its result to result.csv, then read file02.csv and append its result, and so on. This is a loop over all the files, and I want to execute the R script from the command line.
Here is my starting R script:
data <- read.table("file01.csv", sep = ",", header = TRUE)
df.train <- data.frame(data)
library(smbinning) # Install if necessary
# Analysis by dwell:
df.train_amp <- rbind(df.train)
res.bin <- smbinning(df = df.train_amp, y = "cvflg", x = "dwell")
res.bin # Result
# Analysis by pv
df.train_amp <- rbind(df.train)
res.bin <- smbinning(df = df.train_amp, y = "cvflg", x = "pv")
res.bin # Result
Any suggestion and support would be appreciated highly.
Thanks.
Firstly you will want to read in the files from your directory. Place all of your source files in the same source directory. I am assuming here that your CSV files all have the same shape. Also, I am doing nothing about headers here.
directory <- "C://temp" ## for example
filenames <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
bigDF <- data.frame()
for (f in seq_along(filenames)) {
  tmp <- read.csv(filenames[f], stringsAsFactors = FALSE)
  bigDF <- rbind(bigDF, tmp)
}
This adds the rows of tmp to bigDF on each pass, producing the final combined bigDF.
Writing the data frame out to a CSV is trivial in R as well. Anything like:
# Write to a file, suppress row names
write.csv(bigDF, "myData.csv", row.names=FALSE)
# Same, except that instead of "NA", output blank cells
write.csv(bigDF, "myData.csv", row.names=FALSE, na="")
# Use tabs, suppress row names and column names
write.table(bigDF, "myData.csv", sep="\t", row.names=FALSE, col.names=FALSE)
Finally, I found the above problem can be solved as follows:
library(smbinning) # Install if necessary
files <- list.files(pattern = "\\.csv$") ## vector with all the csv file names in the folder
cutpoint <- vector("list", length(files))
for (i in seq_along(files)) {
  data <- read.csv(files[i], header = TRUE)
  df.train <- data.frame(data)
  df.train_amp <- rbind(df.train, df.train, df.train, df.train,
                        df.train, df.train, df.train, df.train) # just to multiply the data
  cutpoint[[i]] <- smbinning(df = df.train_amp, y = "cvflg", x = "dwell")$cuts # smbinning is calculated here
}
result <- cbind(files, sapply(cutpoint, paste, collapse = ";")) # produce the cutpoint results
write.csv(result, "result_dwell.csv") # write the result into a csv file
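Since the question asks for execution from the command line, one way is to wrap the read-and-combine step in a function and pull the folder and output file from commandArgs. The script name combine.R and the argument order are my own choices, and the smbinning step is left out for brevity; run it as Rscript combine.R /path/to/folder result.csv:

```r
# combine.R -- run as:  Rscript combine.R /path/to/folder result.csv
combine_csvs <- function(folder, outfile) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
  write.csv(combined, outfile, row.names = FALSE)
  invisible(combined)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) combine_csvs(args[1], args[2])
```

The per-file binning from the question would slot into the lapply step, with one result row per input file.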

read in csv-files with the same name from subdirectories in R

I have 2 folders named facebookdata1 and facebookdata2. These folders contain CSV files with exactly the same names ("activities", "user", and so on), and each folder contains the same number of files.
I have to read in and merge (rbind...) the equally named csv files from the 2 different folders into R.
I know I can read in all the csv files from one folder in by using this:
temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))
The following should do the job:
directories <- c("path/to/facebookdata1", "path/to/facebookdata2")
files <- lapply(directories, list.files, pattern="*.csv", full.names = TRUE)
files <- lapply(files, sort)
dat <- Map(function(x, y) { rbind(read.csv(x), read.csv(y)) },
           files[[1]], files[[2]])
Now they are list elements of dat.
If you want to assign them to the global environment, use
list2env(dat, envir = .GlobalEnv)
You can speed the process up by using data.table as follows (note that rbindlist expects a list of tables):
require(data.table)
dat <- Map(function(x, y) { rbindlist(list(fread(x), fread(y))) },
           files[[1]], files[[2]])
You are just missing an rbind! I tidied up your variable names a bit by removing the ".csv".
files1 <- list.files(path = "facebookdata1/", pattern = "\\.csv$")
files2 <- list.files(path = "facebookdata2/", pattern = "\\.csv$")
if (length(setdiff(files1, files2)) > 0)
  stop("Actually, the two directories do not have the same files")
for (file in files1) {
  varname <- substr(file, start = 1, stop = nchar(file) - 4)
  data1 <- read.csv(file.path("facebookdata1", file))
  data2 <- read.csv(file.path("facebookdata2", file))
  assign(varname, rbind(data1, data2))
}
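A variant of the same loop that avoids assign and collects the merged tables in a named list instead (same directory-layout assumptions as above):

```r
# One list entry per shared file name, each holding the two folders' rows stacked
files1 <- list.files("facebookdata1", pattern = "\\.csv$")
merged <- lapply(setNames(nm = files1), function(file) {
  rbind(read.csv(file.path("facebookdata1", file)),
        read.csv(file.path("facebookdata2", file)))
})
# merged[["activities.csv"]], merged[["user.csv"]], ...
```

A list keeps the results easy to iterate over later, whereas assign scatters them as loose variables in the workspace.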

Import all txt files in folder, concatenate into data frame, use file names as variable in R?

I have a folder with 142 tab-delimited text files. Each file has 19 variables, and then a number of rows beneath (usually no more than 30 rows, but it varies).
I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops; I got both sections of code below from previous posts here at Stack Overflow but can't figure out how to combine what they do. I want to:
Turn the file name into a variable when reading the files into R, so that each row carries the name of the file it came from.
Concatenate all files (with the filename variable and no header) into one data frame with dimensions Y x 19, where Y is however many rows result.
I am able to create a list of the 142 dataframes using this code:
myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt")
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles
for(i in myFiles)
data[[i]]$Source = i
do.call(rbind, data)
I am able to create the dataframe I want with 19 variables, but the filename is not present:
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
  dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
  DF <- rbind(DF, dat)
}
How do I add the file name (without .txt if possible) as a variable to the loop?
Add this to the loop:
dat$file <- unlist(strsplit(f,split=".",fixed=T))[1]
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
  dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
  dat$file <- unlist(strsplit(f, split=".", fixed=TRUE))[1]
  DF <- rbind(DF, dat)
}
Shouldn't the row.names from the do.call result be in the format names(list)[n].i, where i runs over the rows of data frame n? If so, you can just make a column from the row names:
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
combined.data <- do.call(rbind, data)
combined.data$file_origin <- row.names(combined.data)
You can use basename to get the last path element( filename) , for example:
(files = file.path("~","Documents","ForR",c("file1.txt", "file2.txt")))
"~/Documents/ForR/file1.txt" "~/Documents/ForR/file2.txt"
(basename(files))
[1] "file1.txt" "file2.txt"
Then sub to remove the extension ".txt":
sub('.txt','',basename(files),fixed=TRUE)
[1] "file1" "file2"
