This question already has answers here:
How to import multiple .csv files at once?
(15 answers)
Closed 2 years ago.
I have two lists: one with the Excel file paths that I would like to read, and another with the names that I would like to assign to each resulting dataframe. I am trying to create a loop using the code below, but it only creates a single dataframe named n. Any idea how to make this work?
files <- c("file1.xlsx", "file2.xlsx")
names <- c('name1', 'name2')

for (f in files) {
  for (n in names) {
    n <- read_excel(path = f)
  }
}
You are overwriting n on each iteration of the loop, so only the last file read survives.
Edit:
#Parfait commented that we shouldn't use assign if we can avoid it, and he is right (e.g. why-is-using-assign-bad)
This does not use assign and puts the data in a neat list:
files <- c("file1.xlsx","file2.xlsx")
names <- c('name1','name2')
result <- list()
for (i in seq_along(files)) {
  result[[names[i]]] <- read_excel(path = files[i])
}
Old and not recommended answer (only left here for transparency reasons):
We can use assign to use a character string as variable name:
files <- c("file1.xlsx","file2.xlsx")
names <- c('name1','name2')
for (i in seq_along(files)) {
  assign(names[i], read_excel(path = files[i]))
}
An alternative is to loop through all Excel files in a folder, rather than a list. I'm assuming they exist in some kind of folder, somewhere.
# load names of excel files
files = list.files(path = "C:/your_path_here/", full.names = TRUE, pattern = "\\.xlsx$")
# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
  sheets <- readxl::excel_sheets(filename)
  sapply(sheets, function(s) {
    x <- readxl::read_excel(filename, sheet = s)
    if (!tibble) x <- as.data.frame(x)
    x
  }, simplify = FALSE)
}
# execute function for all excel files in "files"
all_data <- lapply(files, read_excel_allsheets)
How can I read many CSV files and make each of them into data tables?
I have files 'A1.csv', 'A2.csv', 'A3.csv', ... in folder 'A'. So I tried this:
link <- c("C:/A")
filename <- list.files(link)
listA <- c()

for (x in filename) {
  temp <- read.csv(paste0(link, x), header = FALSE)
  listA <- list(unlist(listA, recursive = FALSE), temp)
}
It doesn't work well. How can I do this?
Write a regex to match the filenames:
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(link), value = TRUE)
and then run the same loop, but use assign to dynamically name the dataframes if you want:
for (file in files) {
  assign(paste0(file, "_df"), read.csv(file))
}
But in general, introducing unknown variables into the scope is bad practice, so it might be best to use a loop like:
dfs <- list()
for (index in seq_along(files)) {
  file <- files[index]
  dfs[[index]] <- read.csv(file)
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
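As a concrete (hypothetical) example of that pattern, suppose you want to tag each frame with the file it came from; this assumes `names(list_of_frames)` was set to the file paths as in the optional step above, and the column name `source_file` is just an illustration:

```r
# Add a 'source_file' column to every frame, using the list names set earlier
new_list_of_frames <- lapply(names(list_of_frames), function(nm) {
  x <- list_of_frames[[nm]]
  x$source_file <- basename(nm)  # hypothetical column name
  x
})
names(new_list_of_frames) <- names(list_of_frames)
```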
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path, as it will attempt to determine the path separator for the operating system the code is running on. So you want to use:
read.csv(file.path(link, x), header = FALSE)
Better yet, just have the full path returned when listing out the files (and can filter for .csv):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <- list.files(link, full.names = TRUE, pattern = "csv$")
for (file in files) {
  assign(paste0(basename(file), "_df"), read.csv(file))
}
This question already has answers here:
How to import multiple .csv files at once?
(15 answers)
Closed 4 years ago.
I have a number of files progressively named from 1 to 22, i.e. "chr1.csv", "chr2.csv" ... "chr22.csv". Each file contains a database with the same variables as columns. I would like to create a loop so as to read these files and save them as elements of a list.
I have tried this snippet of code but it is not working.
file <- vector("character", 22)
data <- vector("list", length = 22)

for (i in 1:22) {
  str <- gsub("number", i, "chrnumber.csv")
  file[i] <- str
}

for (i in file) {
  data <- read.csv(i, sep = "")
}
Easier way to create your files vector:
file <- paste0("chr", 1:22, ".csv")
And you have to subset your output list to save the results:
data <- vector("list", length = 22)
names(data) <- file
for (i in file) {
  data[[i]] <- read.csv(i, sep = "")
}
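Once the loop finishes, each file's data can be pulled back out of the list by its name, for example:

```r
head(data[["chr1.csv"]])  # the frame read from chr1.csv
sapply(data, nrow)        # row counts for all 22 frames
```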
This question already has an answer here:
How can I read multiple (excel) files into R? [duplicate]
(1 answer)
Closed 7 years ago.
I have yearly stock data for the last 15 years in a folder containing 15 files (one file per year). This folder is also set as my working directory. I can read each file separately and save it to a variable, but I want to make a loop or function to read all the files and create a variable for each year. I have tried the following code but cannot get the desired results. Any help?
Reading each file separately:
allData_2000 <- read.csv("......../Data_1999-2015/scrip_high_low_year_2000.txt", sep = ",", header = TRUE, stringsAsFactors = FALSE)
allData_2001 <- read.csv("......../Data_1999-2015/scrip_high_low_year_2001.txt", sep = ",", header = TRUE, stringsAsFactors = FALSE)
But I would like to read all the files using a loop:
path <- "....Data_1999-2015"
files <- list.files(path = path, pattern = "*.txt")

for (file in files) {
  perpos <- which(strsplit(file, "")[[1]] == ".")
  assign(
    gsub(" ", "", substr(file, 1, perpos - 1)),
    read.csv(paste(path, file, sep = ",", header = TRUE, stringsAsFactors = FALSE)))
}
Try this improved code:
library(tools)
library(data.table)
files <- list.files(pattern = "*.csv")
for (f in seq_along(files)) {
  assign(paste0("AllData_", gsub("[^0-9]", "", file_path_sans_ext(files[f]))),
         fread(files[f]))
}
Try something like this, maybe.
df_list = list()
counter = 1
for (file in files) {
  temp_df = read.csv(paste0(path, '/', file), header = T, stringsAsFactors = F)
  temp_df$year = gsub('[^0-9]', '', file)
  df_list[[counter]] = temp_df
  counter = counter + 1
}
big_df = do.call(rbind, df_list)
Create an empty list, then iterate through the files, reading them in. Remove any non-numeric characters from each filename to get the year (this is based on what your files look like above: some text along with the year; if the files don't look like that, you'll need a different method than the gsub I used), create that as a new variable, and store the whole dataframe in the list. Then bind the dataframes into a single dataframe at the end.
Edit: upon a reread of your question, I'm not sure if what I told you to do is what you want to do. If you just want to load all the dataframes into memory and give each a variable so that you can access them, without putting them into a single dataframe, I'd probably do something like this:
df_list = list()
for (file in files) {
  temp_df = read.csv(paste0(path, '/', file), header = T, stringsAsFactors = F)
  year = gsub('[^0-9]', '', file)
  df_list[[year]] = temp_df
}
Then each dataframe can be accessed like: df_list[['2000']] would be the dataframe for the year 2000.
Being relatively new to R programming, I am struggling with a data set of 16 text files (comma separated) saved in one directory. All the files have the same number of columns and the same naming convention, for example file_year_2000, file_year_2001, etc. I want to create a list in R where I can access each file individually via the list elements. Searching the web, I found some code and tried the following, but as a result I get one huge list (16.2 MB) where the output is just strange. I would like to have 16 elements in the list, each representing one file read from the directory. I tried the following code but it does not work as I want:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern = ".txt")
length(file.names)

df_list = list()
for (i in length(file.names)) {
  file <- read.csv(file.names[i], header = TRUE, sep = ",", stringsAsFactors = FALSE)
  year = gsub('[^0-9]', '', file)
  df_list[[year]] = file
}
Any suggestions?
Thanks in advance.
Just to give more details:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern = ".txt")
length(file.names)

df_list = list()
for (i in seq(length(file.names))) {
  year = gsub('[^0-9]', '', file.names[i])
  df_list[[year]] = read.csv(file.names[i], header = TRUE, sep = ",", stringsAsFactors = FALSE)
}
Maybe it would be worth joining the data frames into one big data frame with an additional column being the year?
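A minimal sketch of that idea, assuming df_list is named by year as in the loop above and all files share the same columns:

```r
# Copy each year from the list name into a column, then stack the frames
for (year in names(df_list)) {
  df_list[[year]]$year <- year
}
big_df <- do.call(rbind, df_list)
```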
I assume that instead of "access each file individually" you mean you want to individually access the data in each file.
Try something like this (untested):
path = "~/.../.../.../Data_1999-2015"
file.names <- dir(path, pattern = ".txt")

# create a list of data frames with the correct length
df_list = vector("list", length(file.names))
# give it empty names to begin with
names(df_list) <- rep("", length(df_list))

for (i in seq_along(file.names)) {
  # now i = 1, 2, ..., 16
  file <- read.csv(file.names[i], header = TRUE, sep = ",", stringsAsFactors = FALSE)
  # save the data
  df_list[[i]] = file
  year = gsub('[^0-9]', '', file.names[i])
  names(df_list)[i] <- year
}
Now you can use either df_list[[1]] or df_list[["2000"]] for year 2000 data.
I am uncertain whether you are reading your csv files from the right directory. If not, use
file <- read.csv(file.path(path, file.names[i]), header = TRUE, sep = ",", stringsAsFactors = FALSE)
when reading the file.
I am new to R and am currently trying to apply a function to worksheets of the same index in different workbooks using the package XLConnect.
So far I have managed to read in a folder of excel files:
filenames <- list.files( "file path here", pattern="\\.xls$", full.names=TRUE)
and I have looped through the different files, reading each worksheet of each file:
for (i in 1:length(filenames)) {
  tmp <- loadWorkbook(file.path(filenames[i], sep = ""))
  lst <- readWorksheet(tmp,
                       sheet = getSheets(tmp), startRow = 5, startCol = 1, header = TRUE)
}
What I think I want to do is loop through the files in filenames, take the worksheets with the same index (e.g. the 1st worksheet of all the files, then the 2nd worksheet of all the files, etc.), and save these to a new workbook (the first workbook containing all the 1st worksheets, then a second workbook with all the 2nd worksheets, etc.), with a new sheet for each original sheet taken from the previous files, and then use
for (sheet in newlst) {
  Count <- t(table(sheet$County))
}
and apply my function to the parameter Count.
Does anyone know how I can do this or offer me any guidance at all? Sorry if it is not clear, please ask and I will try to explain further! Thanks :)
If I understand your question correctly, the following should solve your problem:
require(XLConnect)
# Example: workbooks w1.xls - w[n].xls each with sheets S1 - S[m]
filenames = list.files("file path here", pattern = "\\.xls$", full.names = TRUE)
# Read data from all workbooks and all worksheets
data = lapply(filenames, function(f) {
  wb = loadWorkbook(f)
  readWorksheet(wb, sheet = getSheets(wb)) # read all sheets in one go
})
# Assumption for this example: all workbooks contain same number of sheets
nWb = sapply(data, length)
stopifnot(diff(range(nWb)) == 0)
nWb = min(nWb)
for (i in seq(length.out = nWb)) {
  # List of data.frames of all i'th sheets
  dout = lapply(data, "[[", i)
  # Note: write all collected sheets in one go ...
  writeWorksheetToFile(file = paste0("wout", i, ".xls"), data = dout,
                       sheet = paste0("Sout", seq(length.out = length(data))))
}
Using gdata or xlsx might be best. Even though gdata is the slowest option out there, it's one of the more intuitive ones.
This method fails if the Excel files' sheets are ordered differently.
Given there's no real example, here's some food for thought.
gdata requires Perl to be installed.
library(gdata)

filenames <- list.files("file path here", pattern = "\\.xls$", full.names = TRUE)
amountOfSheets <- # input something

# This snippet gets the nth sheet out of all the files and combines them
readXlsSheet <- function(whichSheet, files){
  for (i in seq(files)) {
    piece <- read.xls(files[i], sheet = whichSheet)
    # rbinding frames should work if the sheets are similar; use merge if not
    if (i == 1) complete <- piece else complete <- rbind(complete, piece)
  }
  complete
}
# Now loop through all the sheets using the previous method
animals <- lapply(seq(amountOfSheets), readXlsSheet, files = filenames)
# The output is a list where animals[[n]] has all the nth sheets combined.
xlsx might only work on 32-bit R, and has its own fair share of issues.
library(xlsx)

filenames <- list.files(pattern = "\\.xls$", full.names = TRUE)
amountOfSheets <- 2 # input something

# This snippet gets the nth sheet out of all the files and combines them
readXlsSheet <- function(whichSheet, files){
  for (i in seq(files)) {
    piece <- read.xlsx(files[i], sheetIndex = whichSheet)
    # rbinding frames should work if the sheets are similar; use merge if not
    if (i == 1) complete <- piece else complete <- rbind(complete, piece)
  }
  complete
}
readXlsSheet(1, filenames)
#Now looping through all the sheets using the previous method
animals <- lapply(seq(amountOfSheets), readXlsSheet, files=filenames)