R - iterate over list of files

I have a list of files with some data that I want to read into R and then iterate over, performing some calculations on each file.
So far I have been able to read the files with the following code:
Method 1:
filenames <- list.files(pattern = "*.txt")
mynames <- gsub(".txt$", "", filenames)
for (i in 1:length(mynames)) assign(mynames[i], read.table(filenames[i]))
However, when I try to apply a function to the names in mynames, it just returns NULL:
lapply(mynames, nrow)
I know that it could be easier to read the files directly into a list
Method 2:
temp <- list.files(pattern = "*.txt")
myfiles <- lapply(temp, read.table, skip = "#")
and then run lapply on that list, lapply(myfiles, nrow), but this loses the information about which file produced each list element.
Is there any way to work around this with either method, so as to keep track of which file each list element came from?
Thanks

For the first method, try
sapply(mynames, function(nameoffile) nrow(get(nameoffile)))
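If you would rather pull those per-file objects back into a single named list, mget() can do it in one step (a small sketch, assuming the objects created by assign() above still exist in the global environment):
# Gather the previously assign()ed data frames into one named list,
# so the row counts keep their file-derived names automatically.
mytables <- mget(mynames, envir = .GlobalEnv)
sapply(mytables, nrow)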

For method 2 you could easily use something like
temp <- list.files(pattern = "*.txt")
myfiles <- lapply(temp, read.table, skip = "#")
names(myfiles) <- temp
In this way the names attribute stores the filenames and you do not clutter your working environment with new variables.
So when you want to iterate over the content you can use lapply(myfiles, function(.) nrow(.)), or if you need to iterate over both the filename and the content you can do something like lapply(names(myfiles), function(.) nrow(myfiles[[.]])).
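A slightly tidier way to walk over the filename and its content at the same time is Map(), sketched here on the same myfiles list:
# Pair each file name with the row count of its table.
Map(function(fname, df) data.frame(file = fname, rows = nrow(df)),
    names(myfiles), myfiles)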

Related

Import directory of docx files

I have a directory with .docx files that I want to import via textreadr's read_docx function.
First I set the working directory and create list of files:
setwd("C:/R")
files <- list.files("C:/R", pattern = "\\.docx")
Now I want to iterate through the list and import every file individually into an object named data_<file>:
for (file in files) {
assign("data_", file, sep = "") <- read_docx("file")
}
Alternatively, I tried creating a list of lists:
data_list <- lapply(files, function(v){
read_docx("v")
})
Neither variant works, and I'm not sure what I'm doing wrong.
Maybe the full path is not present; we can add
files <- list.files("C:/R", pattern = "\\.docx", full.names = TRUE)
The issue is that v (or file) is wrapped in quotes, i.e. the code tries to read the literal string "v" instead of the value of the variable. Thus, the code in the OP's post can be corrected to
data_list <- lapply(files, function(v){
read_docx(v)
})
or in the for loop
for (file in files) {
assign(paste0("data_", file), read_docx(file))
}
Also, as noted in the comments, if there are 1000 files, assign creates 1000 new objects, which gets messy when we want to gather them all again. Instead, as with the lapply, which creates a single list, the output from the for loop can be stored in a list:
data_list2 <- vector('list', length(files))
names(data_list2) <- files
for (file in files) {
  data_list2[[file]] <- read_docx(file)
}
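Either way each parsed document stays addressable by the name it was stored under, for example (the path below is only an illustration):
# Look up one parsed document directly; the key is the full path here,
# because list.files(..., full.names = TRUE) was used above.
data_list2[["C:/R/report.docx"]]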
First off, you need to grab the full path instead of just the filenames from list.files:
files <- list.files("C:/R", pattern = "\\.docx$", full.names = TRUE)
Then the lapply solution works if you pass the parameter v to read_docx instead of a literal string "v". You don’t even need the nested function:
data_list <- lapply(files, read_docx)
As an aside, there’s no need for setwd in your code, and its use is strongly discouraged.
Furthermore, using the assign function as in your code doesn't work, and even after fixing the syntax its use here is simply inappropriate: at best it is a hack that badly approximates the functionality of lists. The correct solution, ten times out of ten, is to use a named list or vector in its place.
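To keep the file-to-content mapping with that approach, you can simply name the list elements after the files (a short sketch continuing from the lapply call above; "report" is a made-up file name):
# Name each element after its source file, stripped of path and extension.
names(data_list) <- tools::file_path_sans_ext(basename(files))
data_list[["report"]]   # i.e. the data from a hypothetical report.docx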

R: Generate dynamic names for dataframes

I need to read several csv files from a directory and save each one in a separate dataframe.
The filenames are in a character vector:
lcl_forecast_data_files <- dir(lcl_forecast_data_path, pattern=glob2rx("*.csv"), full.names=TRUE)
For example: "fruc2021.csv", "gem2020.csv", "strb2021.csv".
So far I am reading the files step by step:
fruc2021 <- read_csv2("fruc2021.csv")
gem2020 <- read_csv2("gem2020.csv")
strb2021 <- read_csv2("strb2021.csv")
But there are many more files in the directory and subdirectories. To read them all one by one is very tedious.
Now I have already experimented a little with the map function, but I have not yet figured out how to automatically generate the names of the dataframes from the file names.
A first simple try was:
lcl_forecast_data <- lcl_forecast_data_files %>%
  map(
    function(x) {
      str_replace(basename(x), ".csv", "") <- read_csv2(x)
    }
  )
But this did not work :-(
Is it even possible to generate names for dataframes like this?
Or are there other, simpler possibilities?
Greetings
Benne
If you do not want to use a list and lapply as @Onyambu suggested, you can use assign() to generate the dataframes.
filenames <- c("fruc2021.csv", "gem2020.csv", "strb2021.csv")
for (i in filenames) {
  assign(gsub("\\.csv$", "", i), read.csv(i))
}
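If a single named list later turns out to be easier to manage than many separate data frames, the same file vector can drive a purrr-based version instead (a sketch using the readr/purrr functions from the question):
library(purrr)
library(readr)
# Read every csv into one list, named after the file without path or extension.
frame_names <- tools::file_path_sans_ext(basename(lcl_forecast_data_files))
lcl_forecast_data <- map(set_names(lcl_forecast_data_files, frame_names), read_csv2)
lcl_forecast_data[["fruc2021"]]   # one of the example files from the question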

How to get named list when reading multiple csv files from given folder?

Assume I have several csv files in a given folder. When I read them, I get an unnamed list where each element has a default numeric index. How can I read them in as a named list instead? Here is the code I have so far:
Regarding reproducible data, it is fine to use any public dataset.
file <- list.files(folder, full.names = TRUE, "\\.csv$")
f.read <- lapply(1:length(file), function(ele_) {
  res <- as(read.csv(file[ele_]), "data.frame")
  res
})
I am expecting a named list instead of the default unnamed list. My desired output: after reading the csv files from the folder, each list element should carry the name of the file it came from rather than a numeric index. How can I get this output easily? Thank you
We can use setNames with the extracted file names (via basename and file_path_sans_ext):
setNames(lapply(file, read.csv), tools::file_path_sans_ext(basename(file)))
Alternatively, you can pass a named vector into lapply:
f.read <- lapply(setNames(file, tools::file_path_sans_ext(basename(file))), read.csv)
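With the names in place, each table can then be pulled out directly by file name, for example:
# "mtcars" stands for a hypothetical mtcars.csv in the folder.
head(f.read[["mtcars"]])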

How to create an object by adding a variable to a fixed value?

I am trying to write a program to open a large number of files and run them through a function I made called "sort". Every one of my file names starts with "sa1"; however, after that the characters vary from file to file. I was hoping to do something along the lines of this:
for (x in c("Put", "Characters", "which", "Vary", "by", "File", "here")) {
  sa1+x <- read.csv("filepath/sa1+x", header = FALSE)
  sa1+x = sort(sa1+x)
  return(sa1+x)
}
In this case, say that x was 88: it would open the file sa188, name that dataframe sa188, and then run it through the function sort. I don't think that writing sa1+x is the correct way to bind together two values, but I don't know a way that works.
You need to use a list to contain the data in each csv file, and loop over the filenames using paste0.
file_suffixes <- c("put","characters","which","vary","by","file","here")
numfiles <- length(file_suffixes)
list_data <- list()
sorted_data <- list()
filename <- "filepath/sa1"
for (x in 1:numfiles) {
  list_data[[x]] <- read.csv(paste0(filename, file_suffixes[x]), header = FALSE)
  sorted_data[[x]] <- sort(list_data[[x]])
}
I am not sure why you use return in that loop. If you're writing a function, you should be returning the sorted_data list which contains all your post-sorting data.
Note: you shouldn't call your function sort because there is already a base R function called sort.
Additional note: you can use dir() and regex parsing to find all the files which start with "sa1" and loop over all of them, thus freeing you from having to specify the file_suffixes.
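A minimal sketch of that last point (the "filepath" directory and the sort_frame() name, standing in for your renamed sorting function, are assumptions):
# Find every file under "filepath" whose name starts with "sa1",
# read each one, apply the sorting function, and keep the results in a named list.
sa1_files <- list.files("filepath", pattern = "^sa1", full.names = TRUE)
sorted_data <- lapply(sa1_files, function(f) sort_frame(read.csv(f, header = FALSE)))
names(sorted_data) <- basename(sa1_files)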

Read, process and export analysis results from multiple .csv files in R

I have a bunch of CSV files and I would like to perform the same analysis (in R) on the data within each file. Firstly, I assume each file must be read into R (as opposed to running a function on the CSV and providing output, like a sed script).
What is the best way to input numerous CSV files to R, in order to perform the analysis and then output separate results for each input?
Thanks (btw I'm a complete R newbie)
You could go for Sean's option, but it's going to lead to several problems:
You'll end up with a lot of unrelated objects in the environment, with the same name as the file they belong to. This is a problem because...
For loops can be pretty slow, and because you've got this big pile of unrelated objects, you're going to have to rely on for loops over the filenames for each subsequent piece of analysis - otherwise, how the heck are you going to remember what the objects are named so that you can call them?
Calling objects by pasting their names in as strings - which you'll have to do, because, again, your only record of what the object is called is in this list of strings - is a real pain. Have you ever tried to call an object when you can't write its name in the code? I have, and it's horrifying.
A better way of doing it might be with lapply().
# List files
filelist <- list.files(pattern = "*.csv")
# Now we use lapply to perform a set of operations
# on each entry in the list of filenames.
to_dispose_of <- lapply(filelist, function(x) {
  # Read in the file specified by 'x' - an entry in filelist
  data.df <- read.csv(x, skip = 1, header = TRUE)
  # Store the filename, minus .csv. This will be important later.
  filename <- substr(x = x, start = 1, stop = (nchar(x) - 4))
  # Your analysis work goes here. You only have to write it out once
  # to perform it on each individual file.
  ...
  # Eventually you'll end up with a data frame or a vector of analysis
  # to write out. Great! Since you've kept the value of x around,
  # you can do that trivially
  write.table(x = data_to_output,
              file = paste0(filename, "_analysis.csv"),
              sep = ",")
})
And done.
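If you also want the per-file results available in your R session rather than only written to disk, the same lapply() result can simply be kept and named (a sketch; nrow() stands in for whatever analysis you actually run):
# Keep each file's analysis result in a named list instead of discarding it.
results <- lapply(filelist, function(x) {
  data.df <- read.csv(x, skip = 1, header = TRUE)
  nrow(data.df)   # placeholder for the real analysis
})
names(results) <- tools::file_path_sans_ext(basename(filelist))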
You can try the following code after putting all the csv files in the same directory.
csv_names <- list.files(pattern = "*.csv")   # csv file names
for (i in 1:length(csv_names)) {
  assign(csv_names[i], read.csv(csv_names[i], skip = 1, header = TRUE))
}
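If you go this route and later need all of those objects together again, mget() can gather them back into a named list (a small sketch, assuming the loop above has already run):
# Collect every data frame created by assign() back into one named list.
all_data <- mget(csv_names)
sapply(all_data, nrow)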
Hope this helps!

Resources