Load all files from a folder and its subfolders in R

I know about source() in R.
It takes a path and/or file name and loads it, e.g. a function that is saved in another .R file. What I need is basically the same command, but one that loads every .R file from a folder and its subfolders.
Is there a one-liner (some library), or would I have to write a loop and everything?

This might work (it sources every file ending in .R in the working directory and its subfolders):
lapply(list.files(pattern = "[.]R$", recursive = TRUE), source)
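If the scripts live somewhere other than the working directory, list.files() has to be told where to look and asked for full paths. Here is a minimal sketch of that. It assumes you also want the loaded objects kept in their own environment rather than in the global workspace. source_folder() is a hypothetical name; sys.source() is the base-R function that evaluates a file in a given environment.

```r
# Source every .R/.r file below `path` into an environment (hypothetical helper).
source_folder <- function(path, envir = new.env()) {
  # full.names = TRUE keeps the folder prefix, so this works from any working directory
  files <- list.files(path, pattern = "[.][Rr]$", recursive = TRUE, full.names = TRUE)
  for (f in files) {
    sys.source(f, envir = envir)
  }
  invisible(envir)
}

# Usage (hypothetical folder name):
# env <- source_folder("R_scripts")
# ls(env)
```

Calling it without an envir argument keeps everything out of the global workspace. Pass envir = globalenv() to get the same effect as the one-liner above.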

In the R help page for source() (see ?source), you can find the following:
## If you want to source() a bunch of files, something like
## the following may be useful:
sourceDir <- function(path, trace = TRUE, ...) {
  for (nm in list.files(path, pattern = "[.][RrSsQq]$")) {
    if(trace) cat(nm,":")
    source(file.path(path, nm), ...)
    if(trace) cat("\n")
  }
}
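The help-page version above only looks at the top level of path. Adding recursive = TRUE to list.files() is enough to pick up subfolders, because the returned names are relative to path (e.g. "sub/helpers.R") and file.path() rebuilds the full path. The sketch below shows that variant. sourceDirRecursive() is a hypothetical name, and extra arguments such as local = are passed through to source().

```r
# Variant of the ?source example that also descends into subfolders (hypothetical name).
sourceDirRecursive <- function(path, trace = TRUE, ...) {
  # With recursive = TRUE, names come back relative to `path`, e.g. "sub/helpers.R",
  # and directories are not listed (include.dirs defaults to FALSE)
  for (nm in list.files(path, pattern = "[.][RrSsQq]$", recursive = TRUE)) {
    if (trace) cat(nm, ":")
    source(file.path(path, nm), ...)
    if (trace) cat("\n")
  }
  invisible(NULL)
}

# Usage: load everything below "R" into a dedicated environment
# e <- new.env(); sourceDirRecursive("R", local = e)
```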

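You could use a simple recursive function like
sourceRecursive <- function(path = ".") {
  dirs <- list.dirs(path, recursive = FALSE)
  # Non-recursive listings always include subfolder names, so match the
  # .R/.r extension explicitly and drop any folders that still match
  files <- list.files(path, pattern = "[.][Rr]$", full.names = TRUE)
  files <- files[!dir.exists(files)]
  for (f in files)
    source(f)
  for (d in dirs)
    sourceRecursive(d)
}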

Related

Iterate over multiple subdirectories to read.csv of a specific file

I have a folder with over 100 subfolders that each contain a specific CSV file, "cats.csv", that I need to read into R.
So far I've got:
parent_folder <- "path of parent files"
sub_folders <- list.dirs(parent_folder, recursive = TRUE)[-1]
cat_files <- dir(sub_folders, recursive = TRUE, full.names = TRUE, pattern = "cats")
I've then tried variations of lapply and map to apply read.csv to all of the cat_files, but it doesn't seem to work.
filelist <- list.files(pattern = "^cats\\.csv$", recursive = TRUE, full.names = TRUE)
then
lapply(setNames(nm = filelist), read.csv)
Edit, with thanks to r2evans below:
We get the paths using Sys.glob (check that it returns what you want) and then use Map to get a named list, DFs, of data frames with the files' contents.
paths <- parent_folder |>
file.path("*", "cats.csv") |>
Sys.glob()
DFs <- Map(read.csv, paths)
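If you eventually want one data frame rather than a named list, one option is to record each file's path in a column and stack the pieces with rbind(). Here is a base-R sketch. It assumes every cats.csv sits exactly one level below parent_folder and has the same columns. read_all_cats() and the source_file column are hypothetical names.

```r
# Read every <parent_folder>/*/cats.csv and stack them (hypothetical helper).
read_all_cats <- function(parent_folder, file_name = "cats.csv") {
  paths <- Sys.glob(file.path(parent_folder, "*", file_name))
  dfs <- lapply(paths, function(p) {
    df <- read.csv(p)
    df$source_file <- p  # remember which file each row came from
    df
  })
  do.call(rbind, dfs)    # returns NULL if no files were found
}

# Usage:
# cats <- read_all_cats(parent_folder)
```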

For a list of folders remove all the subfolders that match list of pattern with R

I'm looking for a way to remove all subfolders matching a list of patterns (c('*_003','*_007','*_011','*_012')) contained in several folders.
I've come up with this function, but unfortunately nothing happens when I run it :/
Any hints would be greatly appreciated! :)
myfunction <- function(folders_list) {
for(dir in Sys.glob(folders_list)) {
files.to.keep <- c('*_003','*_007','*_011','*_012')
files.in.dir <- list.files(pattern = paste0(files.to.keep), recursive = TRUE)
files.to.remove <- list(files.in.dir[!(files.in.dir %in% grep(paste(files.to.keep,collapse = "|"), files.in.dir, value=TRUE))])
do.call(unlink, files.to.remove)
}}
myfunction('C:/Users/Users/myfolder/project/report_201*/')
You had a few issues here, but the main one was this line:
files.in.dir <- list.files(pattern = paste0(files.to.keep), recursive = TRUE)
You were not passing the directory given to the function as an argument on to list.files(), so it was returning the files in the working directory.
You could try this:
myfunction <- function(folders_list) {
  # grepl() patterns are regular expressions, not globs, so no leading "*"
  files_to_keep_pattern <- paste(c('_003', '_007', '_011', '_012'), collapse = "|")
  for (each_dir in Sys.glob(folders_list)) {
    files_in_dir <- dir(each_dir, recursive = TRUE, full.names = TRUE)
    files_to_delete <- files_in_dir[!grepl(files_to_keep_pattern, files_in_dir)]
    message("Deleting ", length(files_to_delete), " files")
    print(files_to_delete)
    # unlink(files_to_delete)  # pass the vector directly, not via do.call()
  }
}
Having said that, I would be extremely careful when selecting which files to delete with a pattern. I have in fact commented out the line that does the deletion and just made the function print the files it is going to delete. Uncomment the unlink() line only if you are really sure you want to do this.
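The question asks about removing subfolders, while dir(recursive = TRUE) above lists only files. Below is a sketch of a folder-level version. It assumes the subfolders to delete are the immediate children of each matched folder and that their names end in one of the suffixes. remove_matching_subdirs() and dry_run are hypothetical names. By default it only reports what it would delete.

```r
# Report (or delete) subfolders whose names end in one of the suffixes (hypothetical helper).
remove_matching_subdirs <- function(folders_glob,
                                    suffixes = c("_003", "_007", "_011", "_012"),
                                    dry_run = TRUE) {
  # Regex such as "(_003|_007|_011|_012)$": the folder name must END with a suffix
  pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")
  # Sys.glob() expands wildcards such as report_201*; keep only real folders
  parents <- Sys.glob(folders_glob)
  parents <- parents[dir.exists(parents)]
  to_remove <- character(0)
  for (parent in parents) {
    # recursive = FALSE: immediate subfolders only, and `parent` itself is not returned
    subdirs <- list.dirs(parent, full.names = TRUE, recursive = FALSE)
    to_remove <- c(to_remove, subdirs[grepl(pattern, basename(subdirs))])
  }
  if (dry_run) {
    message("Would delete ", length(to_remove), " folder(s)")
  } else {
    unlink(to_remove, recursive = TRUE)  # recursive = TRUE is required for folders
  }
  invisible(to_remove)
}

# Usage: inspect first, then delete
# remove_matching_subdirs('C:/Users/Users/myfolder/project/report_201*')
# remove_matching_subdirs('C:/Users/Users/myfolder/project/report_201*', dry_run = FALSE)
```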

How to rbind similar csv files that are scattered in many different zip files, using a function?

Consider a folder 'C:/ZFILE' that contains many zip files.
Now, consider that each of these zip files contains many CSV files, among them one specific CSV named 'NAME.CSV'. All these scattered 'NAME.CSV' files have the same name and structure (i.e., the same columns).
How can I rbind all these scattered CSV files?
The script below does that, but a function would be more appropriate.
How can I do this?
Thanks
zfile <- "C:/ZFILE"
zlist <- list.files(path = zfile, pattern = "\\.zip$", recursive = FALSE, full.names = TRUE)
zlist # list all zip from the zfile file
zunzip <- lapply(zlist, unzip, exdir = zfile) # unzip all zips into the zfile folder (may take time depending on the number of zips)
library(data.table) # rbindlist & fread
csv_name <- "NAME.CSV"
csv_list <- list.files(path = zfile, pattern = glob2rx(csv_name), recursive = TRUE, ignore.case = FALSE, full.names = TRUE)
csv_list # list all 'NAME.CSV' from the zfile file
csv_rbind <- rbindlist(sapply(csv_list, fread, simplify = FALSE), idcol = 'filename')
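You can try this type of function (you can pass the unzip call directly to the cmd parameter of data.table::fread(); this needs the unzip command-line tool on your PATH):
library(data.table)
get_zipped_csv <- function(path, csv_name = "NAME.CSV") {
  fnames <- list.files(path, pattern = "\\.zip$", full.names = TRUE)
  # unzip -p writes the named member to stdout, which fread() reads directly
  rbindlist(lapply(fnames, \(f) fread(cmd = paste("unzip -p", shQuote(f), shQuote(csv_name)))[, src := f]))
}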
Usage:
get_zipped_csv(path = "C:/ZFILE")
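If the unzip command-line tool isn't available (common on Windows), a base-R alternative is utils::unzip(), which can list an archive's contents and extract a single member. The sketch below assumes NAME.CSV appears at most once per zip and that all copies share the same columns. read_csv_from_zips() is a hypothetical name. Each file is extracted to a temporary folder rather than next to the zips.

```r
# Read one named CSV out of every .zip in a folder using base R only (hypothetical helper).
read_csv_from_zips <- function(zip_dir, csv_name = "NAME.CSV") {
  zips <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  pieces <- lapply(zips, function(z) {
    members <- unzip(z, list = TRUE)$Name          # list contents without extracting
    hit <- members[basename(members) == csv_name]  # match the file name at any depth
    if (length(hit) == 0) return(NULL)             # this zip has no such CSV
    out_dir <- tempfile("unz")                     # fresh folder per zip, no name clashes
    extracted <- unzip(z, files = hit[1], exdir = out_dir)
    df <- read.csv(extracted)
    df$src <- basename(z)                          # which zip each row came from
    df
  })
  pieces <- Filter(Negate(is.null), pieces)
  do.call(rbind, pieces)                           # NULL if nothing was found
}

# Usage:
# combined <- read_csv_from_zips("C:/ZFILE")
```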

Specifying pathname in map_dfr

The structure of my directory is as follows:
Extant_Data/
  Data/
    Raw/
      course_enrollment/
      frpm/
I have a few different functions to read in some text files and Excel files, respectively:
read_fun = function(path){
test = read.delim(path, sep="\t", header=TRUE, fill = TRUE, colClasses = c(rep("character",23)))
test
}
read_fun_frpm= function(path){
test = read_excel(path, sheet = 2, col_names = frpm_names)
}
I feed these into map_dfr so that each function reads in each of the files and row-binds them.
allfiles = list.files(path = "Extant_Data/Data/Raw/course_enrollment",
pattern = "CourseEnrollment.txt",
full.names=FALSE,
recursive = T)
# Rowbind all the course enrollment data
# !!! BUT I HAVE set the working directory to a subdirectory so that it finds those files
setwd("/Extant_Data/Data/Raw/course_enrollment")
course_combined <- map_dfr(allfiles,read_fun)
allfiles = list.files(path = "Extant_Data/Data/Raw/frpm/post12",
pattern = "frpm*",
full.names=FALSE,
recursive = T)
# Rowbind all the frpm data
# !!! I have to change the directory AGAIN
setwd("Extant_Data/Data/Raw/frpm/post12")
frpm_combined <- map_dfr(allfiles,read_fun_frpm)
As mentioned in the comments, I have to keep changing the working directory so that map_dfr can locate the files. I don't think this is best practice. How can I avoid changing the directory every time? Any suggestions appreciated. Sorry, it's hard to provide a reproducible example.
Note: This throws an error.
frpm_combined <- map_dfr(allfiles,read_fun_frpm('Extant_Data/Data/Raw/frpm/post12'))
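One way to avoid setwd() altogether is to let list.files() return full paths (full.names = TRUE), so each path can be opened from any working directory. The last line errors because read_fun_frpm('...') is called immediately, on a directory. map_dfr() (like lapply()) expects the function itself, without parentheses. Below is a base-R sketch under those assumptions. read_all() is a hypothetical helper. With purrr loaded, purrr::map_dfr(allfiles, read_fun) would do the same stacking step as do.call(rbind, ...).

```r
# Read and row-bind every matching file below `dir` without calling setwd() (hypothetical helper).
read_all <- function(dir, pattern, reader) {
  # full.names = TRUE returns "dir/sub/file.txt", usable from any working directory
  allfiles <- list.files(path = dir, pattern = pattern,
                         full.names = TRUE, recursive = TRUE)
  # Pass the reader function itself (no parentheses); lapply() calls it on each path
  do.call(rbind, lapply(allfiles, reader))
}

# Usage with the functions from the question:
# course_combined <- read_all("Extant_Data/Data/Raw/course_enrollment", "CourseEnrollment.txt", read_fun)
# frpm_combined   <- read_all("Extant_Data/Data/Raw/frpm/post12", "frpm", read_fun_frpm)
```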

Include pattern in list.dirs

Surely a very newbie question, but how do I include a pattern in a list.dirs call?
For example, the list.files call
Imagery=list.files(full.names=TRUE, recursive=TRUE, pattern= "*20m*.tif$")
returns all the files that have 20m in their name and .tif as their extension.
But when I try to apply this logic to list.dirs
directories=list.dirs(full.names = TRUE, recursive=TRUE, pattern="R10m" )
I get this error:
Error in list.dirs(full.names = TRUE, recursive = TRUE, pattern = "R10m") :
unused argument (pattern = "R10m")
Hope I am not missing something obvious here.
My goal is to get the full path of all directories that have a folder named "R10m". I have a lot of folders with many subdirectories, and most of them have a similar structure. I would like to list only those that have this folder, and within them list all files that are TIFs. I know I can get the files I need with list.files options alone, but I need the directory paths and file names later as variables.
Thank you in advance for your time,
Best regards,
Davor
Three alternatives:
dirs <- list.dirs()
dirs <- dirs[ grepl(your_pattern, dirs) ]
or
dirs <- list.dirs()
dirs <- grep(your_pattern, dirs, value = TRUE)
or
files <- list.files(pattern = your_pattern, recursive = TRUE, include.dirs = TRUE)
dirs <- files[ file.info(files)$isdir ]
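For the stated goal (the path of every folder named R10m plus the TIFs inside it, kept as variables), here is a sketch. It combines list.dirs() with an exact basename() comparison and then lists the TIFs in each matching folder. tifs_in_r10m() and the dir/file column names are hypothetical.

```r
# Folders named exactly `dir_name` below `root`, with the .tif files in each (hypothetical helper).
tifs_in_r10m <- function(root = ".", dir_name = "R10m") {
  all_dirs <- list.dirs(root, full.names = TRUE, recursive = TRUE)
  target_dirs <- all_dirs[basename(all_dirs) == dir_name]  # exact name, not a substring
  rows <- lapply(target_dirs, function(d) {
    tifs <- list.files(d, pattern = "\\.tif$", ignore.case = TRUE)
    if (length(tifs) == 0) return(NULL)
    data.frame(dir = d, file = tifs, stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(dir = character(0), file = character(0)))
  }
  do.call(rbind, rows)
}

# Usage: one row per TIF, with its folder kept as a separate variable
# imagery <- tifs_in_r10m(".")
# file.path(imagery$dir, imagery$file)
```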
dir(), unlike list.dirs(), provides that functionality (dir() is an alias of list.files(), so it takes the same arguments):
dir(path = ".", pattern = NULL, all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
In your example (note that in a recursive listing, folders are only returned if include.dirs = TRUE):
directories <- dir(full.names = TRUE, recursive = TRUE, include.dirs = TRUE, pattern = "R10m")
Yes, I also find it strange that there are two base functions to list directories, and that list.dirs(), despite its name similarity to list.files(), doesn't provide like-for-like functionality. If someone knows the reason for this, I would be very interested.
Update
After Gregor's comment, I decided to create a reproducible example to test my solution:
test_dirs <- c(
paste0(c(1:3), "R10m", rep("a", 3)),
paste0(c(1:3), "R200m", rep("a", 3))
)
for (test_dir in test_dirs){
dir.create(test_dir)
}
list.dirs()
[1] "."                 "./1R10ma"          "./1R200ma"
[4] "./2R10ma"          "./2R200ma"         "./3R10ma"
[7] "./3R200ma"         "./solo_kit-figure"
dir()
[1] "1R10ma"          "1R200ma"         "2R10ma"          "2R200ma"
[5] "3R10ma"          "3R200ma"         "a1.bed"          "a2.bed"
[9] "a.bed"           "solo_kit-figure" "solo_kit.md"
dir(pattern = "R10m")
[1] "1R10ma" "2R10ma" "3R10ma"
# dir(pattern = "*R10m") also works
dir() also lists files, so if the pattern matches both files and directories that might be a problem, but I guess that for most applications it will work fine.
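One way around that caveat is to include directories in the recursive listing and then keep only the entries that really are folders. dirs_matching() is a hypothetical name. dir.exists() (base R 3.2.0 and later) does the filtering.

```r
# Directories (at any depth) whose names match `pattern`, excluding plain files (hypothetical helper).
dirs_matching <- function(pattern, path = ".") {
  hits <- dir(path, pattern = pattern, full.names = TRUE,
              recursive = TRUE, include.dirs = TRUE)
  hits[dir.exists(hits)]  # drop files whose names also match
}

# Usage:
# dirs_matching("R10m")
```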

Resources