Surely a very newbish question, but how do I pass a pattern to the list.dirs function?
For example, this list.files call
Imagery=list.files(full.names=TRUE, recursive=TRUE, pattern= "*20m*.tif$")
returns all the files that have 20m in their name and have .tif as extension.
But when I try to apply this logic to list.dirs
directories=list.dirs(full.names = TRUE, recursive=TRUE, pattern="R10m" )
I get this error:
Error in list.dirs(full.names = TRUE, recursive = TRUE, pattern = "R10m") :
unused argument (pattern = "R10m")
Hope I am not missing something obvious here.
My goal is to get the full path of all directories that contain a folder named "R10m". I have a lot of folders with many subdirectories, and most of them have a similar structure. I would like to list only those that contain this folder, and within them list all files that are tifs. I know I can get the files I need with list.files alone, but I need the directory paths and file names later as variables.
Thank you beforehand for your time,
Best regards,
Davor
Three alternatives:
dirs <- list.dirs()
dirs <- dirs[ grepl(your_pattern, dirs) ]
or
dirs <- list.dirs()
dirs <- grep(your_pattern, dirs, value = TRUE)
or
files <- list.files(pattern = your_pattern, recursive = TRUE, include.dirs = TRUE)
dirs <- files[ file.info(files)$isdir ]
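For the goal stated in the question (folders named exactly "R10m", plus the .tif files inside them), the first alternative can be tightened by matching on basename() rather than on the whole path. This is only a minimal sketch: the tree under tempdir() and all folder and file names in it are made up for illustration.

```r
# Build a small throwaway tree (hypothetical names, for illustration only)
root <- file.path(tempdir(), "list_dirs_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "tileA", "R10m"), recursive = TRUE)
dir.create(file.path(root, "tileA", "R20m"), recursive = TRUE)
dir.create(file.path(root, "tileB", "R10m"), recursive = TRUE)
file.create(file.path(root, "tileA", "R10m", "band_10m.tif"))
file.create(file.path(root, "tileA", "R20m", "band_20m.tif"))
file.create(file.path(root, "tileB", "R10m", "notes.txt"))

# List every directory, then keep those whose last path component is "R10m"
all_dirs  <- list.dirs(root, full.names = TRUE, recursive = TRUE)
r10m_dirs <- all_dirs[basename(all_dirs) == "R10m"]

# For each matching directory, list the .tif files it contains
tifs <- lapply(r10m_dirs, list.files, pattern = "\\.tif$", full.names = TRUE)
names(tifs) <- r10m_dirs
```

Both the directory paths (r10m_dirs) and the file paths (tifs) remain available as variables afterwards, which is what the question asks for.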
dir, unlike list.dirs, provides that functionality:
dir(path = ".", pattern = NULL, all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
In your example:
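# include.dirs = TRUE is needed for directories to show up in a recursive listing
directories <- dir(full.names = TRUE, recursive = TRUE, pattern = "R10m", include.dirs = TRUE)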
Yes, I also find it strange that there are two base functions for listing directory contents, and that the one whose name mirrors list.files doesn't offer like-for-like functionality. If someone knows the reason for this, I would be very interested to hear it.
Update
After Gregor's comment, I decided to create a reproducible example to test my solution:
test_dirs <- c(
paste0(c(1:3), "R10m", rep("a", 3)),
paste0(c(1:3), "R200m", rep("a", 3))
)
for (test_dir in test_dirs){
dir.create(test_dir)
}
list.dirs()
[1] "."                 "./1R10ma"          "./1R200ma"
[4] "./2R10ma"          "./2R200ma"         "./3R10ma"
[7] "./3R200ma"         "./solo_kit-figure"
dir()
[1] "1R10ma" "1R200ma" "2R10ma" "2R200ma"
[5] "3R10ma" "3R200ma" "a1.bed" "a2.bed"
[9] "a.bed" "solo_kit-figure" "solo_kit.md"
dir(pattern = "R10m")
# dir(pattern = "*R10m")
# also works
[1] "1R10ma" "2R10ma" "3R10ma"
dir also lists files, so it might be a problem if the pattern matches both files and directories, but I guess that for most applications it will work fine.
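If files matching the pattern are a concern, the result of dir() can be filtered with dir.exists(). A minimal sketch, assuming a throwaway folder under tempdir(); the names are hypothetical and chosen so that one file and one directory both match:

```r
# Throwaway folder with one directory and one file that both match "R10m"
root <- file.path(tempdir(), "dir_pattern_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "1R10ma"), recursive = TRUE)
dir.create(file.path(root, "2R200ma"))
file.create(file.path(root, "R10m_notes.txt"))

# dir() returns both matches; full.names = TRUE so the paths can be tested
hits <- dir(root, pattern = "R10m", full.names = TRUE)

# Keep only the entries that are directories
only_dirs <- hits[dir.exists(hits)]
```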
Related
I have a folder with over 100 sub-folders that each contain a specific csv "cats.csv" that I need to read into R.
So far I've got:
parent_folder <- "path of parent files"
sub_folders <- list.dirs(parent_folder, recursive = TRUE)[-1]
cat_files <- dir(sub_folders, recursive = TRUE, full.names = TRUE, pattern = "cats")
I've then tried variations of lapply and map to apply read.csv to load in all of the cat_files but it doesn't seem to work.
filelist <- list.files(pattern = "cats.csv", recursive = TRUE, full.names = TRUE)
then
lapply(setNames(nm=filelist), read.csv)
edit with thanks to r2evans below
We get the paths using Sys.glob (check that to be sure it is what you want) and then use Map to get a named list, DFs, of data.frames with the files' contents.
paths <- parent_folder |>
file.path("*", "cats.csv") |>
Sys.glob()
DFs <- Map(read.csv, paths)
Consider a folder 'C:/ZFILE' that contains many zip files.
Each zip contains many csv files, one of which is named 'NAME.CSV'. All of these scattered 'NAME.CSV' files share the same name and structure (i.e., the same columns).
How can I rbind all of them?
The script below does that, but a function would be more appropriate.
How can I write one?
Thanks
zfile <- "C:/ZFILE"
zlist <- list.files(path = zfile, pattern = "\\.zip$", recursive = FALSE, full.names = TRUE)
zlist # list all zip files in the zfile folder
zunzip <- lapply(zlist, unzip, exdir = zfile) # unzip every archive into the zfile folder (may take time depending on the number of zips)
library(data.table) # rbindlist & fread
csv_name <- "NAME.CSV"
csv_list <- list.files(path = zfile, pattern = paste0("\\", csv_name, "$"), recursive = TRUE, ignore.case = FALSE, full.names = TRUE)
csv_list # list all 'NAME.CSV' files under the zfile folder
csv_rbind <- rbindlist(sapply(csv_list, fread, simplify = FALSE), idcol = 'filename')
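You can try this type of function (you can pass the unzip call directly to the cmd param of data.table::fread()):
library(data.table)  # rbindlist() and fread()
get_zipped_csv <- function(path) {
  # Only pick up the zip archives in `path`
  fnames <- list.files(path, pattern = "\\.zip$", full.names = TRUE)
  # `unzip -p` streams the archive contents to stdout; shQuote() guards paths with spaces
  rbindlist(lapply(fnames, \(f) fread(cmd = paste("unzip -p", shQuote(f)))[, src := f]))
}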
Usage:
get_zipped_csv(path = "C:/ZFILE")
The structure of my directory is as follows:
Extant_Data -> Data -> Raw
-> course_enrollment
-> frpm
I have a few different functions to read in some text files and Excel files, respectively.
read_fun = function(path){
test = read.delim(path, sep="\t", header=TRUE, fill = TRUE, colClasses = c(rep("character",23)))
test
}
read_fun_frpm= function(path){
test = read_excel(path, sheet = 2, col_names = frpm_names)
}
I feed this into map_dfr so that the function reads in each of the files and rowbinds them.
allfiles = list.files(path = "Extant_Data/Data/Raw/course_enrollment",
pattern = "CourseEnrollment.txt",
full.names=FALSE,
recursive = T)
# Rowbind all the course enrollment data
# !!! BUT I HAVE set the working directory to a subdirectory so that it finds those files
setwd("/Extant_Data/Data/Raw/course_enrollment")
course_combined <- map_dfr(allfiles,read_fun)
allfiles = list.files(path = "Extant_Data/Data/Raw/frpm/post12",
pattern = "frpm*",
full.names=FALSE,
recursive = T)
# Rowbind all the frpm data
# !!! I have to change the directory AGAIN
setwd("Extant_Data/Data/Raw/frpm/post12")
frpm_combined <- map_dfr(allfiles,read_fun_frpm)
As mentioned in the comments, I have to keep changing the working directory so that map_dfr can locate the files. I don't think this is best practice. How might I work around this so that I don't have to keep changing the directory? Any suggestions appreciated. Sorry, it's hard to provide a reproducible example.
Note: This throws an error.
frpm_combined <- map_dfr(allfiles,read_fun_frpm('Extant_Data/Data/Raw/frpm/post12'))
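One way around the repeated setwd() calls is to ask list.files() for full paths. The line above fails because read_fun_frpm(...) is called immediately with a directory path, whereas map_dfr expects the function itself as its second argument. What follows is a minimal sketch, assuming a throwaway tree under tempdir() with made-up file contents. It uses base lapply() and rbind() in place of map_dfr so that it runs without extra packages.

```r
# Throwaway tree standing in for the course_enrollment folder (hypothetical data)
demo <- file.path(tempdir(), "full_names_demo")
unlink(demo, recursive = TRUE)
root <- file.path(demo, "course_enrollment")
dir.create(file.path(root, "2019"), recursive = TRUE)
dir.create(file.path(root, "2020"))
writeLines(c("id\tcourse", "1\tmath"), file.path(root, "2019", "CourseEnrollment.txt"))
writeLines(c("id\tcourse", "2\tart"),  file.path(root, "2020", "CourseEnrollment.txt"))

# full.names = TRUE prepends `path` to every result, so each entry can be
# opened from the current working directory and no setwd() is needed
allfiles <- list.files(path = root, pattern = "CourseEnrollment\\.txt$",
                       full.names = TRUE, recursive = TRUE)

# Reader in the spirit of read_fun from the question
read_fun <- function(path) {
  read.delim(path, sep = "\t", header = TRUE, colClasses = "character")
}

# Read every file and row-bind the results
course_combined <- do.call(rbind, lapply(allfiles, read_fun))
```

With purrr loaded, the last line could equally be written as map_dfr(allfiles, read_fun), passing the function by name rather than calling it.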
Is there an r function (or script) that can be used to get the size of several folders (sub directories) within a directory? Thanks
Try this:
sum(file.info(list.files(".", all.files = TRUE, recursive = TRUE))$size)
Alternatively, if you'd like to step through the code:
#get working directory
getwd()
#list all files in the directory
list.files(".", all.files = TRUE, recursive = TRUE)
#list the size of all individual files
file.info(list.files(".", all.files = TRUE, recursive = TRUE))$size
#sum all file sizes
sum(file.info(list.files(".", all.files = TRUE, recursive = TRUE))$size)
Using Tim's answer I created a simple function to calculate the size of multiple directories. NA values are returned for files.
dir_size <- function(x){
  is_dir <- dir.exists(x)
  folder_size <- rep_len(NA_real_, length(x))
  folder_size[is_dir] <- vapply(x[is_dir],
                                function(x) sum(
                                  unname(
                                    file.size(
                                      list.files(
                                        x, full.names = TRUE,
                                        recursive = TRUE, all.files = TRUE
                                      )
                                    )
                                  )
                                ),
                                FUN.VALUE = numeric(1))
  folder_size
}
my_files <- list.files() # Files in working directory
dir_sizes <- dir_size(my_files) # Directory sizes
data.frame(my_files, dir_sizes) # Add to data frame
I know about the source() in R.
It takes a path and/or filename to load, e.g. a function that is saved in another .R file. What I need is basically the same command, but it should load every .R file from one folder and its subfolders.
Is there a one-liner (in some library), or would I have to write a loop and everything?
This might work
lapply(list.files(pattern = "[.]R$", recursive = TRUE), source)
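The one-liner above searches the current working directory. To source from some other folder, list.files() needs full.names = TRUE so that source() receives usable paths. A minimal sketch, assuming a throwaway folder under tempdir() with two made-up scripts; sourcing into a separate environment (env) is an illustrative choice, not something the original answer does:

```r
# Throwaway folder with one script at the top level and one in a subfolder
root <- file.path(tempdir(), "source_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "sub"), recursive = TRUE)
writeLines("f1 <- function() 1", file.path(root, "a.R"))
writeLines("f2 <- function() 2", file.path(root, "sub", "b.R"))

# Full paths to every .R/.r file below `root`
r_files <- list.files(root, pattern = "[.][Rr]$", recursive = TRUE, full.names = TRUE)

# Source each file into its own environment to keep the workspace clean
env <- new.env()
for (f in r_files) source(f, local = env)
```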
The R help page for source() (see ?source) includes the following example:
## If you want to source() a bunch of files, something like
## the following may be useful:
sourceDir <- function(path, trace = TRUE, ...) {
    for (nm in list.files(path, pattern = "[.][RrSsQq]$")) {
       if(trace) cat(nm,":")
       source(file.path(path, nm), ...)
       if(trace) cat("\n")
    }
}
You could use a simple recursive function like
sourceRecursive <- function(path = ".") {
  dirs <- list.dirs(path, recursive = FALSE)
  # "[.][Rr]$" matches only names ending in .R or .r (not every name that ends in the letter r)
  files <- list.files(path, pattern = "[.][Rr]$", include.dirs = FALSE, full.names = TRUE)
  for (f in files)
    source(f)
  for (d in dirs)
    sourceRecursive(d)
}