dir_ls - how to make "glob" case insensitive - r

library(fs)
dir_ls(glob = "*.R") lists all the .R files in the directory.
However, dir_ls(glob = "*.r") does not return anything.
How do I make the glob case-insensitive?

You can use the ... argument of fs::dir_ls(), which is passed on to grep(), to supply ignore.case = TRUE:
dir_ls(glob = "*.r", ignore.case = TRUE)

Use glob2rx to convert the glob to a regular expression and then use list.files with the ignore.case=TRUE argument:
list.files(pattern = glob2rx("*.R"), ignore.case = TRUE)
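To see why ignore.case = TRUE is needed, the regular expression that glob2rx() produces can be tried against a character vector directly. This is a minimal sketch: the file names are made up, and grepl() is used here only to illustrate the kind of name matching that the pattern argument performs, not as a description of list.files() internals.

```r
# Hypothetical file names: one upper-case and one lower-case .R extension
files <- c("analysis.R", "helpers.r", "notes.txt", "README.md")

rx <- utils::glob2rx("*.R")  # the glob "*.R" becomes the regex "^.*\\.R$"
print(rx)

# Case-sensitive match: only "analysis.R"
sensitive <- grepl(rx, files)

# Case-insensitive match: "analysis.R" and "helpers.r"
insensitive <- grepl(rx, files, ignore.case = TRUE)
files[insensitive]
```

The same regex passed as pattern = to list.files() together with ignore.case = TRUE gives the case-insensitive listing from the answer above.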

Related

Using a function within list.files function in r

I want to create a program where I select files in list.files() by a user-defined prefix.
My folder will have files beginning with various characters. I want to define a variable or function at the beginning of the program that I can then use in list.files() later on.
List of files:
MP201901 MP201902 MP201903 SG201901 SG201902 SG201903 XY201901 XY202001 XY202002
If I use
inpfiles1 <- list.files(path =Input, pattern = "*SG.*.csv", full.names = TRUE)
it gives the correct output, but I want to store the prefix in one place so that only the prefix needs to change.
Currently I am using this code:
A<-"SG"
inpfiles2 <- list.files(path =Input, pattern = "*A*.*.csv", full.names = TRUE)
but this gives an empty result.
With your current code, R doesn't know that A is a variable name, and so it's ignoring your variable and literally using the letter A.
You can use paste0 instead:
A <- "SG"
pattern <- paste0(A, '.*.csv')
You have to concatenate the user-supplied prefix in A with your own suffix, i.e.:
A <- "SG"
pattern <- paste0(A, ".*.csv")
inpfiles2 <- list.files(path=Input, pattern=pattern, full.names=TRUE)
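A small variation anchors the pattern so the prefix only matches at the start of the name, and escapes the dot before the extension (in ".*.csv" the unescaped . matches any character). Since the question also mentions a function, the sketch below wraps this in one. list_prefixed() and its ext argument are hypothetical names, and the .csv suffix on the example file names is an assumption (the question lists the names without extensions).

```r
# Hypothetical helper: list files in `path` whose names start with `prefix`
# and end in `.ext`. The prefix is pasted in as-is, so it must not contain
# regex metacharacters such as "." or "+".
list_prefixed <- function(path, prefix, ext = "csv") {
  pattern <- paste0("^", prefix, ".*\\.", ext, "$")  # e.g. "^SG.*\\.csv$"
  list.files(path = path, pattern = pattern, full.names = TRUE)
}

# Demo in a temporary folder with some of the file names from the question
Input <- file.path(tempdir(), "prefix_demo")
dir.create(Input, showWarnings = FALSE)
invisible(file.create(file.path(Input, c("MP201901.csv", "SG201901.csv",
                                         "SG201902.csv", "XY201901.csv"))))

A <- "SG"  # change only this line to select a different prefix
inpfiles2 <- list_prefixed(Input, A)
basename(inpfiles2)
```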

loop over variable that is regular expression in r

I am searching for files in a folder by using a regular expression, such as:
for (i in c('exp\\d_baseline', 'exp\\d_treatment', 'control\\d_baseline', 'control\\d_treatment')) {
  file.list <- dir(file_path, pattern = i, full.names = TRUE)
  # ...irrelevant manipulation on these files here (concatenation)
  output_name <- paste0(i, '_concat.csv')
}
This gives me output file names that still contain the \ from the regular expression. How can I remove the backslashes when building my output file names?
One option is to strip the literal \d with sub(..., fixed = TRUE) before calling paste0():
output_name <- paste0(sub("\\d", "", i, fixed = TRUE), "_concat.csv")
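Because sub() and paste0() are vectorised, the output names can also be built for all four patterns at once, outside the loop. A minimal sketch using the patterns from the question:

```r
patterns <- c("exp\\d_baseline", "exp\\d_treatment",
              "control\\d_baseline", "control\\d_treatment")

# fixed = TRUE makes sub() look for the two characters "\d" literally;
# without it, "\\d" would be read as the regex for a digit and match nothing here.
clean <- sub("\\d", "", patterns, fixed = TRUE)

output_names <- paste0(clean, "_concat.csv")
output_names
```

Inside the loop, output_names[match(i, patterns)] (or simply the one-line sub() call from the answer) gives the name for the current pattern.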

Set number of arguments programmatically

I have the following string:
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"
I am separating the string on the backslash characters with the following code:
pdf_path_long <- unlist(strsplit(test, "\\\\",
fixed = FALSE, perl = FALSE, useBytes = FALSE))
What I want to do is:
pdf_path_short <- file.path(pdf_path_long[1], pdf_path_long[2], ...)
The problem is:
I know how to count the elements in pdf_path_long (length(pdf_path_long)), but I don't know how to pass them all to file.path(), because the number of elements will vary with the length of the path.
You can skip strsplit() entirely and use gsub() on test to change the separators directly (with fixed = TRUE, so the backslash doesn't need regex escaping). You get the same output as with file.path():
pdf_path_short <- gsub("\\", "/", test, fixed=TRUE)
pdf_path_short
# "C:/Users/stefanj/Documents/Automation_Desk/script.R"
Of course, you can change the replacement part with whatever separator you need.
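Note: you can also look at the normalizePath() function. Its winslash argument is only used on Windows (elsewhere a backslash is not a path separator), so the output below is what you would see there: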
normalizePath(test, "/", mustWork=FALSE)
#[1] "C:/Users/stefanj/Documents/Automation_Desk/script.R"
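The question as asked, calling file.path() with a variable number of arguments, can also be answered directly with do.call(), which calls a function with the elements of a list as its arguments. A sketch:

```r
test <- "C:\\Users\\stefanj\\Documents\\Automation_Desk\\script.R"

# Split on the literal backslash (fixed = TRUE avoids regex escaping)
pdf_path_long <- strsplit(test, "\\", fixed = TRUE)[[1]]

# do.call() passes each list element as a separate argument,
# so file.path() receives however many pieces the path has
pdf_path_short <- do.call(file.path, as.list(pdf_path_long))
pdf_path_short
```

To build a shorter path, subset first, e.g. do.call(file.path, as.list(pdf_path_long[-length(pdf_path_long)])) drops the file name.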

Include pattern in list.dirs

Surely a very newbie question, but how do I include a pattern in a list.dirs() call?
For example, this list.files() call
Imagery=list.files(full.names=TRUE, recursive=TRUE, pattern= "*20m*.tif$")
returns all the files that have 20m in their name and have .tif as extension.
But when I try to apply this logic to list.dirs()
directories=list.dirs(full.names = TRUE, recursive=TRUE, pattern="R10m" )
I get this error:
Error in list.dirs(full.names = TRUE, recursive = TRUE, pattern = "R10m") :
unused argument (pattern = "R10m")
Hope I am not missing something obvious here.
My goal is to get the full path of all directories that have a folder named "R10m". I have a lot of folders that have many subdirectories, and most of them have a similar structure. I would like to list only those that have this folder, and within them list all files that are tifs. I know I can get the files I need with list.files options alone, but I need the directory paths and file names later as variables.
Thank you beforehand for your time,
Best regards,
Davor
Three alternatives:
dirs <- list.dirs()
dirs <- dirs[ grepl(your_pattern, dirs) ]
or
dirs <- list.dirs()
dirs <- grep(your_pattern, dirs, value = TRUE)
or
files <- list.files(pattern = your_pattern, recursive = TRUE, include.dirs = TRUE)
dirs <- files[ file.info(files)$isdir ]
dir() (an alias of list.files()), unlike list.dirs(), provides that functionality:
dir(path = ".", pattern = NULL, all.files = FALSE,
full.names = FALSE, recursive = FALSE,
ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
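In your example (note that in a recursive listing, directory names are only returned when include.dirs = TRUE):
directories <- dir(full.names = TRUE, recursive = TRUE, include.dirs = TRUE, pattern = "R10m")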
Yes, I also find it strange that base R has two functions that can list directories, and that list.dirs(), despite its name's similarity to list.files(), does not offer the same functionality. If someone knows the reason for this, I would be very interested to hear it.
Update
After Gregor's comment, I decided to create a reproducible example to test my solution:
test_dirs <- c(
paste0(c(1:3), "R10m", rep("a", 3)),
paste0(c(1:3), "R200m", rep("a", 3))
)
for (test_dir in test_dirs){
dir.create(test_dir)
}
list.dirs()
[1] "."                 "./1R10ma"          "./1R200ma"
[4] "./2R10ma"          "./2R200ma"         "./3R10ma"
[7] "./3R200ma"         "./solo_kit-figure"
dir()
[1] "1R10ma" "1R200ma" "2R10ma" "2R200ma"
[5] "3R10ma" "3R200ma" "a1.bed" "a2.bed"
[9] "a.bed" "solo_kit-figure" "solo_kit.md"
dir(pattern = "R10m")
[1] "1R10ma" "2R10ma" "3R10ma"
# dir(pattern = "*R10m") also works
dir() also lists files, so if the pattern matches both files and directories this can be a problem, but I expect it will work fine for most applications.
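One way to handle the files-versus-directories issue is to filter the dir() results with dir.exists(), and then list the .tif files inside each matching directory, which is the asker's end goal. A sketch under assumptions: the sceneA/sceneB/sceneC layout and the file names are hypothetical, built in a temporary directory.

```r
# Hypothetical folder layout built in a temporary directory
root <- file.path(tempdir(), "r10m_demo")
for (d in c("sceneA/R10m", "sceneB/R10m", "sceneC/R20m")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(root, "sceneA/R10m", c("B02.tif", "B03.tif"))))
invisible(file.create(file.path(root, "sceneB/R10m", "B02.tif")))
# A plain file whose name also matches "R10m"
invisible(file.create(file.path(root, "sceneA", "R10m_notes.txt")))

# recursive = TRUE skips directory names unless include.dirs = TRUE
hits <- dir(root, pattern = "R10m", recursive = TRUE,
            include.dirs = TRUE, full.names = TRUE)

# Drop the matches that are files rather than directories
r10m_dirs <- hits[dir.exists(hits)]

# For each R10m directory, list the .tif files it contains
tifs <- lapply(r10m_dirs, list.files, pattern = "\\.tif$", full.names = TRUE)
names(tifs) <- r10m_dirs
tifs
```

The result keeps both pieces the question needs later: the directory paths as names(tifs) and the file paths as the list elements.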

Get files number in a dir in R?

In the shell, make a directory:
mkdir /home/test
then create a file named ".test" in /home/test.
a=list.files(path = "/home/test",include.dirs = FALSE)
a
character(0)
a=list.files(path = "/home/test",include.dirs = TRUE)
a
character(0)
a=list.files(path = "/home/test/",include.dirs = TRUE)
a
character(0)
list.files(path = '/home/test', all.files=TRUE, include.dirs=FALSE)
[1] "." ".." ".test"
a=list.files(path = '/home/test', all.files=TRUE)
length(a)
[1] 3
How can I get length(a) = 1 by using the regular-expression argument pattern= of list.files() to prune . and ..?
Use all.files=TRUE to show all file names including hidden files.
list.files(path = '/home/test', all.files=TRUE)
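To answer your edit, one way is to pass a negative number to tail(), which drops the first two entries (this assumes "." and ".." sort first, which is usual but not guaranteed for every file name):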
tail(list.files(path = '/home/test', all.files=TRUE), -2)
Using only the pattern argument:
list.files(path='/home/test', all.files=TRUE, pattern="^[^\\.]|\\.[^\\.]")
The pattern says "anything that starts with a character other than a dot, or anything that contains a dot followed by a character other than a dot." Only "." and ".." fail both alternatives.
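The pattern can be checked without touching the file system by running it through grepl() on a few candidate names. The names in nm are made up for illustration:

```r
# Names that list.files(all.files = TRUE) could return, including "." and ".."
nm <- c(".", "..", ".test", "data.csv", "..hidden")

# TRUE for every name except "." and ".."
keep <- grepl("^[^\\.]|\\.[^\\.]", nm)
nm[keep]
```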
Although it breaks your requirement to use the pattern argument of list.files(), in this case I would probably wrap grep() around list.files() instead.
grep("^\\.*\\.$", list.files(path='/home/test', all.files=TRUE),
invert=TRUE, value=TRUE)
The above will find any file names that only contain dots, then return everything else. invert=TRUE means "find the names that do not match", and value=TRUE means "return the names instead of their location."
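list.files() also has a no.. argument that excludes "." and ".." directly, which avoids the regex entirely. A sketch that recreates the question's setup in a temporary directory instead of /home/test:

```r
d <- file.path(tempdir(), "hidden_demo")
dir.create(d, showWarnings = FALSE)
invisible(file.create(file.path(d, ".test")))  # a single hidden file

# all.files = TRUE includes hidden files; no.. = TRUE drops "." and ".."
a <- list.files(path = d, all.files = TRUE, no.. = TRUE)
length(a)
```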
