Find files with R console

There is the list.files function, which takes a pattern argument. However, how can I find files in a folder if they have the ending yto and there is a 111 in the name (it has to be before the dot)?

As @Roland writes, you can use the list.files function. Run ?list.files for its documentation.
Say we wish to list all .txt files in a folder (and its subfolders). Then something like
list.files(pattern = "\\.txt$", recursive = TRUE)
would do the trick. The recursive argument makes the function search subfolders as well. The pattern we're looking for is the regular expression "\\.txt$", meaning that the filename should end with .txt. Consult ?regex for more information on regular expressions.
EDIT: If you instead search for files whose names end in 111.yto, you need to modify the above to:
list.files(pattern = "111\\.yto$", recursive = TRUE)

R: How to match a forward-slash in a regular expression?

How do I match on a forward slash / in a regular expression in R?
As demonstrated in the example below, I am trying to search for .csv files in a subdirectory, and my attempts to use a literal / are failing. I'm looking for a modification to my regex in base R, not a function that does this for me.
Example subdirectory
# Create subdirectory in current working directory with two .csv files
# - remember to delete these later or they'll stay in your current working directory!
dir.create(path = "example")
write.csv(data.frame(x1 = letters), file = "example/example1.csv")
write.csv(data.frame(x2 = 1:20), file = "example/example2.csv")
Get relative paths of all .csv files in the example subdirectory
# This works for the example, but could mistakenly return paths to other files based on:
# (a) file name: foo/example1.csv
# (b) subdirectory name: example_wrong/foo.csv
list.files(pattern = "example.*csv", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# This fixes issue (a) but doesn't fix issue (b)
list.files(pattern = "^example.*?\\.csv$", recursive = TRUE)
#> [1] "example/example1.csv" "example/example2.csv"
# Adding / to the end of `example` guarantees we get the correct subdirectory
# Doesn't work: / is special regex and not escaped
list.files(pattern = "^example/.*?\\.csv$", recursive = TRUE)
# Doesn't work: escapes / but throws error
list.files(pattern = "^example\/.*?\\.csv$", recursive = TRUE)
# Doesn't work: even with the \\ escaping in R!
list.files(pattern = "^example\\/.*?\\.csv$", recursive = TRUE)
Some of the solutions above work with regex tools but not in R. I've checked SO for solutions (most related below) but none seem to apply:
Escaping a forward slash in a regular expression
Regex string does not start or end (or both) with forward slash
Reading multiple csv files from a folder with R using regex
The pattern argument is only used for matching file (or directory) names, not the full path they are on (even when recursive and full.names are set to TRUE). That's why your last approach doesn't work even though it is the correct way to match / in a regular expression. You can get the correct file names by specifying path and setting full.names to TRUE.
list.files(path = "example", pattern = "\\.csv$", full.names = TRUE)
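If you really do need to match on the directory part of the path, a simple alternative (a sketch, not part of the answer above) is to list everything recursively and filter the returned relative paths yourself with grepl, since list.files() only ever applies pattern to individual names:
# Sketch: filter the full relative paths, assuming the working directory from the question
all_files <- list.files(recursive = TRUE)
all_files[grepl("^example/.*\\.csv$", all_files)]
#> [1] "example/example1.csv" "example/example2.csv"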

A copying nightmare, choosing files to copy based on files in another folder

I have a bit of an issue using file.copy.
I need to copy .tif files from a directory with several subdirectories (where the .tif files are), based on the names of the files in another directory. I have the following code (which is almost working):
ValidatedDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Validated"
RawDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Raw"
OutputDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Ouputfolder"
ValidatedImages <- list.files(ValidatedDirectory)
# this is to remove the extra bit that is added onto the validated images [working]
pattern <- gsub("_hc", "", ValidatedImages)
pattern <- paste(gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", pattern), collapse="|")
# this bit tackles finding the relevant files based on the ValidatedImages
filesinRAW <- list.files(
  path = RawDirectory,
  recursive = TRUE,
  include.dirs = FALSE,
  full.names = FALSE)
filesinRAW <- as.list(filesinRAW)
# this removes subdirectory prefix in front of the file and .tif which confuses it
filesinRAW <- as.list(gsub("\\d\\d\\d\\d/", "", filesinRAW))
filesinRaw <- as.list(gsub(".tif", "", filesinRAW))
tocopy <- grep(filesinRAW, pattern = pattern, value = TRUE)
tocopy <- as.list(tocopy)
tocopy <- as.list(gsub(".tif", "", tocopy))
setwd(RawDirectory)
file.copy(from = tocopy, to = OutputDirectory, overwrite = TRUE)
I get a "No such file or directory" error. The files do exist (obviously), so I must be doing something wrong with the naming.
I have been having a bash at it for a good while; if helpful, I can upload the example data and share the link.
Thanks for any help, community!
When debugging, try to break your code down and check at each step whether your variables are still what you expect them to be.
That said, I see several problems in your code right now:
grep works with pattern being a length-one regular expression. If you give it multiple regular expressions, it only uses the first one (with a warning, which you won't see if you've disabled warnings).
To apply multiple patterns, you can combine apply and sapply: filesinRAW[apply(sapply(pattern, grepl, x = filesinRAW), 1, any)] (one row per file, so we ask whether any pattern matched in each row). But see the last point.
grep by default treats pattern as a regular expression, which may break things if your pattern contains characters with a special meaning in regexes. For example, grep('^test', '^test') gives zero results. To check whether a string contains a literal string, you can use grep(..., fixed = TRUE).
In the last step, you use gsub(".tif", "", tocopy), which removes anything matching .tif. I suppose you meant to add .tif back at the end; right now you are trying to copy files without an extension, which won't be found. To add it back, you can use paste.
In several steps you use as.list. Why? In R, everything is vectorised, so multiple values are handled already. The difference between a list and a vector is that lists can store different kinds of objects, but you're not doing that anyway. As far as I can see, the as.list calls don't harm anything, because all the functions will, as a first step, convert your list back to a character vector.
Finally, as far as I can see, you're first making a list of filenames that need to be copied (pattern), which you then compare to a full list of your files, and you try to make them match exactly. Then why use a regular expression at all? Regular expressions are useful if you only know part of what your filenames look like, but is that your goal? E.g. if filename1._hc is in your ValidatedDirectory, do filename11.tif and filename12.tif need to be copied as well?
If you're just looking for exact matches, you can compare them directly:
tocopy <- tocopy[tocopy %in% pattern]
But generally, working in R is easy because you can do everything step by step; if you just inspect tocopy, you can see whether your call makes sense.
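As a small self-contained sketch of the two matching approaches discussed above (file names invented for illustration):
# Invented names, just to contrast exact matching with literal (fixed) pattern matching
validated <- c("img_0001_hc.tif", "img_0002_hc.tif")
raw       <- c("img_0001.tif", "img_0002.tif", "img_0010.tif")
wanted <- gsub("_hc", "", validated)   # "img_0001.tif" "img_0002.tif"

# Exact matching: no regular expressions involved at all
raw[raw %in% wanted]
#> [1] "img_0001.tif" "img_0002.tif"

# Multiple literal patterns with fixed = TRUE: one column per pattern, one row per file
raw[apply(sapply(wanted, grepl, x = raw, fixed = TRUE), 1, any)]
#> [1] "img_0001.tif" "img_0002.tif"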
After much help from @Emil Bode, I have the following solution to the issue (perhaps not the most elegant, but it runs quickly enough on 1000s of .tif files).
ValidatedDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Validated"
RawDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Raw"
OutputDirectory <- "C:/Users/JS22/Desktop/R_Experiments/Raw_Folder_Testa/Ouputfolder"
ValidatedImages <- list.files(ValidatedDirectory)
pattern <- gsub("_hc", "", ValidatedImages)
pattern <- paste(gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", pattern), collapse="|")
filesinRAW <- list.files(
  path = RawDirectory,
  recursive = TRUE,
  include.dirs = FALSE,
  full.names = FALSE,
  pattern = pattern)
setwd(RawDirectory)
file.copy(from = filesinRAW, to = OutputDirectory, overwrite = TRUE)

time pattern in list.files function (R)

I'm trying to get a list of subdirectories from a path. These subdirectories follow a time pattern month\day\hour, e.g. 03\21\11.
I naively used the following:
list.files("path",pattern="[0-9]\[0-9]\[0-9]", recursive = TRUE, include.dirs = TRUE)
But it doesn't work.
How to code for the digitdigit\digitdigit\digitdigit pattern here?
Thank you
This regex works for 10\11\18:
(\d\d\\\d\d\\\d\d)
I think you may need lazy matching for the regex, unless there are always two digits, in which case the other responses look valid.
If you could provide a vector of file name strings, that would be super helpful.
Capturing backslashes is confusing; I've found this thread helpful: R - gsub replacing backslashes
My guess is something like this: '[0-9]+?\\\\[0-9]+?\\\\[0-9]+'
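Neither answer shows a complete call, so here is a minimal sketch (the root directory name is invented, and list.dirs() is used instead of list.files(..., include.dirs = TRUE)). Because the pattern argument is only matched against individual names, not full paths, the filtering is done on the returned paths instead; note that R returns "/" as the path separator even on Windows:
# Sketch: list all subdirectories, then keep those ending in dd/dd/dd
dirs <- list.dirs("path", recursive = TRUE, full.names = TRUE)
dirs[grepl("[0-9]{2}/[0-9]{2}/[0-9]{2}$", dirs)]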

R read files with for loop

I just want to use 10 files in R. For each one I want to calculate something.
Example files:
stat1_pwg1.out
stat23_pwg2.out
..
stat45_pwg10.out
I try this:
for (i in 1:10){
  Data = paste("../XYZ/*_pwg", i, ".out", sep = "")
  line = read.table(Data, head = T)
}
But it does not work. Any hints?
I suspect your problem comes from the wildcard *. A better way to do this might be to first store the file names using dir, then find the ones you want.
files <- dir("../XYZ",pattern="stat[0-9]+_pwg[0-9]+\.out")
for(f in files) {
line=read.table(Data,head=T)
}
You could also use one of the apply family of functions to eliminate the for loop entirely.
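For instance, a minimal sketch of that approach (nrow() here is just a placeholder for whatever you want to calculate per file):
# Read each file and compute something for it; nrow() is only a placeholder
files <- dir("../XYZ", pattern = "stat[0-9]+_pwg[0-9]+\\.out", full.names = TRUE)
results <- sapply(files, function(f) nrow(read.table(f, header = TRUE)))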
A few things about your code.
paste is vectorised, so you can take it out of the loop.
paste("../XYZ/*_pwg", 1:10, ".out", sep = "")
(Though as you'll see in a moment, you don't actually need to use paste at all.)
read.table won't accept wildcards; it needs an exact match on the file name.
Rather than trying to construct a vector of the filenames, you might be better off using dir to find the files that actually exist in your directory, filtered by a suitable naming scheme.
To filter the files, you use a regular expression in the pattern argument. You can convert from wildcards to regular expressions using glob2rx.
file_names <- dir("../XYZ", pattern = glob2rx("stat*_pwg*.out"))
data_list <- lapply(filenames, read.table, header = TRUE)
For a slightly more specific fit, where the wildcards only match numbers rather than anything, you need to use a regular expression directly.
file_names <- dir("../XYZ", pattern = "^stat[[:digit:]]+_pwg[[:digit:]]+\\.out$")
files <- dir(pattern="*Rip1_*")
files
for (F in files){ assign(F , Readfunc(F))}

using R to copy files

As part of a larger task performed in R under Windows, I would like to copy selected files between directories. Is it possible to give, within R, a command like cp patha/filea*.csv pathb (notice the wildcard, for extra spice)?
I don't think there is a direct way (short of shelling out), but something like the following usually works for me.
flist <- list.files("patha", "^filea.+[.]csv$", full.names = TRUE)
file.copy(flist, "pathb")
Notes:
I purposely decomposed this into two steps; they can be combined.
Note the regular expression: R uses true regexes, and it also keeps the file pattern separate from the path, in two distinct arguments.
Note the ^ and $ (beginning/end of string) in the regex. This is a common gotcha: they are implicit in wildcard-style patterns but required with regexes, lest file names which match the wildcard pattern but also start and/or end with additional text get selected as well.
In the Windows world, people will typically add the ignore.case = TRUE argument to list.files, to emulate the fact that directory searches are case insensitive on this OS.
R's glob2rx() function provides a convenient way to convert wildcard patterns to regular expressions. For example, fpattern = glob2rx('filea*.csv') returns a different but equivalent regex.
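For instance, a sketch of the same copy letting glob2rx() build the regex from the wildcard (paths as in the question):
# Same copy as above, with the regex generated from the wildcard pattern
flist <- list.files("patha", pattern = glob2rx("filea*.csv"),
                    full.names = TRUE, ignore.case = TRUE)
file.copy(flist, "pathb")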
You can
use system() to fire off a command as if it were typed at the shell, including globbing
use list.files() aka dir() to do the globbing / regexp matching yourself and then copy the files individually
use file.copy on individual files, as shown in mjv's answer
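As a sketch of the second option, base R's Sys.glob() (not mentioned in the answers above) can expand the wildcard directly, so no regex is needed at all:
# Expand the wildcard with Sys.glob(), then copy the matches
file.copy(Sys.glob("patha/filea*.csv"), "pathb")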
